| Tags: capa, chess, cuz, greatest, karpov, kasparov, kramnik, lie, order, players, puters |
#61

"JohnnyT" wrote in message . ..
> David Richerby wrote:
>> Do you have any guess (or, shock!, data) on how often errors occur in WC games that an engine (given reasonable time) would score down by, say, 100cp?
>
> I will say that, in my own practical experience of running through games, Fritz 8, 9, and 10 have evaluated the same, not unusual, positions more than 100cp differently from Rybka 2.3.1, and have suggested different moves. That alone should raise enough of a question about the results here.

The fact is that we don't know when the engines will be strong enough to represent the "truth".

In theory, the engine being too strong could be a source of error in the analysis, as much as the engine being too weak could. For example, the best move leads to a win in 20 moves based on a complicated calculation that no human considers. The second-best move wins more slowly, but in a way that strong GMs might be able to see. The player makes the best move (for the wrong reasons), overlooking the alternate way to win. That's evidence of weaker, not stronger, play.

This happens all of the time if you look at scholastic games. Crafty sees the win of a rook at 8-ply and deems it superior to winning a piece at 3-ply. But the 8-ply analysis is essentially irrelevant to the game because the kids are not able to calculate that deeply.

> I will say that I do not use Crafty for day-to-day analysis, so I don't have an opinion, other than that you need to remember that in Elo terms the difference between 2500 and 2800 is vast, and the difference between 2800 and ~3100 is just as vast. It is not 10% better; it is closer to think of it as TWICE as good, or as more likely to win MOST of the time. It is a HUGE difference.
#62

JohnnyT wrote:
> [...] you need to remember that in Elo terms the difference between 2500 and 2800 is vast, and the difference between 2800 and ~3100 is just as vast. It is not 10% better; it is closer to think of it as TWICE as good, or as more likely to win MOST of the time. It is a HUGE difference.

Specifically, an Elo-rating gap of 300 points (and it's the difference that's significant so, yes, 2800 vs 3100 gives the same results as 2500 vs 2800, which gives the same results as 1100 vs 1400) corresponds to the stronger player being expected to score roughly 85%. The approximate values are tabulated below, approximated from the real data on FIDE's website[1].

Rating diff.   Score
----------------------
    600         99%
    500         96%
    400         92%
    300         85%
    250         81%
    200         76%
    150         70%
    100         64%
     75         60%
     50         57%
     25         54%

Dave.

[1] http://www.fide.com/official/handbook.asp?level=B0210
Beware that the table is rather hard to read as the columns are too narrow. The expected score is to the left of the rating difference in each case.

--
David Richerby                         Poetic Toy (TM): it's like a fun
www.chiark.greenend.org.uk/~davidr/    child's toy but it's in verse!
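For anyone who wants to recompute figures like those in the table, here is a minimal Python sketch. It shows two common models of expected score as a function of rating difference: the logistic formula used by many rating systems, and a normal-distribution model. The function names are made up for illustration, and the standard deviation of 200*sqrt(2) for the rating difference is an assumption taken from Elo's original model, not something stated in this thread. Neither curve should be expected to match the tabulated values exactly; they should land within a couple of percentage points.

```python
import math


def expected_score_logistic(diff: float) -> float:
    """Expected score of the stronger player under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def expected_score_normal(diff: float,
                          sigma: float = 200.0 * math.sqrt(2.0)) -> float:
    """Expected score under a normal model of the rating difference.

    sigma is an ASSUMPTION (each player's performance having a standard
    deviation of 200 points); it is not taken from the thread.
    """
    # Standard normal CDF evaluated at diff / sigma.
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    print("diff  logistic  normal")
    for diff in (600, 500, 400, 300, 250, 200, 150, 100, 75, 50, 25):
        print(f"{diff:4d}  {expected_score_logistic(diff):7.1%}"
              f"  {expected_score_normal(diff):6.1%}")
```

Both functions take only the difference as input, which is the point made above: 2800 vs 3100 and 1100 vs 1400 give the same expected score.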
#63

I will try not to laugh too hard.

The point of this WHOLE argument was comparing WORLD championship skills throughout the ages by comparing play to Crafty. I pointed out that the two strongest programs can be worlds apart, even by the magic 100cp measure, in the same common positions. I also pointed out that people get confused by the huge and substantial difference between ~3100 and the 2500 quoted for Crafty, and that the gap is much larger than they would imagine.

And your answer, in this world championship case, the case through the ages, is that the software could be too strong, and you use scholastic games to try to prove that. I just can't give it to you here. You might have an argument in some other discussion with a different set of facts, but it has nothing to say here.

David Kane wrote:
> In theory, the engine being too strong could be a source of error in the analysis, as much as the engine being too weak could. For example, the best move leads to a win in 20 moves based on a complicated calculation that no human considers. The second-best move wins more slowly, but in a way that strong GMs might be able to see. The player makes the best move (for the wrong reasons), overlooking the alternate way to win. That's evidence of weaker, not stronger, play.
>
> This happens all of the time if you look at scholastic games. Crafty sees the win of a rook at 8-ply and deems it superior to winning a piece at 3-ply. But the 8-ply analysis is essentially irrelevant to the game because the kids are not able to calculate that deeply.
#64

David Richerby wrote:
> JohnnyT wrote:
>> [...] you need to remember that in Elo terms the difference between 2500 and 2800 is vast, and the difference between 2800 and ~3100 is just as vast. It is not 10% better; it is closer to think of it as TWICE as good, or as more likely to win MOST of the time. It is a HUGE difference.
>
> Specifically, an Elo-rating gap of 300 points (and it's the difference that's significant so, yes, 2800 vs 3100 gives the same results as 2500 vs 2800, which gives the same results as 1100 vs 1400) corresponds to the stronger player being expected to score roughly 85%. The approximate values are tabulated below, approximated from the real data on FIDE's website[1].
>
> Rating diff.   Score
> ----------------------
>     600         99%
>     500         96%
>     400         92%
>     300         85%
>     250         81%
>     200         76%
>     150         70%
>     100         64%
>      75         60%
>      50         57%
>      25         54%
>
> Dave.
>
> [1] http://www.fide.com/official/handbook.asp?level=B0210
> Beware that the table is rather hard to read as the columns are too narrow. The expected score is to the left of the rating difference in each case.

Thank you, a complete explanation like this would be a good FAQ item. (Is there a FAQ?)

I think this is important when looking at things like the computer rankings, so you can understand how measurably stronger than the field Rybka is, and how far behind Crafty is. The gap is substantial, and it gives tremendous credence to the argument that the engine is substantially too weak to answer the questions in the survey, even if the questions are worth asking.

And I was just trying to add some anecdotal evidence that Fritz and Rybka are often 100cp apart in the same position, so that value is not a significant enough measure to say that Crafty is suitable. If anything, it adds even more weight to the argument that Crafty is unsuitable.
#65

On Apr 30, 2:44 pm, JohnnyT wrote:
> I think this is important when looking at things like the computer rankings, so you can understand how measurably stronger than the field Rybka is, and how far behind Crafty is. The gap is substantial, and it gives tremendous credence to the argument that the engine is substantially too weak to answer the questions in the survey, even if the questions are worth asking.
>
> And I was just trying to add some anecdotal evidence that Fritz and Rybka are often 100cp apart in the same position, so that value is not a significant enough measure to say that Crafty is suitable. If anything, it adds even more weight to the argument that Crafty is unsuitable.

Once again you, and others like you, fail to understand what normalization of results means. You do not have to find the 'best' chess program to rate human champions--as long as Crafty, a second or third or fourth best chess playing program, or a not bad chess program, scores everybody the same. That is normalization.

In fact, the biggest potential problem with Crafty is that (without knowing how it works, I'm guessing) it might use a random number generator to pick the best move out of a series of candidate moves, with a different 'seed' for the rand() on each run. That means it might not score the identical position the same way two times in a row, since it will pick a slightly different move if the seed is different (often this seed is the system clock, or the last keyboard key the user pressed). One way to stop this in computer programming is to make sure the 'seed' never changes. Without knowing how Crafty is coded I can't tell you if this is an actual problem, but I sense intuitively that even if such a problem exists, most of the time it won't make a big difference to the normalization, since most of the time candidate moves are reasonably close to one another in efficacy.

A larger question looms from this thread: have you people not learned anything after nearly a generation of computer chess? That the 'puter is never wrong? (With a few exceptions that prove the rule.) My gawd, you people act like those philosophers in the 1960s who said computers will never win in chess because a chess program cannot be stronger than the person who wrote the program. Idiocy! My next thread will be cross-posted to alt.young-earth and alt.creationism if this ignorance keeps up.

RL
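The seed point can be illustrated with a small Python sketch. This is not Crafty's code, and nothing in this thread establishes that Crafty picks moves this way; the candidate list, scores, tolerance, and function names below are all hypothetical. It only shows the general idea: a program that chooses randomly among near-equal candidates gives repeatable output when the generator is seeded with a fixed value.

```python
import random

# Hypothetical candidate moves with scores in centipawns (made-up numbers).
CANDIDATES = [("Nf3", 32), ("d4", 30), ("c4", 29), ("h4", -45)]


def pick_move(candidates, tolerance_cp, rng):
    """Pick at random among the moves scoring within tolerance_cp of the best."""
    best = max(score for _, score in candidates)
    near_best = [move for move, score in candidates
                 if best - score <= tolerance_cp]
    return rng.choice(near_best)


def run(seed, n=10):
    """Make n picks from a generator seeded with a fixed value."""
    rng = random.Random(seed)
    return [pick_move(CANDIDATES, 5, rng) for _ in range(n)]


if __name__ == "__main__":
    print(run(2007))
    print(run(2007))  # same seed, so the same sequence of picks
    print(run(2008))  # a different seed may give a different sequence
```

Note that the clearly worse move is never chosen whatever the seed, which matches the intuition above that randomness among close candidates should not move the scores much.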
#66

In article om,
raylopez99 wrote:
> You do not have to find the 'best' chess program to rate human champions--as long as Crafty, a second or third or fourth best chess playing program, or a not bad chess program, scores everybody the same. That is normalization.

Sure, you can make this evaluation. The problem is when you then judge what that evaluation means.

If Crafty scores Karpov as better than Tal, does that mean Karpov was a stronger player than Tal? Absolutely not. And yet the title of this thread implies that's exactly what people are using Crafty for.

The problem isn't in creating an objective measure. The problem - as with so many statistics - is that given this measure, it appears that some people in this thread have no idea what it really means.

It would be interesting, for example, to take a decisive match between two players of different styles, won by the bigger risk-taker (say, Capablanca-Alekhine, or Tal-Botvinnik 1, or one of the decisive Kasparov-Karpov matches), run the games through this Crafty evaluation, and see how it measured just those games. I suspect that Crafty might judge the more conservative player (Capa, Botvinnik, Karpov) as "better" despite the fact that he lost the match. (But I don't have the tools to test this out. Does anyone?)

It's very easy to hypothesize a situation where Crafty gives a better score to a player who loses a game than it does to the player who wins the game. This non-trivial flaw doesn't invalidate the Crafty rankings, but it does punch a big hole in the notion that they accurately reflect who's stronger.

-Ron
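The hypothesized situation is easy to put into numbers. The Python sketch below assumes the score is a mean per-move loss in centipawns relative to the engine's preferred move, which is one plausible way to build such a measure and not necessarily what the study did. All figures are invented for illustration.

```python
from statistics import mean


def mean_loss(losses_cp):
    """Average per-move loss, in centipawns, relative to the engine's choice."""
    return mean(losses_cp)


# Hypothetical 40-move game (made-up numbers).
# The conservative player gives up almost nothing per move, then makes
# one decisive 300cp blunder and loses the game.
conservative_loser = [5] * 39 + [300]

# The risk-taker's speculative moves are each marked down 20cp by the
# engine, but none of them actually costs the game, and this player wins.
risk_taking_winner = [20] * 40

if __name__ == "__main__":
    print("loser :", mean_loss(conservative_loser))
    print("winner:", mean_loss(risk_taking_winner))
```

Under these assumptions the loser averages 12.375cp per move and the winner 20cp, so the averaged measure ranks the loser as the more accurate player even though a single move decided the game.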
#67

I am leaving all that below. And uhm...

You do realize that what makes a program strong is that the moves it makes are "different". It is precisely the quality of that difference that shows the strength. The fact that Rybka has some ridiculously high Elo relative to Crafty means that it plays many, many things differently. But seriously, let that sink in. THE MOVES ARE DIFFERENT in the same position. Not a little, or rarely, but often, and, going by results, more correctly.

So if you ask questions based on the move, you are asking questions of an engine that, while good, is not world champion class. Of course its answers will be wrong. And it is amazing how many times you find different moves. Wild blunders, I suppose, can be measured. But the tool raises enough doubt, and those differences are easy enough to demonstrate, that the results are called into question.

They simply had another option, if they had wanted to change the tool. They could have done the same thing, but instead of modifying Crafty, they could have modified, say, Arena or Winboard. Then the questions could have been put to Crafty, Fruit, Toga, Shredder, Rybka, and other VERY strong UCI engines. The differences between the engines, in style and strength, and the world champions would have made for a much more interesting set of answers, and would have killed this argument before it started.

But from your last paragraph, maybe you were joking about the whole thing here, and you are jesting in wild agreement.

raylopez99 wrote:
> Once again you, and others like you, fail to understand what normalization of results means. You do not have to find the 'best' chess program to rate human champions--as long as Crafty, a second or third or fourth best chess playing program, or a not bad chess program, scores everybody the same. That is normalization.
>
> In fact, the biggest potential problem with Crafty is that (without knowing how it works, I'm guessing) it might use a random number generator to pick the best move out of a series of candidate moves, with a different 'seed' for the rand() on each run. That means it might not score the identical position the same way two times in a row, since it will pick a slightly different move if the seed is different (often this seed is the system clock, or the last keyboard key the user pressed). One way to stop this in computer programming is to make sure the 'seed' never changes. Without knowing how Crafty is coded I can't tell you if this is an actual problem, but I sense intuitively that even if such a problem exists, most of the time it won't make a big difference to the normalization, since most of the time candidate moves are reasonably close to one another in efficacy.
>
> A larger question looms from this thread: have you people not learned anything after nearly a generation of computer chess? That the 'puter is never wrong? (With a few exceptions that prove the rule.) My gawd, you people act like those philosophers in the 1960s who said computers will never win in chess because a chess program cannot be stronger than the person who wrote the program. Idiocy! My next thread will be cross-posted to alt.young-earth and alt.creationism if this ignorance keeps up.
>
> RL
#68

"raylopez99" wrote in message ps.com...
> On Apr 30, 2:44 pm, JohnnyT wrote:
>> I think this is important when looking at things like the computer rankings, so you can understand how measurably stronger than the field Rybka is, and how far behind Crafty is. The gap is substantial, and it gives tremendous credence to the argument that the engine is substantially too weak to answer the questions in the survey, even if the questions are worth asking.
>>
>> And I was just trying to add some anecdotal evidence that Fritz and Rybka are often 100cp apart in the same position, so that value is not a significant enough measure to say that Crafty is suitable. If anything, it adds even more weight to the argument that Crafty is unsuitable.
>
> Once again you, and others like you, fail to understand what normalization of results means. You do not have to find the 'best' chess program to rate human champions--as long as Crafty, a second or third or fourth best chess playing program, or a not bad chess program, scores everybody the same. That is normalization.

Perhaps you just don't understand what we are getting at...

In this study they crippled Crafty down to 12 ply. What we are trying to get at is that many (if not all) of the world champions saw well beyond 12 ply. Crafty would mark these moves as errors, where a stronger program would note that they were not.

This interacts with the playing style of each world champion. Tal and Alekhine were calculating machines and would often complicate a position... The crippled Crafty would simply mark these moves as errors. It doesn't surprise me that Capablanca came out as the best in this study, because he would often gain a slight edge, then simplify and use his fantastic endgame skills to win. Even a crippled Crafty would not mark these moves as errors.

> A larger question looms from this thread: have you people not learned anything after nearly a generation of computer chess? That the 'puter is never wrong?

lol, but different programs are definitely wrong. I have used several programs to examine my games: Chessmaster 10k, Fritz 7, Fritz 8 and Rybka. It's amazing to see the difference in what each program believes is 'right'. Guess which one is closest to choosing the 'right' answer? I put my money on the strongest program.
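The depth-limit argument can be shown on a toy game tree. The Python sketch below is a plain fixed-depth minimax over a hand-made tree; it is not Crafty's search, and the node names and evaluations are invented. White can either play a quiet move or sacrifice a piece for a forced win that only shows up four plies deep. A 2-ply search prefers the quiet move and would therefore count the sacrifice as an error; a 4-ply search prefers the sacrifice.

```python
# Toy game tree (hypothetical): each node maps to its child nodes.
TREE = {
    "root": ["sac", "quiet"],
    "sac": ["s1"], "s1": ["s2"], "s2": ["s3"], "s3": [],
    "quiet": ["q1"], "q1": ["q2"], "q2": ["q3"], "q3": [],
}

# Static evaluations in centipawns from White's point of view (made up).
# The sacrifice looks bad until the final position of the forced line.
STATIC_EVAL = {
    "root": 0,
    "sac": -300, "s1": -300, "s2": -250, "s3": 900,
    "quiet": 50, "q1": 40, "q2": 50, "q3": 45,
}


def search(node, depth, white_to_move):
    """Fixed-depth minimax: static eval at the horizon or at a leaf."""
    children = TREE[node]
    if depth == 0 or not children:
        return STATIC_EVAL[node]
    values = [search(child, depth - 1, not white_to_move)
              for child in children]
    return max(values) if white_to_move else min(values)


def best_root_move(depth):
    """Score each White root move to the given depth and return the best."""
    # After White's root move it is Black's turn, one ply has been used.
    scores = {move: search(move, depth - 1, False) for move in TREE["root"]}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for depth in (2, 4):
        print(depth, best_root_move(depth))
```

With these numbers, the 2-ply judge scores the sacrifice 340cp below the quiet move, which is the kind of false "error" described above.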
#69

On Apr 30, 12:10 pm, raylopez99 wrote:
>>> Not true at all. Crafty could easily tell you which programs far stronger than itself played the most perfect chess. This is not debatable.
>>
>> Not only is it debatable, it's not true.
>
> No it is true.

No, it's not. Would you like to debate the point?

>>> For instance, the winning program between two chess programs playing each other by definition will produce at least one less error than the losing program--and Crafty could, at some point, appreciate this.
>>
>> Er, how? If Crafty is less able than the losing program, how can it reliably see the error the losing program couldn't?
>
> Easy. The evaluation function of Crafty will indicate that the losing program, which we've said is much stronger than Crafty, scored, over the length of the game, worse than the winning program.

So you say. How about some hard evidence?

> To give a simple example: two programs, A and B, both much stronger than Crafty, play a slugfest game that extends over 100 moves. Play is evenly matched, and Crafty scores both programs about the same up to this point.

Perhaps; perhaps not.

> However, at the 101st move, program A sees a winning 10 move combination--that happens to be a mating net--that is just outside the 8 move horizon of program B.

Hold on there! If the game was a tactical slugfest, as you said, then how on earth did the dumb program ever manage to hold its own against the deeper-sighted one for 100 moves? This seems rather unlikely.

> Program A enters into the combination and after say the 5th move, Crafty, with a mere five move chess horizon, also "sees" the winning combination.

Unless the game is being scored backwards, from end to beginning, this means that Crafty would have penalized the winning program *five times* for a move which won perforce! Until it "sees" the mate, none of the moves of the combination make any sense to a patzer.

> Of course program B also has seen this combination wins after the second move but let's say is programmed with a contempt factor not to resign but to play to the end.

Things are getting uglier all the time. Now, not only is the dumb program so lucky as to somehow survive a tactical slugfest for 100 moves, but in addition, it did so despite the handicap of a contempt factor, which of course distorts its meager vision. How likely is that?

> Program A checkmates program B after the 10 move combination.

This statement is the only part of your example so far which makes any rational sense.

> Crafty will reward Program A and penalize Program B for this play, even though it is much weaker than either program A or B.

Whoopie. So it got lucky at the very end.

Instead of rationalizing or "justifying" the use of a weak program like crippled-Crafty to judge the quality of play of the world champions, why not simply admit that it was quite unnecessary, in view of the fact that there now exists a far superior program which is widely available?

In order to do this sort of thing with most players, just use any modern computer and any strong program. But in order to do it with the world championships, get a FAST computer and the TOP program, put lots of memory in the computer and give it lots of time to think. So simple!

-- help bot
#70

On Apr 30, 1:33 pm, "David Kane" wrote:
>> That alone should raise enough of a question about the results here.
>
> The fact is that we don't know when the engines will be strong enough to represent the "truth".

Sure we do. It will happen gradually, as the endgame tablebases grow to include, first, all of the endgame, and later, the late middlegame, and so forth.

> In theory, the engine being too strong could be a source of error in the analysis, as much as the engine being too weak could. For example, the best move leads to a win in 20 moves based on a complicated calculation that no human considers. The second-best move wins more slowly, but in a way that strong GMs might be able to see. The player makes the best move (for the wrong reasons), overlooking the alternate way to win. That's evidence of weaker, not stronger, play.

Only if you dump understanding/motive into the formula. As I see it, the way things were done is that every game was judged move by move -- not plan by plan. The whole point was to be as objective as possible.

> This happens all of the time if you look at scholastic games. Crafty sees the win of a rook at 8-ply and deems it superior to winning a piece at 3-ply. But the 8-ply analysis is essentially irrelevant to the game because the kids are not able to calculate that deeply.

But you can't determine which is the stronger player by adjusting to their weaknesses. You must remain objective, unbiased. (This seems to be why game results are used, rather than any voting on the quality of play.) No matter how weak or how strong, we ought to take the results straight, with no sugar-coating. If we wish to do a purely subjective analysis, that is another matter.

>> I will say that I do not use Crafty for day-to-day analysis, so I don't have an opinion, other than that you need to remember that in Elo terms the difference between 2500 and 2800 is vast, and the difference between 2800 and ~3100 is just as vast. It is not 10% better; it is closer to think of it as TWICE as good, or as more likely to win MOST of the time. It is a HUGE difference.

Yeah, yeah -- that's what they WANT us to believe! But we all know that in that game where world champion Kramnik allowed mate-in-one on himself, not one of us would have been so daft. (Don't take my word for it -- go to GetClub and look at my games. Not ONE overlooked mate on the move.)

And in the match where Deeper Blue defeated GM Kasparov, which ought to have put it in the vicinity of almost 3100, it still made daft errors now and then. One game saw the computer recklessly leaving its King wide open to a perpetual check while winning, and another showed the notorious horizon effect resulting in the simple giveaway of a free pawn (and with it, the game).

IMO, in order to more accurately visualize what we think of as perfect chess, we need to set the bar well above the 3100 mark -- perhaps 4 or 5 thousand will do *for now*.

And in terms of ratings, the difference between 2800 and 2500 is 300 points -- precisely the same as between 1800 and 1500. The real difference here is not in the vastness of the gap, but in the difficulty of getting from point A (2500) to point B (2800). It's a bit like climbing Mt. Everest, whereas going from 1500 to 1800 is more like climbing a tree and then jumping over to the rooftop while barefoot.

IMO, the authors unwisely sacrificed quality of analysis for the sake of repeatability, which merits the term pseudo-science.

-- help bot