#221
help bot wrote: [...] Suppose Crafty_12 ply penalizes either move as inferior to the other -- how does this style issue contribute to ranking the world champs *accurately*? ... In which case this position is outside the [-2..2] range, and is discarded by the G&B methodology, really for exactly the reasons you gave. So Crafty12 would not penalise your move. Once again, you have demonstrated a complete, utter inability to read my comments *in context*. Bit prickly aren't we? ... Look back at my original post. I was (obviously) replying to this comment by Ray Lopez: [...] ... Indeed you were. But then you asked a very specific question about Crafty12 and the ranking of WCs, which can only ["in context"] relate to the work of Guid and Bratko that is the initial topic of this thread. I gave you the answer: G&B did consider the situation you describe, and took steps to ensure that it did not bias their results. As for the G&B methodology, it was never described in any detail in any of the articles which I read by following the links earlier in this thread. Clearly, if I had wished to skewer their "methodology", I would probably want to know what it was. But having already learned that the reason for the sloppiness was a shortage of time and a complete disregard for quality work, I have no interest in further details regarding the authors' methodology. Yet you are willing to "skewer" their methodology to the extent of "sloppiness" and "complete disregard for quality work"? Even though most, if not all, of the criticisms in this thread are addressed by the authors in a peer-reviewed paper? And if you have "no interest in further details", why did you ask about them in relation to your above question [and then take umbrage at my answer to it]? -- Andy Walker, School of MathSci., Univ. of Nott'm, UK. |
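The discard rule mentioned above (positions outside the [-2..2] range are not counted) can be sketched roughly as follows. This is only an illustration of the idea, not G&B's actual code: the function name, the input layout and the choice to test the engine's best-move score against the range are all assumptions, and the paper's exact rule may differ.

```python
from statistics import mean

def average_deviation(positions, lo=-2.0, hi=2.0):
    """Mean gap, in pawns, between the engine's preferred move and the move played.

    positions: iterable of (best_eval, played_eval) pairs, both scored from
    the mover's point of view. A position is skipped when best_eval lies
    outside [lo, hi] (assumed reading of the discard rule).
    Returns None if every position was skipped.
    """
    gaps = [best - played
            for best, played in positions
            if lo <= best <= hi]
    return mean(gaps) if gaps else None

# Made-up numbers: the third position (+3.10) is already decided and is dropped.
sample = [(0.30, 0.30), (0.50, 0.20), (3.10, 1.00), (-0.40, -0.55)]
print(average_deviation(sample))
```

Under this reading, a move played in an already won or lost position costs the player nothing, which is the point being made about the example position.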
|
#222
Martin Brown wrote: [...] So I'm guessing that when both sides think they are winning [by something significant, not by 20cp or so], one of them has overlooked something of tactical importance, which is why, after a bit, it turns into a tactical win. It started from the opening in this game. Shredder +0.90 Crafty -0.27 peaking at +1.40 vs -0.10 then converging a bit until the fateful 17. ... Be7 2.77 vs 0.6 then for a while the scores agreed before again diverging to 1.24, 3.27. Ah. Perhaps I have a misunderstanding about what you did or said? I understood that you had played some fast[ish] games between Shredder and Crafty, and then passed on to us Shredder's analysis [at much longer time limits] of the game? So that, for example, the analysis would have been/looked the same even if the game had been a GM encounter at classical time limits or you vs me at 5-min chess, or any other source? So were the black scores not *Shredder's* evaluations rather than *Crafty's*? Otherwise, I find the exact agreements, eg at 1.22 and later at 2.58, for several moves very suspicious. And if so, then this is not an example of both *sides* thinking they were winning, but rather of *Shredder* thinking both sides were winning? [As you mentioned later, there are parity problems in many engines that cause this, esp in gambits, but if Shredder is particularly prone to it, it doesn't help its case to be a reliable annotator.] It was pretty clear in this game that Crafty simply did not know which way was up! Absolutely. Computers seem to be prone to that sort of game, though. Once they don't understand a position, they tend to go *really* pear-shaped. For the sake of balance in a best of 3 engine match at this time penalty was 1 win 1 draw 1 loss for each engine. Don Beal used to say that you need matches of 100+ games to find which engine is better -- he had cases where one side was losing 17-0 or thereabouts but hauled back to win [and this in the days before "learning"]. 
BTW is there a way to get the graphical display of time taken and engine score shown in Chessbase window or does the game have to be put into the playing window to see that info? Pass. I've never entered games with that info in the first place. Remember that is Crafty working at roughly the same search depth setting as was being used to judge the play of world champion chess players. It may be a bit unfair to make it play the opening (where its performance is very poor). Ah. I assumed 8s/move wouldn't be enough to reach the depth used by G&B [roughly 6h/game on 2.5GHz machines] .... But that is still enough to have some confidence in finding gross evaluation errors of 50cp or more (which is what Crafty at 12 ply does). Yes, but you still seem to be missing something. 100cp is a pawn, and you can understand that very directly. 50cp is what? It will matter if at some point we swap a 50cp advantage for a pawn-up with 50cp compensation, but until then it's an arbitrary measure. I was using that as an example. Yes, but you still seem not to have understood! Look, suppose some engine gives 1.23 as its evaluation. That means that somewhere down the tree there is a position, reached by "best play" as far as the current collection of static evaluations goes, which has a static evaluation of 1.23. *That* evaluation is a sum of various factors -- +1.00 because we have an extra pawn, +0.17 because we control an open file, +0.47 because of king safety, -0.13 because the opposing knight is well-placed, +- this, that and the other, possibly including all manner of complexity and joint factors, etc. Only the extra pawn is "gold standard" currency. Everything else is there either because BobH or some other programmer has decreed that an open file is worth 0.17 or because a "learning" program has currently settled on that as the value. 
None of it is reliable [else we wouldn't need the tree search at all], none of it seems to matter very much [or changing the 0.17 to 0.16 would dramatically change the strength of the program], none of it relates very closely to how a GM would assess that position. The only merit of this whole scheme is that pragmatically it works. You may recall the Beal&Smith result that a completely random static evaluation works surprisingly well. So when you say that Shredder and Crafty have a discrepancy of [eg] 50cp in some position, what you mean is that Shredder has looked at many millions of lines, 99.999+% of which are utter rubbish by *any* standards, picked on one line as best for both sides, chosen the leaf position in that line, pulled a number more-or-less out of a hat for that position; that Crafty has done the same [but almost certainly chosen different lines as best in most of the positions]; and that the two numbers differ by 0.5. The miracle of computer chess is that quite often the numbers agree within 0.2 or so. But there is no useful, objective, value in those numbers. Indeed, we already know that an evaluation of 1.23 is wrong by either 1.23 [if the position is actually drawn] or by worse than that if the position is won/lost and the "machine infinity" for won positions is greater than 2.46. Go figure. Looking more carefully at that game there were long sustained periods where Crafty was more than 100cp off the mark and about 10 moves where it was more than 200cp out (and in the middlegame). This doesn't bode well for its ability to score GM level play. What matters is not that [esp as basically the difference over those moves was whether Crafty in those positions was totally and utterly lost or merely utterly and totally lost, and the G&B scheme would have stopped counting by then], but whether Crafty/Shredder mis-assesses the correct move ordering, and if so by how much. [...] If Crafty12 is so rotten, it's been amazingly lucky. 
I don't think it is that rotten. Just that it misses a lot of the rare but absolutely key GM moves and marks them down because it doesn't understand them. It probably gets the 95% of the routine moves exactly right, but it is the handful of other moves that make all the difference. If I am reading the annotations correctly, then in the game you gave, Shredder and Crafty each played 16 moves out of the 41 that were sub-optimal according to Shredder. That doesn't seem a handful of other moves to me, and it makes it seem unlikely that a GM would have played nearly all the moves, routine or otherwise, the same way as Crafty. [Of course, most of the 16+16 were "in the noise", but it still suggests that with a 10cp noise level, there is a lot of scope for Crafty/Shredder to make quite different moves from GMs. Indeed, G&B's Fig.7 shows most WCs playing the same move as C12 about 50% of the time.] According to the metrics line Crafty was at average search depth 12.2 and Shredder at 13.1 (but in 1/3 the time) during this test match. OK, but this is not helping your thesis! Summarising, we now have that C12 deviates by around 0.35 from Shredder, by between 0.10 and 0.15 from almost all WCs, and by between 0.06 and 0.09 from recent strong computers in matches vs humans, despite an expected 0.1 or so error from random noise. Accepting that Shredder/Rybka/ etc are technically stronger than Crafty, this nevertheless suggests that Crafty is doing very well at emulating these other players and engines, and by inference at assessing how good they are. Can we define a set of rules then [...] Something that might be interesting. There are books out there with titles like "How Good is Your Chess" [and regular articles in a number of magazines] where strong players have annotated games with point scores ["Score 7 for Nxe5, 3 for Bg5, -3 if you blundered by Re1, 1 for routine development by Nc3, 0 for anything else"]. 
We could set engines doing these tasks at various rates, see how they score, see how they rate the alternatives, and perhaps -- if someone would sink some time and/or money into this -- get some GMs to comment both on the original scoring and on the computer results. I still can't take centipawns seriously enough to want to invest effort into tracking down 20cp discrepancies .... -- Andy Walker, School of MathSci., Univ. of Nott'm, UK. |
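The argument above about a static evaluation being a sum of hand-tuned terms, and about 1.23 being "wrong" whatever the true result, can be put into a few lines. This is a sketch only: the term names are invented, the weights are the illustrative figures from the post rather than values from Crafty or any other engine, and the "machine infinity" is passed in as a parameter because it differs between programs.

```python
# Illustrative weights in centipawns, taken from the figures quoted in the post.
# They are NOT the weights of any real engine.
TERMS = {
    "extra_pawn": 100,
    "open_file": 17,
    "king_safety": 47,
    "enemy_knight_well_placed": -13,
}

def static_eval(terms):
    """A static evaluation is just the sum of its weighted terms."""
    return sum(terms.values())

def error_vs_truth(eval_cp, true_result, win_value_cp):
    """Distance between an evaluation and the game-theoretic value.

    true_result: +1 for a won position, 0 for a draw, -1 for a loss.
    win_value_cp: the score the engine assigns to a won position
    (an assumed parameter, the "machine infinity" of the post).
    """
    return abs(true_result * win_value_cp - eval_cp)

print(static_eval(TERMS))             # the four quoted terms only
print(error_vs_truth(123, 0, 1000))   # position is really a draw
print(error_vs_truth(123, 1, 300))    # position is really won
```

With a win value above 246cp, the error in a genuinely won position exceeds the 123cp error of the drawn case, which is the 2.46 threshold mentioned in the post.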
|
#223
On May 16, 6:59 am, (Dr A. N. Walker) wrote:
Once again, you have demonstrated a complete, utter inability to read my comments *in context*. Bit prickly aren't we? ... Nah. I just wanted to make it clear that you have *repeatedly* misinterpreted words I have written by taking them out of their proper context. Generally, this indicates a psychological "need" to distort in order to accommodate one's peculiar agenda or to evade valid criticisms and divert the discussion to some army of straw men. Look back at my original post. I was (obviously) replying to this comment by Ray Lopez: [...] ... Indeed you were. But then you asked a very specific question about Crafty12 and the ranking of WCs, which can only ["in context"] relate to the work of Guid and Bratko that is the initial topic of this thread. Not in my opinion it isn't. When I entered this thread, it consisted of a couple of links to articles which I later downloaded and read in full. Now, what you are suggesting is that those articles are not the subject of this thread, but the original paper upon which these "summaries" were based was, and that just ain't so. You might just as well argue that the thread is about my game at RedHotPawn -- because it was discussed at some later point. Look back at the links. So, why don't I go and find that paper and read it in full like the summaries? Simple: it has already been shown that a myriad of excuses substitute for any real desire for *quality* work; hence the choice of a 12 ply Crafty; and hence the moronic sample size in certain cases like, say, GM Fischer. In sum, it looks like a waste of my time (see below). I gave you the answer: G&B did consider the situation you describe, and took steps to ensure that it did not bias their results. Bully for them. 
Now, if only they had taken similar steps regarding adequate sample size, choosing a chess program of adequate strength, and of course, allocating sufficient time for such a task as attempting to decide which of the world champions was the best, the strongest, the greatest, or even the most accurate. The fact that we are back to discussing the "G&B" end of things once again shows that you have missed the point of what I was actually writing about; it had to do with positional moves and tactical moves allegedly being "one and the same thing", you should recall. As for the G&B methodology, it was never described in any detail in any of the articles which I read by following the links earlier in this thread. Clearly, if I had wished to skewer their "methodology", I would probably want to know what it was. But having already learned that the reason for the sloppiness was a shortage of time and a complete disregard for quality work, I have no interest in further details regarding the authors' methodology. Yet you are willing to "skewer" their methodology to the extent of "sloppiness" and "complete disregard for quality work"? Yes, I am. (As far as I can see, any bum off the street could read their paper, copy their methods, and by simply setting Crafty to *13 plys*, best their results in terms of quality). Even though most, if not all, of the criticisms in this thread are addressed by the authors in a peer-reviewed paper? Are their "peers" up to our standards here, I wonder? I mean, do they know squat about chess? Do they have the slightest inkling as to what it would take in order to *accurately* rank the world chess champions relative to one another? I seriously doubt it. I recall reading in one of the links a long list of criticisms, about half of which were unanswerable, unless you count the list of pathetic excuses given by a few apologists. I must admit, some of these had not occurred to me, but they were far from comprehensive in scope. 
In any case, I had more than enough of my own criticisms. And if you have "no interest in further details", why did you ask about them in relation to your above question [and then take umbrage at my answer to it]? You're not making any logical sense here; I asked nothing about their methodology; on the contrary, I already think I have seen enough excuses regarding them that I can reasonably conclude that the authors made no serious attempt to accurately rank the world champions. For instance, a *serious* attempt might start off by determining the proper sample size for such a project; this step was obviously skipped (or worse, bungled). Secondly, it is important in order to be fair, to not take any single match against any single opponent, and try to compare against someone else's results where they both won and lost, plus faced a variety of opposition. For instance, you can't fairly compare Fischer-beating-Spassky to Botvinnik-vs.-all-comers, because (gasp!) GM Fischer may have been more accurate in that single match (since he won) than *any* world champ was in any series of a win plus a loss. If you do this, you are (quite absurdly) rewarding those who, instead of letting nature take its course, bow out in order to protect their record from acquiring any tarnish over time. Third, if indeed, there is a time issue resulting from the large number of games, one could arbitrarily chop GM Steinitz out of the running. How dare I suggest such a thing? Look at Dr. Elo's rating lists; while GM Steinitz was a giant figure in the history of chess, his strength was clearly superseded by others, and if the goal is to try and measure strength, accuracy, or any other such aspect of the play, then we can safely rule him out as the winner; already, such players as Paul Morphy were excluded, so why not just one more? 
Rather than worry ourselves about whether or not others are going to whine that they cannot duplicate our exact results, the first order of business should be to get *meaningful* results ourselves. For my money, I'll take the strongest chess program in the world and if necessary, start off by eliminating GM Steinitz and his predecessors to save time; then I want each contender to have roughly the same number of games in the test -- preferably a large enough sample so that no single game will have much of an effect on the final outcome. As others have suggested, it is best to have each match scored individually, so we can learn where the champions were at their best and at their worst. Even so, I am not entirely comfortable with the idea that even a program rated 2900+ can *accurately* rank the play of the world champions to the degree necessary. I would feel more comfortable if the program had a sizable lead over even the strongest of them, and if it were known that this lead was not entirely due to its Titanic *tactical superiority* over all humans. The thing to remember is this: the match games of the world champions are slowly increasing in number; but at the same time, computers are gaining in both speed and strength at a more rapid pace. No hurry -- do it right. For the sake of maximizing human interest in the project, you could start off with GMs Fischer and Tal and report the results as they come in. I like the idea, but this is not something one can just "whip off", like a Greco sac. -- help bot |
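The sample-size point raised in this post can be made concrete with the textbook estimate n = (z * sigma / margin)^2 for pinning down a mean to within a given margin. The sketch below assumes that per-move errors are independent with a common standard deviation, which chess moves certainly are not, so any figure it produces is an order-of-magnitude guide at best. The function name and the inputs in the example are hypothetical.

```python
import math

def moves_needed(sigma, margin, z=1.96):
    """Moves required so that the mean per-move error is known to +/- margin.

    sigma:  assumed standard deviation of the per-move error, in pawns
    margin: half-width of the desired confidence interval, in pawns
    z:      normal quantile (1.96 corresponds to about 95% confidence)
    """
    return math.ceil((z * sigma / margin) ** 2)

# Halving the margin quadruples the number of moves required.
print(moves_needed(1.0, 0.5, z=2.0))
print(moves_needed(1.0, 0.25, z=2.0))
```

Because the requirement grows with the square of the precision wanted, separating players whose averages differ by only a few centipawns takes far more moves than a single short match provides.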
|
#224
On May 16, 2:50 pm, (Dr A. N. Walker) wrote:
In article .com, Martin Brown wrote: [...] So I'm guessing that when both sides think they are winning [by something significant, not by 20cp or so], one of them has overlooked something of tactical importance, It started from the opening in this game. Shredder +0.90 Crafty -0.27 peaking at +1.40 vs -0.10 then converging a bit until the fateful 17. ... Be7 2.77 vs 0.6 then for a while the scores agreed before again diverging to 1.24, 3.27. Ah. Perhaps I have a misunderstanding about what you did or said? I understood that you had played some fast[ish] games between Shredder and Crafty, and then passed on to us Shredder's analysis [at much longer time limits] of the game? So that, for Yes. But the fastish game was calculated roughly to approximate an average 12 ply lookup for Crafty and a 12 ply lookup for Shredder (I miscalculated the time penalty and Shredder got to 13+ ply at times in this game). example, the analysis would have been/looked the same even if the game had been a GM encounter at classical time limits or you vs me at 5-min chess, or any other source? So were the black scores not *Shredder's* evaluations rather than *Crafty's*? Otherwise, I find the exact agreements, eg at 1.22 and later at 2.58, for several moves very suspicious. And if so, then this is not an example of No. That is a quirk of annotating a game by working back from the end position: the engine can utilise deep cached move evaluations from the transposition table, provided that the game follows a reasonably strong, well-explored line according to the engine's own evaluation. Same happens in GM games since the cache often contains useful info. both *sides* thinking they were winning, but rather of *Shredder* thinking both sides were winning? No. Shredder scores it as a consistent win for white. There are some important differences between the fast score and the deeper analysis though. Here is the original game with the engines' scores and thinking time. 
[Event "AOI, Blitz:4'+2""] [Site "East Rounton"] [Date "2007.05.14"] [Round "1"] [White "Shredder 10"] [Black "Crafty 19.01"] [Result "1-0"] [ECO "D25"] [WhiteElo "9999"] [BlackElo "9999"] [Annotator "0.30;0.36"] [PlyCount "83"] [TimeControl "240+2"] {Intel(R) Pentium(R) 4 CPU 3.00GHz 2992 MHz W=13.1 ply; 354kN/s B=12.2 ply; 835kN/s; 2 TBAs} 1. d4 {Both last book move 0.30/16 12} Nf6 {0.36/12 27} 2. Nf3 {(Bf4) 0.27/15 11} d5 {0.32/12 26} 3. c4 {(e3) 0.27/16 11} dxc4 { (e6) -0.26/12 25} 4. e3 {(Nc3) 0.34/14 20} b5 {(Bf5) -0.21/12 24} 5. a4 { 0.79/14 13} c6 {-0.27/12 24} 6. axb5 {(Be2) 0.90/13 5} cxb5 {-0.11/13 23} 7. Nc3 {0.72/14 17} Qb6 {(Bd7) -0.26/12 23} 8. b3 {(Ne5) 1.01/13 12} e6 { (b4) -0.15/11 22} 9. bxc4 {0.81/13 15} b4 {(Bb4) -0.16/12 22} 10. c5 { (Qa4+) 1.04/13 14} Qb7 {-0.22/11 21} 11. Rb1 {1.00/12 8} Nc6 {-0.06/11 21} 12. e4 {(Bc4) 1.14/12 13} a6 {(Be7) -0.14/9 20} 13. Bc4 {(Bf4) 1.40/13 22} Qc7 { -0.09/10 19} 14. Ne2 {(e5) 1.18/12 7} Nxe4 {-0.50/10 21} 15. O-O {0.98/11 5} f5 {(Bb7) -0.43/10 19} 16. Bf4 {0.67/11 11} Qd7 {0.40/11 24} 17. Bb3 { (Ne5) 1.02/10 6} Be7 {(Qb7) 0.60/11 18} 18. Ba4 {2.77/12 5} Bf6 {2.43/11 21} 19. Rxb4 {2.35/13 6} Nxb4 {2.61/12 17} 20. Ne5 {2.33/13 7} Bxe5 {2.35/14 17} 21. Bxd7+ {2.38/14 4} Bxd7 {2.22/14 5} 22. Bxe5 {2.38/15 6} O-O {2.21/13 16} 23. f3 {2.35/14 3} Nf6 {2.43/13 16} 24. Nf4 {(Qd2) 2.35/14 5} a5 {2.15/11 15} 25. Qb3 {2.48/13 5} Ra6 {(Rfe8) 2.16/11 15} 26. Kf2 {(Re1) 2.67/11 7} Nfd5 { (Kf7) 1.43/12 15} 27. Nxd5 {2.96/11 1} exd5 {1.68/12 14} 28. Ke3 { (Rc1) 2.86/12 3} Rg6 {(Bb5) 1.09/11 14} 29. Rg1 {(g3) 2.94/11 3} Bb5 { 1.11/11 14} 30. g4 {(Kd2) 2.81/11 4} Re8 {(f4+) 1.16/11 14} 31. h3 { (Kd2) 2.99/11 4} Bc4 {(fxg4) 1.24/10 13} 32. Qa4 {3.27/12 2} Nc6 {1.82/12 15} 33. gxf5 {3.60/12 2} Rxg1 {2.24/13 13} 34. Qxc6 {4.44/11 2} Re1+ { (Kf7) 2.80/12 13} 35. Kf4 {5.16/12 3} Kf7 {3.25/13 12} 36. Qc7+ {(f6) 5.65/13 2 } Re7 {4.90/14 37} 37. Qxa5 {5.93/13 2} Rc1 {(Rg1) 4.95/12 12} 38. Qd8 { 7.15/11 2} Re8 {(Bb5) 6.19/11 22} 39. 
Qg5 {9.31/12 2} Rxe5 {7.01/13 10} 40. dxe5 {10.19/12 3} d4 {7.94/12 37} 41. Qd8 {11.44/11 3} h6 {9.39/11 15} 42. Qd7+ {11.41/10 2} 1-0 It was pretty clear in this game that Crafty simply did not know which way was up! Absolutely. Computers seem to be prone to that sort of game, though. Once they don't understand a position, they tend to go *really* pear-shaped. Crafty allowed itself to get stiffed with a double threat knight skewered against queen, king and losing a pawn. At move 17 it should have seen that getting the queen off d7 was a priority rather than just developing the bishop. Shredder was expecting Qb7 (which is probably weaker than Qa7 or Qe7). Then it pounced. For the sake of balance in a best of 3 engine match at this time penalty was 1 win 1 draw 1 loss for each engine. Don Beal used to say that you need matches of 100+ games to find which engine is better -- he had cases where one side was losing 17-0 or thereabouts but hauled back to win [and this in the days before "learning"]. I may give it a try overnight to see. BTW is there a way to get the graphical display of time taken and engine score shown in Chessbase window or does the game have to be put into the playing window to see that info? Pass. I've never entered games with that info in the first place. I don't enter the info so much as allow blundercheck 20s to run on all my games. And for most of the GM games that I decide look interesting too - always fun to compare old human annotation in books to modern engines. Remember that is Crafty working at roughly the same search depth setting as was being used to judge the play of world champion chess players. It may be a bit unfair to make it play the opening (where its performance is very poor). Ah. I assumed 8s/move wouldn't be enough to reach the depth used by G&B [roughly 6h/game on 2.5GHz machines] .... 
I don't understand why they got quite such bad performance, but I suspect the full width fixed ply search with the cache cleared between moves for reproducibility probably played a part. Complex middlegames slow down a lot on Crafty. In this game most times the engines played the move that the other engine had expected fairly often - I watched the first game in realtime (Crafty about 20s/move for 40 moves). Yes, but you still seem not to have understood! Look, suppose some engine gives 1.23 as its evaluation. That means that somewhere down the tree there is a position, reached by "best play" as far as the current collection of static evaluations goes, which has a static evaluation of 1.23. *That* evaluation is a sum of various factors -- +1.00 because we have an extra pawn, +0.17 because we control an open file, +0.47 because of king safety, -0.13 because the opposing knight is well-placed, +- this, that and the other, possibly including all manner of complexity and joint factors, etc. Only the extra pawn is "gold standard" currency. Everything else is there either because BobH or some other programmer has decreed that an open file is worth 0.17 or because a "learning" program has currently settled on that as the value. None of it is reliable [else we wouldn't need the tree I agree entirely so far. But what is interesting here is that the current generation of top programs appear to have tuned the evaluation function weights for self consistency to maximise the efficiency of alpha-beta cutoffs in the tree. none of it relates very closely to how a GM would assess that position. Indeed, Although to me Shredder feels closer to human assessment than other engines. (and it still struggles with endgame transitions like all of them) by 0.5. The miracle of computer chess is that quite often the numbers agree within 0.2 or so. But there is no useful, objective, value in In a lot of cases they agree about the best line though even if they score it differently. 
Looking more carefully at that game there were long sustained periods where Crafty was more than 100cp off the mark and about 10 moves where it was more than 200cp out (and in the middlegame). This doesn't bode well for its ability to score GM level play. What matters is not that [esp as basically the difference over those moves was whether Crafty in those positions was totally and utterly lost or merely utterly and totally lost, and the G&B scheme would have stopped counting by then], but whether Crafty/Shredder mis-assesses the correct move ordering, and if so by how much. The best way I can think of to test this is on the key positions where things went awry. The majority of positions where they pretty much agree on the continuation line don't provide any discrimination. [...] If Crafty12 is so rotten, it's been amazingly lucky. I don't think it is that rotten. Just that it misses a lot of the rare but absolutely key GM moves and marks them down because it doesn't understand them. It probably gets the 95% of the routine moves exactly right, but it is the handful of other moves that make all the difference. If I am reading the annotations correctly, then in the game you gave, Shredder and Crafty each played 16 moves out of the 41 that were sub-optimal according to Shredder. That doesn't seem a handful of other moves to me, and it makes it seem unlikely that a GM would have played nearly all the moves, routine or otherwise, the same way as Crafty. [Of course, most of the 16+16 were "in the noise", but it still suggests that with a 10cp noise level, there is a lot of scope for Crafty/Shredder to make quite different moves from GMs. Indeed, G&B's Fig.7 shows most WCs playing the same move as C12 about 50% of the time.] I think that may not be as remarkable as it sounds. And in essence it highlights one of the problems of having Crafty (or Fritz for that matter) scoring GM level games. 
It will automatically penalise anyone who raises or maintains the complexity of the position by keeping the tension and does not swap off material when it is safe to do so. I reckon that is why it scores Capablanca and Kramnik so highly - take a look at fig 8. According to the metrics line Crafty was at average search depth 12.2 and Shredder at 13.1 (but in 1/3 the time) during this test match. OK, but this is not helping your thesis! Summarising, we now have that C12 deviates by around 0.35 from Shredder, by between 0.10 and 0.15 from almost all WCs, and by between 0.06 and 0.09 from recent strong computers in matches vs humans, despite an expected 0.1 or so error from random noise. Accepting that Shredder/Rybka/ etc are technically stronger than Crafty, this nevertheless suggests that Crafty is doing very well at emulating these other players and engines, and by inference at assessing how good they are. I agree that the numbers do not seem to add up. Can we define a set of rules then [...] Something that might be interesting. There are books out there with titles like "How Good is Your Chess" [and regular articles in a number of magazines] where strong players have annotated games with point scores ["Score 7 for Nxe5, 3 for Bg5, -3 if you blundered by Re1, 1 for routine development by Nc3, 0 for anything else"]. We could set engines doing these tasks at various rates, see how they score, see how they rate the alternatives, and perhaps -- if someone would sink some time and/or money into this -- get some GMs to comment both on the original scoring and on the computer results. OK. This sounds like an amusing idea. And not too onerous. How about GM Daniel King's HGIYC piece from May's Chess magazine? I expect that the intricacies of the Ragozin Defence will give some engines a very serious headache. With fixed Ply 1, Ply 12 and 60s/move searches as the test conditions? 
I still can't take centipawns seriously enough to want to invest effort into tracking down 20cp discrepancies .... Neither can I. But I am curious to identify the types of position where choice of the right engine (or other program) is important for analysing the position correctly. Engines can have very different playing styles. Regards, Martin Brown PS Goofgle dropped it on the floor again so perhaps I will be third time lucky. |
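For anyone wanting to repeat the comparison on the game posted above, the engine comments can be pulled out of the movetext with a short script. This is a rough sketch: it assumes each comment ends in the form `eval/depth seconds`, as in `{(Be2) 0.90/13 5}`, and reading the third number as thinking time in seconds is an assumption based on the description "engines' scores and thinking time". The function names are made up for the example.

```python
import re

# Matches the trailing "eval/depth seconds" inside a {...} PGN comment,
# ignoring any leading text such as "(Be2)" or "Both last book move".
COMMENT = re.compile(r"\{[^{}]*?(-?\d+\.\d+)/(\d+)\s+(\d+)\s*\}")

def engine_scores(movetext):
    """Return (eval_in_pawns, depth, seconds) for every annotated half-move."""
    return [(float(e), int(d), int(t)) for e, d, t in COMMENT.findall(movetext)]

def largest_gap(scores):
    """Largest difference between White's score and Black's score on the same move."""
    pairs = zip(scores[0::2], scores[1::2])
    return max(abs(white[0] - black[0]) for white, black in pairs)

# Two moves copied from the game above.
snippet = ("5. a4 {0.79/14 13} c6 {-0.27/12 24} "
           "6. axb5 {(Be2) 0.90/13 5} cxb5 {-0.11/13 23}")
scores = engine_scores(snippet)
print(scores)
print(largest_gap(scores))
```

The hardware summary comment at the start of the game has no `eval/depth` pair, so the pattern skips it. Pairing scores by position in the list only works if every half-move carries a comment, as it does in this game.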
|
#225
help bot wrote: Once again, you have demonstrated a complete, utter inability to read my comments *in context*. Bit prickly aren't we? ... Nah. I just wanted to make it clear that you have *repeatedly* misinterpreted words I have written by taking them out of their proper context. [...] I think I'd prefer others to judge this rather than enter into a "did", "didn't", "'tis", "'tisn't" slanging match; shall we just agree to differ? Look back at my original post. I was (obviously) replying to this comment by Ray Lopez: [...] ... Indeed you were. But then you asked a very specific question about Crafty12 and the ranking of WCs, which can only ["in context"] relate to the work of Guid and Bratko that is the initial topic of this thread. Not in my opinion it isn't. When I entered this thread, it consisted of a couple of links to articles which I later downloaded and read in full. The very first "word" in this thread was one of those links -- to the Chessbase article about *the work of Guid and Bratko*; and the thread title clearly relates to that work. Has anyone mentioned Crafty12 *other than* in relation to it? Now, what you are suggesting is that those articles are not the subject of this thread, but the original paper upon which these "summaries" were based was, and that just ain't so. I made no such suggestion. But you asked a question which was not answered in the Chessbase articles, but was in the original paper. Did you want to know the answer or not? You might just as well argue that the thread is about my game at RedHotPawn -- because it was discussed at some later point. Look back at the links. If I asked a question in this thread about RHP, then you might quite reasonably think that it was sparked by your mentions in this thread. But whereas RHP has been a minor part of this thread, the work of G&B, and more specifically the use of Crafty12 in that work, has been very prominent. So, why don't I go and find that paper and read it in full like the summaries? 
Simple: it has already been shown that a myriad of excuses substitute for any real desire for *quality* work; hence the choice of a 12 ply Crafty; and hence the moronic sample size in certain cases like, say, GM Fischer. In sum, it looks like a waste of my time (see below). From a reading merely of the two CB articles, I would very likely agree with you [apart from the emotive words]. But the full article presents at least a somewhat different picture. That doesn't mean that you should read it; life is too short to read everything that might possibly be of interest, and the paper is not that marvellous. But nor is it total rubbish, and at least those who intend to use words like "moronic" in relation to it perhaps ought to critique what they actually did rather than the "red top" version of it. [...] The fact that we are back to discussing the "G&B" end of things once again shows that you have missed the point of what I was actually writing about; it had to do with positional moves and tactical moves allegedly being "one and the same thing", you should recall. I recall that perfectly well. But having written about that, you then asked a question about Crafty12 and the ranking of WCs. Yet you are willing to "skewer" their methodology to the extent of "sloppiness" and "complete disregard for quality work"? Yes, I am. (As far as I can see, any bum off the street could read their paper, copy their methods, and by simply setting Crafty to *13 plys*, best their results in terms of quality). So? That [mutatis mutandis] applies to a very large number, perhaps the majority, of scientific papers. We all have to take decisions about how much computer time or other resource it is worth pouring in to some experiment. Crafty13 would have occupied their roomful of computers for several months, and would *probably* not have shown anything new. 
If you, or anyone else, think that Rybka or some other engine [inc Crafty13] would show different results, then you have enough information to "copy their methods" and "best their results". Go ahead. My expectation is that you will get the same results, to good approximation. If so, then you will have confirmed to each other that the methodology is doing something objective, even if not what G&B claim. If not, then you can publish a paper [or at least a letter in ICGAJ] showing that G&B are wrong, and gain credit for it. Even though most, if not all, of the criticisms in this thread are addressed by the authors in a peer-reviewed paper? Are their "peers" up to our standards here, I wonder? It's a bit of a stretch to assume that they are not. ICGAJ may not be "Nature", but it's the leading journal for computer game theory, and some pretty bright people write and review for it. And if you have "no interest in further details", why did you ask about them in relation to your above question [and then take umbrage at my answer to it]? You're not making any logical sense here; I asked nothing about their methodology; Then you need to explain what your question *was* about. Are you *really* interested in Crafty12 and the ranking of WCs for any reason *other than* to discuss G&B's work? [...] For my money, I'll take the strongest chess program in the world and if necessary, start off by eliminating GM Steinitz and his predecessors to save time; [...] No-one is preventing you. -- Andy Walker, School of MathSci., Univ. of Nott'm, UK. |
|
#226
"Dr A. N. Walker" wrote in message ... In article om, help bot wrote: Yes, I am. (As far as I can see, any bum off the street could read their paper, copy their methods, and by simply setting Crafty to *13 plys*, best their results in terms of quality). So? That [mutatis mutandis] applies to a very large number, perhaps the majority, of scientific papers. We all have to take decisions about how much computer time or other resource it is worth pouring in to some experiment. Crafty13 would have occupied their roomful of computers for several months, and would *probably* not have shown anything new. If you, or anyone else, think that Rybka or some other engine [inc Crafty13] would show different results, then you have enough information to "copy their methods" and "best their results". Go ahead. My expectation is that you will get the same results, to good approximation. If so, then you will have confirmed to each other than the methodology is doing something objective, even if not what G&B claim. If not, then you can publish a paper [or at least a letter in ICGAJ] showing that G&B are wrong, and gain credit for it. If you are proposing a methodology (ranking players according to move analysis), you can't simply pull an algorithm out of thin air and pretend that it means something. The burden is on the authors to *show* that it is meaningful. *They* should have done (at least partial) analyses at much deeper ply, or on weaker players (if computational time was severely limited), if they want their method to have any credibility. They should also have looked for the correspondence with this ranking method and alternate ranking methods (e.g. ELO) especially in those cases where the alternate method has a high degree of credibility (contemporary players playing actively in a pool) The two most basic questions anyone should have upon reading this work are 1. How many moves do you need to analzye? 2. How deeply do you need to analyze them? 
Neither are addressed by the paper in any meaningful way. There is no way that that can be characterized as anything other than a serious defect. The excuse that it might have been hard to address (which I don't believe, by the way) is no excuse at all. |
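The "how many moves" question raised above can be made concrete with a small sketch. It assumes, as a simplification of what has been described in this thread, that a player's score is the average gap between the engine's evaluation of its preferred move and its evaluation of the move actually played, with positions whose best evaluation lies outside [-2, 2] discarded. The function names, the simulated error distribution and every number below are hypothetical and are not taken from the paper; the only point is that the spread of such an average shrinks roughly like 1/sqrt(n) with the number of moves analysed.

```python
import random
import statistics


def mean_loss(best_evals, played_evals, window=2.0):
    """Average (best - played) in pawns over the positions that are kept.

    Assumption: a position is discarded when the engine's evaluation of
    its own best move lies outside [-window, window].
    Returns (average loss, number of positions kept).
    """
    losses = [
        best - played
        for best, played in zip(best_evals, played_evals)
        if abs(best) <= window
    ]
    return statistics.mean(losses), len(losses)


def simulate_losses(n, rng):
    """Hypothetical player: 60% of moves match the engine exactly,
    the rest lose an exponentially distributed amount (mean 0.3 pawns)."""
    return [
        0.0 if rng.random() < 0.6 else rng.expovariate(1 / 0.3)
        for _ in range(n)
    ]


def spread_of_mean(n, trials=300, seed=0):
    """Standard deviation of the average loss across repeated samples of n moves."""
    rng = random.Random(seed)
    means = [statistics.mean(simulate_losses(n, rng)) for _ in range(trials)]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, round(spread_of_mean(n), 4))
```

Under these assumptions, sixteen times as many moves gives roughly a quarter of the noise, so a ranking gap between two players only means something if it is clearly larger than the spread at the smaller of their two sample sizes. The sketch says nothing about the second question, search depth, which would need actual engine runs at several depths.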
|
#227
On May 17, 4:27 pm, "David Kane" wrote:
If you are proposing a methodology (ranking players according to move analysis), you can't simply pull an algorithm out of thin air and pretend that it means something. The burden is on the authors to *show* that it is meaningful. *They* should have done (at least partial) analyses at much deeper ply, or on weaker players (if computational time was severely limited), if they want their method to have any credibility. They should also have looked for the correspondence with this ranking method and alternate ranking methods (e.g. ELO) especially in those cases where the alternate method has a high degree of credibility (contemporary players playing actively in a pool). The two most basic questions anyone should have upon reading this work are 1. How many moves do you need to analyze? 2. How deeply do you need to analyze them? Neither are addressed by the paper in any meaningful way. There is no way that that can be characterized as anything other than a serious defect. The excuse that it might have been hard to address (which I don't believe, by the way) is no excuse at all. In the articles at the links I discussed earlier, the authors said little or nothing about what constitutes an adequate sample size. It seems to be a bit unfair to try and compare, head to head, the results of GM Fischer in a single, won match (he didn't win every match, you know) with, say, the varied results of someone like GM Steinitz, who kept taking on all comers until he *finally* found one he couldn't beat. In any case, my idea is that closely matching what Crafty_12_plys thinks are the optimal moves is no guarantee of quality results. I would be far more comfortable with closely matching a program whose own rating is markedly *superior* to the humans it is trying to rank. Also, the idea of a fixed ply depth is somewhat annoying, unless that number is around 20+. 
Believe it or not, a few of my games have seen me calculate (or plan) far beyond only 12 plys, and I fully expect the world champions to be capable of seeing almost as far. ;D --- In a recent thread (consisting of just one posting), a game between GMs Fischer (as White) and Spassky was linked to. In that game, BF started out well, gaining a Maroczy bind _style_ of position, but it soon became apparent that he was not able to figure out any active plan, despite a nice space advantage and the apparent bind. GM Spassky soon broke free from his cramped position, but at the cost of a pawn which the American eagerly gobbled. Nevertheless, GM Spassky was able to intrude into White's half of the board with Queen and, ultimately, both Rooks, and it looked like a draw by repetition was in the cards, the only question being who would be on the receiving end of a perpetual check. In the end, however, GM Spassky unwisely traded off one of his three attackers, and then let GM Fischer's pawns get down the board. Stopping these pawns got him into a temporary bind, and from there into a (just barely) lost Rook and pawn ending. To me, it looked like a bit of luck; especially in comparison to games I have seen which were won by superior strategy, not "shaking the tree" until something pops loose, and the more so since at times, it looked like GM Fischer was on the run. I wonder how a long, close game such as this would end up scoring by a chess engine. I mean, say that GM Fischer's intention was to *wait* until the inevitable ...b5, and then be in good position to commence fighting. Or say that GM Spassky's real problem was that his opponent was already winning the match, and he desperately needed to claw his way back into it by winning as Black. 
No chess engine would take any of this into account in scoring the moves, so what we are attempting is merely to estimate the accuracy or optimality of the moves played, while the players were engaged in a different sort of contest altogether; one where optimality was not the issue; winning was. Yet another annoying issue is the player who habitually gets into time-pressure situations, where he (and in many cases, also his opponent) will be forced to whip off several quick moves in order to make time control. Such players would likely get penalized for this style of time (mis-)management. Does this mean they aren't great chess players? How many small "errors" equate to one large one? And what if they are so small that the opponent doesn't even notice? I know of at least one game where two top GMs quickly played through an opening line but one of them got his move order mixed up, falling into a fatal trap; even so, his opponent never noticed, and just made his own reply by rote. Because of who they were, the commentators just assumed the opening moves were A-okay, but one of the spectators knew better and wrote up an article on the event, pinpointing the double-blunder. How does this score? Who decides the penalty, and is it "adjusted" if the players in question are among the favorites or the most despised? Can every conceivable possibility be considered and entered into the equation beforehand, so there will be no "tweaking" which might allow human bias to rear its ugly head? I seriously doubt it. In fact, one of the articles I read went in with the loaded question: is Garry Kasparov the greatest player of all time? One can hardly expect any sort of objectivity with an approach like that. -- help bot |
|
#228
On May 17, 1:27 pm, "David Kane" wrote:
If you are proposing a methodology (ranking players according to move analysis), you can't simply pull an algorithm out of thin air and pretend that it means something. The burden is on the authors to *show* that it is meaningful. They have, you idiot. It was a peer-reviewed paper. *They* should have done (at least partial) analyses at much deeper ply, or on weaker players (if computational time was severely limited), if they want their method to have any credibility. You don't understand 'normalization', do you, dimwit? Read all 220+ replies, especially mine and Dr. Walker's, and commit to memory. Then and only then post here again. I see you flunked out of school, or should have. They should also have looked for the correspondence with this ranking method and alternate ranking methods (e.g. ELO) especially in those cases where the alternate method has a high degree of credibility (contemporary players playing actively in a pool). Means nothing. And the list presented does correlate very well with ELO. Jeff Sonas' work found that Capa was #1 using ELO, and Kramnik beat Kasparov and has a high Elo. Not the brightest bulb in the room, are ya? The two most basic questions anyone should have upon reading this work are 1. How many moves do you need to analyze? 2. How deeply do you need to analyze them? Shiite for brains: 1/ one move is sufficient, but logic tells you more moves will give greater and finer "granularity". So with only one move lookahead you could only rank "patzers" (like you) from "non-patzers". With Crafty's 6+ move event horizon, you can get very good granularity. Perhaps not as good as Rybka's but very good. And, again, the principle of normalization says you do NOT need to look further ahead than the best players. Why am I wasting my time with you? Your own social worker says you're hopeless. The excuse that it might have been hard to address (which I don't believe, by the way) is no excuse at all. You are an excuse. Quit wasting Dr. Andy's time. 
He is badgered as it is by the idiot Help Bot, and now you have to chime in. This is my very last post here. Sorry I even started this thread with the retarded hoi polloi of this forum. RL |
|
#229
"raylopez99" wrote in message oups.com... This is my very last post here. Sorry I even started this thread with the retarded hoi polloi of this forum. The thread is actually an interesting one (as is the original work, though flawed) and has contained a number of interesting posts. Unfortunately those haven't originated from you - you simply lack the brain power to understand the criticisms. |
|
#230
On May 18, 5:14 am, raylopez99 wrote:
On May 17, 1:27 pm, "David Kane" wrote: If you are proposing a methodology (ranking players according to move analysis), you can't simply pull an algorithm out of thin air and pretend that it means something. The burden is on the authors to *show* that it is meaningful. They have, you idiot. It was a peer-reviewed paper. Uh oh. It looks like someone forgot to learn the difference between ad hominem and reason. (Maybe they will let RL back into school despite this glaring mental handicap?) *They* should have done (at least partial) analyses at much deeper ply, or on weaker players (if computational time was severely limited), if they want their method to have any credibility. You don't understand 'normalization', do you, dimwit? Read all 220+ replies, especially mine and Dr. Walker's, and commit to memory. Then and only then post here again. Wow. In addition to lacking reasoning skills, this poor sap now thinks he is "in charge". LOL! I see you flunked out of school, or should have. Well, at least Fishead knows about the *existence* of schools -- that's a start. They should also have looked for the correspondence with this ranking method and alternate ranking methods (e.g. ELO) especially in those cases where the alternate method has a high degree of credibility (contemporary players playing actively in a pool). Means nothing. And the list presented does correlate very well with ELO. Nonsense. Every posting I have read here claims the same thing: that they ranked GM Capablanca above such players as GMs Lasker, Fischer, Kasparov, etc. -- all of whom were higher, not lower rated. It looks like a flaw from that particular angle. Jeff Sonas' work found that Capa was #1 using ELO, and Kramnik beat Kasparov and has a high Elo. Not the brightest bulb in the room, are ya? Look at the official ratings, Fishead; if anyone should have stood out, it was GMs like Lasker, Fischer, and Kasparov -- NOT GMs Capablanca and Kramnik. 
Those two may stand apart because of a stylistic issue, but not in terms of results, which are what chess ratings are based on. I think it is fairly obvious that players like GM Tal for instance, who won by playing suboptimal moves, are getting penalized for not closely approximating the computeresque style of play. This reveals a deeper issue: why rank the world champs on anything other than the "game" they were playing, which was trying to win, not to play "perfectly"? The two most basic questions anyone should have upon reading this work are 1. How many moves do you need to analyze? 2. How deeply do you need to analyze them? Shiite for brains: 1/ one move is sufficient, Uh oh. Apparently there must be a missing cap, 'cause it looks to me like most of his brains have somehow "leaked out". Either that, or he was shorted at the fish factory that made him. but logic tells you more moves will give greater and finer "granularity". Hmm. Quite an improvement here. Maybe it's just an intermittent mental short circuit? So with only one move lookahead you could only rank "patzers" (like you) from "non-patzers". I take it Fishead is assuming the program has the ability to do check-and-capture extensions, on top of the numbers he is actually discussing; it might be helpful if he were to make this point clearer. With Crafty's 6+ move event horizon, you can get very good granularity. Just how good is "very good", though? We need it to be good enough to *accurately* rank the world champions, and that is a tall order. In fact, were a human to try this, his results would be summarily dismissed as mere opinion -- at least by those who objected to the results. Perhaps not as good as Rybka's but very good. Oh, I like that. Did you see the way he struggled to keep a straight face while pretending not to know for certain that Rybka was better-equipped for this sort of thing than Crafty_12_plys? Very nice. 
Oh, but he seems to have overlooked the central idea: that no *evidence* was presented to show that Crafty_12_plys was good enough for the job. Damned lawyers. Always asking for stuff that doesn't even exist! And, again, the principle of normalization says you do NOT need to look further ahead than the best players. Nobody had mentioned that straw man position, until just NOW. Why am I wasting my time with you? Because you have nothing constructive to do? I think going back to school would be your best try; after all, until you learn to think more clearly, you're not going to get very far in life. Your own social worker says you're hopeless. Things could be worse; you might have told us about your probation officer or your prison's warden. Now we can relax, knowing that you are in the hands of a professional, and getting the help you so desperately need. Say hello to Skippy for us. The excuse that it might have been hard to address (which I don't believe, by the way) is no excuse at all. You are an excuse. Quit wasting Dr. Andy's time. Hey -- maybe you could get a job protecting those posters who are unable to defend themselves by unleashing the floodgates of ad hominem? It's right up your alley, and you already have internet access. Just a thought... . He is badgered as it is by the idiot Help Bot, Wrong. An idiot is someone with an IQ in a range far beyond my mental prowess; this just goes to show that you rant and rave without first getting the relevant facts (which, of course, we already knew). and now you have to chime in. Again. He posted here before, but it must have slipped out of your mind. Try retracing your steps; the missing cap could be anywhere. If only your brain were more solid, and not quite so watery. This is my very last post here. Nobody believes you, any more than they did Sanny. The reason is obvious: you keep making pie-in-the-sky promises, but never deliver the goods. 
In fact, I believe the odds-makers in Vegas actually *increase* the odds of a follow-up post each time you make another such promise as this. Have you considered a job as a petty politician? Sorry I even started this thread with the retarded hoi polloi of this forum. Take some consolation in the fact that you are not alone; the other fish (including koi and pollack) feel your pain. Plus, a few of them may have standards similar to your own (catfish, other bottom-dwellers). -- help bot |