#201
On May 10, 3:36 pm, raylopez99 wrote:
> On May 10, 1:22 am, Martin Brown wrote:

>> Well if it has to be open source then Fruit 2.1 (~2780) might be another alternative to try against Crafty (~2670). An extra 100 points and a bit less materialistic evaluation would be closer to human GM level play. Fruit 2.2.1 just about stumbled onto that tricky line that Phil Innes sets so much store by engines not finding before I pulled the plug.

> Lots of commercial chess programs will have a database of "tricky" positions with "model" answers, just to fool people into rating them higher. Tricks of the trade.

Don't try to teach me about computer chess. You clearly haven't a clue what you are talking about. They are optimised to pass certain well known tests, but that was part of my motivation to start a new thread asking for "interesting" new positions where engines score things radically differently. That situation persists even with the very top engines run for times which would in principle make them beyond superGM. Fruit just scraped this test after about half an hour. It looked like it was stuck in the obvious rut for a long while.

>>> So in other words you would be happy to see different results if we ran the experiment again next year,

>> Exact reproducibility probably isn't so important here. Getting the maximum accuracy of the move evaluation function for the limited amount of time available is the key. Fixed depth does not do that.

> I disagree. Normalization, see one of my posts in this 200 post thread.

You cannot "normalise" base metal into gold. Although you do seem to believe that if you repeat a lie often enough it becomes true.

Crafty is a rather conservative chess engine. It has a *very* good quiescence search at terminal nodes, but a relatively poor search extension strategy. As a result it tends to get set in its ways and miss important, much stronger lines, having settled into a comfortable path that looks superficially OK. It leads to a false sense of security in that the evaluations with increasing search depth remain too stable (ie it doesn't learn much new with each successive ply).

Even the very best engines cannot agree to within 50cp on some key GM positions after 2 days and at ply 22+. In fact, on Kasparov-Anand, Riga 1995, Shredder now thinks that only 11. ... Kf8 (0.05/23) is playable, whereas Rybka reckons that most of the obvious continuation lines are playable but prefers one of:

11. ... g6 (-0.08)
11. ... O-O (-0.07/23)
11. ... Kf8 (-0.03/22)

AFAIK Kf8 is a novelty that has not been played in top level games. It doesn't look pretty and the engines only see it at deep ply levels.

>>> I would not be at all surprised if a few of the 70-80 centipawn "blunders" turned out well at greater depth and a few non-blunders turned out to be dubious. Swings, roundabouts. BICBW.

>> I don't think they are swings and roundabouts though. GM level games are littered with precisely the sort of positions that chess engines find really difficult to score accurately. And they usually occur at pivotal moments.

> A pivotal moment is immaterial if you use normalization. As I

Like hell it is. The GM gets penalised by |Machine_best_move - GM_move| for every move where the computer fails to understand what he is doing, and when the move wins him the exchange and Crafty finally sees it, he gets nothing. They filter out (correctly) all moves evaluated outside the range [-2.0, 2.0] to avoid penalising the winning GM for playing for a safe win rather than a risky optimal move, or the loser for playing fast and loose bluff moves.

Scoring GMs with Crafty penalises anyone who doesn't play *exactly* like Crafty with a fixed 12 ply search strategy. No amount of "normalisation" bull**** will get you out of this hole.

> explained in a post in this thread, the fact that a player enters a 60 move mating net set by his opponent, unseen by Crafty with a 14 ply move horizon, is immaterial since at some point Crafty will see the mating net (namely, 7 moves before checkmate) and rate the losing player lower than the winning player.

No it won't. Their scoring system was entirely based on evaluating the difference between the move played and the machine's idea of the best move. And it was restricted to moves from move 12 onward, with the evaluation bounded in [-2.0, 2.0].

>>> OK, perhaps the implication is that they should have stopped there and then. But if historical Elo ratings are of interest, then I see no reason why another objective measure of *something* need not be. They *do* have an objective measure. It *does* seem that their results correlate well with *some* quality that we can recognise in the play of Capablanca, Petrosian, Tal, etc. Their methodology is at least interesting, even if flawed.

>> Agreed. The experiment is worth repeating with a much stronger engine.

> Yes, agreed. As I posted 47.5 posts ago, for very close, nearly tied rankings, the stronger chess program might make a difference. But for clear demarcation breakpoints, such as between Capa and Kramnik versus Karpov and Kasparov, a stronger chess engine doesn't matter.

I reckon there were a fair proportion of Karpov's & Kasparov's moves that Crafty didn't even begin to understand, and it marked them down for it.

>> The blunder rates are much more believable (although some of them will be wrong).

> And, chess being 99% tactics (say many GMs, including Tarrasch or Teichmann), the player with the lowest blunder rate is often the best champion. Blunders = Function(overall strength). In fact, a study

You keep on repeating this lie. It isn't true and it never will be. It is precisely because of the importance of long range strategic planning that machine chess isn't massively stronger than it is. Tactics are a necessary part of strong chess but they are not sufficient on their own.

> from a few years ago found that the difference in most moves between a patzer and a GM was not so much in the unexpected move made, but rather in the fact GMs blundered far less than a Class C player.

What a surprise - who would ever have guessed that?

Regards,
Martin Brown
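For readers trying to follow the argument, the scoring scheme as it is described in this post (per-move penalty |engine's best move - move played|, counted only from move 12 onward and only while the evaluation stays inside [-2.0, 2.0]) can be sketched roughly as below. This is a minimal illustration of the description given in the thread, not the paper's actual code: the names `ScoredMove` and `mean_error` are hypothetical, and requiring *both* scores to lie inside the bound is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScoredMove:
    move_number: int    # full-move number in the game
    best_eval: float    # engine score (in pawns) after the engine's preferred move
    played_eval: float  # engine score (in pawns) after the move actually played


def mean_error(moves, first_move=12, bound=2.0):
    """Average |best_eval - played_eval| over the moves that pass the filters."""
    penalties = [
        abs(m.best_eval - m.played_eval)
        for m in moves
        if m.move_number >= first_move      # skip the opening, as described in the thread
        and abs(m.best_eval) <= bound       # assumption: both scores must lie in [-bound, bound]
        and abs(m.played_eval) <= bound
    ]
    return sum(penalties) / len(penalties) if penalties else 0.0


# Made-up numbers, purely to show which moves are counted.
game = [
    ScoredMove(10, 0.30, 0.10),  # ignored: before move 12
    ScoredMove(15, 0.40, 0.10),  # counted: penalty 0.30
    ScoredMove(20, 0.20, 0.20),  # counted: penalty 0.00
    ScoredMove(30, 3.50, 2.90),  # ignored: outside [-2.0, 2.0]
]
print(round(mean_error(game), 3))
```

Note that nothing in such a scheme gives credit back later: if the engine scores a move as 0.30 worse than its own choice at move 15 and only "sees the point" at move 22, the 0.30 penalty at move 15 stays in the average.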
#202
On May 10, 4:23 pm, "David Kane" wrote:
> "Martin Brown" wrote in message oups.com...

>> Agreed. But equally when the experiment has a systematic error due to using a relatively shallow fixed depth (but reproducible) searching to score the moves played it doesn't take much intuition to conclude that an engine that cannot annotate club level games accurately at that level is completely out of its depth on superGMs.

> I'd wager that this method would give generally meaningful results for club players, *despite* the fact that it will inaccurately analyze certain positions. It's a pity that the authors did not apply the method to the games of players with different ELO. That's

I agree entirely. Crafty would have enough headroom over most club players that the relatively small errors it made would not matter compared to the fairly gross blunders that determine the outcome of games at our level. I do not believe this is true when it tries to rate world class players who are intrinsically much stronger and have significantly better strategic intuition and positional understanding than the engine.

> an easy and obvious extension that would have gone a long way to validating the worth of the method. The argument that the method is refuted by finding one position that the computer analyzes incorrectly is false. There are analogous

I am not saying that. Although it is more fun to study interesting key positions in top level GM games with deep engine analysis than to focus on the mundane, obvious wood-pushing moves that no GM will ever get wrong.

> issues in ELO rating: which games should be rated, and what is the significance of each game.

If you look back to the start of this thread, I originally said I thought the engine probably had managed something like an accurate assessment. That was before I read the bit in the original paper that said it was hobbled to a 12 ply fixed search. I still think that is true as far as blunder rate is concerned, but not so for accuracy in non-blunder play.

I revised my view after experimenting with Crafty at 12 ply annotating a few of my already Shredder10 (30s/move) annotated games. The experiment is not difficult to do. Perhaps you will see a different result?

Regards,
Martin Brown

PS Apologies if this is posted twice, but from here it looks like Google dropped it on the floor again.
#203
On May 10, 5:03 pm, Ron wrote:
> In article .com, raylopez99 wrote:

>> Yes, agreed. As I posted 47.5 posts ago, for very close, nearly tied rankings, the stronger chess program might make a difference. But for clear demarcation breakpoints, such as between Capa and Kramnik versus Karpov and Kasparov, a stronger chess engine doesn't matter.

> It appears, RayLopez, that you missed an earlier post of mine which had two questions related to this very point. Since I'm sure it was an innocent omission - it's easy to miss a single post in a long thread - I'll repeat the questions here.
>
> 1) Would you feel equally confident if we only gave Crafty 11 ply? 10? 8? 4? Where do you draw the line? What non-arbitrary criteria are you using to suggest that 12 ply is meaningful whereas 3 ply, obviously, would not be?
>
> 2) What objective criteria are you using to define "extremely close" such that you don't trust the computer's ability to rank players properly?
>
> I'm very curious to hear your answers to these questions.
>
> -Ron

Ron,

Don't confuse the PSEUDO-chess scientists' and programmers' answers on this thread with REAL answers. Keep in mind I program as a hobby, have an IQ of over 140, and am a successful and quite wealthy businessman. My opponents *think* they have something to offer, but they don't realize that AI (Artificial Intelligence) research largely abandoned chess as the experimental "fruit fly" of AI roughly 15 years ago. Bridge and Go are the hot areas where AI is now being applied, not to mention the quest to build a true Turing machine that passes the Turing Test.

Another point: my opponents *think* they know the answer, but what is their basis? Little better than a guess. In fact, little better than my guess. But at least I base my guess/hypothesis on having studied chess and chess programmers as far back as 1990. I used to subscribe to Ply mag, published by an outfit in Canada (some university up there), and have read articles and papers on how real chess programming works. My opponents are still upset Garry lost to Deep Blue 2, and are 'fighting for the human race' or some such nonsense.

Now to get to the point of your questions: I don't know. My intuition, like Bot states, says that ply will not matter unless players are "close", and a visual inspection of the ratings in the summary of the original article that started this thread shows that "close" is between Capa and Kramnik, Karpov and Kasparov, and then the "third tier". But more plies might not make a difference (that is, won't change the relative rating) between say Capa and Kasparov, or anybody in the third tier vs. Karpov, etc.

In truth, nobody in this thread really knows, and indeed further research is needed. But the burden of persuasion is on Camp #1 to make their case: that so-called "positional sacrifice" positions are rather common in a game of chess and that chess is NOT largely tactics (these are the assumptions behind their claims; I claim the contrary). History has shown otherwise. Indeed, on the last point, Kramnik missed a mate in one last year. Chess is largely tactics, and that's why it is fair to have a chess engine rate the champions. You can make 30 brilliant "deep" positional moves in chess, have a clearly winning position, and still lose a chess game to a mate in one. That is chess. A PC would score you poorly in such a game, even though you were "brilliant" up until your blunder (and perhaps unappreciated by the PC, though I have argued in this thread that PCs are in fact not so bad at rating positions that require positional moves, even exchange sacs).

In fact, Camp #1's arguments would be better if we were trying to rate "correspondence chess" champions rather than OTB champions, since in correspondence chess tactics are much less important than deep positional moves. But that was not the inquiry of the original article's ranking of champions: it was for OTB world championship play. However, that said, I would not be surprised if, even for correspondence chess players, rating such players with Fritz 5.31 at 5 seconds a move gave you a pretty clear indication of the best correspondence chess players, since good positional moves and good tactical moves are largely one and the same in chess (again, this goes to chess being 99% tactics).

RL
#204
> "Martin Brown" wrote in message ups.com...
>> On May 10, 4:23 pm, "David Kane" wrote:
>>> "Martin Brown" wrote in message oups.com...

>>> Agreed. But equally when the experiment has a systematic error due to using a relatively shallow fixed depth (but reproducible) searching to score the moves played it doesn't take much intuition to conclude that an engine that cannot annotate club level games accurately at that level is completely out of its depth on superGMs.

>> I'd wager that this method would give generally meaningful results for club players, *despite* the fact that it will inaccurately analyze certain positions. It's a pity that the authors did not apply the method to the games of players with different ELO. That's

> I agree entirely. Crafty would have enough headroom over most club players that the relatively small errors it made would not matter compared to the fairly gross blunders that determine the outcome of games at our level. I do not believe this is true when it tries to rate world class players who are intrinsically much stronger and have significantly better strategic intuition and positional understanding than the engine.

>> an easy and obvious extension that would have gone a long way to validating the worth of the method. The argument that the method is refuted by finding one position that the computer analyzes incorrectly is false. There are analogous

> I am not saying that. Although it is more fun to study interesting key positions in top level GM games with deep engine analysis than to focus on the mundane, obvious wood-pushing moves that no GM will ever get wrong.

>> issues in ELO rating: which games should be rated, and what is the significance of each game.

> If you look back to the start of this thread, I originally said I thought the engine probably had managed something like an accurate assessment. That was before I read the bit in the original paper that said it was hobbled to a 12 ply fixed search. I still think that is true as far as blunder rate is concerned, but not so for accuracy in non-blunder play.

The proof of the pudding is in the eating. A claim that an analytical method is meaningful must be supported with evidence, and that is true whether you are talking about "average error analyzed by 12 ply Crafty" or some sophisticated calculation based on 20-ply Hydra analyses. The paper lacks any supporting evidence and therefore its conclusions are dubious. However, I consider the method highly interesting and worthy of discussion.

> I revised my view after experimenting with Crafty at 12 ply annotating a few of my already Shredder10 (30s/move) annotated games. The experiment is not difficult to do. Perhaps you will see a different result?
>
> Regards,
> Martin Brown
>
> PS Apologies if this is posted twice, but from here it looks like Google dropped it on the floor again.
#205
In article .com,
Martin Brown wrote:

>> [... I]f you have 36 computers and a spare month available, feel free.

> OK. But without doing that for the moment. What settings do you use to analyse and annotate your own games?

I don't. I enter them via ChessBase with Fritz running. In positional terms, I trust my own judgement more than Fritz, so I'm really using the computer only for blunder-checking. If Fritz doesn't see anything before I move on, tough. [Of course, it is doing this not only for the moves actually played, but also for my own annotations, plus any off-the-wall ideas I feel like investigating, so it is not expected to spot things "early".] So it usually gets a few seconds for "routine" moves, much longer for positions that seem "interesting" [either directly to me, or because Fritz seems to be finding something].

> I would be prepared to bet it is nothing like as shallow as 12 ply fixed + quiescence.

You might lose your bet, or at least part of it. It takes Fritz a reasonable time to get past 12 ply [of course, that's usually something like "12/27"] in the middle-game, and I very rarely wait for it to reach a depth that is "nothing like as shallow". The ending is different, of course.

> [The G&B experiment:] It will penalise GMs that have formed plans extending beyond 12 ply if there is no obvious gain made inside its quiescence horizon. And it hardly ever sees material sacrifices for gains in positional advantage or tempo.

I have rarely used Crafty. But Fritz usually at least sees some compensation -- eg you sacrifice a pawn and see a 0.6 drop in the evaluation, even if Fritz has no idea of the true worth of the sacrifice. The experience I *did* have with Crafty, some years ago, was that it seemed to produce better evaluations than Fritz, but it was less tactically aware, so it was much less use *to me* [as well as weaker in the Elo sense], paradoxically despite perhaps being a better match to actual IM/GM play. But computer chess has moved on a long way since then.

There is also, of course, Bronstein's dictum -- "Against computer, is advantage to be pawn down" [as he played a gambit against MChess]. His point was that the computer completely misunderstood his play, expecting him to be trying to regain the pawn, and thereby not seeing his steadily increasing advantage in other aspects of the position.

> [...] GM level games are littered with precisely the sort of positions that chess engines find really difficult to score accurately. And they usually occur at pivotal moments.

This is true. But -- until someone runs the experiment -- this does not necessarily mean that Crafty-12 makes a worse pig's ear of this than a much stronger engine. What matters to the experiment is not whether Crafty's evaluation of the position is the same as the GM's or is better/worse than [eg] Rybka's. We are accumulating the difference between Crafty's [or Rybka's] score for its own and for the GM's move.

If, for example, Crafty completely misunderstands a pawn sacrifice, then there is a 1-pawn "mistake" in Crafty's assessment of [eg] Spassky's play. If Spassky does this every other game [he surely doesn't do it more than that!], that's a 0.013 or so systematic error in Spassky's results. That could take him above Kasparov and Karpov in the rankings, but gets him nowhere near Kramnik and Capablanca [who are 0.03 ahead]; on the other hand, K&K have their own share of "mysterious" pawn sacrifices, so quite probably Spassky would stay below them.

Suppose also that Crafty has rather "static" positional evaluations; in that case, it may well be that Crafty sees much less difference between its own preference and Spassky's in most relatively quiet positions than perhaps it should, or than Rybka does. Crafty may in that case be misjudging Spassky's moves, and his positions, but not in a way that makes his play seem bad; whereas Rybka may be seeing and "understanding" more, but be penalising Spassky much more for any discrepancies [which may or may not be "real"].

It's not easy. We [someone!] should run the experiment before jumping to conclusions. This may be a computer-chess version of the fact that it is not always the best practitioners who make the best teachers [or examiners].

> [...] I think it mostly has found the players with the lowest blunder rate fairly convincingly.

Yep. That's why my overall view is that their results are probably not too far out, despite the obvious problems with the methodology. If you were asked to rank the WCs in order of the accuracy -- not necessarily the quality or success -- of their play in WC matches, then who would argue with Capablanca and Kramnik at the top, Karpov and Kasparov next, then very little difference down to Smyslov, with Tal, Euwe and Steinitz somewhat worse? The only surprise is perhaps iron man Botvinnik below Tal; but MMB lost three WC matches, so perhaps we're not seeing him at his best. If Crafty-12 is too "stupid" to have reached this conclusion in a rational way, then it's been very lucky [or else the chess world at large is equally stupid].

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.
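The "0.013 or so" figure in the pawn-sacrifice example above can be reproduced with a line of arithmetic. The sketch below is only a back-of-envelope check: the post does not say how many moves per game are scored, so the value of 38 is an assumption back-solved from the quoted 0.013, and `per_move_bias` is a hypothetical helper name.

```python
def per_move_bias(pawns_misjudged, games, scored_moves_per_game):
    """Systematic error added to a player's average per-move score (in pawns)."""
    return pawns_misjudged / (games * scored_moves_per_game)


# One misjudged 1-pawn sacrifice every other game.
# ASSUMPTION: about 38 scored moves per game for the player in question.
bias = per_move_bias(pawns_misjudged=1.0, games=2, scored_moves_per_game=38)

print(round(bias, 3))   # per-move bias in pawns
print(bias < 0.03)      # smaller than the 0.03 gap to the top pair quoted in the post
```

With fewer scored moves per game the bias grows in inverse proportion, so the conclusion depends on that assumed game length as well as on how often the engine misjudges a sacrifice.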
#206
Dr A. N. Walker wrote:
> I don't. I enter them via ChessBase with Fritz running. In positional terms, I trust my own judgement more than Fritz, so I'm really using the computer only for blunder-checking. If Fritz doesn't see anything before I move on, tough. [Of course, it is doing this not only for the moves actually played, but also for my own annotations, plus any off-the-wall ideas I feel like investigating, so it is not expected to spot things "early".] So it usually gets a few seconds for "routine" moves, much longer for positions that seem "interesting" [either directly to me, or because Fritz seems to be finding something].

I do something very similar, and in general it works pretty well for me. However, I use the add kibitzer command and have Rybka running as well as Fritz. I used to run Toga as my second engine until I broke down and purchased Rybka as well. (For me, I didn't want to use a Chessbase product for this purpose. But I have nothing but prejudice for that decision; it may make as much or more sense to use something like Junior, Shredder or Zap! for this.)

This adds one more point of interest: when the two engines diverge dramatically, or I disagree with them, or whatever. Many times these extra points of understanding and extra viewpoints more than make up in extra knowledge for any noise that is created.

The whole Kibitzer feature is one of my favorite features of the Chessbase family.
#207
On May 11, 10:58 am, (Dr A. N. Walker) wrote:
>> [...] I think it mostly has found the players with the lowest blunder rate fairly convincingly.

> Yep. That's why my overall view is that their results are probably not too far out, despite the obvious problems with the methodology. If you were asked to rank the WCs in order of the accuracy -- not necessarily the quality or success -- of their play in WC matches, then who would argue with Capablanca and Kramnik at the top, Karpov and Kasparov next, then very little difference down to Smyslov, with Tal, Euwe and Steinitz somewhat worse? The only surprise is perhaps iron man Botvinnik below Tal; but MMB lost three WC matches, so perhaps we're not seeing him at his best. If Crafty-12 is too "stupid" to have reached this conclusion in a rational way, then it's been very lucky [or else the chess world at large is equally stupid].
>
> --
> Andy Walker, School of MathSci., Univ. of Nott'm, UK.

I fully adopt Andy Walker's opinion here as my own. This will be my last post in this thread, unless Camp #1 provokes me.

RL
#208
Dr A. N. Walker wrote:
> If you were asked to rank the WCs in order of the accuracy

Wow, that word. That is the key to the whole thing. A lack of blunders is, in my mind and I think in many others', a long, long way from "accuracy". And some of the questions had to do with #1 move correlation, which again raises the question of "accuracy", not of blunders.

I think that Crafty-12, as an arbiter of accuracy, has a pretty tough row to hoe.
#209
On May 11, 6:58 pm, (Dr A. N. Walker) wrote:
> In article .com, Martin Brown wrote:

>> OK. But without doing that for the moment. What settings do you use to analyse and annotate your own games?

> I don't. I enter them via ChessBase with Fritz running. In positional terms, I trust my own judgement more than Fritz, so I'm really using the computer only for blunder-checking. If

In that case it is certainly worth downloading and running something like Fruit 2.2.1 (evaluation free for 14 days) as a kibitzer to see the sort of things that you are missing. If you only buy one new chess engine a year I would still recommend Shredder10 (or 11 if it comes out soon) - the ultra compact and fast RAM-based endgame tablebases for 3, 4 and 3, 4, 5 pieces make it well worth having.

> Fritz doesn't see anything before I move on, tough. [Of course,

Fritz does miss some important tactical motifs - especially at 12 ply.

If you have the entire game entered, then using blundercheck inside the chess program GUI takes only about 10s per move to reach 12 ply if you play reasonably accurately. It stalls each time you deviate and the cache ceases to be useful. Roughly, Crafty19.19 takes 1-3 mins to reach 12 ply in this mode, but in 60s Shredder10 typically reaches 15-16 ply in all but the most complex positions.

I guess my way of doing it comes from the fact I have muddled along without a proper database for a long time and have still not adjusted to using Chessbase for manipulating my own games. I still haven't found where the blundercheck button is hidden in Chessbase - it's not on the tools menu that I can see.

> because Fritz seems to be finding something].

Worth running another engine alongside it for a while. I find Fritz blundercheck a bit dull. YMMV.

>> I would be prepared to bet it is nothing like as shallow as 12 ply fixed + quiescence.

> You might lose your bet, or at least part of it. It takes Fritz a reasonable time to get past 12 ply [of course, that's usually something like "12/27"] in the middle-game, and I very rarely wait for it to reach a depth that is "nothing like as shallow". The ending is different, of course.

You should definitely try one of the other engines. And/or take half a dozen games and annotate them with blundercheck set to something like 30s/move with one of Fruit/Shredder/Rybka.

>> [The G&B experiment:] It will penalise GMs that have formed plans extending beyond 12 ply if there is no obvious gain made inside its quiescence horizon. And it hardly ever sees material sacrifices for gains in positional advantage or tempo.

> I have rarely used Crafty. But Fritz usually at least sees some compensation -- eg you sacrifice a pawn and see a 0.6 drop in the evaluation, even if Fritz has no idea of the true worth of the sacrifice. The experience I *did* have with Crafty, some years ago, was that it seemed to produce better evaluations than Fritz, but it

I have run a few tests on, in this case, randomly chosen games, with somewhat interesting results. Sort of what I expected but with a few surprises thrown in as well. AFAIK neither of these games is a known engine trap.

The first was a pre-computer chess, very short 25 move miniature: Boris Spassky vs Jan Timman, Amsterdam 1977 (with Powerbooks strong.cbh loaded). The first annotation was a big shock! Black was already a rook down out of the opening book and almost inexorably set on a path leading to a forced queen sacrifice to avoid a mate. I thought strong.cbh was supposed to contain only the strongest opening lines for balanced play - and not lines where one side is already dead in the water. I have found the odd similar one in the Sicilian too (including one highly rated line leading to immediate loss of a piece).

Are there any tools around to debug opening books and run a sanity check on the nodes to remove branches where one player is already more than the exchange down?

I created myself a null opening book to force annotation back to the beginning of the game. Ideally, to mimic the experiment, one culled to exactly 24 ply would be perfect, but I don't know how to do that in Chessbase.

The second game was a Kasparov vs Ivanchuk 1995 Riga game [E62], 53 moves. I chose it as a long balanced game leading to a draw in the endgame. Crafty19.19 really struggled with this one at 12 ply. Not only did it fail to find the win for Kasparov at move 43 (hxg5 instead of Kf3), but it ground my machine to a complete standstill considering move 20. ... Qg7, and although it found 20. ... Rb8 (preferred Qg5) it took nearly as long (over 30 mins) on this single move as Shredder at 12 ply took for the entire game!

> was less tactically aware, so it was much less use *to me* [as well

If you want to see interesting tactical awareness that you can learn from then you definitely want Shredder10. I am not yet convinced by Rybka: it may be immensely strong in ELO rating but some of the lines it finds are, well, inhuman.

> as weaker in the Elo sense], paradoxically despite perhaps being a better match to actual IM/GM play. But computer chess has moved on a long way since then.

Indeed. Despite the clear fact that Rybka benchmarks stronger in engine-engine matches, it seems to lack something in the endgame/endgame transition stage. I guess it matters little how it plays the endgame if it usually wins in the middlegame.

>> [...] GM level games are littered with precisely the sort of positions that chess engines find really difficult to score accurately. And they usually occur at pivotal moments.

> This is true. But -- until someone runs the experiment -- this does not necessarily mean that Crafty-12 makes a worse pig's ear of this than a much stronger engine. What matters to the experiment is not whether Crafty's evaluation of the position is the same as the GM's or is better/worse than [eg] Rybka's. We are accumulating the difference between Crafty's [or Rybka's] score for its own and for the GM's move.

The problem here is that Crafty is frequently out by more than 50cp on key variations and has been in all the GM games I have fed it so far. Admittedly the first two were engine showpieces but the second pair were randomly chosen high level games. You can see it happen most prominently in the longer game, where it misses the crucial winning line and systematically mis-scores a host of moves because it doesn't understand what is going on.

> If, for example, Crafty completely misunderstands a pawn sacrifice, then there is a 1-pawn "mistake" in Crafty's assessment of [eg] Spassky's play. If Spassky does this every other game [he surely doesn't do it more than that!], that's a 0.013 or so systematic error in Spassky's results. That could take him above Kasparov and Karpov in the rankings, but gets him nowhere near Kramnik and Capablanca [who are 0.03 ahead]; on the other hand, K&K have their own share of "mysterious" pawn sacrifices, so quite probably Spassky would stay below them.

I don't think it is quite so clear cut. I do think that a fair proportion of the "errors" that the G&B analysis says the GMs have made are in reality just the rms error of Crafty's evaluation, which is something like 30cp, multiplied by the number of times they do something that it doesn't expect.

> Suppose also that Crafty has rather "static" positional evaluations; in that case, it may well be that Crafty sees much less difference between its own preference and Spassky's in most relatively quiet positions than perhaps it should, or than Rybka does. Crafty may in that case be misjudging Spassky's moves, and his positions, but not in a way that makes his play seem bad; whereas Rybka may be seeing and "understanding" more, but be penalising Spassky much more for any discrepancies [which may or may not be "real"].
>
> It's not easy. We [someone!] should run the experiment before jumping to conclusions. This may be a computer-chess version of the fact that it is not always the best practitioners who make the best teachers [or examiners].

This is possible. But an engine that cannot detect important wins and tactical lines is not a good choice, and hobbling it to 12 ply, even if it was the only way to do the experiment, makes matters even worse.

>> [...] I think it mostly has found the players with the lowest blunder rate fairly convincingly.

> Yep. That's why my overall view is that their results are probably not too far out, despite the obvious problems with the methodology. If you were asked to rank the WCs in order

That was my initial impression too until I started tormenting engines with a few top level games to see how well Crafty at 12 ply fared. The initial results are not good.

OK, I admit it is possible that the 4 games I picked are totally unrepresentative, but I think it more likely that the same sorts of errors are present in almost every GM game. We could eliminate this possibility if a few more people would pick a game and annotate it with their favourite engine hobbled to 12 ply, their favourite engine at 60s/move, and Crafty at 12 ply. I am not sure the resulting games are exciting enough to post here - multiple annotations in PGN look a real mess. But a summary of the outcome would be OK.

It is time to turn the question around slightly. Can anyone find a GM level game where Crafty at 12 ply avoids missing important winning lines and obtains reasonable blundercheck agreement, to within say 20cp, against any other top rated engine run for 60s/move?

So far all the games I have tested have shown serious discrepancies (50cp).

Regards,
Martin Brown
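For anyone taking up the challenge above, the comparison itself is easy to automate once the per-move scores from two annotation runs have been copied out by hand. The following is a minimal sketch under stated assumptions: the scores are centipawn evaluations of the same positions from both runs, `compare_runs` is a hypothetical helper, and the sample numbers are invented purely to show the mechanics (they are not results from any real game).

```python
import math


def compare_runs(evals_a, evals_b, threshold_cp=50):
    """Return (rms difference in cp, 1-based move indices where the runs differ by more than threshold_cp)."""
    if len(evals_a) != len(evals_b):
        raise ValueError("both runs must score the same moves")
    diffs = [a - b for a, b in zip(evals_a, evals_b)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 0.0
    flagged = [i for i, d in enumerate(diffs, start=1) if abs(d) > threshold_cp]
    return rms, flagged


# Invented centipawn scores for five consecutive moves of one game.
crafty_12ply = [10, 25, -5, 40, 120]
reference_60s = [15, 20, 60, 35, 30]

rms, flagged = compare_runs(crafty_12ply, reference_60s)
print(round(rms, 1), flagged)
```

Passing `threshold_cp=20` instead applies the stricter "agreement to within 20cp" criterion proposed in the post, and a one-line summary of the rms figure plus the flagged move numbers is much easier to post than doubly annotated PGN.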
#210
On May 11, 6:58 pm, (Dr A. N. Walker) wrote:
> In article .com,

Threading got messed up. The first copy of my reply to you was dropped on the floor by Google, and the repeat post has been incorrectly threaded under David Kane above.

Regards,
Martin Brown