May 14th 07, 12:01 PM, posted to rec.games.chess.misc,rec.games.chess.computer
Martin Brown
Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)

On May 11, 6:58 pm, (Dr A. N. Walker) wrote:
In article .com,
Martin Brown wrote:

OK. But without doing that for the moment, what settings do you use to
analyse/annotate your own games?


I don't. I enter them via ChessBase with Fritz running.
In positional terms, I trust my own judgement more than Fritz,
so I'm really using the computer only for blunder-checking. If


In that case it is certainly worth downloading and running something
like Fruit2.2.1 (evaluation free for 14 days) as a kibitzer to see the
sort of things that you are missing. If you only buy one new chess
engine a year I would still recommend Shredder10 (or 11 if it comes
out soon) - the ultra compact and fast RAM-based endgame tablebases
for 3-4 and 3-4-5 pieces make it well worth having.

Fritz doesn't see anything before I move on, tough. [Of course,


Fritz does miss some important tactical motifs - especially at 12 ply.
If you have the entire game entered then using blundercheck inside the
chess program GUI takes only about 10s per move to reach 12 ply if you
play reasonably accurately. It stalls each time you deviate and the
cache ceases to be useful. Roughly Crafty19.19 takes 1-3mins to reach
12ply in this mode but in 60s Shredder10 typically reaches 15-16ply in
all but the most complex positions.

I guess my way of doing it comes from the fact I have muddled along
without a proper database for a long time and have still not adjusted
to using Chessbase for manipulating my own games. I still haven't
found where the blundercheck button is hidden in Chessbase - it's not
on the tools menu that I can see.

because Fritz seems to be finding something].


Worth running another engine alongside it for a while. I find Fritz
blundercheck a bit dull, YMMV.

I would be prepared to bet it is nothing like as shallow as 12 ply
fixed + quiescence.


You might lose your bet, or at least part of it. It takes
Fritz a reasonable time to get past 12 ply [of course, that's usually
something like "12/27"] in the middle-game, and I very rarely wait
for it to reach a depth that is "nothing like as shallow". The ending
is different, of course.


You should definitely try one of the other engines. And/or take half a
dozen games and annotate them with blundercheck set to something like
30s/move with one of Fruit/Shredder/Rybka.

[The G&B experiment:]

It will penalise GMs that have formed plans extending beyond 12 ply if
there is no obvious gain made inside its quiescence horizon. And it
hardly ever sees material sacrifices for gains in positional advantage
or tempo.


I have rarely used Crafty. But Fritz usually at least sees
some compensation -- eg you sacrifice a pawn and see a 0.6 drop in
the evaluation, even if Fritz has no idea of the true worth of the
sacrifice. The experience I *did* have with Crafty, some years ago,
was that it seemed to produce better evaluations than Fritz, but it


I have run a few tests on (in this case) randomly chosen games, with
somewhat interesting results. Sort of what I expected, but with a few
surprises thrown in as well. AFAIK neither of these games is a known
engine trap.

The first was a pre-computer-era game, a very short 25-move miniature:
Boris Spassky vs Jan Timman, Amsterdam 1977 (with Powerbooks strong.cbh
loaded). The first annotation was a big shock! Black was already a
rook down out of the opening book and almost inexorably set on a path
leading to a forced queen sacrifice to avoid a mate. I thought
strong.cbh was supposed to contain only the strongest opening lines
for balanced play - and not lines where one side is already dead in
the water. I have found the odd similar one in the Sicilian too
(including one highly rated line leading to immediate loss of a
piece).

Are there any tools around to debug opening books and run a sanity
check on the nodes to remove branches where one player is already more
than the exchange down?
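
A minimal sketch of what such a sanity check could look like, in
Python. Everything here is hypothetical: the BookNode structure, the
piece values and the 200cp "exchange" threshold are assumptions for
illustration, not the format of strong.cbh or any real book tool. The
material balance is only tested where a book line ends, so that a
capture awaiting recapture in mid-line is not mistaken for a lost
position.

```python
from dataclasses import dataclass, field

# Assumed conventional piece values in centipawns (not from any real book tool).
PIECE_CP = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}
# "The exchange" taken as rook for minor piece: 500 - 300 = 200cp.
EXCHANGE_CP = PIECE_CP["R"] - PIECE_CP["B"]


@dataclass
class BookNode:
    """Hypothetical opening-book node: a move plus material balance after it."""
    move: str
    material_cp: int  # White material minus Black material, in centipawns
    children: list = field(default_factory=list)


def prune_book(node, limit_cp=EXCHANGE_CP):
    """Return a pruned copy of the subtree, or None if no line survives.

    A line is dropped when the position at its end leaves either side
    more than limit_cp down in material.
    """
    if not node.children:
        return node if abs(node.material_cp) <= limit_cp else None
    kept = []
    for child in node.children:
        pruned_child = prune_book(child, limit_cp)
        if pruned_child is not None:
            kept.append(pruned_child)
    if not kept:
        return None
    return BookNode(node.move, node.material_cp, kept)


# Tiny made-up book: one balanced line and one ending a rook down for Black.
book = BookNode("start", 0, [
    BookNode("e4", 0, [BookNode("e5", 0)]),
    BookNode("d4", 0, [BookNode("blunder", 500)]),
])
pruned = prune_book(book)
print([child.move for child in pruned.children])
```

A pure material count says nothing about compensation, so a real check
would want an engine evaluation at the end of each line instead.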

I created myself a null (empty) opening book to force annotation back
to the beginning of the game. Ideally, to mimic the experiment, a book
culled to exactly 24 ply would be perfect, but I don't know how to do
that in Chessbase.

The second game was Kasparov vs Ivanchuk, Riga 1995 [E62], 53
moves. I chose it as a long balanced game leading to a draw in the
endgame. Crafty19.19 really struggled with this one at 12ply. Not only
did it fail to find the win for Kasparov at move 43 (hxg5 instead of
Kf3), but it ground my machine to a complete standstill considering
move 20. ...Qg7. Although it found 20. ...Rb8 (preferred Qg5), it took
nearly as long (over 30mins) on this single move as Shredder at 12ply
took for the entire game!

was less tactically aware, so it was much less use *to me* [as well


If you want to see interesting tactical awareness that you can learn
from then you definitely want Shredder10. I am not yet convinced by
Rybka: it may be immensely strong in Elo rating, but some of the lines
it finds are, well, inhuman.

as weaker in the Elo sense], paradoxically despite perhaps being a
better match to actual IM/GM play. But computer chess has moved on
a long way since then.


Indeed. Despite the clear fact that Rybka benchmarks stronger in
engine-engine matches, it seems to lack something in the
middlegame/endgame transition stage. I guess it matters little how it
plays the endgame
if it usually wins in the middlegame.

[...] GM level games
are littered with precisely the sort of positions that chess engines
find really difficult to score accurately. And they usually occur at
pivotal moments.


This is true. But -- until someone runs the experiment --
this does not necessarily mean that Crafty-12 makes a worse pig's
ear of this than a much stronger engine. What matters to the
experiment is not whether Crafty's evaluation of the position is
the same as the GM's or is better/worse than [eg] Rybka's. We
are accumulating the difference between Crafty's [or Rybka's]
score for its own and for the GM's move.
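
To make that accumulation concrete, here is a minimal sketch in
Python. The function name, the input format and the clamping of
negative differences to zero are assumptions for illustration; the
post does not say exactly how G&B handle a move the engine scores
above its own choice. The point it shows is that a constant bias in
the engine's evaluation cancels out, since only the difference
between the two scores is summed.

```python
def mean_loss_cp(scores):
    """Average per-move loss in centipawns.

    scores: list of (engine_choice_cp, played_move_cp) pairs, both
    scored by the same engine from the mover's point of view.
    """
    if not scores:
        raise ValueError("need at least one scored move")
    # Assumption: a played move scoring above the engine's own choice
    # counts as zero loss rather than as a negative one.
    losses = [max(0, best - played) for best, played in scores]
    return sum(losses) / len(losses)


# Made-up scores for three moves.
scores = [(30, 30), (50, 10), (0, -20)]
baseline = mean_loss_cp(scores)

# Shift every evaluation by a constant +100cp "misjudgement":
# the accumulated difference is unchanged.
shifted = mean_loss_cp([(b + 100, p + 100) for b, p in scores])
print(baseline, shifted)
```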


The problem here is that Crafty is frequently out by more than 50cp on
key variations and has been in all the GM games I have fed it so far.
Admittedly the first two were engine showpieces but the second pair
were randomly chosen high level games. You can see it happen most
prominently in the longer game, where it misses the crucial winning
line and systematically mis-scores a host of moves because it
doesn't understand what is going on.

If, for example, Crafty completely misunderstands a pawn
sacrifice, then there is a 1-pawn "mistake" in Crafty's assessment
of [eg] Spassky's play. If Spassky does this every other game
[he surely doesn't do it more than that!], that's a 0.013 or so
systematic error in Spassky's results. That could take him
above Kasparov and Karpov in the rankings, but gets him nowhere
near Kramnik and Capablanca [who are 0.03 ahead]; on the other
hand, K&K have their own share of "mysterious" pawn sacrifices,
so quite probably Spassky would stay below them.
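
The 0.013 figure can be checked with a few lines of Python. The game
length of 40 moves is an assumption made here to reproduce the number;
it is not stated in the post.

```python
pawn_error = 1.0          # one misjudged pawn sacrifice, in pawns
games_per_sacrifice = 2   # "every other game"
moves_per_game = 40       # assumption: typical game length, not from the post

# Error spread over every move the player makes in those games.
systematic_error = pawn_error / (games_per_sacrifice * moves_per_game)

gap_to_leaders = 0.03     # Kramnik and Capablanca's lead, as quoted above
print(systematic_error, systematic_error < gap_to_leaders)
```

With these inputs the error comes to 1/80 of a pawn per move, which is
under half of the quoted 0.03 gap.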


I don't think it is quite so clear cut. I do think that a fair
proportion of the "errors" that the G&B analysis says the GMs have
made are in reality just the rms error of Crafty's evaluation, which
is something like 30cp, multiplied by the number of times they do
something that it doesn't expect.

Suppose also that Crafty has rather "static" positional
evaluations; in that case, it may well be that Crafty sees much
less difference between its own preference and Spassky's in most
relatively quiet positions than perhaps it should, or than Rybka
does. Crafty may in that case be misjudging Spassky's moves, and
his positions, but not in a way that makes his play seem bad;
whereas Rybka may be seeing and "understanding" more, but be
penalising Spassky much more for any discrepancies [which may or
may not be "real"].

It's not easy. We [someone!] should run the experiment
before jumping to conclusions. This may be a computer-chess
version of the fact that it is not always the best practitioners
who make the best teachers [or examiners].


Although this is possible, an engine that cannot detect important wins
and tactical lines is not a good choice, and hobbling it to 12ply, even
if that was the only way to do the experiment, makes matters even worse.
[...] I think it mostly has found the players with the lowest
blunder rate fairly convincingly.


Yep. That's why my overall view is that their results
are probably not too far out, despite the obvious problems with
the methodology. If you were asked to rank the WCs in order


That was my initial impression too until I started tormenting engines
with a few top level games to see how well Crafty 12ply fared. The
initial results are not good. OK I admit it is possible that the 4
games I picked are totally unrepresentative, but I think it more
likely that the same sorts of errors are present in almost every GM
game.

We could eliminate this possibility if a few more people would pick a
game and annotate it with their favourite engine hobbled to 12ply,
favourite engine 60s/move and Crafty12ply. I am not sure the resulting
games are exciting enough to post here - multiple annotations in PGN
look a real mess. But a summary of the outcome would be OK.

It is time to turn the question around slightly. Can anyone find a GM
level game where Crafty at 12ply avoids missing important winning
lines and obtains reasonable blundercheck agreement to within say 20cp
against any other top rated engine run for 60s/move? So far all the
games I have tested have shown serious discrepancies (50cp or more).
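
For anyone trying this, a minimal sketch in Python of how the two sets
of annotations might be compared once the per-move scores have been
copied out of the PGN. The function name and the list-of-centipawns
input are assumptions for illustration; the 20cp and 50cp thresholds
are the ones used in this post.

```python
def compare_annotations(crafty_cp, reference_cp, agree_cp=20, serious_cp=50):
    """Compare two per-move evaluation lists (centipawns, same side's view).

    Returns the largest disagreement, whether every move agrees to
    within agree_cp, and the 1-based positions in the list where the
    engines differ by serious_cp or more.
    """
    if len(crafty_cp) != len(reference_cp) or not crafty_cp:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(a - b) for a, b in zip(crafty_cp, reference_cp)]
    return {
        "max_diff": max(diffs),
        "agree": all(d <= agree_cp for d in diffs),
        "serious": [i + 1 for i, d in enumerate(diffs) if d >= serious_cp],
    }


# Made-up scores: Crafty at 12ply vs another engine at 60s/move.
crafty = [10, 25, -40, 60]
reference = [15, 30, 20, 55]
report = compare_annotations(crafty, reference)
print(report)
```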

Regards,
Martin Brown
