May 17th 07, 01:00 PM, posted to rec.games.chess.misc, rec.games.chess.computer
Martin Brown
Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)

On May 16, 2:50 pm, (Dr A. N. Walker) wrote:
In article .com,
Martin Brown wrote:

[...] So I'm guessing that when both sides think they
are winning [by something significant, not by 20cp or so], one
of them has overlooked something of tactical importance,


It started from the opening in this game. Shredder +0.90 Crafty -0.27
peaking at +1.40 vs -0.10 then converging a bit until the fateful
17. ... Be7 2.77 vs 0.6 then for a while the scores agreed before
again diverging to 1.24, 3.27.


Ah. Perhaps I have a misunderstanding about what you did
or said? I understood that you had played some fast[ish] games
between Shredder and Crafty, and then passed on to us Shredder's
analysis [at much longer time limits] of the game? So that, for


Yes. But the time control for the fastish game was chosen to give
roughly an average 12-ply lookahead for Crafty and a 12-ply lookahead
for Shredder (I miscalculated the time penalty and Shredder got to
13+ ply at times in this game).

example, the analysis would have been/looked the same even if the
game had been a GM encounter at classical time limits or you vs
me at 5-min chess, or any other source? So were the black scores
not *Shredder's* evaluations rather than *Crafty's*? Otherwise, I
find the exact agreements, eg at 1.22 and later at 2.58, for several
moves very suspicious. And if so, then this is not an example of


No. That is a quirk of annotating a game by working back from the end
position: the engine can reuse deep cached move evaluations from the
transposition table, provided that the game follows a reasonably
strong, well-explored line according to the engine's own evaluation.
The same happens in GM games, since the cache often contains useful info.
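
The cache effect described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the game is a simple "remove 1 to 3 counters" game rather than chess, and the names `CachedSearch`, `search` and `annotate_backward` are hypothetical. It does not show how Shredder or ChessBase implement their hash tables. It only shows the general idea that results stored while analysing later positions can be reused when earlier positions of the same line are searched.

```python
from typing import Dict, List, Tuple


class CachedSearch:
    """Depth-limited negamax with a transposition table (toy game, not chess)."""

    def __init__(self) -> None:
        # position -> (depth the stored value was searched to, value)
        self.table: Dict[int, Tuple[int, int]] = {}
        self.hits = 0    # searches answered from the table
        self.nodes = 0   # positions actually expanded

    @staticmethod
    def successors(pile: int) -> List[int]:
        # Toy rules: remove 1, 2 or 3 counters; a player unable to move loses.
        return [pile - k for k in (1, 2, 3) if pile - k >= 0]

    def search(self, pile: int, depth: int) -> int:
        """Value for the side to move: +1 win, -1 loss, 0 unknown at the horizon."""
        entry = self.table.get(pile)
        if entry is not None and entry[0] >= depth:
            self.hits += 1          # reuse an equal or deeper earlier result
            return entry[1]
        self.nodes += 1
        nxt = self.successors(pile)
        if not nxt:
            value = -1              # no move available: side to move has lost
        elif depth == 0:
            value = 0               # horizon reached, no verdict
        else:
            value = max(-self.search(p, depth - 1) for p in nxt)
        self.table[pile] = (depth, value)
        return value

    def annotate_backward(self, line: List[int], depth: int) -> List[int]:
        """Score each position of a played line, last position first."""
        scores = [self.search(pile, depth) for pile in reversed(line)]
        return list(reversed(scores))


engine = CachedSearch()
print(engine.annotate_backward([9, 8, 5, 4, 1, 0], depth=4))
print("table hits:", engine.hits, "nodes expanded:", engine.nodes)
```

Because the table is kept between positions, a score found for a late position can resurface unchanged when an earlier position is searched, which is one plausible way for several consecutive moves to show identical evaluations.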

both *sides* thinking they were winning, but rather of *Shredder*
thinking both sides were winning?


No. Shredder scores it as a consistent win for white. There are some
important differences between the fast score and the deeper analysis
though. Here is the original game with the engines' scores and thinking
time.

[Event "AOI, Blitz:4'+2""]
[Site "East Rounton"]
[Date "2007.05.14"]
[Round "1"]
[White "Shredder 10"]
[Black "Crafty 19.01"]
[Result "1-0"]
[ECO "D25"]
[WhiteElo "9999"]
[BlackElo "9999"]
[Annotator "0.30;0.36"]
[PlyCount "83"]
[TimeControl "240+2"]

{Intel(R) Pentium(R) 4 CPU 3.00GHz 2992 MHz W=13.1 ply; 354kN/s
B=12.2 ply; 835kN/s; 2 TBAs}
1. d4 {Both last book move 0.30/16 12} Nf6 {0.36/12 27}
2. Nf3 {(Bf4) 0.27/15 11} d5 {0.32/12 26}
3. c4 {(e3) 0.27/16 11} dxc4 {(e6) -0.26/12 25}
4. e3 {(Nc3) 0.34/14 20} b5 {(Bf5) -0.21/12 24}
5. a4 {0.79/14 13} c6 {-0.27/12 24}
6. axb5 {(Be2) 0.90/13 5} cxb5 {-0.11/13 23}
7. Nc3 {0.72/14 17} Qb6 {(Bd7) -0.26/12 23}
8. b3 {(Ne5) 1.01/13 12} e6 {(b4) -0.15/11 22}
9. bxc4 {0.81/13 15} b4 {(Bb4) -0.16/12 22}
10. c5 {(Qa4+) 1.04/13 14} Qb7 {-0.22/11 21}
11. Rb1 {1.00/12 8} Nc6 {-0.06/11 21}
12. e4 {(Bc4) 1.14/12 13} a6 {(Be7) -0.14/9 20}
13. Bc4 {(Bf4) 1.40/13 22} Qc7 {-0.09/10 19}
14. Ne2 {(e5) 1.18/12 7} Nxe4 {-0.50/10 21}
15. O-O {0.98/11 5} f5 {(Bb7) -0.43/10 19}
16. Bf4 {0.67/11 11} Qd7 {0.40/11 24}
17. Bb3 {(Ne5) 1.02/10 6} Be7 {(Qb7) 0.60/11 18}
18. Ba4 {2.77/12 5} Bf6 {2.43/11 21}
19. Rxb4 {2.35/13 6} Nxb4 {2.61/12 17}
20. Ne5 {2.33/13 7} Bxe5 {2.35/14 17}
21. Bxd7+ {2.38/14 4} Bxd7 {2.22/14 5}
22. Bxe5 {2.38/15 6} O-O {2.21/13 16}
23. f3 {2.35/14 3} Nf6 {2.43/13 16}
24. Nf4 {(Qd2) 2.35/14 5} a5 {2.15/11 15}
25. Qb3 {2.48/13 5} Ra6 {(Rfe8) 2.16/11 15}
26. Kf2 {(Re1) 2.67/11 7} Nfd5 {(Kf7) 1.43/12 15}
27. Nxd5 {2.96/11 1} exd5 {1.68/12 14}
28. Ke3 {(Rc1) 2.86/12 3} Rg6 {(Bb5) 1.09/11 14}
29. Rg1 {(g3) 2.94/11 3} Bb5 {1.11/11 14}
30. g4 {(Kd2) 2.81/11 4} Re8 {(f4+) 1.16/11 14}
31. h3 {(Kd2) 2.99/11 4} Bc4 {(fxg4) 1.24/10 13}
32. Qa4 {3.27/12 2} Nc6 {1.82/12 15}
33. gxf5 {3.60/12 2} Rxg1 {2.24/13 13}
34. Qxc6 {4.44/11 2} Re1+ {(Kf7) 2.80/12 13}
35. Kf4 {5.16/12 3} Kf7 {3.25/13 12}
36. Qc7+ {(f6) 5.65/13 2} Re7 {4.90/14 37}
37. Qxa5 {5.93/13 2} Rc1 {(Rg1) 4.95/12 12}
38. Qd8 {7.15/11 2} Re8 {(Bb5) 6.19/11 22}
39. Qg5 {9.31/12 2} Rxe5 {7.01/13 10}
40. dxe5 {10.19/12 3} d4 {7.94/12 37}
41. Qd8 {11.44/11 3} h6 {9.39/11 15}
42. Qd7+ {11.41/10 2} 1-0
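
For anyone who wants to compare the two engines' in-game scores move by move, here is a minimal sketch that pulls the `score/depth time` annotations out of move text like the above. The regular expression and the names `Annotation`, `parse_annotations` and `score_gaps` are my own, not part of any PGN library. It assumes every score is given from White's point of view (which is how the discussion reads the numbers) and that the last figure in each comment is thinking time in seconds.

```python
import re
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    move: str                    # move actually played (SAN)
    alternative: Optional[str]   # parenthesised move in the comment, if any
    centipawns: int              # engine score, converted to centipawns
    depth: int                   # search depth in plies
    seconds: int                 # thinking time (assumed to be seconds)


# Three move pairs copied from the game above.
EXCERPT = """
16. Bf4 {0.67/11 11} Qd7 {0.40/11 24}
17. Bb3 {(Ne5) 1.02/10 6} Be7 {(Qb7) 0.60/11 18}
18. Ba4 {2.77/12 5} Bf6 {2.43/11 21}
"""

# move, optional "(alternative)", score, "/", depth, time
PATTERN = re.compile(
    r"(\S+)\s*\{\s*(?:\((\S+?)\)\s*)?(-?\d+\.\d+)/(\d+)\s+(\d+)\s*\}"
)


def parse_annotations(text: str) -> List[Annotation]:
    """Return one Annotation per annotated move, in game order."""
    return [
        Annotation(
            move=m.group(1),
            alternative=m.group(2),
            centipawns=round(float(m.group(3)) * 100),
            depth=int(m.group(4)),
            seconds=int(m.group(5)),
        )
        for m in PATTERN.finditer(text)
    ]


def score_gaps(annotations: List[Annotation]) -> List[int]:
    """White's score minus Black's score for each full move, in centipawns."""
    white, black = annotations[0::2], annotations[1::2]
    return [w.centipawns - b.centipawns for w, b in zip(white, black)]


for gap in score_gaps(parse_annotations(EXCERPT)):
    print(gap)
```

Note that this compares the scores each engine reported during the game, not the later deep analysis, and that the opening comment with the "Both last book move" text would need special handling before the whole game could be fed through it.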

It was pretty clear in this game that
Crafty simply did not know which way was up!


Absolutely. Computers seem to be prone to that sort of
game, though. Once they don't understand a position, they tend
to go *really* pear-shaped.


Crafty allowed itself to get stiffed by a double threat: the knight
skewered against queen and king, and the loss of a pawn. At move 17 it
should have seen that getting the queen off d7 was a priority rather
than just developing the bishop. Shredder was expecting Qb7 (which is
probably weaker than Qa7 or Qe7). Then it pounced.

For the sake of balance: a best-of-3 engine match at this time
penalty ended with 1 win, 1 draw and 1 loss for each engine.


Don Beal used to say that you need matches of 100+ games
to find which engine is better -- he had cases where one side
was losing 17-0 or thereabouts but hauled back to win [and this
in the days before "learning"].


I may give it a try overnight to see.
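
To put rough numbers on why a 3-game match says so little and 100+ games are recommended, here is a back-of-envelope sketch. It is my own simplification, not Don Beal's method: it assumes two equally strong engines, independent games scored 1 / 0.5 / 0, and a fixed draw probability. The function name `score_standard_error` and the draw rate used in the loop are illustrative assumptions.

```python
import math


def score_standard_error(games: int, draw_rate: float = 0.0) -> float:
    """Standard error of the average score per game between two equal engines.

    Assumes independent games scored 1 / 0.5 / 0, with wins and losses
    equally likely and a fixed probability of a draw.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must be between 0 and 1")
    variance = (1.0 - draw_rate) * 0.25   # spread of a single game's score
    return math.sqrt(variance / games)


# Illustrative draw rate only; real engine matches will differ.
for n in (3, 100, 1000):
    print(n, round(score_standard_error(n, draw_rate=1 / 3), 3))
```

The error shrinks only with the square root of the number of games, so going from 3 games to 100 narrows it by a factor of a little under six.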

BTW is there a way to get the graphical display of time taken and
engine score shown in the ChessBase window, or does the game have to
be put into the playing window to see that info?


Pass. I've never entered games with that info in the first place.


I don't enter the info so much as allow blundercheck 20s to run on all
my games. And for most of the GM games that I decide look interesting
too - always fun to compare old human annotation in books to modern
engines.

Remember that is Crafty working at roughly the same search depth
setting as was being used to judge the play of world champion chess
players. It may be a bit unfair to make it play the opening (where its
performance is very poor).


Ah. I assumed 8s/move wouldn't be enough to reach the
depth used by G&B [roughly 6h/game on 2.5GHz machines] ....


I don't understand why they got quite such bad performance, but I
suspect the full-width fixed-ply search with the cache cleared between
moves for reproducibility probably played a part. Complex middlegames
slow Crafty down a lot. In this game the engines fairly often played
the move that the other engine had expected - I watched the first game
in realtime (Crafty about 20s/move for 40 moves).

Yes, but you still seem not to have understood! Look, suppose
some engine gives 1.23 as its evaluation. That means that somewhere
down the tree there is a position, reached by "best play" as far as
the current collection of static evaluations goes, which has a static
evaluation of 1.23. *That* evaluation is a sum of various factors --
+1.00 because we have an extra pawn, +0.17 because we control an open
file, +0.47 because of king safety, -0.13 because the opposing knight
is well-placed, +- this, that and the other, possibly including all
manner of complexity and joint factors, etc. Only the extra pawn is
"gold standard" currency. Everything else is there either because
BobH or some other programmer has decreed that an open file is worth
0.17 or because a "learning" program has currently settled on that
as the value. None of it is reliable [else we wouldn't need the tree


I agree entirely so far. But what is interesting here is that the
current generation of top programs appears to have tuned the evaluation
function weights for self-consistency to maximise the efficiency of
alpha-beta cutoffs in the tree.
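
The quoted description of a static evaluation as a sum of weighted factors can be written out as a tiny sketch. The four weights are the ones named in the quoted text, converted to centipawns; the dictionary layout and the name `static_eval` are illustrative assumptions, not Crafty's or Shredder's actual evaluation code.

```python
from typing import Dict

# Weights in centipawns, taken from the example in the quoted text.
EXAMPLE_TERMS: Dict[str, int] = {
    "extra pawn": 100,
    "open file": 17,
    "king safety": 47,
    "opposing knight well placed": -13,
}


def static_eval(terms: Dict[str, int]) -> int:
    """Sum the individual factors into one score (centipawns)."""
    return sum(terms.values())


print(static_eval(EXAMPLE_TERMS) / 100)
```

The four named terms alone total 1.51, so in the quoted example the unnamed "this, that and the other" would have to contribute about -0.28 to arrive at 1.23. Only the first entry is hard currency; the rest are whatever the programmer or the tuning process settled on.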

none of it relates very closely to how a GM would assess that position.


Indeed, although to me Shredder feels closer to human assessment than
other engines (and it still struggles with endgame transitions, like
all of them).

by 0.5. The miracle of computer chess is that quite often the numbers
agree within 0.2 or so. But there is no useful, objective, value in


In a lot of cases they agree about the best line though even if they
score it differently.

Looking more carefully at that game
there were long sustained periods where Crafty was more than 100cp off
the mark and about 10 moves where it was more than 200cp out (and in
the middlegame). This doesn't bode well for its ability to score GM
level play.


What matters is not that [esp as basically the difference over
those moves was whether Crafty in those positions was totally and
utterly lost or merely utterly and totally lost, and the G&B scheme
would have stopped counting by then], but whether Crafty/Shredder
mis-assesses the correct move ordering, and if so by how much.


The best way I can think of to test this is on the key positions where
things went awry. The majority of positions where they pretty much
agree on the continuation line don't provide any discrimination.

[...] If Crafty12 is so rotten, it's been
amazingly lucky.

I don't think it is that rotten. Just that it misses a lot of the rare
but absolutely key GM moves and marks them down because it doesn't
understand them. It probably gets the 95% of moves that are routine
exactly right, but it is the handful of other moves that make all the
difference.


If I am reading the annotations correctly, then in the game
you gave, Shredder and Crafty each played 16 moves out of the 41 that
were sub-optimal according to Shredder. That doesn't seem a handful
of other moves to me, and it makes it seem unlikely that a GM would
have played nearly all the moves, routine or otherwise, the same way
as Crafty. [Of course, most of the 16+16 were "in the noise", but
it still suggests that with a 10cp noise level, there is a lot of
scope for Crafty/Shredder to make quite different moves from GMs.
Indeed, G&B's Fig.7 shows most WCs playing the same move as C12
about 50% of the time.]


I think that may not be as remarkable as it sounds. And in essence it
highlights one of the problems of having Crafty (or Fritz for that
matter) scoring GM level games. It will automatically penalise anyone
who raises or maintains the complexity of the position by keeping the
tension and does not swap off material when it is safe to do so. I
reckon that is why it scores Capablanca and Kramnik so highly - take a
look at fig 8.

According to the metrics line Crafty was at average search depth 12.2
and Shredder at 13.1 (but in 1/3 the time) during this test match.


OK, but this is not helping your thesis! Summarising, we
now have that C12 deviates by around 0.35 from Shredder, by between
0.10 and 0.15 from almost all WCs, and by between 0.06 and 0.09 from
recent strong computers in matches vs humans, despite an expected
0.1 or so error from random noise. Accepting that Shredder/Rybka/
etc are technically stronger than Crafty, this nevertheless suggests
that Crafty is doing very well at emulating these other players and
engines, and by inference at assessing how good they are.


I agree that the numbers do not seem to add up.

Can we define a set of rules then [...]


Something that might be interesting. There are books out
there with titles like "How Good is Your Chess" [and regular
articles in a number of magazines] where strong players have
annotated games with point scores ["Score 7 for Nxe5, 3 for Bg5,
-3 if you blundered by Re1, 1 for routine development by Nc3,
0 for anything else"]. We could set engines doing these tasks
at various rates, see how they score, see how they rate the
alternatives, and perhaps -- if someone would sink some time
and/or money into this -- get some GMs to comment both on the
original scoring and on the computer results.


OK. This sounds like an amusing idea. And not too onerous.
How about GM Daniel King's HGIYC piece from May's Chess magazine,
with fixed ply 1, ply 12 and 60s/move searches as the test conditions?
I expect that the intricacies of the Ragozin Defence will give some
engines a very serious headache.
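
The bookkeeping for such a test could be as simple as the sketch below. The points table is the illustrative one from the quoted text ("Score 7 for Nxe5, 3 for Bg5, ..."), not taken from any real article, and the engine choices per test condition are made-up placeholders, as are the names `score_choice` and `score_conditions`.

```python
from typing import Dict

# Points from the quoted example; any move not listed scores 0.
POINTS: Dict[str, int] = {"Nxe5": 7, "Bg5": 3, "Re1": -3, "Nc3": 1}

# Hypothetical engine choices for one position under each test condition.
CHOICES: Dict[str, str] = {
    "ply 1": "Nc3",
    "ply 12": "Bg5",
    "60s/move": "Nxe5",
}


def score_choice(move: str, points: Dict[str, int]) -> int:
    """Points awarded for one move; 'anything else' scores 0."""
    return points.get(move, 0)


def score_conditions(choices: Dict[str, str],
                     points: Dict[str, int]) -> Dict[str, int]:
    """Points earned by the engine's choice under each test condition."""
    return {cond: score_choice(move, points) for cond, move in choices.items()}


for condition, pts in score_conditions(CHOICES, POINTS).items():
    print(condition, pts)
```

Summing these per-position scores over a whole annotated game would give one total per engine and per search setting, which can then be set against the annotator's own scale.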

I still can't take centipawns seriously enough to want
to invest effort into tracking down 20cp discrepancies ....


Neither can I. But I am curious to identify the types of position
where choice of the right engine (or other program) is important for
analysing the position correctly. Engines can have very different
playing styles.

Regards,
Martin Brown

PS Goofgle dropped it on the floor again so perhaps I will be third
time lucky.
