Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)
On May 14, 7:37 pm, (Dr A. N. Walker) wrote:
In article . com,
Martin Brown wrote:
In positional terms, I trust my own judgement more than Fritz,
so I'm really using the computer only for blunder-checking. If
In that case it is certainly worth downloading and running something
like Fruit2.2.1 (evaluation free for 14days) as a kibitzer to see the
sort of things that you are missing. [...]
What sorts of things do you think I am missing that Fruit
[or any other strong engine] might show me? We are perhaps somewhat
I tend to like having a faster engine analyse the game at a deeper
level. On paper there is little to choose between Fritz & Shredder but
in reality they give a very different view of some positions where the
strongest move is not a capture.
at cross-purposes, in that I'm primarily interested [when entering
my own games] *only* in the stupid tactical things I've missed. If
Fritz doesn't show it in 10 seconds, then it wasn't that stupid after
all.
Or perhaps it was, but Fritz is blinded to it by a tempting looking
swapoff line. The sorts of things that it will predictably miss are
the passive looking single moves that set up a future forcing
combination or longer term positional advantage. The one that Phil
Innes challenged me with earlier in the thread being a canonical
example. The key move there Nh4 aiming for a weak spot on g6 where it
becomes a major thorn in black's side is beyond hope of Fritz ever
seeing it. Rybka & Shredder both find it quickly and Fruit managed it
after half an hour just before I lost patience with it.
[...] Roughly Crafty19.19 takes 1-3mins to reach
12ply in this mode but in 60s Shredder10 typically reaches 15-16ply in
all but the most complex positions.
OK, but different engines mean different things by "12 ply".
I agree that ply has a somewhat random meaning, but here we are
talking of a notional full depth search with pruning to at least that
depth with various extensions. It is in the choice of lines to extend
that different engines can give radically different results. Fritz &
Shredder/Rybka are poles apart. Rybka doesn't show how far search
extensions go.
[My own program would mean "anything from 6-ply upwards", as it has
variable depth-reduction, depending how "interesting" a move is, and
particularly boring moves count double; I know this is eccentric.]
I grant you that in the notional evaluation displayed in blundercheck as
n.nn/MM, the depth MM is decidedly +- 1 or 2. Presumably some miscounting
of singular extensions or cache lookups.
The problem here is that Crafty is frequently out by more than 50cp on
key variations and has been in all the GM games I have fed it so far.
Hang about! The *true* value of any position is "won",
"drawn" or "lost", so "out by 50cp" is meaningless except as an
evaluation against some amorphous scale of "slight advantage",
"definite advantage", "surely a winning advantage" and so on. If
Crafty is getting the *material* wrong, that's probably serious,
but otherwise 50cp simply means that Crafty has a different scale
for positional edges. It's not, of itself, right or wrong *unless*
it causes Crafty to lose a drawn position or lose/draw a won one.
If you run an engine-engine match with the stronger engine penalised
on time to give Crafty a chance, you can watch as the game unfolds.
Both sides claim to be winning for a while until one gets a deep
tactical edge over the other. I did one last night which illustrates
my point - annotated here by the victor at 60s/move.
[Event "AOI, Blitz:4'+2""]
[Site "East Rounton"]
[Date "2007.05.14"]
[Round "1"]
[White "Shredder 10"]
[Black "Crafty 19.01"]
[Result "1-0"]
[ECO "D25"]
[WhiteElo "9999"]
[BlackElo "9999"]
[Annotator "0.30;0.36"]
[PlyCount "83"]
[TimeControl "240+2"]
{Intel(R) Pentium(R) 4 CPU 3.00GHz 2992 MHz W=13.1 ply; 354kN/s
B=12.2 ply;
835kN/s; 2 TBAs} 1. d4 {Both last book move 0.30/16 12} Nf6 {0.36/12
27} 2. Nf3
{(Bf4) 0.27/15 11} d5 {0.32/12 26} 3. c4 {(e3) 0.27/16 11} dxc4 {
(e6) -0.26/12 25} 4. e3 {(Nc3) 0.34/14 20} b5 {(Bf5) -0.21/12 24} 5.
a4 {
0.79/14 13} c6 {-0.27/12 24} 6. axb5 {(Be2) 0.90/13 5} cxb5 {-0.11/13
23} 7.
Nc3 {0.72/14 17} Qb6 {(Bd7) -0.26/12 23} 8. b3 {(Ne5) 1.01/13 12} e6 {
(b4) -0.15/11 22} 9. bxc4 {last book move 0.81/13 15} b4 {(Bb4)
1.22/16 22} 10.
c5 {(Qa4+) 1.22/16 14} Qb7 {1.22/14 21} 11. Rb1 {1.37/15 8} Nc6
{1.22/15 21}
12. e4 {(Bc4) 1.22/13 13} a6 {(Be7) 1.51/14 20} 13. Bc4 {(Bf4) 1.33/14
22} Qc7
{1.33/14 19} 14. Ne2 {(e5) 1.17/16 7} Nxe4 {0.66/14 21} 15. O-O
{0.66/13 5} f5
{(Bb7) 1.79/15 19} ({Shredder 10:} 15... Bb7 16. Bf4 Qd7 17. Qb3 Be7
18. d5
exd5 19. Bxd5 Qf5 20. Bxc6+ Bxc6 21. Nfd4 Qxc5 22. Nxc6 Qxc6 23. Rfd1
{0.66/13}
) 16. Bf4 {1.65/13 11} Qd7 {2.01/15 24} 17. Bb3 {(Ne5) 1.77/13 6} Be7
{
(Qb7) 2.58/16 18} ({Shredder 10:} 17... Qa7 18. Ba4 Bd7 19. Qb3 Qb7
20. Bxc6
Qxc6 21. Ne5 Qb5 22. Nxd7 Qxd7 23. Qxb4 Kf7 24. Qc4 Be7 {1.77/13}) 18.
Ba4 {
2.58/14 5} Bf6 {2.58/16 21} 19. Rxb4 {2.58/14 6} Nxb4 {2.58/16 17} 20.
Ne5 {
2.58/18 7} Bxe5 {2.58/18 17} 21. Bxd7+ {2.58/18 4} Bxd7 {2.58/18 5}
22. Bxe5 {
2.58/16 6} O-O {2.58/16 16} 23. f3 {2.55/17 3} Nf6 {2.55/15 16} 24.
Nf4 {
(Qd2) 2.51/15 5} a5 {2.59/15 15} 25. Qb3 {2.58/14 5} Ra6 {(Rfe8)
2.87/15 15}
26. Kf2 {(Re1) 2.86/14 7} Nfd5 {(Kf7) 2.86/14 15} 27. Nxd5 {2.96/16 1}
exd5 {
2.97/16 14} 28. Ke3 {(Rc1) 2.96/14 3} Rg6 {(Bb5) 3.15/16 14} 29. Rg1 {
(g3) 2.96/15 3} Bb5 {2.98/14 14} 30. g4 {(Kd2) 2.63/16 4} Re8 {(f4+)
3.06/14 14
} 31. h3 {(Kd2) 2.93/14 4} Bc4 {(fxg4) 3.65/16 13} 32. Qa4 {3.62/14 2}
Nc6 {
5.63/19 15} ({Shredder 10:} 32... Rf8 33. Rb1 fxg4 34. hxg4 Nc6 35.
Rb7 Rc8 36.
Rb6 Kh8 37. f4 {3.62/14}) 33. gxf5 {5.63/17 2} Rxg1 {5.63/17 13} 34.
Qxc6 {
5.63/15 2} Re1+ {(Kf7) 7.00/17 13} 35. Kf4 {7.12/18 3} Kf7 {7.00/18
12} 36.
Qc7+ {(f6) 7.00/20 2} Re7 {7.00/18 37} 37. Qxa5 {7.00/16 2} Rc1 {
(Rg1) 9.36/17 12} ({Shredder 10:} 37... Rg1 38. c6 g5+ 39. fxg6+ Rxg6
40. c7
Rc6 41. Qa7 Ba6 42. Qa8 Ree6 43. Qd8 h6 {7.00/16}) 38. Qd8 {9.41/16 2}
Re8 {
(Bb5) 16.78/17 22} ({Shredder 10:} 38... Rg1 39. Bd6 Rge1 40. Bxe7
Rxe7 41. c6
Re8 42. c7 Ba6 43. Qxd5+ Kf8 44. Qd6+ Re7 45. f6 gxf6 46. Qd8+ Kf7
{9.41/16})
39. Qg5 {16.78/16 2} Rxe5 {17.52/16 10} 40. dxe5 {19.80/16 3} d4
{21.73/16 37}
41. Qd8 {20.98/14 3} h6 {#151/15 15} ({Shredder 10:} 41... h5 42. Qd7+
Kf8 43.
c6 Bf7 44. e6 Bxe6 45. fxe6 Kg8 46. e7 g5+ 47. Kxg5 Re1 48. Qe8+
{20.98/14})
42. Qd7+ {#150/19 2} 1-0
Both engines playing with no opening book. Shredder10 has found a
plausible Queens Gambit sideline [D25] ab initio out to move 9. The
evaluation looks like a picket fence for large parts of the game.
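
The engine comments in the game score above look like {(Bf4) 0.27/15 11}. A minimal sketch for pulling the numbers out so the picket fence can be tabulated or plotted, assuming the fields are an optional expected move in brackets, the evaluation in pawns, the search depth, and the seconds used. That field layout is my reading of the GUI output, and the names COMMENT and parse_comment are made up for illustration.

```python
import re

# Assumed layout of one engine comment: optional "(expected move)", then
# "eval/depth", then seconds used. Mate scores such as "#151" are flagged
# rather than converted to a number.
COMMENT = re.compile(
    r"(?:\((?P<expected>[^)]+)\)\s*)?"
    r"(?P<eval>#?-?\d+(?:\.\d+)?)/(?P<depth>\d+)"
    r"(?:\s+(?P<secs>\d+))?"
)


def parse_comment(text):
    """Return a dict for one engine comment, or None if nothing matches."""
    # Collapse line wraps first, since the posted score breaks comments
    # across lines.
    m = COMMENT.search(" ".join(text.split()))
    if m is None:
        return None
    raw = m.group("eval")
    is_mate = raw.startswith("#")
    return {
        "expected": m.group("expected"),
        "mate": is_mate,
        "pawns": None if is_mate else float(raw),
        "depth": int(m.group("depth")),
        "secs": int(m.group("secs")) if m.group("secs") else None,
    }
```

Feeding it each comment in turn gives one record per half-move. In this game White's comments come from Shredder and Black's from Crafty, so the two series need to be kept apart.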
Admittedly the first two were engine showpieces but the second pair
were randomly chosen high level games. You can see it happen most
prominently in the longer game where it misses the crucial winning
line and mis-scores a host of moves systematically wrong because it
doesn't understand what is going on.
... But does it miss the crucial winning line because it
has *tactical* shortcomings, or because it misunderstands how to
play positionally? "Missing a winning line" sounds more like the
former [or you might have said (eg) "misses the winning plan"].
How are you judging "systematically wrong"?
Incapable of seeing deep enough to catch significant move refutations,
or in some cases unable to see them at all no matter how much time it
is given.
There is no objective meaning to be attached to "White is 1.23
centipawns ahead" other than "Rybka/Fruit/Crafty gives this as
its evaluation".
I reckon the rms noise on most lines is always around 10cp no matter
how deep you go. A few quiet lines may have smaller rms errors, but
the active ones tend to bounce around a bit. But that is still enough
to have some confidence in finding gross evaluation errors of 50cp or
more (which is what Crafty at 12 ply does).
[...] I do think that a fair
proportion of the "errors" that the G&B analysis says the GMs have
made are in reality just the rms error of Crafty's evaluation which is
something like 30cp multiplied by the number of times they do
something that it doesn't expect.
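
To put a number on that, here is a rough sketch of the comparison I have in mind, assuming you already have two engines' evaluations (in centipawns, from White's point of view) for the same list of positions. The function names and the 50cp default are illustrative only; the threshold is simply the figure quoted above.

```python
import math


def rms_difference(evals_a, evals_b):
    """Root-mean-square difference between two engines' evaluations (cp)."""
    if len(evals_a) != len(evals_b) or not evals_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum((a - b) ** 2 for a, b in zip(evals_a, evals_b))
    return math.sqrt(total / len(evals_a))


def gross_discrepancies(evals_a, evals_b, threshold_cp=50):
    """Return (index, eval_a, eval_b) wherever the engines differ by
    at least threshold_cp centipawns."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(evals_a, evals_b))
        if abs(a - b) >= threshold_cp
    ]
```

If the rms figure between two strong engines really is around 10cp, then a position flagged by gross_discrepancies at 50cp is well outside the noise and worth looking at by hand.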
Possibly; and we won't know unless/until someone does the
experiment. But in that case, the actual figures for most WCs of
around 13cp/move, and less for the WCs who most of us would regard
as the most "accurate" in their positional judgement and tactical
awareness, are surprisingly low. Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.
But they may well differ even less from the output of a stronger
engine. Certainly of the GM games I have tried Crafty12ply on it has
seen "improvements" that stronger engines at deeper ply levels can
easily refute.
If the GM makes the move Crafty expects it doesn't matter how wrong
the evaluation is. The identity X + (-X) = 0 * is very helpful. It is
only when the GM makes a different move that evaluation errors hurt
the scoring.
* Thanks to USPO Xerox have a patent on this blindingly obvious
identity as applied to JPEG decoding.
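
A minimal sketch of why that is so, assuming the score for a move is the engine's evaluation of its own best move minus its evaluation of the move actually played. I am not claiming this is exactly how the G&B analysis computes it, and clamping negative losses to zero is my own assumption. All names are hypothetical.

```python
def move_loss_cp(best_move, best_eval_cp, played_move, played_eval_cp):
    """Centipawn loss charged to the player for a single move."""
    if played_move == best_move:
        # Same move, same evaluation: X + (-X) = 0, so any error in the
        # engine's evaluation cancels out and nothing is charged.
        return 0
    # Assumption: a played move the engine scores higher than its own
    # choice is not rewarded.
    return max(0, best_eval_cp - played_eval_cp)


def mean_loss_cp(records):
    """Average loss over (best_move, best_eval, played_move, played_eval)."""
    if not records:
        raise ValueError("no moves to score")
    return sum(move_loss_cp(*r) for r in records) / len(records)
```

Only the second branch ever sees the engine's evaluation error, which is why a player who keeps choosing moves the engine does not expect collects the noise as "mistakes".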
It is time to turn the question around slightly. Can anyone find a GM
level game where Crafty at 12ply avoids missing important winning
lines and obtains reasonable blundercheck agreement to within say 20cp
against any other top rated engine run for 60s/move? So far all the
games I have tested have shown serious discrepancies (50cp).
I don't think this is an interesting question *unless* we can
produce an objective meaning [beyond Crafty/Rybka] of 20cp.
Annotating a few more GM games with both Crafty12ply, your favourite
engine 12ply, and your favourite engine 60s/move would go a long way
to settling the dispute of whether or not Crafty12ply scoring is
adequate. I have tried the experiment and so far found Crafty-12ply
wanting. YMMV
I think it is worth trying to agree a test protocol that could be used
to produce say 100 top level games consistently annotated by multiple
engines. Then we might be able to get some half decent stats. Hunches
really don't cut it.
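
As a starting point for such a protocol, a sketch of one statistic it could report: for every pair of engines, the fraction of positions on which their evaluations agree to within a tolerance. It assumes each engine has evaluated the same positions in the same order, in centipawns. The function name and the 20cp default (taken from my question above) are illustrative, not an agreed standard.

```python
from itertools import combinations


def agreement_rates(evals_by_engine, tolerance_cp=20):
    """Map each engine pair to the fraction of positions on which the two
    evaluations differ by at most tolerance_cp centipawns."""
    rates = {}
    for a, b in combinations(sorted(evals_by_engine), 2):
        xs, ys = evals_by_engine[a], evals_by_engine[b]
        if len(xs) != len(ys) or not xs:
            raise ValueError("engines must score the same non-empty position list")
        agree = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= tolerance_cp)
        rates[(a, b)] = agree / len(xs)
    return rates
```

Run over 100 games with, say, Crafty at 12 ply and two other engines at 60s/move, this would show directly whether Crafty is the odd one out or whether all engine pairs disagree about as often.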
Regards,
Martin Brown