In article . com,
Martin Brown wrote:
In positional terms, I trust my own judgement more than Fritz,
so I'm really using the computer only for blunder-checking. If
In that case it is certainly worth downloading and running something
like Fruit2.2.1 (evaluation free for 14days) as a kibitzer to see the
sort of things that you are missing. [...]
What sorts of things do you think I am missing that Fruit
[or any other strong engine] might show me? We are perhaps somewhat
at cross-purposes, in that I'm primarily interested [when entering
my own games] *only* in the stupid tactical things I've missed. If
Fritz doesn't show it in 10 seconds, then it wasn't that stupid after
all.
[...] Roughly Crafty19.19 takes 1-3mins to reach
12ply in this mode but in 60s Shredder10 typically reaches 15-16ply in
all but the most complex positions.
OK, but different engines mean different things by "12 ply".
[My own program would mean "anything from 6-ply upwards", as it has
variable depth-reduction, depending how "interesting" a move is, and
particularly boring moves count double; I know this is eccentric.]
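
To make the depth bookkeeping concrete, here is a minimal sketch of how a nominal "12 ply" can correspond to very different real depths. Only "boring moves count double" comes from the description above; the costs for normal and interesting moves, and every name in the code, are assumptions made for illustration, not my program's actual values.

```python
# Depth cost charged per move, in nominal plies.
# "boring" = 2.0 is the "count double" rule; the other two costs are assumed.
COST = {"boring": 2.0, "normal": 1.0, "interesting": 0.5}

def real_plies_searched(nominal_depth, line):
    """Return how many moves of `line` fit inside a nominal depth budget.

    `line` is a sequence of move classifications ("boring", "normal",
    "interesting") along one variation.
    """
    remaining = nominal_depth
    plies = 0
    for kind in line:
        cost = COST[kind]
        if cost > remaining:
            break  # budget exhausted: the search would stop here
        remaining -= cost
        plies += 1
    return plies

# A line of nothing but boring moves stops at 6 real plies under a
# nominal depth of 12; a line of normal moves gets the full 12.
all_boring = real_plies_searched(12, ["boring"] * 20)
all_normal = real_plies_searched(12, ["normal"] * 20)
```

So two engines both reporting "12 ply" need not have looked equally far along any given line.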
[...]
The problem here is that Crafty is frequently out by more than 50cp on
key variations and has been in all the GM games I have fed it so far.
Hang about! The *true* value of any position is "won",
"drawn" or "lost", so "out by 50cp" is meaningless except as an
evaluation against some amorphous scale of "slight advantage",
"definite advantage", "surely a winning advantage" and so on. If
Crafty is getting the *material* wrong, that's probably serious,
but otherwise 50cp simply means that Crafty has a different scale
for positional edges. It's not, of itself, right or wrong *unless*
it causes Crafty to lose a drawn position or lose/draw a won one.
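
A toy illustration of the point about scales, with entirely made-up numbers: the move names and centipawn scores below are invented, and the second "engine" is simply the first with its positional scale doubled. The two disagree by more than 50cp on the best move, yet they choose the same move.

```python
def best_move(evals):
    """Return the move with the highest score in a {move: centipawns} dict."""
    return max(evals, key=evals.get)

# Hypothetical scores for three candidate moves (invented for illustration).
engine_a = {"Nf3": 60, "d4": 50, "h4": -20}

# Same positional judgement, stretched onto a scale twice as wide.
engine_b = {move: 2 * cp for move, cp in engine_a.items()}

# Largest disagreement between the two engines, in centipawns.
max_gap = max(abs(engine_a[m] - engine_b[m]) for m in engine_a)

same_choice = best_move(engine_a) == best_move(engine_b)
```

Here `max_gap` is 60 while `same_choice` is true, so the gap on its own says nothing about which engine plays better.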
....
Admittedly the first two were engine showpieces but the second pair
were randomly chosen high level games. You can see it happen most
prominently in the longer game where it misses the crucial winning
line and systematically mis-scores a host of moves because it
doesn't understand what is going on.
... But does it miss the crucial winning line because it
has *tactical* shortcomings, or because it misunderstands how to
play positionally? "Missing a winning line" sounds more like the
former [or you might have said (eg) "misses the winning plan"].
How are you judging "systematically wrong"? Merely because a
strong engine gives different numbers, or because [eg] Crafty
gives the "wrong" number of centipawns to a positional feature?
There is no objective meaning to be attached to "White is 1.23
pawns ahead" other than "Rybka/Fruit/Crafty gives this as
its evaluation".
[...] I do think that a fair
proportion of the "errors" that the G&B analysis says the GMs have
made are in reality just the rms error of Crafty's evaluation which is
something like 30cp multiplied by the number of times they do
something that it doesn't expect.
Possibly; and we won't know unless/until someone does the
experiment. But in that case, the actual figures for most WCs of
around 13cp/move, and less for the WCs who most of us would regard
as the most "accurate" in their positional judgement and tactical
awareness, are surprisingly low. Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.
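
For anyone wanting to try the experiment, a per-move figure of this kind can be computed roughly as follows. This is a sketch under my own assumptions: I take the loss on a move to be the engine's score for its preferred move minus its score for the move played, floored at zero. I have not checked that this is exactly the procedure used in the G&B analysis, and the function name is mine.

```python
def mean_loss_per_move(pairs):
    """Average centipawn loss per move.

    `pairs` is a list of (best_cp, played_cp) tuples: the engine's score
    for its own preferred move and for the move actually played, both in
    centipawns from the mover's point of view.
    """
    if not pairs:
        raise ValueError("need at least one move")
    # A played move the engine scores higher than its own choice counts as zero loss.
    losses = [max(0, best - played) for best, played in pairs]
    return sum(losses) / len(losses)

# Three moves: two match the engine's choice, one drops 39cp.
example = mean_loss_per_move([(30, 30), (50, 11), (0, 0)])
```

In this example `example` comes out at 13.0. Running the same calculation with a second engine over the same games would show how much of the figure is the player and how much is the evaluator.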
It is time to turn the question around slightly. Can anyone find a GM
level game where Crafty at 12ply avoids missing important winning
lines and obtains reasonable blundercheck agreement to within say 20cp
against any other top rated engine run for 60s/move? So far all the
games I have tested have shown serious discrepancies (50cp).
I don't think this is an interesting question *unless* we can
produce an objective meaning [beyond Crafty/Rybka] of 20cp.
--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.