May 16th 07, 10:18 AM, posted to rec.games.chess.misc, rec.games.chess.computer
Martin Brown
Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)

On May 15, 7:00 pm, (Dr A. N. Walker) wrote:
In article .com,
Martin Brown wrote:

[...] The one that Phil
Innes challenged me with earlier in the thread being a canonical
example. The key move there Nh4 aiming for a weak spot on g6 where it
becomes a major thorn in black's side is beyond hope of Fritz ever
seeing it. Rybka & Shredder both find it quickly [...]


Yes, but I don't *need* Fritz to see it -- I need Fritz
to confirm to me that Nh4 isn't getting trapped by ... g5 and/or
that it wasn't doing something important relative to d4/e5 [not,
of course, difficult in this particular position, but arguing
much more generally]. That's why I think we are somewhat at
cross purposes.


Perhaps, but I still think that you might find trying another engine
instead of Fritz an interesting experiment (and if you have a dual-core
CPU you can run a second engine effectively for free as long as
you don't want to do anything else at the same time). Fruit 2.2.1 is
free for the first 14 days' trial period and it gives you one
alternative view of the game.

You see Rybka/whatever as a strong player going
through your game and pointing out the best moves;


Yes. I am having the engine look for deep tactical themes that I would
like to be able to recognise and find in the future. Most of the time
the engine agrees with me (and I am not a GM) but when it sees
something I or my opponent missed completely that is interesting and
worth a second look. Blundercheck is my preferred method. "Analysis"
is too verbose and not so informative.

I see Fritz
as a slightly annoying spectator saying [Harry Enfield voice:]
"You can't do that, you've just dropped a piece" [/HE] as I look
at the things I or my opponent might have done.


OK

If you run an engine-vs-engine match with the stronger engine penalised
on time to give Crafty a chance, you can watch as the game unfolds.
Both sides claim to be winning for a while until one gets a deep
tactical edge over the other.


Sure. But you surely aren't claiming that Crafty is
so stupid that it thinks doubled pawns are good and centralised
pieces are bad? So I'm guessing that when both sides think they
are winning [by something significant, not by 20cp or so], one
of them has overlooked something of tactical importance, which
is why, after a bit, it turns into a tactical win.


It started from the opening in this game: Shredder +0.90, Crafty -0.27,
peaking at +1.40 vs -0.10, then converging a bit until the fateful
17. ... Be7 (2.77 vs 0.6). For a while the scores agreed, before
again diverging to 1.24 vs 3.27. It was pretty clear in this game that
Crafty simply did not know which way was up!

For the sake of balance: a best-of-3 engine match at this time
penalty ended with 1 win, 1 draw and 1 loss for each engine.

BTW is there a way to get the graphical display of time taken and
engine score shown in the ChessBase window, or does the game have to be
put into the playing window to see that info?

I did one last night which illustrates
my point - annotated here by the victor at 60s/move.
[Event "AOI, Blitz:4'+2""] [...]


OK, so around 8-second chess, and an amusing crunch.
It looks as though Crafty-sans-book has not the foggiest idea
about developing and getting castled, and is also tactically
unaware. But not relevant to the present debate!


Remember that this is Crafty working at roughly the same search depth
setting as was being used to judge the play of world champion chess
players. It may be a bit unfair to make it play the opening (where its
performance is very poor).

[There are some interesting discrepancies between
evaluations on successive ply in the annotations, but these
too are not that relevant, unless we find that Shredder is
much more or less prone to these than other engines.]


A lot of engines have some move parity issues depending on turn to
move.

I reckon the rms noise on most lines is always around 10cp no matter
how deep you go. A few quiet lines may have smaller rms errors, but
the active ones tend to bounce around a bit.


[In which case Capa/Kramnik's 10cp difference per move is
startlingly good ....]


I think that is probably because he tended to play down the lines with
intrinsically small rms errors, whereas some of the wilder, more
exciting players go for positions where Crafty practically grinds to
a standstill at ply 12 whilst it tries to figure out all the
complications (and fails).

But that is still enough
to have some confidence in finding gross evaluation errors of 50cp or
more (which is what Crafty at 12 ply does).


Yes, but you still seem to be missing something. 100cp is a
pawn, and you can understand that very directly. 50cp is what? It
will matter if at some point we swap a 50cp advantage for a pawn-up
with 50cp compensation, but until then it's an arbitrary measure.


I was using that as an example. Looking more carefully at that game
there were long sustained periods where Crafty was more than 100cp off
the mark and about 10 moves where it was more than 200cp out (and in
the middlegame). This doesn't bode well for its ability to score GM
level play.
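
To show the kind of tally I mean, here is a minimal sketch in Python.
It assumes you have already exported one score per move from each
engine, in centipawns and from the same side's point of view. The
function name and the numbers are made up for illustration; they are
not the scores from the game above.

```python
def count_large_disagreements(scores_a, scores_b, thresholds=(50, 100, 200)):
    """Count moves where two engines' scores differ by more than each threshold.

    Both lists hold centipawn scores for the same moves, seen from the
    same side (an assumption: engines often report from the mover's side,
    so convert first if needed).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same moves")
    gaps = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    return {t: sum(1 for g in gaps if g > t) for t in thresholds}


# Hypothetical per-move scores in centipawns, for illustration only.
engine_a = [30, 45, 80, 150, 260, 310]
engine_b = [25, -20, 10, 20, 40, 290]
print(count_large_disagreements(engine_a, engine_b))
# prints {50: 4, 100: 2, 200: 1}
```

The thresholds are just the 50cp, 100cp and 200cp figures discussed
here; anything else can be passed in.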

[...] Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.

But they may well differ even less from the output of a stronger
engine.


Possibly. But if G&B's table 3 is showing anything at all
objective about Crafty, it is that Crafty12 plays "rather like" all
the WCs except perhaps Steinitz, and much more like Capablanca and
Kramnik than like other WCs. If Crafty12 is so rotten, it's been
amazingly lucky.


I don't think it is that rotten. Just that it misses a lot of the rare
but absolutely key GM moves and marks them down because it doesn't
understand them. It probably gets the 95% of moves that are routine
exactly right, but it is the handful of other moves that make all the
difference.

After all, in the game you showed above, Crafty10
[assuming that's roughly what it was managing in 8s] deviates by
around 35cp/move from Shredder by G&B rules. So either there's a
*huge* improvement between C10 and C12 [and C12 would agree almost


According to the metrics line Crafty was at average search depth 12.2
and Shredder at 13.1 (but in 1/3 the time) during this test match.

exactly with Shredder] or else C12 is not only strong enough to
assess WC play, but is actually closer than you might expect to
emulating it.


One other worrying thing about Crafty is that its accuracy does not
improve with increased ply at anything like the rate of other modern
engines. Its search seems to get stuck in the same local optimum ruts.
A problem it shares with Fritz.

I think it is worth trying to agree a test protocol that could be used
to produce say 100 top level games consistently annotated by multiple
engines. Then we might be able to get some half decent stats. Hunches
really don't cut it.


Absolutely. But I don't think the stats will mean what you
seem to think they mean.


Can we define a set of rules, then, and let's see if enough volunteers
can be mustered to get an agreed set of games from a recent match
analysed with multiple engines. We need to set the cutoff for
annotation. I normally use 8cp to avoid getting meaningless dross,
but for this purpose, and to make the test as close as possible to the
G&B protocol, I guess we try either zero or, if that isn't allowed, a
1cp window. Let's see if we can get a dataset first...
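
To make the cutoff question concrete, here is a rough sketch in Python
of how the per-move statistic might be computed once the data exists.
It assumes the figure of interest is the mean gap between the score of
the engine's preferred move and the score of the move actually played,
and that gaps smaller than the cutoff are counted as zero. Both are my
assumptions about the protocol, not something G&B specify here, and the
function name and numbers are invented for illustration.

```python
def mean_loss_per_move(best_scores, played_scores, cutoff_cp=0):
    """Mean centipawn gap between the engine's best move and the played move.

    Gaps smaller than cutoff_cp are treated as zero (assumed handling of
    the annotation cutoff). A played move scored above the engine's own
    choice counts as zero loss.
    """
    if len(best_scores) != len(played_scores):
        raise ValueError("score lists must cover the same moves")
    if not best_scores:
        raise ValueError("need at least one move")
    losses = []
    for best, played in zip(best_scores, played_scores):
        loss = max(0, best - played)
        if loss < cutoff_cp:
            loss = 0
        losses.append(loss)
    return sum(losses) / len(losses)


# Hypothetical scores in centipawns for four moves, for illustration only.
best = [20, 35, 10, 50]
played = [20, 30, 10, -10]
print(mean_loss_per_move(best, played))               # prints 16.25
print(mean_loss_per_move(best, played, cutoff_cp=8))  # prints 15.0
```

With a zero cutoff every small gap counts; with my usual 8cp the 5cp
gap drops out, which is why the choice of window needs agreeing before
anyone starts annotating.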

Regards,
Martin Brown
