A Chess forum. ChessBanter


Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)



 
 
  #221  
Old May 16th 07, 11:59 AM posted to rec.games.chess.misc,rec.games.chess.computer
Dr A. N. Walker
external usenet poster
 
Posts: 96

In article . com,
help bot wrote:
[...]
Suppose Crafty_12 ply
penalizes either move as inferior to the other --
how does this style issue contribute to ranking
the world champs *accurately*?

... In which case this position is outside the [-2..2]
range, and is discarded by the G&B methodology, really for exactly
the reasons you gave. So Crafty12 would not penalise your move.

Once again, you have demonstrated a complete, utter
inability to read my comments *in context*.


Bit prickly aren't we? ...

Look back at my original post. I was (obviously) replying
to this comment by Ray Lopez: [...]


... Indeed you were. But then you asked a very specific
question about Crafty12 and the ranking of WCs, which can only
["in context"] relate to the work of Guid and Bratko that is the
initial topic of this thread. I gave you the answer: G&B did
consider the situation you describe, and took steps to ensure
that it did not bias their results.
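
The range filter mentioned above can be pictured with a minimal Python sketch. It assumes each position carries two evaluations in pawns (the engine's preferred move and the move actually played) and that a position is dropped when either one lies outside [-2, 2]. The function name, the record layout and that exact rule are assumptions made for illustration, not details taken from the G&B paper.

```python
# Hypothetical sketch of a range filter; this is not G&B's actual code.
# Each record is (best_eval, played_eval) in pawns, from the mover's point of view.

def mean_loss(records, lo=-2.0, hi=2.0):
    """Average (best - played) over positions whose evaluations stay inside [lo, hi]."""
    kept = [
        best - played
        for best, played in records
        if lo <= best <= hi and lo <= played <= hi
    ]
    return sum(kept) / len(kept) if kept else 0.0


records = [
    (0.30, 0.30),  # played the engine's first choice: no loss
    (0.50, 0.25),  # slightly inferior move: loss of 0.25
    (3.10, 2.60),  # game already decided: outside the range, ignored
]
print(mean_loss(records))  # averaged over the two in-range positions only
```

With a filter of this kind, a choice between two winning moves in an already decided position contributes nothing to a player's average, whichever move the engine happens to prefer.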

As for the G&B methodology, it was never described
in any detail in any of the articles which I read by
following the links earlier in this thread. Clearly, if I had
wished to skewer their "methodology", I would probably
want to know what it was. But having already learned
that the reason for the sloppiness was a shortage of
time and a complete disregard for quality work, I have
no interest in further details regarding the authors'
methodology.


Yet you are willing to "skewer" their methodology to the
extent of "sloppiness" and "complete disregard for quality work"?
Even though most, if not all, of the criticisms in this thread
are addressed by the authors in a peer-reviewed paper? And if
you have "no interest in further details", why did you ask about
them in relation to your above question [and then take umbrage at
my answer to it]?

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.

  #222  
Old May 16th 07, 02:50 PM posted to rec.games.chess.misc,rec.games.chess.computer
Dr A. N. Walker
external usenet poster
 
Posts: 96

In article .com,
Martin Brown wrote:
[...] So I'm guessing that when both sides think they
are winning [by something significant, not by 20cp or so], one
of them has overlooked something of tactical importance, which
is why, after a bit, it turns into a tactical win.

It started from the opening in this game. Shredder +0.90 Crafty -0.27
peaking at +1.40 vs -0.10 then converging a bit until the fateful
17. ... Be7 2.77 vs 0.6 then for a while the scores agreed before
again diverging to 1.24, 3.27.


Ah. Perhaps I have a misunderstanding about what you did
or said? I understood that you had played some fast[ish] games
between Shredder and Crafty, and then passed on to us Shredder's
analysis [at much longer time limits] of the game? So that, for
example, the analysis would have been/looked the same even if the
game had been a GM encounter at classical time limits or you vs
me at 5-min chess, or any other source? So were the black scores
not *Shredder's* evaluations rather than *Crafty's*? Otherwise, I
find the exact agreements, eg at 1.22 and later at 2.58, for several
moves very suspicious. And if so, then this is not an example of
both *sides* thinking they were winning, but rather of *Shredder*
thinking both sides were winning? [As you mentioned later, there
are parity problems in many engines that cause this, esp in gambits,
but if Shredder is particularly prone to it, it doesn't help its
case to be a reliable annotator.]

It was pretty clear in this game that
Crafty simply did not know which way was up!


Absolutely. Computers seem to be prone to that sort of
game, though. Once they don't understand a position, they tend
to go *really* pear-shaped.

For the sake of balance: a best-of-3 engine match at this time
penalty ended 1 win, 1 draw, 1 loss for each engine.


Don Beal used to say that you need matches of 100+ games
to find which engine is better -- he had cases where one side
was losing 17-0 or thereabouts but hauled back to win [and this
in the days before "learning"].
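
To see why a handful of games settles so little, here is a rough Python sketch that puts a normal-approximation interval around the average score per game. The function name is hypothetical, the 100-game result is made up, and the approximation is crude for very short matches, so the figures are indicative only.

```python
import math


def score_interval(wins, draws, losses, z=1.96):
    """Rough 95% interval for the expected score per game (win=1, draw=0.5, loss=0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the observed mean score.
    variance = (
        wins * (1.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
        + losses * (0.0 - mean) ** 2
    ) / n
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


print(score_interval(1, 1, 1))     # the 3-game match mentioned in the thread
print(score_interval(40, 30, 30))  # a made-up 100-game result
```

Under these assumptions the 3-game interval covers almost the whole range from 0 to 1, and even the made-up 55% score over 100 games still has an interval that includes 0.5.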

BTW is there a way to get the graphical display of time taken and
engine score shown in Chessbase window or does the game have to be put
into the playing window to see that info?


Pass. I've never entered games with that info in the
first place.

Remember that is Crafty working at roughly the same search depth
setting as was being used to judge the play of world champion chess
players. It may be a bit unfair to make it play the opening (where its
performance is very poor).


Ah. I assumed 8s/move wouldn't be enough to reach the
depth used by G&B [roughly 6h/game on 2.5GHz machines] ....

But that is still enough
to have some confidence in finding gross evaluation errors of 50cp or
more (which is what Crafty at 12 ply does).

Yes, but you still seem to be missing something. 100cp is a
pawn, and you can understand that very directly. 50cp is what? It
will matter if at some point we swap a 50cp advantage for a pawn-up
with 50cp compensation, but until then it's an arbitrary measure.

I was using that as an example.


Yes, but you still seem not to have understood! Look, suppose
some engine gives 1.23 as its evaluation. That means that somewhere
down the tree there is a position, reached by "best play" as far as
the current collection of static evaluations goes, which has a static
evaluation of 1.23. *That* evaluation is a sum of various factors --
+1.00 because we have an extra pawn, +0.17 because we control an open
file, +0.47 because of king safety, -0.13 because the opposing knight
is well-placed, +- this, that and the other, possibly including all
manner of complexity and joint factors, etc. Only the extra pawn is
"gold standard" currency. Everything else is there either because
BobH or some other programmer has decreed that an open file is worth
0.17 or because a "learning" program has currently settled on that
as the value. None of it is reliable [else we wouldn't need the tree
search at all], none of it seems to matter very much [or changing the
0.17 to 0.16 would dramatically change the strength of the program],
none of it relates very closely to how a GM would assess that position.
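
The bookkeeping in that example can be written out as a short Python sketch. The four values are the ones quoted above; the dictionary keys are just labels, and the remainder is simply whatever the unnamed "this, that and the other" terms would have to add up to for the total to come out at 1.23.

```python
# Illustrative only: the numbers come from the post, the labels are made up.
terms = {
    "extra pawn": 1.00,
    "open file": 0.17,
    "king safety": 0.47,
    "opposing knight well placed": -0.13,
}

named = sum(terms.values())   # contribution of the four named factors
reported = 1.23               # the evaluation the engine displays
other = reported - named      # what all the unnamed terms must sum to

print(f"named terms: {named:+.2f}, everything else: {other:+.2f}")
```

So the four named factors already come to about +1.51, and the unnamed terms would have to net out at roughly -0.28; only the +1.00 for the pawn corresponds to anything directly countable on the board.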

The only merit of this whole scheme is that pragmatically it
works. You may recall the Beal&Smith result that a completely random
static evaluation works surprisingly well. So when you say that
Shredder and Crafty have a discrepancy of [eg] 50cp in some position,
what you mean is that Shredder has looked at many millions of lines,
99.999+% of which are utter rubbish by *any* standards, picked on one
line as best for both sides, chosen the leaf position in that line,
pulled a number more-or-less out of a hat for that position; that
Crafty has done the same [but almost certainly chosen different lines
as best in most of the positions]; and that the two numbers differ
by 0.5. The miracle of computer chess is that quite often the numbers
agree within 0.2 or so. But there is no useful, objective, value in
those numbers. Indeed, we already know that an evaluation of 1.23 is
wrong by either 1.23 [if the position is actually drawn] or by worse
than that if the position is won/lost and the "machine infinity" for
won positions is greater than 2.46. Go figure.
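
The arithmetic behind that last remark can be checked with a small Python sketch. It assumes the "true" value of a position is 0 for a draw and plus or minus some "machine infinity" for a win or loss; the function name and the sample infinities are hypothetical.

```python
def eval_error(score, true_result, infinity):
    """Distance between a displayed score and the game-theoretic value.

    true_result is +1 (won), 0 (drawn) or -1 (lost) for the side being scored;
    infinity is whatever number the engine uses for a forced win.
    """
    true_value = true_result * infinity
    return abs(score - true_value)


print(eval_error(1.23, 0, 10.0))   # drawn position: off by the whole 1.23
print(eval_error(1.23, 1, 2.47))   # won, infinity just above 2.46: off by more than 1.23
print(eval_error(1.23, 1, 2.45))   # won, infinity just below 2.46: off by less than 1.23
print(eval_error(1.23, -1, 10.0))  # lost position: off by infinity plus 1.23
```

The threshold of 2.46 is just twice 1.23: for a won position the error is infinity minus 1.23, which exceeds 1.23 exactly when infinity exceeds 2.46.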

Looking more carefully at that game
there were long sustained periods where Crafty was more than 100cp off
the mark and about 10 moves where it was more than 200cp out (and in
the middlegame). This doesn't bode well for its ability to score GM
level play.


What matters is not that [esp as basically the difference over
those moves was whether Crafty in those positions was totally and
utterly lost or merely utterly and totally lost, and the G&B scheme
would have stopped counting by then], but whether Crafty/Shredder
mis-assesses the correct move ordering, and if so by how much.

[...] If Crafty12 is so rotten, it's been
amazingly lucky.

I don't think it is that rotten. Just that it misses a lot of the rare
but absolutely key GM moves and marks them down because it doesn't
understand them. It probably gets the 95% of the routine moves exactly
right, but it is the handful of other moves that make all the
difference.


If I am reading the annotations correctly, then in the game
you gave, Shredder and Crafty each played 16 moves out of the 41 that
were sub-optimal according to Shredder. That doesn't seem a handful
of other moves to me, and it makes it seem unlikely that a GM would
have played nearly all the moves, routine or otherwise, the same way
as Crafty. [Of course, most of the 16+16 were "in the noise", but
it still suggests that with a 10cp noise level, there is a lot of
scope for Crafty/Shredder to make quite different moves from GMs.
Indeed, G&B's Fig.7 shows most WCs playing the same move as C12
about 50% of the time.]

According to the metrics line Crafty was at average search depth 12.2
and Shredder at 13.1 (but in 1/3 the time) during this test match.


OK, but this is not helping your thesis! Summarising, we
now have that C12 deviates by around 0.35 from Shredder, by between
0.10 and 0.15 from almost all WCs, and by between 0.06 and 0.09 from
recent strong computers in matches vs humans, despite an expected
0.1 or so error from random noise. Accepting that Shredder/Rybka/
etc are technically stronger than Crafty, this nevertheless suggests
that Crafty is doing very well at emulating these other players and
engines, and by inference at assessing how good they are.

Can we define a set of rules then [...]


Something that might be interesting. There are books out
there with titles like "How Good is Your Chess" [and regular
articles in a number of magazines] where strong players have
annotated games with point scores ["Score 7 for Nxe5, 3 for Bg5,
-3 if you blundered by Re1, 1 for routine development by Nc3,
0 for anything else"]. We could set engines doing these tasks
at various rates, see how they score, see how they rate the
alternatives, and perhaps -- if someone would sink some time
and/or money into this -- get some GMs to comment both on the
original scoring and on the computer results.
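
As a minimal sketch of how such a test could be scored, the Python below uses the point values from the example in the paragraph above. The engine labels and the moves they "choose" are entirely hypothetical placeholders, not results from any real run.

```python
# Point table copied from the example above; any move not listed scores 0.
POINTS = {"Nxe5": 7, "Bg5": 3, "Re1": -3, "Nc3": 1}


def score_choice(move):
    """Points the annotator would award for a given move in this position."""
    return POINTS.get(move, 0)


# Hypothetical engine choices at two settings (placeholders for illustration).
engine_choices = {"fast setting": "Nc3", "slow setting": "Nxe5"}

for label, move in engine_choices.items():
    print(f"{label}: {move} scores {score_choice(move)}")
```

Summing these per-position scores over a whole annotated game would give each engine setting a total that can be compared directly with the book's scale for human readers.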

I still can't take centipawns seriously enough to want
to invest effort into tracking down 20cp discrepancies ....

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.

  #223  
Old May 17th 07, 07:22 AM posted to rec.games.chess.misc,rec.games.chess.computer
help bot
external usenet poster
 
Posts: 7,800

On May 16, 6:59 am, (Dr A. N. Walker) wrote:

Once again, you have demonstrated a complete, utter
inability to read my comments *in context*.


Bit prickly aren't we? ...


Nah. I just wanted to make it clear that you have
*repeatedly* misinterpreted words I have written by
taking them out of their proper context. Generally,
this indicates a psychological "need" to distort in
order to accommodate one's peculiar agenda or
to evade valid criticisms and divert the discussion
to some army of straw men.


Look back at my original post. I was (obviously) replying
to this comment by Ray Lopez: [...]


... Indeed you were. But then you asked a very specific
question about Crafty12 and the ranking of WCs, which can only
["in context"] relate to the work of Guid and Bratko that is the
initial topic of this thread.


Not in my opinion it isn't. When I entered this thread,
it consisted of a couple of links to articles which I later
downloaded and read in full. Now, what you are
suggesting is that those articles are not the subject of
this thread, but the original paper upon which these
"summaries" were based was, and that just ain't so.
You might just as well argue that the thread is about
my game at RedHotPawn -- because it was discussed
at some later point. Look back at the links.

So, why don't I go and find that paper and read it in
full like the summaries? Simple: it has already been
shown that a myriad of excuses substitute for any real
desire for *quality* work; hence the choice of a 12 ply
Crafty; and hence the moronic sample size in certain
cases like, say, GM Fischer. In sum, it looks like a
waste of my time (see below).


I gave you the answer: G&B did
consider the situation you describe, and took steps to ensure
that it did not bias their results.


Bully for them. Now, if only they had taken similar
steps regarding adequate sample size, choosing a
chess program of adequate strength, and of course,
allocating sufficient time for such a task as attempting
to decide which of the world champions was the best,
the strongest, the greatest, or even the most accurate.

The fact that we are back to discussing the "G&B"
end of things once again shows that you have missed
the point of what I was actually writing about; it had to
do with positional moves and tactical moves allegedly
being "one and the same thing", you should recall.



As for the G&B methodology, it was never described
in any detail in any of the articles which I read by
following the links earlier in this thread. Clearly, if I had
wished to skewer their "methodology", I would probably
want to know what it was. But having already learned
that the reason for the sloppiness was a shortage of
time and a complete disregard for quality work, I have
no interest in further details regarding the authors'
methodology.


Yet you are willing to "skewer" their methodology to the
extent of "sloppiness" and "complete disregard for quality work"?


Yes, I am. (As far as I can see, any bum off the
street could read their paper, copy their methods,
and by simply setting Crafty to *13 plys*, best their
results in terms of quality).


Even though most, if not all, of the criticisms in this thread
are addressed by the authors in a peer-reviewed paper?


Are their "peers" up to our standards here, I wonder?
I mean, do they know squat about chess? Do they
have the slightest inkling as to what it would take in
order to *accurately* rank the world chess champions
relative to one another? I seriously doubt it.

I recall reading in one of the links a long list of
criticisms, about half of which were unanswerable,
unless you count the list of pathetic excuses given
by a few apologists. I must admit, some of these
had not occurred to me, but they were far from
comprehensive in scope. In any case, I had more
than enough of my own criticisms.


And if you have "no interest in further details", why did you ask about
them in relation to your above question [and then take umbrage at
my answer to it]?


You're not making any logical sense here; I asked
nothing about their methodology; on the contrary, I
already think I have seen enough excuses regarding
them that I can reasonably conclude that the authors
made no serious attempt to accurately rank the world
champions.

For instance, a *serious* attempt might start off by
determining the proper sample size for such a project;
this step was obviously skipped (or worse, bungled).

Secondly, it is important in order to be fair, to not
take any single match against any single opponent, and
try to compare against someone else's results where
they both won and lost, plus faced a variety of
opposition. For instance, you can't fairly compare
Fischer-beating-Spassky to Botvinnik-vs.-all-comers,
because (gasp!) GM Fischer may have been more
accurate in that single match (since he won) than *any*
world champ was in any series of a win plus a loss.
If you do this, you are (quite absurdly) rewarding those
who, instead of letting nature take its course, bow out
in order to protect their record from acquiring any tarnish
over time.

Third, if indeed, there is a time issue resulting from
the large number of games, one could arbitrarily chop
GM Steinitz out of the running. How dare I suggest
such a thing? Look at Dr. Elo's rating lists; while GM
Steinitz was a giant figure in the history of chess, his
strength was clearly superseded by others, and if the
goal is to try and measure strength, accuracy, or any
other such aspect of the play, then we can safely rule
him out as the winner; already, such players as Paul
Morphy were excluded, so why not just one more?

Rather than worry ourselves about whether or not
others are going to whine that they cannot duplicate
our exact results, the first order of business should
be to get *meaningful* results ourselves.

For my money, I'll take the strongest chess program
in the world and if necessary, start off by eliminating
GM Steinitz and his predecessors to save time; then
I want each contender to have roughly the same
number of games in the test -- preferably a large
enough sample so that no single game will have much
of an effect on the final outcome. As others have
suggested, it is best to have each match scored
individually, so we can learn where the champions
were at their best and at their worst.

Even so, I am not entirely comfortable with the idea
that even a program rated 2900+ can *accurately* rank
the play of the world champions to the degree necessary.
I would feel more comfortable if the program had a
sizable lead over even the strongest of them, and if
it were known that this lead was not entirely due to its
Titanic *tactical superiority* over all humans.

The thing to remember is this: the match games of
the world champions are slowly increasing in number;
but at the same time, computers are gaining in both
speed and strength at a more rapid pace. No hurry --
do it right. For the sake of maximizing human interest
in the project, you could start off with GMs Fischer and
Tal and report the results as they come in. I like the
idea, but this is not something one can just "whip off",
like a Greco sac.

-- help bot





  #224  
Old May 17th 07, 12:00 PM posted to rec.games.chess.misc,rec.games.chess.computer
Martin Brown
external usenet poster
 
Posts: 616

On May 16, 2:50 pm, (Dr A. N. Walker) wrote:
In article .com,
Martin Brown wrote:

[...] So I'm guessing that when both sides think they
are winning [by something significant, not by 20cp or so], one
of them has overlooked something of tactical importance,


It started from the opening in this game. Shredder +0.90 Crafty -0.27
peaking at +1.40 vs -0.10 then converging a bit until the fateful
17. ... Be7 2.77 vs 0.6 then for a while the scores agreed before
again diverging to 1.24, 3.27.


Ah. Perhaps I have a misunderstanding about what you did
or said? I understood that you had played some fast[ish] games
between Shredder and Crafty, and then passed on to us Shredder's
analysis [at much longer time limits] of the game? So that, for


Yes. But the fastish game was calculated roughly to approximate an
average 12 ply lookup for Crafty and a 12ply lookup for Shredder (I
miscalculated the time penalty and Shredder got to 13+ ply at times in
this game).

example, the analysis would have been/looked the same even if the
game had been a GM encounter at classical time limits or you vs
me at 5-min chess, or any other source? So were the black scores
not *Shredder's* evaluations rather than *Crafty's*? Otherwise, I
find the exact agreements, eg at 1.22 and later at 2.58, for several
moves very suspicious. And if so, then this is not an example of


No. That is a quirk of annotating a game by working back from the
end position: the engine can reuse deep move evaluations cached in the
transposition table, provided that the game follows a reasonably
strong, well-explored line according to the engine's own evaluation.
The same happens in GM games, since the cache often contains useful info.
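
The cache effect can be illustrated with a toy Python sketch. It is not how Shredder or any real engine is implemented: the "game" is a simple take-1-to-3 counting game, the table is a plain dictionary, and every name here is made up. The point is only that positions analysed first (those nearer the end) leave entries behind that make earlier positions almost free to analyse.

```python
table = {}    # position -> value, kept between calls like an engine's hash table
visited = 0   # counts positions that actually had to be searched


def value(pile):
    """+1 if the side to move wins the take-1-to-3 counting game, else -1."""
    global visited
    if pile in table:
        return table[pile]          # cache hit: no search needed
    visited += 1
    if pile == 0:
        result = -1                 # nothing left to take: side to move has lost
    else:
        result = max(-value(pile - take) for take in (1, 2, 3) if take <= pile)
    table[pile] = result
    return result


value(10)                  # analyse a late position first
first = visited
value(12)                  # then step back to an earlier one
second = visited - first
print(first, second)       # the second call searches far fewer positions
```

In this toy the second analysis reuses everything the first one stored, which is the same reason identical scores can persist over several consecutive moves of a backward annotation.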

both *sides* thinking they were winning, but rather of *Shredder*
thinking both sides were winning?


No. Shredder scores it as a consistent win for white. There are some
important differences between the fast score and the deeper analysis
though. Here is the original game with the engines scores and thinking
time.

[Event "AOI, Blitz:4'+2\""]
[Site "East Rounton"]
[Date "2007.05.14"]
[Round "1"]
[White "Shredder 10"]
[Black "Crafty 19.01"]
[Result "1-0"]
[ECO "D25"]
[WhiteElo "9999"]
[BlackElo "9999"]
[Annotator "0.30;0.36"]
[PlyCount "83"]
[TimeControl "240+2"]

{Intel(R) Pentium(R) 4 CPU 3.00GHz 2992 MHz W=13.1 ply; 354kN/s
B=12.2 ply;
835kN/s; 2 TBAs} 1. d4 {Both last book move 0.30/16 12} Nf6 {0.36/12
27} 2. Nf3
{(Bf4) 0.27/15 11} d5 {0.32/12 26} 3. c4 {(e3) 0.27/16 11} dxc4 {
(e6) -0.26/12 25} 4. e3 {(Nc3) 0.34/14 20} b5 {(Bf5) -0.21/12 24} 5.
a4 {
0.79/14 13} c6 {-0.27/12 24} 6. axb5 {(Be2) 0.90/13 5} cxb5 {-0.11/13
23} 7.
Nc3 {0.72/14 17} Qb6 {(Bd7) -0.26/12 23} 8. b3 {(Ne5) 1.01/13 12} e6 {
(b4) -0.15/11 22} 9. bxc4 {0.81/13 15} b4 {(Bb4) -0.16/12 22} 10. c5 {
(Qa4+) 1.04/13 14} Qb7 {-0.22/11 21} 11. Rb1 {1.00/12 8} Nc6 {-0.06/11
21} 12.
e4 {(Bc4) 1.14/12 13} a6 {(Be7) -0.14/9 20} 13. Bc4 {(Bf4) 1.40/13 22}
Qc7 {
-0.09/10 19} 14. Ne2 {(e5) 1.18/12 7} Nxe4 {-0.50/10 21} 15. O-O
{0.98/11 5} f5
{(Bb7) -0.43/10 19} 16. Bf4 {0.67/11 11} Qd7 {0.40/11 24} 17. Bb3 {
(Ne5) 1.02/10 6} Be7 {(Qb7) 0.60/11 18} 18. Ba4 {2.77/12 5} Bf6
{2.43/11 21}
19. Rxb4 {2.35/13 6} Nxb4 {2.61/12 17} 20. Ne5 {2.33/13 7} Bxe5
{2.35/14 17}
21. Bxd7+ {2.38/14 4} Bxd7 {2.22/14 5} 22. Bxe5 {2.38/15 6} O-O
{2.21/13 16}
23. f3 {2.35/14 3} Nf6 {2.43/13 16} 24. Nf4 {(Qd2) 2.35/14 5} a5
{2.15/11 15}
25. Qb3 {2.48/13 5} Ra6 {(Rfe8) 2.16/11 15} 26. Kf2 {(Re1) 2.67/11 7}
Nfd5 {
(Kf7) 1.43/12 15} 27. Nxd5 {2.96/11 1} exd5 {1.68/12 14} 28. Ke3 {
(Rc1) 2.86/12 3} Rg6 {(Bb5) 1.09/11 14} 29. Rg1 {(g3) 2.94/11 3} Bb5 {
1.11/11 14} 30. g4 {(Kd2) 2.81/11 4} Re8 {(f4+) 1.16/11 14} 31. h3 {
(Kd2) 2.99/11 4} Bc4 {(fxg4) 1.24/10 13} 32. Qa4 {3.27/12 2} Nc6
{1.82/12 15}
33. gxf5 {3.60/12 2} Rxg1 {2.24/13 13} 34. Qxc6 {4.44/11 2} Re1+ {
(Kf7) 2.80/12 13} 35. Kf4 {5.16/12 3} Kf7 {3.25/13 12} 36. Qc7+ {(f6)
5.65/13 2
} Re7 {4.90/14 37} 37. Qxa5 {5.93/13 2} Rc1 {(Rg1) 4.95/12 12} 38. Qd8
{
7.15/11 2} Re8 {(Bb5) 6.19/11 22} 39. Qg5 {9.31/12 2} Rxe5 {7.01/13
10} 40.
dxe5 {10.19/12 3} d4 {7.94/12 37} 41. Qd8 {11.44/11 3} h6 {9.39/11 15}
42. Qd7+
{11.41/10 2} 1-0
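
For anyone wanting to tabulate the scores in the game above, here is a minimal Python sketch that pulls the numbers out of one move comment. It assumes the comments follow the pattern "(alternative move) score/depth seconds", with the parenthesised move optional; that reading of the format, and all names in the code, are assumptions based on how the annotations look, not on any documentation.

```python
import re

# Optional "(move)" first, then score/depth, then thinking time in seconds.
ANNOTATION = re.compile(
    r"(?:\((?P<alt>[^)]+)\)\s*)?"
    r"(?P<score>-?\d+\.\d+)/(?P<depth>\d+)\s+(?P<seconds>\d+)"
)


def parse_annotation(comment):
    """Return the fields of one move comment, or None if it does not match."""
    match = ANNOTATION.search(comment)
    if match is None:
        return None
    return {
        "alt": match.group("alt"),            # None when no alternative is given
        "score": float(match.group("score")),
        "depth": int(match.group("depth")),
        "seconds": int(match.group("seconds")),
    }


print(parse_annotation("(Bf4) 0.27/15 11"))
print(parse_annotation("0.36/12\n27"))   # comments wrapped across lines still parse
print(parse_annotation("Both last book move 0.30/16 12"))
```

Counting the comments whose "alt" field is not None, separately for White and Black, is one way to check tallies such as the number of moves that differed from the annotating engine's preference.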

It was pretty clear in this game that
Crafty simply did not know which way was up!


Absolutely. Computers seem to be prone to that sort of
game, though. Once they don't understand a position, they tend
to go *really* pear-shaped.


Crafty allowed itself to get stiffed with a double threat knight
skewered against queen, king and losing a pawn. At move 17 it should
have seen that getting the queen off d7 was a priority rather than
just developing the bishop. Shredder was expecting Qb7 (which is
probably weaker than Qa7 or Qe7). Then it pounced.

For the sake of balance: a best-of-3 engine match at this time
penalty ended 1 win, 1 draw, 1 loss for each engine.


Don Beal used to say that you need matches of 100+ games
to find which engine is better -- he had cases where one side
was losing 17-0 or thereabouts but hauled back to win [and this
in the days before "learning"].


I may give it a try overnight to see.

BTW is there a way to get the graphical display of time taken and
engine score shown in Chessbase window or does the game have to be put
into the playing window to see that info?


Pass. I've never entered games with that info in the first place.


I don't enter the info so much as allow blundercheck 20s to run on all
my games. And for most of the GM games that I decide look interesting
too - always fun to compare old human annotation in books to modern
engines.

Remember that is Crafty working at roughly the same search depth
setting as was being used to judge the play of world champion chess
players. It may be a bit unfair to make it play the opening (where its
performance is very poor).


Ah. I assumed 8s/move wouldn't be enough to reach the
depth used by G&B [roughly 6h/game on 2.5GHz machines] ....


I don't understand why they got quite such bad performance, but I
suspect the full width fixed ply search with the cache cleared between
moves for reproducibility probably played a part. Complex middlegames
slow down a lot on Crafty. In this game the engines fairly often played
the move that the other engine had expected - I watched
the first game in realtime (Crafty about 20s/move for 40 moves).

Yes, but you still seem not to have understood! Look, suppose
some engine gives 1.23 as its evaluation. That means that somewhere
down the tree there is a position, reached by "best play" as far as
the current collection of static evaluations goes, which has a static
evaluation of 1.23. *That* evaluation is a sum of various factors --
+1.00 because we have an extra pawn, +0.17 because we control an open
file, +0.47 because of king safety, -0.13 because the opposing knight
is well-placed, +- this, that and the other, possibly including all
manner of complexity and joint factors, etc. Only the extra pawn is
"gold standard" currency. Everything else is there either because
BobH or some other programmer has decreed that an open file is worth
0.17 or because a "learning" program has currently settled on that
as the value. None of it is reliable [else we wouldn't need the tree


I agree entirely so far. But what is interesting here is that the
current generation of top programs appear to have tuned the evaluation
function weights for self consistency to maximise the efficiency of
alpha-beta cutoffs in the tree.

none of it relates very closely to how a GM would assess that position.


Indeed, although to me Shredder feels closer to human assessment than
other engines (and it still struggles with endgame transitions, like
all of them).

by 0.5. The miracle of computer chess is that quite often the numbers
agree within 0.2 or so. But there is no useful, objective, value in


In a lot of cases they agree about the best line though even if they
score it differently.

Looking more carefully at that game
there were long sustained periods where Crafty was more than 100cp off
the mark and about 10 moves where it was more than 200cp out (and in
the middlegame). This doesn't bode well for its ability to score GM
level play.


What matters is not that [esp as basically the difference over
those moves was whether Crafty in those positions was totally and
utterly lost or merely utterly and totally lost, and the G&B scheme
would have stopped counting by then], but whether Crafty/Shredder
mis-assesses the correct move ordering, and if so by how much.


The best way I can think of to test this is on the key positions where
things went awry. The majority of positions where they pretty much
agree on the continuation line don't provide any discrimination.

[...] If Crafty12 is so rotten, it's been
amazingly lucky.

I don't think it is that rotten. Just that it misses a lot of the rare
but absolutely key GM moves and marks them down because it doesn't
understand them. It probably gets the 95% of the routine moves exactly
right, but it is the handful of other moves that make all the
difference.


If I am reading the annotations correctly, then in the game
you gave, Shredder and Crafty each played 16 moves out of the 41 that
were sub-optimal according to Shredder. That doesn't seem a handful
of other moves to me, and it makes it seem unlikely that a GM would
have played nearly all the moves, routine or otherwise, the same way
as Crafty. [Of course, most of the 16+16 were "in the noise", but
it still suggests that with a 10cp noise level, there is a lot of
scope for Crafty/Shredder to make quite different moves from GMs.
Indeed, G&B's Fig.7 shows most WCs playing the same move as C12
about 50% of the time.]


I think that may not be as remarkable as it sounds. And in essence it
highlights one of the problems of having Crafty (or Fritz for that
matter) scoring GM level games. It will automatically penalise anyone
who raises or maintains the complexity of the position by keeping the
tension and does not swap off material when it is safe to do so. I
reckon that is why it scores Capablanca and Kramnik so highly - take a
look at fig 8.

According to the metrics line Crafty was at average search depth 12.2
and Shredder at 13.1 (but in 1/3 the time) during this test match.


OK, but this is not helping your thesis! Summarising, we
now have that C12 deviates by around 0.35 from Shredder, by between
0.10 and 0.15 from almost all WCs, and by between 0.06 and 0.09 from
recent strong computers in matches vs humans, despite an expected
0.1 or so error from random noise. Accepting that Shredder/Rybka/
etc are technically stronger than Crafty, this nevertheless suggests
that Crafty is doing very well at emulating these other players and
engines, and by inference at assessing how good they are.


I agree that the numbers do not seem to add up.

Can we define a set of rules then [...]


Something that might be interesting. There are books out
there with titles like "How Good is Your Chess" [and regular
articles in a number of magazines] where strong players have
annotated games with point scores ["Score 7 for Nxe5, 3 for Bg5,
-3 if you blundered by Re1, 1 for routine development by Nc3,
0 for anything else"]. We could set engines doing these tasks
at various rates, see how they score, see how they rate the
alternatives, and perhaps -- if someone would sink some time
and/or money into this -- get some GMs to comment both on the
original scoring and on the computer results.


OK. This sounds like an amusing idea. And not too onerous.
How about GM Daniel King's HGIYC piece from May's Chess magazine,
with fixed Ply 1, Ply 12 and 60s/move searches as the test conditions?
I expect that the intricacies of the Ragozin Defence will give some
engines a very serious headache.

I still can't take centipawns seriously enough to want
to invest effort into tracking down 20cp discrepancies ....


Neither can I. But I am curious to identify the types of position
where choice of the right engine (or other program) is important for
analysing the position correctly. Engines can have very different
playing styles.

Regards,
Martin Brown

PS Google dropped it on the floor again so perhaps I will be third
time lucky.

  #225  
Old May 17th 07, 06:17 PM posted to rec.games.chess.misc,rec.games.chess.computer
Dr A. N. Walker
external usenet poster
 
Posts: 96

In article om,
help bot wrote:
Once again, you have demonstrated a complete, utter
inability to read my comments *in context*.

Bit prickly aren't we? ...

Nah. I just wanted to make it clear that you have
*repeatedly* misinterpreted words I have written by
taking them out of their proper context. [...]


I think I'd prefer others to judge this rather than enter
into a "did", "didn't", "'tis", "'tisn't" slanging match; shall
we just agree to differ?

Look back at my original post. I was (obviously) replying
to this comment by Ray Lopez: [...]

... Indeed you were. But then you asked a very specific
question about Crafty12 and the ranking of WCs, which can only
["in context"] relate to the work of Guid and Bratko that is the
initial topic of this thread.

Not in my opinion it isn't. When I entered this thread,
it consisted of a couple of links to articles which I later
downloaded and read in full.


The very first "word" in this thread was one of those
links -- to the Chessbase article about *the work of Guid and
Bratko*; and the thread title clearly relates to that work.
Has anyone mentioned Crafty12 *other than* in relation to it?

Now, what you are
suggesting is that those articles are not the subject of
this thread, but the original paper upon which these
"summaries" were based was, and that just ain't so.


I made no such suggestion. But you asked a question
which was not answered in the Chessbase articles, but was in
the original paper. Did you want to know the answer or not?

You might just as well argue that the thread is about
my game at RedHotPawn -- because it was discussed
at some later point. Look back at the links.


If I asked a question in this thread about RHP, then
you might quite reasonably think that it was sparked by your
mentions in this thread. But whereas RHP has been a minor
part of this thread, the work of G&B, and more specifically
the use of Crafty12 in that work, has been very prominent.

So, why don't I go and find that paper and read it in
full like the summaries? Simple: it has already been
shown that a myriad of excuses substitute for any real
desire for *quality* work; hence the choice of a 12 ply
Crafty; and hence the moronic sample size in certain
cases like, say, GM Fischer. In sum, it looks like a
waste of my time (see below).


From a reading merely of the two CB articles, I would
very likely agree with you [apart from the emotive words]. But
the full article presents at least a somewhat different picture.
That doesn't mean that you should read it; life is too short
to read everything that might possibly be of interest, and the
paper is not that marvellous. But nor is it total rubbish,
and at least those who intend to use words like "moronic" in
relation to it perhaps ought to critique what they actually
did rather than the "red top" version of it.

[...]
The fact that we are back to discussing the "G&B"
end of things once again shows that you have missed
the point of what I was actually writing about; it had to
do with positional moves and tactical moves allegedly
being "one and the same thing", you should recall.


I recall that perfectly well. But having written about
that, you then asked a question about Crafty12 and the ranking
of WCs.

Yet you are willing to "skewer" their methodology to the
extent of "sloppiness" and "complete disregard for quality work"?

Yes, I am. (As far as I can see, any bum off the
street could read their paper, copy their methods,
and by simply setting Crafty to *13 plys*, best their
results in terms of quality).


So? That [mutatis mutandis] applies to a very large
number, perhaps the majority, of scientific papers. We all
have to take decisions about how much computer time or other
resource it is worth pouring in to some experiment. Crafty13
would have occupied their roomful of computers for several
months, and would *probably* not have shown anything new.
If you, or anyone else, think that Rybka or some other engine
[inc Crafty13] would show different results, then you have
enough information to "copy their methods" and "best their
results". Go ahead. My expectation is that you will get the
same results, to good approximation. If so, then you will
have confirmed to each other that the methodology is doing
something objective, even if not what G&B claim. If not,
then you can publish a paper [or at least a letter in ICGAJ]
showing that G&B are wrong, and gain credit for it.
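
For anyone tempted to try the comparison, the "same results, to
good approximation" test can be made concrete with a rank
correlation. What follows is a minimal sketch in Python, assuming
each engine run yields one average-error score per player (lower is
better). The player labels and numbers are invented placeholders,
not G&B's data or anyone's real results, and ties are not handled.

```python
def ranks(scores):
    """Map each player to a rank, 1 = lowest (best) score. Ties not handled."""
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman(scores_x, scores_y):
    """Spearman rank correlation between two score tables with the same keys."""
    rx, ry = ranks(scores_x), ranks(scores_y)
    n = len(rx)
    d2 = sum((rx[k] - ry[k]) ** 2 for k in rx)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical average errors per player from two different engine runs.
engine_a = {"P1": 0.10, "P2": 0.12, "P3": 0.15, "P4": 0.20}
engine_b = {"P1": 0.08, "P2": 0.11, "P3": 0.09, "P4": 0.18}

print(spearman(engine_a, engine_b))
```

A value near 1 would suggest the two engines order the players in
much the same way; a value near 0 would suggest the ordering depends
heavily on the choice of engine.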

Even though most, if not all, of the criticisms in this thread
are addressed by the authors in a peer-reviewed paper?

Are their "peers" up to our standards here, I wonder?


It's a bit of a stretch to assume that they are not.
ICGAJ may not be "Nature", but it's the leading journal for
computer game theory, and some pretty bright people write
and review for it.

And if you have "no interest in further details", why did you ask about
them in relation to your above question [and then take umbrage at
my answer to it]?

You're not making any logical sense here; I asked
nothing about their methodology;


Then you need to explain what your question *was* about.
Are you *really* interested in Crafty12 and the ranking of WCs
for any reason *other than* to discuss G&B's work?

[...]
For my money, I'll take the strongest chess program
in the world and if necessary, start off by eliminating
GM Steinitz and his predecessors to save time; [...]


No-one is preventing you.

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.

  #226  
Old May 17th 07, 09:27 PM posted to rec.games.chess.misc,rec.games.chess.computer
David Kane
external usenet poster
 
Posts: 1,099


"Dr A. N. Walker" wrote in message
...
In article om,
help bot wrote:


Yes, I am. (As far as I can see, any bum off the
street could read their paper, copy their methods,
and by simply setting Crafty to *13 plys*, best their
results in terms of quality).


So? That [mutatis mutandis] applies to a very large
number, perhaps the majority, of scientific papers. We all
have to take decisions about how much computer time or other
resource it is worth pouring in to some experiment. Crafty13
would have occupied their roomful of computers for several
months, and would *probably* not have shown anything new.
If you, or anyone else, think that Rybka or some other engine
[inc Crafty13] would show different results, then you have
enough information to "copy their methods" and "best their
results". Go ahead. My expectation is that you will get the
same results, to good approximation. If so, then you will
have confirmed to each other that the methodology is doing
something objective, even if not what G&B claim. If not,
then you can publish a paper [or at least a letter in ICGAJ]
showing that G&B are wrong, and gain credit for it.


If you are proposing a methodology (ranking players
according to move analysis), you can't simply
pull an algorithm out of thin air and pretend that it means
something. The burden is on the authors to *show* that
it is meaningful. *They* should have done (at least partial)
analyses at much deeper ply, or on weaker players (if
computational time was severely limited), if they want
their method to have any credibility. They should
also have looked for the correspondence with this
ranking method and alternate ranking methods (e.g. ELO)
especially in those cases where the alternate method
has a high degree of credibility (contemporary players
playing actively in a pool)

The two most basic questions anyone should have
upon reading this work are 1. How many moves do
you need to analyze? 2. How deeply do you need
to analyze them? Neither is addressed by the paper
in any meaningful way. There is no way that that
can be characterized as anything other than a serious
defect. The excuse that it might have been hard to
address (which I don't believe, by the way) is
no excuse at all.
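
A rough way to put numbers on the first question: treat each
move's loss as an independent draw with a common spread, and ask how
many moves per player are needed before a given gap in average loss
stands clear of the noise. This is a back-of-envelope sketch in
Python. The independence assumption is questionable for chess moves,
and the figures used (a spread of 0.30 pawns, a gap of 0.02 pawns)
are placeholders, not values taken from the paper.

```python
import math


def moves_needed(sd, gap, z=1.96):
    """Moves per player so that z standard errors of the difference
    between two mean losses is no larger than `gap`.

    Assumes equal sample sizes, a common spread `sd`, and independent
    moves, so the standard error of the difference is sd * sqrt(2 / n).
    """
    return math.ceil(2 * (z * sd / gap) ** 2)


# Placeholder inputs: spread of 0.30 pawns per move, gap of 0.02 pawns.
print(moves_needed(0.30, 0.02))
```

Halving the gap to be resolved quadruples the number of moves
required, which is why small samples are a concern when the players
being compared are close.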


  #227  
Old May 18th 07, 08:41 AM posted to rec.games.chess.misc,rec.games.chess.computer
help bot
external usenet poster
 
Posts: 7,800

On May 17, 4:27 pm, "David Kane" wrote:

If you are proposing a methodology (ranking players
according to move analysis), you can't simply
pull an algorithm out of thin air and pretend that it means
something. The burden is on the authors to *show* that
it is meaningful. *They* should have done (at least partial)
analyses at much deeper ply, or on weaker players (if
computational time was severely limited), if they want
their method to have any credibility. They should
also have looked for the correspondence with this
ranking method and alternate ranking methods (e.g. ELO)
especially in those cases where the alternate method
has a high degree of credibility (contemporary players
playing actively in a pool)

The two most basic questions anyone should have
upon reading this work are 1. How many moves do
you need to analyze? 2. How deeply do you need
to analyze them? Neither is addressed by the paper
in any meaningful way. There is no way that that
can be characterized as anything other than a serious
defect. The excuse that it might have been hard to
address (which I don't believe, by the way) is
no excuse at all.


In the articles at the links I discussed earlier, the
authors said little or nothing about what constitutes an
adequate sample size. It seems to be a bit unfair to try
and compare, head to head, the results of GM Fischer
in a single, won match (he didn't win every match, you
know) with, say, the varied results of someone like GM
Steinitz, who kept taking on all comers until he *finally*
found one he couldn't beat. In any case, my idea is
that closely matching what Crafty_12_plys thinks are
the optimal moves is no guarantee of quality results.
I would be far more comfortable with closely matching
a program whose own rating is markedly *superior* to
the humans it is trying to rank. Also, the idea of a
fixed ply depth is somewhat annoying, unless that
number is around 20+. Believe it or not, a few of my
games have seen me calculate (or plan) far beyond
only 12 plys, and I fully expect the world champions
to be capable of seeing almost as far. ;D

---

In a recent thread (consisting of just one posting), a
game between GMs Fischer (as White) and Spassky
was linked to. In that game, BF started out well,
gaining a Maroczy bind _style_ of position, but it soon
became apparent that he was not able to figure out
any active plan, despite a nice space advantage and
the apparent bind. GM Spassky soon broke free
from his cramped position, but at the cost of a pawn
which the American eagerly gobbled. Nevertheless,
GM Spassky was able to intrude into White's half of
the board with Queen and, ultimately, both Rooks,
and it looked like a draw by repetition was in the
cards, the only question being who would be on the
receiving end of a perpetual check. In the end,
however, GM Spassky unwisely traded off one of
his three attackers, and then let GM Fischer's
pawns get down the board. Stopping these pawns
got him into a temporary bind, and from there into
a (just barely) lost Rook and pawn ending. To me,
it looked like a bit of luck; especially in comparison
to games I have seen which were won by superior
strategy, not "shaking the tree" until something
pops loose, and the more so since at times, it
looked like GM Fischer was on the run.

I wonder how a long, close game such as this
would end up scoring by a chess engine. I mean,
say that GM Fischer's intention was to *wait* until
the inevitable ...b5, and then be in good position to
commence fighting. Or say that GM Spassky's
real problem was that his opponent was already
winning the match, and he desperately needed to
claw his way back into it by winning as Black. No
chess engine would take any of this into account in
scoring the moves, so what we are attempting is
merely to estimate the accuracy or optimality of
the moves played, while the players were engaged
in a different sort of contest altogether; one where
optimality was not the issue; winning was.

Yet another annoying issue is the player who
habitually gets into time-pressure situations, where
he (and in many cases, also his opponent) will be
forced to whip off several quick moves in order to
make time control. Such players would likely get
penalized for this style of time (mis-)management.
Does this mean they aren't great chess players?
How many small "errors" equate to one large one?
And what if they are so small that the opponent
doesn't even notice?
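
The "many small errors versus one large one" question can be seen
in a toy example. This is only a sketch in Python: the per-move
losses (in centipawns) and the 50cp blunder threshold are made-up
assumptions, not anything from the articles. The point is that an
average alone cannot tell the two players apart.

```python
from statistics import mean


def summarise(losses_cp, blunder_cp=50):
    """Summarise per-move losses (centipawns) for one player."""
    return {
        "mean": mean(losses_cp),
        "worst": max(losses_cp),
        "blunders": sum(1 for x in losses_cp if x >= blunder_cp),
    }


# Hypothetical players: ten small slips versus nine perfect moves and one howler.
steady = [10] * 10
erratic = [0] * 9 + [100]

print(summarise(steady))
print(summarise(erratic))
```

Both lists have the same mean loss, so any ranking built on the
mean treats them as equal; whether they *should* be equal is exactly
the judgement call someone has to make beforehand.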

I know of at least one game where two top GMs
quickly played through an opening line but one of
them got his move order mixed up, falling into a
fatal trap; even so, his opponent never noticed,
and just made his own reply by rote. Because of
who they were, the commentators just assumed
the opening moves were A-okay, but one of the
spectators knew better and wrote up an article on
the event, pinpointing the double-blunder. How
does this score? Who decides the penalty, and
is it "adjusted" if the players in question are among
the favorites or the most despised?

Can every conceivable possibility be considered
and entered into the equation beforehand, so there
will be no "tweaking" which might allow human bias
to rear its ugly head? I seriously doubt it. In fact,
one of the articles I read went in with the loaded
question: is Garry Kasparov the greatest player of
all time? One can hardly expect any sort of
objectivity with an approach like that.

-- help bot



  #228  
Old May 18th 07, 10:14 AM posted to rec.games.chess.misc,rec.games.chess.computer
raylopez99
external usenet poster
 
Posts: 290

On May 17, 1:27 pm, "David Kane" wrote:


If you are proposing a methodology (ranking players
according to move analysis), you can't simply
pull an algorithm out of thin air and pretend that it means
something. The burden is on the authors to *show* that
it is meaningful.


They have, you idiot. It was a peer-reviewed paper.

*They* should have done (at least partial)
analyses at much deeper ply, or on weaker players (if
computational time was severely limited), if they want
their method to have any credibility.


You don't understand 'normalization' do you, dimwit? Read all 220+
replies, especially mine and Dr. Walker's, and commit to memory. Then
and only then post here again. I see you flunked out of school, or
should have.

They should
also have looked for the correspondence with this
ranking method and alternate ranking methods (e.g. ELO)
especially in those cases where the alternate method
has a high degree of credibility (contemporary players
playing actively in a pool)


Means nothing. And the list presented does correlate very well with
ELO. Jeff Sonas' work found that Capa was #1 using ELO, and Kramnik
beat Kasparov and has a high Elo. Not the brightest bulb in the room,
are ya?


The two most basic questions anyone should have
upon reading this work are 1. How many moves do
you need to analyze? 2. How deeply do you need
to analyze them?


Shiite for brains: 1/ one move is sufficient, but logic tells you
more moves will give greater and finer "granularity". So with only
one move lookahead you could only rank "patzers" (like you) from "non-
patzers". With Crafty's 6+ move event horizon, you can get very good
granularity. Perhaps not as good as Rybka's but very good. And,
again, the principle of normalization says you do NOT need to look
further ahead than the best players. Why am I wasting my time with
you? Your own social worker says you're hopeless.


The excuse that it might have been hard to
address (which I don't believe, by the way) is
no excuse at all.


You are an excuse. Quit wasting Dr. Andy's time. He is badgered as
it is by the idiot Help Bot, and now you have to chime in.

This is my very last post here. Sorry I even started this thread with
the retarded hoi polloi of this forum.

RL


  #229  
Old May 18th 07, 05:07 PM posted to rec.games.chess.misc,rec.games.chess.computer
David Kane
external usenet poster
 
Posts: 1,099


"raylopez99" wrote in message
oups.com...

This is my very last post here. Sorry I even started this thread with
the retarded hoi polloi of this forum.


The thread is actually an interesting one (as is the
original work, though flawed) and has contained
a number of interesting posts. Unfortunately those
haven't originated from you - you simply lack the
brain power to understand the criticisms.


  #230  
Old May 19th 07, 05:36 AM posted to rec.games.chess.misc,rec.games.chess.computer
help bot
external usenet poster
 
Posts: 7,800

On May 18, 5:14 am, raylopez99 wrote:
On May 17, 1:27 pm, "David Kane" wrote:


If you are proposing a methodology (ranking players
according to move analysis), you can't simply
pull an algorithm out of thin air and pretend that it means
something. The burden is on the authors to *show* that
it is meaningful.


They have, you idiot. It was a peer-reviewed paper.



Uh oh. It looks like someone forgot to learn the
difference between ad hominem and reason. (Maybe
they will let RL back into school despite this glaring
mental handicap?)


*They* should have done (at least partial)
analyses at much deeper ply, or on weaker players (if
computational time was severely limited), if they want
their method to have any credibility.


You don't understand 'normalization' do you, dimwit? Read all 220+
replies, especially mine and Dr. Walker's, and commit to memory. Then
and only then post here again.


Wow. In addition to lacking reasoning skills, this
poor sap now thinks he is "in charge". LOL!


I see you flunked out of school, or should have.


Well, at least Fishead knows about the *existence* of
schools -- that's a start.


They should
also have looked for the correspondence with this
ranking method and alternate ranking methods (e.g. ELO)
especially in those cases where the alternate method
has a high degree of credibility (contemporary players
playing actively in a pool)


Means nothing. And the list presented does correlate very well with
ELO.


Nonsense. Every posting I have read here claims the
same thing: that they ranked GM Capablanca above such
players as GMs Lasker, Fischer, Kasparov, etc. -- all of
whom were higher, not lower rated. It looks like a flaw
from that particular angle.


Jeff Sonas' work found that Capa was #1 using ELO, and Kramnik
beat Kasparov and has a high Elo. Not the brightest bulb in the room,
are ya?


Look at the official ratings, Fishead; if anyone should
have stood out, it was GMs like Lasker, Fischer, and
Kasparov -- NOT GMs Capablanca and Kramnik.
Those two may stand apart because of a stylistic
issue, but not in terms of results, which are what
chess ratings are based on.

I think it is fairly obvious that players like GM Tal for
instance, who won by playing suboptimal moves, are
getting penalized for not closely approximating the
computeresque style of play. This reveals a deeper
issue: why rank the world champs on anything
other than the "game" they were playing, which was
trying to win, not to play "perfectly"?


The two most basic questions anyone should have
upon reading this work are 1. How many moves do
you need to analyze? 2. How deeply do you need
to analyze them?


Shiite for brains: 1/ one move is sufficient,


Uh oh. Apparently there must be a missing cap,
'cause it looks to me like most of his brains have
somehow "leaked out". Either that, or he was
shorted at the fish factory that made him.


but logic tells you
more moves will give greater and finer "granularity".


Hmm. Quite an improvement here. Maybe it's just
an intermittent mental short circuit?


So with only
one move lookahead you could only rank "patzers" (like you) from "non-
patzers".


I take it Fishead is assuming the program has the
ability to do check-and-capture extensions, on top
of the numbers he is actually discussing; it might be
helpful if he were to make this point clearer.


With Crafty's 6+ move event horizon, you can get very good
granularity.


Just how good is "very good", though? We need it
to be good enough to *accurately* rank the world
champions, and that is a tall order. In fact, were a
human to try this, his results would be summarily
dismissed as mere opinion -- at least by those who
objected to the results.


Perhaps not as good as Rybka's but very good.


Oh, I like that. Did you see the way he struggled
to keep a straight face while pretending not to know
for certain that Rybka was better-equipped for this
sort of thing than Crafty_12_plys? Very nice.

Oh, but he seems to have overlooked the central
idea: that no *evidence* was presented to show
that Crafty_12_plys was good enough for the job.
Damned lawyers. Always asking for stuff that
doesn't even exist!


And,
again, the principle of normalization says you do NOT need to look
further ahead than the best players.


Nobody had mentioned that straw man position, until
just NOW.


Why am I wasting my time with you?


Because you have nothing constructive to do?

I think going back to school would be your best
try; after all, until you learn to think more clearly,
you're not going to get very far in life.


Your own social worker says you're hopeless.


Things could be worse; you might have told us
about your probation officer or your prison's warden.
Now we can relax, knowing that you are in the
hands of a professional, and getting the help you
so desperately need. Say hello to Skippy for us.


The excuse that it might have been hard to
address (which I don't believe, by the way) is
no excuse at all.


You are an excuse. Quit wasting Dr. Andy's time.


Hey -- maybe you could get a job protecting those
posters who are unable to defend themselves by
unleashing the floodgates of ad hominem? It's
right up your alley, and you already have internet
access. Just a thought...



He is badgered as
it is by the idiot Help Bot,


Wrong. An idiot is someone with an IQ in a range
far beyond my mental prowess; this just goes to
show that you rant and rave without first getting the
relevant facts (which, of course, we already knew).


and now you have to chime in.


Again. He posted here before, but it must have
slipped out of your mind. Try retracing your steps;
the missing cap could be anywhere. If only your
brain were more solid, and not quite so watery.


This is my very last post here.


Nobody believes you, any more than they did
Sanny. The reason is obvious: you keep making
pie-in-the-sky promises, but never deliver the goods.

In fact, I believe the odds-makers in Vegas actually
*increase* the odds of a follow-up post each time you
make another such promise as this. Have you
considered a job as a petty politician?


Sorry I even started this thread with
the retarded hoi polloi of this forum.


Take some consolation in the fact that you are not
alone; the other fish (including koi and pollack) feel
your pain. Plus, a few of them may have standards
similar to your own (catfish, other bottom-dwellers).

-- help bot




 



