A Chess forum. ChessBanter


Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)



 
 
#211
May 14th 07, 07:37 PM posted to rec.games.chess.misc,rec.games.chess.computer
Dr A. N. Walker

In article . com,
Martin Brown wrote:
In positional terms, I trust my own judgement more than Fritz,
so I'm really using the computer only for blunder-checking. If

In that case it is certainly worth downloading and running something
like Fruit 2.2.1 (free to evaluate for 14 days) as a kibitzer to see the
sort of things that you are missing. [...]


What sorts of things do you think I am missing that Fruit
[or any other strong engine] might show me? We are perhaps somewhat
at cross-purposes, in that I'm primarily interested [when entering
my own games] *only* in the stupid tactical things I've missed. If
Fritz doesn't show it in 10 seconds, then it wasn't that stupid after
all.

[...] Roughly Crafty19.19 takes 1-3mins to reach
12ply in this mode but in 60s Shredder10 typically reaches 15-16ply in
all but the most complex positions.


OK, but different engines mean different things by "12 ply".
[My own program would mean "anything from 6-ply upwards", as it has
variable depth-reduction, depending on how "interesting" a move is, and
particularly boring moves count double; I know this is eccentric.]
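To make the point that "12 ply" means different things to different engines concrete, here is a minimal Python sketch. It is not Walker's program. It simply assumes that each move along a line spends part of a nominal depth budget, with hypothetical costs of 2 for a "particularly boring" move and 1 for any other, and counts how many real plies fit in that budget.

```python
# Hypothetical per-move depth costs, loosely following the description
# above ("particularly boring moves count double").  Real engines use far
# more elaborate, program-specific reduction and extension rules.
COST = {"boring": 2.0, "normal": 1.0}

def actual_plies(nominal_depth, line):
    """Count how many moves of `line` (a sequence of move kinds) are
    searched before a budget of `nominal_depth` ply runs out."""
    budget = float(nominal_depth)
    plies = 0
    for kind in line:
        cost = COST[kind]
        if cost > budget:  # not enough budget left for this move
            break
        budget -= cost
        plies += 1
    return plies

print(actual_plies(12, ["boring"] * 20))            # 6
print(actual_plies(12, ["normal"] * 20))            # 12
print(actual_plies(12, ["boring", "normal"] * 10))  # 8
```

Under these assumptions a nominal 12-ply search covers anywhere from 6 real plies upwards. That is why raw depth figures from different programs are hard to compare.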

[...]
The problem here is that Crafty is frequently out by more than 50cp on
key variations and has been in all the GM games I have fed it so far.


Hang about! The *true* value of any position is "won",
"drawn" or "lost", so "out by 50cp" is meaningless except as an
evaluation against some amorphous scale of "slight advantage",
"definite advantage", "surely a winning advantage" and so on. If
Crafty is getting the *material* wrong, that's probably serious,
but otherwise 50cp simply means that Crafty has a different scale
for positional edges. It's not, of itself, right or wrong *unless*
it causes Crafty to lose a drawn position or lose/draw a won one.
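One way to see why "out by 50cp" has no meaning on its own is to map centipawns onto an expected score. The sketch below assumes an Elo-style logistic curve with an adjustable scale. Neither the curve nor the scale values come from Crafty or any other engine discussed here, and that is the point: the same 50cp translates into a different expected result depending on the calibration chosen.

```python
def expected_score(cp, scale):
    """Map a centipawn evaluation to an expected score in [0, 1] using a
    logistic curve.  `scale` (in centipawns) is a hypothetical calibration
    constant: the larger it is, the less a given cp edge is worth."""
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))

# The same 50cp edge under three hypothetical calibrations.
for scale in (200, 400, 800):
    print(scale, round(expected_score(50, scale), 3))
```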
....

Admittedly the first two were engine showpieces but the second pair
were randomly chosen high level games. You can see it happen most
prominently in the longer game where it misses the crucial winning
line and mis-scores a host of moves systematically wrong because it
doesn't understand what is going on.


... But does it miss the crucial winning line because it
has *tactical* shortcomings, or because it misunderstands how to
play positionally? "Missing a winning line" sounds more like the
former [or you might have said (eg) "misses the winning plan"].
How are you judging "systematically wrong"? Merely because a
strong engine gives different numbers, or because [eg] Crafty
gives the "wrong" number of centipawns to a positional feature?
There is no objective meaning to be attached to "White is 1.23
pawns ahead" other than "Rybka/Fruit/Crafty gives this as
its evaluation".

[...] I do think that a fair
proportion of the "errors" that the G&B analysis says the GMs have
made are in reality just the rms error of Crafty's evaluation, which is
something like 30cp multiplied by the number of times they do
something that it doesn't expect.


Possibly; and we won't know unless/until someone does the
experiment. But in that case, the actual figures for most WCs of
around 13cp/move, and less for the WCs who most of us would regard
as the most "accurate" in their positional judgement and tactical
awareness, are surprisingly low. Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.
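Martin's suggestion is that evaluation noise alone produces apparent "errors" whenever the player's move differs from the engine's choice. That can be checked with a toy Monte Carlo. This is a sketch under stated assumptions, not the G&B method:

- two candidate moves of exactly equal true value;
- independent Gaussian noise of standard deviation sigma on each evaluation;
- the engine picks whichever noisy score is higher;
- the player always plays the second move.

Any recorded loss is therefore pure artefact. Under these assumptions the expected loss is sigma/sqrt(pi), roughly 0.56 sigma. This follows because the difference of the two noisy scores is normal with standard deviation sigma*sqrt(2).

```python
import random

def apparent_loss(sigma, trials=20000, seed=1):
    """Mean 'centipawn loss' charged to a player whose move is truly as
    good as the engine's, when each evaluation carries Gaussian noise of
    standard deviation `sigma` (centipawns).  All parameters are
    hypothetical; nothing here is taken from Crafty itself."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        other_move = rng.gauss(0.0, sigma)   # noisy score of move A
        played_move = rng.gauss(0.0, sigma)  # noisy score of move B (played)
        best = max(other_move, played_move)
        # If the noise favours the played move, the charge is zero
        # (the engine "agrees"); otherwise the noise is booked as an error.
        total += best - played_move
    return total / trials

for sigma in (0, 10, 30):
    print(sigma, round(apparent_loss(sigma), 1))
```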

It is time to turn the question around slightly. Can anyone find a GM
level game where Crafty at 12ply avoids missing important winning
lines and obtains reasonable blundercheck agreement to within say 20cp
against any other top rated engine run for 60s/move? So far all the
games I have tested have shown serious discrepancies (50cp).


I don't think this is an interesting question *unless* we can
produce an objective meaning [beyond Crafty/Rybka] of 20cp.

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.

#212
May 14th 07, 11:17 PM posted to rec.games.chess.misc,rec.games.chess.computer
Ron

In article .com,
raylopez99 wrote:

Don't confuse the PSEUDO-chess scientists and programmers answers on
this thread with REAL answers. Keep in mind I program as a hobby,
have an IQ of over 140, and am a successful and quite wealthy
businessman.


I've got to level with you. It always bothers me when somebody starts an
answer to a simple question by attacking people who disagree with him
and then puffing up his own irrelevant credentials.

Now to get to the point of your questions: I don't know.


Ok. But since you agree it's all intuition and hunches, don't you think
you should be a little more open to the possibility of being wrong?

Personally, I'm not convinced that 12 ply is anywhere near sufficient. I
also think the folly of your insistence that normalization is
sufficient is illustrated by the fact that this analysis process
would rate a program that beat Crafty-at-12-ply as being worse than
Crafty-at-12-ply. Since all of the players evaluated would demolish
Crafty-at-12-ply, this strikes me as a near-fatal flaw.

-Ron
#213
May 14th 07, 11:30 PM posted to rec.games.chess.misc,rec.games.chess.computer
Ron

In article ,
JohnnyT wrote:

Wow, that word. That is the key to the whole thing. Lack of blunders is,
in my mind (and I think in many people's), a long, long way from
the word "accuracy".

And some of the questions had to do with #1 move correlation. Which
again raises the question of "accuracy". And not of blunders.

I think that Crafty-12 as an arbiter of accuracy is a pretty tough row
to hoe.


The even bigger problem is leaping from "accuracy" to "greatness," which
is the claim of some people.

Accuracy may be a component of greatness (and, in fact, it certainly is)
but I don't think very many people would consider it the sole defining
quality of greatness.

I think many other qualities go into greatness, and I don't think
even RayLopez can claim Crafty-12 is capable of measuring them.
These include, but are not limited to:

Combativeness, contributions to theory, originality, number of
brilliancies, and performance under pressure.

And I'd do this myself, if I had the tools to do so easily, but I'd be
fascinated to see the Crafty-12 evaluations of Tal and Botvinnik based
entirely on their first match. Does anybody have a way of setting up a
computer to do this automatically?

-Ron
#214
May 15th 07, 03:59 AM posted to rec.games.chess.misc,rec.games.chess.computer
help bot

On May 11, 11:06 am, raylopez99 wrote:

1) Would you feel equally confident if we only gave crafty 11 ply? 10?
8? 4? Where do you draw the line? What non-arbitrary criteria are you
using to suggest that 12-ply is meaningful whereas 3 ply, obviously,
would not be?


2) What objective criteria are you using to define "extremely close"
such that you don't trust the computer's ability to rank players
properly?


I'm very curious to hear your answers to these questions.



In truth, nobody in this thread really knows, and indeed further
research is needed. But the burden of persuasion is on Camp #1 to
make their case--that so-called "positional sacrifice" positions are
rather common in a game of chess and that chess is NOT largely tactics


This is absurd. The burden of proof, if indeed there
is one here, is on those who would maintain that a
mediocre chess program can *accurately* rank the
world champions.

My view is that the greater the superiority of the
chess engine over those it judges, the smaller this
burden of proof becomes; the less important the
dead-on accuracy of move-ranking becomes as a
requirement to accurately rank the players relative
to one another. An inferior program is good for one
thing, though: spotting gross blunders in the games
of the world champions. A perfect example would be
the famous game where BOTH human players
overlooked a simple tactic involving moves: 1. Q-h8+
(gives away the Queen) ...Kxh8, 2. Nxf7+ (capturing a
pawn) ...K-moves, 3.NxQ, netting a pawn. A human
is *far* more likely to make this oversight than even a
mediocre program like Crafty_at_12 plys.


(these are the assumptions behind their claims--I claim the
contrary). History has shown otherwise.


Nonsense. History has not yet shown that RL's
ignorant assumption regarding tactics is anything
other than just that: an assumption. The figure often
quoted by GM Tarrasch (99%) was not an actual
measurement, but only a way of making a point.

For purposes of the present discussion, it would be
far more useful to think of chess as "only" 90% tactics.
Even here, it is not really the percentage which is the
main point, but rather it is that tactics take precedence
over many lesser things. It's akin to having nuclear
missiles or, say, an aircraft carrier within range.


Indeed, on the last point,
Kramnik missed a mate in one last year.


This was an anomaly; generally speaking, the
world champions don't overlook mates-on-the-move
in top-level play.


Chess is largely tactics, and
that's why it is fair to have a chess engine rate the champions.


Ah, but "fairness" was never the issue here.

What critics have focused on is the subject of
*accuracy* (something RL obviously knows
nothing about).


You can make 30 brilliant "deep" positional moves in chess, have a clearly
winning position, and still lose a chess game in a mate in one.



Obsessing over GM Kramnik's fluke blunder, I see.
The phrase "grasping at straws" leaps to mind.



That is chess. A PC would score you poorly in such a game,
even though you were "brilliant" up until your blunder


Now that you mention it, GM Kramnik had
already tossed away his win by that point; far
from being rated as brilliant, I think his computer
opponent would likely have ranked him as weaker
than itself, since the human's moves did not as
closely match its own quirks. What about
Crafty? I imagine it would rank most other
programs as superior to human GMs for similar
reasons, including those who are objectively
superior.


(and perhaps unappreciated by
the PC, though I have argued in this thread that PCs are in fact not
so bad at rating positions that require positional moves, even
exchange sacs).


The question is not "are they bad" at rating positions,
but rather, it is "are they good enough" to *accurately*
rank the world champions. My view is that if they are
good enough, it is probably programs such as Rybka
and Shredder -- essentially, the ones rated at the top
of the heap -- which are good enough for the job. And
we need a decent sample size for this sort of approach
to render meaningful data; at least one of the champs
had only one single world championship match, of
course, against but one single opponent; that is no
way to do this sort of thing properly.


In fact, Camp #1's arguments would be better if we were trying to rate
"correspondence chess" champions rather than OTB champions, since in
correspondence chess tactics are much less important than deep
positional moves.


I wouldn't say that. It would be more accurate to
say that in correspondence play, there are fewer
gross tactical blunders and that in general, the
tactics are deeper and better executed. In sum,
it is a closer match to what the chess programs
rate as perfect chess.


But that was not the inquiry of the original
article ranking of champions: it was for OTB world championship play.
However, that said, I would not be surprised if, even for
correspondence chess players, rating such players with Fritz 5.31 at 5
seconds a move would give you a pretty clear indication of the best
correspondence chess players, since good positional moves and good
tactical moves are largely one and the same in chess


Not true.

In one of my current games at RedHot, I just
chose to double my Rooks on the only open file,
as opposed to snatching a free pawn and
surrendering that file to the opponent, even
though doing so would not make a whole lot
of difference. The point is, grabbing at the
maximum of material gain will most likely rate
higher with a shallow search by a mediocre
program, such as Fritz 5.31; yet my move is
likely to result in an immediate resignation
because it stomps out any imagined counter
play and thereby underscores the fact that I
am up the exchange for nothing and can win
material almost at will. Suppose Crafty_12 ply
penalizes either move as inferior to the other --
how does this style issue contribute to ranking
the world champs *accurately*?


(again, this goes to chess being 99% tactics).


Personally, I think this obsession with the
figure "99%" may be the root of the problem. If
you can learn to accept that tactics are, let us
say, higher in rank than positional play, but not
so overwhelming as to merit such a figure as this
"99%", then you will finally begin to grasp the
issue.

Are Generals and Colonels 99% of the military?
Do airplanes and jets and missiles constitute 99%
of the armed forces? No, but their greater weight
may make it seem to be such. Suppose that GM
Tarrasch had instead stated that "artillery is 99%
of winning the war" -- would this mean that all other
segments could then be dismissed as utterly
irrelevant? Would you then sit down and just start
counting up beans (here, artillery pieces)?

To sum up, there are *at least* two problems with
the approach taken:

1) sample size too small (except with GMs like
Steinitz and Botvinnik)

2) a 12-ply search depth is likely insufficient
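The sample-size objection can be quantified with the usual standard error of a mean, SE = s/sqrt(n). The per-move spread and the move counts below are placeholders chosen for illustration, not figures from G&B or from this thread. The point is only how slowly the uncertainty shrinks: quadrupling the number of moves merely halves it.

```python
import math

def standard_error(per_move_sd, n_moves):
    """Standard error (centipawns) of an average per-move loss, assuming
    n_moves independent moves with spread per_move_sd.  Moves within a
    game are correlated in practice, so this is likely optimistic."""
    return per_move_sd / math.sqrt(n_moves)

# Hypothetical: a 40cp per-move spread, from one short match to a career.
for n in (400, 1600, 6400):
    print(n, standard_error(40.0, n))  # 2.0, 1.0, 0.5 cp respectively
```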


-- help bot



#215
May 15th 07, 07:38 AM posted to rec.games.chess.misc,rec.games.chess.computer
help bot

On May 14, 10:59 pm, help bot wrote:

In one of my current games at RedHot, I just
chose to double my Rooks on the only open file,
as opposed to snatching a free pawn and
surrendering that file to the opponent, even
though doing so would not make a whole lot
of difference. The point is, grabbing at the
maximum of material gain will most likely rate
higher with a shallow search by a mediocre
program, such as Fritz 5.31; yet my move is
likely to result in an immediate resignation
because it stomps out any imagined counter
play and thereby underscores the fact that I
am up the exchange for nothing and can win
material almost at will. Suppose Crafty_12 ply
penalizes either move as inferior to the other --
how does this style issue contribute to ranking
the world champs *accurately*?


Update: Contrary to my prediction that seizing
the only available file with both Rooks would quite
possibly result in a resignation, my opponent
stubbornly insisted on "contesting" the file,
thereby immediately *hanging* his Rook. My
new, updated prediction? A resignation. (If it
turns out that my opponent instead hangs his
only remaining piece, a Bishop, I will swear off
making predictions forever, and get a real job.)

-- help bot



#216
May 15th 07, 12:08 PM posted to rec.games.chess.misc,rec.games.chess.computer
Martin Brown

On May 14, 7:37 pm, (Dr A. N. Walker) wrote:
In article . com,
Martin Brown wrote:

In positional terms, I trust my own judgement more than Fritz,
so I'm really using the computer only for blunder-checking. If

In that case it is certainly worth downloading and running something
like Fruit 2.2.1 (free to evaluate for 14 days) as a kibitzer to see the
sort of things that you are missing. [...]


What sorts of things do you think I am missing that Fruit
[or any other strong engine] might show me? We are perhaps somewhat


I tend to like having a faster engine analyse the game at a deeper
level. On paper there is little to choose between Fritz & Shredder but
in reality they give a very different view of some positions where the
strongest move is not a capture.

at cross-purposes, in that I'm primarily interested [when entering
my own games] *only* in the stupid tactical things I've missed. If
Fritz doesn't show it in 10 seconds, then it wasn't that stupid after
all.


Or perhaps it was, but Fritz is blinded to it by a tempting-looking
swap-off line. The sorts of things that it will predictably miss are
the passive-looking single moves that set up a future forcing
combination or a longer-term positional advantage. The position Phil
Innes challenged me with earlier in the thread is a canonical
example. The key move there, Nh4, aiming for a weak spot on g6 where the
knight becomes a major thorn in Black's side, is beyond hope of Fritz ever
seeing. Rybka & Shredder both find it quickly, and Fruit managed it
after half an hour, just before I lost patience with it.

[...] Roughly Crafty19.19 takes 1-3mins to reach
12ply in this mode but in 60s Shredder10 typically reaches 15-16ply in
all but the most complex positions.


OK, but different engines mean different things by "12 ply".


I agree that ply has a somewhat random meaning, but here we are
talking of a notional full-depth search, with pruning, to at least that
depth plus various extensions. It is in the choice of lines to extend
that different engines can give radically different results. Fritz &
Shredder/Rybka are poles apart. Rybka doesn't show how far search
extensions go.

[My own program would mean "anything from 6-ply upwards", as it has
variable depth-reduction, depending on how "interesting" a move is, and
particularly boring moves count double; I know this is eccentric.]


I grant you that the notional search depth displayed in blundercheck as
n.nn/MM MM is decidedly +/- 1 or 2 ply. Presumably this comes from
some miscounting of singular extensions or cache lookups.

The problem here is that Crafty is frequently out by more than 50cp on
key variations and has been in all the GM games I have fed it so far.


Hang about! The *true* value of any position is "won",
"drawn" or "lost", so "out by 50cp" is meaningless except as an
evaluation against some amorphous scale of "slight advantage",
"definite advantage", "surely a winning advantage" and so on. If
Crafty is getting the *material* wrong, that's probably serious,
but otherwise 50cp simply means that Crafty has a different scale
for positional edges. It's not, of itself, right or wrong *unless*
it causes Crafty to lose a drawn position or lose/draw a won one.


If you run an engine-engine match with the stronger engine penalised
on time to give Crafty a chance, you can watch as the game unfolds.
Both sides claim to be winning for a while until one gets a deep
tactical edge over the other. I did one last night which illustrates
my point - annotated here by the victor at 60s/move.

[Event "AOI, Blitz:4'+2\""]
[Site "East Rounton"]
[Date "2007.05.14"]
[Round "1"]
[White "Shredder 10"]
[Black "Crafty 19.01"]
[Result "1-0"]
[ECO "D25"]
[WhiteElo "9999"]
[BlackElo "9999"]
[Annotator "0.30;0.36"]
[PlyCount "83"]
[TimeControl "240+2"]

{Intel(R) Pentium(R) 4 CPU 3.00GHz 2992 MHz W=13.1 ply; 354kN/s
B=12.2 ply; 835kN/s; 2 TBAs}
1. d4 {Both last book move 0.30/16 12} Nf6 {0.36/12 27}
2. Nf3 {(Bf4) 0.27/15 11} d5 {0.32/12 26}
3. c4 {(e3) 0.27/16 11} dxc4 {(e6) -0.26/12 25}
4. e3 {(Nc3) 0.34/14 20} b5 {(Bf5) -0.21/12 24}
5. a4 {0.79/14 13} c6 {-0.27/12 24}
6. axb5 {(Be2) 0.90/13 5} cxb5 {-0.11/13 23}
7. Nc3 {0.72/14 17} Qb6 {(Bd7) -0.26/12 23}
8. b3 {(Ne5) 1.01/13 12} e6 {(b4) -0.15/11 22}
9. bxc4 {last book move 0.81/13 15} b4 {(Bb4) 1.22/16 22}
10. c5 {(Qa4+) 1.22/16 14} Qb7 {1.22/14 21}
11. Rb1 {1.37/15 8} Nc6 {1.22/15 21}
12. e4 {(Bc4) 1.22/13 13} a6 {(Be7) 1.51/14 20}
13. Bc4 {(Bf4) 1.33/14 22} Qc7 {1.33/14 19}
14. Ne2 {(e5) 1.17/16 7} Nxe4 {0.66/14 21}
15. O-O {0.66/13 5} f5 {(Bb7) 1.79/15 19}
  ({Shredder 10:} 15... Bb7 16. Bf4 Qd7 17. Qb3 Be7 18. d5 exd5 19. Bxd5
  Qf5 20. Bxc6+ Bxc6 21. Nfd4 Qxc5 22. Nxc6 Qxc6 23. Rfd1 {0.66/13})
16. Bf4 {1.65/13 11} Qd7 {2.01/15 24}
17. Bb3 {(Ne5) 1.77/13 6} Be7 {(Qb7) 2.58/16 18}
  ({Shredder 10:} 17... Qa7 18. Ba4 Bd7 19. Qb3 Qb7 20. Bxc6 Qxc6 21. Ne5
  Qb5 22. Nxd7 Qxd7 23. Qxb4 Kf7 24. Qc4 Be7 {1.77/13})
18. Ba4 {2.58/14 5} Bf6 {2.58/16 21}
19. Rxb4 {2.58/14 6} Nxb4 {2.58/16 17}
20. Ne5 {2.58/18 7} Bxe5 {2.58/18 17}
21. Bxd7+ {2.58/18 4} Bxd7 {2.58/18 5}
22. Bxe5 {2.58/16 6} O-O {2.58/16 16}
23. f3 {2.55/17 3} Nf6 {2.55/15 16}
24. Nf4 {(Qd2) 2.51/15 5} a5 {2.59/15 15}
25. Qb3 {2.58/14 5} Ra6 {(Rfe8) 2.87/15 15}
26. Kf2 {(Re1) 2.86/14 7} Nfd5 {(Kf7) 2.86/14 15}
27. Nxd5 {2.96/16 1} exd5 {2.97/16 14}
28. Ke3 {(Rc1) 2.96/14 3} Rg6 {(Bb5) 3.15/16 14}
29. Rg1 {(g3) 2.96/15 3} Bb5 {2.98/14 14}
30. g4 {(Kd2) 2.63/16 4} Re8 {(f4+) 3.06/14 14}
31. h3 {(Kd2) 2.93/14 4} Bc4 {(fxg4) 3.65/16 13}
32. Qa4 {3.62/14 2} Nc6 {5.63/19 15}
  ({Shredder 10:} 32... Rf8 33. Rb1 fxg4 34. hxg4 Nc6 35. Rb7 Rc8
  36. Rb6 Kh8 37. f4 {3.62/14})
33. gxf5 {5.63/17 2} Rxg1 {5.63/17 13}
34. Qxc6 {5.63/15 2} Re1+ {(Kf7) 7.00/17 13}
35. Kf4 {7.12/18 3} Kf7 {7.00/18 12}
36. Qc7+ {(f6) 7.00/20 2} Re7 {7.00/18 37}
37. Qxa5 {7.00/16 2} Rc1 {(Rg1) 9.36/17 12}
  ({Shredder 10:} 37... Rg1 38. c6 g5+ 39. fxg6+ Rxg6 40. c7 Rc6 41. Qa7
  Ba6 42. Qa8 Ree6 43. Qd8 h6 {7.00/16})
38. Qd8 {9.41/16 2} Re8 {(Bb5) 16.78/17 22}
  ({Shredder 10:} 38... Rg1 39. Bd6 Rge1 40. Bxe7 Rxe7 41. c6 Re8 42. c7
  Ba6 43. Qxd5+ Kf8 44. Qd6+ Re7 45. f6 gxf6 46. Qd8+ Kf7 {9.41/16})
39. Qg5 {16.78/16 2} Rxe5 {17.52/16 10}
40. dxe5 {19.80/16 3} d4 {21.73/16 37}
41. Qd8 {20.98/14 3} h6 {#151/15 15}
  ({Shredder 10:} 41... h5 42. Qd7+ Kf8 43. c6 Bf7 44. e6 Bxe6 45. fxe6
  Kg8 46. e7 g5+ 47. Kxg5 Re1 48. Qe8+ {20.98/14})
42. Qd7+ {#150/19 2} 1-0

Both engines playing with no opening book. Shredder10 has found a
plausible Queen's Gambit sideline [D25] ab initio out to move 9. The
evaluation looks like a picket fence for large parts of the game.
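For anyone wanting to pull the numbers out of annotations like these automatically, here is a small Python parser using only the standard `re` module. It assumes a comment layout that is inferred from the game above and is not documented in the post:

- an optional engine alternative in parentheses;
- then score/depth, with the score in pawns or written as #N for a mate;
- then what appears to be the time in seconds.

```python
import re

# Assumed layout of each annotation comment, inferred from the game above:
#   {[free text] [(Alt)] score/depth [seconds]}
# e.g. "{(Bf4) 0.27/15 11}", "{-0.27/12 24}", "{#151/15 15}".
ANNOTATION = re.compile(
    r"(?:\((?P<alt>[^)]+)\)\s*)?"    # engine's preferred alternative
    r"(?P<score>#?-?\d+(?:\.\d+)?)"  # eval in pawns, or #N for mate
    r"/(?P<depth>\d+)"               # search depth in ply
    r"(?:\s+(?P<secs>\d+))?"         # seconds spent (assumed meaning)
)

def parse_annotation(comment):
    """Return (alt, score, depth, secs) for the first score/depth found in
    a PGN comment, or None.  `score` is a float in pawns, or a string
    such as '#151' for a mate announcement."""
    m = ANNOTATION.search(comment)
    if m is None:
        return None
    raw = m.group("score")
    score = raw if raw.startswith("#") else float(raw)
    secs = int(m.group("secs")) if m.group("secs") else None
    return m.group("alt"), score, int(m.group("depth")), secs

print(parse_annotation("{(Bf4) 0.27/15 11}"))
print(parse_annotation("{#151/15 15}"))
```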

Admittedly the first two were engine showpieces but the second pair
were randomly chosen high level games. You can see it happen most
prominently in the longer game where it misses the crucial winning
line and mis-scores a host of moves systematically wrong because it
doesn't understand what is going on.


... But does it miss the crucial winning line because it
has *tactical* shortcomings, or because it misunderstands how to
play positionally? "Missing a winning line" sounds more like the
former [or you might have said (eg) "misses the winning plan"].
How are you judging "systematically wrong"?


Incapable of seeing deep enough to catch significant move refutations,
or in some cases unable to see them at all no matter how much time it
is given.

There is no objective meaning to be attached to "White is 1.23
pawns ahead" other than "Rybka/Fruit/Crafty gives this as
its evaluation".


I reckon the rms noise on most lines is always around 10cp no matter
how deep you go. A few quiet lines may have smaller rms errors, but
the active ones tend to bounce around a bit. But that is still enough
to have some confidence in finding gross evaluation errors of 50cp or
more (which is what Crafty at 12 ply does).
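As a rough check of that claim, suppose each engine's evaluation carried independent noise of about 10cp rms. The 10cp figure is Martin's estimate; the independence is an assumption. The difference between two engines' scores for the same position would then have an rms of about 10*sqrt(2), roughly 14cp. A 50cp disagreement would sit about 3.5 standard deviations out, while the 20cp agreement threshold Martin proposes is only about 1.4.

```python
import math

def discrepancy_z(diff_cp, rms_noise_cp):
    """How many standard deviations a disagreement of `diff_cp` between two
    engines lies from zero, assuming each engine's evaluation carries
    independent noise of `rms_noise_cp`."""
    return diff_cp / (rms_noise_cp * math.sqrt(2.0))

print(round(discrepancy_z(50, 10), 2))
print(round(discrepancy_z(20, 10), 2))
```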

[...] I do think that a fair
proportion of the "errors" that the G&B analysis says the GMs have
made are in reality just the rms error of Crafty's evaluation, which is
something like 30cp multiplied by the number of times they do
something that it doesn't expect.


Possibly; and we won't know unless/until someone does the
experiment. But in that case, the actual figures for most WCs of
around 13cp/move, and less for the WCs who most of us would regard
as the most "accurate" in their positional judgement and tactical
awareness, are surprisingly low. Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.


But they may well differ even less from the output of a stronger
engine. Certainly of the GM games I have tried Crafty12ply on it has
seen "improvements" that stronger engines at deeper ply levels can
easily refute.

If the GM makes the move Crafty expects, it doesn't matter how wrong
the evaluation is. The identity X + (-X) = 0 * is very helpful. It is
only when the GM makes a different move that evaluation errors hurt
the scoring.

* Thanks to the USPTO, Xerox has a patent on this blindingly obvious
identity as applied to JPEG decoding.

It is time to turn the question around slightly. Can anyone find a GM
level game where Crafty at 12ply avoids missing important winning
lines and obtains reasonable blundercheck agreement to within say 20cp
against any other top rated engine run for 60s/move? So far all the
games I have tested have shown serious discrepancies (50cp).


I don't think this is an interesting question *unless* we can
produce an objective meaning [beyond Crafty/Rybka] of 20cp.


Annotating a few more GM games with Crafty at 12 ply, your favourite
engine at 12 ply, and your favourite engine at 60s/move would go a long way
towards settling whether or not Crafty12ply scoring is
adequate. I have tried the experiment and so far found Crafty-12ply
wanting. YMMV

I think it is worth trying to agree a test protocol that could be used
to produce say 100 top level games consistently annotated by multiple
engines. Then we might be able to get some half decent stats. Hunches
really don't cut it.
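A test protocol of this kind needs, at minimum, a way to summarise how far two annotations of the same game disagree. The sketch below assumes each annotation has already been reduced to a list of centipawn scores, one per position, all from the same side's point of view. The 50cp default threshold mirrors the "serious discrepancy" figure used in this thread and is otherwise arbitrary. The sample numbers are invented.

```python
import math

def compare_annotations(a_cp, b_cp, threshold=50):
    """Summarise disagreement between two engines' evaluations of the same
    positions.  Returns (rms difference, fraction of positions whose
    absolute difference exceeds `threshold`).  Inputs are centipawns."""
    if len(a_cp) != len(b_cp) or not a_cp:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [a - b for a, b in zip(a_cp, b_cp)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    big = sum(1 for d in diffs if abs(d) > threshold) / len(diffs)
    return rms, big

# Hypothetical scores for four positions from two engines.
print(compare_annotations([0, 30, 100, 250], [10, 30, 40, 250]))
```

Reporting both numbers is deliberate. The rms shows the overall noise level, while the fraction over threshold counts the gross disagreements that the argument here is about.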

Regards,
Martin Brown

#217
May 15th 07, 12:26 PM posted to rec.games.chess.misc,rec.games.chess.computer
Dr A. N. Walker

In article .com,
help bot wrote:
In one of my current games at RedHot, I just
chose to double my Rooks on the only open file,
as opposed to snatching a free pawn [...];
yet my move is
likely to result in an immediate resignation
because it stomps out any imagined counter
play and thereby underscores the fact that I
am up the exchange for nothing and can win

^^^^^^^^^^^^^^^
material almost at will.


*Now* he tells us! OK, you are the exchange up, control
the only open file, and can win material at will. So -- purely
a guess! -- Fritz, Crafty, Rybka and any other engine except
possibly Sanny's are scoring your position at +3 or more? ...

Suppose Crafty_12 ply
penalizes either move as inferior to the other --
how does this style issue contribute to ranking
the world champs *accurately*?


... In which case this position is outside the [-2..2]
range, and is discarded by the G&B methodology, really for exactly
the reasons you gave. So Crafty12 would not penalise your move.
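The filtering described here can be sketched roughly as follows:

- skip positions whose score lies outside [-2, 2] pawns;
- average the best-minus-played differences over the positions that remain.

This is an illustrative reconstruction based only on what is said in this thread, not G&B's actual code, and the example numbers are invented.

```python
def average_loss(records, window_cp=200):
    """Average centipawn loss per move over positions whose best-move
    evaluation lies within [-window_cp, window_cp].

    records: iterable of (best_cp, played_cp) pairs, both from the mover's
    point of view, where played_cp is the engine's score for the move the
    player actually chose.  Returns None if every position is filtered."""
    kept = [(best, played) for best, played in records
            if -window_cp <= best <= window_cp]
    if not kept:
        return None
    return sum(best - played for best, played in kept) / len(kept)

# Hypothetical: a +3.5 position (discarded, like the RedHot game) and two
# near-level positions, one played exactly as the engine suggests.
print(average_loss([(350, 300), (30, 10), (50, 50)]))  # 10.0
```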

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.

#218
May 15th 07, 07:00 PM posted to rec.games.chess.misc,rec.games.chess.computer
Dr A. N. Walker

In article .com,
Martin Brown wrote:
[...] The one that Phil
Innes challenged me with earlier in the thread being a canonical
example. The key move there Nh4 aiming for a weak spot on g6 where it
becomes a major thorn in black's side is beyond hope of Fritz ever
seeing it. Rybka & Shredder both find it quickly [...]


Yes, but I don't *need* Fritz to see it -- I need Fritz
to confirm to me that Nh4 isn't getting trapped by ... g5 and/or
that it wasn't doing something important relative to d4/e5 [not,
of course, difficult in this particular position, but arguing
much more generally]. That's why I think we are somewhat at
cross purposes. You see Rybka/whatever as a strong player going
through your game and pointing out the best moves; I see Fritz
as a slightly annoying spectator saying [Harry Enfield voice:]
"You can't do that, you've just dropped a piece" [/HE] as I look
at the things I or my opponent might have done.

If you run an engine engine match with the stronger engine penalised
on time to give Crafty a chance you can watch as the game unfolds.
Both sides claim to be winning for a while until one gets a deep
tactical edge over the other.


Sure. But you surely aren't claiming that Crafty is
so stupid that it thinks doubled pawns are good and centralised
pieces are bad? So I'm guessing that when both sides think they
are winning [by something significant, not by 20cp or so], one
of them has overlooked something of tactical importance, which
is why, after a bit, it turns into a tactical win.

I did one last night which illustrates
my point - annotated here by the victor at 60s/move.
[Event "AOI, Blitz:4'+2\""] [...]


OK, so around 8-second chess, and an amusing crunch.
It looks as though Crafty-sans-book has not the foggiest idea
about developing and getting castled, and is also tactically
unaware. But not relevant to the present debate!

[There are some interesting discrepancies between
evaluations on successive ply in the annotations, but these
too are not that relevant, unless we find that Shredder is
much more or less prone to these than other engines.]

I reckon the rms noise on most lines is always around 10cp no matter
how deep you go. A few quiet lines may have smaller rms errors, but
the active ones tend to bounce around a bit.


[In which case Capa/Kramnik's 10cp difference per move is
startlingly good ....]

But that is still enough
to have some confidence in finding gross evaluation errors of 50cp or
more (which is what Crafty at 12 ply does).


Yes, but you still seem to be missing something. 100cp is a
pawn, and you can understand that very directly. 50cp is what? It
will matter if at some point we swap a 50cp advantage for a pawn-up
with 50cp compensation, but until then it's an arbitrary measure.
And even after that, it matters only if the implied equation "it's
worth giving up the two bishops in order to win a doubled pawn" [or
whatever] is so wrong that [eg] a won position is now drawn. GMs
don't normally talk in those terms, nor about a 50cp advantage, but
in terms of concrete material, specific positional pros and cons,
and plans in a specific position.

[...] Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.

But they may well differ even less from the output of a stronger
engine.


Possibly. But if G&B's table 3 is showing anything at all
objective about Crafty, it is that Crafty12 plays "rather like" all
the WCs except perhaps Steinitz, and much more like Capablanca and
Kramnik than like other WCs. If Crafty12 is so rotten, it's been
amazingly lucky. After all, in the game you showed above, Crafty10
[assuming that's roughly what it was managing in 8s] deviates by
around 35cp/move from Shredder by G&B rules. So either there's a
*huge* improvement between C10 and C12 [and C12 would agree almost
exactly with Shredder] or else C12 is not only strong enough to
assess WC play, but is actually closer than you might expect to
emulating it.

If the GM makes the move Crafty expects it doesn't matter how wrong
the evaluation is.


Yes, but this doesn't matter *anyway* unless it results in
scoring moves in the wrong order -- and in that case, the GM should
*not* be playing Crafty's move. You can't have it all ways!

I think it is worth trying to agree a test protocol that could be used
to produce say 100 top level games consistently annotated by multiple
engines. Then we might be able to get some half decent stats. Hunches
really don't cut it.


Absolutely. But I don't think the stats will mean what you
seem to think they mean.

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.

  #219  
Old May 15th 07, 10:42 PM posted to rec.games.chess.misc,rec.games.chess.computer
help bot

On May 15, 7:26 am, (Dr A. N. Walker) wrote:
In article .com,
help bot wrote:

In one of my current games at RedHot, I just
chose to double my Rooks on the only open file,
as opposed to snatching a free pawn [...];
yet my move is
likely to result in an immediate resignation
because it stomps out any imagined counter
play and thereby underscores the fact that I
am up the exchange for nothing and can win
material almost at will.


*Now* he tells us! OK, you are the exchange up, control
the only open file, and can win material at will. So -- purely
a guess! -- Fritz, Crafty, Rybka and any other engine except
possibly Sanny's is scoring your position at +3 or more? ...


I wouldn't know. BTW, I recently downloaded a few of the free
chess programs but have not yet been able to get any of them
to work "as advertised" so I can analyze my games. One of
these was Fritz 5.32, but its game analysis seems to just
vanish into thin air.

All in all, I would guess that your figure (+3 or more) is about
right, since all of my pawns were on the color opposite to his Bishop
and therefore immune to capture so long as I kept his Rook at bay.
(Yes, it's over; he resigned immediately after I captured his Rook
for free.)


Suppose Crafty_12 ply
penalizes either move as inferior to the other --
how does this style issue contribute to ranking
the world champs *accurately*?


... In which case this position is outside the [-2..2]
range, and is discarded by the G&B methodology, really for exactly
the reasons you gave. So Crafty12 would not penalise your move.
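A minimal sketch of the scoring rule as it is described in this thread (not G&B's own code): for each move, take the judging engine's score for its own choice and for the move actually played, skip the position if the engine's best score falls outside the [-2..2] pawn window, and average the remaining losses to get a "cp/move" figure. The function name, the (best, played) record layout, and the decision to test only the best-move score against the window are assumptions for illustration.

```python
from statistics import mean

# Hypothetical record: (best_score, played_score) in pawns, both from the
# point of view of the side to move, as reported by the judging engine.
Move = tuple[float, float]

def mean_loss(moves: list[Move], window: float = 2.0) -> float:
    """Average loss per move in centipawns, ignoring clearly decided positions.

    Assumption: a position is discarded when the engine's best score lies
    outside [-window, window]; G&B's exact rule may differ.
    """
    losses = [
        (best - played) * 100  # pawns -> centipawns
        for best, played in moves
        if -window <= best <= window
    ]
    return mean(losses) if losses else 0.0

# Invented numbers: the +3.5 position (exchange up, winning at will) is
# dropped, so whatever move was chosen there costs nothing.
sample = [(0.30, 0.30), (0.25, 0.10), (3.50, 2.90), (-0.40, -0.45)]
print(round(mean_loss(sample), 1))
```

With these sample numbers only the three positions inside the window count, so the "stylistic" choice in the won position has no effect on the average.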



Once again, you have demonstrated a complete, utter
inability to read my comments *in context*.

Look back at my original post. I was (obviously) replying
to this comment by Ray Lopez:

However, that said, I would not be surprised that even for
correspondence chess players, rating such players with Fritz 5.31 at 5
seconds a move would give you a pretty clear indication of the best
correspondence chess players, since good positional moves and good
tactical moves are largely one and the same in chess



That is, I was responding to the idea that good positional and good
tactical moves are *one and the same thing*. This silly notion
is why I gave the example from my game where I had
deliberately chosen a positional move over the sharper,
tactical material grab. Clearly, in this context, it would
not matter if I had a dozen extra Queens; there *is* a
substantial difference between positional and tactical
moves.

Among the world champions, those who tended
toward the positional were often described as having a
"dominating" style, while those who liked to live on the
edge were often described as "aggressive", "dynamic",
or perhaps more accurately, "reckless". :D

----

On the other subject, I strongly disagree that Fritz 5.31
could *accurately* rank top correspondence players at
5 seconds per move (quick blunder check). This
assumption relies on the silly idea that "chess is 99%
tactics" and that the remaining 1% is largely irrelevant. IMO,
the remaining portion -- whether it be only 1% or many
times that -- is not only relevant, but very *important*.

----

As for the G&B methodology, it was never described
in any detail in any of the articles which I read by
following the links earlier in this thread. Clearly, if I had
wished to skewer their "methodology", I would probably
want to know what it was. But having already learned
that the reason for the sloppiness was a shortage of
time and a complete disregard for quality work, I have
no interest in further details regarding the authors'
methodology.

-- help bot




  #220  
Old May 16th 07, 09:18 AM posted to rec.games.chess.misc,rec.games.chess.computer
Martin Brown

On May 15, 7:00 pm, (Dr A. N. Walker) wrote:
In article .com,
Martin Brown wrote:

[...] The one that Phil
Innes challenged me with earlier in the thread is a canonical
example. The key move there, Nh4, aiming for a weak spot on g6 where it
becomes a major thorn in Black's side, is beyond hope of Fritz ever
seeing it. Rybka & Shredder both find it quickly [...]


Yes, but I don't *need* Fritz to see it -- I need Fritz
to confirm to me that Nh4 isn't getting trapped by ... g5 and/or
that it wasn't doing something important relative to d4/e5 [not,
of course, difficult in this particular position, but arguing
much more generally]. That's why I think we are somewhat at
cross purposes.


Perhaps, but I still think that you might find trying another engine
instead of Fritz an interesting experiment (and if you have a dual-core
CPU you can run a second engine effectively for free, as long as
you don't want to do anything else at the same time). Fruit2.2.1 is
free for a 14-day trial period, and it gives an alternative
view of the game.

You see Rybka/whatever as a strong player going
through your game and pointing out the best moves;


Yes. I am having the engine look for deep tactical themes that I would
like to be able to recognise and find in the future. Most of the time
the engine agrees with me (and I am not a GM), but when it sees
something that I or my opponent missed completely, that is interesting and
worth a second look. Blundercheck is my preferred mode; "Analysis"
is too verbose and less informative.

I see Fritz
as a slightly annoying spectator saying [Harry Enfield voice:]
"You can't do that, you've just dropped a piece" [/HE] as I look
at the things I or my opponent might have done.


OK

If you run an engine-engine match with the stronger engine penalised
on time to give Crafty a chance, you can watch as the game unfolds.
Both sides claim to be winning for a while until one gets a deep
tactical edge over the other.


Sure. But you surely aren't claiming that Crafty is
so stupid that it thinks doubled pawns are good and centralised
pieces are bad? So I'm guessing that when both sides think they
are winning [by something significant, not by 20cp or so], one
of them has overlooked something of tactical importance, which
is why, after a bit, it turns into a tactical win.


It started from the opening in this game: Shredder +0.90, Crafty -0.27,
peaking at +1.40 vs -0.10, then converging a bit until the fateful
17...Be7 (2.77 vs 0.6). For a while after that the scores agreed, before
diverging again to 1.24 and 3.27. It was pretty clear in this game that
Crafty simply did not know which way was up!

For the sake of balance: in a best-of-3 engine match at this time
penalty, the result was 1 win, 1 draw and 1 loss for each engine.

BTW, is there a way to get the graphical display of time taken and
engine score shown in the ChessBase window, or does the game have to be put
into the playing window to see that info?

I did one last night which illustrates
my point, here annotated by the victor at 60s/move.
[Event "AOI, Blitz:4'+2""] [...]


OK, so around 8-second chess, and an amusing crunch.
It looks as though Crafty-sans-book has not the foggiest idea
about developing and getting castled, and is also tactically
unaware. But not relevant to the present debate!
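The "around 8-second chess" figure is simple arithmetic on the 4'+2" control: with base time B and increment I, a game of N moves gives each side roughly B/N + I seconds per move. A quick check, assuming game lengths of 30, 40 or 60 moves (the actual length isn't stated):

```python
def seconds_per_move(base_s: float, increment_s: float, moves: int) -> float:
    """Average thinking time per move for one side under an incremental control."""
    return base_s / moves + increment_s

# 4'+2": 240 seconds base plus 2 seconds added per move.
for n in (30, 40, 60):
    print(n, round(seconds_per_move(240, 2, n), 1))
```

So roughly 8 s/move corresponds to a game of about 40 moves; longer games push the budget down toward the 2 s increment.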


Remember that this is Crafty working at roughly the same search depth
setting as was used to judge the play of the world champion chess
players. It may be a bit unfair to make it play the opening (where its
performance is very poor).

[There are some interesting discrepancies between
evaluations on successive ply in the annotations, but these
too are not that relevant, unless we find that Shredder is
much more or less prone to these than other engines.]


A lot of engines have some move-parity issues (scores that alternate
between odd and even ply) depending on which side is to move.
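One simple way to see, and damp, an odd/even zig-zag in an engine's scores is to average each completed depth with the one before it. This is a sketch under the assumption that the parity effect is a plain alternation; the score sequence is invented for illustration and is not from the game above.

```python
def smooth_parity(scores: list[int]) -> list[float]:
    """Average each depth's score with the previous depth's (centipawns).

    Assumes the parity effect is a simple odd/even alternation; the result
    has one fewer entry than the input.
    """
    return [(a + b) / 2 for a, b in zip(scores, scores[1:])]

depth_scores = [40, 10, 45, 12, 44, 15]  # hypothetical scores at depths 7..12
print(smooth_parity(depth_scores))
```

The raw sequence swings by about 30cp from one ply to the next, while the averaged one drifts smoothly upward, which makes a genuine trend easier to separate from parity noise.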

I reckon the rms noise on most lines is always around 10cp no matter
how deep you go. A few quiet lines may have smaller rms errors, but
the active ones tend to bounce around a bit.
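One rough way to attach a number like "10cp rms" to a line (not necessarily how Martin arrived at his figure) is to record the engine's score for the line at each completed depth and take the root mean square of the depth-to-depth changes. A sketch with invented numbers:

```python
import math

def rms(values: list[float]) -> float:
    """Root-mean-square of a list of centipawn differences."""
    if not values:
        raise ValueError("need at least one value")
    return math.sqrt(sum(v * v for v in values) / len(values))

def successive_deltas(scores: list[float]) -> list[float]:
    """Change in evaluation from one depth to the next along the same line."""
    return [b - a for a, b in zip(scores, scores[1:])]

# Invented score sequences (centipawns), one entry per completed depth.
quiet = [20, 22, 18, 21, 19]
active = [30, 45, 25, 50, 20]

# If each evaluation carries independent noise of size sigma, the rms of the
# successive differences is about sqrt(2) * sigma, so divide by sqrt(2).
for name, line in (("quiet", quiet), ("active", active)):
    print(name, round(rms(successive_deltas(line)) / math.sqrt(2), 1))
```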


[In which case Capa/Kramnik's 10cp difference per move is
startlingly good ....]


I think that is probably because they tended to play down lines with
intrinsically small rms errors, whereas some of the wilder, more
exciting players go for positions where Crafty practically grinds to
a standstill at ply 12 while it tries to figure out all the
complications (and fails).

But that is still enough
to have some confidence in finding gross evaluation errors of 50cp or
more (which is what Crafty at 12 ply does).


Yes, but you still seem to be missing something. 100cp is a
pawn, and you can understand that very directly. 50cp is what? It
will matter if at some point we swap a 50cp advantage for a pawn-up
with 50cp compensation, but until then it's an arbitrary measure.


I was using that as an example. Looking more carefully at that game,
there were long sustained periods where Crafty was more than 100cp off
the mark, and about 10 moves (in the middlegame) where it was more than
200cp out. This doesn't bode well for its ability to score GM-level
play.
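The "more than 100cp off" and "more than 200cp out" counts amount to thresholding the per-move gap between the two engines' scores. A sketch using only the four score pairs quoted earlier in this post (the full game's scores aren't given here, so these counts are not the ones described above):

```python
# Score pairs in pawns as quoted in the post (Shredder first where stated);
# the order of the last pair isn't given, but the absolute gap is the same.
pairs = [(0.90, -0.27), (1.40, -0.10), (2.77, 0.60), (1.24, 3.27)]

def gaps_cp(pairs: list[tuple[float, float]]) -> list[int]:
    """Absolute disagreement between the two engines, in whole centipawns."""
    return [round(abs(a - b) * 100) for a, b in pairs]

def count_over(gaps: list[int], threshold_cp: int) -> int:
    """Number of moves on which the engines disagree by more than threshold_cp."""
    return sum(1 for g in gaps if g > threshold_cp)

g = gaps_cp(pairs)
print(g, count_over(g, 100), count_over(g, 200))
```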

[...] Further, it's interesting that
the strongest and best WCs, by reasonably common consent, are
those whose judgement differs least from that of Crafty.

But they may well differ even less from the output of a stronger
engine.


Possibly. But if G&B's table 3 is showing anything at all
objective about Crafty, it is that Crafty12 plays "rather like" all
the WCs except perhaps Steinitz, and much more like Capablanca and
Kramnik than like other WCs. If Crafty12 is so rotten, it's been
amazingly lucky.


I don't think it is that rotten. It's just that it misses a lot of the rare
but absolutely key GM moves and marks them down because it doesn't
understand them. It probably gets 95% of the routine moves exactly
right, but it is the handful of other moves that make all the
difference.

After all, in the game you showed above, Crafty10
[assuming that's roughly what it was managing in 8s] deviates by
around 35cp/move from Shredder by G&B rules. So either there's a
*huge* improvement between C10 and C12 [and C12 would agree almost


According to the metrics line, Crafty was at an average search depth of 12.2
and Shredder at 13.1 (but in 1/3 of the time) during this test match.

exactly with Shredder] or else C12 is not only strong enough to
assess WC play, but is actually closer than you might expect to
emulating it.


One other worrying thing about Crafty is that its accuracy does not
improve with increased ply at anything like the rate of other modern
engines. Its search seems to get stuck in the same local-optimum ruts,
a problem it shares with Fritz.

I think it is worth trying to agree a test protocol that could be used
to produce say 100 top level games consistently annotated by multiple
engines. Then we might be able to get some half decent stats. Hunches
really don't cut it.


Absolutely. But I don't think the stats will mean what you
seem to think they mean.


Can we define a set of rules, then, and see if enough volunteers
can be mustered to analyse an agreed set of games from a recent match
with multiple engines? We need to set the cutoff for
annotation: I normally use 8cp to avoid getting meaningless dross,
but for this purpose, and to make the test as close as possible to the
G&B protocol, I guess we should try either zero or, if that isn't allowed,
a 1cp window. Let's see if we can get a dataset first...
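The annotation cutoff acts as a threshold on how much worse the played move scores than the engine's own choice; moves that lose less than the cutoff get no comment. A sketch of how the 8cp setting and a 1cp window would differ, with invented move losses and a hypothetical function name (this is not how ChessBase implements it, and whether the cutoff is inclusive is an assumption):

```python
def annotated_moves(losses_cp: dict[str, int], cutoff_cp: int) -> list[str]:
    """Moves whose loss versus the engine's best move reaches the cutoff.

    losses_cp maps a move number to (engine best score - played score) in cp.
    Assumption: a loss exactly equal to the cutoff is annotated.
    """
    return [move for move, loss in losses_cp.items() if loss >= cutoff_cp]

losses = {"21": 0, "22": 3, "23": 9, "24": 40}  # made-up values
print(annotated_moves(losses, 8))
print(annotated_moves(losses, 1))
```

At 8cp the small 3cp slip goes unremarked; at a 1cp window it is flagged too, which is closer to counting every deviation as G&B-style scoring does.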

Regards,
Martin Brown

 




All times are GMT +1.


Copyright ©2004-2008 ChessBanter, part of the NewsgroupBanter project.
The comments are property of their posters.