  #236  
May 20th 07, 11:10 PM, posted to rec.games.chess.misc, rec.games.chess.computer
David Kane
external usenet poster
 
Posts: 1,105
Greatest chess players ever? Capa, Kramnik, Karpov, Kasparov, *in that order* (cuz 'puters don't lie!)


"Ron" wrote in message
...
In article WHC3i.1755$xP.1292@trndny04,
"Chess One" wrote:

does the result make sense to strong chess players?


The problem here is what's known as confirmation bias.

The test agrees with what our intuition tells us, therefore it must be a
good and valid test. You use the test to check your intuition and your
intuition to check the test - it's perfectly circular reasoning.

But our intuition can be wrong.

You avoid confirmation bias by setting up rigorous standards for a test
before you run it, which was clearly not done in this case.

E.g., if I were going to run this test, this is how I'd do it.

1. Select the players to be tested.
2. Since we're trying to measure players at their peak, pick five major
tournaments which they won (or slightly more, or slightly less, to even
out the sample size, aiming for the same number of games). A public
consensus of each player's best tournaments would be a reasonable
starting point. Tournaments - as opposed to matches - should give us
more variability in the types of positions reached, and thus help wash
out the bias if the computer struggled with some positions more than
others.
3. Analyze the games with the strongest available engine, with enough
analysis time that it would be expected to be competitive with top
Grandmasters today. Probably you would not include analysis of the first
5-10 moves of each game, although we'd have to find a logical, objective
methodology to mark the starting point of each game, which is not easy.
You want to be very careful about not scoring a player down because his
taste in openings is different from the computer's, and you want to
avoid giving later players credit for working in an era of higher
quality opening theory.
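
As a rough sketch of the scoring in step 3 (not the method used in the
study under discussion): assume the engine evaluations have already been
computed and stored per move, in centipawns from the mover's point of
view. The function name, the input layout, and the default cutoff of 8
moves are all made up for illustration; a real cutoff would need the
objective rule mentioned above.

```python
from statistics import mean


def average_loss(evals_best, evals_played, skip_moves=8):
    """Mean evaluation loss per move for one player in one game.

    evals_best[i]   -- evaluation after the engine's preferred move at
                       the player's i-th move (centipawns, mover's view)
    evals_played[i] -- evaluation after the move actually played
    skip_moves      -- number of opening moves left out of the score
                       (hypothetical default, not taken from any study)

    Returns None if no moves remain after skipping the opening.
    """
    if len(evals_best) != len(evals_played):
        raise ValueError("evaluation lists must have the same length")
    # A played move is never scored as better than the engine's choice,
    # so each loss is floored at zero.
    losses = [
        max(0, best - played)
        for best, played in zip(evals_best[skip_moves:], evals_played[skip_moves:])
    ]
    return mean(losses) if losses else None
```

A lower value means the player's moves stayed closer to the engine's
choices. Varying skip_moves shows how sensitive the score is to where
the "opening" is assumed to end.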

If you wanted to test this test, before you ran it, you could pick
specific tournaments and matches, and see how it did. Running all the
games from, say, the Zurich 1953 Candidates tournament and seeing if it
picked out Smyslov as the best player would be very interesting. If it
didn't, we'd be stuck with an interesting decision: whether the
program was inaccurate, or whether the tournament did a poor job of
selecting the best player.


We already have a ranking system (Elo). Like any ranking system,
it has weaknesses, but Elo's errors can be minimized by picking
games between contemporaneous players who played regularly
against each other during a short period of time. This gives us something
that we can use to test rating-by-move-analysis. What we should then
do is put forth trial move-rating algorithms and see which ones (if any)
predict results as well as or better than Elo does, being careful to make
sure that the time control is controlled for in the sample games. We
should not prejudge that 12-ply Crafty is adequate or inadequate based on
some vague intuition. We should empirically determine how well n-ply Crafty
works for a variety of n. That way we can learn something. If, for example,
12-ply Crafty predicts results as well as 14-ply Crafty, then that would
suggest that the sample games are decided by shorter combinations. On
the other hand, if we see the predictive power of rating-by-move-analysis
increase as we increase the strength of the rating tool, then we'd also have
useful information as to how to proceed.
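
To make "predicts results as well as Elo" concrete, here is a minimal
sketch. The expected-score formula is the standard Elo one. The use of
the mean squared error (Brier score) as the yardstick is my own choice,
not something established in this thread. How a move-analysis rating gets
turned into a predicted score is left open: the sketch only assumes that
each method hands over one predicted score per game.

```python
def elo_expected(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def brier(predictions, results):
    """Mean squared error between predicted and actual scores.

    predictions -- predicted score for player A in each game (0.0 to 1.0)
    results     -- actual score for player A: 1 win, 0.5 draw, 0 loss
    Lower is better; 0.0 would be a perfect predictor.
    """
    if not predictions or len(predictions) != len(results):
        raise ValueError("need equally long, non-empty lists")
    total = sum((p - r) ** 2 for p, r in zip(predictions, results))
    return total / len(predictions)
```

You would compute brier() once with the Elo predictions and once for
each n-ply variant over the same sample games, then see whether the
error drops as n goes up.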

We should never expect perfect correspondence between different
rating methods. All rating methods have error bars, and will have different
ranges of validity. That said, there should be domains where they
overlap and can be compared. The interesting thing is that once you have
the rating-by-move-analysis tool, you can think about answering
questions like: How good was Morphy compared to modern players?
What happens to the quality of play when you change the time control?
Is the quality of rated games different from that of unrated games? How
much does an influx of young improving players deflate Elo ratings?
These are questions that either in principle or in practice cannot be
adequately answered by Elo alone.
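
On the error bars: one simple way to put an interval on a move-analysis
score is a percentile bootstrap over the per-game values. This is only a
sketch under my own assumptions. The function name, the 2000 resamples,
and the fixed seed are arbitrary, and resampling whole games treats them
as independent, which is debatable for games from the same tournament.

```python
import random


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of per-game scores.

    values -- one score per game (e.g. average loss per move)
    Returns (low, high) bounds for the mean at the given level.
    """
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(values)
    # Resample the games with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    low_index = int((1 - level) / 2 * n_resamples)
    high_index = max(low_index, int((1 + level) / 2 * n_resamples) - 1)
    return means[low_index], means[high_index]
```

If the intervals for two players overlap heavily, the sample is too
small to rank them by this measure.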

