A Chess forum. ChessBanter


Technical aspects of historical ratings system



 
 
August 11th 03, 08:01 PM
Jeremy Spinrad

This post deals with the details of the rating system I used for historical
evaluation of chess players, and some choices which I made in this system.

Before the choice of any rating system, it is important to discuss your choice
of data for these old rating schemes. I decided to use tournaments, matches,
and friendly series which were taken seriously enough to be noted in
sources at the time. Many results have different versions, and I used my
best judgement in many cases. In other cases, where the discrepancy was too
large, I had to discard the result. The data was derived from what I keep
on my webpage; if anyone wants to see the exact choices I made when results
are in doubt, I would be happy to send the program or answer questions as they
arise.

Incidentally, I also rated handicap matches. I let P+1 count for 100 rating
points, P+2 or exchange count as 200, N as 300. These are, of course, all
subjects for possible debate.
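
As a rough illustration, the handicap values above could be kept in a small
lookup table. How the points enter the calculation is not spelled out in
this post; the sketch below assumes the odds-giver is simply treated as that
many points weaker for the game. The names HANDICAP_POINTS and
effective_rating are hypothetical.

```python
# Handicap values as given in the post. The readings in the comments
# (pawn and move, etc.) are my interpretation of the abbreviations.
HANDICAP_POINTS = {
    "P+1": 100,       # presumably pawn and move
    "P+2": 200,       # presumably pawn and two moves
    "exchange": 200,  # odds of the exchange
    "N": 300,         # knight odds
}


def effective_rating(giver_rating, odds):
    """Rating at which the odds-giver is treated for one handicap game.

    Assumption: the handicap is subtracted from the odds-giver's rating.
    The post only says how many points each handicap counts for.
    """
    return giver_rating - HANDICAP_POINTS[odds]


if __name__ == "__main__":
    # A 2500 player giving knight odds is treated here as a 2200 player.
    print(effective_rating(2500, "N"))
```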

For initial ratings of players, I used a combination of what I felt the
perceived strength of the player was at the time, and a strength chosen to be
consistent with their earliest results. On occasion, we get direct evidence
of perceived strength; for example, the London 1862 handicap tournament
divides players into divisions on the basis of perceived strength. More
frequently, it was a judgement call on my part. I tried to go by the
following initial rules: 2500 = perceived clear world champ, 2400 = perceived
contender for world champ, 2300 = players appropriate for a match with
anyone, 2200 = other strong players, and went down from there.

My choices for rating formula varied somewhat depending on which aspect of
ratings I wanted to capture. Given the lack of good justification for my
initial ratings and the infrequency of play between players from different
regions, I used a very high K factor when trying to model the view of
people at the time (who have no knowledge of future events). For a single
game, a win would be worth 24 points +/- 6% of the difference in ratings,
with the difference capped at 350 points. When I was aiming instead for a
more accurate rating of players, which involved going both backward and
forward over the data (and thus went over the same result multiple times),
I lowered the K factor and made this 16 +/- 4% of the difference; other
values could also be justified.
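
A minimal sketch of the single-game rule, under my reading that the winner
gains the base amount plus or minus the stated percentage of the rating
difference (more for beating a higher-rated opponent, less for beating a
lower-rated one). The function name win_gain and its parameters are
hypothetical, not taken from the actual program.

```python
MAX_DIFF = 350  # rating differences are capped at 350 points (from the post)


def win_gain(own, opp, base=24.0, slope=0.06):
    """Points `own` gains for one win over `opp`.

    base/slope = 24/0.06 is the "contemporary view" setting and
    16/0.04 the lower-K "accuracy" setting described in the post.
    The sign convention is an assumption on my part.
    """
    diff = max(-MAX_DIFF, min(MAX_DIFF, opp - own))
    return base + slope * diff


if __name__ == "__main__":
    for own, opp in [(2300, 2300), (2200, 2400), (2400, 2200), (2000, 2500)]:
        print(own, opp, round(win_gain(own, opp), 2))
```

Under this reading, a win is worth between 3 and 45 points with the high
K factor, and between 2 and 30 with the lower one.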

I let the K factor slide down with the number of games played in what I view
as a "natural" way. We start with the simple case of a player who has only
one opponent during a given year. Suppose player A beats player B in
a match 3 games to 1. I would rate the match as a sequence of games, in each
of which player A gets 3/4 of the points they would get for a win minus 1/4
of the points player B would get for a win. This is not a constant K factor:
as the distance between the players' ratings changes as a result of earlier
games, the number of rating points gained/lost for later games also changes.
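
The match rule can be sketched as follows, assuming the single-game values
of 24 points +/- 6% of the (capped) difference and assuming that whatever A
gains, B loses. The name rate_match is hypothetical.

```python
def rate_match(a, b, score_a, games, base=24.0, slope=0.06, cap=350):
    """Rate a match between A and B as `games` identical fractional games.

    In every game A receives frac * (A's gain for a win) minus
    (1 - frac) * (B's gain for a win), where frac is A's overall score
    fraction. Ratings are updated after each game, so later games are
    worth less as the ratings move toward the result.
    """
    frac = score_a / games
    for _ in range(games):
        diff = max(-cap, min(cap, b - a))
        gain_if_a_wins = base + slope * diff
        gain_if_b_wins = base - slope * diff
        delta = frac * gain_if_a_wins - (1 - frac) * gain_if_b_wins
        a += delta
        b -= delta  # assumption: the exchange of points is zero-sum
    return a, b


if __name__ == "__main__":
    a, b = rate_match(2300, 2300, score_a=3, games=4)
    print(round(a, 2), round(b, 2))
```

With these numbers the first game of a 3-1 match between equally rated
players is worth 12 points to A, the second 10.56, and so on; in a very
long match at the same score the gap would level off near 200 points.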

For players who played a number of opponents in a given year, I computed
an average opponent rating (again with 350 point limits of difference), and
initially rated this as a series of games vs this average opponent. I had to
modify this later, as I will explain.
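
A small sketch of the average-opponent step. Two details are assumptions on
my part: each opponent's rating is clipped to within 350 points of the
player before averaging, and the average is weighted by games played. The
name average_opponent is hypothetical.

```python
def average_opponent(own, results, cap=350):
    """Average opponent rating for one player-year.

    `results` is a list of (opponent_rating, games_played) pairs.
    Each opponent rating is first limited to the range own +/- cap.
    """
    total_games = sum(games for _, games in results)
    clipped_sum = sum(
        max(own - cap, min(own + cap, rating)) * games
        for rating, games in results
    )
    return clipped_sum / total_games


if __name__ == "__main__":
    # The 2800 opponent counts as 2650 for a 2300 player.
    print(average_opponent(2300, [(2400, 2), (2800, 2)]))
```

The result can then be treated as the rating of a single opponent for a
series of games covering the whole year.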

Rating deflation seems to be a big problem (if a player such as St Amant
establishes a rating in the 1840s and the system deflates, he will look
much better than he should when he comes back and plays a few games in the
1850s) with an easy solution. Historical ratings can be run both forwards
and backwards, and this allows you to reinflate a deflated system by running
it backwards (because, on average, the players in the system were better at
the end of their careers than at the beginning). For rating accuracy, I
figured that simply averaging the forward rating and backward rating for the
year after some number of iterations would suffice.

However, I missed another source of deflation which strikes me as much more
subtle. Players who play multiple matches in a year, and thus have lower
K factors, very frequently turn out to be rapidly improving players. This is
true both in knockout tournaments, where the winner plays more games, and in
matches, where the typical player with many matches is the young player
first coming into his own (you can find this for Staunton, for Morphy, and
others).

Fortunately, I had prepared a solution for this, though deflation was
not my original motivation. If two players who are initially 100 points
apart play a long match with an even score, it seems clear that the lower
rated player should eventually gain up to 50 points and the higher player
lose up to 50; there is no reason to think one or the other is rated more
incorrectly. On the other hand, if a player plays multiple opponents with
average rating 100 points above him and gets an even score, it seems more
likely that the player is underrated than that all the others are overrated.

To deal with this, I had the program not only calculate the average rating
of opponents for a player during the year, but also the percentage of games
played against their most common single opponent. I then subdivided my
series of games for the year; instead of playing a series of games against
a single average opponent, the player plays this series simultaneously
against an average "most common" opponent, whose rating will change during
the series, and against the remaining opponents, whose ratings do not
change during the series. This mitigates (though does not completely
eliminate) the deflation, and also seemed intuitively satisfactory to me.

It had a serious effect on the ratings, by the way; for example,
Anderssen in 1851 gained far more points by performing well against a
set of opponents than he did when this was treated as a single opponent,
and I think this was both more appropriate and did a better job of
reflecting perceptions at the time.
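
A rough sketch of the split series. The post does not give the exact
bookkeeping, so the following are all assumptions: each step of the series
is weighted by the share of games against the most common opponent, the two
parts are scored with their own score fractions, and the single-game values
are 24 points +/- 6% of the capped difference. The names step_delta and
rate_year are hypothetical.

```python
def step_delta(own, opp, frac, base=24.0, slope=0.06, cap=350):
    """Points gained in one fractional game with score fraction `frac`."""
    diff = max(-cap, min(cap, opp - own))
    return frac * (base + slope * diff) - (1 - frac) * (base - slope * diff)


def rate_year(own, main_opp, rest_avg, main_share, frac_main, frac_rest, games):
    """Rate one player-year against a moving and a fixed opponent at once.

    main_opp  : rating of the most common opponent (moves during the series)
    rest_avg  : average rating of the remaining opponents (held fixed)
    main_share: fraction of the year's games played against main_opp
    """
    for _ in range(games):
        d_main = main_share * step_delta(own, main_opp, frac_main)
        d_rest = (1 - main_share) * step_delta(own, rest_avg, frac_rest)
        own += d_main + d_rest
        main_opp -= d_main  # only the most common opponent's rating moves
    return own, main_opp


if __name__ == "__main__":
    # Even score against opposition rated 100 points higher, 20 games.
    for share in (1.0, 0.5, 0.0):
        own, _ = rate_year(2200, 2300, 2300, share, 0.5, 0.5, games=20)
        print(share, round(own - 2200, 1))
```

In this sketch a player with an even score against a single opponent 100
points higher can gain at most 50 points, while the same score against many
different opponents can be worth up to 100, which is the asymmetry described
above.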

I doubt anyone wants more details on this, but if you do, feel free to
write me or ask on the newsgroup, and I will do my best to answer.

Future posts will involve anomalies and why I feel there must be a human
in the rating loop, and of course my results.

Jerry Spinrad



