This post deals with the details of the rating system I used for the historical evaluation of chess players, and with some of the choices I made in that system.

Before choosing any rating system, it is important to discuss the choice of data for these old rating schemes. I decided to use tournaments, matches, and friendly series that were taken seriously enough to be noted in sources at the time. Many results exist in different versions, and I used my best judgement in many cases; where the discrepancy was too large, I had to discard the result. The data was derived from what I keep on my webpage. If anyone wants to see the exact choices I made where results are in doubt, I would be happy to send the program or answer questions as they arise.

Incidentally, I also rated handicap matches. I let P+1 count for 100 rating points, P+2 or the exchange count as 200, and N as 300. These are, of course, all subjects for possible debate.

For initial ratings of players, I used a combination of the player's perceived strength at the time and a strength chosen to be consistent with their earliest results. Occasionally we get direct evidence of perceived strength; for example, the London 1862 handicap tournament divides players into divisions on the basis of perceived strength. More often, it was a judgement call on my part. I tried to follow these initial rules: 2500 = perceived clear world champion, 2400 = perceived contender for the world championship, 2300 = a player suitable for a match with anyone, 2200 = other strong players, and lower values from there.

My choice of rating formula varied somewhat depending on which aspect of ratings I wanted to capture. Given the weak justification for my initial ratings and the infrequency of play between players from different regions, I used a very high K factor when trying to model the view of people at the time (who had no knowledge of future events). For a single game, this is 24 points +/- 6% of the difference in ratings, with the difference capped at 350 points.
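A minimal Python sketch of the single-game rule, under one reading of it: the winner gains `base` points plus `pct` times the rating gap (opponent minus player, capped at +/- 350), so the lower-rated player gains more for a win, and the loser loses the same amount. How draws are handled is not stated in the post; giving a draw only the gap term is an assumption here. The handicap table reuses the point values above, but applying them by adding the value to the odds-receiver's rating is also an assumption, and all function names are hypothetical. Under this reading the rule coincides with an Elo-style update using K = 2 x base and a linear expected score of 0.5 + (rating difference)/800, since both settings in the post (24/6% and 16/4%) satisfy pct = base/400.

```python
"""Sketch of a linear single-game rating rule (hypothetical names throughout)."""

# Point values for handicap odds as listed in the post; applying them by
# adding them to the odds-receiver's rating is an assumption of this sketch.
HANDICAP_POINTS = {"P+1": 100, "P+2": 200, "exchange": 200, "N": 300}


def clip(x: float, cap: float) -> float:
    """Limit x to the range [-cap, cap]."""
    return max(-cap, min(cap, x))


def game_change(rating: float, opp: float, score: float,
                base: float = 24.0, pct: float = 0.06,
                cap: float = 350.0) -> float:
    """Rating change for `rating` after one game against `opp`.

    score is 1 (win), 0.5 (draw, assumed handling) or 0 (loss).
    Win: +(base + pct*gap); loss: -(base - pct*gap), where
    gap = opp - rating, capped at +/- cap.
    """
    gap = clip(opp - rating, cap)
    return base * (2.0 * score - 1.0) + pct * gap


def linear_elo_change(rating: float, opp: float, score: float,
                      k: float = 48.0, cap: float = 350.0) -> float:
    """Elo-style K * (score - expected) with a linear expected score."""
    expected = 0.5 + clip(rating - opp, cap) / 800.0
    return k * (score - expected)


def game_change_with_odds(receiver: float, giver: float,
                          receiver_score: float, odds: str, **kw) -> float:
    """Change for the player receiving odds, counting the odds as rating."""
    return game_change(receiver + HANDICAP_POINTS[odds], giver,
                       receiver_score, **kw)
```

With the default settings, `game_change(1500, 1600, 1)` gives the underdog 30 points, `game_change(1600, 1500, 1)` gives the favourite 18, and a win against a player 500 points stronger is worth 45 because the gap is capped at 350. Passing `base=16.0, pct=0.04` gives the lower-K variant used for accuracy runs.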
When I aimed instead for a more accurate rating of players, which involved going both backward and forward over the data (and thus went over the same results multiple times), I lowered the K factor and made it 16 +/- 4% of the difference; other values could also be justified.

I let the K factor slide down with the number of games played in what I view as a "natural" way. Start with the simple case of a player who has only one opponent during a given year. Suppose player A beats player B in a match 3 games to 1. I rate the match as a sequence of games in each of which player A gets 3/4 of the points they would get for a win minus 1/4 of the points player B would get for a win. This is not a constant K factor: as the distance between the players' ratings changes as a result of earlier games, the number of rating points gained or lost in later games also changes.

For players who played a number of opponents in a given year, I computed an average opponent rating (again with the 350-point limit on differences), and initially rated the year as a series of games against this average opponent. I had to modify this later, as I will explain.

Rating deflation seems to be a big problem, but one with an easy solution. (If a player such as St Amant establishes a rating in the 1840s and the system deflates, he will look much better than he should when he comes back and plays a few games in the 1850s.) Historical ratings can be run both forwards and backwards, and this allows you to reinflate a deflated system by running it backwards, because on average the players in the system were better at the end of their careers than at the beginning. For rating accuracy, I figured that simply averaging the forward and backward ratings for each year, after some number of iterations, would suffice.

However, I missed another source of deflation, which strikes me as much more subtle. Players who play multiple matches in a year, and thus have lower K factors, very frequently turn out to be rapidly improving players.
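The match rule can be sketched the same way. The hypothetical `rate_series` helper below (not the author's program) replays a match as a sequence of games at a fixed per-game score fraction, updating both ratings after every game so that later games use the changed gap. For a 3-1 match, a score fraction of 0.75 per game reproduces "3/4 of a win minus 1/4 of the opponent's win" under the linear rule. `average_opponent` computes a game-weighted average opponent rating with each opponent first clipped to within 350 points of the player, which is one assumed reading of the "350 point limits". The single-game rule is repeated so the block runs on its own.

```python
"""Sketch: rating a match or season as a sequence of games (hypothetical names)."""


def clip(x: float, cap: float) -> float:
    """Limit x to the range [-cap, cap]."""
    return max(-cap, min(cap, x))


def game_change(rating, opp, score, base=24.0, pct=0.06, cap=350.0):
    """Linear single-game rule: base*(2*score - 1) + pct * capped(opp - rating)."""
    return base * (2.0 * score - 1.0) + pct * clip(opp - rating, cap)


def rate_series(r_player, r_opp, score_frac, n_games, opp_moves=True, **kw):
    """Replay n_games games in which the player scores score_frac each game.

    Ratings are updated after every game, so each later game is rated
    using the gap left by the earlier ones.
    """
    for _ in range(n_games):
        delta = game_change(r_player, r_opp, score_frac, **kw)
        r_player += delta
        if opp_moves:
            r_opp -= delta  # zero-sum between the two players
    return r_player, r_opp


def average_opponent(r_player, opponents, cap=350.0):
    """Game-weighted mean opponent rating.

    opponents: list of (opponent_rating, games_played) pairs. Each rating is
    first clipped to within `cap` points of the player (an assumed reading).
    """
    total_games = sum(g for _, g in opponents)
    weighted = sum((r_player + clip(r - r_player, cap)) * g
                   for r, g in opponents)
    return weighted / total_games
```

For the 3-1 example with both players starting at 1500, `rate_series(1500.0, 1500.0, 0.75, 4)` leaves A at about 1540 and B at about 1460, rather than the 1548/1452 that a fixed 12 points per game would give. Over a very long match the gap approaches 200 points, where the linear expected score equals 0.75 and the change per game falls to zero.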
This is true both in knockout tournaments, where the winner plays more games, and in matches, where the typical player with many matches is a young player first coming into his own (you can see this for Staunton, for Morphy, and others). Fortunately, I had prepared a solution for this, though deflation was not my original motivation.

If two players who are initially 100 points apart play a long match with an even score, it seems clear that the lower-rated player should eventually gain up to 50 points and the higher-rated player lose up to 50; there is no reason to think one is rated more incorrectly than the other. On the other hand, if a player plays multiple opponents with an average rating 100 points above him and gets an even score, it seems more likely that the player is underrated than that all the others are overrated.

To deal with this, I had the program calculate not only the average rating of a player's opponents during the year, but also the percentage of games played against the most common single opponent. I then subdivided the series of games for the year: instead of playing a series of games against a single average opponent, the player plays this series simultaneously against an average "most common" opponent, whose rating changes during the series, and against the remaining opponents, whose ratings do not change during the series. This mitigates (though does not completely eliminate) the deflation, and it also seemed intuitively satisfactory to me. It had a serious effect on the ratings, by the way; for example, Anderssen in 1851 gained far more points by performing well against a set of opponents than he did when this was treated as a single opponent. I think this was both more appropriate and a better reflection of perceptions at the time.

I doubt anyone wants more details on this, but if you do, feel free to write to me or ask on the newsgroup, and I will do my best to answer.
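The subdivided scheme can be sketched as below. This is one possible reading with hypothetical names, not the author's program: each game of the year's series is split, with weight `share` (the fraction of games against the most common opponent) going to that opponent, whose rating moves, and the remaining weight going to a fixed average of the other opponents. Using one score fraction for both parts is an assumption; the post does not say how results against the two groups are separated. The earlier single-average-opponent scheme is included for comparison.

```python
"""Sketch: single average opponent vs. the subdivided scheme (hypothetical)."""


def clip(x: float, cap: float) -> float:
    """Limit x to the range [-cap, cap]."""
    return max(-cap, min(cap, x))


def game_change(rating, opp, score, base=24.0, pct=0.06, cap=350.0):
    """Linear single-game rule: base*(2*score - 1) + pct * capped(opp - rating)."""
    return base * (2.0 * score - 1.0) + pct * clip(opp - rating, cap)


def season_single_average(r, opp_avg, score, n_games, **kw):
    """Earlier scheme: the year is a series against one average opponent
    whose rating moves opposite to the player's."""
    for _ in range(n_games):
        d = game_change(r, opp_avg, score, **kw)
        r += d
        opp_avg -= d
    return r


def season_split(r, common, rest_avg, share, score, n_games, **kw):
    """Revised scheme (assumed reading): each game is split, with weight
    `share` against the most common opponent (whose rating moves) and
    1 - share against the average of the remaining opponents (held fixed)."""
    for _ in range(n_games):
        d_common = game_change(r, common, score, **kw)
        d_rest = game_change(r, rest_avg, score, **kw)
        r += share * d_common + (1.0 - share) * d_rest
        common -= share * d_common  # only the common opponent's rating changes
    return r
```

For a player at 1500 scoring 50% over eight games against opponents averaging 1600, `season_split(1500.0, 1600.0, 1600.0, 0.25, 0.5, 8)` ends higher than `season_single_average(1500.0, 1600.0, 0.5, 8)`, because most of the opposition no longer drifts down to meet the player. Setting `share=1.0` recovers the single-opponent case, in which a very long even match between players 100 points apart moves each of them about 50 points.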
Future posts will cover anomalies, why I feel there must be a human in the rating loop, and of course my results.

Jerry Spinrad