#91
"Inconnux" wrote in message news:EvxZh.17127$_G.14256@edtnps89...

Perhaps you just don't understand what we are getting at... In this study they crippled Crafty down to 12 ply. Now what we are trying to get at is that many (if not all) of the world champions saw well beyond 12 ply. Crafty would mark these moves as errors, where a stronger program would note that they were not. This is due to the playing styles of each of the world champions. Tal and Alekhine were calculating machines and would often complicate a position... The crippled Crafty would simply mark these as errors.

Two people who often post in these newsgroups [both with PhDs] are now almost ready to publicly produce their reviews of the new MAMS book, How to Fool Fritz, which addresses many of these subjects.

-----------

A larger question looms from this thread: have you people not learned anything after nearly a generation of computer chess? That the 'puter is never wrong?

lol but different programs are definitely wrong. I have used several programs to examine my games, from Chessmaster 10k, Fritz 7 and Fritz 8 to Rybka. It's amazing to see the difference in what each program believes is 'right'. Guess which one is closest to choosing the 'right' answer? I put my money on the strongest program.

Although MAMS concentrates on Fritz, so that at least the same evaluation can be produced uniformly throughout the book, the commentary makes clear that any software program can be evaluated this way. But the thesis of the book is much as Inconnux suggests above re Alekhine and Tal: that software evaluation often provides a lousy guide to complex positional situations. In fact, it is usually 'blind' to evaluating the respective worth even /within/ its search depth.

Sometimes by overriding the computer move just a few times [or even once] during the game, a radically different evaluation shows up just a few moves later. These intercessions are usually positional, and somehow Fritz can't find the same line itself, but after being shown it is quite capable of carrying on and winning the game.

An end-note is that author Alberts suggests various means of successfully playing against Fritz's blindness to positional evaluation.

Cordially, Phil Innes
|
#92
"David Richerby" wrote in message ...

help bot wrote:

As I have noticed over the years, the status on the computer rating list *changes* over time.

No ****, Sherlock!

For instance, at one time there was a big difference between chess programs from say, 1980, where now all such programs are "compressed" near the bottom of the current list. Old magazine ads might list a Mephisto at 2200, and a Fidelity at 1800, while now you could find both programs having been beaten to a pulp by their successors, scrunched together at say 1900 and 1650.

You seem to be making the mistake of assuming that `2200' means some fixed level of strength. (Otherwise, it would be entirely unremarkable that a program that formerly scored 2200 now scores 1900.) Ratings do not measure strength.

I never really understand that comment. I.e., (a) what in your opinion does measure strength, and (b) what do ratings measure?

Phil Innes

Dave.
--
David Richerby
www.chiark.greenend.org.uk/~davidr/
Pointy-Haired Newspaper (TM): it's like a daily broadsheet that's completely clueless!
|
#93
Chess One (Phil Innes) wrote: I never really understand that comment. That's pretty much true of *every* comment. When, as is true in your case, someone values stroking their own inflated ego higher than understanding or learning, they remain in willful ignorance. |
|
#94
On May 2, 1:00 am, Martin Brown wrote:
On May 2, 12:35 am, raylopez99 wrote:
On Apr 30, 2:39 am, Martin Brown wrote:

I am not convinced that scoring human GMs by how closely their play resembles any particular named chess engine has merit.

Premise #1

Fact: The various named chess engines produce significantly different move rankings in key positions. This is already well studied in the literature. See for example this study of Fritz8 vs Junior9, which represent two extremes, online at http://www.dcs.bbk.ac.uk/~mark/download/fritz_junior_icga.pdf

Thanks for this citation. This paper, whose abstract and a key paragraph I reproduce below, does not support anything of relevance to this thread. All it shows is that the various chess engines produce different move rankings. What is key, which the paper does not address, and which I intuitively surmise (having played various engines over the years), is whether the top few computer-generated moves, when closely ranked together, are indeed the best moves in any given position. That is to say, whether these moves lead to winning positions (unless counteracted by another move of course). This is the key, not whether BxN or NxN is ranked first or second.

The world's greatest ever chess player rankings should not be a function of the engine they are compared against!

But they are not. NORMALIZATION, I repeat. What this means is that you use ONE program, and you set the Rand() function seed to zero, so that the SAME repeatable move selection algorithm is used to rate every human player. So you will never have players ranked as a function of the engine, except for the trivial example where the engine goes through an evaluation to determine whether a given move is sound. For that matter, you can employ a human to mechanically go through an algorithm, if you fear computers.
Further, if you object to an algorithm being used to determine chess moves, then say so, and be laughed at, given that microprocessors and chess software have shown they can beat or draw the best human in a match.

Who is to say which of Crafty, Shredder, Junior, Fritz, Rybka, Fruit etc etc is the closest approximation to optimum GM play.

That's not the test, whether PCs play close to GMs. In fact, it's well known that computers play differently from humans. And again, normalization means it doesn't matter which of these engines is used, since largely they all use the famous alpha-beta and minimax algorithms, with pruning, and certain ranking functions like scoring positions with open files, central pawn rollers, good bishops, pins and the like more than the obverse.

I suspect Crafty was only marginally adequate for this test, but looking at the apparent correlation of the blunder rates with overall rms player error in the original paper I think they do have a point. It will be interesting to see what happens if/when the test is repeated with other engines and a hefty search depth.

Yes, that would be interesting, but it may not change the rankings much, see the above.

My point here is that comparing them to a single engine produces an inherent systematic bias in favour of players with a style similar to the specific named engine and in this case at a rather limited ply depth.

Unsupported by any evidence. This is your intuition, and my intuition says the opposite (see the above). Of course in Chessmaster xxxx you can set the parameters slightly differently so the 'puter 'plays like' Capa, or Fischer, or Petrosian, but at the end of the day, if you use the same parameters to rate all humans, you will not differ much in the ranking of their play, I intuit, since chess is largely tactics.
I reckon that by about ply 22* with extensions any of the top engines would be able to annotate GM level games authoritatively (though some would take much longer than others to do it).

*The main exceptions are in nasty endgame transitional positions with active high mobility pieces but well out of range of the tablebases, where even the top engines can still get lost.

Maybe. I think they are already at 15 ply or so, without much pruning, no? But this is immaterial to this thread. I've noticed that at five seconds Fritz largely scores the winning moves (top 3) the same as at 30 seconds, and probably (never had time to test this) the same at 180 seconds. Of course certain positions are exceptions, that readers of this thread will gleefully point to, but these are rare exceptions, not the rule.

Perhaps ranking them by percentage blunder rate might be meaningful though (and well within the capability of any good chess engine). It is surprising how effective blunder check can be even on GM level games given sufficient time.

What I am saying here is that the detection of GM *blunders* is well within the capacity of any of the half decent chess engines and that these results are unambiguous. The problem with this is that the unforced error rate of top players is very low, so this factor only determines the outcome of a small percentage of games.

Yes, you're right, I understood this point about blunders, agreed-- blunders are rare and rarely decide games. Thanks for the link to the Hawaiian professor's site on error rates, which was interesting.

You must have pretty dumb "logic" if you cannot see the difference between detecting *blunders* in games and scoring players according to much smaller deviations from the engine's preferred "best" line. Even more so when the engine was not allowed sufficient time to look deep enough to match or exceed GM strength play.

So, are you for or against the proposal made by the authors of the original paper that started this thread?
Seems that you are both for and against. Please take a stand.

I believe that where the engine evaluation has sufficient signal to noise to make a clear call on the best move being different to the GM choice the methodology will work just fine.

Or moves (top two or three moves) where the top two or three moves are not substantially different but equally lead to winning positions. We agree.

However, there are a lot of positions in most games where the continuation lines are too close to call even with the current crop of state of the art engines.

Yes, but think logically for once Martin: if these continuation lines are "too close to call", how much are the human GMs penalized by Crafty? Answer: very little! Because the difference between what the GM chose (let's say move 3 of the top three by Crafty) and what Crafty chose (the first move) is, by definition, very small. So let's say in centipawns the 'best' Crafty move is +85, while the actual GM move was rated +75, meaning a penalty of 10 centipawns is applied. Overall, this is a trivial penalty, because the continuation lines were too close to call. However, if the GM move chosen rates only +5, then rightly Crafty is penalizing the GM a hefty 80 centipawns, and clearly, based on the superior knowledge that PCs have shown to have about the game of chess, the GM is (usually) picking an inferior line. Of course there are exceptions--most notably a positional sacrifice (not a pseudosacrifice, I trust you know the difference)-- well beyond the move horizon of the program, but these exceptions are rare in chess (which is why they are so delightful when seen). BTW on this last point: my Pentium IV PC at 30 seconds a move is great at scoring exchange sacs in the Sicilian where in certain lines Black exchanges QR for QK at about even--showing that indeed processors are not as bad as people think at even scoring positional sacrifices.
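A minimal sketch of the penalty arithmetic in the +85 / +75 / +5 example above, in Python. This is not the code used in the original study: the function names are hypothetical, the penalty is treated as a non-negative shortfall in centipawns, and the 100 cp blunder cutoff is an assumed default.

```python
def centipawn_loss(best_cp, played_cp):
    """Shortfall of the played move versus the engine's top move, in centipawns.

    Clamped at zero so a move the engine happens to score higher than its
    own first choice is not rewarded (an assumption of this sketch).
    """
    return max(0, best_cp - played_cp)


def summarize(moves, blunder_threshold_cp=100):
    """Return (mean centipawn loss, blunder rate) for a list of moves.

    `moves` is a list of (best_cp, played_cp) pairs. The threshold is an
    assumed default, not a value taken from the study's code.
    """
    losses = [centipawn_loss(best, played) for best, played in moves]
    mean_loss = sum(losses) / len(losses)
    blunder_rate = sum(1 for loss in losses if loss >= blunder_threshold_cp) / len(losses)
    return mean_loss, blunder_rate


# The two cases from the post: a near-tie and a clearly inferior choice.
moves = [(85, 75), (85, 5)]
print([centipawn_loss(best, played) for best, played in moves])  # [10, 80]
print(summarize(moves))  # (45.0, 0.0)
```

Under this scoring the near-tie costs 10 cp and the inferior line costs 80 cp, and neither crosses the assumed blunder cutoff.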
Although the paper makes the claim that these will average out - the systematic bias in favour of playing like the specific named engine will not.

Normalization. Irrelevant, since the 'heart' of these engines is the same and chess is primarily tactics.

RL

From the paper:

Anecdotal evidence exists that in many positions two distinct chess engines will choose different moves and, moreover, that their top-n ranking of move choices also differ. Here we set out to quantify this difference, including the difference between move choices by chess engines and those made by humans. For our analysis we used FRITZ 8 and JUNIOR 9 as representative chess search engines and the POWERBOOK opening book as representing human choices. We collected the top-5 ranked moves and their scores as reported by FRITZ and JUNIOR, after 15 and 30 minutes of thinking time, and the top-5 moves recorded in the POWERBOOK, for the Nunn2 test positions and the initial board position. The data analysis was carried out using several nonparametric measures, including the amount of overlap in the top-5 choices of the engines and their association as measured by three variants of Spearman's footrule. Our preliminary results show that, overall, the engines differ substantially in their choice of moves, and, furthermore, the engines' choices also differ substantially from human choice.

The results confirm that, overall, the engines differ in their choice of moves. Although the overlap in the top-5 move choices is about 3 on average, the top-1 overlap is close to 0 and the top-2 overlap is close to 1. The F, G, and M measures show that FRITZ and JUNIOR rank moves in a different order, and when there is agreement, it is not necessarily in the top-3 move choices. There is higher agreement between FRITZ's ranking and that of humans than there is between JUNIOR's and humans' rankings. Both FRITZ's and JUNIOR's rankings are stable over time, on average, although there are still fluctuations in the rankings. Furthermore, FRITZ's score difference between moves is slightly higher than JUNIOR's, possibly indicating that FRITZ is 'more confident' in its ranking than JUNIOR is. Finally, the average scores of moves per rank are similar and decreasing with rank, and they indicate a small advantage for White in the positions tested.
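To make the quoted measures concrete, here is a rough Python sketch of top-k overlap and one simple form of Spearman's footrule for two top-k move lists. It is not the paper's F, G or M measure: giving a move that is missing from one list the rank k+1 is an assumption of this sketch, and the move lists are invented for illustration.

```python
def top_k_overlap(list_a, list_b, k):
    """Number of moves that appear in the top k of both rankings."""
    return len(set(list_a[:k]) & set(list_b[:k]))


def footrule_top_k(list_a, list_b, k):
    """Spearman's footrule between two top-k lists.

    Sums |rank in A - rank in B| over every move in either list. A move
    absent from one list is assigned rank k + 1 there (sketch assumption).
    """
    a, b = list_a[:k], list_b[:k]
    rank_a = {move: i + 1 for i, move in enumerate(a)}
    rank_b = {move: i + 1 for i, move in enumerate(b)}
    moves = set(a) | set(b)
    return sum(abs(rank_a.get(m, k + 1) - rank_b.get(m, k + 1)) for m in moves)


# Hypothetical top-5 lists from two engines for the same position.
engine_x = ["e4", "d4", "Nf3", "c4", "g3"]
engine_y = ["d4", "Nf3", "e4", "b3", "f4"]

print(top_k_overlap(engine_x, engine_y, 1))   # 0
print(top_k_overlap(engine_x, engine_y, 2))   # 1
print(top_k_overlap(engine_x, engine_y, 5))   # 3
print(footrule_top_k(engine_x, engine_y, 5))  # 10
```

The invented lists were chosen to mirror the pattern reported in the abstract (overlap near 0, 1 and 3 for top-1, top-2 and top-5); identical lists would give a footrule distance of 0.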
|
#95
Chess One wrote:
David Richerby wrote:

You seem to be making the mistake of assuming that `2200' means some fixed level of strength. (Otherwise, it would be entirely unremarkable that a program that formerly scored 2200 now scores 1900.) Ratings do not measure strength.

I never really understand that comment. IE, (a) what in your opinion does measure strength,

Nothing.

(b) what do ratings measure?

Performance.

We've been through this a hundred times in these groups. While I'm prepared to explain it to newbies, I'm not doing it again for somebody who's been here longer than I have.

Dave.
--
David Richerby
www.chiark.greenend.org.uk/~davidr/
Expensive Sushi (TM): it's like a raw fish but it'll break the bank!
|
#96
wrote in message ...

Chess One (Phil Innes) wrote:

I never really understand that comment.

That's pretty much true of *every* comment. When, as is true in your case, someone values stroking their own inflated ego higher than understanding or learning, they remain in willful ignorance.

troll off nitwit! address the subject, or wait! do you even understand the subject - prove it!

Pi
|
#97
"David Richerby" wrote in message ...

Chess One wrote:
David Richerby wrote:

You seem to be making the mistake of assuming that `2200' means some fixed level of strength. (Otherwise, it would be entirely unremarkable that a program that formerly scored 2200 now scores 1900.) Ratings do not measure strength.

I never really understand that comment. IE, (a) what in your opinion does measure strength,

Nothing.

(b) what do ratings measure?

Performance.

We've been through this a hundred times in these groups. While I'm prepared to explain it to newbies, I'm not doing it again for somebody who's been here longer than I have.

Fine. But though we have been through this so many times, perhaps the reason is that the explanations aren't very convincing? And that's why people continue to challenge it!

I don't understand terms which are undefined, since they can mean about anything. I.e., 'performance' is a measure quantifiable by rating, and isn't performance synonymous with strength, in that people use the terms interchangeably?

Since Dave is perhaps exhausted explaining the issue, can anyone else actually say the difference between asking about the strength of a player in respect of other players, and asking about the performance of a player in respect of other players?

As I say, there may be some worthwhile distinction, and I do not want to slight Dave, except to say that whatever the distinction is escapes me and presumably the previous 100 people who have inquired.

Phil Innes

Dave.
--
David Richerby
www.chiark.greenend.org.uk/~davidr/
Expensive Sushi (TM): it's like a raw fish but it'll break the bank!
|
#98
David Richerby wrote:

(b) what do ratings measure?

Performance.

We've been through this a hundred times in these groups. While I'm prepared to explain it to newbies, I'm not doing it again for somebody who's been here longer than I have.

Ratings measure win/lose predictability in the pool that is rated against.

A side effect of rating is to understand an *implied* strength based on that rating.

It is difficult to assign an implied strength to two kinds of candidates in the pool: those that lose the vast majority of their games, and those that win the vast majority of their games.

When pools are small or closed to other populations, it is difficult to correlate the numbers of one pool to another, especially in implied strength, if not in predictability of win/loss ratio.

Occasional cross checking of the pool (like the Fritz 10/Kramnik match) can help validate the implied strength of the numbers, so long as the match was fair, the results were as predicted, and the match wasn't entirely one-sided.

We have a couple of interesting candidates in the computer pool that haven't been fully calibrated against the human pool: Rybka and Hydra.

But back to subject...

The problem here is that Crafty is well off the pace. There are engines that soundly beat Crafty in what appears to be purely a matter of strength and not just tricks, and those engines are in turn beaten soundly by Rybka.

The other problem is the 100cp measure for a blunder, because that does not appear to be enough.

This means that this comparison raises an interesting set of questions, but the concerns about using a restrained version of Crafty are so severe that, even if it is the only easy method because it is open source, they call into question the validity of its conclusions.

Clearly, there are other opportunities to control the engine and ask the questions, and they are in the researchers' hands: in ways that use engines that are closer to world championship level than Crafty, and in ways that let you use engines of different styles.

You will, ultimately, however, always have to deal with the difficult question of truth, which is only implied through results.
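One way to see the "predictability within a pool" point is a small Python sketch. It assumes the standard Elo expected-score formula, which the thread never names explicitly, and it reuses the 2200/1800 and 1900 figures from earlier posts purely as illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model.

    Depends only on the rating difference, not on the absolute numbers.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A 400-point gap predicts roughly a 91% score for the higher-rated side.
print(round(expected_score(2200, 1800), 3))  # 0.909

# Shift the whole pool down by 300 points: the prediction is unchanged,
# so "2200" by itself says nothing about a fixed level of strength.
print(round(expected_score(1900, 1500), 3))  # 0.909

# Equal ratings predict an even score.
print(expected_score(1500, 1500))  # 0.5
```

Because only differences enter the formula, numbers from two pools that never play each other cannot be compared directly, which is why the occasional cross-pool match matters.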
|
#99
Ron wrote:
It seems unfair to penalize a player who continues to play once his skills have deteriorated. Another interesting point. Which means that you really want to see this as some sort of career graph, rather than just a final number. |
|
#100
Martin Brown wrote:
Who is to say which of Crafty, Shredder, Junior, Fritz, Rybka, Fruit etc etc is the closest approximation to optimum GM play. This is probably too low a measure. "GM" play and World champion play are usually not close (latest FIDE world knockout champions aside). |