#71
help bot wrote:
On Apr 30, 1:33 pm, "David Kane" wrote: That alone should provide enough of a question as to the results here. The fact is that we don't know when the engines will be strong enough to represent the "truth".

Sure we do. It will happen gradually, as the endgame table bases grow to include, first, all of the end game, and later, the late middle game, and so forth.

For someone that is attempting to live in the "truth" here: how about, instead of stating stuff like "the late middle game", just for giggles, how many *peta*bytes do you think that will be?
|
#72
On Apr 30, 5:37 pm, JohnnyT wrote:
I will try not to laugh too hard. As a reward for having refrained from laughing, the elders have decided to award you "a win" of this thread. You are now free to go to other threads and brag about having "won" this one. If you can win just two more threads before the deadline, you can cash them in for a shot at what's behind door #1, #2, or #3. (Unfortunately, what is behind these doors is just more of the same drivel you already found here.) -- help bot |
|
#73
help bot wrote:
On Apr 30, 5:37 pm, JohnnyT wrote: I will try not to laugh too hard. As a reward for having refrained from laughing, the elders have decided to award you "a win" of this thread. You are now free to go to other threads and brag about having "won" this one. If you can win just two more threads before the deadline, you can cash them in for a shot at what's behind door #1, #2, or #3. (Unfortunately, what is behind these doors is just more of the same drivel you already found here.) Woo Hoo! |
|
#74
On Apr 30, 5:44 pm, JohnnyT wrote:
(Is there a FAQ?)

No, there is no FAQ. In fact, this is perhaps the single most often asked question, and as such it is the very first one answered in the FAQ.

I think this is important when looking at things like the computer rankings so you can understand how measurably stronger than the field Rybka is. And how far behind Crafty is.

As I have noticed over the years, the status on the computer rating list *changes* over time. For instance, at one time there was a big difference between chess programs from, say, 1980, whereas now all such programs are "compressed" near the bottom of the current list. Old magazine ads might list a Mephisto at 2200, and a Fidelity at 1800, while now you could find both programs having been beaten to a pulp by their successors, scrunched together at say 1900 and 1650. By the same token, I would expect Rybka's now substantive lead to soon begin to evaporate slowly, once another program comes along which can draw or beat it.

The computer vs. computer rating lists have certain advantages, but they also provide little in the way of information as to how well a program would handle humans, relative to one another. The one thing we can be sure of is that if Rybka can squash all the other top programs, it simply cannot be weak.

-- help bot
|
#75
help bot wrote:
The computer vs. computer rating lists have certain advantages, but they also provide little in the way of information as to how well a program would handle humans, relative to one another. The one thing we can be sure of is that if Rybka can squash all the other top programs, it simply cannot be weak.

All true. But it will be even harder to find humans to fight, and all that we know is that the computer program that tied Kramnik loses to Rybka. Rybka has clearly drawn a new line in the sand, and I am sure that the major engine designers have a new bar to overcome. But I gotta say, the whole UCI concept means that I can do it all in the interface that I like, which for me is the Chessbase one. So I guess they all win, at least from me.
|
#76
On Apr 30, 2:34 pm, David Richerby wrote:
Martin Brown wrote: David Richerby wrote: What do you mean by `percentage blunder rate'? The proportion of the time that the GM plays a move that the engine thinks is, say, more than one pawn worse than the best move?

That would probably do as a rough working definition. The search depth or time might also need to be specified.

Sure.

And in fact the graph for %blunder rate for every player is in the original article. http://www.chessbase.com/newsdetail.asp?newsid=3455 That its shape broadly correlates with the rms error graph of the players lends credence to the possibility that Crafty might have been adequate for the task. And to be fair to the authors they did say that others with access to the internals of stronger engines should repeat their tests to see how they compare.

I would have liked to see the rms error graph with blunders excluded. That might have shed some more light.

How does that make a difference?

Unforced tactical errors play their part in the outcome of games. And these are precisely the sorts of thing that computer chess engines are very good at spotting. Subtle long term structural games are much harder for them to score.

So you're suggesting that ``Player X makes a one-pawn blunder in n% of games'' is a better measure than ``Player X, on average scores n cp lower per move.'' That does sound like a reasonable statement, though I do worry that sacrifices of pawns are relatively common and might still be mis-evaluated quite often. Kasparov used to sacrifice a pawn for long-term initiative faster than you can say, ``My computer thinks that's a pretty dodgy move.'' :-)

Although that may be true. If the program is analysing in blunder check mode or classical analysis mode it will know the outcome of the principal variation actually played as well as for its own hypothetical better move(s).

Do you have any guess (or, shock!, data) on how often errors occur in WC games that an engine (given reasonable time) would score down by say 100cp?
It is in the paper referred to by this thread. Sadly the link to the original article is broken. Anyone have a full copy?

Capablanca maintained a blunder rate of 0.01% (1 blunder in every 10000 moves) and the worst performer was Steinitz at 0.054% (a blunder roughly every 2000 moves). These are interesting numbers and right at the limits of human error rates for purely trivial mechanical tasks like punch key data entry. It is quite astonishing how low these are!

So as a rough guide, if the average game lasts 40-50 moves (80-100 player actions), less than 4% of games will have their final outcome determined by a blunder at GM level. So the other 96% of cases clearly need study.

It would be interesting to know if the downward march of error rate with time in GM level play is actually due to improved training methods or sparring against computers which always seize on any minor tactical error. There look to be clusters of players from other eras with similar error rates (and a few notable exceptions).

To put it into perspective, commercial programming has an effective error rate around 1-2% (and in some shops 10% is not unknown), much greater than the 0.2% achievable by the best formal development methods. But competitive GM level chess is more than an order of magnitude more accurate still. Human error rates for various tasks are online at: http://panko.cba.hawaii.edu/HumanErr/

Regards, Martin Brown
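A small sketch of the arithmetic behind these figures. It uses the working definition from earlier in the thread (a blunder is a move the engine scores more than 100cp below its own best move) and, assuming blunders are independent from move to move, converts a per-move rate into the chance of at least one blunder in a game. The function names and the sample centipawn losses are made up for illustration; only the 0.01% and 0.054% rates come from the post above.

```python
from typing import Sequence

BLUNDER_THRESHOLD_CP = 100  # "more than one pawn worse than the best move"


def blunder_rate(cp_losses: Sequence[float],
                 threshold: float = BLUNDER_THRESHOLD_CP) -> float:
    """Fraction of moves whose loss against the engine's best move exceeds threshold."""
    if not cp_losses:
        raise ValueError("need at least one move")
    return sum(1 for loss in cp_losses if loss > threshold) / len(cp_losses)


def mean_loss(cp_losses: Sequence[float]) -> float:
    """Average centipawn loss per move (the 'n cp lower per move' measure)."""
    if not cp_losses:
        raise ValueError("need at least one move")
    return sum(cp_losses) / len(cp_losses)


def p_blunder_in_game(per_move_rate: float, actions: int) -> float:
    """Chance of at least one blunder in `actions` moves, assuming independence."""
    return 1.0 - (1.0 - per_move_rate) ** actions


if __name__ == "__main__":
    # Illustrative per-move losses in centipawns (not real data).
    sample = [0, 12, 5, 140, 30, 0, 8, 0, 250, 3]
    print("blunder rate:", blunder_rate(sample))
    print("mean loss (cp):", mean_loss(sample))

    # Per-move rates quoted in the post, expressed as probabilities.
    for label, rate in (("0.01%", 0.0001), ("0.054%", 0.00054)):
        for actions in (80, 100):
            chance = p_blunder_in_game(rate, actions)
            print(f"rate {label}, {actions} player actions: {chance:.2%}")
```

Under the independence assumption, a 0.01% per-move rate gives roughly 0.8% to 1% of games containing a blunder over 80-100 player actions, and 0.054% gives roughly 4.2% to 5.3%, so the "less than 4%" guide holds for all but the highest quoted rate.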
|
#77
On Apr 30, 5:31 pm, JohnnyT wrote:
David Richerby wrote: Do you have any guess (or, shock!, data) on how often errors occur in WC games that an engine (given reasonable time) would score down by say 100cp?

I will say that in my own practical experience, running through games, in the same and not unusual positions, Fritz 8, 9, and 10 have evaluated positions over 100cp differently from Rybka 2.3.1, and different moves have been suggested.

There is quite often a systematic difference between the absolute value of the evaluation function of different engines on a given position, but the relative difference between alternate continuation moves is what really matters as far as the decision making goes. Yes, a stronger engine will find new resources, but provided the weaker engine is given sufficient time it can still offer useful insights into a game. That is my main worry with Crafty here - it doesn't always see far enough into the future because its tree pruning is a lot more conservative than Shredder or Rybka. Usually moves where engines' evaluations radically disagree are well worth investigating to see why.

That alone should provide enough of a question as to the results here. The fact is that we don't know when the engines will be strong enough to represent the "truth".

There may not be a "the truth" to be found...only successive approximations to it given our computational limitations. Like peeling an onion, each time you make the engines an order of magnitude more powerful or add enhanced heuristics you allow deeper searching of the game tree that may alter the outcome. However, we have now crossed the point where the best computer programs are demonstrably better at match play than humans. A computer aided by a human in freestyle mode, where the blunder rate from human error is essentially nil, is stronger still.

Working back from the tablebases, where absolute knowledge and theorem proof is possible, may allow some further progress, but the storage requirements and intense computational effort needed even for the important 7-man tablebases are so great that it is only likely to be done in a research lab. Having said that, in 2 decades the size of removable storage has gone from 360 KB to 2 GB (5000x) and consumer grade hard disks from 10 MB to 1 TB (100000x). If this trend continues then affordable petabyte storage might well be available by 2030.

I will say that I do not use Crafty for day-to-day analysis so I don't have an opinion other than that you need to remember in Elo that the difference between 2500 and 2800 is vast, and the difference between 2800 and ~3100 is as vast. It is not 10% better, it is closer to think of it as TWICE as good. Or more likely to win MOST of the time. It is a HUGE difference.

You should also note that in engine vs engine games there is a tendency for the strongest commercial engines to include a few tricks in their prepared opening book that exploit known weaknesses in other engines. This makes it a bit unfair to older engines that are not heavily maintained and prepared for engine-engine matches.

Regards, Martin Brown
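To put a rough number on "vast": under the standard Elo model the expected score depends only on the rating difference. The sketch below uses the usual logistic formula with a 400-point scale. The function name is illustrative, the ratings are the ones mentioned in the post, and whether engine rating lists map cleanly onto this model is an assumption.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5) under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Rating pairs taken from the discussion above.
    for strong, weak in ((2800, 2500), (3100, 2800), (3100, 2500)):
        score = expected_score(strong, weak)
        print(f"{strong} vs {weak}: expected score {score:.3f}")
```

A 300-point gap corresponds to an expected score of about 85% for the stronger side, and a 600-point gap (2500 against roughly 3100) to about 97%.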
|
#78
Martin Brown wrote:
There is quite often a systematic difference between the absolute value of the evaluation function of different engines on a given position, but the relative difference between alternate continuation moves is what really matters as far as the decision making goes. Yes, a stronger engine will find new resources, but provided the weaker engine is given sufficient time it can still offer useful insights into a game. That is my main worry with Crafty here - it doesn't always see far enough into the future because its tree pruning is a lot more conservative than Shredder or Rybka. Usually moves where engines' evaluations radically disagree are well worth investigating to see why.

This is a worry, but you really need to play with Rybka some to understand what I am saying. You need to follow several games with Rybka and your engine of choice. You will find numerous positions where the programs will disagree violently (over 100cp) over the favorite moves. (Realize that many times it is not that different.) It is precisely that difference where "strength" lies. Different engines simply do not come up with the same moves given enough time. Rybka seems to dramatically show that, and it is dramatically stronger.

That alone should provide enough of a question as to the results here. The fact is that we don't know when the engines will be strong enough to represent the "truth".

There may not be a "the truth" to be found...only successive approximations to it given our computational limitations. Like peeling an onion, each time you make the engines an order of magnitude more powerful or add enhanced heuristics you allow deeper searching of the game tree that may alter the outcome. However, we have now crossed the point where the best computer programs are demonstrably better at match play than humans. A computer aided by a human in freestyle mode, where the blunder rate from human error is essentially nil, is stronger still.

Thank you. That is my point. But you are trying to determine some sort of "truth" by comparing to Crafty's hobbled play.

Working back from the tablebases, where absolute knowledge and theorem proof is possible, may allow some further progress, but the storage requirements and intense computational effort needed even for the important 7-man tablebases are so great that it is only likely to be done in a research lab. Having said that, in 2 decades the size of removable storage has gone from 360 KB to 2 GB (5000x) and consumer grade hard disks from 10 MB to 1 TB (100000x). If this trend continues then affordable petabyte storage might well be available by 2030.

Don't let physics get in the way. We will probably be storing stuff into the strings by then!

I will say that I do not use Crafty for day-to-day analysis so I don't have an opinion other than that you need to remember in Elo that the difference between 2500 and 2800 is vast, and the difference between 2800 and ~3100 is as vast. It is not 10% better, it is closer to think of it as TWICE as good. Or more likely to win MOST of the time. It is a HUGE difference.

You should also note that in engine vs engine games there is a tendency for the strongest commercial engines to include a few tricks in their prepared opening book that exploit known weaknesses in other engines. This makes it a bit unfair to older engines that are not heavily maintained and prepared for engine-engine matches.

Yes, and you can take the exact same opening book and have the same issue between these engines. The interesting thing about Rybka is that it really is *that* strong. Ultimately the point is that they could have made their measurements in the client rather than the engine. They could have modified any sort of client with source code available to do this. Then they could have used multiple engine choices to ask their questions.
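The idea of measuring in the client rather than inside one engine can be sketched roughly as follows. Everything here is hypothetical: `Evaluator` stands for whatever wrapper a client would put around an engine (for instance one speaking UCI), positions and moves are plain strings, and the two toy evaluators are lookup tables, not real engines. The only point is that the same loss-per-move measurement can then be repeated with any engine, and positions where the engines disagree by more than 100cp can be flagged.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: score in centipawns, from the mover's point of view,
# of the position reached by playing `move` in `position`.
Evaluator = Callable[[str, str], float]


def move_loss(evaluate: Evaluator, position: str,
              legal_moves: Sequence[str], played: str) -> float:
    """Gap between the evaluator's best-scoring move and the move actually played."""
    best = max(evaluate(position, move) for move in legal_moves)
    return best - evaluate(position, played)


def compare_engines(engines: Dict[str, Evaluator], position: str,
                    legal_moves: Sequence[str], played: str) -> Dict[str, float]:
    """Run the same measurement with every engine the client knows about."""
    return {name: move_loss(ev, position, legal_moves, played)
            for name, ev in engines.items()}


def table_evaluator(table: Dict[str, float]) -> Evaluator:
    """Toy stand-in for an engine: a fixed score per move (illustration only)."""
    return lambda position, move: table[move]


# Two made-up engines that disagree about a sacrifice.
engines = {
    "engine_a": table_evaluator({"Nf3": 30, "Bxh7+": -120, "d4": 25}),
    "engine_b": table_evaluator({"Nf3": 20, "Bxh7+": 90, "d4": 15}),
}
losses = compare_engines(engines, "some position",
                         ["Nf3", "Bxh7+", "d4"], "Bxh7+")
# Flag the position when the engines' verdicts differ by more than 100cp.
disputed = max(losses.values()) - min(losses.values()) > 100
print(losses, disputed)
```

Keeping the measurement outside the engine means the engine is just a parameter, so the same games can be re-scored with several engines and the disputed positions examined separately.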
|
#79
"JohnnyT" wrote in message . .. I will try not to laugh too hard. The point of this WHOLE argument was comparing WORLD championship skills throughout the ages by comparing play to Crafty. I point out that the two strongest programs can be worlds apart, even by the magic 100cp measure in the same common positions. That people on the surface get confused by the huge and substantial difference between ~3100 and the 2500 quoted for Crafty, and that it is much farther than they would imagine. And what you state, in this world championship case, the case through the ages, is that the software could be too strong, and you use scholastics to try and prove that. I just can't give it to you here. You might have an argument in some other argument with a different set of facts. But it just has nothing to say here.

You seem to misunderstand what is being said. I didn't say that 12-ply was too strong, but that the notion that human strength correlates with deeper analysis is something that should be proved, not asserted. As to whether 12-ply + quiescence analysis was sufficient to give meaningful results, the authors have addressed this point, though not fully. (They're certainly more credible than historical Elo, in any event.) One thing they did was to show that analyzing the games of stronger programs gave small errors - smaller than those of the humans. However, there are certainly things they could have done but didn't: showing results for plies less than 12, doing some analyses at higher ply, showing that the analysis gives sensible answers for weaker players whose ratings are known to a high degree of accuracy. There is only a brief discussion of the correlation of the selected measure of move quality with results, but that is really *the* key connection that has to be established. An interesting first step.

David Kane wrote: In theory, the engine being too strong could be a source of error in the analysis, as much as the engines being too weak could. For example, the best move leads to a win in 20 moves based on a complicated calculation that no human considers. The second best move wins more slowly but in a way that strong GMs might be able to see. Player makes the best move (for the wrong reasons) overlooking the alternate way to win. That's evidence of weaker, not stronger, play. This happens all of the time if you look at scholastic games. Crafty sees the win of a rook at 8-ply and deems it superior to winning a piece at 3-ply. But the 8-ply analysis is essentially irrelevant to the game because the kids are not able to calculate that deeply.
|
#80
David Kane wrote:
An interesting first step. You know, others apparently may disagree. But I do agree with you here. Even if the study had apparent flaws, and questions. It is, fundamentally, an interesting first step. |