| Tags: capa, chess, cuz, greatest, karpov, kasparov, kramnik, lie, order, players, puters |
#131
In article ,
David Richerby wrote: The researchers' crippled Crafty is spending less time than it would in a game. Of course; otherwise, it would take longer than "real time". You would never get through "Informant" if you spent five hours playing through each game .... OK, a team of you might manage that, but then you'd never get through "TWIC". OK, every strong player in the world put together might manage to get through all the games being played by strong players in a reasonable time, say a game or two per week each .... [...] The alternatives to crippled-Crafty rating every WC game are basically to have *no* results, 'cos it takes too long, or to have an utterly superficial analysis of *all* the games of Tal/Kasparov/.... False dichotomy. They could have gone to thirteen ply. Or fourteen. And it would have taken 3x or 9x [or whatever] as long. Whatever resources they had, they then had to balance time spent per move against number of games being analysed. Of course, this would still bring the same criticism. But the real problem is that you're assuming that only one computer is available. Well, that a certain amount of resource was available. The Kramnik-Fritz match was played on commodity hardware. An academic research grant will easily run to buying, say, ten PCs. Actually it probably won't. Grants [in the UK] anyway are usually based on a background assumption of a "well-found laboratory", so if you say "I want 10 PCs for this", they will reply "You should already have those lying around". You do better to say "I need a super-computer for this". Anyway .... (And this was academic research.) Right. Guid was/is a PhD student; but this does not seem to be his PhD topic. Most academic computer-chess research is done "privately" as a spare-time extra to "serious" work. But, yes, ... Or you go round your department asking if the people with modern computers will let you run your analysis on their machines overnight. 
In a university, it shouldn't be too hard to get night-time use of several tens of decent PCs. ... this research apparently occupied 36 modern PCs for 10 FTE days; that's probably a month or so of elapsed days. It's not the only thing they were doing in this area, as they also report on some other experiments, some designed to see whether Crafty was an adequate tool for this compared with other computers: "These results give some indication that using other strongest [sic] chess programs instead of Crafty would probably not affect the results significantly." So I guess there were limits on the number of months for which their computer lab could be diverted to this not-terribly-important effort. Whether or not what they have done is "worthwhile" is quite another matter. But the annals of computer chess are full of such papers. The only distinguishing feature of this one is that it has attracted attention in this forum. Most of the comments being made here have been addressed in the paper; you may or may not find their answers convincing. But it seems rather unfair to attack their work *without* first reading it. -- Andy Walker, School of MathSci., Univ. of Nott'm, UK. |
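The "thirteen ply, or fourteen" trade-off above can be put in numbers. This is a minimal sketch, assuming that each extra ply multiplies analysis time by a constant factor; the factor of 3 is taken from the "3x or 9x" remark, and the function name is purely illustrative, not anything from the paper.

```python
def relative_cost(base_ply: int, target_ply: int, growth_per_ply: float = 3.0) -> float:
    """Time needed at target_ply relative to base_ply.

    Assumes a constant multiplicative cost per extra ply; the default of 3.0
    is an assumption for illustration, not a measured value.
    """
    return growth_per_ply ** (target_ply - base_ply)


for ply in (12, 13, 14):
    print(ply, relative_cost(12, ply))
```

With a fixed pool of machine time, the number of games that can be analysed falls by the same factor, which is the balance between time per move and number of games described in the post.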
|
#132
On May 3, 7:32 pm, michael adams wrote:
raylopez99 wrote: You bothered to reply to help bot? help bot is pretty clueless, I think he's essentially a troll. You're the troll here Lopez. No, it takes one to know one "michael adams". Help bot's contributions to this ng, while occasionally somewhat 'longwinded', are infinitely more entertaining than your characteristically boring & nugatory execrations Matter of opinion. I post hard facts, while you and your friend bot post opinions, like as wholes, which everybody has and leads to nothing. (I mean, do you seriously expect anyone [with a life, that is] to wade through your last but 1 or 2 or 3 posted screeds?) You seem to wade through. And the fact this thread has 131 replies and growing means it resonates with the crowd and adds value. This thread is surely flogged to death by now - no?.. No, otherwise you won't bother replying to this rebuttal, buttal. RL |
|
#133
In article .com,
help bot wrote: Ah, but you seem to forget that the reason for Crafty's success was simply the fact that Cray computers were faster. Whoa! Crafty is not Cray Blitz. Whatever success Crafty may have had owes nothing to Cray *except* that Dr Hyatt was able to use his experience with CB in designing and programming it. RMH himself describes Crafty as derived from CB or as a descendant of it, but the relationship is much more that of a grown-up baby than that of a clone. [...] As far as I know, the learning approach has stemmed from neural networks, not Cray supercomputers. There are several ways in which programs can learn, few of which exploit neural networks. Mr. Hyatt, the creator of Crafty, simply exploited the raw speed of a mainframe he had (virtually unique) access to. Actually, it was Larry Nelson who exploited the raw speed, and Bob Hyatt and Bert Gower who wrote the chess and the algorithms. The whole caboodle was *far* from simple, as anyone who tries to write highly-parallel chess code soon discovers. There was some very clever code, both high- and low-level, in Cray Blitz. Similar comments apply to Deep Thought and other programs "accused" of "simply" being fast. Other things being equal, faster is better. But other things are not equal, and there are seriously bright people behind the successful programs. -- Andy Walker, School of MathSci., Univ. of Nott'm, UK. |
|
#134
Suppose that a version of Crafty favors "defensive" players like Capa
and Kramnik, while penalizing "attacking" players like Tal and Fischer. Call this version of Crafty "Defensive ---

Interesting post, and the para above identifies the flaw in player-to-player comparisons by engine analysis: if you are evaluating players, then even Tal himself said that anybody at all could find his flaws after the game, or the next day, or even next week. But he chose 'em because very few people could find them in real time OTB.

Yes, but this problem is also present in computer chess! Even if the program is 'backsolving' or annotating a completed game, it has to look in the chess tree, which means that it faces time constraints similar to an OTB player vs a 'time is not of the essence' correspondence player.

Quite so, Ray - in order to go to deeper plies and find a solution the program must find a way to prune possibilities - and it does this by scoring each branching possibility at, say, ply 7 or 8, and if the score is worse than nominally -2 it will prune that line [which may have been, eg, a Knight sac], even though the line can perhaps offer a return of 3 pawns and the initiative by ply 15 - or even something more compelling.

So technically, this is the classical dilemma of the worth of the initiative from a positional basis - and the evaluation function is pre-set, no? That is, it is either copied from other programs, or based, in Crafty's case, on consultation with a GM as to the /general/ worth of a variety of forms of initiative. The problem then becomes one of applying that general evaluation complex to any /specific/ instance encountered during play.

Given enough time then Steinitz will likely always rank higher than Tal by analytical method, but OTB, super-solid Steinitz wouldn't know which way Tal hit him! The flaw is that this analysis is okay for games, ie picking some best theoretical line by objective and uniform measure against all other lines - but chess playing is not a theoretical activity - it's a real-time performance.
Yes, and again, it's also true for chess playing computers (since no program is given infinite time to analyse a game).

Yes - one of the MAMS recommendations is to pick a long line, preferably giving up material for initiative, and let the thing try to cook an answer - if its ply depth doesn't go long enough, then the interesting thing is that it begins to evaluate its own chances at +2 or even +4, and then half a dozen moves later [and too late!] adjusts the evaluation to +1 or less, or even to -1 etc.

---

That's the way I see it, and until we get more research any rebuttal to the contrary will simply be speculation, since the data is just not there.

We rehearsed this conversation before - but there is no data of GM play against raw chess engines. The engines are all optimised for winning, and not for anything useful, like learning - either about their own evaluation matrix of chess evaluation or even how people evaluate play. [because with book+table bases=off, it wins less]

I disagree. I think winning is very closely correlated to 'learning which move was the best'. And I don't think you therefore need a database of GM play versus raw chess engines either,

O - parenthetically, I didn't mean database of GM play, I meant GM play!

since winning is immaterial as to how you win (whether thinking like a GM or thinking like a machine using algorithms), though the SSDF site does just that (rates PCs using humans).

Winning is the result of whatever process there is. The point about 'learning' anything about the result of the game is that it is often not possible since the process is obscure. When, for example, a program uses a book or a table, what evaluation is possible if the program could not find the move without the book or table? Evidently, failing to find the move on its own identifies a poor evaluation algorithm. Isn't this an axiom?
I see that someone else has written that they are bored by this thread which is overworked - and I have some sympathy with that, except that there seems to be no good consensus on identifying the problem! Now, maybe it's a problem we can't fix, but we'll never fix it without saying what it is - and instead go the other way, to continue to emulate solving it by adding ever increasing numbers of look-ups in a brute force attempt to cover all specifics. I understand the commercial need to do that, but don't understand any academic reason to choose emulation paradigms over [exigesic] real-time engine evaluation.

I always thought Crafty in particular would be such a base model, since it is born out of a university system, widely distributed and adapted, and lots of people might have had a go at it. But I think Crafty got caught up in its own early success as W CH, and continued to go for 'win', rather than for 'learn'.

Phil Innes

Crafty was once world champion? I didn't know that or it must have escaped me.

Yeah - Bob Hyatt is its author. I am not sure if he will immediately evaluate the MAMS title, but I did offer to provide him a reader's copy of it for his interest - but it's the end of the academic year, so possibly not convenient. 2 other posters here are reviewing it, and it will be interesting for us all to read their opinion of Albert's explorations.

Personally I find the /specific/ line evaluations in this new title to be revealing - and I think the author has done us all a service in removing these ideas from the general to the particular. Even in highly romantic play, it's fascinating to see Fritz analysis - look at this!

1 e4 e5 2 Nf3 Nc6 3 Bc4 Nf6 4 Ng5 Bc5 [on 4... d5 Fritz scores it +0.15] 5 Nf7 Bf2 [now Fritz scores it -3.44]

This is the infamous Traxler Gambit, and in a few moves there are immense complications.
What the author says as a second reason to look at this opening [the first is that Tal played it against 10,000 Pravda readers] is 'this ultrasharp open position highly likely Unique Movement ["forced"] Sequences are on the verge.'

Now - 50 years of amateur and GM analysis [see de Zeeuw and Christophe&Moll, eg] have argued the hell out of 6. Kf1 compared to Kf2 - but Fritz is right onto it! I won't traduce his book by copying it out too exactly here, and we can wait for the reviews, but to follow the above, in a line [de Zeeuw] at move 9 the analyst says equality, but Fritz is scoring it -1.1 for black.

Now, the move that Fritz can't find in this line [which follows 6 Kf1] is the MAMS 9th move, Nd4, which brings down the risk factor from a huge +4.3 to a draw after the recommended /10th/ move Bf7.

There are so many points illustrated by this game in the text, and why Fritz insists on some lines for White which lose in 9 moves, and why it ignores other MAMS moves like de Zeeuw's line with 12. Qf1 which at 19 is =, is the strangeness of its evaluation matrix.

Now - here it is - Fritz is no better at refuting this maddening line

1 e4 e5 2 Nf3 Nc6 3 Bc4 Nf6 4 Ng5 Bc5 5 Nf7 Bf2

than 50 years of players, BUT - if you want to know the secret of beating this Traxler line with its improvident 4. ... Bc5 then a few 'moves by hand' or MAMS moves, 8-13, allows Fritz to romp home with a solution to win for white in 20.

Now, to return to our evaluation thesis [or at least mine]: what is happening above at moves 8-13 that Fritz can't find, no matter what ply depth you allow it?

Cordially, Phil Innes

Thanks, Ray
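The pruning worry raised in this post (a line scored worse than about -2 at a shallow ply being discarded before its payoff is seen) can be illustrated on a toy tree. This is only a sketch: the tree, the scores, and the `prune_below` threshold are invented for illustration, and it is not a description of how Crafty or Fritz actually prune.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    static_eval: float                      # pawns, from White's point of view
    children: List["Node"] = field(default_factory=list)


def search(node: Node, white_to_move: bool, prune_below: Optional[float] = None) -> float:
    """Plain minimax over a toy tree.

    If prune_below is set, White's candidate moves whose immediate static
    evaluation is below the threshold are discarded without examining
    what follows them (a crude form of forward pruning).
    """
    if not node.children:
        return node.static_eval
    scores = []
    for child in node.children:
        if white_to_move and prune_below is not None and child.static_eval < prune_below:
            continue                        # forward-pruned: the follow-up is never seen
        scores.append(search(child, not white_to_move, prune_below))
    if not scores:                          # everything pruned: fall back to the static score
        return node.static_eval
    return max(scores) if white_to_move else min(scores)


# Hypothetical position: a quiet move versus a knight sacrifice that pays off later.
quiet = Node("quiet move", 0.2, [Node("reply a", 0.1), Node("reply b", 0.3)])
sac = Node("knight sac", -3.0, [
    Node("accept", -3.0, [Node("attack crashes through", 3.0)]),
    Node("decline", 1.0),
])
root = Node("root", 0.0, [quiet, sac])

print(search(root, True))                    # full search of the toy tree
print(search(root, True, prune_below=-2.0))  # same tree with the -2 cutoff
```

In this toy case the full search prefers the sacrifice, while the thresholded search never looks past the material deficit and settles for the quiet move.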
|
#135
Dr A. N. Walker wrote:
But it seems rather unfair to attack their work *without* first reading it.

True. But they're kind of asking for it by publishing a summary publicly while only making the full paper available through a subscription-only journal.

Dave.
--
David Richerby, www.chiark.greenend.org.uk/~davidr/
Natural Accelerated Clock (TM): it's like a clock but it's twice as fast and completely natural!
|
#136
In article ,
David Richerby wrote: But it seems rather unfair to attack their work *without* first reading it. True. But they're kind of asking for it by publishing a summary publicly while only making the full paper available through a subscription-only journal. It seems rather unfair to attack *them* because *Chessbase* publishes a summary, and even more unfair when that summary includes a link [near the bottom left] to the full paper, which is available at Chessbase, in ICGAJ, and in the conference proceedings, available [free, as you're in the UK] from your nearest decent library. Try "http://www.chessbase.com/news/2006/world_champions2006.pdf". -- Andy Walker, School of MathSci., Univ. of Nott'm, UK. |
|
#137
On May 3, 1:44 pm, (Dr A. N. Walker) wrote:
In article . com, help bot wrote: [...]

But the main objection was, of course, that since Rybka (and Hiarcs, etc.) is available, why mess around with something vastly inferior, unless ranking mere patzers?

Um, because Crafty is not only available, but available in source-code form to anyone, so that anyone can instrument it, piggle with it, and generally use it as a tool to investigate things? If you want something reproducible, then, short of help from the commercial companies, you have to do a fair amount of tweaking.

Also not entirely convinced by arguments about Crafty only being able to rank mere patzers. I'm well short of super-GM status myself, but I think I know enough about chess to be able to judge that [eg] Kramnik is a better player than the typical 2650-ish GM.

I agree, and although I think Crafty could easily rank the GMs for clear blunders of 50cp or greater, I have absolutely no confidence in its fixed 12-ply + quiescence evaluation being anything like good enough to rank GMs' or other programs' normal moves for accuracy, let alone world champions'. My reasoning for this is simple - I have in the past posted a few positions from my own games (at around 2000 ELO) where I found 14 ply effective search depth inadequate for blundercheck. If such positions are relatively common in ordinary club games I reckon they will be even more frequent in complex GM level positions.

I think for analysing the games they would have been better off running the engines in blunder check mode and working back down the actual line of play. Having the transposition table preloaded with relevant future positions from the GM played game gives the engine a head start and enhances the proportion of cutoffs, speeding up analysis. OK, you would have irregular search depth, but it could be engineered to be a minimum of 12 ply worst case. My experience is that 14 ply isn't always adequate to blunder check my own games using a much stronger engine.
One more criticism: in looking over a game at one Web site, I noticed that the depth of search DURING PLAY achieved by GM Kramnik's opponent was around 18 plys. Now why on earth would anyone try to rate the play of the world champions by cutting off crippled-Crafty's search at only 12 plys? I mean, get a REAL computer, and a clue!

Once you set parameters such that you want to investigate a reasonable corpus of games [and WC matches seems quite sensible] using a few weeks of time on a PC, the rest somewhat falls into place. Eg, if you want to analyse 1000 games and are willing to wait one month for the results, then you have perforce to analyse 30+ games/day, or 48 minutes/game, or around 30s/move, on whatever dedicated machine is available to you. Whether the resulting investigation is worthwhile is another matter.

I think their analysis may be flawed because they appear to have worked forwards down the game tree from move 12 with a fixed ply search rather than backwards from the known endpoint (a la blundercheck/analysis mode), making effective use of all previous computations to save time and using all the prior knowledge to give the engine its best chance of understanding the game.

Calling it crippled Crafty is a bit unkind, but on balance I do think it is probably justified. At 12 ply + quiescence Crafty cannot see certain types of positional traps at all and would mark down players accordingly for playing to avoid them. A more brutal bias is present in the stronger Fritz8 engine, which would mark down any GM that didn't simplify the pawn centre at the earliest safe opportunity. Crafty19 is somewhat more balanced than Fritz8 about choice of continuation lines, but it looks a bit out of its depth when asked to rate super GM games that it really doesn't understand (certainly not at 12 ply).
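The time-budget arithmetic quoted above (1000 games in a month, 30+ games/day, 48 minutes/game, around 30s/move) can be checked with a few lines. This is a sketch only: the figure of 96 analysed moves per game is an assumption chosen so the numbers line up with "around 30s/move", not something stated in the thread.

```python
def seconds_per_move(games_per_day: float, moves_per_game: int) -> float:
    """Seconds available per analysed move on one machine running around the clock."""
    seconds_per_game = 24 * 60 * 60 / games_per_day
    return seconds_per_game / moves_per_game


games_per_day = 30                          # "30+ games/day" for about 1000 games in a month
minutes_per_game = 24 * 60 / games_per_day  # machine minutes available per game
print(minutes_per_game)
print(seconds_per_move(games_per_day, moves_per_game=96))  # 96 moves/game is assumed
```

At 30 games a day this gives the 48 minutes per game quoted in the post; at the 33 or so games a day that 1000 games in 30 days really requires, the budget per move is slightly smaller.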
Also, when compared to stronger engines, Crafty19.01 (which is what I have most experience with) shows a tendency to exaggerate the badness of some of the weaker moves from a given position.

I am currently testing a series of engines on a slightly tricky French Advance Variation position where powerbooks considers only 7.Qd2 as playable.

r3kbnr/pp1b1ppp/1qn1p3/2ppP3/3P4/2P1BN2/PP3PPP/RN1QKB1R w KQkq - 0 7

I am running each engine on the same position on a pair of 3GHz P4 1GB RAM boxes. I will summarise the various engine results I have so far at roughly tournament rate of 4 mins/move:

Shredder10 7. Qd2 0.49, Qc1,Qc2 0.19, Qb3 0.14, Qe2 0.01, Na3 -0.01 @ 17 ply in 4m 320kn/s
Fritz8 7. Qd2 0.09, Qc1, Qc2, b3 -0.10, Qe2, Qb3 -0.22 @ 13 ply in 5m 900kn/s
Rybka2.31 7. Qd2 -0.27, Qb3 -0.28, Na3 -0.31, Qc1 -0.33, Qe2 -0.35, Qc2 -0.39 @ 14 ply in 4m 44kn/s
Crafty19.01 7. Qb3 -0.14, Qd2 -0.24, Qc2 -0.39, b3 -0.40, Qc1,Qe1 -0.43, Na3 -0.44 @ 12 ply in 7m @ 700kn/s

If anyone else wants to try running the position for 48 hours and noting down the move evaluations at all plys reached beyond the 4 minute mark, I would like to see what Fruit, Comet and Deep Sjeng make of this position. Infinite analysis, top 10 lines displayed (and also the actual top 10 preferred lines found after 48 hours).

One thing is certain: at a 12 ply fixed search on this position Crafty would penalise stronger programs which made the move which opening theory and Powerbooks2006 think is best, 7. Qd2, as follows: Shredder 0.73, Fritz8 0.23 and Rybka 0.13.

Rybka is still running and Qd2 is slowly moving clear of the pack, currently at 20 ply. It still doesn't much like White's position though, at -0.09.

BTW this example wasn't chosen for this purpose; I was already looking at it. (And I allowed Crafty a bit of extra time just in case Qd2 was top dog at 12 ply.)

Crafty is also still running but now the 3-4x time increase per extra ply is beginning to hurt.
All the newer commercial engines now prune so efficiently that they are close to 2x increase in time per ply. Regards, Martin Brown (apologies if this gets posted twice but Google groups dropped the first copy on the floor) |
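As a small worked example, the Crafty19.01 figures listed in this post can be turned into a per-move gap of the "best evaluated move minus move played" kind. This is a sketch with an illustrative function name; values are held in centipawns to avoid floating-point noise, and it looks only at Crafty's own list rather than reproducing the cross-engine figures given in the post.

```python
# Crafty19.01 evaluations at 12 ply, copied from the post, in centipawns.
crafty_12ply = {
    "Qb3": -14, "Qd2": -24, "Qc2": -39, "b3": -40,
    "Qc1": -43, "Qe1": -43, "Na3": -44,
}


def loss_for_move(evals: dict, played: str) -> int:
    """Centipawn gap between the engine's top-rated move and the move played."""
    best = max(evals.values())
    return abs(best - evals[played])


print(loss_for_move(crafty_12ply, "Qd2"))  # gap charged for the book move 7. Qd2
print(loss_for_move(crafty_12ply, "Qb3"))  # Crafty's own first choice
```

On these numbers a player choosing 7. Qd2 is charged 10 centipawns against Crafty's preferred 7. Qb3, even though the other three engines listed all rank Qd2 first.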
|
#138
raylopez99 wrote:
You seem to wade through. And the fact this thread has 131 replies and growing means it resonates with the crowd and adds value. This thread is surely flogged to death by now - no?.. No, otherwise you won't bother replying to this rebuttal, buttal. 131 & growing? get your factoids straight you geektoid. I mean, to be a 'toid' is to be a _toid_ - I suppose.. |
|
#139
Dr A. N. Walker wrote:
"http://www.chessbase.com/news/2006/world_champions2006.pdf".

Thanks for the link... Just a quick look at stuff, and this is here

---------
2.1 Average difference between moves made and best evaluated moves

The basic criterion was the average difference between numerical evaluations of moves that were played by the players and numerical evaluations of moves that were suggested by computer analysis as the best possible moves.

$$\text{MeanLoss} = \frac{\sum \left|\text{best move evaluation} - \text{move played evaluation}\right|}{\text{number of moves}} \qquad (1)$$
----------

Why does "my" camp(?!?) find this whole report troubling? Because this is a fundamentally flawed problem. Because best move evaluation by a crippled, non-world class program simply can't be considered the "truth". That best move evaluation is likely to be worst when the position most requires the judgement of the world class player.

And ultimately, an idea, a critical idea that makes the difference between world champion and runner up, may be expressed in only a single move. A single fork in the road. That one-move difference is then unfairly diluted by the number of moves in the game.

It is like measuring the length of the games by how many comments are in the crafty code. It is a nearly useless measure, by a nearly useless measurer, and from there we should be able to divine something, wow.
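The criterion in equation (1) is simple to state in code. This is a minimal sketch, assuming evaluations are supplied as two parallel lists in centipawns; the function name and the four-move sample are hypothetical and are not taken from the paper.

```python
def mean_loss(best_evals, played_evals):
    """Average absolute gap between the engine's best-move evaluation and its
    evaluation of the move actually played, over all analysed moves."""
    if len(best_evals) != len(played_evals):
        raise ValueError("need one best/played evaluation pair per move")
    if not best_evals:
        raise ValueError("no moves to analyse")
    total = sum(abs(b - p) for b, p in zip(best_evals, played_evals))
    return total / len(best_evals)


# Hypothetical evaluations in centipawns for a four-move fragment.
best = [20, 35, 10, -5]
played = [20, 15, 10, -45]
print(mean_loss(best, played))
```

Because the sum is divided by the number of moves, a single large gap is averaged over the whole game, which is the dilution this post objects to.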
|
#140
On May 3, 6:24 pm, raylopez99 wrote:
On May 3, 9:17 am, Ron wrote: In article . com,

And, it seems to me, that you've basically conceded the point. The biases of the program will affect its judgement. Just because it's not making 600-rating-point errors doesn't mean it isn't making 50-rating-point errors.

No, you misread me. I am saying YOU (or Camp #1) is making that error. I happen to believe chess programs rarely make 50 centipawn errors.

You can believe what you like, but it doesn't make it true. They typically make 10-20cp rms errors under normal match play time controls - the difficulty is in exploiting them. 50cp would be a roughly 2-3 sigma event depending on the choice of engine and position - so not all that rare.

Let's step back a bit and see how many things we can agree upon. Let us start with the claim that every extra ply explored (or possibly 2 ply, to avoid odd/even player to move parity problems) with a full depth search will result in an improved or at least a no worse estimate of the value of a position. Even this is a bit tricky now that some engines use speculative pruning and search extensions. However, the general point should hold that the longer you run a chess engine the more accurate its evaluation becomes.

Based on this we can compare what it can see at tournament rates of play, eg 4m/move, against what it can see after 2 days looking at the same position. The results are enlightening on the handful of positions I have tried. In the French Advance Variation I posted elsewhere in the thread, the average absolute error made by various engines running at 15 moves/hour when compared to their own evaluation for the top 5 lines after 2 days was as follows:

Shredder10 19cp (17 ply 4m, 24 ply 60h)
Fritz8 11cp (13 ply 5m, 16 ply 33h) actually 2 big errors and 3x 0
Rybka2.31 11cp (14 ply 4m, 20 ply 5h)
Crafty19.1 6cp (12 ply 7m, 15 ply 3h)

I reckon about 10-20 cp rms evaluation error is typical of current engines at tournament rates of play.
Shredder's main discrepancy is being too optimistic about Qd2, Qc1, Qc2, Qb3 early on. Fritz8 is locked into a swapoff pawns plan with 3 zero deviations and two large errors. Rybka is worthy of note since it has an even spread of small errors and is going deep quickly. Crafty is not really learning much new - like Fritz it is stuck in swapoff material mode and it has slowed down a lot.

I bought Rybka on the strength of recommendations in this thread, and based on watching it analyse this first test position I am favourably impressed. Its evaluations tend to remain more internally self consistent than Shredder's and it sees insightful positional lines rather than getting bogged down in swapoffs. So far so good.... early days yet.

Regards, Martin Brown
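The "2-3 sigma event" remark in this post can be made concrete under one strong assumption: that evaluation errors are zero-mean and Gaussian with the quoted rms as their standard deviation. That assumption is mine, for illustration only; real engine errors need not follow a normal distribution.

```python
import math


def two_sided_tail(error_cp: float, rms_cp: float) -> float:
    """P(|error| > error_cp) for a zero-mean Gaussian with standard deviation rms_cp."""
    z = error_cp / rms_cp
    return math.erfc(z / math.sqrt(2))


for rms in (10, 20):
    # rms, number of sigmas a 50cp error represents, and its tail probability
    print(rms, 50 / rms, two_sided_tail(50, rms))
```

With a 20cp rms a 50cp error sits at 2.5 sigma, roughly one evaluation in eighty under this model, whereas with a 10cp rms it sits at 5 sigma and would be very rare. How common such errors are therefore depends heavily on which end of the 10-20cp range applies.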