![]() |
| If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|||||||
| Tags: hash, size, table |
|
|
|
Thread Tools | Display Modes |
|
#11
|
|||
|
|||
|
"Andy Walker" wrote in message
... In article , David Richerby wrote: In a well-written engine, the optimal hash table size really is `as big as you can have it without swapping.' Whoa, it's not *quite* as simple as that. (a) Even in a well-written engine, you may want/need to "clear out" the table from time to time. Initialising several terabytes may take a little while even on a modern machine .... Actually, current programs don't do that. They leave the hash alone. They don't even initialize it when the program is first run. Either they actually use the data from the previous search, or they invalidate it at the time of access... There are several ways to invalidate at access... 1) Change all the hash keys so you get a new random value. This isn't a prefered choice because many people use the hash as a way to quickly check for repetition. 2) The hash entry contains some sort of counter as part of the 'lock'. 3) At time of access into the hash, XOR in an additional key that changes for every search. This way old entries from the previous search (or uninitialized hash memory) will very very rarely be considered valid. The odds are so remote that you are more likely to encounter a normal false hash hit during the search. And if you actually check the returned move as valid (not everybody does because it takes time and they feel the odds are acceptible), then the odds are so remote that you are accessing old data as current data that you can totally ignore it. You have a better chance of getting random bit flip errors in the memory or in the cpu. As for terabytes... well, not too many people have access to machines with terabytes of physical memory.... (b) Some of the standard operations on a hash table involve indexing and finding remainders modulo the table size. These may be more efficient if the table size is a power of 2, and if the relevant numbers fit into [in C terms] ints rather than longs. It is true that doing an AND is faster than a modulo. But the time difference isn't really all that significant considering how long main memory is going to take to access. And the savings can greatly outweigh the few extra cycles a modulo would take. But yes, many people do prefer a power of two size. And as for the 'ints' vs. 'longs'... Sorry. That was true in the days of 16 bit compilers, but these days on 32 bit systems, both int & long will be 32 bits. And on a 32 bit system, 32 bits will be enough to access all of physical memory. (With the exception of certain hacks that Intel did for some servers to increase memory by paging it.) So there will be no need for 64 bit 'long long' which would involve a more complicated modulo. And on 64 bit systems (like the powerful chess programs all use and even many hobbiests use), int's and long's are both going to be 64 bits. (Well, actually that needs to be qualified a bit, since the C99 standard is a little vague on how to handle all 8, 16, 32, 64 and 128 bit data sizes that can be done on a 64 bit system.) And no 64 bit system is going to have more than 2^64 bytes of memory, so there's no need to get into 128 bit 'long long'. (c) If you manage to lock down all, or most, of the storage of a machine, you may perhaps be unpopular, in a multi-user environment, Well DUH. However, on a multi-user environment you are likely to have memory limitations imposed so you can't use more than your fair share. In fact, you may only be given a couple meg of physical memory and all the rest of your data will be paged in & out as needed. Definetly not a good thing for large hash tables. If you are on a system without that restriction, then it's probably not a problem for other reasons, such as verbal agreements between users, etc. etc. And if you are on a system without that restriction and you don't have agreements etc. and you do it anyway and cause problems... (shrug) Tough. That's your problem when the sysadmin catches you. with other users of the machine. *Your* program may not be swapping, but *theirs* are! [Even on a home PC, the other users may be *you*, trying to run other things, such as the engine that this program is playing against.] No. If you are runing a chess program for a reason (such as a tournament), then you know not to use it for other things. If you intend to use it for other things, then just don't tell it to use all your available memory for the hash. (Although your other programs will reduce the cpu power available to the chess program, which may cost it a full ply of search.) And if you do want to run some other program and a chess program with massive hash tables at the same time.... (shrug) Get a second computer. It's not like they are expensive. A second computer can be bought for $400 that's more than good enough for basic usage, web surfing, etc. For just a few dollars more, you could get a low end laptop instead. |
| Ads |
|
#12
|
|||
|
|||
|
In article , Guest wrote:
(a) Even in a well-written engine, you may want/need to "clear out" the table from time to time. Initialising several terabytes may take a little while even on a modern machine .... Actually, current programs don't do that. They leave the hash alone. They don't even initialize it when the program is first run. I don't have access to the source code of the commercial programs, so you seem to have an advantage in this respect. But: Either they actually use the data from the previous search, or they invalidate it at the time of access... There are several ways to invalidate at access... [...] OK. But they all involve trade-offs, inc extra computation and/or extra data stored in the table; which way the trade-offs balance would seem to depend on depth of search, expected number of reads vs writes, relative speed of memory, etc., and it would be at least a slight surprise if they balanced the same way for all engines on all hardware for all time. (b) Some of the standard operations on a hash table involve indexing and finding remainders modulo the table size. [...] It is true that doing an AND is faster than a modulo. But the time difference isn't really all that significant considering how long main memory is going to take to access. [...] OK, but DR's contention was that swapping was the *only* reason not to have the table as large as possible. Presumably, it is very likely that a table with [eg] 2^25 entries is better than one with 2^25 + 1? And as for the 'ints' vs. 'longs'... Sorry. That was true in the days of 16 bit compilers, but these days on 32 bit systems, both int & long will be 32 bits. And on a 32 bit system, 32 bits will be enough to access all of physical memory. [...] I was recently using a system with 32-bit ints and 64-bit longs and pointers; I didn't think it particularly unusual. But perhaps it was. (c) If you manage to lock down all, or most, of the storage of a machine, you may perhaps be unpopular, in a multi-user environment, Well DUH. However, on a multi-user environment you are likely to have memory limitations imposed so you can't use more than your fair share. In fact, you may only be given a couple meg of physical memory and all the rest of your data will be paged in & out as needed. Definetly not a good thing for large hash tables. I don't think I've used a system like *that* since the days when 2M of physical memory was more than "elephantine"! ... [...] And if you are on a system without that restriction and you don't have agreements etc. and you do it anyway and cause problems... (shrug) Tough. That's your problem when the sysadmin catches you. ... And ever since those days I've *been* the sysadmin, at least for the machine on my desk. But not really the point: If you are runing a chess program for a reason (such as a tournament), then you know not to use it for other things. OK. If you intend to use it for other things, then just don't tell it to use all your available memory for the hash. (Although your other programs will reduce the cpu power available to the chess program, which may cost it a full ply of search.) Then we're back to contention with DR's claim that the larger the better until swapping occurs. Personally, I've just acquired my first "Vista" machine [no flowers, by request], with its annoying habits of firing off all sorts of stuff to and from M$ without any notice, suddenly deciding to re-arrange its discs, telling me at awkward moments to insert such-and-such disc to do dumping, and so on. But even my Unix box is liable to receive e-mail, or to start "cron" jobs. None of these things seem likely to occupy that much CPU, but they do load up some surprisingly large programs. And if you do want to run some other program and a chess program with massive hash tables at the same time.... (shrug) Get a second computer. It's not like they are expensive. A second computer can be bought for $400 that's more than good enough for basic usage, web surfing, etc. For just a few dollars more, you could get a low end laptop instead. Perhaps that's a left- vs right-pondian viewpoint. The machines you can get for that sort of price over here really aren't "good enough" at all. If they were merely last year's low-end models reduced to clear, that might be different, but we only seem to get a teensy discount for that. Instead, the very cheap ones are totally inadequate. YMMV. As it happens, Prof Kraft [the OP] really gave us no idea what hardware or software he was asking about, if any [eg, perhaps he is writing his own program, or perhaps he needs background info for an academic paper] so all bets are off. Your initial advice ["Basically, run a simple test..."] to him is sound, of course. My only concern was that one should not be quite so confident that "bigger is better" in all [non-swapping] cases. -- Andy Walker Nottingham |
|
#13
|
|||
|
|||
|
"Andy Walker" wrote in message
... In article , Guest wrote: (a) Even in a well-written engine, you may want/need to "clear out" the table from time to time. Initialising several terabytes may take a little while even on a modern machine .... Actually, current programs don't do that. They leave the hash alone. They don't even initialize it when the program is first run. I don't have access to the source code of the commercial programs, so you seem to have an advantage in this respect. But: I don't really either. But the methods have been widely discussed for years. (Well, I do have the same kind of access that everybody else has. The early Fruit & Toga, Strelka (the reverse enginered Rybka), Crafty, and so on. However I generally make a point not to look at them. I much prefer people talking about chess programming ideas and what they do rather than actually looking at the code itself.) Sure, you can clear out the memory. That'll work, and for most memory sizes it wont take an excessive amount of time. Or you can leave it in and use the results for the next search. That's been done by programs too. The current methods of invalidating at time of access have really only become used in the last few years, when processors got so much faster than memories. In the older days, probably right up to the P3 days, processor speeds were closer to main memory, and memory sizes were small enough that directly clearing it wasn't a problem. I don't think even the mainframe programs worried about it. (Although I can't say for sure, since there weren't too many mainframe chess programs left after the mid-80s.) Right off hand, I don't know when the 'invalidate at access' ideas actually came about. Either they actually use the data from the previous search, or they invalidate it at the time of access... There are several ways to invalidate at access... [...] OK. But they all involve trade-offs, inc extra computation and/or extra data stored in the table; which way the trade-offs Just an extra 64 bit XOR. Just a couple cycles per hash check. And considering you'll probably be spending 200 to 400 cycles waiting for a random access to main memory, it's neglible. Some people prefer to use a few extra bits in the hash entry to hold a search ID number. It depends on how much info you try to fit into a hash entry and what cache line size you expect. Since you don't want a hash entry (or group of entries) to span more than one cache line, you may have a few extra bits to spare for a unique search ID. balance would seem to depend on depth of search, expected number of reads vs writes, relative speed of memory, etc., and it would be at least a slight surprise if they balanced the same way for all engines on all hardware for all time. That is true. There are lots of ways to do things in a chess program. And as I said above, it's only been relatively recent that cpu's became so much faster than main memory, so in the 'old days' it just didn't matter. (b) Some of the standard operations on a hash table involve indexing and finding remainders modulo the table size. [...] It is true that doing an AND is faster than a modulo. But the time difference isn't really all that significant considering how long main memory is going to take to access. [...] OK, but DR's contention was that swapping was the *only* reason not to have the table as large as possible. Presumably, It is the main reason. But not the only one. it is very likely that a table with [eg] 2^25 entries is better than one with 2^25 + 1? Not really any great reason for it to be. There would only be a few reasons it might. 1) The extra size of the hash table. Not too likely one more entry is going to matter! Even doubling a large cache is only going to help a few ratings points once you get past a reasonable size. (I don't have any current numbers for that.) 2) The odd size of the hash table might spread out the entries more or less uniformly. Some truth to that, but for chess hashes, which use random Zobrist keys, it's not really a major issue. Probably wouldn't hurt, though. 3) It being easier to make a hash index out of the 64 bit hash value. Working with powers of two (or +1 or -1 of 2^x) is certainly more convenient. But not really a major problem. True, a 64 bit divide is likely to take around 64 cycles. Some cpu's will be faster and some slower. But slow. Some may even be "dead dog" slow. But you (or the compiler) can probably schedule things so you can do some other stuff while waiting for the divide to finish. If you are on a system where you know it has a slow divide and you still want an odd size hash, all you have to do is multiply by the reciprocal. Since you'll be working with one size hash for the whole program run, it'll be quick & easy to set up. And since we are doing this for an index rather than a math answer, you wont even need to do a 'fix-up' to get the exact right answer. It being off by one will be tolerable since it will be consistantly off by one for that index. (But if you want to do the fix-up, it'd take only another few cycles. Still much faster than a divide.) Do any current programs do it that way.... (shrug) I have no idea. I haven't heard anybody actually say. But it's a simple tweak so I would certainly assume at least some do if they want to use odd size hashes. If you have 4g of memory, the OS and the program is going to use some, so you'd be left with about 1.5g of unused memory if you used only powers of two.. Or they may just go ahead and do a power of two but do a multi-bucket hash entry. They may be able to fit 3 entries per cache line, so their cache size would actually be 3*2^x. There's a lot of room for doing things different ways. It depends a lot on personal preference and what you know about your target hardware. And as for the 'ints' vs. 'longs'... Sorry. That was true in the days of 16 bit compilers, but these days on 32 bit systems, both int & long will be 32 bits. And on a 32 bit system, 32 bits will be enough to access all of physical memory. [...] I was recently using a system with 32-bit ints and 64-bit longs and pointers; I didn't think it particularly unusual. But perhaps it was. I'm not sure which you are meaning. 1) An unusual system where the cpu really does work with 32 bit ints but extended pointers. Well... there are quite a few non-x86 systems around. And not all of them do things the same way. Admittedly I am more familiar with regular x86 style consumer processors, rather than RISC, mainframes, workstations, etc. And the C89 & C99 standards provide quite a bit of flexibility in how big things can be. You can indeed have a 32 bit system with a 64 bit address space. The C standard doesn't really care what the system requires or even what the compiler writer chooses to do (61 bit integers and 47 bit pointers on an 8 bit micro, for example.) 2) Simply an extended 32 bit cpu, where the pointers have been hacked in the cpu to be longer. Somewhat like the segment registers of the old 8088 or the stuff Intel put into its servers before they were ready to release 64 bit cpus. I haven't used those Intel servers, but I don't think you can access the extra memory directly. You have to go through some hoops to acess the extra memory. Would the cost be excessive for that increased hash size... Don't know. That'll depend on the chess program & the cpu. However, the extra cost for each access is not going to be large. It'd probably be swamped by the slow memory access times. If main memory often takes 300 cycles for a random access, then what is an extra 4-10 cycles? Your program is still at a dead stop waiting for the hash entry to be returned. 3) A real 64 bit cpu where the compiler just happens to use 32 bits for an int but 64 for longs and pointers. In this case, it doesn't matter. It's still all 64 bit hardware. It's not like a 32 bit cpu working with 64 bit math. The only issues you have to deal with are the unspecified & inconsistant sizes of int & long. The C standard doesn't really care as long as sizeof(short) = sizeof(int) = sizeof(long). You could have all of them be 64 bits, if you wanted. It's so hopelessly screwed up that the C99 standard gave up and created brand new data types that you should use whenever size is important. I wouldn't be surprised if the next C standard actually started depricating the use of 'int'. So it is certainly possible you have a 64 bit system where the C compiler does 32 bit ints and 64 bit longs. That's just the compiler. The hardware is still in 64 bit mode, so it's not an issue. You will likely have issues if you switch compilers, though. That's why it's better to use the new data types that C99 specified. I generally never use 'int' for anything except general vars, loop indexes, etc. where there is never any possibility that it would even aproach 32 bits. (Yeah, I am kind of gambling that my programs will never be ported to a system that has less than 32 bit integers. Which is probably safe....) But I guess I shouldn't have done a blanket statement like that. (grin) I guess I sit corrected.... Now, having said all of that.... Hashes are going to be at least 64 bits. Nothing smaller is going to be acceptable. 32 bits stopped being tolerable back in the 70s or early 80s, I guess. 48 bits by the early 80s. At 64 bits, the false hash hit rate on current processors is low enough that it's not a major concern (provided the occasional false hash hit doesn't crash your engine. That's why some authors always verify the suggested move from the hash.) So you aren't really going to be getting into 'int' vs. 'long' anyway. About the only issue would be whether or not your cpu has 64 bit modulo or not. If it doesn't, then a power of two might be a better choice else you might have to do a slower 'software' based modulo. You can do a full cache line of hash buckets per entry. That way you get the advantage of using more memory than a simple power of two while at the same time keeping the hash-index conversion quick & simple. If you have terabytes of memory, then you probably have a 64 bit cpu, with a full 64 bit ALU with a full 64 bit divide (and mul) instruction. (c) If you manage to lock down all, or most, of the storage of a machine, you may perhaps be unpopular, in a multi-user environment, Well DUH. However, on a multi-user environment you are likely to have memory limitations imposed so you can't use more than your fair share. In fact, you may only be given a couple meg of physical memory and all the rest of your data will be paged in & out as needed. Definetly not a good thing for large hash tables. I don't think I've used a system like *that* since the days when 2M of physical memory was more than "elephantine"! ... Well, a couple meg of memory was just an example. But no multi-user environment today is going to let one user hog all the resources of the system. I don't think even the most primative multi-user systems in use today lets a user get away with that. They wouldn't be much of an OS if they did. If you are on a 1g system with 8 other people, no reasonable multi-user OS is going to let one user have 900meg of that and starve the rest of the users. Not without elevating their privileges. The amount of physical memory each user gets is likely to be somewhat variable, changing depending on the load etc. But the user is likely to only be 'guarnateed' a small amount of memory. Everything else will be paged to disk as needed and you can't really count on any particular amount of memory for a hash. So it wouldn't be a good idea for a user to even try to do a big hash. It just wouldn't work well, the program would run slowly and it'd play terrible because stuff would be paging to & from disk. [...] And if you are on a system without that restriction and you don't have agreements etc. and you do it anyway and cause problems... (shrug) Tough. That's your problem when the sysadmin catches you. ... And ever since those days I've *been* the sysadmin, at least for the machine on my desk. But not really the point: If you are runing a chess program for a reason (such as a tournament), then you know not to use it for other things. OK. If you intend to use it for other things, then just don't tell it to use all your available memory for the hash. (Although your other programs will reduce the cpu power available to the chess program, which may cost it a full ply of search.) Then we're back to contention with DR's claim that the larger the better until swapping occurs. Personally, I've just acquired my first "Vista" machine [no flowers, by request], with its annoying I got my first Vista system (a laptop) a few months ago.... Can't say I like it. habits of firing off all sorts of stuff to and from M$ without any notice, suddenly deciding to re-arrange its discs, telling me at You can turn off the background defrag. Along with much of the other background & scheduled stuff. awkward moments to insert such-and-such disc to do dumping, and so Never had Vista ask me for a disk. on. But even my Unix box is liable to receive e-mail, or to start "cron" jobs. None of these things seem likely to occupy that much CPU, but they do load up some surprisingly large programs. Right. On a highly variable system like that.... well, you are screwed, basically. That's just the nature of having a system that isn't suitable for running a processor intensive, memory hungry program. About the best you can do is only use a few meg and hope that'll be enough and that the OS wont page out even that small amount. Or you live with the variability. But on a system that is fairly stable. With no sudden background task etc. (like my XP box, where very little happens without me doing it), you can get pretty consistant memory & processor availability. And if you do want to run some other program and a chess program with massive hash tables at the same time.... (shrug) Get a second computer. It's not like they are expensive. A second computer can be bought for $400 that's more than good enough for basic usage, web surfing, etc. For just a few dollars more, you could get a low end laptop instead. Perhaps that's a left- vs right-pondian viewpoint. The machines you can get for that sort of price over here really aren't "good enough" I don't know where you are. I'm in the U.S. A low end system with a 1.8ghz AMD x2, 1g and 250g drive would be around $400. A little shopping around would probably improve that somewhat. Not a massive cost. Eating out at a fast food resturant 15 times. Eating at a regular resturant 10 times. 10-15 tanks of gas for the car. Most people (and even families) can blow that much money in a month or two and never even notice it. The US, Canada, & UK should have roughly comparable average household incomes. at all. If they were merely last year's low-end models reduced to clear, that might be different, but we only seem to get a teensy discount for that. Instead, the very cheap ones are totally inadequate. YMMV. Depends on what you are wanting to do. The 'average' person is going to surf the web, check email, do some instant chatting, watch a few youtube videos, etc. A more advanced person might do a little programming or video or photo editing, or a game or two. No serious number crunching or heavy 3d game playing. (Yes, they may like heavy 3d games, but they probably aren't going to be playing one in the short time the main system is occupied.) For that, most low end systems *are* good enough to be a secondary computer. That's why they are a secondary system, rather than a second main system. Most low end systems have more computing power than many of the more expensive systems of just a couple years ago. We just expect more from them and the OS (Vista) takes up more of the power. I have two computers at my desk. My laptop and a desktop. Before I got the laptop, I had two desktop systems hooked up to a KVM switch. My main system and a slower old secondary system. The second one usually didn't get used unless I was doing something on the main one that I didn't want to interrupt, or I was wanting to do a long term program that would run several days and I didn't want to tie up the main system. (As a side note, I've read that a lot of chess programmers actually use a small collection of old or cheap computers to do their chess program tuning test games. They can get a couple cheap low end computers and do more test games that way than if they spent the same amount on a high end system.) As it happens, Prof Kraft [the OP] really gave us no idea what I have no idea who the author was. I guess you recognised him, though. hardware or software he was asking about, if any [eg, perhaps he is Since he was only talking about using up to 1.5g of hash memory, it obviously was either a small desktop system or a laptop. A high end desktop would have had more memory. Perhaps up to 4g. A server would have had more. A mainframe or workstation would have had more. So it probably was just a plain desktop or laptop. writing his own program, or perhaps he needs background info for an academic paper] so all bets are off. Your initial advice ["Basically, If he's needing background info for a paper, then he's in the wrong newsgroup! I don't think he wants to hear much about Sam Sloan etc. etc... He should head over to TalkChess.com That's where the chess programmers hang out these days. run a simple test..."] to him is sound, of course. My only concern was that one should not be quite so confident that "bigger is better" in all [non-swapping] cases. Maybe a blanket statement is a little unwise. Just because of the few situations and systems and such where issues might arise. But it's not too far off though. Close enough that you could safely say something like: "Using as much memory as what's available without paging is nearly always a good idea." Just the addition of the word 'nearly' covers the few cases where it wouldn't. -- Andy Walker Nottingham |
|
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| move table | Folkert van Heusden | rec.games.chess.computer (Computer Chess) | 2 | August 31st 07 08:35 AM |
| WEB SINGLES DIRECTORY INTRNET DATING SERVICE PERSONALS SEX DATING | asdhidsfus983jdd@yahoo.com | rec.games.chess.politics (Chess Politics) | 0 | July 6th 07 05:07 AM |
| Prodeo : Set hashtable size in Arena GUI | Jerome | rec.games.chess.computer (Computer Chess) | 0 | June 2nd 07 07:33 AM |
| Karmnick vs Fritz starts November 25 i belive | SAT W-7 | rec.games.chess.computer (Computer Chess) | 10 | October 19th 06 12:19 AM |
| The USCF breeched its contract with the membership by cutting the size of Chess Life in half | A1 | alt.chess (Alternative Chess Group) | 1 | October 28th 03 08:43 PM |