![]() |
| If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|||||||
| Tags: hash, size, table |
|
|
|
Thread Tools | Display Modes |
|
#1
|
|||
|
|||
|
Hi,
How large should the hash table size be? The large the better? Or is 1536 MB wasting hash memory? And is 512 MB enough? Thanks Dieter |
| Ads |
|
#2
|
|||
|
|||
|
Basically, the larger the better provided it all fits into physical memory
with no chance of paging out to disk. It's not going to fit into the L2 cache anyway.... Basically, run a simple test... With whatever test suite you have, try it with various sizes. The odds are good you'll see that the larger it is, the better the program performs, with the improvements slowing down the larger the table size gets. If you have the memory and it's not being used for anything else, then why not use it? You have nothing to loose by doing so because a larger hash table will never make a program play worse (unless there are bugs in the program, and a few pathological test cases.) Incidentally, this newsgroup is no longer suitable for discussions of computer chess. Too much spam, Sam Sloan, USCF, etc. etec. garbage. Most chess programmers have moved over to: http://www.talkchess.com/forum/viewf...opic_view=flat (They have several forums, but that's the programmer's forum.) "dkraft" wrote in message ... Hi, How large should the hash table size be? The large the better? Or is 1536 MB wasting hash memory? And is 512 MB enough? Thanks Dieter |
|
#3
|
|||
|
|||
|
Guest wrote:
Basically, the larger the better provided it all fits into physical memory with no chance of paging out to disk. It's not going to fit into the L2 cache anyway.... I don't quite agree: every hash table entry imposes some administrative overhead that offsets the benefit of having a hash table. If that overhead gets too large, you lose. Just where the cutoff is depends on how the hash table was implemented. Testing can be used to detect the cut-off, but it will probably take a lot of careful testing to discover. So: ask the software writers. They should have an idea if there is a sweet spot or not. -- Anders Thulin anders*thulin.name http://www.anders.thulin.name/ |
|
#4
|
|||
|
|||
|
Anders Thulin wrote:
I don't quite agree: every hash table entry imposes some administrative overhead that offsets the benefit of having a hash table. What would that administrative overhead be? -- GCP |
|
#5
|
|||
|
|||
|
"Anders Thulin" wrote in message
... Guest wrote: Basically, the larger the better provided it all fits into physical memory with no chance of paging out to disk. It's not going to fit into the L2 cache anyway.... I don't quite agree: every hash table entry imposes some administrative overhead that offsets the benefit of having a hash table. If that overhead gets too large, you lose. The only overhead is: HashIndex = Hash % TableSize; That's constant regardless of table size. And most tables are implented one of four ways. 1) Check one entry. 2) Check two entries. 3) Check one hash line, which may contain two or more entries. 4) One small table and one large table. All four cases behave the same regardless of table size. The only additional overhead would be an increase in the number of TLB entries. (The cpu internal entries that convert addresses into physical memory addresses.) But since there are already a lot of them and they aren't going to be able to stay in the L2 cache, every hash check is nearly guaranteed to be a full cost random memory access. So unless your hash table is *very* small (like small enough to fit into L2 or only slightly larger), there is no extra overhead when increasing the size, as long as you stay within physical memory. Now, if you try to clear the hash, then a larger hash will take more time. But that's a one time thing before each search, and most people don't actually clear the hash. Instead other things are done so that old entries are very unlikely to match for a new search. Just where the cutoff is depends on how the hash table was implemented. Testing can be used to detect the cut-off, but it will probably take a lot of careful testing to discover. So: ask the software writers. They should have an idea if there is a sweet spot or not. I am a writer. Not a major program writer, no. But I have written a few chess programs over the years. And this question has come up repeatedly over the years. And the answer is always the same by the major chess program writers. (Hyatt, etc. etc.) People who have done these kinds of tests and always get the same basic answer. As to whether there is a 'sweet spot', that would suggest the most benefit for the amount of memory consumed. Like 75% at 16m, 95% at 32m and 96% at 64m. In that case, the 'sweet spot' would be 95% at 32m. The most 'bang for the buck'. Finding that 'sweet spot' would require testing. Which I did suggest he do. But although going to 64m would cost you an extra 32m for only 1% improvement would make it unatractive, it's still not a bad setting if you have the extra memory to spare. Except for the occasional random 'bad' position that might hurt from a larger hash, the majority of time, for normal positions, more memory is better. |
|
#6
|
|||
|
|||
|
Guest wrote:
The only overhead is: HashIndex = Hash % TableSize; and later: All four cases behave the same regardless of table size. No, there's more to it. You show it yourself when you say: 1) Check one entry. 2) Check two entries. no further overhead here, true, except that the second may increase code space, and so cause other kinds of overhead. If it's implemented as a 1..2 loop, there is obviously counting overhead. 3) Check one hash line, which may contain two or more entries. and here you have (I presume) a bucket search, which may be more or less complex depending on how you implement it. I don't see how you can claim no overhead in this situation. 4) One small table and one large table. But why stop there? The table is not static -- you don't only do searches. You also have to insert new stuff, and weed out old. In the single entry case, it's easy, but in the bucket hash it tends to get hairy: a fast search usually means a slow insert. Now all that adds up: code has to execute to search the table, code hash to execute to change the table. As long as that code executes faster than the code the hash replaces, all is fine. But does it? If the hash table is poorly implemented, you may turn out to have a bucket hash with linear searches of the buckets, and one that resizes by increasing bucket chain length only. I've seen those in actual use -- and turned out by professional programmers, too. It doesn't matter one whit what theory says, if the programmer didn't do a good job implementing it. In this particular case, we don't have a single clue to what code base the question applies to. I assume it applies to code base, of course -- if the question is purely theoretical, one of the standard books on search algorithms will probably be the best source for a reply. -- Anders Thulin anders*thulin.name http://www.anders.thulin.name/ |
|
#7
|
|||
|
|||
|
"Anders Thulin" wrote in message
... Guest wrote: The only overhead is: HashIndex = Hash % TableSize; and later: All four cases behave the same regardless of table size. No, there's more to it. You show it yourself when you say: 1) Check one entry. 2) Check two entries. no further overhead here, true, except that the second may increase code space, and so cause other kinds of overhead. If it's implemented as a 1..2 loop, there is obviously counting overhead. Say what??? A loop, if you did it that way, would only cost a few cycles. Say 10 cycles. An access to main memory, which is essentially a random access at full cost, will cost at least 200 cycles or even up to 400 cycles. Yeah, main memory is very expensive! Is an extra 10 cycles really significant? And that has *nothing* to do whether you are using 32m, 128m or even 1.5g of memory. That's an implementation detail of how you choose to implement the hash table. It will be a constant and will not be influenced by how big the hash table is. Which is what the original question was.... Which was whether he should go ahead and use 1536m or stay with 512m. 3) Check one hash line, which may contain two or more entries. and here you have (I presume) a bucket search, which may be more Why a bucket search?? Due to cache line sizes (the only reason to do this kind of hash organization), you'll only have 2 or almost certainly no more than 4 entries. No need for any fancy search when a straight comparison will do fine. Don't go trying to make things more complex than you need to. or less complex depending on how you implement it. I don't see how you can claim no overhead in this situation. I'm not claiming no overhead for how the hash table is implemented. I'm claiming there would be no additional overhead (or penalty) if you increase your cache size from 512m to 1.5g, which is what the original question was. It's even what the thread subject says... "hash table size". Not "hash table implementation cost". Go read the original message. I'll save you the trouble and quote it here... *** How large should the hash table size be? The large the better? Or is 1536 MB wasting hash memory? And is 512 MB enough? *** Again, his question wasn't about how to implement it. Whether one method was better than the other or easier or whatever. It was how much memory was enough. Whether using his whole 1.5g available memory was a waste or whether he should cut it back to just 512m. 4) One small table and one large table. But why stop there? The table is not static -- you don't only do searches. You also have to insert new stuff, and weed out old. In the single entry case, it's easy, but in the bucket hash it tends to get hairy: a fast search usually means a slow insert. Which will be a constant cost regardless of how big the hash is. Which is what the original question was. Now all that adds up: code has to execute to search the table, code hash to execute to change the table. As long as that code executes faster than the code the hash replaces, all is fine. But does it? If the hash table is poorly implemented, you may turn You are talking more about bugs etc. than what the original question was about. out to have a bucket hash with linear searches of the buckets, and one that resizes by increasing bucket chain length only. I've seen those in actual use -- and turned out by professional programmers, too. It doesn't matter one whit what theory says, if the programmer didn't do a good job implementing it. And how is that at all relevant to the original question? In this particular case, we don't have a single clue to what code base the question applies to. I assume it applies to code base, Assuming his hash table isn't out right broken, and assuming he hasn't gone out of his way to implement it in an idiotic maner, then the cost of the accessing the hash entry in main memory will be larger than all the other mainenance costs. So, as long as his hash table manages to work, and do the job it's designed for, it will be saving move generation costs, reducing the search bounds, which makes the tree smaller which cuts out evals and move gens and q-searches, etc... So it really all comes down to: 1) Asuming his table isn't broken. 2) Asuming his table wasn't deliberately designed to be inefficient at large memory sizes 3) the implementation details will effect the performance of the hash table (ie: hit rate) 4) the cost of actually maintaining the table will be constant (for whatever method he chooses) regardless of table size. 5) then it all comes down to his original question... How about that. Even after going through your totally off subject stuff, it all still comes down to his original question... Whether it was a waste of memory to use his full 1.5g of available memory or just cut it down to 512m. Unless your next message actually talks about the original subject (which is also what the subject header says), I wont be replying to any more of your trolling. If you want to start a brand new thread that talks about implementation ideas and methods and the overheads involved with each, and so on, then I might be willing to read and participate in that discussion. Although actually, it would be much more convenient to do this over at the TalkChess forum. That's where the major chess programmers are. That way you could get the feedback from dozens of chess programmers with 10 or 20 or more years experience. I don't go there every day, but I'm sure there are many others that would be willing to immediately discuss implementation issues. But if you keep trying to change the meaning of the original question in this thread.... bye. of course -- if the question is purely theoretical, one of the standard books on search algorithms will probably be the best source for a reply. -- Anders Thulin anders*thulin.name http://www.anders.thulin.name/ |
|
#8
|
|||
|
|||
|
Anders Thulin wrote:
Guest wrote: The only overhead is: HashIndex = Hash % TableSize; and later: All four cases behave the same regardless of table size. No, there's more to it. You show it yourself when you say: 1) Check one entry. 2) Check two entries. no further overhead here, true, except that the second may increase code space, and so cause other kinds of overhead. If it's implemented as a 1..2 loop, there is obviously counting overhead. Yes but that's insignificant and independent of table size. 3) Check one hash line, which may contain two or more entries. and here you have (I presume) a bucket search, which may be more or less complex depending on how you implement it. I don't see how you can claim no overhead in this situation. Because the buckets are of constant size, independent of the size of the hash table. 4) One small table and one large table. But why stop there? The table is not static -- you don't only do searches. You also have to insert new stuff, and weed out old. Insertion overwrites old data; it's not deleted `manually'. The hash table is operating as a cache, not a dictionary: if the data you want isn't in its `primary' location, you look in a constant number of other places for it and then give up, deleting the stalest or shallowest data you have and replacing it with the new evaluation. Now all that adds up: code has to execute to search the table, code hash to execute to change the table. Yes but the speed of this code is completely independent of the size of the hash table. If the hash table is poorly implemented .... then you've already lost. Any code might have been badly written and behave stupidly as a result. Yes, it's possible that the hash table was written by an idiot but, if it was, the engine will probably be weak for other reasons, too. In a well-written engine, the optimal hash table size really is `as big as you can have it without swapping.' Dave. -- David Richerby Disgusting Atlas (TM): it's like www.chiark.greenend.org.uk/~davidr/ a map of the world but it'll turn your stomach! |
|
#9
|
|||
|
|||
|
"David Richerby" wrote in message
... Anders Thulin wrote: 4) One small table and one large table. But why stop there? The table is not static -- you don't only do searches. You also have to insert new stuff, and weed out old. Insertion overwrites old data; it's not deleted `manually'. The hash table is operating as a cache, not a dictionary: if the data you want You pointing out that a chess hash table is being done as a cache rather than a dictionary, and him saying that a table isn't static, actually explains a few things about his message. I bet he's thinking along the line of a a linked list or tree or b-tree or some sort of data structure that gets updated dynamically. Maybe even a structure who's size and organization can change during the search. Stuff gets inserted, removed, re-organized, searched, etc. Data structures and algorithms that are simply not used in a chess hash table. Anders, a chess hash table is done as a fixed sized table. It's a straight array with no other structure. It's not a linked list or a tree or a b-tree or any other 'fancy' data structure. Just a basic linear vector array. To access a hash entry, you do: HashEntry = HashTable [ HashIndex % HashSize]; if (HashEntry.Lock == MAGIC(HashIndex)) return HashEntry; return NULL; To save it, you do: OldEntry = HashTable [ HashIndex % HashSize]; if (OldEntry.Draft = NewEntry.Draft) HashTable[HashIndex % HashSize] = NewEntry; And so on. If you do a hash entry that actually contains two or more (a fixed number) entries, then the code will be slightly more complicted, but still fairly simple and a fixed cost regardless of the size of the table. Other modifications to the hash table will change things slightly, but again, they will be of a fixed cost that is independant of the size of the table. It didn't occur to me until now that you might be thinking of too high a level of data structures. Since this is a computer chess newsgroup, most people in here are already familar with the basic algorithms, and I just assumed you were. |
|
#10
|
|||
|
|||
|
In article ,
David Richerby wrote: In a well-written engine, the optimal hash table size really is `as big as you can have it without swapping.' Whoa, it's not *quite* as simple as that. (a) Even in a well-written engine, you may want/need to "clear out" the table from time to time. Initialising several terabytes may take a little while even on a modern machine .... (b) Some of the standard operations on a hash table involve indexing and finding remainders modulo the table size. These may be more efficient if the table size is a power of 2, and if the relevant numbers fit into [in C terms] ints rather than longs. (c) If you manage to lock down all, or most, of the storage of a machine, you may perhaps be unpopular, in a multi-user environment, with other users of the machine. *Your* program may not be swapping, but *theirs* are! [Even on a home PC, the other users may be *you*, trying to run other things, such as the engine that this program is playing against.] -- Andy Walker Nottingham |
|
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| move table | Folkert van Heusden | rec.games.chess.computer (Computer Chess) | 2 | August 31st 07 09:35 AM |
| WEB SINGLES DIRECTORY INTRNET DATING SERVICE PERSONALS SEX DATING | asdhidsfus983jdd@yahoo.com | rec.games.chess.politics (Chess Politics) | 0 | July 6th 07 06:07 AM |
| Prodeo : Set hashtable size in Arena GUI | Jerome | rec.games.chess.computer (Computer Chess) | 0 | June 2nd 07 08:33 AM |
| Karmnick vs Fritz starts November 25 i belive | SAT W-7 | rec.games.chess.computer (Computer Chess) | 10 | October 19th 06 01:19 AM |
| The USCF breeched its contract with the membership by cutting the size of Chess Life in half | A1 | alt.chess (Alternative Chess Group) | 1 | October 28th 03 09:43 PM |