A Chess forum. ChessBanter

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » ChessBanter forum » Chess Newsgroups » rec.games.chess.computer (Computer Chess)
Site Map Home Register Authors List Search Today's Posts Mark Forums Read Web Partners

Tags: , ,

hash table size



 
 
Thread Tools Display Modes
  #1  
Old February 20th 08, 05:23 PM posted to rec.games.chess.computer
dkraft
external usenet poster
 
Posts: 2
Default hash table size

Hi,

How large should the hash table size be?
The large the better?
Or is 1536 MB wasting hash memory?
And is 512 MB enough?

Thanks
Dieter
Ads
  #2  
Old February 20th 08, 06:10 PM posted to rec.games.chess.computer
Guest[_2_]
external usenet poster
 
Posts: 76
Default hash table size

Basically, the larger the better provided it all fits into physical memory
with no chance of paging out to disk. It's not going to fit into the L2
cache anyway....

Basically, run a simple test...

With whatever test suite you have, try it with various sizes.

The odds are good you'll see that the larger it is, the better the program
performs, with the improvements slowing down the larger the table size gets.

If you have the memory and it's not being used for anything else, then why
not use it? You have nothing to loose by doing so because a larger hash
table will never make a program play worse (unless there are bugs in the
program, and a few pathological test cases.)


Incidentally, this newsgroup is no longer suitable for discussions of
computer chess. Too much spam, Sam Sloan, USCF, etc. etec. garbage.

Most chess programmers have moved over to:

http://www.talkchess.com/forum/viewf...opic_view=flat

(They have several forums, but that's the programmer's forum.)


"dkraft" wrote in message
...
Hi,

How large should the hash table size be?
The large the better?
Or is 1536 MB wasting hash memory?
And is 512 MB enough?

Thanks
Dieter




  #3  
Old February 20th 08, 08:25 PM posted to rec.games.chess.computer
Anders Thulin
external usenet poster
 
Posts: 152
Default hash table size

Guest wrote:
Basically, the larger the better provided it all fits into physical memory
with no chance of paging out to disk. It's not going to fit into the L2
cache anyway....


I don't quite agree: every hash table entry imposes some administrative
overhead that offsets the benefit of having a hash table. If that overhead
gets too large, you lose.

Just where the cutoff is depends on how the hash table was implemented.
Testing can be used to detect the cut-off, but it will probably take a lot
of careful testing to discover.

So: ask the software writers. They should have an idea if there
is a sweet spot or not.

--
Anders Thulin anders*thulin.name http://www.anders.thulin.name/
  #4  
Old February 20th 08, 08:42 PM posted to rec.games.chess.computer
Gian-Carlo Pascutto[_2_]
external usenet poster
 
Posts: 4
Default hash table size

Anders Thulin wrote:

I don't quite agree: every hash table entry imposes some
administrative overhead that offsets the benefit of having a hash
table.


What would that administrative overhead be?

--
GCP
  #5  
Old February 21st 08, 01:08 AM posted to rec.games.chess.computer
Guest[_2_]
external usenet poster
 
Posts: 76
Default hash table size

"Anders Thulin" wrote in message
...
Guest wrote:
Basically, the larger the better provided it all fits into physical
memory
with no chance of paging out to disk. It's not going to fit into the L2
cache anyway....


I don't quite agree: every hash table entry imposes some administrative
overhead that offsets the benefit of having a hash table. If that
overhead
gets too large, you lose.


The only overhead is:

HashIndex = Hash % TableSize;

That's constant regardless of table size.

And most tables are implented one of four ways.

1) Check one entry.

2) Check two entries.

3) Check one hash line, which may contain two or more entries.

4) One small table and one large table.

All four cases behave the same regardless of table size.

The only additional overhead would be an increase in the number of TLB
entries. (The cpu internal entries that convert addresses into physical
memory addresses.) But since there are already a lot of them and they
aren't going to be able to stay in the L2 cache, every hash check is nearly
guaranteed to be a full cost random memory access.

So unless your hash table is *very* small (like small enough to fit into L2
or only slightly larger), there is no extra overhead when increasing the
size, as long as you stay within physical memory.


Now, if you try to clear the hash, then a larger hash will take more time.
But that's a one time thing before each search, and most people don't
actually clear the hash. Instead other things are done so that old entries
are very unlikely to match for a new search.

Just where the cutoff is depends on how the hash table was implemented.
Testing can be used to detect the cut-off, but it will probably take a lot
of careful testing to discover.

So: ask the software writers. They should have an idea if there
is a sweet spot or not.



I am a writer. Not a major program writer, no. But I have written a few
chess programs over the years.

And this question has come up repeatedly over the years. And the answer is
always the same by the major chess program writers. (Hyatt, etc. etc.)
People who have done these kinds of tests and always get the same basic
answer.


As to whether there is a 'sweet spot', that would suggest the most benefit
for the amount of memory consumed.

Like 75% at 16m, 95% at 32m and 96% at 64m.

In that case, the 'sweet spot' would be 95% at 32m. The most 'bang for the
buck'.

Finding that 'sweet spot' would require testing. Which I did suggest he do.

But although going to 64m would cost you an extra 32m for only 1%
improvement would make it unatractive, it's still not a bad setting if you
have the extra memory to spare.

Except for the occasional random 'bad' position that might hurt from a
larger hash, the majority of time, for normal positions, more memory is
better.




  #6  
Old February 21st 08, 05:49 PM posted to rec.games.chess.computer
Anders Thulin
external usenet poster
 
Posts: 152
Default hash table size

Guest wrote:

The only overhead is:

HashIndex = Hash % TableSize;


and later:

All four cases behave the same regardless of table size.


No, there's more to it. You show it yourself when you say:

1) Check one entry.
2) Check two entries.


no further overhead here, true, except that the second may
increase code space, and so cause other kinds of overhead.
If it's implemented as a 1..2 loop, there is obviously
counting overhead.

3) Check one hash line, which may contain two or more entries.


and here you have (I presume) a bucket search, which may be more
or less complex depending on how you implement it. I don't see how
you can claim no overhead in this situation.

4) One small table and one large table.


But why stop there? The table is not static -- you don't only
do searches. You also have to insert new stuff, and weed out old.
In the single entry case, it's easy, but in the bucket hash it
tends to get hairy: a fast search usually means a slow insert.


Now all that adds up: code has to execute to search the table,
code hash to execute to change the table. As long as that code
executes faster than the code the hash replaces, all is fine. But
does it? If the hash table is poorly implemented, you may turn
out to have a bucket hash with linear searches of the buckets,
and one that resizes by increasing bucket chain length only.
I've seen those in actual use -- and turned out by professional
programmers, too.

It doesn't matter one whit what theory says, if the programmer
didn't do a good job implementing it.

In this particular case, we don't have a single clue to what code
base the question applies to. I assume it applies to code base,
of course -- if the question is purely theoretical, one of the
standard books on search algorithms will probably be the best source for
a reply.

--
Anders Thulin anders*thulin.name http://www.anders.thulin.name/
  #7  
Old February 21st 08, 06:57 PM posted to rec.games.chess.computer
Guest[_2_]
external usenet poster
 
Posts: 76
Default hash table size

"Anders Thulin" wrote in message
...
Guest wrote:

The only overhead is:

HashIndex = Hash % TableSize;


and later:

All four cases behave the same regardless of table size.


No, there's more to it. You show it yourself when you say:

1) Check one entry.
2) Check two entries.


no further overhead here, true, except that the second may
increase code space, and so cause other kinds of overhead.
If it's implemented as a 1..2 loop, there is obviously
counting overhead.


Say what???

A loop, if you did it that way, would only cost a few cycles. Say 10
cycles.

An access to main memory, which is essentially a random access at full cost,
will cost at least 200 cycles or even up to 400 cycles. Yeah, main memory
is very expensive!

Is an extra 10 cycles really significant?

And that has *nothing* to do whether you are using 32m, 128m or even 1.5g of
memory.

That's an implementation detail of how you choose to implement the hash
table. It will be a constant and will not be influenced by how big the hash
table is. Which is what the original question was.... Which was whether he
should go ahead and use 1536m or stay with 512m.



3) Check one hash line, which may contain two or more entries.


and here you have (I presume) a bucket search, which may be more


Why a bucket search?? Due to cache line sizes (the only reason to do this
kind of hash organization), you'll only have 2 or almost certainly no more
than 4 entries. No need for any fancy search when a straight comparison
will do fine.

Don't go trying to make things more complex than you need to.

or less complex depending on how you implement it. I don't see how
you can claim no overhead in this situation.


I'm not claiming no overhead for how the hash table is implemented. I'm
claiming there would be no additional overhead (or penalty) if you increase
your cache size from 512m to 1.5g, which is what the original question was.

It's even what the thread subject says... "hash table size". Not "hash
table implementation cost".

Go read the original message.

I'll save you the trouble and quote it here...

***
How large should the hash table size be?
The large the better?
Or is 1536 MB wasting hash memory?
And is 512 MB enough?
***

Again, his question wasn't about how to implement it. Whether one method
was better than the other or easier or whatever.

It was how much memory was enough. Whether using his whole 1.5g available
memory was a waste or whether he should cut it back to just 512m.



4) One small table and one large table.


But why stop there? The table is not static -- you don't only
do searches. You also have to insert new stuff, and weed out old.
In the single entry case, it's easy, but in the bucket hash it
tends to get hairy: a fast search usually means a slow insert.


Which will be a constant cost regardless of how big the hash is.

Which is what the original question was.


Now all that adds up: code has to execute to search the table,
code hash to execute to change the table. As long as that code
executes faster than the code the hash replaces, all is fine. But
does it? If the hash table is poorly implemented, you may turn


You are talking more about bugs etc. than what the original question was
about.


out to have a bucket hash with linear searches of the buckets,
and one that resizes by increasing bucket chain length only.
I've seen those in actual use -- and turned out by professional
programmers, too.

It doesn't matter one whit what theory says, if the programmer
didn't do a good job implementing it.


And how is that at all relevant to the original question?


In this particular case, we don't have a single clue to what code
base the question applies to. I assume it applies to code base,


Assuming his hash table isn't out right broken, and assuming he hasn't gone
out of his way to implement it in an idiotic maner, then the cost of the
accessing the hash entry in main memory will be larger than all the other
mainenance costs.

So, as long as his hash table manages to work, and do the job it's designed
for, it will be saving move generation costs, reducing the search bounds,
which makes the tree smaller which cuts out evals and move gens and
q-searches, etc...

So it really all comes down to:

1) Asuming his table isn't broken.
2) Asuming his table wasn't deliberately designed to be inefficient at large
memory sizes
3) the implementation details will effect the performance of the hash table
(ie: hit rate)
4) the cost of actually maintaining the table will be constant (for whatever
method he chooses) regardless of table size.
5) then it all comes down to his original question...

How about that.

Even after going through your totally off subject stuff, it all still comes
down to his original question...

Whether it was a waste of memory to use his full 1.5g of available memory or
just cut it down to 512m.


Unless your next message actually talks about the original subject (which is
also what the subject header says), I wont be replying to any more of your
trolling.

If you want to start a brand new thread that talks about implementation
ideas and methods and the overheads involved with each, and so on, then I
might be willing to read and participate in that discussion. Although
actually, it would be much more convenient to do this over at the TalkChess
forum. That's where the major chess programmers are. That way you could
get the feedback from dozens of chess programmers with 10 or 20 or more
years experience. I don't go there every day, but I'm sure there are many
others that would be willing to immediately discuss implementation issues.

But if you keep trying to change the meaning of the original question in
this thread.... bye.



of course -- if the question is purely theoretical, one of the
standard books on search algorithms will probably be the best source for
a reply.

--
Anders Thulin anders*thulin.name http://www.anders.thulin.name/



  #8  
Old February 21st 08, 10:08 PM posted to rec.games.chess.computer
David Richerby
external usenet poster
 
Posts: 2,591
Default hash table size

Anders Thulin wrote:
Guest wrote:
The only overhead is:

HashIndex = Hash % TableSize;


and later:

All four cases behave the same regardless of table size.


No, there's more to it. You show it yourself when you say:

1) Check one entry.
2) Check two entries.


no further overhead here, true, except that the second may increase
code space, and so cause other kinds of overhead. If it's
implemented as a 1..2 loop, there is obviously counting overhead.


Yes but that's insignificant and independent of table size.

3) Check one hash line, which may contain two or more entries.


and here you have (I presume) a bucket search, which may be more or
less complex depending on how you implement it. I don't see how you
can claim no overhead in this situation.


Because the buckets are of constant size, independent of the size of
the hash table.

4) One small table and one large table.


But why stop there? The table is not static -- you don't only
do searches. You also have to insert new stuff, and weed out old.


Insertion overwrites old data; it's not deleted `manually'. The hash
table is operating as a cache, not a dictionary: if the data you want
isn't in its `primary' location, you look in a constant number of
other places for it and then give up, deleting the stalest or
shallowest data you have and replacing it with the new evaluation.

Now all that adds up: code has to execute to search the table,
code hash to execute to change the table.


Yes but the speed of this code is completely independent of the size
of the hash table.

If the hash table is poorly implemented


.... then you've already lost. Any code might have been badly written
and behave stupidly as a result. Yes, it's possible that the hash
table was written by an idiot but, if it was, the engine will probably
be weak for other reasons, too.

In a well-written engine, the optimal hash table size really is `as
big as you can have it without swapping.'


Dave.

--
David Richerby Disgusting Atlas (TM): it's like
www.chiark.greenend.org.uk/~davidr/ a map of the world but it'll turn
your stomach!
  #9  
Old February 21st 08, 10:36 PM posted to rec.games.chess.computer
Guest[_2_]
external usenet poster
 
Posts: 76
Default hash table size

"David Richerby" wrote in message
...
Anders Thulin wrote:
4) One small table and one large table.


But why stop there? The table is not static -- you don't only
do searches. You also have to insert new stuff, and weed out old.


Insertion overwrites old data; it's not deleted `manually'. The hash
table is operating as a cache, not a dictionary: if the data you want


You pointing out that a chess hash table is being done as a cache rather
than a dictionary, and him saying that a table isn't static, actually
explains a few things about his message.

I bet he's thinking along the line of a a linked list or tree or b-tree or
some sort of data structure that gets updated dynamically. Maybe even a
structure who's size and organization can change during the search.

Stuff gets inserted, removed, re-organized, searched, etc.

Data structures and algorithms that are simply not used in a chess hash
table.


Anders, a chess hash table is done as a fixed sized table. It's a straight
array with no other structure. It's not a linked list or a tree or a b-tree
or any other 'fancy' data structure. Just a basic linear vector array.

To access a hash entry, you do:

HashEntry = HashTable [ HashIndex % HashSize];
if (HashEntry.Lock == MAGIC(HashIndex)) return HashEntry;
return NULL;

To save it, you do:
OldEntry = HashTable [ HashIndex % HashSize];
if (OldEntry.Draft = NewEntry.Draft)
HashTable[HashIndex % HashSize] = NewEntry;

And so on.

If you do a hash entry that actually contains two or more (a fixed number)
entries, then the code will be slightly more complicted, but still fairly
simple and a fixed cost regardless of the size of the table.

Other modifications to the hash table will change things slightly, but
again, they will be of a fixed cost that is independant of the size of the
table.


It didn't occur to me until now that you might be thinking of too high a
level of data structures.

Since this is a computer chess newsgroup, most people in here are already
familar with the basic algorithms, and I just assumed you were.



  #10  
Old February 22nd 08, 03:18 AM posted to rec.games.chess.computer
Andy Walker
external usenet poster
 
Posts: 70
Default hash table size

In article ,
David Richerby wrote:
In a well-written engine, the optimal hash table size really is `as
big as you can have it without swapping.'


Whoa, it's not *quite* as simple as that.

(a) Even in a well-written engine, you may want/need to "clear
out" the table from time to time. Initialising several terabytes
may take a little while even on a modern machine ....

(b) Some of the standard operations on a hash table involve
indexing and finding remainders modulo the table size. These may
be more efficient if the table size is a power of 2, and if the
relevant numbers fit into [in C terms] ints rather than longs.

(c) If you manage to lock down all, or most, of the storage of
a machine, you may perhaps be unpopular, in a multi-user environment,
with other users of the machine. *Your* program may not be swapping,
but *theirs* are! [Even on a home PC, the other users may be *you*,
trying to run other things, such as the engine that this program is
playing against.]

--
Andy Walker
Nottingham
 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
move table Folkert van Heusden rec.games.chess.computer (Computer Chess) 2 August 31st 07 09:35 AM
WEB SINGLES DIRECTORY INTRNET DATING SERVICE PERSONALS SEX DATING asdhidsfus983jdd@yahoo.com rec.games.chess.politics (Chess Politics) 0 July 6th 07 06:07 AM
Prodeo : Set hashtable size in Arena GUI Jerome rec.games.chess.computer (Computer Chess) 0 June 2nd 07 08:33 AM
Karmnick vs Fritz starts November 25 i belive SAT W-7 rec.games.chess.computer (Computer Chess) 10 October 19th 06 01:19 AM
The USCF breeched its contract with the membership by cutting the size of Chess Life in half A1 alt.chess (Alternative Chess Group) 1 October 28th 03 09:43 PM


All times are GMT +1. The time now is 03:57 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.Content Relevant URLs by vBSEO 2.4.0
Copyright ©2004-2008 ChessBanter, part of the NewsgroupBanter project.
The comments are property of their posters.
Hedge Funds - Almudena grandes - Internet Advertising - Bad Credit Mortgages - Free Wordpress Blogs Hosting