View Single Post
  #10  
Old January 10th 04, 05:36 AM
Noah Roberts
external usenet poster
 
Posts: n/a
Default Average size of database

Peter Sch?fer wrote:
Noah Roberts wrote in message ...

I am interested in how many games are in your databases. I am working
on the design of an OSS database for chinese chess, which currently does
not exist, and would like to know what a reasonable count of games is to
define my types. For instance, if I choose a game index of 16 bits I
have a limit in the 10's of thousands but if I choose an index of 32



16 bits are certainly not sufficient, 32 bits are be OK.

Storing a complete game takes some hundred bytes, so I wouldn't
waste too many thoughts about saving 1 or 2 bytes ;-)


Well, if I am going to do positional indexes then think of it this way:

There are 2^32 games, assume each has average of 60 positions, positions
can be repeated in games - assume average of 2x per position. That's
2^32 * 30 positions which I have smashed down to 26 bytes each just for
the key. Each position has an average of two game links, which are each
4 bytes long. This is a minimum of 34 bytes * 30 * 4G, which is 4
terrabytes, less 16G, for the positional indexing alone. I could also
be greately underestimating if the average is not 2+ for repeating
positions.

So, we are talking about huge amounts of storage. This is why I wanted
to know the average size of a DB so I could get an estimate on the size
of the average position index to see if it is reasonable. There are not
going to be a lot of 4G game databases so terrabytes for such a DB may
not be totally unreasonable.

I had basically already arrived at the same conclusion everyone is
stating; 16 bit limit is really too small.

What you should do, is think of a compact move encoding.
1 byte per move would be good.


Perhaps someone in the know could also help me with this: I am
currently thinking that an index by position would be very important,
yet AFAICT scid does not do it this way. Without this, how could the
database be sorted so that searches based on position happen rappidly -
the only way I can see of finding a position is to linearly search the
entire database and play out each and every game! The more I think on
it the more I want a positional index, but this also becomes a rather
large item.



I think there are some commercial databases that build position indexes.

I've thought about this too for my database (jose-chess.sourceforge.net)
but I didn't get very far, because such an index would become really huge
and take too long to build.


Yes, it becomes big :P I think it would be very interesting just to see
how close I got to my estimates.

SCID does a linear search with some shortcuts.


I haven't been able to figure out which source file does the searching
work. I found the in-memory tree, but not the file manipulations. You
know where I need to go?

NR

Ads
 

Loans - Tax - Loans - Car Finance - Secured Loans