#11
Noah Roberts wrote:
> There are 2^32 games, assume each has average of 60 positions, positions can be repeated in games - assume average of 2x per position. That's 2^32 * 30 positions which I have smashed down to 26 bytes each just for the key. Each position has an average of two game links, which are each 4 bytes long. This is a minimum of 34 bytes * 30 * 4G, which is 4 terabytes, less 16G, for the positional indexing alone. I could also be greatly underestimating if the average is not 2+ for repeating positions.

A lot of assumptions in there. Can you verify any of them? Are averages useful to design by?

It's pretty clear that the position after 1. e4 is going to cover at least 40% of the games. That means a *lot* of game links for that one. Will that upset the design, or any expectations the user will have on response time?

Have you decided on any goal for searches? Not more than 10 seconds? Or is half an hour's search time OK? Or will you handle these specially -- for instance by breaking off, and saying 'too many hits'?

-- 
Anders Thulin   http://www.algonet.se/~ath
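The arithmetic in the quoted estimate, and the skew Anders points out for 1. e4, can be checked with a few lines of Python. This is a minimal sketch under the posters' own assumptions (2^32 games, 60 positions per game, each position shared by 2 games on average, a 26-byte key and 4-byte game links); the function names, the dict standing in for the index, and the `max_hits` cap are hypothetical illustrations, not part of anyone's actual design.

```python
# Back-of-envelope sizing for a position -> game-list index.
# All figures are the assumptions quoted in this thread, not measurements.

GIB = 2 ** 30


def index_size_bytes(games, positions_per_game, repeat_factor,
                     key_bytes, link_bytes):
    """Total index size: one key per distinct position plus one link per game.

    repeat_factor is the average number of games a position appears in,
    so it is also the average number of game links per position.
    """
    distinct_positions = games * positions_per_game // repeat_factor
    bytes_per_entry = key_bytes + repeat_factor * link_bytes
    return distinct_positions * bytes_per_entry


def posting_list_bytes(games, share, link_bytes):
    """Size of the game-link list for one position reached in `share` of all games."""
    return int(games * share) * link_bytes


def lookup(index, key, max_hits):
    """Return the games for a position, or None when there are too many hits.

    Hypothetical cap in the spirit of "breaking off, and saying
    'too many hits'"; `index` is a plain dict standing in for the real store.
    """
    games = index.get(key, [])
    if len(games) > max_hits:
        return None
    return games


if __name__ == "__main__":
    size = index_size_bytes(2 ** 32, 60, 2, 26, 4)
    print(f"index: {size / GIB:.0f} GiB")  # 4080 GiB, i.e. 4 TiB less 16 GiB
    e4 = posting_list_bytes(2 ** 32, 0.40, 4)
    print(f"1. e4 links alone: {e4 / GIB:.1f} GiB")  # about 6.4 GiB
```

With these inputs the total comes out at exactly 4 TiB less 16 GiB, matching the quoted figure, while the single position after 1. e4 would carry roughly 6.4 GiB of links on its own, which is why a per-query cap or special handling for very common positions matters more than the average.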
#12
Noah Roberts wrote:
> There are 2^32 games

If you assume your database contains four billion games, it isn't surprising that the index is big.

Dave.

-- 
David Richerby, www.chiark.greenend.org.uk/~davidr/
Revolting Lotion (TM): it's like a soothing hand lotion but it'll turn your stomach!
#13
On Sat, 10 Jan 2004 00:51:14 +0000, Simon Waters wrote:

> Peter Schäfer wrote:
>
>> Noah Roberts wrote in message ...
>>
>>> I am interested in how many games are in your databases. I am working on the design of an OSS database for Chinese chess, which currently does not exist, and would like to know what a reasonable count of games is to define my types. For instance, if I choose a game index of 16 bits I have a limit in the 10's of thousands but if I choose an index of 32
>>
>> 16 bits are certainly not sufficient, 32 bits are OK. Storing a complete game takes some hundred bytes, so I wouldn't waste too many thoughts about saving 1 or 2 bytes ;-)
>
> I think the problem being hit these days on the desktop is ye-olde 32-bit (or, more usually, 31-bit) file pointer limit.
>
> 32-bit indexes give you 4 billion games, which I think is unlikely to be exceeded in the near future. But if each game is 100 bytes and you use 32-bit file pointers, that's only 4 GBytes, or ~40 million games, per file, and you probably don't want to code for handling multiple files in the database. For western chess that is probably getting close, or already exceeded.

The 40 million game limit might be close to being reached if every game of chess that has been played so far were recorded. In practice a database that had all master (or otherwise important) games ever recorded would probably be at the 4-5 million games level today. [The largest database I have heard mentioned here is about 3.5 million games.]

A conservative estimate for the growth rate is about 300,000 games per year. [As a calibration, a little over 71,000 games were added to TWIC in 2002.] This suggests that a 40 million limit won't be reached for about 100 years.

> Some languages expect you to ask nicely if you want files bigger than 2GB, as do some OSes, but this will change quickly.
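The capacity figures traded in this exchange (16-bit versus 32-bit game indexes, the per-file limit imposed by 32-bit or signed 31-bit file offsets at about 100 bytes per game, and the years of growth before that limit is hit) are simple enough to recompute. The sketch below is a hedged illustration: the function names are hypothetical, and 4.5 million current games and 300,000 new games per year are the rough numbers quoted above, not measured data.

```python
# Capacity arithmetic for game indexes and file offsets.
# Inputs (100 bytes/game, 4.5M games today, 300k games/year) are the
# thread's rough estimates, used here only for illustration.

def max_games_for_index(bits):
    """Distinct game numbers an unsigned index of `bits` bits can hold."""
    return 2 ** bits


def max_games_per_file(offset_bits, bytes_per_game):
    """Games that fit below the largest byte offset a file pointer can address."""
    return 2 ** offset_bits // bytes_per_game


def years_until_full(capacity, current, per_year):
    """Years of linear growth before `current` games reach `capacity`."""
    return (capacity - current) / per_year


if __name__ == "__main__":
    print(max_games_for_index(16))  # 65536
    print(max_games_for_index(32))  # 4294967296
    for bits in (31, 32):
        cap = max_games_per_file(bits, 100)
        years = years_until_full(cap, 4_500_000, 300_000)
        print(f"{bits}-bit offsets: {cap:,} games per file, full in ~{years:.0f} years")
```

At 100 bytes per game a full 32-bit offset addresses about 42.9 million games (the "~40 million" above), and linear growth at 300,000 games a year takes over a century to get there. If the offset is signed, as in the "31 more usually" case, the per-file capacity halves to about 21.5 million games and the margin shrinks to roughly half a century.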
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| scid vs chessbase | Dr. David Kirkby | rec.games.chess.computer (Computer Chess) | 25 | January 8th 04 02:16 AM |
| Sorting database by ECO codes | andrew chapman | rec.games.chess.computer (Computer Chess) | 5 | December 17th 03 10:12 PM |
| tip: bigRAM & bigHASH in WindowsXP | Euc1id | rec.games.chess.computer (Computer Chess) | 12 | September 26th 03 07:39 PM |
| tip: remove "source" from CB database | Euc1id | rec.games.chess.computer (Computer Chess) | 0 | September 17th 03 01:06 PM |
| SCID database | Arnold Meijster | rec.games.chess.analysis (Chess Analysis) | 3 | September 6th 03 12:39 PM |