|
|
|
#1
I am interested in how many games are in your databases. I am working on the design of an OSS database for Chinese chess, which currently does not exist, and would like to know what a reasonable game count is so I can define my types. For instance, if I choose a 16-bit game index I am limited to tens of thousands of games, but if I choose a 32-bit index the number of games that can be included increases greatly, and so does the maximum size of the database. My design will change substantially based on this choice alone.

Anyway, if anyone interested in helping could answer these questions: What is the current game count in your database? How large is the file (or files)? How long does it take to search for a given position? How long for material? [The last two may require that you restart the program because of caching.] What is the maximum you could conceive of ever having in your database?

Also, something I am VERY interested in, though I doubt a real number can be produced, is the average repeat rate of any given position. Currently I am assuming 2x, because many positions near the beginning are going to be repeated a lot, but toward the end repetition becomes rare. I am not sure whether I am over- or underestimating.

Perhaps someone in the know could also help me with this: I am currently thinking that an index by position would be very important, yet AFAICT SCID does not do it this way. Without one, how could the database be organized so that searches by position happen rapidly? The only way I can see of finding a position is to linearly search the entire database and play out each and every game! The more I think about it, the more I want a positional index, but that also becomes a rather large item.

That is it for now; thanks for any responses.

NR
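A minimal Python sketch of the two approaches the post contrasts: a linear scan that checks every game, and an inverted positional index that maps each position to the games reaching it. It assumes, hypothetically, that each game has already been replayed into a sequence of hashable position keys (for example a Zobrist hash or a packed board string); `linear_search` and `build_position_index` are illustrative names, not functions from SCID or any other program.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

# Hypothetical representation: each game is the sequence of position keys
# (any hashable board encoding) reached while replaying its moves.
Game = Sequence[Hashable]


def linear_search(games: Sequence[Game], target: Hashable) -> List[int]:
    """Scan every game, comparing each of its positions with the target."""
    return [game_id for game_id, game in enumerate(games) if target in game]


def build_position_index(games: Sequence[Game]) -> Dict[Hashable, List[int]]:
    """Inverted index: position key -> ids of the games that reach it."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for game_id, game in enumerate(games):
        # set() lists a game once even if it repeats a position internally.
        for key in set(game):
            index[key].append(game_id)
    return dict(index)  # plain dict, so looking up an unknown key doesn't insert it
```

With `games = [["start", "a", "b"], ["start", "a", "c"], ["start", "d"]]`, both `linear_search(games, "a")` and `build_position_index(games).get("a", [])` return `[0, 1]`. The index answers with a single lookup, but it stores one key per distinct position plus one game link per (position, game) pair, which is exactly the storage question raised later in the thread.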
#2
Noah Roberts wrote:
> define my types. For instance, if I choose a game index of 16 bits I have a limit in the tens of thousands, but if I choose an index of 32 bits this greatly increases the number of games that can be included, but also greatly increases the maximum size of the database; my design will be greatly changed just by this choice.

Get it to work first; then you can worry about minimizing size. That is, begin with 32 bits. Disk space is only getting cheaper.

> What is the maximum you could conceive of ever having in your database?

As many as fit on a hard disk: say 240 GB at the latest count. In other words, I'd be upset if I ran across an artificial limit of any kind.

There's a fairly nice rule about limits in computer programs; I think it's by van der Poel. It says that you only have three choices: 0, 1, or infinity. Anything else will cause problems. Infinity in this context probably means 32 bits, unless you're on a system that allows 64-bit file pointers.

--
Anders Thulin   http://www.algonet.se/~ath
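Following the idea that "infinity" means 32 bits for game numbers and 64 bits for file pointers, a hypothetical fixed-size index entry could pair the two. This is a sketch using Python's standard `struct` module; the field layout and the helper names are assumptions for illustration, not the format of any existing program.

```python
import struct

# Hypothetical index-entry layout: "<" = little-endian with no padding,
# "I" = unsigned 32-bit game number, "Q" = unsigned 64-bit byte offset
# of the game's record in the game file.
INDEX_ENTRY = struct.Struct("<IQ")


def pack_entry(game_no: int, offset: int) -> bytes:
    """Serialize one entry; struct.error is raised if a field overflows."""
    return INDEX_ENTRY.pack(game_no, offset)


def unpack_entry(raw: bytes) -> tuple:
    """Inverse of pack_entry: returns (game_no, offset)."""
    return INDEX_ENTRY.unpack(raw)
```

Each entry is `INDEX_ENTRY.size` = 12 bytes, game numbers run up to 2^32 - 1 = 4,294,967,295, and an offset such as 5,000,000,000 (beyond what a 32-bit pointer can reach) round-trips unchanged. Exceeding either field raises `struct.error` instead of silently wrapping around, so the limit at least fails loudly.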
|
#3
Noah Roberts wrote:
> I am interested in how many games are in your databases. I am working on the design of an OSS database for Chinese chess, which currently does not exist, and would like to know what a reasonable count of games is to define my types. For instance, if I choose a game index of 16 bits I have a limit in the tens of thousands, but if I choose an index of 32 bits this greatly increases the number of games that can be included, but also greatly increases the maximum size of the database; my design will be greatly changed just by this choice.

I'd strongly advise against using 16-bit indices: 65,535 games isn't a lot at all. There are people on FICS who've played over 30,000 games, for instance. Fritz 8 ships with a database of over 150,000 games. If your database can handle millions of games well, it'll fly with databases of a mere hundred thousand.

Dave.

--
David Richerby   www.chiark.greenend.org.uk/~davidr/
Expensive Miniature Atlas (TM): it's like a map of the world but you can hold it in your hand and it'll break the bank!
|
#4
My main correspondence-analysis DB has 3,445,151 games and uses 496 MB of disk. I use SCID (http://scid.sourceforge.net). It takes about 6 seconds to open (2.6 GHz Pentium 4, 1 GB memory). If I paste in a game (say 12 moves) and open the 'tree' window, it responds in 2-3 seconds. A material search takes about 23 seconds.

SCID 3.5 allows 16,000,000 games in any one DB (previously 4,000,000), and this can be raised. Have a look at SCID, which is GPL'ed.
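Kym's figures give a rough per-game storage cost. A quick check, assuming the 496 MB covers all of the database's files and allowing for "MB" meaning either 10^6 or 2^20 bytes (the post doesn't say which); `bytes_per_game` is an illustrative helper:

```python
GAMES = 3_445_151  # games in Kym's database, from the post above


def bytes_per_game(total_bytes: int, games: int) -> float:
    """Average on-disk cost of one game, assuming total_bytes covers
    every file that makes up the database."""
    return total_bytes / games


decimal_mb = bytes_per_game(496 * 10**6, GAMES)   # if 1 MB = 10**6 bytes
binary_mb = bytes_per_game(496 * 2**20, GAMES)    # if 1 MB = 2**20 bytes
print(f"{decimal_mb:.0f} to {binary_mb:.0f} bytes per game")
```

This prints `144 to 151 bytes per game`, in line with the "some hundred bytes" per game mentioned elsewhere in the thread. At that rate a full 2^32-game database would need on the order of 600 GB before any positional index is added.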
|
#6
"Kym" wrote in message ...
The size is unlimited
|
#7
Noah Roberts wrote in message ...
> I am interested in how many games are in your databases. I am working on the design of an OSS database for Chinese chess, which currently does not exist, and would like to know what a reasonable count of games is to define my types. For instance, if I choose a game index of 16 bits I have a limit in the tens of thousands but if I choose an index of 32

16 bits are certainly not sufficient; 32 bits should be OK. Storing a complete game takes a few hundred bytes, so I wouldn't waste too much thought on saving 1 or 2 bytes ;-)

What you should do is think of a compact move encoding. 1 byte per move would be good.

> Perhaps someone in the know could also help me with this: I am currently thinking that an index by position would be very important, yet AFAICT SCID does not do it this way. Without this, how could the database be sorted so that searches based on position happen rapidly? The only way I can see of finding a position is to linearly search the entire database and play out each and every game! The more I think on it the more I want a positional index, but this also becomes a rather large item.

I think there are some commercial databases that build position indexes. I've thought about this too for my database (jose-chess.sourceforge.net), but I didn't get very far, because such an index would become really huge and take too long to build. SCID does a linear search with some shortcuts.
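One way to get the 1-byte moves suggested above is to store each move as its position in the list of legal moves, which fits in a byte as long as no position has more than 256 legal moves. This is a sketch of that general technique, not necessarily what SCID or Jose does; the `(from_square, to_square)` move format, the 0-89 square numbering, and the assumption that a deterministic move generator supplies `legal_moves` are all hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical move format: (from_square, to_square), squares 0..89 on the
# 9x10 xiangqi board. legal_moves must come from a deterministic move
# generator, so that the encoder and decoder see the same order.
Move = Tuple[int, int]


def encode_move(move: Move, legal_moves: Sequence[Move]) -> int:
    """Return the move's index in the legal-move list (one byte)."""
    code = legal_moves.index(move)  # ValueError if the move is not legal
    if code > 255:
        raise ValueError("index does not fit in one byte")
    return code


def decode_move(code: int, legal_moves: Sequence[Move]) -> Move:
    """Inverse of encode_move, given the same legal-move list."""
    return legal_moves[code]


def encode_game(moves: Sequence[Move],
                legal_lists: Sequence[Sequence[Move]]) -> bytes:
    """legal_lists[i] holds the legal moves of the position before moves[i]."""
    return bytes(encode_move(m, legal) for m, legal in zip(moves, legal_lists))
```

The trade-off is that decoding requires replaying the game and regenerating the legal moves at every ply, so reading games gets slower in exchange for smaller files.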
|
#8
"Kym" wrote in message ...
A database should be at least 1 MB
|
#9
Peter Schäfer wrote:
> Noah Roberts wrote in message ...
>> I am interested in how many games are in your databases. I am working on the design of an OSS database for Chinese chess, which currently does not exist, and would like to know what a reasonable count of games is to define my types. For instance, if I choose a game index of 16 bits I have a limit in the tens of thousands but if I choose an index of 32
>
> 16 bits are certainly not sufficient; 32 bits should be OK. Storing a complete game takes a few hundred bytes, so I wouldn't waste too much thought on saving 1 or 2 bytes ;-)

I think the problem being hit these days on the desktop is the ye olde 32-bit (or, more usually, 31-bit) file pointer limit. 32-bit indexes give you 4 billion games, which I think is unlikely to be exceeded in the near future. But if each game is 100 bytes and you use 32-bit file pointers, that's only 4 GB, or ~40 million games, per file, and you probably don't want to write code for handling multiple files in the database. For Western chess, that limit is probably close to being reached, if it hasn't been already.

Some languages expect you to ask nicely if you want files bigger than 2 GB, as do some OSes, but this will change quickly.
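The per-file arithmetic in this post can be checked directly. A small sketch, assuming (as the post does) a flat 100 bytes per game; `games_per_file` is an illustrative helper:

```python
def games_per_file(offset_bits: int, bytes_per_game: int) -> int:
    """Number of fixed-size game records addressable by an unsigned byte
    offset of the given width (use 31 for a signed 32-bit offset)."""
    return 2**offset_bits // bytes_per_game


for bits in (31, 32, 64):
    print(f"{bits}-bit offsets: {games_per_file(bits, 100):,} games per file")
```

That gives 21,474,836 games per file with signed 32-bit offsets and 42,949,672 with unsigned ones, matching the ~40 million estimate; 64-bit offsets remove the per-file limit for any practical purpose. In C on 32-bit Unix systems, "asking nicely" typically means compiling with `_FILE_OFFSET_BITS=64` and using `fseeko`/`ftello`.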
|
#10
Peter Schäfer wrote:
> Noah Roberts wrote in message ...
>> I am interested in how many games are in your databases. I am working on the design of an OSS database for Chinese chess, which currently does not exist, and would like to know what a reasonable count of games is to define my types. For instance, if I choose a game index of 16 bits I have a limit in the tens of thousands but if I choose an index of 32
>
> 16 bits are certainly not sufficient; 32 bits should be OK. Storing a complete game takes a few hundred bytes, so I wouldn't waste too much thought on saving 1 or 2 bytes ;-)

Well, if I am going to do positional indexes, then think of it this way. There are 2^32 games; assume each has an average of 60 positions. A position can occur in more than one game; assume each position occurs 2x on average. That's 2^32 * 30 distinct positions, whose keys I have squeezed down to 26 bytes each. Each position has an average of two game links, each 4 bytes long. That is a minimum of 34 bytes * 30 * 4G, which is 4 terabytes less 16 GB, for the positional indexing alone. I could also be greatly underestimating if the average repeat rate is below 2.

So we are talking about huge amounts of storage. This is why I wanted to know the average size of a DB: to estimate the size of the average position index and see whether it is reasonable. There are not going to be a lot of 4G-game databases, so terabytes for such a DB may not be totally unreasonable.

I had basically already arrived at the same conclusion everyone is stating: a 16-bit limit is really too small.

> What you should do is think of a compact move encoding. 1 byte per move would be good.
>
>> Perhaps someone in the know could also help me with this: I am currently thinking that an index by position would be very important, yet AFAICT SCID does not do it this way. Without this, how could the database be sorted so that searches based on position happen rapidly? The only way I can see of finding a position is to linearly search the entire database and play out each and every game! The more I think on it the more I want a positional index, but this also becomes a rather large item.
>
> I think there are some commercial databases that build position indexes. I've thought about this too for my database (jose-chess.sourceforge.net), but I didn't get very far, because such an index would become really huge and take too long to build.

Yes, it becomes big :P I think it would be very interesting just to see how close I got to my estimates.

> SCID does a linear search with some shortcuts.

I haven't been able to figure out which source file does the searching work. I found the in-memory tree, but not the file manipulations. Do you know where I need to go?

NR
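Noah's estimate can be written as a small formula. A sketch, assuming (as he does) one game link per occurrence of a position, so that the number of links per distinct position equals the repeat rate; `position_index_bytes` is a hypothetical helper:

```python
def position_index_bytes(games: int, positions_per_game: float,
                         repeat_rate: float, key_bytes: int,
                         link_bytes: int) -> float:
    """Size of a position index holding one key per distinct position and
    one game link per occurrence of that position."""
    distinct_positions = games * positions_per_game / repeat_rate
    links_per_position = repeat_rate
    return distinct_positions * (key_bytes + links_per_position * link_bytes)


# Noah's numbers: 2**32 games, 60 positions each, repeat rate 2,
# 26-byte keys, 4-byte game links.
full = position_index_bytes(2**32, 60, 2, 26, 4)
print(full / 2**30, "GiB")  # 4080 GiB, i.e. 4 TiB less 16 GiB
```

Algebraically this is games × positions_per_game × (key_bytes / repeat_rate + link_bytes), so a repeat rate below 2 makes the index larger, as Noah suspects. For a database the size of Kym's (3,445,151 games), the same assumptions give 3,514,054,020 bytes, about 3.5 GB.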
|
|
|