A Chess forum. ChessBanter

Average size of database



 
 
  #1  
Old January 7th 04, 08:04 PM
Noah Roberts
external usenet poster
 
Posts: n/a
Default Average size of database

I am interested in how many games are in your databases. I am working
on the design of an OSS database for Chinese chess, since none currently
exists, and would like to know what a reasonable game count is so that I
can define my types. For instance, if I choose a 16-bit game index I am
limited to tens of thousands of games (65,536), whereas a 32-bit index
allows about 4 billion games but also greatly increases the maximum size
of the database. This one choice changes my design considerably.
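
For reference, the trade-off between the two index widths can be put into numbers. This is only a sketch: the function names are made up for illustration, and it assumes an unsigned index with no reserved values and no padding.

```python
def max_games(index_bits: int) -> int:
    """Number of distinct game IDs an unsigned index of this width can hold."""
    return 2 ** index_bits


def index_table_bytes(game_count: int, index_bits: int) -> int:
    """Bytes needed to store one game reference per game (no padding assumed)."""
    return game_count * (index_bits // 8)


for bits in (16, 32):
    print(f"{bits}-bit index: up to {max_games(bits):,} games, "
          f"{index_table_bytes(60_000, bits):,} bytes to reference 60,000 games")
```

The wider index doubles the cost of every stored game reference, which only matters if references are stored many times over, as they would be in a position index.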

Anyway, I would appreciate it if anyone interested in helping could
answer these questions:

What is the current game count in your database?

How large are the file(s)?

How long does it take to search for a given position?

How long for a material search?

[ the last two may require that you restart the program because of caching ]

What is the maximum you could conceive of ever having in your database?

Also, something I am VERY interested in, though I doubt a real number can
be produced, is the average repeat rate of any given position. Currently
I am assuming 2x, because many positions near the beginning of a game will
be repeated a lot, while toward the end repeats become rare. I am not sure
whether I am over- or underestimating.
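
The repeat rate could be measured on any existing collection of games rather than guessed. A minimal sketch, assuming each game has already been replayed into a sequence of hashable position keys (the strings below are stand-ins, not a real position encoding), and counting a position once per game because the index would hold one game link per game:

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def average_repeat_rate(games: Iterable[Sequence[Hashable]]) -> float:
    """Total position occurrences divided by the number of distinct positions."""
    counts: Counter = Counter()
    for positions in games:
        # Count each position at most once per game.
        counts.update(set(positions))
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


# Toy data: three games sharing the start position, two sharing one more.
games = [
    ["start", "a", "b"],
    ["start", "a", "c"],
    ["start", "d", "e"],
]
print(average_repeat_rate(games))
```

On the toy data there are 9 occurrences of 6 distinct positions, so the rate is 1.5. Real game collections would have to be measured to say whether 2x is too high or too low.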

Perhaps someone in the know could also help me with this: I am
currently thinking that an index by position would be very important,
yet AFAICT SCID does not do it this way. Without such an index, how could
the database be organized so that searches by position happen rapidly?
The only way I can see of finding a position is to search the entire
database linearly and play out each and every game! The more I think
about it the more I want a positional index, but this also becomes a
rather large item.
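
The two approaches can be sketched side by side. This is an illustration only, not how SCID or any other program works: games are assumed to be already replayed into sequences of hashable position keys, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def build_position_index(
    games: Sequence[Sequence[Hashable]],
) -> Dict[Hashable, List[int]]:
    """Map each position key to the sorted list of game IDs containing it."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for game_id, positions in enumerate(games):
        for key in set(positions):
            index[key].append(game_id)
    return dict(index)


def linear_search(games: Sequence[Sequence[Hashable]], key: Hashable) -> List[int]:
    """Find a position without an index by walking through every game."""
    return [gid for gid, positions in enumerate(games) if key in positions]


games = [
    ["start", "a", "b"],
    ["start", "a", "c"],
    ["start", "d", "e"],
]
index = build_position_index(games)
print(index["a"])                # lookup in the prebuilt index
print(linear_search(games, "a"))  # same answer, but visits every game
```

The index answers a query with a single lookup, at the price of one stored key per distinct position plus one game link per occurrence, which is where the size concern comes from.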

That is it for now, thanks for any responses.
NR

  #2  
Old January 7th 04, 08:39 PM
Anders Thulin

Noah Roberts wrote:

> define my types. For instance, if I choose a game index of 16 bits I
> have a limit in the 10's of thousands but if I choose an index of 32
> bits this greatly increases the number of games that can be included,
> but also greatly increases the maximum size of the database; my design
> will be greatly changed just by this choice.


Get it to work first -- then you can worry about minimizing size.
That is, begin with 32 bits. Disk space is only getting cheaper.

> What is the maximum you could conceive of ever having in your database?

As many as fit on a hard disk -- say 240 GB at latest count. In other
words, I'd be upset if I ran across an artificial limit of any kind.

There's a fairly nice rule about limits in computer programs -- I think
it's by van der Poel. It says that you only have three choices: 0, 1
or infinity. Anything else will cause problems.

Infinity in this context probably means 32 bits, unless you're on a
system that allows 64-bit file pointers.

--
Anders Thulin http://www.algonet.se/~ath

  #3  
Old January 7th 04, 08:56 PM
David Richerby

Noah Roberts wrote:
> I am interested in how many games are in your databases. I am working
> on the design of an OSS database for chinese chess, which currently does
> not exist, and would like to know what a reasonable count of games is to
> define my types. For instance, if I choose a game index of 16 bits I
> have a limit in the 10's of thousands but if I choose an index of 32
> bits this greatly increases the number of games that can be included,
> but also greatly increases the maximum size of the database; my design
> will be greatly changed just by this choice.


I'd strongly advise against using 16-bit indices: 65,535 games isn't a lot
at all. There are people on FICS who've played over 30,000 games, for
instance. Fritz 8 ships with a database of over 150,000 games. If your
database can handle millions of games well, it'll fly with databases of a
mere hundred thousand.


Dave.

--
David Richerby
www.chiark.greenend.org.uk/~davidr/
Expensive Miniature Atlas (TM): it's like a map of the world but you can
hold it in your hand and it'll break the bank!
  #4  
Old January 9th 04, 10:18 AM
Kym

My main corr. analysis DB has 3,445,151 games and uses 496 MB of disk.
I am using SCID (http://scid.sourceforge.net).
It takes about 6 seconds to open (2.6 GHz Pentium 4, 1 GB memory).
If I paste a game (say 12 moves) and open the 'tree' window, it responds in
2-3 seconds.
SCID 3.5 allows 16,000,000 games in any one DB (previously 4,000,000), and
this limit can be raised.
A material search takes about 23 seconds.

Have a look at SCID, which is GPL'ed.
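
These figures give a rough storage cost per game. A small sketch of the arithmetic, assuming the 496 MB covers everything stored for the database, and showing both readings of "MB" since the post does not say which is meant:

```python
def bytes_per_game(total_bytes: int, games: int) -> float:
    """Average on-disk bytes per game, including whatever overhead is stored."""
    return total_bytes / games


GAMES = 3_445_151
for label, mb in (("MB = 10^6 bytes", 10 ** 6), ("MB = 2^20 bytes", 2 ** 20)):
    print(f"{label}: about {bytes_per_game(496 * mb, GAMES):.0f} bytes per game")
```

Either way the average comes out at roughly 150 bytes per game.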



  #5  
Old January 9th 04, 05:45 PM
Bill S

  #6  
Old January 9th 04, 06:03 PM
Robert Hyatt


The size is unlimited
  #7  
Old January 9th 04, 06:27 PM
Peter Schäfer

Noah Roberts wrote in message ...
> I am interested in how many games are in your databases. I am working
> on the design of an OSS database for chinese chess, which currently does
> not exist, and would like to know what a reasonable count of games is to
> define my types. For instance, if I choose a game index of 16 bits I
> have a limit in the 10's of thousands but if I choose an index of 32


16 bits are certainly not sufficient; 32 bits should be OK.

Storing a complete game takes a few hundred bytes, so I wouldn't
waste too many thoughts on saving 1 or 2 bytes ;-)

What you should do is think about a compact move encoding.
1 byte per move would be good.
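
One way to reach 1 byte per move, sketched here purely as an illustration and not as what SCID or jose actually do, is to store each move as its index in the sorted list of legal moves for the current position. It assumes a move generator exists (not shown) and that no position has more than 256 legal moves; the `(from_square, to_square)` tuple is a hypothetical move representation.

```python
from typing import Sequence, Tuple

Move = Tuple[int, int]  # (from_square, to_square); hypothetical representation


def encode_move(legal_moves: Sequence[Move], move: Move) -> int:
    """Return the move's index in the sorted legal-move list (fits in one byte)."""
    ordered = sorted(legal_moves)
    if len(ordered) > 256:
        raise ValueError("more than 256 legal moves; one byte is not enough")
    return ordered.index(move)  # raises ValueError if the move is not legal


def decode_move(legal_moves: Sequence[Move], code: int) -> Move:
    """Recover the move from its one-byte code and the same legal-move list."""
    return sorted(legal_moves)[code]


# Legal moves as a move generator might return them, in arbitrary order.
legal = [(12, 20), (1, 18), (12, 28)]
code = encode_move(legal, (12, 28))
stored = bytes([code])  # exactly one byte on disk
print(code, decode_move(legal, stored[0]))
```

The cost of this scheme is that a game can only be decoded by replaying it from the start, since every code depends on the legal moves in the position before it.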

> Perhaps someone in the know could also help me with this: I am
> currently thinking that an index by position would be very important,
> yet AFAICT scid does not do it this way. Without this, how could the
> database be sorted so that searches based on position happen rapidly -
> the only way I can see of finding a position is to linearly search the
> entire database and play out each and every game! The more I think on
> it the more I want a positional index, but this also becomes a rather
> large item.


I think there are some commercial databases that build position indexes.

I've thought about this too for my database (jose-chess.sourceforge.net)
but I didn't get very far, because such an index would become really huge
and take too long to build.

SCID does a linear search with some shortcuts.
  #8  
Old January 9th 04, 06:56 PM
Mike S.


A database should be at least 1 MB
  #9  
Old January 10th 04, 12:51 AM
Simon Waters

Peter Schäfer wrote:
> Noah Roberts wrote in message ...
>
>> I am interested in how many games are in your databases. I am working
>> on the design of an OSS database for chinese chess, which currently does
>> not exist, and would like to know what a reasonable count of games is to
>> define my types. For instance, if I choose a game index of 16 bits I
>> have a limit in the 10's of thousands but if I choose an index of 32
>
> 16 bits are certainly not sufficient; 32 bits should be OK.
>
> Storing a complete game takes a few hundred bytes, so I wouldn't
> waste too many thoughts on saving 1 or 2 bytes ;-)


I think the problem being hit these days on the desktop is ye olde 32-bit
(or, more usually, 31-bit) file pointer limit.

32-bit indexes give you 4 billion games, which I think is unlikely to be
exceeded in the near future.

But if each game is 100 bytes and you use 32-bit file pointers, that's
only 4 GB, or ~40 million games, per file, and you probably don't want
to write code for handling multiple files in the database. For western
chess that limit is probably close to being reached, or already exceeded.

Some languages expect you to ask nicely if you want files bigger than 2 GB,
as do some OSes, but this will change quickly.
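
The per-file limit above follows directly from the width of the file offset. A quick sketch of that arithmetic, using the 100 bytes per game assumed in the post (the function name is made up for illustration):

```python
def max_games_per_file(offset_bits: int, avg_game_bytes: int = 100) -> int:
    """Games that fit in one file addressable with an offset of this width."""
    return (2 ** offset_bits) // avg_game_bytes


for bits in (31, 32, 64):
    print(f"{bits}-bit file offsets: about {max_games_per_file(bits):,} games")
```

With signed 32-bit offsets (31 usable bits) the ceiling drops to about 21 million games, while 64-bit offsets remove the file size as a practical limit.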


  #10  
Old January 10th 04, 04:36 AM
Noah Roberts

Peter Schäfer wrote:
> Noah Roberts wrote in message ...
>
>> I am interested in how many games are in your databases. I am working
>> on the design of an OSS database for chinese chess, which currently does
>> not exist, and would like to know what a reasonable count of games is to
>> define my types. For instance, if I choose a game index of 16 bits I
>> have a limit in the 10's of thousands but if I choose an index of 32
>
> 16 bits are certainly not sufficient; 32 bits should be OK.
>
> Storing a complete game takes a few hundred bytes, so I wouldn't
> waste too many thoughts on saving 1 or 2 bytes ;-)


Well, if I am going to do positional indexes then think of it this way:

There are 2^32 games; assume each has an average of 60 positions, and that
positions can be repeated across games, say 2x per position on average.
That's 2^32 * 30 distinct positions, which I have smashed down to 26 bytes
each just for the key. Each position has an average of two game links,
which are each 4 bytes long, so an entry is 26 + 2 * 4 = 34 bytes. This is
a minimum of 34 bytes * 30 * 4G, which is 4 terabytes less 16G, for the
positional indexing alone. I could also be greatly underestimating if the
average is not 2+ for repeating positions.
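
The estimate can be written out so that the assumptions are easy to vary. This is only a sketch of the arithmetic above; the function and parameter names are made up, and the second call plugs in the 3,445,151-game count reported earlier in the thread for comparison.

```python
def position_index_bytes(
    games: int,
    positions_per_game: int = 60,
    repeat_rate: int = 2,
    key_bytes: int = 26,
    link_bytes: int = 4,
) -> int:
    """Estimate the size of a position index under the stated assumptions."""
    # Distinct positions: every position is assumed to occur repeat_rate times.
    distinct_positions = games * positions_per_game // repeat_rate
    # One entry = position key + one game link per occurrence.
    entry_bytes = key_bytes + repeat_rate * link_bytes
    return distinct_positions * entry_bytes


GIB = 2 ** 30

full = position_index_bytes(2 ** 32)
print(f"2^32 games: {full:,} bytes = {full // GIB:,} GiB")

kym = position_index_bytes(3_445_151)
print(f"3,445,151 games: {kym:,} bytes = {kym / GIB:.2f} GiB")
```

For 2^32 games this gives 4,080 GiB, i.e. 4 TiB less 16 GiB, matching the figure above. Note that the game links cost the same whatever the repeat rate is; a higher repeat rate only reduces the number of keys stored.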

So, we are talking about huge amounts of storage. This is why I wanted
to know the average size of a DB, so I could estimate the size of the
average position index and see if it is reasonable. There are not going
to be a lot of 4G-game databases, so terabytes for such a DB may not be
totally unreasonable.

I had basically already arrived at the same conclusion everyone is
stating: a 16-bit limit is really too small.

> What you should do is think about a compact move encoding.
> 1 byte per move would be good.
>
>> Perhaps someone in the know could also help me with this: I am
>> currently thinking that an index by position would be very important,
>> yet AFAICT scid does not do it this way. Without this, how could the
>> database be sorted so that searches based on position happen rapidly -
>> the only way I can see of finding a position is to linearly search the
>> entire database and play out each and every game! The more I think on
>> it the more I want a positional index, but this also becomes a rather
>> large item.
>
> I think there are some commercial databases that build position indexes.
>
> I've thought about this too for my database (jose-chess.sourceforge.net)
> but I didn't get very far, because such an index would become really huge
> and take too long to build.

Yes, it becomes big :P I think it would be very interesting just to see
how close I got with my estimates.

> SCID does a linear search with some shortcuts.

I haven't been able to figure out which source file does the searching
work. I found the in-memory tree, but not the file manipulations. Do you
know where I need to look?

NR

 



