A Chess forum. ChessBanter


Average size of database



 
 
  #11  
Old January 10th 04, 10:19 AM
Anders Thulin
external usenet poster
 
Posts: n/a
Default Average size of database

Noah Roberts wrote:

There are 2^32 games, assume each has average of 60 positions, positions
can be repeated in games - assume average of 2x per position. That's
2^32 * 30 positions which I have smashed down to 26 bytes each just for
the key. Each position has an average of two game links, which are each
4 bytes long. This is a minimum of 34 bytes * 30 * 4G, which is 4
terabytes, less 16G, for the positional indexing alone. I could also
be greatly underestimating if the average is not 2+ for repeating
positions.
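
As a rough check of the arithmetic in the estimate above, here is a minimal Python sketch. It uses only the figures stated in the quote (2^32 games, 60 positions per game, each position shared by 2 games, a 26-byte key, 4-byte game links). The variable names are illustrative, and binary units (1G = 2^30 bytes) are an assumption that makes the "4 terabytes, less 16G" figure come out exactly.

```python
# Index size under the assumptions stated in the quoted estimate.
GAMES = 2**32               # maximum count with a 32-bit game index
POSITIONS_PER_GAME = 60     # assumed average
REPEAT_FACTOR = 2           # assumed: each distinct position occurs in 2 games
KEY_BYTES = 26              # packed position key
LINK_BYTES = 4              # one game link

# Distinct positions: 2^32 * 60 / 2 = 2^32 * 30
distinct_positions = GAMES * POSITIONS_PER_GAME // REPEAT_FACTOR

# Each index entry holds the key plus one link per game containing the position.
bytes_per_entry = KEY_BYTES + REPEAT_FACTOR * LINK_BYTES

index_bytes = distinct_positions * bytes_per_entry

GIB = 2**30  # assumption: "G" in the post means 2^30 bytes
print(bytes_per_entry)                     # 34
print(index_bytes // GIB)                  # 4080
print(4 * 1024 - index_bytes // GIB)       # 16, i.e. "4 terabytes, less 16G"
```

The result is only as good as the inputs: changing REPEAT_FACTOR or GAMES shifts the total proportionally.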


Lot of assumptions in there. Can you verify any of them?

Are averages useful to design by? It's pretty clear that
the position after 1. e4 is going to cover at least 40% of the games.
That means a *lot* of game links for that one. Will that upset
the design, or any expectations the user will have on response time?

Have you decided on any goal for searches? Not more than 10 seconds?
Or is half an hour's search time OK? Or will you handle these
specially -- for instance by breaking off, and saying 'too many hits'?

--
Anders Thulin http://www.algonet.se/~ath

  #12  
Old January 10th 04, 06:55 PM
David Richerby
external usenet poster
 
Posts: n/a
Default Average size of database

Noah Roberts wrote:
There are 2^32 games


If you assume your database contains four billion games, it isn't
surprising that the index is big.


Dave.

--
David Richerby Revolting Lotion (TM): it's like a
www.chiark.greenend.org.uk/~davidr/ soothing hand lotion but it'll turn
your stomach!
  #13  
Old January 12th 04, 11:20 PM
Mike Ogush
external usenet poster
 
Posts: n/a
Default Average size of database

On Sat, 10 Jan 2004 00:51:14 +0000, Simon Waters
wrote:


Peter Schäfer wrote:
Noah Roberts wrote in message ...

I am interested in how many games are in your databases. I am working
on the design of an OSS database for chinese chess, which currently does
not exist, and would like to know what a reasonable count of games is to
define my types. For instance, if I choose a game index of 16 bits I
have a limit in the 10's of thousands but if I choose an index of 32



16 bits are certainly not sufficient, 32 bits are OK.

Storing a complete game takes some hundred bytes, so I wouldn't
waste too many thoughts about saving 1 or 2 bytes ;-)


I think the problem being hit these days on the desktop is ye-olde 32
(or 31 more usually) bit file pointer limit.

32 bit indexes give you 4 billion games. Which I think is unlikely to be
exceeded in the near future.

But if each game is 100 bytes - and you use 32 bit file pointers that's
only 4GBytes, or ~40 million games, per file and you probably don't want
to code for handling multiple files in the database. Which for western
chess is probably getting close or exceeded.
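
The per-file limit described above can be sketched in a few lines of Python. This is only an illustration under the post's own assumptions (about 100 bytes per game, one file per database); the function name `max_games` is hypothetical.

```python
def max_games(pointer_bits, avg_game_bytes):
    """Games that fit in one file addressable with a pointer of the given width."""
    return 2**pointer_bits // avg_game_bytes

# 32-bit unsigned file pointer, 100 bytes per game: roughly 40 million games.
print(max_games(32, 100))   # 42949672

# 31 usable bits (signed offsets), the "more usual" case mentioned above.
print(max_games(31, 100))   # 21474836
```

So the 4-billion-game ceiling of a 32-bit game index is never the binding limit here; the file offset runs out about a hundred times sooner.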


The 40 million game limit might be close to being reached if every
game of chess that has been played so far were recorded. In practice
a database that had all master (or otherwise important) games ever
recorded would probably be at the 4-5 million games level today.
[The largest database I have heard mentioned here is about 3.5 million
games.] A conservative estimate for the growth rate is about
300,000 games per year. [As a calibration, a little over 71,000 games
were added to TWIC in 2002.] This suggests that a 40 million limit
won't be reached for about 100 years.
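
For anyone who wants to redo the projection with their own numbers, here is a minimal Python sketch. It assumes linear growth, which the post implies but does not state, and the function name is illustrative.

```python
def years_until_limit(limit, current, per_year):
    """Years of linear growth before `current` games reaches `limit`."""
    return (limit - current) / per_year

# Figures from the post: 40 million limit, 4-5 million games today,
# 300,000 new games per year.
print(round(years_until_limit(40_000_000, 5_000_000, 300_000)))  # 117
print(round(years_until_limit(40_000_000, 4_000_000, 300_000)))  # 120
```

Both ends of the 4-5 million range land a little above the "about 100 years" quoted above, so the conclusion holds.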


Some languages expect you to ask nicely if you want files bigger than 2GB,
as do some OSes, but this will change quickly.



  #14  
Old January 20th 04, 01:25 PM
amateurschach
external usenet poster
 
Posts: n/a
Default Average size of database


(Mike S.) wrote in message . com...
"Kym" wrote in message ...
My main corr. analysis DB has 3,445,151 games and uses 496Mb of disk.
Using
http://scid.sourceforge.net
Takes about 6 seconds to open (2.6Ghz P IV 1Gb memory)
If I paste a game (say 12 moves) and open the 'tree' window it responds in
2-3 seconds.
SCID 3.5 allows 16,000,000 games in any one DB (this can be changed higher).
(previously 4,000,000)
Material search in about 23 seconds.


Have a look at SCID, which is GPL'ed.

"Noah Roberts" wrote in message
...
I am interested in how many games are in your databases. I am working
on the design of an OSS database for chinese chess, which currently does
not exist, and would like to know what a reasonable count of games is to
define my types. For instance, if I choose a game index of 16 bits I
have a limit in the 10's of thousands but if I choose an index of 32
bits this greatly increases the number of games that can be included,
but also greatly increases the maximum size of the database; my design
will be greatly changed just by this choice.

Anyway, if anyone interested in helping could answer these questions:

What is the current game count in your database?

How large is the file(s)?

How long does it take to search for a given position?

How long for material?

[ the last two may require that you restart the program because of
caching ]

What is the maximum you could conceive of ever having in your database?

Also, something I am VERY interested in, but I doubt a real number can
be produced, is what is the average repeat rate of any given position.
Currently I am assuming 2x because many in the beginning are going to be
repeated a lot, but toward the end this becomes rare. I am not sure if
I am over/underestimating.

Perhaps someone in the know could also help me with this: I am
currently thinking that an index by position would be very important,
yet AFAICT scid does not do it this way. Without this, how could the
database be sorted so that searches based on position happen rapidly -
the only way I can see of finding a position is to linearly search the
entire database and play out each and every game! The more I think on
it the more I want a positional index, but this also becomes a rather
large item.
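
To make the positional-index idea concrete, here is a minimal Python sketch of an inverted index from position to game numbers, plus a way to measure the average repeat rate asked about above. It is an illustration only and does not describe how scid or any other program works. The position keys are hypothetical placeholder strings, and the games are assumed to have been played out into position keys already.

```python
from collections import defaultdict

def build_position_index(games):
    """Map each position key to the set of game ids in which it occurs.

    `games` is a dict: game id -> list of position keys (hypothetical format).
    """
    index = defaultdict(set)
    for game_id, positions in games.items():
        for key in positions:
            index[key].add(game_id)
    return index

def average_repeat_rate(index):
    """Average number of games per distinct position."""
    return sum(len(ids) for ids in index.values()) / len(index)

# Three tiny made-up games that share their opening positions.
sample_games = {
    1: ["start", "A", "B"],
    2: ["start", "A", "C"],
    3: ["start", "D"],
}

index = build_position_index(sample_games)
print(sorted(index["A"]))           # [1, 2]
print(average_repeat_rate(index))   # 1.6
```

A lookup is then a single dictionary access instead of replaying every game. Running `average_repeat_rate` over a real collection would replace the 2x guess with a measured figure, and the length of the largest entry shows how skewed the early positions are.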

That is it for now, thanks for any responses.
NR


A database should be at least 1 MB

 





All times are GMT +1. The time now is 03:15 AM.

