#1
August 17th 07, 01:52 PM posted to rec.games.chess.analysis,rec.games.chess.computer
external usenet poster
First recorded activity by ChessBanter: Aug 2007
Posts: 43
Statistical significance of score differences - new release of ChessDB

I've made a new release of ChessDB, a chess database based on Scid from
Shane Hudson. There is also a fork of ChessDB from the lying
plagiarist Pascal Georges, who passes off work of mine as his own, as I
have documented at:

http://groups.google.co.uk/group/rec...b9e3c5e4e7266a

Anyway, the main reason for my post is to introduce a new feature in
ChessDB, and I would be interested in comments from others about it.

ChessDB has a tree window, like many databases (Scid, ChessBase,
Chess Assistant, etc.). But I've added code that will determine
whether the difference in score between two moves is real
('statistically significant') or whether it could be due to chance.
(If you toss a coin 20 times and it lands on heads 12 times and tails
8 times, you can't deduce the coin is biased: such a small difference
can be due to chance with only 20 tosses. In contrast, if it landed on
heads 19 times and tails only once, you could be pretty sure it is
biased.)

The difference in score between two moves is assumed not to be due to
chance if the probability of the observed (or any larger) difference
arising by chance alone, with no underlying reason, is less than 0.05.

See:

http://chessdb.sourceforge.net/tutor...earch_tree.php
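
The coin example above can be checked with a few lines of Python. This is only an illustrative sketch using an exact two-sided binomial test for a fair coin; it is not ChessDB's code, and the function name is made up for the example.

```python
from math import comb


def two_sided_binomial_p(heads, tosses):
    """Exact two-sided p-value for a fair coin (p = 0.5)."""
    # Distance of the observed count from the expected count, tosses / 2.
    deviation = abs(heads - tosses / 2)
    # Count every outcome at least that far from the expected count.
    extreme = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= deviation
    )
    return extreme / 2 ** tosses


for heads in (12, 19):
    p = two_sided_binomial_p(heads, 20)
    verdict = "significant at 0.05" if p < 0.05 else "could be chance"
    print(f"{heads}/20 heads: p = {p:.5f} ({verdict})")
```

With 12 heads out of 20 the p-value is about 0.50, nowhere near the 0.05 threshold; with 19 heads it is about 0.00004.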

Some interesting observations can be made by looking at my database of
3.5 million games:

1) 1.d4 scores better than 1.e4 with a p-value of less than 0.01. In
other words, the chance of the observed or any larger score difference
being due to chance is less than 1%.

2) In my database, the opening move with the highest score is 1.Na3.
Despite the score being a lot higher than 1.e4 or 1.d4, this is *not*
statistically significant. In other words, we can't say 1.Na3 is any
better or worse than 1.e4 or 1.d4; a difference this large could
easily be due to chance. As such, we should pay very little attention
to the relative scores.

3) In my database, 3.Nd2 (Tarrasch variation) in the French (1.e4 e6 2.d4
d5) scores higher than 3.Nc3 (main line), and the difference is
statistically significant at the 5% level, but not at the 1% level. In
other words, we can be 95% sure there is a real difference in score
between 3.Nc3 and 3.Nd2 in my database, but we can't be 99% sure.

In contrast, the difference in score between 3.Nc3 (or 3.Nd2) and the
exchange (3.exd5) or advance (3.e5) variations is statistically
significant at the p=0.01 level, so there is less than a 1% chance the
observed difference in score is due to chance, and we can be more than
99% sure there is an underlying reason. (The reason can't be determined
in ChessDB, but one might strongly suspect the advance and exchange are
inferior for White to the main line (3.Nc3) or Tarrasch (3.Nd2).)

(I personally have a much better success rate with the Tarrasch than the
advance too. I will not contemplate the exchange, as it is too boring
and, while it is drawish, it scores pretty low for White.)

Anyone with a reasonable knowledge of statistics might guess I am using
a chi-squared test, and that is what I am doing. Chi-squared is
calculated, then the p-value is determined from it using an algorithm
good to 4 decimal places. I intend to change that to a more accurate
approximation soon.
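
For readers who want to see the shape of such a test, here is a minimal sketch of a Pearson chi-squared test in Python. It assumes a 2x3 table (two moves by win/draw/loss), which the post does not confirm is the layout ChessDB uses, and the game counts are invented for illustration. With two degrees of freedom the p-value has the exact closed form exp(-chi2/2), so no approximation is needed in this particular case.

```python
import math


def chi_squared_2x3(move_a, move_b):
    """Pearson chi-squared test on two (wins, draws, losses) rows.

    Returns (chi2, p_value). Assumes every column total is non-zero.
    """
    rows = [move_a, move_b]
    total = sum(move_a) + sum(move_b)
    col_totals = [a + b for a, b in zip(move_a, move_b)]

    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Count expected if both moves had identical result rates.
            expected = row_total * col_total / total
            chi2 += (observed - expected) ** 2 / expected

    # Degrees of freedom = (2 - 1) * (3 - 1) = 2, for which the
    # chi-squared survival function is exactly exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical counts: (White wins, draws, Black wins) after each move.
chi2, p = chi_squared_2x3((450, 300, 250), (400, 300, 300))
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

For these made-up counts chi-squared is about 7.49 and p is about 0.024, so the difference would be significant at the 5% level but not at the 1% level.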

Other changes in ChessDB include:

* Native support for UCI engines (using some code from P. Georges, which
I fully acknowledge, unlike him when he uses my code).

* The facility to download a database of either 100,000 or 3.5 million
games. The database is split into multiple parts for easy downloading,
then reconstructed by ChessDB and an MD5 checksum used to verify the
database has not been corrupted in transmission.

* Quickly download games from the history of anyone on ICC or FICS.

* Numerous other changes documented at:
http://chessdb.sourceforge.net/Scid/
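
The split-download-then-verify step described in the list above can be sketched as follows. This is not ChessDB's own code: the function name, the part file names and the way the expected checksum is supplied are all assumptions made for illustration.

```python
import hashlib


def reassemble_and_verify(part_paths, output_path, expected_md5):
    """Concatenate the parts into output_path and check the MD5 digest.

    Returns True if the rebuilt file matches expected_md5 (a hex string).
    """
    digest = hashlib.md5()
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Read in 1 MiB chunks so large databases fit in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

A caller would pass the downloaded parts in order, for example `reassemble_and_verify(["db.part1", "db.part2"], "db.full", published_md5)`, and discard the output if the function returns False.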



I'm interested in what others think of the idea of testing the
statistical significance of the difference in score between two moves.
To the best of my knowledge, no other chess database does this, yet it
seems quite logical to me.
#2
August 17th 07, 01:55 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Aug 2007
Posts: 43

Dave wrote:
I've made a new release of ChessDB, a chess database based on Scid from
Shane Hudson.


I forgot to say, if you want to try ChessDB or use it to download a
large database, see:

http://chessdb.sourceforge.net/downloads/
#3
August 17th 07, 05:10 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Dec 2005
Posts: 101

That is a very cool idea for a feature. I haven't used databases much
in general, so I don't know much about existing features, but I could
see how something like that could be very useful to a master trying to
refine their opening preparation. At my patzer level (1400ish USCF),
it's something that could be cool just for the sake of curiosity.

So is this a free program? As I said, I don't know much about chess
databases (which is ironic, since I specialize in SQL databases in my
profession). I'd like to get a database program and a large database
of master and GM games eventually, so I can see how better players
than me handle certain openings and the positions that result from
them. For now, I tend to just go to chesslab.com and look at games
there in the openings I play.

--Fromper

#4
August 17th 07, 05:18 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Aug 2007
Posts: 2

Well, if you like chess databases with a lot of features you can grab
Scid at http://scid.sourceforge.net or http://prolinux.free.fr/scid (the
latter with some training features, and the ability to play against
various engines).

Richard wrote:
That is a very cool idea for a feature. [...]

#5
August 17th 07, 05:38 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: May 2006
Posts: 155

At least other people around the world have clearly understood who that guy is!!

Pascal

http://prolinux.free.fr/alex_guestbook/
Dave wrote:
I've made a new release of ChessDB, a chess database based on Scid from
Shane Hudson. [...]
#6
August 17th 07, 05:41 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Nov 2006
Posts: 364

On 2007-08-17 14:52:16, Dave wrote:

I've made a new release of ChessDB, a chess database based on Scid from
Shane Hudson. There is a fork too of ChessDB from the lying
plagiarist Pascal Georges who passes off work of mine as his own, as I
have documented at:..........



Before I download it, I would like to know whether it handles transpositions, i.e., are
*unplayed* moves that lead to a played position visible in the tree?

Mats

#7
August 17th 07, 06:15 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Dec 2005
Posts: 383

17.08.2007 14:52, Dave:
But I've added code that will determine
if the difference in score between two moves is real
('statistically significant'), or if it could be due to chance.


Two important factors are completely ignored in this calculation:
development over time and the strength of the players involved. The
first is important because once a refutation, or at least a very strong
answer, to a move is found, its frequency drops. So the old statistics
for this move stay unchanged for a long time, possibly with a favourable
result for the move, although it might be well known that it should be
avoided.

The second factor is quite obvious: games of higher-rated players tend
to be less erratic, so their results are more meaningful. Therefore,
when I look at the numbers, I check the average Elo and the
performance.
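
One way to act on the rating point is to compare a move's actual score with the score the players' Elo ratings would predict. The sketch below uses the standard Elo expected-score formula; the function names and the input layout are hypothetical, and nothing here is taken from ChessDB or any other database program.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def score_vs_expectation(games):
    """Average actual and Elo-expected White score over some games.

    games: iterable of (white_elo, black_elo, white_result), where
    white_result is 1, 0.5 or 0.
    """
    games = list(games)
    actual = sum(result for _, _, result in games) / len(games)
    expected = sum(expected_score(w, b) for w, b, _ in games) / len(games)
    return actual, expected
```

If a move scores 60% but the ratings alone predict 62%, the raw score flatters the move; the gap between the two numbers is more informative than the raw score on its own.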

Greetings,
Ralf
#8
August 17th 07, 08:25 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Aug 2007
Posts: 43

Anonymous wrote:
Well, if you like chess databases with a lot of features you can grab
Scid at http://scid.sourceforge.net or http://prolinux.free.fr/scid (the
latter with some training features, and the ability to play against
various engines).


And the latter of which has code taken from ChessDB but not acknowledged.
#9
August 17th 07, 08:41 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Aug 2007
Posts: 43

Richard wrote:
That is a very cool idea for a feature.


I'm glad you like it.

I haven't used databases much
in general, so I don't know much about existing features, but I could
see how something like that could be very useful to a master trying to
refine their opening preparation. At my patzer level (1400ish USCF),
it's something that could be cool just for the sake of curiosity.

So is this a free program?


Yes, it's open source and free.

homepage
http://chessdb.sourceforge.net/

tutorial
http://chessdb.sourceforge.net/tutorial/

download page:
http://chessdb.sourceforge.net/downloads/

As I said, I don't know much about chess
databases (which is ironic, since I specialize in SQL databases in my
profession). I'd like to get a database program and a large database
of master and GM games eventually, so I can see how better players
than me handle certain openings and the positions that result from
them.


If you download it, go to the Tools menu, select "Download games
from", then select "3.5 million games site #1", and it will download a
3.5 million game database for you.


For now, I tend to just go to chesslab.com and look at games
there in the openings I play.


Well, with 3.5 million you have quite a few. ChessDB also has the
facility to download from The Week In Chess (TWIC), so you can update
the database every week (usually on a Monday), when new games are added
to TWIC. See:

http://chessdb.sourceforge.net/tutor...-retriveal.php

(The program has an http client to connect to the external sources of
data. There is also a telnet client which is used to download games from
FICS and ICC).

I do have a larger database, which I could make available, but as
databases get larger, the quality of the games goes down.
#10
August 17th 07, 08:43 PM posted to rec.games.chess.analysis,rec.games.chess.computer
First recorded activity by ChessBanter: Aug 2007
Posts: 43

M Winther wrote:

Before I download it I would like to know whether it handles
transpositions, i.e., are
*unplayed* moves visible in the tree that lead to a played position?

Mats


Yes. It shows positions, not moves. Hence sometimes you will find there
are no games in the database at move 5, but by move 6 there are thousands.
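
To illustrate what "positions, not moves" means for transpositions, here is a small sketch that groups results by position rather than by move order. How ChessDB represents positions internally is not described in this thread; the sketch assumes positions are supplied as FEN strings and keys them on piece placement, side to move and castling rights only. Dropping the en passant square is a simplification made for the example.

```python
from collections import defaultdict


def position_key(fen):
    """Keep piece placement, side to move and castling rights only."""
    placement, side, castling = fen.split()[:3]
    return (placement, side, castling)


def build_tree(records):
    """Tally results per position.

    records: iterable of (fen, result) pairs, where result is
    '1-0', '1/2-1/2' or '0-1'.
    """
    tree = defaultdict(lambda: {"1-0": 0, "1/2-1/2": 0, "0-1": 0})
    for fen, result in records:
        tree[position_key(fen)][result] += 1
    return tree


# The same position reached by 1.d4 d5 2.c4 and by 1.c4 d5 2.d4.
# Only the en passant field of the two FEN strings differs.
via_d4 = "rnbqkbnr/ppp1pppp/8/3p4/2PP4/8/PP2PPPP/RNBQKBNR b KQkq c3 0 2"
via_c4 = "rnbqkbnr/ppp1pppp/8/3p4/2PP4/8/PP2PPPP/RNBQKBNR b KQkq d3 0 2"
tree = build_tree([(via_d4, "1-0"), (via_c4, "1/2-1/2")])
print(len(tree))
```

Both games land in a single tree entry, so a move order that was never played in the database can still lead to a position with many games.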