Reply
 
LinkBack Thread Tools Display Modes
  #1   Report Post  
Old December 15th 03, 03:00 AM
James Thornton
 
Posts: n/a
Default Identify Position with CRC of Move Text?

If you perform a 32 bit cyclic redundancy check (crc32) on
consistently formatted PGN move text (i.e. remove annotations, same
spacing, line breaks, etc.), would that be sufficient for uniquely
identifying a chess position. To identify each position of a game, you
would need perform two crc32s per move (one crc32 per ply).
  #2   Report Post  
Old December 15th 03, 05:10 AM
Fortknight
 
Posts: n/a
Default Identify Position with CRC of Move Text?

um, no...
It will, likely (there is a chance of crc collision), only give a file
position (not a chess position)in a file, it will tell you nothing
about a given chess position, or even more importantly that the
position is repeated in another PGN file. (All it will tell you is
that the game is likely repeated up to a given point...)

PGN, only provides a move list, the chess positions are only by
implication.

Rather, what you will probably want to do is create a position code,
(there are many possible ways to do this) for each position in the PGN
file and store that... You could even get rid of the move list, and
have that only by "implication", and create the move list on the fly.
I suspect that your database would be slightly larger than the PGN
version, but would have the advantage of creating a cross searchable
database of positions.

Cheers




On 14 Dec 2003 19:00:47 -0800, (James Thornton)
wrote:

If you perform a 32 bit cyclic redundancy check (crc32) on
consistently formatted PGN move text (i.e. remove annotations, same
spacing, line breaks, etc.), would that be sufficient for uniquely
identifying a chess position. To identify each position of a game, you
would need perform two crc32s per move (one crc32 per ply).


  #3   Report Post  
Old December 15th 03, 07:12 AM
James Thornton
 
Posts: n/a
Default Identify Position with CRC of Move Text?

Fortknight wrote:

Rather, what you will probably want to do is create a position code,
(there are many possible ways to do this) for each position in the PGN
file and store that...


What papers discuss creating a position code (by position code, I assume
you mean a fingerprint/checksum)?

Thanks.

  #4   Report Post  
Old December 15th 03, 09:50 AM
David Richerby
 
Posts: n/a
Default Identify Position with CRC of Move Text?

James Thornton wrote:
If you perform a 32 bit cyclic redundancy check (crc32) on
consistently formatted PGN move text (i.e. remove annotations, same
spacing, line breaks, etc.), would that be sufficient for uniquely
identifying a chess position.


No. Many positions share the same CRC as there are only 2^32, which is
about four billion, possible values that a 32-bit checksum can take and
there are many, many more chess positions than that. Secondly, many
positions can arise through many different move orders, each of which
will, most likely, have different CRCs.


To identify each position of a game, you would need perform two crc32s
per move (one crc32 per ply).


Which doesn't help much as the move text itself is unlikely to take up
much more than 32 bits..

Personally, I'd use 64-bit Zobrist keys (Google if you don't know what
those are). Make sure you correctly differentiate positions that have
different en passant squares and different castling rights.


Dave.

--
David Richerby Expensive Moistened Shack (TM): it's
www.chiark.greenend.org.uk/~davidr/ like a house in the woods but it's
moist and it'll break the bank!
  #5   Report Post  
Old December 15th 03, 04:40 PM
Anders Thulin
 
Posts: n/a
Default Identify Position with CRC of Move Text?

James Thornton wrote:

If you perform a 32 bit cyclic redundancy check (crc32) on
consistently formatted PGN move text (i.e. remove annotations, same
spacing, line breaks, etc.), would that be sufficient for uniquely
identifying a chess position.


No. It's possible that two different move sequences will lead to the
same position. As they are different, chances are pretty good that thge
CRC will end up different, but that's not what you want.

It would make more sense to CRC the FEN string of the resulting position,
but even that will carry the risk of collisions. One estimate of the number
of different chess positions (disregarding positions with promoted pieces)
is 10^42. It's a bit high, but not so high as to rival 2^32.

--
Anders Thulin http://www.algonet.se/~ath



  #6   Report Post  
Old December 15th 03, 05:52 PM
David Richerby
 
Posts: n/a
Default Identify Position with CRC of Move Text?

Anders Thulin wrote:
It would make more sense to CRC the FEN string of the resulting
position, but even that will carry the risk of collisions. One estimate
of the number of different chess positions (disregarding positions with
promoted pieces) is 10^42. It's a bit high, but not so high as to rival
2^32.


10^42 is vastly, vastly bigger than 2^32 (which is about 4x10^9). With
a 32-bit hash, if you have more than 65,000 positions, the probability of
having that two of them have the same hash is greater than 1/2. That's
unlikely to be acceptable.


Dave.

--
David Richerby Electronic Slimy Spoon (TM): it's like
www.chiark.greenend.org.uk/~davidr/ a piece of cutlery but it's covered
in goo and it uses electricity!
  #7   Report Post  
Old December 15th 03, 09:11 PM
James Thornton
 
Posts: n/a
Default Identify Position with CRC of Move Text?

Anders Thulin wrote:

It would make more sense to CRC the FEN string of the resulting position


Is FEN notation the current standard for representing chess positions?
Do most chess databases store it in string form, or do they use compression?

Reply
Thread Tools
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 01:43 PM.

Powered by vBulletin® Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.
Copyright 2004-2019 ChessBanter.
The comments are property of their posters.
 

About Us

"It's about Chess"

 

Copyright © 2017