  #1   Report Post  
Old September 6th 06, 08:28 AM posted to rec.games.chess.computer
external usenet poster
 
First recorded activity by ChessBanter: Sep 2006
Posts: 9
Default PGN Specification Revision

The 1994 PGN specification is immense, filled with information
unrelated to the implementation of the standard. As a step towards
writing my own program to import and export PGN data, I took some time
to strip it down to the essentials. In case anyone wants to comment on
it (or, better yet, use it), the PDF is available on my site. TeX
sources are also available to those who ask.

http://research.strangeabacus.com/sources/pgnspec.pdf

  #2   Report Post  
Old September 6th 06, 12:31 PM posted to rec.games.chess.computer

Adam Blinkinsop wrote:
> to strip it down to the essentials. In case anyone wants to comment on
> it (or, better yet, use it), the PDF is available on my site.


What changes did you introduce? (Best: document them in the document.)

For instance: the byte equivalence requirement (3.2.1) seems to be gone,
comment lines (the ; comment) seem to be gone, and so do the requirement
that tags outside the STR appear in ASCII order and the requirement that
as many movetext tokens as possible appear on the same line.

They're rarely implemented, but they are requirements, and should not
be dropped without at least a mention that they have been dropped and
the reason why, I think.

Oh, and ISO 8859-1 does not define any control characters, no matter
how much PGN insists it does. Any references to carriage return and
line feed are meaningless in the context of Latin-1 only.

--
Anders Thulin ath*algonet.se http://www.algonet.se/~ath

  #3   Report Post  
Old September 6th 06, 03:18 PM posted to rec.games.chess.computer

Thanks for bringing these things to my attention! I'll make sure to
put them into the document. For expediency's sake, my reasoning
follows.

Anders Thulin wrote:
> For instance: the byte equivalence requirement (3.2.1) seems to be gone,


That's right. I took it out because there is no way for a parser to
know whether a given file follows that part of the spec, and it seemed
redundant anyway (unless an exporter includes some entropy internally,
it will _always_ be byte equivalent to itself, which is the letter of
the requirement). I'll document it anyway, though.

> comment lines (the ; comment) seem to be gone,


This is actually still in the document, in the formal syntax spec: see
page 4, the second section, under "rest-of-line-comment." Should I
explain the comments somewhere else to make sure they aren't missed?
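
For readers who have not met the two comment forms, here is a minimal sketch of how an importer might skip them. It assumes, from a reading of the 1994 spec rather than anything stated in this thread, that a semicolon comment runs to the end of the line, that a brace comment runs to the next closing brace without nesting, and that each form loses its meaning inside the other. The name `strip_comments` is hypothetical, and string tokens are deliberately not handled.

```python
def strip_comments(text):
    """Remove ';' rest-of-line comments and '{...}' brace comments.

    Simplification (assumption): quoted string tokens are not recognised,
    so a ';' or '{' inside a tag value would be misread as a comment.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == ";":
            # Skip up to, but not including, the end of the line.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif ch == "{":
            # Skip through the next closing brace; braces do not nest.
            end = text.find("}", i)
            i = n if end == -1 else end + 1
            out.append(" ")  # keep the neighbouring tokens separated
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Scanning left to right, instead of applying two regular expressions in turn, is what lets a brace inside a semicolon comment (or the reverse) be ignored.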

> the requirement that
> tags outside the STR appear in ASCII order,


Absolutely -- I thought I had written it in there, but I can't find it
now. I'll make sure to put it in today.
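
A small sketch of that ordering rule for an exporter: Seven Tag Roster first, in its fixed order, then every other tag sorted by name. The function name `order_tags_for_export` and the dict-based tag representation are assumptions made for illustration, not part of either spec.

```python
# The Seven Tag Roster (STR), in the order required for export.
SEVEN_TAG_ROSTER = ("Event", "Site", "Date", "Round", "White", "Black", "Result")


def order_tags_for_export(tags):
    """Return (name, value) pairs: STR tags first, then the rest in ASCII order.

    `tags` is a plain dict of tag name -> value (a hypothetical representation).
    STR tags missing from the dict are skipped here; a real exporter would
    have to supply them.
    """
    roster = [(name, tags[name]) for name in SEVEN_TAG_ROSTER if name in tags]
    # Python compares str by code point, which matches ASCII order
    # as long as the tag names are pure ASCII.
    extras = sorted(
        (name, value) for name, value in tags.items()
        if name not in SEVEN_TAG_ROSTER
    )
    return roster + extras
```

Note that ASCII order puts every uppercase letter before every lowercase one, so a tag named `Zebra` would sort ahead of one named `annotator`.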

> and that as many movetext
> tokens as possible must appear on the same line.


Hmm. That's one of the archival requirements as well. Now that you
mention it, I notice that I also didn't emphasize the requirement that
no empty lines appear until the movetext is over. One more thing to change.

> Oh, and ISO 8859-1 does not define any control characters, no matter
> how much PGN insists it does. Any references to carriage return and
> line feed are meaningless in the context of Latin-1 only.


I figured as much. I'll change it to 8-bit ASCII (which is what they
were talking about anyway) to make it consistent with itself.

Thanks for the proofread! Expect v2 to be up later today (around 9
PST).

  #4   Report Post  
Old September 6th 06, 04:05 PM posted to rec.games.chess.computer

* Adam Blinkinsop (16:18) wrote:

> I figured as much. I'll change it to 8-bit ASCII (which is what they
> were talking about anyway) to make it consistent with itself.


There is no such thing as 8-bit ASCII.

Regards, simon
  #5   Report Post  
Old September 6th 06, 04:09 PM posted to rec.games.chess.computer

Alright, changes made. Updated version available at the same URL:

http://research.strangeabacus.com/sources/pgnspec.pdf



  #6   Report Post  
Old September 6th 06, 04:56 PM posted to rec.games.chess.computer

Simon Krahnke wrote:
> There is no such thing as 8-bit ASCII.


http://en.wikipedia.org/wiki/Extended_ASCII

I'll be more descriptive -- it is technically the Latin 1 form of
extended (I always call it eight-bit, but I guess that's not standard)
ASCII. That change has been made.

  #7   Report Post  
Old September 6th 06, 05:13 PM posted to rec.games.chess.computer

Adam Blinkinsop wrote:
>> comment lines (the ; comment) seem to be gone,
>
> This is actually still in the document, in the formal syntax spec:


Oops, right ... I didn't double-check carefully enough.
I remember ... I was looking for that other weirdness that involves
integer and symbol tokens, but I suspect you may have sidestepped it
as I didn't find any 'token' production. As far as I can make out,
it's impossible to decide if you have an integer token or a symbol token
consisting only of digits.

(But I see now that you do mention token without defining it ... hm.)

I've still not figured out if the game termination markers are
tokens. They have to be, as only tokens, white space separators,
and comments are allowed in movetext. But '1/2-1/2' and '*' contain
characters that are not legal in tokens, so they can't be ...
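
Both puzzles can be reproduced with two regular expressions. The patterns below are a sketch of the token definitions as I read section 7 of the 1994 spec (a symbol is a letter or digit followed by letters, digits, `_`, `+`, `#`, `=`, `:` or `-`; an integer is a run of digits). They are an assumption to check against the spec, and `token_classes` is a hypothetical helper.

```python
import re

# Symbol token (assumed from the 1994 spec, section 7): a letter or digit,
# then zero or more of A-Z a-z 0-9 _ + # = : -
SYMBOL = re.compile(r"[A-Za-z0-9][A-Za-z0-9_+#=:-]*")
# Integer token: one or more decimal digits.
INTEGER = re.compile(r"[0-9]+")


def token_classes(text):
    """Return the name of every token class whose pattern matches all of `text`."""
    classes = []
    if INTEGER.fullmatch(text):
        classes.append("integer")
    if SYMBOL.fullmatch(text):
        classes.append("symbol")
    return classes
```

With these patterns, `token_classes("12")` reports both classes, so a lexer cannot choose between them without extra context. `1-0` passes as a symbol, while `1/2-1/2` and `*` match neither pattern because of the slash and the asterisk.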

Here's another: what's a 'printing character' (needed to decide
if a line exceeds recommended length)? This may be solved
if you switch to 8-bit ASCII for character set, but with Latin-1
it is a bit of a poser: is SHY (10/13) a printing character or not?
Is it always one or the other, or does it depend on the context?
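
For comparison only, here is how one existing convention answers the SHY question. Python decodes byte 0xAD (column 10, row 13) to U+00AD, which the Unicode database classifies as a format character, and `str.isprintable()` rejects it. That is Python's and Unicode's choice, not something PGN or ISO 8859-1 states, and `printing_length` is a hypothetical helper.

```python
import unicodedata

# Byte 0xAD is position 10/13 in the Latin-1 code table: SOFT HYPHEN.
SHY = b"\xad".decode("latin-1")


def printing_length(line: bytes) -> int:
    """Count the characters in a Latin-1 line that str.isprintable() accepts."""
    return sum(1 for ch in line.decode("latin-1") if ch.isprintable())


shy_category = unicodedata.category(SHY)   # Unicode general category of SHY
shy_is_printing = SHY.isprintable()        # Python's verdict on SHY
```

Under this convention a line containing SHY is one character shorter than its byte count, so two programs that disagree about SHY can disagree about whether the same line exceeds the recommended length.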

At one time I used these conundrums instead of counting sheep ...

>> and that as many movetext
>> tokens as possible must appear on the same line.
>
> Hmm. That's one of the archival requirements as well.


It's an interesting requirement. It says that '1. e4' is illegal,
if '1.e4' allows one more token on the line. It would probably
be a disaster if any program seriously checked for that kind of
problem.
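
What "as many tokens as possible on a line" amounts to for an exporter is greedy packing. A minimal sketch follows; the function name `pack_tokens` and the default width of 79 are assumptions chosen for illustration, and tokens are joined with single spaces.

```python
def pack_tokens(tokens, width=79):
    """Greedily pack tokens into lines of at most `width` characters.

    A new line is started only when adding the next token (plus one
    separating space) would exceed `width`, so every line carries as
    many tokens as fit.
    """
    lines = []
    current = ""
    for token in tokens:
        if not current:
            current = token
        elif len(current) + 1 + len(token) <= width:
            current += " " + token
        else:
            lines.append(current)
            current = token
    if current:
        lines.append(current)
    return lines
```

Greedy packing is easy to produce. Checking it on import is the troublesome part, since the checker must agree with the writer on the exact width and on how each character is counted.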

I also think your annotation production says too much: it includes
! and ?, but I don't think they're allowed -- those things are
done as NAGs. (Or is this one of those incompatible changes?)

--
Anders Thulin ath*algonet.se http://www.algonet.se/~ath

  #8   Report Post  
Old September 6th 06, 05:36 PM posted to rec.games.chess.computer

Anders Thulin wrote:
> (But I see now that you do mention token without defining it ... hm.)


Where do I mention it? Most likely text copied straight from the old
spec... I tried to stay away from the entire "token" idea, because
it's generally unnecessary. The original spec sounds like he just
found lex and yacc and wanted to do something cool with them.

> I've still not figured out if the game termination markers are
> tokens. They have to be, as only tokens, white space separators,
> and comments are allowed in movetext. But '1/2-1/2' and '*' contain
> characters that are not legal in tokens, so they can't be ...


That's one of the internal inconsistencies in the standard, and one
major reason why I hesitate to define a token.

> Here's another: what's a 'printing character' (needed to decide
> if a line exceeds recommended length)? This may be solved
> if you switch to 8-bit ASCII for character set, but with Latin-1
> it is a bit of a poser: is SHY (10/13) a printing character or not?
> Is it always one or the other, or does it depend on the context?


I used the RFC's definitions of printing characters, an attempt to
avoid conflict by bowing to another standard. Do you think the set
should be defined differently?

> At one time I used these conundrums instead of counting sheep ...


Doesn't help for me -- keeps me awake :-P (at midnight, when I wrote
the first version of this)

> [Maximizing the number of movetext tokens is] an interesting requirement.
> It says that '1. e4' is illegal,
> if '1.e4' allows one more token on the line. It would probably
> be a disaster if any program seriously checked for that kind of
> problem.


Absolutely. The thing is, any parser that works to spec will be
working with the "import" format, which doesn't care how many
tokens-per-line there are. However, the problem you raise is a moot
one: export data is required to place the move immediately after the
period (no spaces). Unnecessary, if you ask me.

> I also think your annotation production says too much: it includes
> ! and ?, but I don't think they're allowed -- those things are
> done as NAGs. (Or is this one of those incompatible changes?)


The import spec allows it, the export spec forbids it (see section
8.2.3.8 of the original). As I combined them into one, I attempted to
treat the import spec as the MUSTs and the export spec as the SHOULDs,
which is less complex to understand. All my changes (so far) are
completely compatible with any current program that runs to spec.
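
As a sketch of that MUST/SHOULD split, an exporter can accept the suffix annotations on input and rewrite them as NAGs on output. The mapping below is the one I recall from section 10 of the 1994 spec and should be verified before use. `export_tokens` is a hypothetical helper that expects comments to have been removed already.

```python
import re

# Suffix annotations and their NAG equivalents (recalled from the 1994 spec,
# section 10; treat as an assumption and verify before relying on it).
SUFFIX_TO_NAG = {"!": "$1", "?": "$2", "!!": "$3", "??": "$4", "!?": "$5", "?!": "$6"}

# A move token that ends in one or two '!' / '?' characters.
SUFFIXED_MOVE = re.compile(r"^(?P<move>[^!?]+)(?P<suffix>[!?]{1,2})$")


def export_tokens(import_tokens):
    """Rewrite tokens such as 'Nf3!?' as 'Nf3', '$5'; pass all others through."""
    out = []
    for token in import_tokens:
        match = SUFFIXED_MOVE.match(token)
        if match:
            out.append(match.group("move"))
            out.append(SUFFIX_TO_NAG[match.group("suffix")])
        else:
            out.append(token)
    return out
```

For example, `export_tokens(["1.", "e4!", "e5"])` gives `["1.", "e4", "$1", "e5"]`. A parser written this way reads both forms, while the writer emits only the stricter one.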

  #9   Report Post  
Old September 6th 06, 06:10 PM posted to rec.games.chess.computer

On 2006-09-06, Anders Thulin wrote:
> Here's another: what's a 'printing character' (needed to decide
> if a line exceeds recommended length)? This may be solved
> if you switch to 8-bit ASCII for character set, but with Latin-1
> it is a bit of a poser: is SHY (10/13) a printing character or not?
> Is it always one or the other, or does it depend on the context?


IMO one of the biggest problems with PGN is the limited character
set. It works quite well if one is from North America, Northern Europe
or Western Europe, like you and me, but it does bother me that
PGN cannot be written in every language.

Then again, it's very nice to see some development!

--
Ari Makela
http://arska.org/hauva/

late autumn -
a single chair waiting
for someone yet to come
-- Arima Akito
  #10   Report Post  
Old September 6th 06, 06:29 PM posted to rec.games.chess.computer

Ari Makela wrote:
> IMO one of the biggest problems with PGN is the limited character
> set. It works quite well if one is from North America, Northern Europe
> or Western Europe, like you and me, but it does bother me that
> PGN cannot be written in every language.


Absolutely. Ideally, a revised spec will pave the way for a _new_
spec, one that takes over 12 years of experience with the old one into
account. I have a running list of problems people have noted with the
current PGN format, so I'll add yours to it.

> Then again, it's very nice to see some development!


Thanks!
