[relaxng-user] line terminator in compact syntax

David Tolpin dvd at davidashen.net
Sat Dec 6 18:36:27 ICT 2003


> On Sat, 2003-12-06 at 05:09, David Tolpin wrote:
> 
> > But if I replace literal characters in the original source, the interpretation will
> > be different, since #xD, when escaped, is not normalized to newline marker and is not
> > a line terminator.
> 
> Any line terminators (whether #xD #xA, #xD or #xA) in the original source
> have to be escaped as \x{A}.  Replacing a literal #xD by \x{D} would
> only work if newline normalization happened _after_ escape
> interpretation, but this wouldn't have been a good idea because plenty
> of environments do line termination normalization on input.

But this is not what the specification says. It explicitly distinguishes between
the newline marker and \x{A}, and says that both newline markers and #xA characters
(produced by \x{A}) can be present in the stream after escape interpretation, which
follows newline normalization. The grammar, moreover, mentions the newline marker
separately from \x{A}, and I take this to mean that the newline marker is not a
character at all but a special entry in the source stream.

: 2.4. Escape interpretation
: 
: In this stage, each escape sequence of the form \x{n}, where n is a hexadecimal
: number, is replaced by the character with Unicode code n. The escape sequence
: must match the production escapeSequence; the value computed in the BNF is the
: Unicode code of the replacement character. It is an error if the replacement
: character does not match the Char production of [XML 1.0]. It is an error if
: the input character sequence contains a character sequence escapeOpen that does
: not start an escapeSequence. After an escape sequence has been replaced,
: scanning for escape sequences continues following the replacement character;
: thus \x{5C}x{5C} is transformed to \x{5C} not to \. The replacement for \x{A}
: or \x{D} is a character, as for all other escape sequences, not a newline
: marker. Thus the sequence that results from this stage can contain #xA and #xD
: characters as well as newline markers.
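A minimal Python sketch of the two stages quoted above may make the distinction concrete. It assumes the newline marker can be modelled as a sentinel object that is not a character; the names NEWLINE, normalize_newlines and interpret_escapes are hypothetical. Escapes are simplified to the plain \x{n} form from the quoted text, and the Char check and the error for a malformed escape are omitted.

```python
import re

# Hypothetical sentinel for the spec's "newline marker": deliberately NOT a
# character, so no escape sequence can ever produce it.
NEWLINE = object()

def normalize_newlines(text):
    """Stage 1 (sketch): map #xD #xA, lone #xD and lone #xA to one marker."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            out.append(NEWLINE)
            # Treat CR LF as a single line terminator.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
        elif ch == "\n":
            out.append(NEWLINE)
        else:
            out.append(ch)
        i += 1
    return out

# Simplified escape form \x{n}; the full escapeSequence production is not
# reproduced here.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

def interpret_escapes(items):
    """Stage 2 (sketch): replace each \\x{n} with the character U+n.

    Runs of characters between markers are scanned separately, so an escape
    never spans a marker. re.sub does not rescan its replacements, which
    matches the rule that scanning continues after the replacement character.
    """
    out = []
    run = []

    def flush():
        if run:
            replaced = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), "".join(run))
            out.extend(replaced)  # adds the result one character at a time
            run.clear()

    for item in items:
        if item is NEWLINE:
            flush()
            out.append(NEWLINE)
        else:
            run.append(item)
    flush()
    return out
```

With these definitions a literal #xD in the source becomes NEWLINE, whereas \x{D} becomes the ordinary character "\r", and "\x{5C}x{5C}" comes out as the six characters of "\x{5C}".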

The grammar lists both &newline; and &#xA; as line terminators. They are
separate in the grammar, and I understand that these are two different things.
Am I wrong?

I would understand if neither \x{A} nor \x{D} were line terminators or spaces, but
rather ordinary characters, disallowed everywhere but in literals. What I don't
understand is why the language of the specification treats #xA and #xD differently.

If I understand it correctly, nXML does not use \x{A} as the newline marker but rather
uses #0 for this purpose, and explicitly checks for either \x{A} or #0 as the end of line.
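The #0 approach can be sketched in Python as follows. This is an illustration of the idea as described, not nXML's actual (Emacs Lisp) code; MARKER, normalize, unescape, is_line_end and first_line are hypothetical names. Because U+0000 does not match the XML Char production, \x{0} is rejected, so no escape can forge the marker.

```python
import re

# Assumption: U+0000 stands in for the newline marker, as described for nXML.
MARKER = "\0"

def normalize(text):
    """Replace each #xD #xA, lone #xD or lone #xA with one marker."""
    return re.sub(r"\r\n|\r|\n", MARKER, text)

def unescape(text):
    """Replace each \\x{n} with U+n (simplified: only U+0000 is rejected)."""
    def repl(match):
        code = int(match.group(1), 16)
        if code == 0:
            # U+0000 is not an XML Char, so an escape can never yield the marker.
            raise ValueError("\\x{0} does not denote an XML Char")
        return chr(code)
    return re.sub(r"\\x\{([0-9A-Fa-f]+)\}", repl, text)

def is_line_end(ch):
    """After normalize, any #xA left came from \\x{A}; #xD is not a line end."""
    return ch == MARKER or ch == "\n"

def first_line(chars):
    """Return the characters before the first line terminator."""
    for i, ch in enumerate(chars):
        if is_line_end(ch):
            return chars[:i]
    return chars
```

Under this scheme a comment containing \x{A} ends at the escaped character, while one containing \x{D} runs on to the next real line terminator, which is exactly the asymmetry in question.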

I will keep it as it is, but want to be sure that I've done it right.

David

