[relaxng-user] Documentation tokenization
Bob Foster
bob at objfac.com
Mon May 3 22:58:20 ICT 2004
Jeff Rafter wrote:
> Thanks so much for the comments Bob, I have tried to answer inline.
> Obviously I am still confused a bit... : )
>
>>I don't see any way a comment could jump into the middle of
>>documentation, which is a named terminal, which is a token.
>
> Okay, this makes sense-- I think that I was getting confused by the grammar.
> Specifically, I was getting confused by documentationLineContent in the
> branch:
>
> [^&newline;\n #] restOfLine
>
> I don't really understand what the "#" is doing in there. My interpretation
> was that this was there to forbid additional "#" after the start of content.
> But this must be wrong because XMLDistilled and Jing accept it.
You have to look at the whole documentationLineContent production. The #
is not allowed in the alternative you quote to distinguish it from the
| "#" documentationLineContent
alternative. If it weren't there, any documentationLine that began with
three # characters would match both alternatives, and the grammar
would be ambiguous.
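
To make the ambiguity concrete, here is a rough Python sketch. It is only an illustration under assumptions: it models just the two alternatives discussed here (the rest of the production is left out), it treats &newline; as "\n", and it does not check what follows the "#" in the second alternative. The function name matching_alternatives is made up for this example.

```python
import re

def matching_alternatives(content, exclude_hash):
    """List which of the two modelled alternatives match `content`.

    `content` is what follows the leading "##" of a documentationLine.
    exclude_hash=True models the class as written in the grammar;
    exclude_hash=False models a hypothetical grammar without the "#".
    """
    first_class = r"[^\n #]" if exclude_hash else r"[^\n ]"
    matched = []
    # Alternative: one character from the negated class, then restOfLine.
    if re.fullmatch(first_class + r"[^\n]*", content):
        matched.append("first-char restOfLine")
    # Alternative: "#" documentationLineContent (remainder not checked here).
    if content.startswith("#"):
        matched.append('"#" documentationLineContent')
    return matched

content = "### heading"[2:]  # strip the leading "##", leaving "# heading"
print(matching_alternatives(content, exclude_hash=True))
print(matching_alternatives(content, exclude_hash=False))
```

With the "#" excluded, only the "#" alternative applies to the third #. Without the exclusion both alternatives apply, which is the ambiguity.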
>
>
>>>For that matter, is:
>>>
>>># Comment # Comment
>>>
>>>allowable? Again it seems to me that A.2 says it is not. But Jing and
>>>XMLDistilled accept it.
>>
>>Perhaps you should say why you think A.2 says it is not. Comments start
>>with # and continue to the end of the line. If a succeeding # is on the
>>same line (and not immediately following) why would it be treated
>>specially?
>
> Again, this is my confusion probably. The branches of the separator
> production are confusing me this time.
>
> separator returns Void ::=
> [\t\n &newline;]
> | "#" [^&newline;\n#] restOfLine
> | "#"
> | "#"
>
> I read this as saying: A separator may consist of #9, #A, #20, NEWLINE, or
> it may have a comment, which can either be "#" followed by not NEWLINE, #A,
> or "#", then the rest of the line or a "#" by itself (presumably this could
> only happen at the end of the document, otherwise there would be a NEWLINE
> to end it?)
Yes, that's how I read it, too. The second alternative distinguishes a
separator from a documentationLine.
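
As a rough check of that reading, the two comment alternatives of separator can be written as one Python regular expression. This is a sketch under assumptions: &newline; is modelled as "\n", the text is a single line with no terminator, and COMMENT and is_comment are names invented for the example.

```python
import re

# "#" [^&newline;\n#] restOfLine   |   "#"
COMMENT = re.compile(r"#(?:[^\n#][^\n]*)?")

def is_comment(text):
    """True if the whole of `text` matches a comment alternative of separator."""
    return COMMENT.fullmatch(text) is not None

for sample in ["# Comment # Comment", "#", "## documentation"]:
    print(repr(sample), is_comment(sample))
```

The later # in "# Comment # Comment" is swallowed by restOfLine, so the line is an ordinary comment. A line starting with "##" matches neither comment alternative, which leaves it to documentationLine.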
> I could also add that I am slightly confused by the additional [^Chars]
> productions. Does this mean that those chars, if encountered, end the
> production? Or that they may not appear in that position (or both?).
None of the above. E.g., [^"&newline;] means it matches any character
except " or &newline;.
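
The same notation exists in most regular expression libraries, so a quick Python illustration may help. It assumes &newline; can be stood in for by "\n"; the helper name matches_class is made up for this example.

```python
import re

# [^"&newline;] modelled as a Python negated character class.
NOT_QUOTE_OR_NEWLINE = re.compile(r'[^"\n]')

def matches_class(ch):
    """True if the single character `ch` is accepted by the negated class."""
    return NOT_QUOTE_OR_NEWLINE.fullmatch(ch) is not None

for ch in ["a", "#", '"', "\n"]:
    print(repr(ch), matches_class(ch))
```

The class consumes exactly one character that is not in the listed set. It neither ends the production nor forbids anything on its own; a listed character at that position simply means this alternative does not match there.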
Bob Foster
> <sidenote>
> At this point, I have a custom parser written which can decode and tokenize
> RNC (checking for errors and reporting them if encountered). Once I fix
> these tokenization questions I can move on to the conversion to RNG (which
> is my near term goal). Ultimately I _hope_ to release this as either public
> domain or GPL with Library exception (I am still trying to figure out the
> licensing)-- both in C# and VB (so that it can be used with VBRELAXNG.DLL).
> </sidenote>
>
> Thanks again,
> Jeff Rafter
>
>