[relaxng-user] Documentation tokenization

Bob Foster bob at objfac.com
Mon May 3 22:58:20 ICT 2004


Jeff Rafter wrote:
> Thanks so much for the comments Bob, I have tried to answer inline.
> Obviously I am still confused a bit... : )
> 
>>I don't see any way a comment could jump into the middle of
>>documentation, which is a named terminal, which is a token.
> 
> Okay, this makes sense-- I think that I was getting confused by the grammar.
> Specifically, I was getting confused by documentationLineContent in the
> branch:
> 
> [^&newline;#xA#]x  restOfLiney
> 
> I don't really understand what the "#" is doing in there. My interpretation
> was that this was there to forbid additional "#" after the start of content.
> But this must be wrong because XMLDistilled and Jing accept it.

You have to look at the whole documentationLineContent production. The # 
is not allowed in the alternative you quote to distinguish it from the

  |  "#"  documentationLineContent

alternative. If it weren't there, any documentationLine that began with
three # characters would match both alternatives (the third # could be
read either as ordinary content or as the "#" alternative), and the
grammar would be ambiguous.
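
To make the ambiguity concrete, here is a minimal Python sketch (not from
the spec) that counts how many ways a string can be derived as
documentationLineContent, with and without "#" in the excluded set. The
function name count_parses and the exclude_hash flag are made up for
illustration. The sketch also assumes that &newline; and #xA can both be
modelled as "\n", and that the content after "#" in the second alternative
is optional.

```python
def count_parses(content: str, exclude_hash: bool = True) -> int:
    """Count derivations of `content` as documentationLineContent.

    Assumed grammar (hypothetical simplification of A.2):
        content ::= [^\\n#] restOfLine      (alternative 1)
                  | "#" [content]           (alternative 2)
    With exclude_hash=False, alternative 1 uses [^\\n] instead.
    """
    if content == "" or "\n" in content:
        return 0  # at least one character, none of them a newline
    count = 0
    first = content[0]
    # Alternative 1: one allowed character, then restOfLine.
    # restOfLine can absorb the remainder in exactly one way.
    if not (exclude_hash and first == "#"):
        count += 1
    # Alternative 2: "#" followed by optional documentationLineContent.
    if first == "#":
        rest = content[1:]
        count += 1 if rest == "" else count_parses(rest, exclude_hash)
    return count


if __name__ == "__main__":
    # "### text" is "##" plus the content "# text".
    print(count_parses("# text", exclude_hash=True))
    print(count_parses("# text", exclude_hash=False))
```

For the line "### text", the content after the leading "##" is "# text".
It has a single derivation under the grammar as written, and more than one
as soon as "#" is allowed in the first alternative.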

> 
> 
>>>For that matter, is:
>>>
>>># Comment # Comment
>>>
>>>allowable? Again, it seems to me that A.2 says it is not. But Jing and
>>>XMLDistilled accept it.
>>
>>Perhaps you should say why you think A.2 says it is not. Comments start
>>with # and continue to the end of the line. If a succeeding # is on the
>>same line (and not immediately following) why would it be treated
>>specially?
> 
> Again, this is my confusion probably. The branches of the separator
> production are confusing me this time.
> 
> separator returns Void  ::=
>     [#x9#xA#x20&newline;]
>     |  "#"  [^&newline;#xA#]  restOfLine
>     |  "#"
> 
> I read this as saying: A separator may consist of #9, #A, #20, NEWLINE, or
> it may have a comment, which can either be "#" followed by not NEWLINE, #A,
> or "#", then the rest of the line or a "#" by itself (presumably this could
> only happen at the end of the document, otherwise there would be a NEWLINE
> to end it?)

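Yes, that's how I read it, too. The second alternative distinguishes a
separator from a documentationLine: the # that starts a comment may not be
followed immediately by another #, because ## starts a documentationLine.
A # later on the same line is just part of restOfLine.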
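
As a rough illustration, the comment and documentation cases can be told
apart with two regular expressions. This is only a sketch in Python, not
how Jing or any other tool does it; the names COMMENT, DOC_LINE and
classify are hypothetical, it looks at one whole line at a time with
leading whitespace already stripped, and it models both &newline; and #xA
as "\n".

```python
import re

# Comment: "#" then one char that is neither newline nor "#", then the
# rest of the line; or a lone "#".
COMMENT = re.compile(r"#(?:[^\n#][^\n]*)?")

# Documentation line: "##" followed by anything up to the end of the line.
DOC_LINE = re.compile(r"##[^\n]*")


def classify(line: str) -> str:
    """Classify a single line as 'documentation', 'comment' or 'other'."""
    if DOC_LINE.fullmatch(line):
        return "documentation"
    if COMMENT.fullmatch(line):
        return "comment"
    return "other"


if __name__ == "__main__":
    for sample in ["# Comment # Comment", "## doc text", "#", "element foo"]:
        print(repr(sample), "->", classify(sample))
```

Under these assumptions "# Comment # Comment" comes out as an ordinary
comment: the second # is not immediately after the first, so it is
swallowed by the rest-of-line part.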

> I could also add that I am slightly confused by the additional [^Chars]
> productions. Does this mean that those chars, if encountered, end the
> production? Or that they may not appear in that position (or both?).

None of the above. E.g., [^"&newline;] matches any single character
except " or &newline;. It neither ends the production nor forbids
anything by itself; it is a negated character class.
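
The same notation exists in most regular expression dialects, so a quick
Python check may help. This is only an analogy, with "\n" standing in for
&newline; (an assumption), and the names ONE_CHAR and RUN are invented
for the example.

```python
import re

# One character that is neither a double quote nor a newline.
ONE_CHAR = re.compile(r'[^"\n]')

# Zero or more such characters, as in a string literal body.
RUN = re.compile(r'[^"\n]*')

if __name__ == "__main__":
    print(bool(ONE_CHAR.fullmatch("a")))   # an ordinary character matches
    print(bool(ONE_CHAR.fullmatch('"')))   # an excluded character does not
    print(RUN.match('abc"def').group())    # the run stops before the quote
```

The excluded characters are simply not consumed by the class; whatever
follows it in the production decides what happens next.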

Bob Foster

> <sidenote>
> At this point, I have a custom parser written which can decode and tokenize
> RNC (checking for errors and reporting them if encountered). Once I fix
> these tokenization questions I can move on to the conversion to RNG (which
> is my near term goal). Ultimately I _hope_ to release this as either public
> domain or GPL with Library exception (I am still trying to figure out the
> licensing)-- both in C# and VB (so that it can be used with VBRELAXNG.DLL).
> </sidenote>
> 
> Thanks again,
> Jeff Rafter
> 
> 



