[relaxng-user] RNC and repeatedPrimary
Bob Foster
bob at objfac.com
Sat May 8 13:30:35 ICT 2004
Jeff Rafter wrote:
> With that said, I can't figure out a way to make a single pass through an
> RNC grammar without either (a) buffering an unknown number of tokens in
> memory, or (b) buffering the resulting grammar. It is likely academic
> though-- multiple passes are probably fine, as is storing the result
> document/grammar in memory. Luckily RNC grammars are compact.
>
> Of course, the folks on this list know far more than I do about parsing-- so
> if there is a way to do it without (a) or (b) then I would love to hear it
> (links to various approaches would be very welcome).
Well, you're going to have to store something, because you don't know
what to generate for a pattern until you've seen the entire pattern.
(Unlike a define or a div, where you know everything you need to know
after parsing the first two tokens.) But you should be able to discard
whatever you store for a pattern after you've generated events for it.
(I'm assuming your translator generates SAX events, or something like them.)
The classical solution for expressions like pattern is to produce a
parse tree, where the nodes are in the order needed for generation.
Thus, an expression like:
(a | b)*
would produce a tree like:
star
|
or
/ \
id id
| |
"a" "b"
from which you can generate
<zeroOrMore>
<choice>
<ref name="a"/>
<ref name="b"/>
</choice>
</zeroOrMore>
during a simple pre-order walk.
Producing a parse tree is very easy in a top-down parser if every
matched rule returns a node.
As to references, I don't know exactly where to start, except of course,
the dragon book. Most people don't write parsers these days, but
generate them from annotated grammars using a "compiler-compiler" like
JavaCC or ANTLR. (I'm pretty sure YACC would have problems with the RNC
grammar.) Google will find all of these for you.
Bob Foster
More information about the relaxng-user
mailing list