[relaxng-user] RNC and repeatedPrimary
Bob Foster
bob at objfac.com
Fri May 7 18:55:38 ICT 2004
Regular expressions have traditionally been written with postfix *,
beginning with the Kleene Star operator introduced by S. C. Kleene
(pronounced KLAY-nee) in a 1956 paper. Universally adopted in
mathematics, the notation was used by Ken Thompson in a version of the
QED text editor he wrote at Bell Labs around 1966, from which sprang
Unix ed and grep, setting the standard for editors and programming
languages to the present day.
RELAX NG is grounded in regular language theory and its best-known
implementations are elaborations of J. A. Brzozowski's derivatives of
regular expressions, published in 1964. But probably the real reason the
compact syntax uses postfix notation is that regular expression grammars
have been used in GML element declarations since at least 1971, and later
in SGML and XML DTDs.
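For readers who have not met derivatives, here is a minimal sketch of the
idea in Python. It is only an illustration under simplifying assumptions:
it matches a flat sequence of element names rather than an XML tree, it
does no simplification of intermediate patterns, and every class and
function name in it (Tok, Seq, Alt, Star, deriv, matches, and so on) is
hypothetical, not taken from any RELAX NG implementation. The pattern FOO
encodes the content model (bar1, bar2, bar3+, bar4)* from the question
quoted below.

```python
from dataclasses import dataclass

# Pattern constructors. All names are illustrative assumptions.
@dataclass(frozen=True)
class Empty:            # matches only the empty sequence
    pass

@dataclass(frozen=True)
class NotAllowed:       # matches nothing at all
    pass

@dataclass(frozen=True)
class Tok:              # matches exactly one element name
    name: str

@dataclass(frozen=True)
class Seq:              # first followed by second  (RNC: a, b)
    first: object
    second: object

@dataclass(frozen=True)
class Alt:              # either left or right      (RNC: a | b)
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # zero or more repetitions  (RNC: a*)
    inner: object

def nullable(p):
    """True if pattern p matches the empty sequence."""
    if isinstance(p, (Empty, Star)):
        return True
    if isinstance(p, (NotAllowed, Tok)):
        return False
    if isinstance(p, Seq):
        return nullable(p.first) and nullable(p.second)
    return nullable(p.left) or nullable(p.right)  # Alt

def deriv(p, t):
    """Derivative of p with respect to token t: what may still follow t."""
    if isinstance(p, (Empty, NotAllowed)):
        return NotAllowed()
    if isinstance(p, Tok):
        return Empty() if p.name == t else NotAllowed()
    if isinstance(p, Alt):
        return Alt(deriv(p.left, t), deriv(p.right, t))
    if isinstance(p, Star):
        return Seq(deriv(p.inner, t), p)
    # Seq: consume t from first; if first can be empty, t may start second.
    head = Seq(deriv(p.first, t), p.second)
    return Alt(head, deriv(p.second, t)) if nullable(p.first) else head

def matches(p, tokens):
    """Take one derivative per token, then check the residue is nullable."""
    for t in tokens:
        p = deriv(p, t)
    return nullable(p)

def plus(p):
    """One or more repetitions (RNC: a+), expressed as a followed by a*."""
    return Seq(p, Star(p))

def seq(*ps):
    """Right-nested sequence of one or more patterns."""
    result = ps[-1]
    for p in reversed(ps[:-1]):
        result = Seq(p, result)
    return result

# (bar1, bar2, bar3+, bar4)*
FOO = Star(seq(Tok("bar1"), Tok("bar2"), plus(Tok("bar3")), Tok("bar4")))
```

Note that the matcher only ever sees the tree built by the constructors.
Whether the operator was written before or after its operand in the
source text makes no difference once the pattern has been parsed.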
Changing postfix to prefix wouldn't add any expressiveness, and, after
over 40 years of use, would seem very unusual to programmers and
mathematicians.
(Thanks for a fun 20 minutes looking up the dates!)
Bob Foster
Jeff Rafter wrote:
> Everywhere I look I see this kind of construction (which may be the answer
> to the question I am about to ask), but I have always wondered why the *, +,
> ? operators are at the end of constructions. Wouldn't it make more sense for
> them to be at the start of a construction? For example:
>
> foo = element foo { (bar1, bar2, bar3+, bar4)* }
>
> could just as easily be:
>
> foo = element foo { *(bar1, bar2, +bar3, bar4) }
>
> Especially in the case of RNC, where a repeatedPrimary is wrapped in
> something like <zeroOrMore>x</zeroOrMore> this makes sense, as it allows
> parsers/producers to be written in a streaming mode.
>
> I am sure there is a good reason, I am sure I am missing something obvious--
> I have just always wondered.
>
> Thanks,
> Jeff Rafter