[relaxng-user] RNC and repeatedPrimary

Bob Foster bob at objfac.com
Fri May 7 18:55:38 ICT 2004


Regular expressions have traditionally been written with postfix *, 
beginning with the Kleene Star operator introduced by S. C. Kleene 
(pronounced KLAY-nee) in a 1956 paper. Universally adopted in 
mathematics, the notation was used by Ken Thompson in a version of the 
QED text editor he wrote at Bell Labs around 1966, from which sprang 
Unix ed and grep, setting the standard for editors and programming 
languages to the present day.

RELAX NG is grounded in regular language theory and its best-known 
implementations are elaborations of J. A. Brzozowski's derivatives of 
regular expressions, published in 1964. But the real reason the compact 
syntax uses postfix notation is probably that regular expression 
grammars have been used in GML element declarations since at least 1971, 
and later in SGML and XML DTDs.

Changing postfix to prefix wouldn't add any expressiveness and, after 
more than 40 years of use, prefix operators would seem very unusual to 
programmers and mathematicians.
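
To make the equivalence concrete, here is a rough sketch of how the 
postfix example quoted below, (bar1, bar2, bar3+, bar4)*, would look in 
the RELAX NG XML syntax. It assumes bar1 through bar4 are named patterns 
defined elsewhere in the grammar, which the original message does not 
show:

```xml
<!-- Sketch only: bar1..bar4 are assumed to be defined elsewhere. -->
<define name="foo" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="foo">
    <!-- postfix * in the compact syntax -->
    <zeroOrMore>
      <ref name="bar1"/>
      <ref name="bar2"/>
      <!-- postfix + on bar3 -->
      <oneOrMore>
        <ref name="bar3"/>
      </oneOrMore>
      <ref name="bar4"/>
    </zeroOrMore>
  </element>
</define>
```

The wrapping elements carry exactly the information the postfix 
operators do, so moving the operator to the front of the compact form 
would change only the spelling, not what can be expressed.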

(Thanks for a fun 20 minutes looking up the dates!)

Bob Foster

Jeff Rafter wrote:
> Everywhere I look I see this kind of construction (which may be the answer
> to the question I am about to ask), but I have always wondered why the *, +,
> ? operators are at the end of constructions. Wouldn't it make more sense for
> them to be at the start of a construction? For example:
> 
>     foo =  element foo { (bar1, bar2, bar3+, bar4)* }
> 
> could just as easily be:
> 
>     foo =  element foo { *(bar1, bar2, +bar3, bar4) }
> 
> Especially in the case of RNC, where a repeatedPrimary is wrapped in
> something like <zeroOrMore>x</zeroOrMore> this makes sense, as it allows
> parsers/producers to be written in a streaming mode.
> 
> I am sure there is a good reason, I am sure I am missing something obvious-- 
> I have just always wondered.
> 
> Thanks,
> Jeff Rafter


