[relaxng-user] Latest proposal for smart regexes in RELAX NG
David Tolpin
dvd at davidashen.net
Wed May 5 12:34:38 ICT 2004
> >
> > RNV provides this function - through dsl datatypelibrary and s-pattern
> > facet. I had written about it on xml-dev, and http://ftp.davidashen.net/PreTI/RNV/readme.txt
> > tells about it too (I believe) near the bottom of the page -- search for
> > s-pattern .
>
> Ok. I've looked at this before, but I don't know what it is. What class
> of grammars do these patterns accept? Does your parser guarantee to
> handle any pattern that can be written this way?
Regular grammars. Grammars which are not regular are rejected
by the parser, which issues an error message stating exactly where
the non-regularity occurs. This is similar to Relax NG itself, whose
syntax would allow non-regular grammars, but whose restrictions
(in section 7.3, for example) limit it to regular ones.
>
> > I think that use of XML syntax for string templates (and regular expressions
> > are string templates) is plain wrong. XML regular expressions are good
> > for XML data, and the regular expression is Relax NG itself, and the
> > data is XML.
> >
> > Strings are not trees. Templates should match instances in structure.
> > Instances regular expressions are matched against are strings; templates
> > are pretty good as strings too. Just make them structured, that is,
> > composable.
>
> I'm sorry (remember, brain on hiatus) but I don't understand what you
> are saying. This:
>
> s-pattern="""
> comment = "\(([^\(\)\\]|\\.)*\)"
> atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
> atoms = atom "(\." atom ")*"
> person = "\"([^\"\\]|\\.)*\""
> location = "\[([^\[\]\\]|\\.)*\]"
> local-part = "(" atom "|" person ")"
> domain = "(" atoms "|" location ")"
> start = "(" comment " )?" local-part "@" domain "( " comment ")?"
> """
>
> is not RELAX NG itself.
It is Relax NG. The whole thing is an attribute value, and the handling
is provided by a Datatype Library, and the Datatype Library is specified
in a conformant way.
> At a glance, it's a context-free grammar. Hence
> my questions above.
Every regular grammar is also a context-free grammar. At a glance, this
one is both context-free and regular, since it requires only a
finite-state automaton to parse.
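
To illustrate why the pattern quoted above needs no more than a regular
expression: none of its definitions refers to itself, directly or
indirectly, so every name can be replaced by its definition until a single
regular expression remains. The Python sketch below does that. It is only
an illustration, not how RNV is implemented. The RULES table, the expand()
and is_address() helpers, the treatment of each quoted fragment as Python
re syntax, and the wrapping of each reference in a non-capturing group are
all assumptions made for the sketch. Rejecting every recursive reference is
a deliberately conservative check, not a description of how RNV detects
non-regularity.

```python
import re

# Definitions transcribed from the s-pattern in the quoted message.
# Each right-hand side is a sequence of ("lit", regex_fragment) or
# ("ref", name) items; this representation is hypothetical.
RULES = {
    "comment": [("lit", r"\(([^\(\)\\]|\\.)*\)")],
    "atom": [("lit", r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+")],
    "atoms": [("ref", "atom"), ("lit", r"(\."), ("ref", "atom"), ("lit", ")*")],
    "person": [("lit", r'\"([^\"\\]|\\.)*\"')],
    "location": [("lit", r"\[([^\[\]\\]|\\.)*\]")],
    "local-part": [("lit", "("), ("ref", "atom"), ("lit", "|"),
                   ("ref", "person"), ("lit", ")")],
    "domain": [("lit", "("), ("ref", "atoms"), ("lit", "|"),
               ("ref", "location"), ("lit", ")")],
    "start": [("lit", "("), ("ref", "comment"), ("lit", " )?"),
              ("ref", "local-part"), ("lit", "@"), ("ref", "domain"),
              ("lit", "( "), ("ref", "comment"), ("lit", ")?")],
}


def expand(name, rules, in_progress=()):
    """Inline all references under `name`, returning one regex string."""
    if name in in_progress:
        # Conservative check: any recursion is refused outright.
        chain = " -> ".join(in_progress + (name,))
        raise ValueError("recursive reference: " + chain)
    if name not in rules:
        raise KeyError("undefined name: " + name)
    parts = []
    for kind, value in rules[name]:
        if kind == "lit":
            parts.append(value)
        else:
            inner = expand(value, rules, in_progress + (name,))
            parts.append("(?:" + inner + ")")
    return "".join(parts)


ADDRESS = re.compile(expand("start", RULES))


def is_address(text):
    """True if the whole string matches the expanded pattern."""
    return ADDRESS.fullmatch(text) is not None
```

With these definitions, is_address("dvd@davidashen.net") holds, while
is_address("john.doe@example.com") does not, because local-part is a single
atom and the atom character class does not include the dot.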
David Tolpin