[relaxng-user] Latest proposal for smart regexes in RELAX NG
Amelia A Lewis
amyzing at talsever.org
Fri May 7 04:10:34 ICT 2004
*sigh*
On Fri, 7 May 2004 09:17:22 +0500 (AMST)
David Tolpin <dvd at davidashen.net> wrote:
> > Your example covered only a subset of the RFC822 address production.
>
> By the way, my example covers the full syntax of addr-spec in RFC2822.
No, it doesn't.
> Jeffrey Friedl's has bugs in it. But only because his purpose was
> manifestly to impress an innocent reader by presenting a specimen
> of programming style for an obfuscated programming contest.
Ah, well that explains it, then.
> Let me quote my example again.
>
> Using XML Schema Datatype Library:
>
> start=element addr-spec {
> xsd:token {
> pattern=
> "(\(([^\(\)\\]|\\.)*\) )?"
> ~
> """([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`
> {|}~]+)*|"([^"\\]|\\.)*")"""~ "@"
> ~
> "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|
> }~]+)*|\[([^\[\]\\]|\\.)*\])"~ "( \(([^\(\)\\]|\\.)*\))?"
>
> }
> }
Sorry for line-wrap mangling (*shrug* RFC2822 2.3, but it's a SHOULD, not
a MUST).
This, presumably, is to match RFC2822 3.4.1, not RFC822 6.1 or RFC2822
3.4. It doesn't. Perhaps it would if it defined things using the same
names as the BNF; the 'atom' given below does not correspond with RFC2822
3.2.4 atom, and in any event local-part in 3.4.1 uses dot-atom, not atom.
The production fails to permit amy.lewis@talsever.org, for instance (a
legal RFC2822 address, though it'll get a bounce). Is 'person' below
intended to correspond to quoted-string? Will the regex match amy(oh,
her)@talsever(oh, there).org? It doesn't appear to, to me, but I'm
perhaps too tired to be arguing about it.
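For anyone who would rather check than argue, here is a rough sketch that transcribes the named pieces of the s-pattern quoted below into Python's re module and tries a few addresses against it. The Python names (COMMENT, ATOM, matches, and so on) are mine, not part of the proposal, and I am assuming that Python's regex syntax agrees with the datatype library's for these particular constructs and that the pattern is meant to match the whole token, hence fullmatch.

```python
import re

# Transcribed from the quoted s-pattern; the Python names are hypothetical.
COMMENT = r"\(([^\(\)\\]|\\.)*\)"
ATOM = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
ATOMS = ATOM + r"(\." + ATOM + r")*"
PERSON = r'"([^"\\]|\\.)*"'
LOCATION = r"\[([^\[\]\\]|\\.)*\]"
LOCAL_PART = "(" + ATOM + "|" + PERSON + ")"    # note: atom, not atoms
DOMAIN = "(" + ATOMS + "|" + LOCATION + ")"
START = "(" + COMMENT + " )?" + LOCAL_PART + "@" + DOMAIN + "( " + COMMENT + ")?"

ADDR_SPEC = re.compile(START)


def matches(address: str) -> bool:
    """Return True if the whole string matches the transcribed pattern."""
    return ADDR_SPEC.fullmatch(address) is not None


if __name__ == "__main__":
    samples = [
        "amy@talsever.org",
        "amy.lewis@talsever.org",
        '"amy lewis"@talsever.org',
        "amy(oh, her)@talsever(oh, there).org",
        "(Amy) amy@talsever.org (home)",
    ]
    for sample in samples:
        print(matches(sample), sample)
```

Under those assumptions, the dotted local part and the embedded comments are both rejected, because local-part is built from atom rather than atoms and comments are accepted only at the very start and very end.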
I can't find my first-edition Friedl, and the second edition seems not to
have the regex with explanation.
>
> With a small extension (implemented in RNV using embedded Scheme
> interpreter):
>
> datatypes dsl = "http://davidashen.net/relaxng/scheme-datatypes"
>
> start=element addr-spec {
> dsl:token {
> s-pattern="""
> comment = "\(([^\(\)\\]|\\.)*\)"
> atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
> atoms = atom "(\." atom ")*"
> person = "\"([^\"\\]|\\.)*\""
> location = "\[([^\[\]\\]|\\.)*\]"
> local-part = "(" atom "|" person ")"
> domain = "(" atoms "|" location ")"
> start = "(" comment " )?" local-part "@" domain "( " comment ")?"
> """
> }
> }
>
> The only difference is that the same string-ish regexp is broken
> into parts.
>
> Rewrite it in XML syntax. Let us see which syntax is more readable.
That would be nice. But should it be a translation of the RFC2822 BNF
for 3.4.1, or of the above regex? It seems to me that starting clean from
the BNF would be a better test, but I believe that the above regex doesn't
match it.
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
According to Business Week, in the 1990s the ratio between a chief
executive's salary and the takehome pay of the typical, feckless,
whining grunt on the shopfloor rose from 85:1 to 475:1. (In the UK,
which is seeing a vigorous popular backlash against "fat cat" pay
packets, the ratio is 24:1).
-- The Register