[relaxng-user] Latest proposal for smart regexes in RELAX NG

Amelia A Lewis amyzing at talsever.org
Fri May 7 04:10:34 ICT 2004


*sigh*

On Fri, 7 May 2004 09:17:22 +0500 (AMST)
David Tolpin <dvd at davidashen.net> wrote:

> > Your example covered only a subset of the RFC822 address production. 
> 
> By the way, my example covers the full syntax of addr-spec in RFC2822.

No, it doesn't.

> Jeffrey Friedl's has bugs in it. But only because his purpose was
> manifestly to impress an innocent reader by presenting a specimen
> of programming style for an obfuscated programming contest.

Ah, well that explains it, then.

> Let me quote my example again. 
> 
> Using XML Schema Datatype Library:
> 
> start=element addr-spec {
>   xsd:token {
>     pattern=
>       "(\(([^\(\)\\]|\\.)*\) )?"
>     ~
>     """([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`
>     {|}~]+)*|"([^"\\]|\\.)*")"""~ "@" 
>     ~
>     "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|
>     }~]+)*|\[([^\[\]\\]|\\.)*\])"~ "( \(([^\(\)\\]|\\.)*\))?"
>     
>   }
> }

Sorry for line-wrap mangling (*shrug* RFC2822 2.3, but it's a SHOULD, not
a MUST).

This, presumably, is to match RFC2822 3.4.1, not RFC822 6.1 or RFC2822
3.4.  It doesn't.  Perhaps it would if it defined things using the same
names as the BNF; the 'atom' given below does not correspond with RFC2822
3.2.4 atom, and in any event local-part in 3.4.1 uses dot-atom, not atom. 
The production fails to permit amy.lewis@talsever.org, for instance (a
legal RFC2822 address, though it'll get a bounce).  Is 'person' below
intended to correspond to quoted-string?  Will the regex match amy(oh,
her)@talsever(oh, there).org?  It doesn't appear to, to me, but I'm
perhaps too tired to be arguing about it.
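
A rough way to check the two questions above is to assemble the quoted
s-pattern pieces into one regex and try the addresses against it. This is
only a sketch: it assumes Python's re module treats these particular
patterns the same way an XSD pattern facet would, and the Python names
(QUOTED_PATTERN, quoted_pattern_matches) are made up for illustration.

```python
import re

# Pieces copied from the quoted s-pattern; only the Python spelling differs.
comment = r"\(([^\(\)\\]|\\.)*\)"
atom = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
atoms = atom + r"(\." + atom + ")*"
person = r'"([^"\\]|\\.)*"'
location = r"\[([^\[\]\\]|\\.)*\]"
local_part = "(" + atom + "|" + person + ")"
domain = "(" + atoms + "|" + location + ")"
start = "(" + comment + " )?" + local_part + "@" + domain + "( " + comment + ")?"

QUOTED_PATTERN = re.compile(start)


def quoted_pattern_matches(address):
    # XSD patterns are implicitly anchored, hence fullmatch rather than search.
    return QUOTED_PATTERN.fullmatch(address) is not None


# False: local-part is built from atom, not atoms, so the dot is rejected.
print(quoted_pattern_matches("amy.lewis@talsever.org"))
# False: comments are only allowed before and after the whole address.
print(quoted_pattern_matches("amy(oh, her)@talsever(oh, there).org"))
# True: a plain atom on the left and dotted atoms on the right.
print(quoted_pattern_matches("amy@talsever.org"))
```

Under those assumptions, both addresses in question are rejected, while
the undotted form is accepted.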

I can't find my first-edition Friedl, and the second edition seems not to
have the regex with explanation.

> 
> With a small extension (implemented in RNV using embedded Scheme
> interpreter):
> 
> datatypes dsl = "http://davidashen.net/relaxng/scheme-datatypes"
> 
> start=element addr-spec {
>   dsl:token {
>     s-pattern="""
>       comment = "\(([^\(\)\\]|\\.)*\)"
>       atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
>       atoms = atom "(\." atom ")*"
>       person = "\"([^\"\\]|\\.)*\""
>       location = "\[([^\[\]\\]|\\.)*\]"
>       local-part = "(" atom "|" person ")"
>       domain = "(" atoms "|" location ")"
>       start = "(" comment " )?" local-part "@" domain "( " comment ")?"
>     """
>   }
> }
> 
> The only difference is that the same string-ish regexp is broken
> into parts.
> 
> Rewrite it in XML syntax. Let us see which syntax is more readable.

That would be nice.  But should it be a translation of the RFC2822 BNF
for 3.4.1, or of the above regex?  It seems to me that starting clean from
the BNF would be a better test, but I believe that the above regex doesn't
match it.
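
To make "starting clean from the BNF" concrete, here is a deliberately
partial sketch that names its pieces after RFC2822 3.2.4 and 3.4.1. It
is not a full translation: CFWS (comments and folding whitespace) and the
obs- forms are left out, and quoted-string and domain-literal are
simplified to "anything but the delimiter, or a backslash pair". The
Python names are illustrative only.

```python
import re

# atext, per RFC2822 3.2.4.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = ATEXT + r"+(?:\." + ATEXT + r"+)*"
# Simplified quoted-string and domain-literal (no CFWS, loose qtext/dtext).
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
DOMAIN_LITERAL = r"\[(?:[^\[\]\\]|\\.)*\]"

# local-part = dot-atom / quoted-string   (obs-local-part omitted)
LOCAL_PART = "(?:" + DOT_ATOM_TEXT + "|" + QUOTED_STRING + ")"
# domain = dot-atom / domain-literal      (obs-domain omitted)
DOMAIN = "(?:" + DOT_ATOM_TEXT + "|" + DOMAIN_LITERAL + ")"
# addr-spec = local-part "@" domain
BNF_ADDR_SPEC = re.compile(LOCAL_PART + "@" + DOMAIN)


def is_addr_spec(text):
    # Whole-string match, mirroring the implicit anchoring of an XSD pattern.
    return BNF_ADDR_SPEC.fullmatch(text) is not None


print(is_addr_spec("amy.lewis@talsever.org"))   # True: dot-atom local-part
print(is_addr_spec("amy..lewis@talsever.org"))  # False: empty atom between dots
```

Comments inside the address are still rejected here because CFWS is
omitted. Nested comments are the part a single regex handles worst, which
is one more reason to compare notations against the BNF rather than
against an existing regex.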

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
According to Business Week, in the 1990s the ratio between a chief
executive's salary and the takehome pay of the typical, feckless, 
whining grunt on the shopfloor rose from 85:1 to 475:1. (In the UK, 
which is seeing a vigorous popular backlash against "fat cat" pay 
packets, the ratio is 24:1).
               -- The Register

