[relaxng-user] Latest proposal for smart regexes in RELAX NG

jcowan at reutershealth.com jcowan at reutershealth.com
Wed Apr 28 17:11:28 ICT 2004


The idea of this version, which isn't too different from the versions
I've put together before, is that it embeds cleanly into RNG, using a
separate namespace so that everything is a proper RNG foreign element.
Specifically, any one element that matches the "regex" rule below may
appear as the child of an RNG "data" element.  That way, processors that
don't understand the regex elements simply skip them.  Here's the schema:

namespace rx = "urn:x-rng:rx"		# to be changed

# Basic components: width-one and width-zero objects:
one = element rx:one {xsd:string {length = "1"}}
boundary = element rx:boundary {
	attribute type {"bos" | "eos" | "bol" | "eol" | "bow" | "eow"}
	# beginning/end of string, line, word
	}

# Iterators and operators
zeroOrMore = element rx:zeroOrMore {regex+}
oneOrMore = element rx:oneOrMore {regex+}
optional = element rx:optional {regex+}
choice = element rx:choice {regex+}
group = element rx:group {regex+}

# Conveniences
\string = element rx:string {string}	# must match this string
charset = element rx:charset {string}	# matches one char, any of these
class = element rx:class {		# matches one char of a named class
	attribute name {xsd:NCName}
	}
word = element rx:word {regex+}

# Character set operations
complement = element rx:complement {cset+}	# complement of union
difference = element rx:difference {cset+}
intersect = element rx:intersect {cset+}
union = element rx:union {cset+}
range = element rx:range {string} # from r[0] to r[1], r[2] to r[3], etc.

# Reference to defined regex
# Regexes are defined using define/data/rx:*
ref = element rx:ref {
	attribute name {xsd:NCName}
	}

# Escape hatch
pattern = element rx:pattern {string}	# Posix regex string

# Content models
regex = boundary | zeroOrMore | oneOrMore | optional | choice | group |
		\string | word | cset | ref | pattern
cset = one | charset | class | complement | difference |
		union | intersect | range
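
As a rough illustration of how this might look in an actual RNG schema,
here is a sketch of a "data" element carrying a single rx:* child, plus a
regex defined via define/data/rx:* and pulled in with rx:ref.  The names
"partNumber" and "prefix" are made up for the example.  It also assumes
that rx:group means "these in sequence" (by analogy with RNG's own group)
and that rx:ref's name attribute points at the name of the enclosing
define; the schema above doesn't spell out either.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:rx="urn:x-rng:rx">
  <start>
    <element name="partNumber">  <!-- hypothetical element name -->
      <!-- Exactly one rx:* element as a foreign child of data -->
      <data type="string">
        <rx:group>  <!-- assumed: children match in sequence -->
          <rx:boundary type="bos"/>
          <rx:ref name="prefix"/>  <!-- assumed: refers to the define below -->
          <rx:string>-</rx:string>
          <rx:oneOrMore>
            <rx:range>09</rx:range>  <!-- from r[0]='0' to r[1]='9' -->
          </rx:oneOrMore>
          <rx:boundary type="eos"/>
        </rx:group>
      </data>
    </element>
  </start>

  <!-- A named regex, defined using define/data/rx:* -->
  <define name="prefix">  <!-- hypothetical name -->
    <data type="string">
      <rx:oneOrMore>
        <rx:range>AZ</rx:range>
      </rx:oneOrMore>
    </data>
  </define>
</grammar>
```

A processor that knows nothing about the rx namespace would see only
<data type="string"/> in both places, since the rx:* elements are
foreign elements and get dropped.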

-- 
Winter:  MIT,                                   John Cowan
Keio, INRIA,                                    jcowan at reutershealth.com
Issue lots of Drafts.                           http://www.ccil.org/~cowan
So much more to understand!                     http://www.reutershealth.com
Might simplicity return?                        (A "tanka", or extended haiku)

