Author:
James Clark (Thai Open Source Software Center) <jjc@thaiopensource.com>
Date:
2003-01-31
Copyright © 2003 Thai Open Source Software Center Ltd
The XML Namespaces Recommendation allows an XML document to be composed of elements and attributes from multiple independent namespaces. Each of these namespaces may have its own schema. The problem then arises of how the schemas can be composed in order to allow validation of the complete document.
In RELAX Namespace, Murata Makoto pioneered the idea of dividing the document into islands, with each island containing a single namespace, and validating each island separately against the schema for its namespace. RELAX Namespace formed the basis for the recently published Committee Draft of Document Schema Definition Languages (DSDL) -- Part 4: Selection of Validation Candidates.
This document presents a language named Modular Namespaces (MNS), which is an evolution of the ideas in RELAX Namespace and DSDL Part 4. RELAX Namespace was designed to work well with RELAX Core. RELAX Core cannot deal with documents that use multiple namespaces, nor does it provide any namespace-based wildcards. These limitations of RELAX Core are reflected in the design of RELAX Namespace. MNS is designed to be able to take advantage of more recent schema languages, such as RELAX NG, that are not limited in this way.
A sample implementation of MNS is included in Jing.
It is hoped that this will be a useful contribution to the future development of DSDL Part 4.
In its simplest form, an MNS schema consists of a mapping from namespace URIs to schema URIs. An MNS schema is written in XML. Here is a RELAX NG compact syntax schema for this simplest form of MNS schema:
default namespace = "http://www.thaiopensource.com/ns/mns"
start =
  element rules {
    schemaType?,
    element validate {
      schemaType?,
      attribute ns { xsd:anyURI },
      attribute schema { xsd:anyURI }
    }*
  }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
Validity of an instance with respect to an MNS schema is determined as follows. First, a set of validation subjects is identified. Each validation subject is an element in the instance. Associated with each validation subject is a schema. If all the validation subjects are valid with respect to their associated schemas, then the instance is considered valid with respect to the MNS schema.
It is important to understand that when a validation subject is validated with respect to its schema, then it is validated along with all its descendants and attributes. One validation subject may have other validation subjects as ancestors. In this case, a validation subject will be validated with respect to more than one schema. Not only will it be validated with respect to its schema, but it will also be validated as part of validation of ancestor validation subjects with respect to their schemas.
An element is a validation subject if it has no parent or if its namespace URI is different from that of its parent. A validation subject must have a validate rule for its namespace URI: there must be a validate element whose ns attribute is the same as the validation subject's namespace URI. The value of the ns attribute of the validate element can be the empty string to specify the absent namespace URI.
The associated schema is specified by the schema attribute of the validate element. The schema can be in any language supported by the particular implementation. When the schema is XML, the language of the schema is detected from the namespace URI of the document element. When the schema is not XML, MNS relies on the MIME type of the result of fetching the URI; the schemaType attribute can be used to specify the MIME type explicitly. A MIME type of application/x-rnc can be used for RELAX NG compact syntax. The schemaType attribute on the rules element specifies the default value of the schemaType attribute on validate elements. Note that the schema attribute may refer to another MNS schema.
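As an illustration of this simplest form, here is a sketch of an MNS schema that maps three namespaces to schemas. The schema file names (xhtml.rng, svg.rnc, local.rng) are hypothetical and are not defined by MNS; the namespace URIs are the usual XHTML and SVG ones.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns">
  <!-- XML-syntax schema: the language is detected from its document element -->
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml.rng"/>
  <!-- Non-XML schema: the MIME type is given explicitly -->
  <validate ns="http://www.w3.org/2000/svg" schema="svg.rnc"
            schemaType="application/x-rnc"/>
  <!-- The empty string selects elements with no namespace URI -->
  <validate ns="" schema="local.rng"/>
</rules>
```

With these rules, an XHTML document element containing an SVG subtree yields two validation subjects. The document element is validated against xhtml.rng together with all its descendants, including the SVG subtree, so xhtml.rng would have to permit that subtree; the SVG element is additionally validated against svg.rnc.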
MNS has additional features that provide further control over the selection of validation subjects and their associated schemas.
Should it be possible to put the schema inline in the MNS schema, wrapped in, say, a schema element? How does this impact extensibility?
Should MNS have an include element? The ability to recursively reference MNS schemas may make this unnecessary.
Sometimes it may be desirable to allow elements from namespaces for which there are no validate rules. This can be done by adding an empty lax element to the rules element:
default namespace = "http://www.thaiopensource.com/ns/mns"
start = element rules { schemaType?, (validate* & lax?) }
validate =
  element validate {
    schemaType?,
    attribute ns { xsd:anyURI },
    attribute schema { xsd:anyURI }
  }
lax = element lax { empty }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
We will refer to an element that would be a validation subject if there were an applicable validate element as a potential validation subject. In the absence of a lax element, there must be an applicable validate element for every potential validation subject; with a lax element, there need not be.
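A minimal sketch of a lax schema follows. The schema file name xhtml.rng is a hypothetical placeholder.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns">
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml.rng"/>
  <!-- Potential validation subjects in other namespaces need no rule -->
  <lax/>
</rules>
```

Here an element from, say, the SVG namespace inside an XHTML document is a potential validation subject with no applicable validate element. Without the lax element that would be an error; with it, the element is simply not selected as a validation subject. It is still seen by xhtml.rng when the enclosing XHTML validation subject is validated.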
Attributes can also be validation subjects:
default namespace = "http://www.thaiopensource.com/ns/mns"
start = element rules { schemaType?, (validate* & lax?) }
validate =
  element validate | validateAttributes {
    schemaType?,
    attribute ns { xsd:anyURI },
    attribute schema { xsd:anyURI }
  }
lax = element lax { attribute allow { "attributes" | "elements" }? }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
If an element has attributes that are namespace qualified with a namespace URI other than the namespace URI of the element itself, then the set of all attributes on that element with that namespace URI is a potential validation subject. If there is a validateAttributes element for that namespace URI, then it becomes a validation subject. The associated schema is specified by the schema attribute of the validateAttributes element. Unqualified attributes are never validation subjects.
Lax processing for elements means that a potential validation subject that is an element need not have an applicable validate element. Lax processing for attributes means that a potential validation subject that is a set of attributes need not have an applicable validateAttributes element. In the absence of a lax element, neither attributes nor elements are processed laxly. By default, the lax element enables lax processing for both elements and attributes. If allow="attributes" is specified, then lax processing is enabled for attributes only; if allow="elements" is specified, then lax processing is enabled for elements only.
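The following sketch combines attribute validation with lax processing restricted to attributes. The XLink namespace URI is the standard one; the schema file names xhtml.rng and xlink-attributes.rng are assumptions made for the example.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns">
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml.rng"/>
  <!-- All XLink-qualified attributes on one element form one validation subject -->
  <validateAttributes ns="http://www.w3.org/1999/xlink"
                      schema="xlink-attributes.rng"/>
  <!-- Attribute sets from other namespaces need no rule; elements still do -->
  <lax allow="attributes"/>
</rules>
```

Under these rules, XLink attributes on an XHTML element are validated against xlink-attributes.rng, qualified attributes from any other namespace are accepted without a rule, and an element from a namespace with no validate rule remains an error.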
Normally, schema languages (including RELAX NG) validate an element rather than a set of attributes. To work around this, MNS performs parallel transformations on the set of attributes and on the schema. The set of attributes is transformed by attaching the attributes to an element with a particular local name and namespace URI. The schema identified by a validateAttributes element is transformed to match. In the case of RELAX NG, when a validateAttributes element specifies a schema of s, MNS actually uses a schema of:
element * { external "s" }
By default, an element is a potential validation subject if its namespace URI is different from that of its parent. This behavior can be changed by adding one or more cover children to validate elements. With the introduction of cover elements, the rule is that an element is a potential validation subject if it is not covered by an ancestor validation subject. Each validation subject has a set of namespace URIs that it covers. A validation subject always covers its own namespace URI. In addition, it covers the namespace URIs specified by the cover elements in its validate element. One obvious case where it is useful for a schema to cover more than one namespace is when a validate element refers recursively to an MNS schema.
default namespace = "http://www.thaiopensource.com/ns/mns"
start = element rules { schemaType?, (validate* & lax?) }
validate =
  element validate { validateModel, element cover { nsAtt }* }
  | element validateAttributes { validateModel }
validateModel = nsAtt, attribute schema { xsd:anyURI }, schemaType?
lax = element lax { attribute allow { "attributes" | "elements" }? }
nsAtt = attribute ns { xsd:anyURI }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
The set of elements covered by a validation subject is determined from the set of namespace URIs that it covers by the following two rules:
The rule for attributes is very similar. A set of attributes is a potential validation subject if and only if:
Just as with elements, an attribute is covered by a validation subject if its parent element and its namespace URI are covered by that validation subject.
Note that a validate element is applicable to a potential validation subject only if the namespace URI of the potential validation subject is as specified by the ns attribute. Any cover child elements do not affect this.
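As a sketch of how cover might be used, the following schema validates XHTML with embedded MathML against a single combined schema. The file name xhtml-math.rng is hypothetical, and the example assumes that such a combined schema exists.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns">
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml-math.rng">
    <!-- MathML elements inside an XHTML validation subject are covered by it -->
    <cover ns="http://www.w3.org/1998/Math/MathML"/>
  </validate>
</rules>
```

A MathML element whose parent is covered by the XHTML validation subject is itself covered, so it is not a potential validation subject and needs no validate rule of its own. A document whose document element is in the MathML namespace would still have no applicable rule, because only the ns attribute determines applicability.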
Should there be separate sets of namespaces that cover elements and cover attributes?
Should it be possible to cover all namespaces except a particular (possibly empty) finite set of namespaces?
The selection of validation subjects and associated schemas may need to be context dependent. For example, not all namespace URIs may be acceptable for the document element, or the document element may need to be validated against a different schema from that used for subtrees with the same namespace URI.
Context dependence is specified by means of modes.
default namespace = "http://www.thaiopensource.com/ns/mns"
start =
  element rules {
    schemaType?,
    attribute startMode { mode }?,
    (validate | lax)*
  }
validate =
  element validate {
    validateModel,
    attribute useMode { mode }?,
    element cover { nsAtt }*
  }
  | element validateAttributes { validateModel }
lax = element lax { attribute allow { elementsOrAttributes }?, inModesAtt? }
validateModel = nsAtt, attribute schema { xsd:anyURI }, inModesAtt?, schemaType?
nsAtt = attribute ns { xsd:anyURI }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
inModesAtt = attribute inModes { list { mode+ } }
mode = xsd:NCName | "#default"
elementsOrAttributes =
  list {
    ("elements", "attributes")
    | ("attributes", "elements")
    | "elements"
    | "attributes"
    | empty
  }
The selection of validation subjects takes place with respect to a named mode. A mode is named by an NCName. In addition, there is a default mode named #default. The validate and validateAttributes elements have an optional inModes attribute, which specifies the modes in which the elements are applicable. The default value of the inModes attribute is the default mode. It is an error if there is a mode and a namespace URI for which more than one validate or validateAttributes element is applicable. Thus, when an element is selected as a validation subject there is a unique applicable validate element.
The mode used to select whether a particular element or set of attributes is a validation subject is specified by the useMode attribute of the validate element applicable to the nearest ancestor validation subject. The default value of the useMode attribute is the default mode. If an element or set of attributes has no ancestor validation subject, then the mode used is determined by the startMode attribute on the rules element; the default value of the startMode attribute is also the default mode. Looking at this more procedurally, processing is top-down; the starting mode is specified by the startMode attribute. Within each validation subject element, processing switches to the mode specified by the useMode attribute.
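A sketch of mode switching follows. The example.org namespace URIs, the mode names root and body, and the schema file names are all invented for illustration.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns" startMode="root">
  <!-- Only the envelope namespace is acceptable for the document element -->
  <validate ns="http://example.org/ns/envelope" schema="envelope.rng"
            inModes="root" useMode="body"/>
  <!-- Payload elements are accepted only inside an envelope subject -->
  <validate ns="http://example.org/ns/payload" schema="payload.rng"
            inModes="body"/>
</rules>
```

Processing starts in mode root, where only the envelope rule applies. Inside an envelope validation subject the mode is body, where only the payload rule applies. The payload rule has no useMode attribute, so inside a payload validation subject the default mode is used; no rule applies there, so no further validation subjects are permitted.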
Whether processing is lax also depends on the mode. The lax element has an inModes attribute that specifies the modes in which it applies. There can be multiple lax elements specifying lax processing for different modes. As usual, the default value of the inModes attribute is the default mode.
For every mode named in a useMode attribute other than the default mode, there must be at least one validate, validateAttributes or lax element that includes that mode in its inModes attribute. For a mode that allows nothing, either the default mode can be used or a <lax allow="" inModes="m"/> rule can be added. The allowed value of the allow attribute is, in fact, a list of between zero and two distinct tokens from the set elements and attributes.
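The following sketch shows lax elements scoped to particular modes, including a mode that allows nothing. The example.org namespace URIs, the mode names, and the schema file names are hypothetical.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns" startMode="doc">
  <validate ns="http://example.org/ns/doc" schema="doc.rng"
            inModes="doc" useMode="ext"/>
  <validate ns="http://example.org/ns/ext" schema="ext.rng"
            inModes="ext" useMode="sealed"/>
  <!-- In mode ext, attribute sets from unknown namespaces are tolerated -->
  <lax allow="attributes" inModes="ext"/>
  <!-- Mode sealed exists but permits nothing -->
  <lax allow="" inModes="sealed"/>
</rules>
```

Every mode named in a useMode attribute (ext and sealed) appears in the inModes attribute of at least one rule, as required. Within an ext validation subject the mode is sealed, so any potential validation subject found there is an error.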
We can distinguish between schemas that are open and schemas that are closed. Open schemas allow attributes and elements from other namespaces; closed schemas do not. Sometimes it is necessary to treat a closed schema as open. This can be done by adding a prune attribute to the validate element. This has the effect of removing all potential validation subjects that are elements or attributes from the subtree before validating the subtree with respect to the schema specified by the validate element, according to whether the value of the prune attribute contains the token elements or attributes.
attribute prune { elementsOrAttributes }?
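As a sketch, the following schema treats a closed XHTML schema as open for both elements and attributes. The schema file names are hypothetical.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns">
  <!-- Foreign elements and attribute sets are removed before XHTML validation -->
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml-strict.rng"
            prune="elements attributes"/>
  <validate ns="http://www.w3.org/2000/svg" schema="svg.rng"/>
  <validateAttributes ns="http://www.w3.org/1999/xlink"
                      schema="xlink-attributes.rng"/>
</rules>
```

With these rules xhtml-strict.rng never sees embedded SVG subtrees or XLink attributes, so it does not have to allow them. The pruned items are still validation subjects in their own right and are validated against svg.rng and xlink-attributes.rng respectively.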
Sometimes the processing mode to be used for an element may need to depend on the name of the parent of that element. For example, we might wish to allow elements of a particular namespace only within the XHTML head element and not anywhere else. To do this, one or more context elements are added to the validate element. The content of the context element identifies a context; the useMode attribute identifies a mode to be applied to potential validation subjects in that context. The default value of the useMode attribute is the default mode as usual.
The context relates to the ancestry of the potential validation subject, starting with its parent element and continuing up to and including its nearest ancestor validation subject. The context identified by a context element is the union of the contexts identified by each of its children. An element element specifies a parent whose local name is equal to the value of the name attribute and whose namespace is equal to the value of the ns attribute. The namespace must be the same as that specified on the validate element or one of its cover elements. The ns attribute is inherited and so it is unnecessary to specify it except when there are cover elements.
A context of the form:
<element name="x">
  <element name="y">
    <element name="z"/>
  </element>
</element>
applies to a potential validation subject with a parent z, a grandparent y and a great-grandparent x. A context of the form:
<root>
  <element name="x"/>
</root>
applies to a potential validation subject whose parent element x is also the nearest ancestor validation subject of that potential validation subject.
The children of the context elements of a particular validate element must all identify distinct contexts. It is possible for a single potential validation subject to match multiple distinct context children. A context child containing more element elements takes precedence over one containing fewer element elements. Amongst context children containing the same number of element elements, one that has a root element takes precedence over one that does not.
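The following sketch uses two context elements on one validate element. The example.org namespace URIs, the mode names meta and inline, and the schema file names are assumptions for the example.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/mns" startMode="xhtml">
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml.rng"
            inModes="xhtml" prune="elements">
    <!-- Parent is head, grandparent is html, and html is the validation subject -->
    <context useMode="meta">
      <root>
        <element name="html">
          <element name="head"/>
        </element>
      </root>
    </context>
    <!-- Parent is p or td: the context is the union of its children -->
    <context useMode="inline">
      <element name="p"/>
      <element name="td"/>
    </context>
  </validate>
  <validate ns="http://example.org/ns/meta" schema="meta.rng" inModes="meta"/>
  <validate ns="http://example.org/ns/inline" schema="inline.rng" inModes="inline"/>
</rules>
```

A potential validation subject directly inside head is processed in mode meta, one directly inside p or td in mode inline, and one anywhere else in the default mode, where no rule applies. The three context children identify distinct contexts, as the rules above require.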
default namespace = "http://www.thaiopensource.com/ns/mns"
start =
  element rules {
    schemaType?,
    attribute startMode { mode }?,
    (validate | lax)*
  }
validate =
  element validate {
    validateModel,
    useModeAtt?,
    attribute prune { elementsOrAttributes }?,
    element cover { nsAtt }*,
    context*
  }
  | element validateAttributes { validateModel }
context =
  element context {
    useModeAtt?,
    nsAtt?,
    (rootContext | elementContext)+
  }
rootContext = element root { nsAtt?, elementContext }
elementContext =
  element element {
    attribute name { xsd:NCName },
    nsAtt?,
    elementContext?
  }
lax = element lax { attribute allow { elementsOrAttributes }?, inModesAtt? }
validateModel = nsAtt, attribute schema { xsd:anyURI }, schemaType?, inModesAtt?
nsAtt = attribute ns { xsd:anyURI }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
useModeAtt = attribute useMode { mode }
inModesAtt = attribute inModes { list { mode+ } }
mode = xsd:NCName | "#default"
elementsOrAttributes =
  list {
    ("elements", "attributes")
    | ("attributes", "elements")
    | "elements"
    | "attributes"
    | empty
  }
Just as with RELAX NG, foreign elements and attributes can be added to MNS schemas. Thus, the complete MNS schema is as follows:
namespace local = ""
default namespace mns = "http://www.thaiopensource.com/ns/mns"
start =
  element rules {
    schemaType?,
    attribute startMode { mode }?,
    ((validate | lax)* & foreign)
  }
validate =
  element validate {
    validateModel,
    useModeAtt?,
    attribute prune { elementsOrAttributes }?,
    ((cover*, context*) & foreign)
  }
  | element validateAttributes { validateModel, foreign }
cover = element cover { nsAtt, foreign }
context =
  element context {
    useModeAtt?,
    nsAtt?,
    ((rootContext | elementContext)+ & foreign)
  }
rootContext = element root { nsAtt?, (elementContext & foreign) }
elementContext =
  element element {
    nsAtt?,
    attribute name { xsd:NCName },
    (elementContext? & foreign)
  }
lax =
  element lax {
    attribute allow { elementsOrAttributes }?,
    inModesAtt?,
    foreign
  }
validateModel = nsAtt, attribute schema { xsd:anyURI }, schemaType?, inModesAtt?
nsAtt = attribute ns { xsd:anyURI }
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
useModeAtt = attribute useMode { mode }
inModesAtt = attribute inModes { list { mode+ } }
mode = xsd:NCName | "#default"
elementsOrAttributes =
  list {
    ("elements", "attributes")
    | ("attributes", "elements")
    | "elements"
    | "attributes"
    | empty
  }
foreign =
  (attribute * - (mns:* | local:*) { text }
   | element * - mns:* { anything })*
anything = (text | attribute * { text } | element * { anything })*
Suppose we want to validate an XHTML document that uses RDF within its head element. The following would do the job:
<rules xmlns="http://www.thaiopensource.com/ns/mns" startMode="xhtml">
  <validate ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" schema="rdfxml.rng" inModes="rdf" useMode="anything"/>
  <validate ns="http://www.w3.org/1999/xhtml" schema="xhtml.rng" inModes="xhtml" prune="elements">
    <context useMode="rdf">
      <element name="head"/>
    </context>
  </validate>
  <lax inModes="anything"/>
</rules>
Note the following points:
[...] head element. We therefore specify an appropriate context in the validate element for the XHTML namespace.
[...] useMode attribute on the validate element for XHTML, the useMode attribute will default to the default mode. Since none of our rules apply in the default mode, the default mode will not allow anything.
[...] anything mode, for which processing is lax and which does not have any validate or validateAttributes rules.
W3C XML Schema (XSD) includes features for namespace modularity that are similar in some ways to MNS. Like MNS, XSD validation uses a mapping from namespace URIs to schemas. However, there are important differences.
[...] xsi:schemaLocation attributes in the instance, or it may be supplied by the user. With MNS, the mapping is explicit and implementation independent.