These are the slides for a talk given at the XML 2002 conference in Baltimore. They have been combined into a single HTML file.
Good-quality XSD
Structure (define/include) preserving
Approximate where necessary
Useful rather than perfect
Build RELAX NG object model
Convert RELAX NG object model to intermediate form
Perform transformations on intermediate form
Generate XSD from intermediate form
Schema language between RELAX NG and XSD
Abstract, no syntax
No mixed element/attribute content models
Clean, simple semantics
Schema structure more controlled than RELAX NG
Simple type definition associates local name with simple type
Attribute group definition associates local name with attribute use
Group definition associates local name with particle
Start declaration declares particle that document element must match
Include references a schema
Order of components not semantically significant
Simple type definitions, attribute group definitions, group definitions have distinct symbol spaces
Definitions are named with local names not QNames
Builtin simple types do not have simple type definitions
No target namespace associated with a schema
No complex type declarations, element declarations or attribute declarations
Restriction contains the name of builtin simple type and list of facets
List contains a simple type and a minimum/maximum number of occurrences
Union contains a list of simple types
Reference contains a local name referring to a simple type definition
Element contains an expanded QName and a complex type
Wildcard element contains a wildcard
Repeat contains a particle and a minimum/maximum number of occurrences
Sequence contains one or more particles
Choice contains one or more particles
Interleave contains one or more particles
Reference contains a local name referring to a group definition
Complex content contains attribute use, a particle, a mixed flag
Simple content contains attribute use, a simple type
Attribute contains an expanded QName and a simple type
Optional attribute contains an attribute and a default value
Wildcard attribute contains a wildcard
Attribute group contains a list of zero or more attributes
Attribute use choice contains a list of one or more attribute uses
Reference contains a local name referring to an attribute group definition
Positive/negative flag
Set of namespace URIs
Set of excluded expanded QNames
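The three wildcard components above can be pictured as a small record. The following is a minimal sketch, not the tool's actual class: the name Wildcard, its field names and the matches method are hypothetical, and the matching rule (a positive wildcard accepts the listed namespaces, a negative one accepts everything else, and excluded names never match) is an assumed reading of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wildcard:
    """Hypothetical model of a wildcard in the intermediate form."""
    positive: bool                          # positive/negative flag
    namespaces: frozenset = frozenset()     # set of namespace URIs
    excluded_names: frozenset = frozenset() # set of (namespace URI, local name)

    def matches(self, namespace_uri, local_name):
        # Excluded expanded QNames never match (assumed semantics).
        if (namespace_uri, local_name) in self.excluded_names:
            return False
        in_set = namespace_uri in self.namespaces
        # Positive: namespace must be listed. Negative: it must not be.
        return in_set if self.positive else not in_set
```

Under these assumptions, a negative wildcard with one namespace URI behaves like "any name except those in this namespace".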
Flags computed based on possible matches of the pattern
empty says if there is a match whose content is empty
text says if there is a match whose content includes a text node that is matched against a text pattern
data says if there is a match whose content includes a text node that is matched against a data, value or list pattern
attribute says if there is a match that includes an attribute
element says if there is a match whose content includes an element
Sufficient to allow conversion to intermediate form
Can compute flags for patterns from subpatterns
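As an illustration of computing flags for a pattern from its subpatterns, here is a minimal sketch. The function names and the exact combination rules are assumptions, not taken from the slides: a choice is taken to have every flag of every alternative, a group or interleave to have the non-empty flags of all members but the empty flag only if every member can be empty, and an optional pattern to add the empty flag.

```python
# Flags are modelled as frozensets of these names (hypothetical encoding).
EMPTY, TEXT, DATA, ATTRIBUTE, ELEMENT = "empty", "text", "data", "attribute", "element"


def choice_flags(*subs):
    """Assumed rule: any alternative may match, so take the union."""
    return frozenset().union(*subs)


def group_flags(*subs):
    """Assumed rule for group and interleave: content is empty only if
    every member can match with empty content."""
    combined = frozenset().union(*subs) - {EMPTY}
    if all(EMPTY in s for s in subs):
        combined |= {EMPTY}
    return combined


def optional_flags(sub):
    """Assumed rule: an optional pattern may also match nothing."""
    return frozenset(sub) | {EMPTY}
```

For instance, a group of an element pattern and optional text would get the element and text flags but not the empty flag, because the element member can never be empty.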
A pattern can be converted in three ways:
A pattern may be converted to a particle
A pattern may be converted to a simple type
A pattern may be converted to an attribute use
A single pattern may be converted to an attribute use as well as to a particle or a simple type
element patterns are treated like empty when converting to an attribute use
attribute patterns are treated like empty when converting to a particle or simple type
A name class is converted to:
a set of expanded QNames
a wildcard
Split name class into wildcard and list of expanded names
Generate wildcard element particle for wildcard
Generate element particle for each expanded name by converting body of element pattern to a complex type
Combine with choice particle
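The element conversion steps above can be sketched as follows. This is an illustration only: the tuple encoding of particles and the function name element_particles are hypothetical, the name class is assumed to have already been split into a list of expanded names and an optional wildcard, and returning a lone particle without a wrapping choice is a convenience of the sketch.

```python
def element_particles(expanded_names, wildcard, complex_type):
    """Build the particle for an element pattern whose name class has been
    split into expanded names plus an optional wildcard (hypothetical API)."""
    particles = []
    if wildcard is not None:
        # One wildcard element particle for the wildcard part.
        particles.append(("wildcardElement", wildcard))
    for name in expanded_names:
        # One element particle per expanded name, all sharing the
        # complex type obtained from the body of the element pattern.
        particles.append(("element", name, complex_type))
    if len(particles) == 1:
        return particles[0]
    # Combine the alternatives with a choice particle.
    return ("choice", particles)
```

A name class such as a choice of two names would thus yield a choice of two element particles with the same complex type.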
If body has element flag, then use a complex type with complex content and convert body to a particle
Mixed if body has either data or text flag
If body has data flag but neither text nor element flag, then use a complex type with simple content and convert body to a simple type
In addition, convert body to an attribute use
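The choice between complex content and simple content can be written as a small decision function over the body's flags. This is a sketch under assumptions: flags are modelled as a set of strings, the function name and the dictionary result are hypothetical, and flag combinations the slide does not mention (for example a body with only the text flag) are deliberately left undecided.

```python
def complex_type_for(flags):
    """Pick the kind of complex type for an element body from its flags.
    In both covered cases the body is also converted to an attribute use."""
    flags = set(flags)
    if "element" in flags:
        # Complex content; mixed if the body has either data or text flag.
        return {"content": "complex", "mixed": bool(flags & {"data", "text"})}
    if "data" in flags and "text" not in flags:
        # Simple content; the body is converted to a simple type.
        return {"content": "simple"}
    # Not covered by the rules on the slide; left undecided in this sketch.
    return None
```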
If body has attribute flag, then generate an attribute group definition by converting body to an attribute use
If body has element flag, then generate a group definition by converting body to a particle
If body has data flag but neither text nor element flag, then generate a simple type definition by converting body to simple type
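The three rules for converting a definition are independent, so one RELAX NG define can give rise to more than one definition in the intermediate form. A minimal sketch, with flags modelled as a set of strings and with a hypothetical function name and hypothetical result labels:

```python
def definitions_for(flags):
    """List which intermediate-form definitions a define body produces."""
    flags = set(flags)
    kinds = []
    if "attribute" in flags:
        kinds.append("attributeGroup")  # body converted to an attribute use
    if "element" in flags:
        kinds.append("group")           # body converted to a particle
    if "data" in flags and not flags & {"text", "element"}:
        kinds.append("simpleType")      # body converted to a simple type
    return kinds
```

A define whose body contains both attributes and elements therefore yields an attribute group definition and a group definition under the same local name, which is possible because the two kinds of definition have distinct symbol spaces.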
Intermediate form like XSD not RELAX NG
Compute minimum and maximum number of tokens in list
Compute union of simple types of possible members of list
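To make the list conversion concrete, the sketch below walks the content of a list pattern and computes the minimum and maximum number of tokens together with the set of member types. The tuple encoding of patterns and the name token_bounds are hypothetical, each data pattern is assumed to consume exactly one token, and math.inf stands for an unbounded maximum.

```python
import math


def token_bounds(pattern):
    """Return (min_tokens, max_tokens, member_types) for list content."""
    kind = pattern[0]
    if kind == "data":
        # ("data", type_name): exactly one token of that type.
        return 1, 1, {pattern[1]}
    if kind == "empty":
        return 0, 0, set()
    if kind in ("group", "choice"):
        parts = [token_bounds(child) for child in pattern[1]]
        types = set().union(*(t for _, _, t in parts))
        if kind == "group":
            # Token counts of the members add up.
            return (sum(lo for lo, _, _ in parts),
                    sum(hi for _, hi, _ in parts), types)
        # A choice takes the loosest bounds of its alternatives.
        return (min(lo for lo, _, _ in parts),
                max(hi for _, hi, _ in parts), types)
    if kind == "oneOrMore":
        lo, hi, types = token_bounds(pattern[1])
        return lo, (math.inf if hi > 0 else 0), types
    raise ValueError(f"unsupported pattern kind: {kind}")
```

The result maps directly onto the intermediate form's list, which contains a simple type (here the union of the member types) and a minimum/maximum number of occurrences. Note that this is an approximation: the order in which token types must appear is lost.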
Transform out attribute choice
Transform out interleave except where XSD allows it
Combine attribute wildcards
Combine unions of simple types with enumeration facet
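One reading of the last transformation is that union members which restrict the same builtin type using only enumeration facets are merged into a single restriction carrying all the enumerated values. The sketch below illustrates that reading only: the (base, values) encoding and the function name are hypothetical, and values is None for a member without an enumeration facet.

```python
def combine_enumerations(members):
    """Merge union members that enumerate values of the same base type."""
    result = []
    index_by_base = {}  # base type name -> position of its merged entry
    for base, values in members:
        if values is None:
            # No enumeration facet: keep the member as it is.
            result.append((base, None))
        elif base in index_by_base:
            existing = result[index_by_base[base]][1]
            existing.extend(v for v in values if v not in existing)
        else:
            index_by_base[base] = len(result)
            result.append((base, list(values)))
    return result
```

With this encoding, a union of three single-value token restrictions collapses to one restriction with three enumeration facets, which is closer to what a hand-written XSD would contain.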
Assign target namespace to each file in intermediate schema
Choose or create principal file for every namespace
Determine which attributes, element particles need to be moved
Determine which negative wildcards need to be moved
Determine which attributes, element particles should be global
Null namespace needs special treatment
Identify cases where complex type can be used instead of
Simple type definition and optionally attribute group definition, or
Group definition and optionally attribute group definition
All references must be such that they can turn into
the type of an element
the base type of a complex type extension
Take advantage of XSD shorthands
Generate complex type definitions
Generate global element/attribute declarations
Generate bridging definitions for non-global moved elements/attributes
Generate bridging definitions for negative wildcards
Deal with attribute wildcards
Avoid violating unique particle attribution constraint
Avoid violating element declarations consistent constraint
Take advantage of substitution groups
Better handling of interleave
Inform user about all approximations
Generate annotations using e.g. Schematron to make approximations exact
Handle RELAX NG overrides using redefine
Trang (Translator for RELAX NG Schemas)
Open source
http://www.thaiopensource.com/relaxng/trang.html