[relaxng-user] re: Document/schema association proposal
David Tolpin
dvd at davidashen.net
Wed Dec 31 00:50:25 ICT 2003
Hi,
I have been looking at the Document/schema association proposal,
http://groups.yahoo.com/group/emacs-nxml-mode/message/259; I was going
to implement it.
Essentially, the problem the proposal solves is classifying
1) strings (URIs)
2) XML files
according to a number of templates and their ordering. The rules in
the proposal match the strings and files against informally
defined templates.
Since this is done in the framework of a language (Relax NG) whose purpose
is to define templates for XML documents in a regular way, it would be
natural to just use Relax NG itself for the classification.
In a similar way, regular expressions for strings, in any of the accepted
syntaxes, can be used to filter URIs.
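As a minimal sketch of the URI side alone, the filtering can be written as an
ordered table of regular expressions in which the first match wins. The
patterns are the suffix checks from the script below; the table layout and the
name classify_uri are illustrative assumptions, not part of the proposal.

```perl
use strict;
use warnings;

# Ordered list of [pattern, schema type]; the first matching rule wins.
# classify_uri and @URI_RULES are hypothetical names used for illustration.
my @URI_RULES = (
    [ qr/\.x?ht(ml?)?$/, "xhtml"   ],
    [ qr/\.xsl$/,        "xslt"    ],
    [ qr/\.dbx$/,        "docbook" ],
    [ qr/\.fo$/,         "xslfo"   ],
);

sub classify_uri {
    my ($uri) = @_;
    for my $rule (@URI_RULES) {
        my ($pattern, $type) = @$rule;
        return $type if $uri =~ $pattern;
    }
    return "unknown";   # no rule matched
}

print classify_uri("index.xhtml"), "\n";
print classify_uri("notes.txt"), "\n";
```

Keeping the rules in an array rather than a hash preserves their order, which
matters as soon as two patterns can match the same URI.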
Below is a sample document/schema association, expressed in terms of regular
expressions for URIs and Relax NG grammars for documents. It is actually the utility
I am using for my work. The glue is Perl, but it could just as well be written in XML
and interpreted using Emacs Lisp, C, or any other language.
In particular, the template for XSLT documents checks for either the root element
being stylesheet or transform, or a child of the root element being template (for
literal result elements as documents).
my $TMPDIR="/tmp";
my $RNGDIR="/usr/local/share/rng-c";
my %SCHEMAS=(
docbook=>"docbook.rnc",
xslt=>"xslt.rnc",
xhtml=>"xhtml.rnc",
xslfo=>"fo.rnc",
relaxng=>"relaxng.rnc"
);
my $ANY=<<END;
any = (element * {any}|attribute * {text}|text)*
END
my $XML=<<END;
start = element * {any}
$ANY
END
my $DOCBOOK=<<END;
start = element (book|article|part|chapter|section) { any }
$ANY
END
# matches documents whose root element and its children are not XSLT elements
my $NOTXSLT=<<END;
default namespace xsl = "http://www.w3.org/1999/XSL/Transform"
start = element * - xsl:* {not-xsl}
not-xsl = (element * - xsl:* {any}|attribute * {text}|text)*
$ANY
END
my $RELAXNG=<<END;
default namespace rng = "http://relaxng.org/ns/structure/1.0"
start = element grammar {any}
$ANY
END
my $type;
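for($ARGV[0]) {
  $type=(
    # content-based checks first, for anything that parses as XML
    valid($_,$XML)
      and (valid($_,$DOCBOOK) and "docbook"
        or !valid($_,$NOTXSLT) and "xslt"
        or valid($_,$RELAXNG) and "relaxng")
    # then fall back to the file name
    or /\.x?ht(ml?)?$/ and "xhtml"
    or /\.xsl$/ and "xslt"
    or /\.dbx$/ and "docbook"
    or /\.fo$/ and "xslfo"
    # last resort: a schema named after the file, e.g. doc.xml -> doc.rnc
    or /(.*)\..*$/ and do {my $f=$1.".rnc"; -f $f ? $f : 0;}
    or "unknown"
  );
}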
print exists $SCHEMAS{$type}?$RNGDIR."/".$SCHEMAS{$type}:$type;
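sub valid {
  my ($uri,$rnc)=@_;
  my $f=$TMPDIR."/".$$.".rnc";
  # write the grammar to a temporary file for rnv
  open(my $fh,">",$f) or die "cannot write $f: $!";
  print $fh $rnc; close $fh;
  # a zero exit status from rnv means the document is valid
  my $ok=system("rnv -q $f $uri 2>/dev/null")==0;
  unlink $f; return $ok;
}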
In particular, the NOTXSLT rules make it possible to recognize literal result
elements as documents by their contents. My first thought had been to use XSLT
to match XML documents, but then I realized that Relax NG is better suited for
this purpose.
One can argue that it is a slow approach. Determining the file type by content
for a 5-megabyte file takes 1.5 seconds with rnv, and it parses the whole
document even if the root element is notAllowed.
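For anyone who wants to check the cost on their own documents, here is a rough
timing sketch. It assumes nothing beyond the core Time::HiRes module; the helper
name timed is hypothetical, and the workload shown is a stand-in rather than a
real rnv run.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return (result, elapsed seconds).
# "timed" is an illustrative helper, not part of the script above.
sub timed {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    my $result = $code->();
    return ($result, tv_interval($t0));
}

# Stand-in workload; with the script above one would instead pass
#   sub { valid("big.xml", $XML) }
# where big.xml is a hypothetical file name.
my ($ok, $secs) = timed(sub { 1 });
printf "result=%s elapsed=%.3fs\n", $ok, $secs;
```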
Sincerely,
David Tolpin