XML stuff
[Random collection of XML-related links, so I can keep track of what
I'm learning. -- rgr, 7-Oct-02.]
Table of Contents
- XML stuff
- Table of Contents
- General XML information
- Standards and draft standards
- Books
- Implementations
- Documents frequently cited by XML Recommendations
- SOAP, etc.
- Scalable Vector Graphics
- Bioinformatics and XML
- Glossary
General XML information
From the introduction on the Extensible
Markup Language (XML) page:
Extensible Markup Language (XML) is a simple, very flexible text format
derived from SGML (ISO 8879). Originally designed to meet the
challenges of large-scale electronic publishing, XML is also playing an
increasingly important role in the exchange of a wide variety of data on
the Web and elsewhere.
The first sentence is only half true; nothing derived from SGML can
truly be called "simple." XML does indeed simplify or eliminate the
hairier SGML features, and is lexically simpler even than HTML. The
basic concepts are easy, especially if you already know HTML; indeed,
the most important differences between traditional HTML and XHTML, the
new XML-ized version, are the things that are valid in HTML but not
allowed in XHTML. However, [DTD vs. Schema].
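To make the HTML/XHTML difference concrete: markup that traditional HTML accepts, such as an unclosed `<br>`, is not even well-formed XML. The minimal sketch below uses Python's standard `xml.etree.ElementTree` parser purely as an illustrative XML tool; it only checks well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Fine in traditional HTML, where <br> takes no end tag.
html_style = "<p>first line<br>second line</p>"
# The XHTML spelling: the empty element is closed with "/>".
xhtml_style = "<p>first line<br/>second line</p>"
# HTML tag names are case-insensitive; XML names are not.
mixed_case = "<P>text</p>"

print(is_well_formed(html_style))   # False: <br> is never closed
print(is_well_formed(xhtml_style))  # True
print(is_well_formed(mixed_case))   # False: <P> and </p> do not match
```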
Another reason for complexity is that XML is truly extensible; anyone
can define and publish their own document formats (called an "XML application"), and standard tools will be
able to parse and operate on such documents to some extent. For this to
work, XML needs [metainformation]. So, although it is easy to write
XML, and even to invent your own XML document formats, it is harder to
write the "metadocuments" (such as a DTD or schema) that other tools
will probably need to make sense of the XML.
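As an illustration of standard tools operating on a new format "to some extent": a generic parser can read a document in a vocabulary it has never seen, but all it sees are element names, attributes, and text. The `<recipe>` format below is made up for this example, and `element_census` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A document in an invented vocabulary; no DTD or schema describes it.
DOC = """<recipe name="toast">
  <ingredient qty="2">bread</ingredient>
  <ingredient qty="1 pat">butter</ingredient>
  <step>Toast the bread.</step>
  <step>Spread the butter.</step>
</recipe>"""

def element_census(xml_text):
    """Count element names in any well-formed document, whatever its format."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree in document order, including the root.
    return Counter(elem.tag for elem in root.iter())

census = element_census(DOC)
print(census["ingredient"])  # 2
print(census["step"])        # 2
```

The parser can report two `step` elements, but not that they must be performed in order; that kind of knowledge has to come from a DTD, a schema, or a human-readable description of the format.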
Unfortunately, all of this information, like XML itself, is fairly
decentralized. There are about a dozen W3C standards that form what
could be considered the "core" set of XML, with numerous internal
cross-references. This makes for difficult reading -- even in a Web
browser. Worse, these standards were written by different working
groups over a period of several years, making it harder to read the
earlier standards without understanding the context in which they were
written. For example, Namespaces in XML makes
pervasive changes to the syntax and meaning of names, which had to be
retrofitted into earlier standards.
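One way to see how Namespaces in XML changes the meaning of names: after namespace processing, a prefix is only shorthand, and the name that matters is the pair (namespace URI, local part). Python's ElementTree, used here for illustration, writes that pair as `{uri}local`. The `example.org` URIs are placeholders invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Two documents that use different prefixes for the same namespace URI.
DOC_A = '<a:item xmlns:a="http://example.org/ns/inventory">widget</a:item>'
DOC_B = '<b:item xmlns:b="http://example.org/ns/inventory">widget</b:item>'
# Same prefix and local name as DOC_A, but bound to a different URI.
DOC_C = '<a:item xmlns:a="http://example.org/ns/other">widget</a:item>'

def expanded_name(xml_text):
    """Return the root element's name in ElementTree's {uri}local notation."""
    return ET.fromstring(xml_text).tag

print(expanded_name(DOC_A))  # {http://example.org/ns/inventory}item
# Different prefixes, same expanded name:
print(expanded_name(DOC_A) == expanded_name(DOC_B))  # True
# Same "a:item" spelling, different expanded names:
print(expanded_name(DOC_A) == expanded_name(DOC_C))  # False
```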
In order to make sense of it all, it helps to have a good general book that covers the XML core concepts in one place.
Even then, it is best to read lightly through it the first time, without
expecting to understand everything, in order to get the big picture.
Then you can fill in the details on the second pass.
Standards and draft standards
These are all available from the W3C on
their Technical Reports and
Publications page, directly or indirectly.
- [XML-general]
- Extensible Markup Language
(XML), a brief description of how it is being developed.
- [XML]
- Extensible Markup Language
(XML) 1.0 (Second Edition), W3C Recommendation, 6 October
2000.
- [XMLNS]
- Namespaces in
XML, W3C Recommendation, 14 January 1999. Defines the
oft-used QName and NCName productions.
- [XSchema]
- This is pretty hairy, so it's a good thing that somebody thought
to break it up:
- The XML Schema page, for
an overview, working group progress, implementations, etc.
- XML Schema Part 0: Primer, W3C Recommendation, 2 May
2001. This is not officially part of the specification,
but is much more readable for non-implementors.
- XML Schema
Part 1: Structures, W3C Recommendation, 2 May 2001.
- XML Schema
Part 2: Datatypes, W3C Recommendation, 2 May 2001.
- [XHTML]
- XHTML™ 1.0 The
Extensible HyperText Markup Language (Second Edition), A
Reformulation of HTML 4 in XML 1.0, W3C Recommendation, 26
January 2000, revised 1 August 2002.
[more "meta" things. -- rgr, 3-Nov-02.]
- [Infoset]
- XML Information
Set, W3C Recommendation, 24 October 2001.
- [XQDM]
- XQuery 1.0 and
XPath 2.0 Data Model, W3C Working Draft, 16 August 2002.
This is one of the competing candidate data models for XML APIs;
the XML DOM is another.
- [DOM]
- Document Object Model (DOM).
- [XPath]
- XML Path Language (XPath)
Version 1.0, W3C Recommendation, 16 November 1999.
- [XSLT]
- XSL Transformations (XSLT)
Version 1.0, W3C Recommendation, 16 November 1999. See also
the Oasis
overview of XSL. Miloslav Nic has written a very
nice-looking XSLT
Reference with a large collection of examples that is also
searchable.
- [RDF]
- Resource
Description Framework (RDF) Model and Syntax Specification,
W3C Recommendation, 22 February 1999. RDF overview information
can be found on the Resource
Description Framework (RDF) / W3C Semantic Web Activity page.
Books
- [XIAN2]
- Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell
(2e), O'Reilly, 2002. $39.95, ISBN 0-596-00292-0.
Contains a good general overview of all of the foregoing.
Implementations
This list is limited to implementations with which I have some
personal experience. See also the enormous Oasis Public SGML/XML
Software list.
- [CL-XML]
- CL-XML: Common Lisp support for the
'Extensible Markup Language', written by James Anderson.
Supports both SAX-like and XQDM document interfaces.
- [ACL-XML]
- XML/HTML
parsers in Common Lisp, by Franz, Inc. Includes
A Lisp Based HTML Parser, and
A Lisp Based XML Parser. Both produce output as nested Lisp
lists, which can be easier to deal with than SAX and
lighter-weight than DOM.
- [SOAP::Lite]
- SOAP::Lite, module
for Perl, by Paul Kulchenko. See also the SOAP,
etc. section.
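The [CL-XML] and [ACL-XML] entries mention SAX-style interfaces and nested-list output. The sketch below shows both ideas in Python rather than Lisp: a SAX handler receives a stream of start-tag, end-tag, and text events, and here it folds them into nested lists of the form `[tag, attributes, child, ...]`. That list shape and the `NestedListBuilder`/`parse_to_lists` names are hypothetical choices for illustration; they are not the format Franz's parser actually produces.

```python
import xml.sax

class NestedListBuilder(xml.sax.ContentHandler):
    """SAX handler that builds [tag, attrs, child...] lists (illustrative shape)."""

    def __init__(self):
        super().__init__()
        self.stack = [["#document", {}]]  # sentinel; the root element is appended to it
        self.text = []                    # pending character data, possibly in pieces

    def _flush_text(self):
        text = "".join(self.text)
        self.text = []
        if text.strip():  # drop whitespace-only runs such as indentation
            self.stack[-1].append(text)

    def startElement(self, name, attrs):
        self._flush_text()
        node = [name, dict(attrs.items())]
        self.stack[-1].append(node)  # attach to the current parent
        self.stack.append(node)      # the new element becomes the current parent

    def endElement(self, name):
        self._flush_text()
        self.stack.pop()

    def characters(self, content):
        self.text.append(content)

def parse_to_lists(xml_text):
    handler = NestedListBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.stack[0][2]  # the root element's list

doc = '<order id="7"><item qty="2">bolt</item><item qty="5">nut</item></order>'
print(parse_to_lists(doc))
# ['order', {'id': '7'}, ['item', {'qty': '2'}, 'bolt'], ['item', {'qty': '5'}, 'nut']]
```

The handler keeps only a stack of open elements plus the lists themselves, which is why this style of representation tends to be lighter than a full DOM while being easier to walk than raw SAX events.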
Documents frequently cited by XML Recommendations
- [URI]
- Uniform Resource
Identifiers (URI): Generic Syntax, RFC 2396. (Cited as the
definitive reference for URIs.)
- [HTML40]
- HTML 4.01
Specification, W3C Recommendation, 24 December 1999. Not an
XML application, but will continue to be the dominant Web markup
language for some time. In fact, XSLT contains special
support for pre-XML incarnations of HTML; 4.0 is the default
version when HTML output is chosen.
- [CSS2]
- Cascading Style Sheets,
Level 2.
SOAP, etc.
Scalable Vector Graphics
Bioinformatics and XML
Glossary
- application, XML
- An "XML application" just means an "application of XML" to a
given problem area. It is usually defined by a machine-readable
DTD that describes the syntax to a validating parser, together
with a human-readable document that defines the semantics.
- NCName [Namespaces
in XML]
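- A "no colon name": an XML name that contains no colon, used as an
identifier. NCNames serve as namespace prefixes and as the local parts
of QNames. An unprefixed element name is in the default namespace if
one is declared; an unprefixed attribute name is in no namespace.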
- QName [Namespaces in
XML]
- A "qualified name," with an optional namespace prefix. Such
names are generally understood to belong to a namespace. A
QName has at most one colon, which (if present) must be
neither the first nor the last character. Syntactically, this is
an NCName with an optional prefix
(also an NCName).
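The QName rules above are simple enough to check mechanically. This is a simplified sketch, not the full production: the real NCName grammar lists specific Unicode letter, digit, combining, and extender characters, whereas the regular expression here approximates it with Python's `\w` class plus `.` and `-`, requiring a letter or underscore first. `split_qname` and `is_ncname` are hypothetical helper names.

```python
import re

# Approximate NCName: a letter or underscore, then letters, digits, '.', '-', '_'.
# [^\W\d] means "a word character that is not a digit".
NCNAME = r"[^\W\d][\w.-]*"
# QName: an optional NCName prefix and colon, then an NCName local part.
QNAME_RE = re.compile(rf"(?:(?P<prefix>{NCNAME}):)?(?P<local>{NCNAME})")

def is_ncname(name):
    return re.fullmatch(NCNAME, name) is not None

def split_qname(name):
    """Return (prefix, local) for a QName, with prefix None if absent.

    Raises ValueError if name is not a QName under the approximation above.
    """
    m = QNAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a QName: {name!r}")
    return m.group("prefix"), m.group("local")

print(split_qname("xsl:template"))  # ('xsl', 'template')
print(split_qname("title"))         # (None, 'title')
for bad in ["a:b:c", ":foo", "foo:", "1abc"]:
    try:
        split_qname(bad)
    except ValueError as err:
        print(err)  # more than one colon, colon at either end, or digit first
```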
Bob Rogers
<rogers@rgrjr.com>