1. An SGML system
Figure 1 is a picture of an SGML system.
It shows the sequence of tools used to create a document and
transform it from its SGML original into whichever collection
of output formats is required. The components here are:
- The DTD
- The Document Type Definition specifies the syntax of the
document: which elements, and how many of them, are allowed where.
This is the `brains' of the system, and is used, directly or indirectly,
by all of the tools.
- The Editor
- Helps you create a syntactically valid document. It understands
the syntax specified in the DTD, and won't allow you to enter text
which isn't legal. Many editors also display the text readably (for
example, with sections colour-coded, with or without the SGML
tags), according to rules you specify, and so have some aspects of the
formatter about them. There is nothing magic about an SGML editor,
however, and it is perfectly possible to create your SGML documents in
an ordinary text editor, relying on the SGML parser to retrospectively
validate your syntax.
- The SGML parser
- This is the heart of the system, which takes your document,
verifies that it does indeed conform to the DTD it claims to, fills in
any legitimately omitted parts, checks cross-references, and
generally does all the dirty work required to transform the document
into a form the down-stream tools can understand easily and
The point of this is that the down-stream tools can now rely
on getting a valid document, in just the format they want, with no
surprises. This makes them a lot simpler, and makes some
- The down-converter, or formatter
- The formatter sets to work on the output of the parser, massaging
it either into its final form (perhaps HTML or plain text), or into a
form ready for more specialised further processing (perhaps MIF or
TeX). This formatter can be arbitrarily complicated, and might refer
to all sorts of external information sources to resolve inter-document
cross references, generate tables of contents or indexes, insert
figures, and so on.
The formatter doesn't need to refer to the DTD, because the formatter
is itself written for a single DTD (or perhaps for a family of related
ones): if you change the DTD you'll have to change the formatter. The formatter will
typically depend on a stylesheet of some type, perhaps written in
DSSSL (Document Style Semantics and Specification
Language, an ISO standard) or XSL (Extensible Stylesheet Language, a
forthcoming W3C standard).
The usual SGML term for this is `down-conversion' (presumably
because you're transforming the SGML document into a form with less
information content), but `formatting' is possibly more intelligible.
- Other processing
- Part of the power of SGML is that you can do much more than simply
transform your documents from an abstract SGML master into a variety
of human-readable variants. I won't say anything more about these other uses
here, but the structure you have imposed on your document makes
available such things as SGML-aware search engines (so you might
search for all the occurrences of a word in first or second level
headings, but not within lists), HyTime (another
ISO standard, which supports imposing yet more abstract structures on
your documents, including extremely rich cross-referencing) and
document transformation (mechanically rewriting your document from one
DTD to another).
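To make the division of labour concrete, here is a minimal sketch of the
stages in Python. It is an illustration only, not a real SGML toolchain:
well-formed XML stands in for SGML (so the standard-library XML parser can
play the part of the SGML parser), and the CONTENT_MODEL table, the element
names, and the functions validate, to_html and find_in are all hypothetical
names invented for this example. A real DTD also constrains the order and
number of elements, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-in for a DTD: each element maps to the set of child
# elements it may contain (order and repetition are not checked here).
CONTENT_MODEL = {
    "doc": {"title", "section"},
    "section": {"title", "para", "list"},
    "list": {"item"},
    "title": set(),
    "para": set(),
    "item": set(),
}

# Hypothetical mapping used by the formatter; titles are handled separately.
HTML_TAG = {"doc": "body", "section": "div", "para": "p",
            "list": "ul", "item": "li"}

SOURCE = """<doc><title>Guide</title>
<section><title>SGML tools</title>
<para>The parser checks the document.</para>
<list><item>SGML parser output</item></list>
</section></doc>"""


def validate(elem):
    """Return a list of content-model violations found at or below elem."""
    if elem.tag not in CONTENT_MODEL:
        return [f"unknown element <{elem.tag}>"]
    errors = []
    for child in elem:
        if child.tag not in CONTENT_MODEL[elem.tag]:
            errors.append(f"<{child.tag}> not allowed in <{elem.tag}>")
        errors.extend(validate(child))
    return errors


def to_html(elem, depth=0):
    """Down-convert a valid tree to HTML; a title at nesting depth n
    becomes an h<n> heading."""
    tag = f"h{max(depth, 1)}" if elem.tag == "title" else HTML_TAG[elem.tag]
    parts = [escape((elem.text or "").strip())]
    for child in elem:
        parts.append(to_html(child, depth + 1))
        parts.append(escape((child.tail or "").strip()))
    return f"<{tag}>{''.join(parts)}</{tag}>"


def find_in(root, word, within):
    """Structure-aware search: text of each `within` element containing word."""
    return [e.text for e in root.iter(within) if word in (e.text or "")]


root = ET.fromstring(SOURCE)           # the "parser": text in, tree out
print(validate(root))                  # check against the content model
print(to_html(root))                   # down-conversion to HTML
print(find_in(root, "SGML", "title"))  # search headings only, not lists
```

Note that to_html never consults CONTENT_MODEL: like the formatter described
above, it simply assumes it has been handed a valid document, which is what
keeps it short. The search at the end finds the word in a heading while
ignoring the same word inside a list item.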
I have described these components separately, to emphasise that
they are conceptually distinct components of an SGML system. It might
be most convenient for you to support these different phases with
different tools, but there do exist tools which blur the components