SGML on the Desktop / 1. An SGML system


1. An SGML system

Figure 1: An SGML system. Bold italics indicates free tools. Bracketed terms are (free) standards, for which there is often further specialised support.

Figure 1 is a picture of an SGML system. It shows the sequence of tools used to create a document and transform it from its SGML original into whichever collection of output formats is required. The components are:

The DTD
The Document Type Definition (DTD) specifies the syntax of the document: which elements are allowed where, and how many of each. This is the `brains' of the system, and is used, directly or indirectly, by all of the tools.
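To make "which elements, and how many of each, are allowed where" concrete, here is a minimal sketch in Python. It assumes a hypothetical miniature DTD in which each element's content model is written as a regular expression over the names of its child elements; the element names and the `valid` function are illustrative only, not part of any SGML tool.

```python
import re

# A hypothetical miniature DTD. Each element's content model is written as a
# regular expression over the names of its child elements (each followed by a
# space), a simplification of SGML content models such as (title, para+, section*).
DTD = {
    "report": r"title (para )+(section )*",
    "section": r"title (para )+",
    "title": r"",  # no child elements, character data only
    "para": r"",
}


def valid(element):
    """Recursively check a (name, children) tree against DTD.

    Child elements are (name, children) tuples; plain strings are text,
    which this sketch ignores when matching content models.
    """
    name, children = element
    model = DTD.get(name)
    if model is None:
        return False  # the element is not declared in the DTD
    child_elements = [c for c in children if not isinstance(c, str)]
    sequence = "".join(child_name + " " for child_name, _ in child_elements)
    if re.fullmatch(model, sequence) is None:
        return False
    return all(valid(child) for child in child_elements)
```

A real SGML content model also has the `&` (all, in any order) and `?` operators, mixed content, and inclusions and exclusions, none of which this sketch attempts.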
The Editor
Helps you create a syntactically valid document. It understands the syntax specified in the DTD, and won't allow you to enter text which isn't legal. Many editors also display the text readably (for example, with sections colour-coded, with or without the SGML tags), according to rules you specify, and so have some aspects of the formatter about them. There is nothing magic about an SGML editor, however, and it is perfectly possible to create your SGML documents in an ordinary text editor, relying on the SGML parser to validate your syntax afterwards.
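One way an editor can refuse illegal text is to check each proposed insertion against the content model before accepting it. The following is a minimal sketch under that assumption, again writing a hypothetical `section` content model as a regular expression over child element names; `legal_insertions` is an illustrative name, not any real editor's API.

```python
import re

# Hypothetical content model for a "section" element, written as a regular
# expression over child element names (each followed by a space): a title,
# then one or more paragraphs or lists.
SECTION_MODEL = r"title (para |list )+"


def legal_insertions(model, children, position, candidates):
    """Return the candidate element names an editor could insert at
    `position` among `children` (a list of names) without breaking `model`."""
    legal = []
    for name in candidates:
        proposed = children[:position] + [name] + children[position:]
        if re.fullmatch(model, "".join(n + " " for n in proposed)):
            legal.append(name)
    return legal
```

For a section containing a title and one paragraph, inserting after the title offers only `para` and `list`, while nothing at all may go before the title.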
The SGML parser
This is the heart of the system. It takes your document, verifies that it does indeed conform to the DTD it claims to, fills in any legitimately omitted parts (such as end tags the DTD allows you to leave out, or defaulted attribute values), checks cross-references, and generally does all the dirty work required to transform the document into a form the down-stream tools can understand easily and unambiguously.

The point of this is that the down-stream tools can now rely on getting a valid document, in just the format they want, with no surprises. This makes them a lot simpler, and makes feasible some operations which would otherwise be impractical.
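As an illustration of "filling in omitted parts", here is a minimal Python sketch of end-tag inference. It assumes a hypothetical DTD fragment recording which children each element may contain, and closes an open element whenever a start tag arrives that it cannot contain. The output notation is loosely modelled on the ESIS output of James Clark's nsgmls parser, in which `(` marks a start tag, `)` an end tag and `-` character data; the `parse` function itself is illustrative, not nsgmls's algorithm.

```python
# Hypothetical DTD fragment: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "list": {"item"},
    "item": {"para"},
    "para": set(),  # character data only
}


def parse(tokens):
    """Turn a token stream with omitted end tags into ESIS-like events.

    tokens: ("start", name) or ("text", data). End tags are implied: an open
    element is closed when a start tag arrives that it may not contain, and
    everything still open is closed at the end of input.
    """
    stack, events = [], []
    for kind, value in tokens:
        if kind == "start":
            # Close open elements that cannot contain the new one.
            while stack and value not in ALLOWED_CHILDREN[stack[-1]]:
                events.append(")" + stack.pop())
            if not stack and events:
                # We closed the document element: the tag is simply illegal.
                raise ValueError(f"<{value}> is not allowed here")
            events.append("(" + value)
            stack.append(value)
        else:
            events.append("-" + value)
    while stack:
        events.append(")" + stack.pop())
    return events
```

Given the equivalent of `<list><item><para>one<item><para>two`, the sketch supplies the four missing `</para>`, `</item>` pairs and the final `</list>`, so a down-stream tool never has to guess where an element ends.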

The down-converter, or formatter
The formatter sets to work on the output of the parser, massaging it either into its final form (perhaps HTML or plain text), or into a form ready for more specialised further processing (perhaps MIF or TeX). This formatter can be arbitrarily complicated, and might refer to all sorts of external information sources to resolve inter-document cross-references, generate tables of contents or indexes, insert figures, and so on.

The formatter doesn't need to refer to the DTD, because it is written for a single DTD (or perhaps for a family of related ones); the corollary is that if you change the DTD you'll have to change the formatter. The formatter will typically depend on a stylesheet of some type, perhaps written in DSSSL (Document Style Semantics and Specification Language, an ISO standard) or XSL (Extensible Stylesheet Language, a forthcoming W3C standard).

The usual SGML term for this is `down-conversion' (presumably because you're transforming the SGML document into a form with less information content), but `formatting' is possibly more intelligible.
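A down-converter can be sketched in a few lines of Python. This is not DSSSL or XSL: a plain dictionary stands in for the stylesheet, mapping each element of one hypothetical DTD to an HTML tag, and the input is assumed to be the ESIS-like event list a parser might produce (`(name` for a start tag, `)name` for an end tag, `-text` for data). Because the mapping names that DTD's elements, the formatter is tied to it, exactly as described above.

```python
import html

# Hypothetical "stylesheet": maps each element of one particular DTD to an
# HTML tag. Changing the DTD means changing this mapping.
STYLESHEET = {"report": "body", "title": "h1", "para": "p", "list": "ul", "item": "li"}


def format_html(events):
    """Down-convert ESIS-like events ("(name", ")name", "-text") to HTML."""
    out = []
    for event in events:
        kind, value = event[0], event[1:]
        if kind == "(":
            out.append(f"<{STYLESHEET[value]}>")
        elif kind == ")":
            out.append(f"</{STYLESHEET[value]}>")
        elif kind == "-":
            out.append(html.escape(value))  # protect &, <, > in the text
    return "".join(out)
```

Since the parser has already guaranteed a valid, fully tagged event stream, the formatter needs no error recovery of its own; it only decides how each element should look.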

Other processing
Part of the power of SGML is that you can do much more than simply transform your documents from an abstract SGML master into a variety of human-readable variants. I won't say anything more about these possibilities here, but the structure you have imposed on your document makes several things available. These include SGML-aware search engines (so you might search for all the occurrences of a word in first or second level headings, but not within lists), HyTime (another ISO standard, which supports imposing yet more abstract structures on your documents, including extremely rich cross-referencing), and document transformation (mechanically rewriting your document from one DTD to another).
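The heading search mentioned above is straightforward once the structure is explicit. Here is a minimal Python sketch, assuming a hypothetical document tree of (element name, children) tuples with strings as text, and hypothetical element names `h1` and `h2` for first and second level headings and `list` for lists. Words are matched by simple whitespace splitting, which a real search engine would refine.

```python
def search(node, word, in_scope=("h1", "h2"), excluded=("list",), inside=False):
    """Yield the text strings containing `word` that lie inside an element
    named in `in_scope` and not inside any element named in `excluded`.

    node: (name, children); children are nodes or plain text strings.
    """
    name, children = node
    if name in excluded:
        return  # skip this whole subtree, headings included
    inside = inside or name in in_scope
    for child in children:
        if isinstance(child, str):
            if inside and word in child.split():
                yield child
        else:
            yield from search(child, word, in_scope, excluded, inside)
```

A plain-text search could not make this distinction, because in an unstructured file nothing records which words belong to a heading and which to a list.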

I have described these components separately, to emphasise that they are conceptually distinct components of an SGML system. It might be most convenient for you to support these different phases with different tools, but there do exist tools which blur the components together.
Norman Gray
21 July 1998