

2.1. Which DTD?

If you are interested in SGML because you want to work with a particular collection of SGML documents, then the question of which DTD to use does not arise. If, however, you have no such restriction, then you will have to decide which DTD or family of DTDs to use, or whether you need to develop your own customised DTD.

There are no standard DTDs, in the sense that there is no DTD which SGML uses `by default' or `most naturally'; instead there are several DTDs which are well-known, extensively developed, and for which support exists. If your application can fit into one of these without too much difficulty, I think you would be well-advised to use it or one of its variants for your documents. Designing a DTD is not a trivial task, and by using a preexisting DTD you are building on a great deal of work by many people.

The advantages of a preexisting DTD are that some group has already done all the work of designing a DTD which is rich enough to encapsulate all the important information in your documents, and simultaneously flexible enough to support documents significantly different from those the designers had in mind. Also, such a DTD will probably already have support available in formatters.

Some well-known DTDs are:

The TEI DTD was developed by the Text Encoding Initiative as a standard way to encode many types of documents in the humanities. The DTD has very rich support for those structures within texts which are important to the humanities, and there is a good deal of software which exists to process documents marked up with that DTD. If your SGML project is within the humanities, you need a very good reason not to use the TEI DTD.
Because it is so rich, the TEI DTD is very complex. For this reason, the TEI-Lite DTD has been defined as a subset of the full TEI DTD, suitable for documents which do not require the full works.
The DocBook DTD is a flexible, modular DTD, intended for software documentation, which means it has rich support for features such as function definitions, user notes, and the other miscellanea of manuals. It is very well supported by tools.
The MathML draft is currently being developed by the World Wide Web Consortium as an interchangeable mathematical notation for SGML. It is not intended to be used as a complete DTD, but is an example of a DTD fragment, to be included within a larger, more complete DTD.
DTD collections
There is a list of DTD collections at the SGML DTD archives.

The DTDs above are flexible, in that they have optional sections which can be included or not, depending on the complexity of the documents you wish to support. The DocBook DTD, for example, has support within it to help you add your own customisations if it does not match your documents by default.

If none of these DTDs matches your documents, you will have to develop a custom DTD. The most sensible way to do this is to start with a DTD which comes close, and then adapt and extend it so that it includes the structures you want. You will also have to develop a formatter to cope with the DTD. This is not a trivial task, and will require you to learn about SGML in some detail.

Bear in mind that the choice of DTD to use for your documents is not cast in stone. If you decide in future that a different DTD would be more suitable for your documents, then it is possible to translate them from one DTD to another, provided that your initial DTD is sufficiently rich to express all (or at least most) of the information in the final one. The ability to perform this sort of high-level translation is part of the point of SGML. This translation from one DTD to another is referred to within SGML as transformation. The process is not trivial, as it requires you to get a transformation program and find out how to express the transformation in a suitable way, but it is preferable either to doing the transformation by hand or to struggling along with an inappropriate DTD. Some formatting tools, or a system like HyTime, can be used to do the transformation, with a bit of ingenuity.
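To give a flavour of what the simplest kind of transformation involves, here is a minimal sketch in Python. It is an illustration only, not a description of any particular transformation tool. It assumes the document has already been put into a well-formed, fully tagged form that Python's standard xml.etree.ElementTree module can read; a real SGML document with omitted tags would first need an SGML-aware parser. The element names in ELEMENT_MAP are hypothetical and do not come from any real DTD. The sketch only renames elements, whereas a realistic transformation usually has to restructure content as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element names in the source DTD to the
# corresponding names in the target DTD (illustrative names only).
ELEMENT_MAP = {
    "memo": "article",
    "heading": "title",
    "p": "para",
    "em": "emphasis",
}


def transform(source_text: str) -> str:
    """Rename every element that has an entry in ELEMENT_MAP.

    Elements without an entry keep their original name, and all text
    content is left untouched.
    """
    root = ET.fromstring(source_text)
    for element in root.iter():
        element.tag = ELEMENT_MAP.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


source = "<memo><heading>Status</heading><p>All is <em>well</em>.</p></memo>"
print(transform(source))
```

A plain renaming like this works only because every source element here has a direct counterpart in the target. This is the sense in which the initial DTD must be rich enough to express the information the final one requires.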
Norman Gray
21 July 1998