Writes out a SAX stream in a format based on the sgmls ESIS output.


Interface Summary
EsisWriter                 Provides the writing functions needed by an EsisHandler.

Class Summary
EsisHandler                Writes out a SAX stream in a format based on the sgmls ESIS output.
EsisParser                 A parser which can interpret the pseudo-ESIS syntax of EsisHandler.
MessageDigestEsisWriter    Calculates a message digest of a stream.
StreamEsisWriter           Writes ESIS output to a stream, taking care of encodings and line separators.

Package Description

Writes out a SAX stream in a format based on the sgmls ESIS output. The original format is defined by sgmls, and its purpose was to be easy for downstream tools to parse. The purpose here is different: the format turns an XML file into an unambiguous byte stream and, further, permits a normalisation operation which is both well-defined and simple.

There isn't a complete overlap between the ESIS and the SAX models, so there are some differences in the output of this normalisation. All the differences here are extensions rather than changes.

The output consists of a sequence of records, separated by CR LF (i.e. bytes 0xd 0xa). Each record consists of a start character indicating which type of record it is, followed immediately by one or more arguments. A given record type always has the same number of arguments, and the arguments are separated by single spaces.

Record                            Meaning                       Origin
Mprefix uri                       start prefix mapping          extn
mprefix                           end prefix mapping            extn
Aattname CDATA value              declare attribute             ESIS
Bnamespace localname CDATA value  declare namespaced attribute  extn
(name                             start element                 ESIS
[namespace localname              start namespaced element      extn
)name                             end element                   ESIS
]namespace localname              end namespaced element        extn
-text                             character content             ESIS
=text                             ignorable whitespace          extn
?pi data                          processing instruction        ESIS
Xname                             skipped entity                extn
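
As an illustration of the record syntax, the following is a minimal sketch of a SAX handler which emits one record string per event. It is not the real EsisHandler: the class name EsisRecordSketch and its parse method are hypothetical, and the escaping of newlines in text as \n is an assumption based on the example further down this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not the real EsisHandler: records SAX events as record strings. */
class EsisRecordSketch extends DefaultHandler {
    final List<String> records = new ArrayList<>();

    @Override public void startPrefixMapping(String prefix, String uri) {
        records.add("M" + prefix + " " + uri);
    }

    @Override public void endPrefixMapping(String prefix) {
        records.add("m" + prefix);
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Attribute records are written before the start-element record they belong to.
        for (int i = 0; i < atts.getLength(); i++) {
            String ns = atts.getURI(i);
            String tail = atts.getLocalName(i) + " " + atts.getType(i) + " " + atts.getValue(i);
            records.add(ns.isEmpty() ? "A" + tail : "B" + ns + " " + tail);
        }
        records.add(uri.isEmpty() ? "(" + localName : "[" + uri + " " + localName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        records.add(uri.isEmpty() ? ")" + localName : "]" + uri + " " + localName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        records.add("-" + escape(new String(ch, start, length)));
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        records.add("=" + escape(new String(ch, start, length)));
    }

    @Override public void processingInstruction(String target, String data) {
        records.add("?" + target + " " + (data == null ? "" : data));
    }

    @Override public void skippedEntity(String name) {
        records.add("X" + name);
    }

    // Assumption: newlines inside text are written as a literal backslash-n.
    private static String escape(String s) {
        return s.replace("\n", "\\n");
    }

    /** Parses an XML string with a namespace-aware SAX parser and returns the records. */
    static List<String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EsisRecordSketch handler = new EsisRecordSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling EsisRecordSketch.parse on a small document returns the records in event order. SAX does not guarantee the order of attributes within an element, which is one reason the normalisation step sorts them.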

An important function of this class is to normalise the ESIS output. We do this in the following ways:

  1. Attribute records (‘A’ and ‘B’) are alphabetised on output.
  2. Successive ‘character content’ events are merged, and leading and trailing whitespace is trimmed from the resulting merged event. If the resulting event is empty, it is discarded. Ignorable whitespace (‘=’) is discarded.
  3. Start and end prefix mappings (‘M’ and ‘m’) are discarded.
  4. Any processing instruction whose target is ‘signature’ is removed.
  5. All of the output is encoded to bytes as UTF-8.

In the output, each start element record is preceded by the attribute records for that element.
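
The normalisation rules can be sketched as a single pass over a list of record strings. This is a hypothetical illustration rather than the package's implementation: the class EsisNormaliseSketch and its methods are invented here, attribute records are sorted by plain string comparison (the exact ordering used by EsisHandler is not stated on this page), text is assumed to be held unescaped, and discarded records are assumed not to interrupt a run of character content.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the normalisation rules, applied to record strings. */
class EsisNormaliseSketch {

    static List<String> normalise(List<String> records) {
        List<String> out = new ArrayList<>();
        List<String> atts = new ArrayList<>();     // pending 'A'/'B' records
        StringBuilder text = new StringBuilder();  // pending merged '-' content
        for (String r : records) {
            if (r.isEmpty()) continue;
            char type = r.charAt(0);
            if (type == '-') {                     // rule 2: merge successive text
                text.append(r.substring(1));
                continue;
            }
            // rules 2 and 3: drop ignorable whitespace and prefix mappings
            if (type == '=' || type == 'M' || type == 'm') continue;
            // rule 4: drop processing instructions with target 'signature'
            if (type == '?' && (r.equals("?signature") || r.startsWith("?signature "))) continue;
            flushText(text, out);
            if (type == 'A' || type == 'B') {      // hold attributes until their element
                atts.add(r);
                continue;
            }
            if (type == '(' || type == '[') {      // rule 1: sorted attributes come first
                Collections.sort(atts);
                out.addAll(atts);
                atts.clear();
            }
            out.add(r);
        }
        flushText(text, out);
        return out;
    }

    // Emits the merged text, trimmed; an all-whitespace result is discarded.
    private static void flushText(StringBuilder text, List<String> out) {
        String trimmed = text.toString().trim();
        text.setLength(0);
        if (!trimmed.isEmpty()) out.add("-" + trimmed);
    }

    // Rule 5: records are joined with CR LF and encoded as UTF-8.
    static byte[] toBytes(List<String> records) {
        return String.join("\r\n", records).getBytes(StandardCharsets.UTF_8);
    }
}
```

Sorting whole record strings places all ‘A’ records before all ‘B’ records; a real implementation might instead order by attribute name alone.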

The result of this is to turn the XML:

<doc><ns:p class='foo'
  <p> there,

into the (unnormalised) ESIS form:

Mns urn:namespace
Aclass CDATA foo
Burn:namespace att CDATA bar
[urn:namespace p
]urn:namespace p
- there,\nchum\n

The same input can also be given the normalised form:

Aclass CDATA foo
Burn:namespace att CDATA bar
[urn:namespace p
]urn:namespace p

In the normalised form, the prefix mappings have been removed (the prefixes are not semantically important), leading and trailing whitespace has been removed from the ‘-’ lines, and all-whitespace ‘-’ records have been removed.
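
Because the normalised form is a well-defined byte stream, it can be fed directly to a message digest, so that two documents which differ only in prefixes or insignificant whitespace produce the same digest. The following is a minimal sketch of that idea using java.security.MessageDigest. The class EsisDigestSketch is hypothetical, the choice of SHA-256 is an assumption, and the code does not reflect how MessageDigestEsisWriter is actually written.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical sketch: digests a list of records as a CR LF separated UTF-8 stream. */
class EsisDigestSketch {

    static String sha256Hex(List<String> records) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            boolean first = true;
            for (String r : records) {
                // Records are separated (not terminated) by CR LF.
                if (!first) md.update(new byte[] {0x0d, 0x0a});
                md.update(r.getBytes(StandardCharsets.UTF_8));
                first = false;
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Equal lists of normalised records always give equal digests, so comparing digests is a cheap way to compare documents after normalisation.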

Copyright © 2015. All rights reserved.