I present a proposal for structuring and justifying UTypes, which is intended to complement the 2009 May 24 Louys draft-0.3 proposal.
On 2009 May 24, Mireille Louys circulated a preliminary draft proposal for Utypes, for comments within the DM working group. This document is a response to that.
[Update: Mireille has since produced a further 0.4 draft, also available on the IVOA wiki. The proposal on this page is to some extent a response to a list of utype questions I produced some while ago; detailed correspondences are discussed below in Sect. 5 Answers to Questions; see also the discussion on the dm@ivoa.net list.]
This document is not intended to be a counterproposal. I believe it is at heart the same proposal as Mireille's, but arrived at from a rather different direction, and so justified in a different way. The principal syntactic difference is that the model-name UType elements are here references to a namespace, rather than regarded as the namespace themselves.
For the sake of clarity, I have presented this in the assertive style of a proposed standard; it is in fact more tentative than that, and it includes more rationale and commentary than an actual standard would likely contain.
Goals:
Consider the XML fragment below:
<characterisationAxis>
  <axisName>space</axisName>
  <numBins>16</numBins>
</characterisationAxis>
Consider also the FITS card below
NUM-BINS= 16
Do these two fragments say the same thing?
Yes, they do. We can tell this because in each case we can (mentally) draw a picture which looks like this:
In each case, we can clearly define a rule for mechanically going from that picture (or, more concretely, from a set of structures in memory, or whatever it is which is the post-parse result of reading these files) to each of the given serialisations, and we can define a rule for recovering that picture from the serialisation.
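As a minimal sketch of such a pair of rules, the following Python round-trips the in-memory picture through both serialisations. It assumes a hypothetical lookup table that spells the abstract property number-of-bins as numBins in XML and NUM-BINS in FITS; neither spelling is defined by any standard, and only this one property is handled (the axisName element is left out for brevity).

```python
import xml.etree.ElementTree as ET

# Hypothetical per-serialisation spellings of the abstract property name.
XML_TAG = {"number-of-bins": "numBins"}
FITS_KEY = {"number-of-bins": "NUM-BINS"}


def to_xml(props):
    """Serialise {property: int} as a <characterisationAxis> element."""
    axis = ET.Element("characterisationAxis")
    for name, value in props.items():
        ET.SubElement(axis, XML_TAG[name]).text = str(value)
    return ET.tostring(axis, encoding="unicode")


def from_xml(text):
    """Recover {property: int} from the XML serialisation."""
    inverse = {tag: name for name, tag in XML_TAG.items()}
    return {inverse[child.tag]: int(child.text) for child in ET.fromstring(text)}


def to_fits(props):
    """Serialise as 80-character cards: 8-char keyword, '= ', integer right-justified to column 30."""
    return [f"{FITS_KEY[name]:<8}= {value:>20}".ljust(80) for name, value in props.items()]


def from_fits(cards):
    """Recover {property: int} from FITS card images, ignoring any '/' comment."""
    inverse = {key: name for name, key in FITS_KEY.items()}
    return {inverse[card[:8].strip()]: int(card[10:].split("/")[0]) for card in cards}


# The in-memory "picture": one abstract property with its value.
picture = {"number-of-bins": 16}
assert from_xml(to_xml(picture)) == picture
assert from_fits(to_fits(picture)) == picture
```

Both parsers recover the same dictionary, which is the sense in which the two fragments say the same thing.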
The property number-of-bins expresses the relationship between the characterisation axis and the number of bins along it. It is an element of a data model. The string number-of-bins is obviously a poor name for it, since it gives no indication of which data model the property comes from, nor where to find more information. It is important to specify what features a better name would have, and the resulting well-featured name is termed a UType.
This approach – identifying the abstract property name as the interface between a data model and a serialisation – allows us to draw a clear boundary between the various parts of the problem, since it becomes clear that:
Put another way, identifying the picture above as an interface specification means that the details of the serialisation, and the algorithm for constructing the UType names, become mere implementation details.
What are the features which the property name must have?
raw serialised file, to be able to understand it without constantly looking up otherwise meaningless strings.
We note that a URL meets all of these criteria, and propose that UTypes should be dereferenceable URLs. For example, one might imagine a Characterisation UType such as:
http://www.ivoa.net/Documents/Characterisation-1.13.html#Char.SpatialAxis.NumBins
This might appear in a specific serialisation as:
char:Char.SpatialAxis.NumBins
if there were some separate syntactic mechanism, appropriate to that serialisation, for associating the string char: with the URL http://www.ivoa.net/Documents/Characterisation-1.13.html#.
In use, an application would not have to parse this UType, and could regard it as a completely opaque string. Since an application only deals with the post-parse result of a deserialisation, the serialisation technique has absolute freedom to transform this UType in any way. Thus it would be natural and tidy (but generally not necessary) for an XML serialiser to use XML namespaces when generating its output file, and it would be necessary for a FITS serialisation to perform some transformation to fit keyword-value pairs into an 8+70 character FITS card image.
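A sketch of that separate mechanism, assuming a hypothetical prefix table held by the serialiser (in XML the same role would be played by a namespace declaration): expand turns the short form back into the full UType, and compact does the reverse. The application itself only ever compares the expanded strings, treating them as opaque.

```python
# Hypothetical prefix table; in XML this role would be played by a namespace declaration.
PREFIXES = {"char": "http://www.ivoa.net/Documents/Characterisation-1.13.html#"}


def expand(short, prefixes=PREFIXES):
    """Turn 'char:Char.SpatialAxis.NumBins' into the full, opaque UType URL."""
    prefix, sep, local = short.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no namespace bound for {short!r}")
    return prefixes[prefix] + local


def compact(utype, prefixes=PREFIXES):
    """Inverse of expand; UTypes outside every known namespace are returned unchanged."""
    for prefix, base in prefixes.items():
        if utype.startswith(base):
            return f"{prefix}:{utype[len(base):]}"
    return utype


full = expand("char:Char.SpatialAxis.NumBins")
assert compact(full) == "char:Char.SpatialAxis.NumBins"
```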
This has the following advantages.
The requirement that UTypes be dereferenceable does not mean that software would be expected to dereference them frequently. Since the content of the retrieved information would generally be static, being the result of a standardisation process, it could be very aggressively cached, and might for example only need to be retrieved during an application build process.
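A minimal sketch of such caching, assuming a hypothetical fetch callable supplied by the application: because all the UTypes from one standard share a document URL and differ only in their fragment, one retrieval per document suffices.

```python
from urllib.parse import urldefrag


class UTypeDocCache:
    """Retrieve each standard document at most once, however many UTypes point into it."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: document URL -> document content
        self._docs = {}

    def lookup(self, utype):
        """Return (document content, fragment naming the UType within it)."""
        doc_url, fragment = urldefrag(utype)
        if doc_url not in self._docs:
            self._docs[doc_url] = self._fetch(doc_url)
        return self._docs[doc_url], fragment
```

In a build process the fetch callable might perform a real HTTP request; in a test it can simply return canned text.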
Consider now a slightly more complicated case, which has elements from both the Characterisation and STC namespaces:
<characterisationAxis>
  <axisName>spatial</axisName>
  <!-- ... -->
  <coverage>
    <location>
      <coord coord_system_id="TT-ICRS-TOPO">
        <stc:Position2D>
          <!-- ... -->
          <stc:value2>
            <stc:C1>132.4210</stc:C1>
            <stc:C2>12.1232</stc:C2>
          </stc:value2>
        </stc:Position2D>
      </coord>
    </location>
  </coverage>
</characterisationAxis>
How do we picture this? One obvious possibility is:
A suitable UType for the coord-value-c1 property might be (following Mireille Louys's draft) char:coverage.location.coord;stc:Position2D.value2.C1, but this syntax appears slightly arbitrary.
Another way of picturing this case is as follows:
In this picture, the char:coverage.location.coord UType has as its value an object of type stc:Position2D, which in turn possesses properties stc:value2.C1 and stc:value2.C2.
At this point we have two choices:
char:coverage.location.coord;stc:value2.C1 and ...;stc:value2.C2, and require the serialisation to use these UTypes.
The latter picture gets us to the same point as in Mireille's proposal, but in a way which reflects the relationship with the structured underlying model, and which makes clear how this approach could be extended to more elaborate situations if that were necessary. In this view, a UType is a unique sequence of what we might call proto-UTypes, which ends in a literal value.
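The proto-UType view can be sketched in Python, assuming (hypothetically) that the in-memory picture is a nested dictionary keyed by proto-UTypes, and that ';' never occurs inside a single proto-UType. Flattening joins the proto-UTypes along each path down to a literal; unflattening rebuilds the nested structure, which is the reconstruction the serialisation has to be able to perform.

```python
def flatten(obj, path=()):
    """Yield (UType, literal) pairs, joining the proto-UTypes on each path with ';'."""
    for proto_utype, value in obj.items():
        here = path + (proto_utype,)
        if isinstance(value, dict):
            # The value is itself an object: descend into its properties.
            yield from flatten(value, here)
        else:
            # A literal value ends the sequence of proto-UTypes.
            yield ";".join(here), value


def unflatten(pairs):
    """Rebuild the nested structure from (UType, literal) pairs."""
    root = {}
    for utype, value in pairs:
        *parents, leaf = utype.split(";")
        node = root
        for proto_utype in parents:
            node = node.setdefault(proto_utype, {})
        node[leaf] = value
    return root


# The coord value from the XML fragment above, as a nested picture.
picture = {
    "char:coverage.location.coord": {
        "stc:value2.C1": 132.4210,
        "stc:value2.C2": 12.1232,
    }
}
assert unflatten(flatten(picture)) == picture
```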
A FITS analogue might use FITS-WCS keywords to serialise this abstract structure, or might serialise it using a more direct keyword-value technique, using the same UTypes.
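As one hypothetical direct keyword-value technique (not an existing FITS convention), each UType could be paired with its value through indexed keywords: UTYPEn holds the UType as a FITS string and UVALn holds the value. The sketch assumes at most 999 pairs, so that keywords stay within 8 characters, and that every UType fits in the 68 characters available to a FITS string value.

```python
def utype_cards(pairs):
    """Hypothetical FITS layout: UTYPEn holds the UType, UVALn its numeric value."""
    cards = []
    for n, (utype, value) in enumerate(pairs, start=1):
        if n > 999:
            raise ValueError("too many pairs for 8-character indexed keywords")
        quoted = utype.replace("'", "''")  # FITS doubles embedded single quotes
        if len(quoted) > 68:
            raise ValueError(f"UType too long for one card: {utype!r}")
        # String value starts in column 11 and is padded to at least 8 characters.
        cards.append(f"{'UTYPE' + str(n):<8}= '{quoted:<8}'".ljust(80))
        # Numeric value right-justified to column 30, as for fixed-format cards.
        cards.append(f"{'UVAL' + str(n):<8}= {value:>20}".ljust(80))
    return cards
```

Long UTypes would in practice need the compact prefixed form, with the prefix bound elsewhere in the header.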
Dereferenceable to give HTML docs. See section 4.1 Human-readable documentation.
Dereferenceable to give machine-readable information. See section 4.2 Machine-readable documentation.
UType publishers should choose a long-term stable URL for their namespace. The natural domain for this is some location under www.ivoa.net, identified as part of the data model standardisation process. The standard document's URL is an obvious first choice.
Given that UML is becoming a popular data modelling language, this standard should publish a set of XSLT scripts which transform an XMI file into HTML and XML or RDF, to help UType publishers. These scripts could embody recommended practice for how to generate UTypes from UML model entities.
A set of UTypes would be defined by some standard document, published at a URL which will remain stable over a timescale of at least decades. We can expect these to be www.ivoa.net URLs, though this does not close the door to, for example, future www.iau.org UTypes. If each of the UType definitions in that document is associated with an HTML <a name="utype-name"> element, then the requirement for human-readable documentation behind the UType has been immediately and fully met.
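A sketch of a consistency check for this arrangement, using only Python's standard html.parser: it collects the name attributes of <a> elements (and any id attributes) in the standard document, and reports the UTypes whose fragment has no matching anchor. The HTML snippet in the usage example is invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collect <a name="..."> values and id attributes from an HTML document."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if (tag == "a" and attr == "name") or attr == "id":
                self.anchors.add(value)


def missing_anchors(html_text, utypes):
    """Return the UTypes whose fragment names no anchor in html_text."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return [u for u in utypes if urldefrag(u)[1] not in collector.anchors]


base = "http://www.ivoa.net/Documents/Characterisation-1.13.html#"
page = '<h3><a name="Char.SpatialAxis.NumBins">NumBins</a></h3>'
print(missing_anchors(page, [base + "Char.SpatialAxis.NumBins", base + "Char.SpatialAxis.Unit"]))
```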
The requirements for machine-readable documentation need be neither onerous nor exotic. This documentation would be retrieved from the same URL, by requesting an appropriate non-HTML MIME type. It is at this point that a preferred label (or display name, or short name) might be declared for the object. As noted above, this would not typically be retrieved by the running application, but only rarely, such as during a software build. It is an open question what form this extra information might take, but XML, RDF and JSON are all defensible possibilities.
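A sketch of the request side, using urllib from Python's standard library: the fragment is removed (fragments are never sent to the server) and an Accept header asks for a non-HTML representation. The particular MIME types and their q-values are assumptions, since the form of this information is still open; the request is only built here, not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Assumed preference order; the proposal leaves the machine-readable format open.
MACHINE_TYPES = "application/rdf+xml, application/json;q=0.9, application/xml;q=0.8"


def machine_readable_request(utype):
    """Build a GET request for the UType's document, asking for a non-HTML MIME type."""
    doc_url, _fragment = urldefrag(utype)
    return Request(doc_url, headers={"Accept": MACHINE_TYPES})


req = machine_readable_request(
    "http://www.ivoa.net/Documents/Characterisation-1.13.html#Char.SpatialAxis.NumBins"
)
# urllib.request.urlopen(req) would send it; the client then uses the fragment
# to pick the relevant entry out of the returned document.
```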
This proposal provides concrete answers to the recent list of UType questions I asked. Specifically:
// or foo/.., which are fairly unlikely to appear in UTypes).
The uniqueness problem becomes the question of whether the first goal in 1 Background is actually achievable: that is, is it possible to reconstruct an instance of the object that the data model represents, purely from a set of key-value pairs? If this is not in fact achievable for realistic data models, then one solution is to postpone the problem by developing UFIs, which simply means we have two problematic standards to identify; the other is to reluctantly abandon this goal and develop a more flexible framework for UTypes.
complexType declarations in XML Schemas or types in UML; this may come down to the judgement of the group deriving the UType names from a given data model.