$Date: 2006/05/26 14:00:21 $

RDF Notes

At the Cambridge IVOA meeting in May 2003, I made a noise about RDF on a couple of occasions, and it was suggested that I write down something about it. So here it is.

This is not really a beginners' guide to RDF, because (a) I'm not expert enough to claim I should write such a thing, and (b) because several such guides already exist, and I point to them below. My aim here is simply to draw attention to RDF in the astronomical context, give a rapid introduction to it, and point out why and how it might be useful. I'd be delighted to receive corrections and amplifications on what I write here.

http://www.astro.gla.ac.uk/users/norman/note/2003/rdf-intro/
$Revision: 1.2 $


Example

Example first, syntax later. Consider the set of RDF statements illustrated here.

[Figure: RDF statements as a graph]

This appears to be a very elaborate way of saying something simple, but we'll get to the point of that in a moment.

What this says is that the resource with URI #m31 has a property called `luminosity', the value of which is an anonymous resource (meaning simply that it doesn't itself have a URI). That resource has two properties, one with property name `numericValue' which has a literal value, and one with property name `type', the value of which is another resource, with URI #Vmag, and that resource in turn has the `subClassOf' property with value #Mag.

That introduces essentially all of the RDF terminology and ideas:

The property names don't look like URIs, but are more formally #luminosity, http://www.w3.org/1999/02/22-rdf-syntax-ns#type and http://www.w3.org/2000/01/rdf-schema#subClassOf, indicating that the `luminosity' property is a local one, `type' is part of the basic RDF syntax, and `subClassOf' is part of a particular RDF Schema, namely the core one described as part of the RDF spec.

As noted here, the property `subClassOf' is part of a particular `RDF Schema'. A Schema in this context is simply a particular set of properties, with the semantics of the various properties carefully described in text, and perhaps with interrelations enough that they can support some inferencing. This, as far as I'm aware, is all that the term ontology actually refers to. In this example, I imported the `type' and `subClassOf' properties from their respective ontologies, and the properties `luminosity' and `numericValue' come from a home-made ontology. The `inferencing' here is not particularly profound; it simply means that a general RDF processor (as long as it `knew about' the core Schema which includes `subClassOf') confronted with this set of RDF statements, could infer that, as well as having a #Vmag of `8', the resource #m31 can be taken to have a #Mag of `8', also. (There's obviously a great deal of enjoyable argument extractable from the phrases `knew about', `infer' and `taken to have', but the useful work done by these relations is clear). DAML+OIL and OWL are other examples of ontologies.

That's more-or-less it.
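
To make the resource-property-value idea concrete, here is a rough sketch of the example graph held as plain Python tuples. It is mine rather than anything from an RDF toolkit: the helper names `values` and `describe` are made up for illustration, and `#anon0` is just a stand-in label for the anonymous resource.

```python
# Each statement is a (resource, property, value) tuple.
# "#anon0" is an assumed label for the anonymous resource; the literal
# value 8 is kept as a string.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

GRAPH = {
    ("#m31", "#luminosity", "#anon0"),
    ("#anon0", "#numericValue", "8"),
    ("#anon0", RDF_TYPE, "#Vmag"),
    ("#Vmag", RDFS_SUBCLASSOF, "#Mag"),
}


def values(graph, resource, prop):
    """Return the sorted values of `prop` on `resource`."""
    return sorted(v for r, p, v in graph if r == resource and p == prop)


def describe(graph, resource):
    """Map each property of `resource` to its list of values."""
    return {p: values(graph, resource, p) for r, p, _ in graph if r == resource}


if __name__ == "__main__":
    print(values(GRAPH, "#m31", "#luminosity"))
    print(describe(GRAPH, "#anon0"))
```

Nothing here depends on the statements arriving in any particular order or nesting, which is rather the point of a set of triples.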

The Point

The introduction to the RDF Recommendation describes the goals of RDF, and one passage in particular says

[RDF] provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources. RDF can be used in a variety of application areas; for example: in resource discovery to provide better search engine capabilities, in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library, by intelligent software agents to facilitate knowledge sharing and exchange, in content rating, in describing collections of pages that represent a single logical "document" [...]

That's the sort of language which, with `RDF' changed to `the VO', I wouldn't be surprised to find in a VO funding application. Add in language like "Due to RDF's incremental extensibility, agents processing metadata will be able to trace the origins of schemata they are unfamiliar with back to known schemata and perform meaningful actions on metadata they weren't originally designed to process", and it seems fairly clear that the problem that RDF aims to solve overlaps significantly with the opportunities the VO wants to exploit.

It's unfortunate that, with the exception of this homily at the beginning, the RDF spec, and even the W3C RDF primer, talks a lot about syntax, but makes rather a poor job of explaining just what RDF is for.

An excellent introduction, which does discuss this, is Tim Bray's xml.com article. That article remarks that "[RDF] is a framework for describing and interchanging metadata", which sums it up admirably.

The Point, it seems to me, is that the sort of primitive statements we want to make -- M31 has luminosity x, a V-magnitude is a subclass of Magnitude -- are, with only a little processing, better modelled by RDF triples than by XML trees, so that there is a better impedance match between what we want to say and RDF's way of saying it.

Further, the framework is expressly designed for interoperability, both by making the primitive statements storable in a wide variety of formats, and by confronting head-on the problem of associating meaning with elements in the (RDF) Schema.

Another part of the point is that the very basic ideas which RDF exposes are very illuminating, and provide (in my opinion) a good intellectual toolbox for discussing other data-modelling problems, including those which don't explicitly involve RDF.

The basic ideas

RDF is simple

RDF gets some of its power from being composed of very primitive ideas, namely the resource-property-value trichotomy, and (what is much the same thing) the notion of subject-verb-object triples as the primitive type of statement which, with only minimal ingenuity, is sufficient to express everything you want. What this gets you is a system which is not useful in isolation, since the set of properties defined in the RDF spec and the core RDF Schema is tiny, but which provides the language which efforts such as OWL or DAML+OIL can use, so that they can produce useful property sets.

Because it's so simple, it's tremendously interoperable. RDF triples can be stored in XML (in a bewildering variety of syntaxes), in databases, on blackboards, even in Fortran, probably.

The other thing you gain from the simplicity is an antidote to the dominance of XML. It's easy to be sucked into the `all the world is XML' mindset, and find yourself ramming into a tree form all sorts of structures which aren't really organised like that, in much the same way that old Fortran programmers can be observed writing Fortran in any language you teach them.

You can reason with it

Metadata lets you process resources in a generic fashion -- that's the point of it. Since in RDF properties are also resources, it follows that it's possible to process them generically also. That processing has the potential to be tremendously sophisticated (and this is where the AI and knowledge-engineering folk come in), but it can also, usefully, be as simple as in the example above: if I want a resource which is of type #Mag, I can deduce that something of type #Vmag will do.
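
The simple deduction just described can be sketched in a few lines. This is only an illustration under my own assumptions: the graph is a set of tuples, `#anon0` is a stand-in label for the anonymous resource, and the function names `superclasses` and `instances_of` are invented here. A real RDF Schema processor does a good deal more than this.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# The example graph as (subject, property, value) tuples.
GRAPH = {
    ("#m31", "#luminosity", "#anon0"),
    ("#anon0", "#numericValue", "8"),
    ("#anon0", RDF_TYPE, "#Vmag"),
    ("#Vmag", RDFS_SUBCLASSOF, "#Mag"),
}


def superclasses(graph, cls):
    """Return cls plus every class reachable from it via subClassOf."""
    seen = {cls}
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASSOF and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def instances_of(graph, cls):
    """Return resources whose stated type is cls or a subclass of cls."""
    return {
        s for s, p, o in graph
        if p == RDF_TYPE and cls in superclasses(graph, o)
    }


if __name__ == "__main__":
    # Asking for a #Mag finds the resource that was only declared a #Vmag.
    print(instances_of(GRAPH, "#Mag"))
```

The `seen` set means a chain of subClassOf statements of any length is followed, and a cycle in the class hierarchy cannot cause an endless loop.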

All the world's a URI

The resources which RDF statements describe are all named using URIs. That's possible because URIs are intended to be a perfectly general syntax for naming things, even things which aren't, or could never be, on the web, such as people or concepts. They are therefore distinguished from URLs, which are the subset of URIs which are also addresses, and which refer to things which are actually network-retrievable; see RFC 2396 for distinctions and scope (this is currently, as of June 2003, being revised; the draft replacement rfc2396bis expires 2003 December 5).

The example at the top consists of a description of a resource with URI #m31 -- a local URI. While this is natural, there is nothing to stop you creating a set of RDF statements about any URI. Thus, with no added syntax, we have a way of creating stand-off metadata, and thus adding information about resources you do not control, or which are not XML, or which are read-only.
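
As a sketch of what stand-off metadata amounts to in practice: two sets of statements, made independently, combine by simple union as long as they name the resource by the same URI. Everything specific below is an assumption of mine for illustration: the absolute URI for M31, the `seeAlso` property in the example.org namespace, and the helper names.

```python
# Assumed absolute URI for M31; the text above only gives the local form #m31.
M31 = "http://x.net/catalogue#m31"

# Statements published by whoever maintains the resource.
published = {
    (M31, "http://x.net/#luminosity", "_:lum"),
}

# Stand-off statements made elsewhere, by someone who cannot edit the
# resource. The property and its value are hypothetical.
annotations = {
    (M31, "http://example.org/terms#seeAlso", "http://x.net/rdf-intro"),
}


def merge(*graphs):
    """Union any number of triple sets into a single graph."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def properties_of(graph, subject):
    """Return the sorted property names asserted about `subject`."""
    return sorted({p for s, p, _ in graph if s == subject})


if __name__ == "__main__":
    combined = merge(published, annotations)
    print(properties_of(combined, M31))
```

The merge needs no knowledge of either source's layout, which is what `no added syntax' comes to here. (Anonymous resources need more care when merging than this sketch takes.)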

So why not just use XML?

Because it's too complicated. Consider the example at the top again. It could be written in XML as:

<resource>
  <uri>#m31</uri>
  <luminosity>
    <type>
      #Vmag
      <subClassOf>
        #Mag
      </subClassOf>
    </type>
    <value>8</value>
  </luminosity>
</resource>

or, say,

<resource uri="#m31">
  <luminosity type="#Vmag|#Mag">
    8
  </luminosity>
</resource>

or any one of a large number of other ways. As humans, we have little difficulty in making some sense of this, but it would be a fuss at least to define the XML Schema which this is part of, and to write software to process it. The problem is that, in XML, concepts like trees, children, attributes, which are so helpful when we wish to mark up general data, simply get in the way when we wish to express a set of flexibly primitive statements. XML is self-describing only to humans; machines need standards documents and piles and piles of Java.

In his xml.com article, Tim Bray asks this same question, and the point is addressed again in more detail in Tim Berners-Lee's RDF-vs-XML note.

Syntax

The pictorial form shown above is one of the defined notations for RDF. The other well-known one is the XML serialisation. The RDF/XML version of the example above is:

<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns='http://x.net/#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
  >
  <rdf:Description rdf:about='#Vmag'>
    <rdfs:subClassOf rdf:resource='#Mag'/>
  </rdf:Description>
  <rdf:Description rdf:about='#anon0'>
    <rdf:type rdf:resource='#Vmag'/>
    <numericValue>8</numericValue>
  </rdf:Description>
  <rdf:Description rdf:about='#m31'>
    <luminosity rdf:resource='#anon0'/>
  </rdf:Description>
</rdf:RDF>

...which looks horrible. If you want to unpack that, you can do so using one of the RDF primers I point to below. I introduce it here only to make the observation that the RDF/XML syntax is not intended to be human-readable, or even particularly human-writable. As I understand it, a primary consideration was that it be flexible enough to be embeddable within other XML in a wide variety of ways.
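
To show that the triples really are in there, here is a rough sketch which pulls them out of the RDF/XML above using only Python's standard xml.etree module. It is emphatically not an RDF parser: it assumes exactly the flat shape used above (rdf:Description elements with rdf:about, whose children carry either rdf:resource or text), it does not resolve relative URIs such as #m31 against a base, and it does not distinguish literal values from resources. The function names are my own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """\
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns='http://x.net/#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
  >
  <rdf:Description rdf:about='#Vmag'>
    <rdfs:subClassOf rdf:resource='#Mag'/>
  </rdf:Description>
  <rdf:Description rdf:about='#anon0'>
    <rdf:type rdf:resource='#Vmag'/>
    <numericValue>8</numericValue>
  </rdf:Description>
  <rdf:Description rdf:about='#m31'>
    <luminosity rdf:resource='#anon0'/>
  </rdf:Description>
</rdf:RDF>
"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag form into a plain URI."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def flat_triples(rdf_xml):
    """Extract (subject, property, value) tuples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            target = prop.get(f"{{{RDF_NS}}}resource")
            # A child without rdf:resource is taken to hold a literal.
            value = target if target is not None else (prop.text or "")
            triples.append((subject, expand(prop.tag), value))
    return triples


if __name__ == "__main__":
    for triple in flat_triples(RDF_XML):
        print(triple)
```

Note that the un-prefixed properties come out as http://x.net/#luminosity and http://x.net/#numericValue, because the XML namespace is concatenated with the element name to form the property URI.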

A slightly more readable example is this one:

<rdf:RDF
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns="http://x.net/rdf/#">
  <rdf:Description
      rdf:about='http://x.net/rdf-intro'>
    <Author
      rdf:resource='http://x.net/norman/'/>
  </rdf:Description>
  <rdf:Description 
      rdf:about='http://x.net/norman/'>
    <Name>Norman Gray</Name> 
    <Email rdf:resource='mailto:norman@x.net' />
  </rdf:Description>
</rdf:RDF>

The graphical representation of it is here.

[Figure: Graphical representation of RDF]

That indicates that the http://x.net/rdf-intro resource has for its #Author property the resource http://x.net/norman/, which in turn has the two further properties shown. More clearly than in the previous example, each of the resources is a URI.

Even though this is a little simpler than the previous example, it is still rather difficult to read. Clearest is to break this into the three subject-predicate-object triples which the RDF fundamentally represents. They are:

<http://x.net/rdf-intro> <#Author> <http://x.net/norman/> .
<http://x.net/norman/> <#Name> "Norman Gray" .
<http://x.net/norman/> <#Email> <mailto:norman@x.net> .

The triples form of the first example is:

<#m31> <#luminosity> <#anon0> .
<#anon0> <#numericValue> "8" .
<#anon0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <#Vmag> .
<#Vmag> <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        <#Mag> .
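
Part of the appeal of the triples form is how little machinery it takes to read. The following is a minimal sketch of a reader for the notation exactly as written above, under my own assumptions: terms are either <...> or "...", and each statement ends with a full stop. It does not handle escapes, anonymous-node labels, language tags or datatypes, so it is not a conforming parser for any of the standard syntaxes.

```python
import re

# One token is a <uri>, a "literal", or the terminating full stop.
TOKEN = re.compile(r'<(?P<uri>[^>]*)>|"(?P<literal>[^"]*)"|(?P<end>\.)')

EXAMPLE = """
<#m31> <#luminosity> <#anon0> .
<#anon0> <#numericValue> "8" .
<#anon0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <#Vmag> .
<#Vmag> <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        <#Mag> .
"""


def parse_triples(text):
    """Return a list of triples; each term is a (kind, value) pair."""
    triples, terms = [], []
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        if kind == "end":
            if len(terms) != 3:
                raise ValueError(f"expected 3 terms before '.', got {len(terms)}")
            triples.append(tuple(terms))
            terms = []
        else:
            terms.append((kind, match.group(kind)))
    if terms:
        raise ValueError("statement not terminated by '.'")
    return triples


if __name__ == "__main__":
    for subject, prop, value in parse_triples(EXAMPLE):
        print(subject[1], prop[1], value[1])
```

Because the reader works on tokens rather than lines, statements which wrap over two lines, as the last two above do, need no special treatment.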

There is another syntax, called Notation3 or n3, which is similar to this triples notation, and which is designed to be `scribblable', in the sense of being writable on a whiteboard. There are tools available to convert back and forth between these various syntaxes. There are other syntax proposals, but the main point of RDF is not syntax, and we shouldn't become bogged down in it.

Primers and other resources

Spec
http://www.w3.org/RDF/ is the home of all things RDF. This is still in flux, and I understand that the later documents here largely supersede the 1999 spec, even though they're not yet (June 2003) at recommendation.
Primers
There's a primer on the W3C RDF pages which is comprehensive and correspondingly rather intimidating. Tim Bray has an excellent introduction, `What is RDF?', which, most unusually for these introductions, does actually explain what the point is.
Tools
Dave Beckett's RDF resource guide is comprehensive; the W3C RDF page has a short list of tools; and there are a few DAML tools. Jena is a semantic web toolkit from HP.
Notation3
Spec, primer, preprocessor n3s and A Rough Guide to Notation3
semantics@ivoa.net

The IVOA semantics list has had a number of discussions on RDF and ontology and the like. Notable (to me) messages and threads include

Other examples
Norman