Reasoning about access using ontologies

There exist a small number of approaches to the authorisation problem of deciding, once a user has been authenticated, what that user is permitted to do. This problem is very naturally viewed as an (ontological) subsumption problem: `is this user provably a member of the class of entities allowed access to this resource?'. This approach provides a flexible solution, in which delegation and federation are natural, and which fits into a broad range of architectures. I also describe a reasoning service, Quaestor, which implements this approach.

Once a user has been authenticated -- that is, once a resource has decided that a user really is who they claim to be -- there exists a separate problem of deciding what that user is and is not allowed to do with or to the resource. There are various approaches which address this (including Shibboleth and PERMIS), but the problem can very naturally be expressed in ontological terms, as a straightforward subsumption problem: `is this user provably a member of the class of entities allowed access to this resource?'.

The account below describes:

  1. the simple ontology required to address a simple but realistic access-control problem;
  2. the application, Quaestor, which provides generic reasoning services, with both a RESTful and an XML-RPC interface;
  3. possible extensions of this simple demo to address some of the existing outstanding problems to do with authorisation.

1 The use-case

The simple, but non-trivial, use-case which this demo addresses is the following. A database is to be fully accessible to researchers at institutions in the UK, and also to researchers who are members of a particular collaboration. In addition, certain tagged rows are accessible to researchers at African institutions.

There are a few other use-cases on the VOTech wiki.

2 The solution

2.1 An ontology of access control

On the right is the asserted hierarchy for an access-control ontology, as displayed by the ontology editor Protégé. Although it is not obvious from this screenshot, Protégé writes its ontologies out as OWL ontologies [std:owl], and OWL is layered on top of RDF [std:rdf]. The various locations are represented as classes, gathered together under other classes representing continents. The GroupOfPeople class represents either collaborations or institutional groups, and the Person class has subclasses based on location and on access rights. The goal is to end up assigning individuals into the CanSeeAllData class, the CanSeeTaggedData class, or neither.

[Figure: asserted hierarchy]

To this class hierarchy, we add further conditions. We declare a locatedIn property, which has Person as its domain, and GeographicalLocation as its range, or co-domain. We then declare as a necessary condition of membership of the UniversityOfLeicesterPerson class that an entity has a locatedIn property whose value is specifically in UnitedKingdom, with similar necessary conditions on the other institutional groups. We can then add as a necessary and sufficient condition on the PersonAtUKInstitution class that its members have a locatedIn property whose value is in UnitedKingdom. If we subsequently assert that #norman is a UniversityOfLeicesterPerson, then we can deduce that he must have the given locatedIn property, and this is sufficient to then deduce that he is a member of the PersonAtUKInstitution class.

If we then assert that a necessary and sufficient condition for membership of the CanSeeAllData class is that an individual is a member of the union of the PersonAtUKInstitution and CollaborationXMember classes, and that a member of the CanSeeTaggedData class is in the union of the CanSeeAllData and PersonAtAfricanInstitution classes, then we are finished.
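The chain of deductions described above can be imitated in a few lines of Python. This is only a toy sketch and not how an OWL reasoner works internally: the dictionaries stand in for the class conditions, the function name inferred_classes is invented for illustration, and the country and continent names other than UnitedKingdom (and the choice of which institutions to list) are assumptions made for the example.

```python
# Toy imitation of the subsumption reasoning; not a real OWL reasoner.

# Necessary conditions: membership of an institutional class implies a
# locatedIn value. Entries other than Leicester/UnitedKingdom are assumed.
LOCATED_IN = {
    "UniversityOfLeicesterPerson": "UnitedKingdom",
    "CambridgeUniversityPerson": "UnitedKingdom",
    "UniversityOfCapeTownPerson": "SouthAfrica",
    "HarvardUniversityPerson": "UnitedStates",
}

# Locations gathered under continents (assumed names).
PART_OF = {
    "UnitedKingdom": "Europe",
    "SouthAfrica": "Africa",
    "UnitedStates": "NorthAmerica",
}

# Necessary-and-sufficient conditions expressed through locatedIn.
DEFINED_BY_LOCATION = {
    "PersonAtUKInstitution": "UnitedKingdom",
    "PersonAtAfricanInstitution": "Africa",
}

# Necessary-and-sufficient conditions expressed as unions of classes.
DEFINED_BY_UNION = {
    "CanSeeAllData": ["PersonAtUKInstitution", "CollaborationXMember"],
    "CanSeeTaggedData": ["CanSeeAllData", "PersonAtAfricanInstitution"],
}


def regions(location):
    """Return the location together with every region that encloses it."""
    found = []
    while location is not None:
        found.append(location)
        location = PART_OF.get(location)
    return found


def inferred_classes(asserted):
    """Return asserted classes plus those deducible from the conditions."""
    classes = set(asserted)
    # Step 1: necessary conditions yield locatedIn values.
    located = set()
    for cls in asserted:
        if cls in LOCATED_IN:
            located.update(regions(LOCATED_IN[cls]))
    # Step 2: a matching locatedIn value is sufficient for membership.
    for cls, place in DEFINED_BY_LOCATION.items():
        if place in located:
            classes.add(cls)
    # Step 3: apply the union definitions until nothing new is deduced.
    changed = True
    while changed:
        changed = False
        for cls, members in DEFINED_BY_UNION.items():
            if cls not in classes and classes.intersection(members):
                classes.add(cls)
                changed = True
    return classes
```

Calling inferred_classes({"UniversityOfLeicesterPerson"}) walks the same path as the prose: the locatedIn value follows from the necessary condition, PersonAtUKInstitution follows from the sufficient one, and the two access classes follow from the unions.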

These various conditions are all compactly asserted as extra statements about the classes shown in the hierarchy here. Indeed, the displayed tree is just the visualisation of the assertion that, for example, a UniversityOfLeicesterPerson is necessarily a member of the InstitutionalGroup class.

[Figure: inferred hierarchy]

We (or rather Protégé) can give this collection of assertions (which corresponds to a single RDF graph) to a reasoner, and ask it to deduce the inferred subclass hierarchy which these extra conditions impose on the asserted hierarchy we have displayed above. That results in the hierarchy shown here. Observe that the Person class has been restructured, with various InstitutionalGroup subclasses appearing under Person, and several of them appearing also under PersonWithAccessRights. You can see that someone who is a member of the UniversityOfLeicesterPerson class is also a member of the CanSeeAllData and CanSeeFlaggedData classes. We can see that, with a hierarchy of classes plus a few extra conditions, the reasoner has done most of our authorisation work for us.

Although we have presented this as a single ontology, this is only for the purposes of this demo, and in practice this would most reasonably be split amongst several ontologies, maintained by different actors.

2.2 Instances

How, then, do we exploit this as a component of an authorisation architecture?

Once the ontology is created, we can add assertions about individuals. For example, here are some assertions written in Notation 3 [std:n3]:

@prefix : <urn:example#> .
@prefix ac: <http://eurovotech.org/access-control.owl#> .
:Norman a ac:UniversityOfLeicesterPerson, ac:CollaborationXMember.
:Guy a ac:CambridgeUniversityPerson.
:Markus a ac:EuropeanSouthernObservatoryPerson.
:Sébastien a ac:CentreDeDonnéesDeStrasbourgPerson.
:Jonathan a ac:HarvardUniversityPerson;
        a ac:CollaborationXMember.
:Nelson a ac:UniversityOfCapeTownPerson;
        a ac:CollaborationXMember.
:Tutankhamun a ac:UniversityOfCairoPerson.

We can add further assertions such as:

<urn:example#Norman> = <mailto:norman@astro.gla.ac.uk>.

This indicates that these two URIs are to be deemed to be equivalent, in the sense that any assertion made about one can be taken to be made about the other also.
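To make the effect of such an equivalence concrete, here is a small Python sketch that treats equivalent URIs as one individual when looking up class assertions. It is an illustration only: the function names are invented, and a real reasoner handles owl:sameAs as part of its general inference rather than through a separate lookup like this.

```python
def same_as_closure(pairs):
    """Build a lookup mapping each URI to a canonical equivalent URI."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we go
            uri = parent[uri]
        return uri

    for left, right in pairs:
        parent[find(left)] = find(right)
    return find


def types_of(uri, type_assertions, equivalences):
    """Classes asserted for uri or for any URI declared equivalent to it."""
    find = same_as_closure(equivalences)
    target = find(uri)
    return {cls for subject, cls in type_assertions if find(subject) == target}


# The assertion from the text, plus one class assertion about the urn: form.
equivalences = [("urn:example#Norman", "mailto:norman@astro.gla.ac.uk")]
type_assertions = [("urn:example#Norman", "ac:UniversityOfLeicesterPerson")]
```

With these inputs, types_of("mailto:norman@astro.gla.ac.uk", type_assertions, equivalences) finds the class that was asserted only about the urn: form of the name.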

As with the ontology above, these various assertions would be made in practice by different actors. Assertions that <urn:example#Norman> a ac:UniversityOfLeicesterPerson would be made by (a proxy of) the Leicester personnel department, and an equivalence relation similar to the one above might be made by the resource owner to link the URI that the Leicester authorities use to a different local name for the same individual, such as a local username or, as in this case, an email address.

2.3 Querying

So we have an ontology plus some individuals. How do we get this information out? How do we go about actually plumbing this in to the architecture of our resource-owner's system?

Enter SPARQL [std:sparql].

SPARQL is a vaguely SQL-like language for querying RDF triple-stores. A query against the access-control ontology might be:

prefix : <http://eurovotech.org/access-control.owl#>
select ?person
where { ?person a :CanSeeFlaggedData }

This would return a list of all the individuals in the triple-store which were members of the CanSeeFlaggedData class. Alternatively,

ask { <mailto:norman@astro.gla.ac.uk>
    a <http://eurovotech.org/access-control.owl#CanSeeAllData> }

would return a yes or no answer according to whether norman@astro.gla.ac.uk is in the class of individuals who can see all the data (here it should be `yes').
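For readers unfamiliar with SPARQL, the two query forms above correspond roughly to the following operations over a set of triples. This Python sketch is an analogy, not a SPARQL implementation: the triples are a hypothetical sample of what the store might hold after reasoning, and the function names are invented.

```python
RDF_TYPE = "rdf:type"
AC = "http://eurovotech.org/access-control.owl#"

# Hypothetical sample of triples as they might stand after inference.
triples = {
    ("mailto:norman@astro.gla.ac.uk", RDF_TYPE, AC + "CanSeeAllData"),
    ("mailto:norman@astro.gla.ac.uk", RDF_TYPE, AC + "CanSeeFlaggedData"),
    ("urn:example#Tutankhamun", RDF_TYPE, AC + "CanSeeFlaggedData"),
}


def select_members(graph, cls):
    """Analogue of: select ?person where { ?person a <cls> }"""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


def ask_member(graph, individual, cls):
    """Analogue of: ask { <individual> a <cls> }"""
    return (individual, RDF_TYPE, cls) in graph
```

The select form yields a list of individuals, while the ask form yields a single boolean, which is the natural shape for an allow-or-deny decision.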

There are other types of query which return RDF graphs, and various ways of filtering and enhancing the results. As of April 2006, SPARQL is not yet standardised, but it is an advanced W3C Working Draft, with multiple working implementations.

2.4 A reasoning service: Quaestor

I have created a generic SPARQL endpoint, called Quaestor, which can be given multiple ontologies and instance assertions, and run SPARQL queries against the merged result. It has both RESTful [fielding00] and XML-RPC [std:xmlrpc] interfaces, and runs within Tomcat. Once the ontologies have been uploaded to it, via HTTP PUT requests, a client can make SPARQL queries of the merged result using either HTTP POST or GET requests.
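The shape of this interaction can be sketched with the Python standard library. The sketch only constructs the requests and does not send them. The base URL, the path layout, the RDF/XML content type and the use of a `query` parameter (as in the W3C SPARQL Protocol) are all assumptions made for illustration; the actual Quaestor URLs and parameters are given in its walkthrough and may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical location of a knowledgebase within a local Tomcat.
BASE = "http://localhost:8080/quaestor/kb/access"


def upload_request(name, rdf_bytes):
    """Build a PUT request uploading one ontology or instance graph."""
    return Request(
        f"{BASE}/{name}",
        data=rdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )


def query_request(sparql):
    """Build a GET request carrying a SPARQL query in the URL."""
    return Request(f"{BASE}?{urlencode({'query': sparql})}", method="GET")
```

Either request object could then be passed to urllib.request.urlopen against a running service.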

This service is generic in the sense that it is not tied to any particular ontology -- in particular, it is not tied to just this access-control problem. It is designed to provide OWL-based reasoning services as part of a larger infrastructure, and so its interface has been designed with generality and extensibility in mind.

There is a walkthrough of the interaction with Quaestor, and you can download the demo files and the service .war file from here.

3 Strengths

This approach is heavily standards-based, and builds on pre-existing standards rather than new ones.

OWL, as used here, is essentially a logic programming language, and so the architecture described here is essentially one which relies on mobile code, though it is safe because the language is sufficiently restricted. This flexibility also means that resource owners can be as sophisticated as they wish in defining their security policies, and are not restricted to a pre-existing authorisation language.

Because the relevant assertions are given to the reasoner in the form of OWL/RDF, which is a very low-level format, it is possible to extract assertions from a wide variety of other sources, such as SAML assertions, X.509 certificates, and PERMIS policies (I expect -- I haven't yet tried this, and so don't know just how much preprocessing would be required).

For the same reason, federation of authorisation logic is (again, should be) relatively simple, and flexible. If, for example, institution A has a class a:CanSeeEJournals, and wishes to give e-journal access to members of another institution, B, without re-registering all the relevant members of that institution, then it can do so in multiple ways. If the other institution (or `identity provider', IdP) maintains a class b:LibraryUser and gives access to its e-journals to individuals it asserts to be members of that class, then institution A could simply declare class b:LibraryUser to be a subclass of a:CanSeeEJournals, at which point any individuals asserted to be members of b:LibraryUser can be immediately deduced to be members of a:CanSeeEJournals also. Alternatively, it might be more suitable for institution B to assert individuals' membership of a:CanSeeEJournals directly. In either case the set of assertions would be transmitted to institution A in a discrete packet of RDF assertions (a single RDF graph), and in each case the trust is isolated into A's decision whether or not to trust that particular set of B's assertions (this is expanded on in the discussion of security below).
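The first of these options, a single subclass declaration by A, can be illustrated with a toy Python sketch. The individual urn:b#alice and the function names are invented for the example, and the code follows only subclass links, which is a small fragment of what an OWL reasoner would do.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass links, including cls."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sup for sub, sup in subclass_of if sub == current)
    return seen


def is_member(individual, cls, types, subclass_of):
    """True if some asserted class of individual is subsumed by cls."""
    return any(
        cls in superclasses(asserted, subclass_of)
        for subject, asserted in types
        if subject == individual
    )


# B's graph: membership assertions only (hypothetical individual).
graph_from_b = {("urn:b#alice", "b:LibraryUser")}
# A's policy: the single subclass declaration described in the text.
policy_of_a = {("b:LibraryUser", "a:CanSeeEJournals")}
```

Here is_member("urn:b#alice", "a:CanSeeEJournals", graph_from_b, policy_of_a) succeeds without A registering the individual, and fails again if A withdraws the subclass declaration.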

An infrastructure based on these standards allows delegation in other ways. We have described an architecture in which the reasoning is done locally to the resource, using RDF graphs which may originate from multiple sources. Alternatively a decision could be delegated, in whole or in part, to a remote IdP. Continuing the example above, the resource A could wholly or partially decide to allow an entity access to its e-journals by simply asking B whether they would allow that entity access to their e-journals; that is, by sending a SPARQL query to ask B whether that entity is in b:LibraryUser.
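The delegated variant can be sketched in the same toy style. In this Python illustration a plain function call stands in for the remote SPARQL ask query, and all names are invented; the point is only that A receives a yes or no from B and never sees B's underlying assertions.

```python
def make_idp(library_users):
    """Return B's query interface, which answers only yes/no questions."""
    members = frozenset(library_users)

    def ask_library_user(entity):
        # Stand-in for: ask { <entity> a b:LibraryUser } sent to B.
        return entity in members

    return ask_library_user


def a_allows_ejournals(entity, local_members, ask_remote):
    """A's decision: use local knowledge first, otherwise delegate to B."""
    return entity in local_members or ask_remote(entity)
```

For example, with ask_b = make_idp({"urn:b#alice"}), the call a_allows_ejournals("urn:b#alice", set(), ask_b) defers entirely to B's answer.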

4 Open issues

The problem is not of course completely solved. The following problems need to be addressed.

4.1 Security and trust

In the simplest scenario, the reasoning service described here would sit well away from the open internet, and the graphs which it handles would be generated locally (in the case of the resource owner's own rules), obtained from known-good sources (in the case of utility ontologies), or obtained from otherwise secure sources, such as a graph extracted from a signed X.509 certificate.

In contrast to this, the delegation example above required an RDF graph to be sent from one institution to another. This could either be done through a separately secured channel, or by signing the graph using one of the relevant emerging standards (see, for example, http://xmlns.com/wot/0.1/).

Since the parsed RDF graphs are programmatically manipulable, it would be possible for a resource owner to constrain or filter the set of assertions which a remote entity makes, to ensure that the graph is not only from a known source, but also that it does not assert anything it shouldn't.
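One possible form of such a filter is an allow-list applied to the remote graph before it is merged. The following Python sketch assumes triples are represented as plain tuples; the function name, the prefixed names and the sample policy are all invented for illustration.

```python
RDF_TYPE = "rdf:type"


def filter_graph(triples, allowed_predicates, allowed_classes):
    """Split a remote graph into triples the source may and may not assert."""
    accepted, rejected = set(), set()
    for triple in triples:
        _subject, predicate, obj = triple
        permitted = predicate in allowed_predicates and (
            predicate != RDF_TYPE or obj in allowed_classes
        )
        (accepted if permitted else rejected).add(triple)
    return accepted, rejected


# Hypothetical graph received from institution B.
remote = {
    ("urn:b#alice", RDF_TYPE, "b:LibraryUser"),
    ("urn:b#mallory", RDF_TYPE, "a:Administrator"),
    ("b:LibraryUser", "rdfs:subClassOf", "a:Administrator"),
}
```

With filter_graph(remote, {RDF_TYPE}, {"b:LibraryUser"}), B may assert membership of its own class, but its attempts to place an individual in one of A's classes, or to add a subclass link into A's hierarchy, are set aside.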

Privacy: This architecture suffers from some of the same information-leakage problems that SAML assertions do. It is not immediately obvious how an IdP should restrict the set of RDF assertions it makes available to those which are relevant to the properties a remote resource needs or wishes. A possible solution to this is to allow the resource to make more indirect SPARQL queries of the IdP, such as `would you allow this person in to your library?', since these do not expose the underlying assertions. However a malevolent user of such an interface could still build up a substantial amount of information through such a channel, through multiple crafted queries. Combining queries of this type with the Shibboleth handle system [proj:shibboleth] would provide most of the required security.

4.2 Other authorisation frameworks

The Shibboleth system [proj:shibboleth] has defined an intricate infrastructure for access control. When a user requests access to a resource, the resource owner may securely query an appropriate IdP, as guided by the user, to discover the set of attributes, transported in a SAML assertion, which the IdP will warrant apply to the user. The resource owner will then allow or deny access based on those attributes. The Shibboleth system concerns itself with the mechanism for negotiating and transporting the attribute sets, and does not cover any support for the resource owner's reasoning.

The PERMIS system [proj:permis] focuses on the resource owner's specification of their access policy, and provides algorithmic support for the reasoning involved. The PERMIS system does not provide easy support for dynamic or delegated authorisation, though it is possible to add such support indirectly.

Since RDF functions at a rather low level, it should be possible to transform PERMIS policies and SAML assertions into equivalent OWL/RDF graphs, so that an OWL-based reasoning infrastructure could act as a plug-in replacement for the reasoning in these other authorisation frameworks. This would have the advantage that the resource owner is limited only by their ingenuity in the type and structure of the access controls they wish to impose.

[This section needs to be expanded; add refs to NESC federation experiments. Add pointers to demos/downloads]

[std:n3] Tim Berners-Lee.
Notation 3. Web page, March 2006.
[fielding00] Roy Thomas Fielding.
Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, 2000.
[proj:shibboleth] Internet2.
The Shibboleth project. [Online, cited April 2006].
[proj:permis] PERMIS Project.
PERMIS project. [Online].
[std:sparql] Eric Prud'hommeaux and Andy Seaborne.
SPARQL query language for RDF. W3C Candidate Recommendation, June 2007.
[std:xmlrpc] Dave Winer.
XML-RPC specification. [Online, cited April 2006].
[std:rdf] World Wide Web Consortium.
Resource Description Framework. [Online, cited February 2005].
[std:owl] World Wide Web Consortium.
The web ontology language. [Online].

Norman Gray
2008/08/18 22:33:30