RDF

From GetSemantic

What is RDF?

The Resource Description Framework (RDF) is a W3C standard for modelling information and metadata. RDF data can be written out in a variety of different serialisation formats.

RDF is a way of linking data using the web to let anyone say anything about anything.

Data model

The RDF data model is simple. It uses a subject, predicate and object "triple" structure to make a statement about a particular thing. For instance:

  • "Bill Gates" (subject)
  • "runs a company at" (predicate)
  • "microsoft.com" (object)

The idea is that if you have a lot of data, you can learn a great deal from the relationships between items, and you state those relationships as triples. A triple is these three pieces of data combined. Each of the three components has different requirements - for instance, a predicate must be a URI, but an object may be either a URI or a string literal.
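To make those requirements concrete, here is a minimal Python sketch of a triple. The class names (URI, Literal, Triple) and the example.org URIs are made up for illustration; they are not part of RDF or of any particular toolkit, and the subject is restricted to a URI here for simplicity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # an object may be a URI or a string literal

    def __post_init__(self):
        # Enforce the rule from the text: a predicate must be a URI.
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")


# Hypothetical URIs standing in for the Bill Gates example.
gates = Triple(
    subject=URI("http://example.org/people/BillGates"),
    predicate=URI("http://example.org/terms/runsCompanyAt"),
    obj=URI("http://microsoft.com/"),
)
```

Trying to build a Triple with a Literal as its predicate raises a TypeError, which is the kind of check a real RDF toolkit performs for you.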

Building data from triples

Let's take a passage of English: one of the genealogy passages from the New Testament. It describes relationships between people (who may or may not be real). Here is how you could imagine it in a triple format:

  • Abraham - father of - Isaac
  • Isaac - father of - Jacob
  • Jacob - father of - Judah
  • Judah - father of - Perez
  • Judah - father of - Zerah
  • Tamar - mother of - Perez
  • Tamar - mother of - Zerah
  • Perez - father of - Hezron
  • Hezron - father of - Ram
  • Ram - father of - Amminadab
  • Amminadab - father of - Nahshon
  • Nahshon - father of - Salmon
  • Salmon - father of - Boaz
  • Rahab - mother of - Boaz
  • Boaz - father of - Obed
  • Ruth - mother of - Obed
  • Obed - father of - Jesse
  • Jesse - father of - King David
  • and so on.

Each of these has a similar structure to an RDF triple. You could quite easily set a rule saying that wherever X is the father of Y, and Y is male (you could add gender information as extra triples), Y is the son of X. You can do this kind of manipulation using Rules.
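The father/son rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how an actual rules engine is written: the triples are plain tuples of strings, and the "gender" and "son of" predicates are assumptions added for the example.

```python
# A tiny graph: a set of (subject, predicate, object) tuples.
triples = {
    ("Abraham", "father of", "Isaac"),
    ("Isaac", "father of", "Jacob"),
    ("Isaac", "gender", "male"),   # extra triples carrying gender information
    ("Jacob", "gender", "male"),
}


def infer_sons(graph):
    """Return the 'son of' triples implied by 'father of' plus gender facts."""
    males = {s for (s, p, o) in graph if p == "gender" and o == "male"}
    return {
        (child, "son of", father)
        for (father, p, child) in graph
        if p == "father of" and child in males
    }


inferred = infer_sons(triples)
# The inferred triples can simply be added to the original graph.
enriched = triples | inferred
```

Note that the rule only produces new triples; it never changes or removes the ones you started with.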

This works on the Semantic Web in another way. Because data is piecemeal - i.e. you don't necessarily have all of it at one time - RDF, because of its triple format, lets you model some of what you know and then add more as you go on. If I had my family tree in an RDF file on the Web, and discovered you had the same thing, we could link our two files wherever they overlap, import both into an RDF reader, and what would emerge would be a combination of the two. Combining data in RDF is easy - it's barely a step above concatenating two files together. If I gave you a machine-readable copy of the first half of the above list, then gave you the second half, your RDF tools would not skip a beat when combining the two.
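As a rough sketch of why combining is so easy: if each file is treated as a set of triples, merging is just a set union. The Python below uses plain string tuples rather than a real RDF toolkit, and paternal_line is a hypothetical helper written for this example.

```python
first_half = {
    ("Salmon", "father of", "Boaz"),
    ("Boaz", "father of", "Obed"),
}
second_half = {
    ("Boaz", "father of", "Obed"),   # overlaps with the first file
    ("Obed", "father of", "Jesse"),
    ("Jesse", "father of", "King David"),
}

# Merging two graphs is a set union; the duplicate triple collapses into one.
merged = first_half | second_half


def paternal_line(graph, start):
    """Follow 'father of' links from start.

    Assumes each father has a single child in the graph, which holds for
    this sample but not for the full list (Judah has two sons).
    """
    child_of = {s: o for (s, p, o) in graph if p == "father of"}
    line = [start]
    while line[-1] in child_of:
        line.append(child_of[line[-1]])
    return line
```

Neither half alone connects Salmon to King David, but the merged graph does.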

Specifying complexity

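RDF is a little more complex than this - it also allows you to make statements about something you cannot name. The unnamed thing is represented by a "blank node", a node that has no URI of its own, and a triple containing one is sometimes called "ungrounded". Compare: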

  • cheddar - is a - cheese
  • [ something ] - is a - cheese

Sometimes you can describe everything about something except what it actually is, so you have no identifier to use as the subject. There are many cases where this is relevant.

Our New Testament example has this on the very next line:

  • King David - father of - Solomon
  • [ [ unknown ] - wife of - Uriah] - mother of - Solomon

This is pseudo-RDF, but it should be relatively clear. [ unknown ] is a thing - a person - who is the wife of Uriah, and we know that this unknown person is the mother of Solomon.
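One way to picture the pseudo-RDF above is as two ordinary triples that share the same unnamed node. The Python sketch below is an assumption-laden illustration: BlankNode and facts_about are hypothetical names, and the "_:b0" style label is only there so the node prints readably.

```python
import itertools

_labels = itertools.count()


class BlankNode:
    """A node with no URI; two blank nodes are equal only if they are the same object."""

    def __init__(self):
        self.label = f"_:b{next(_labels)}"

    def __repr__(self):
        return self.label


unknown = BlankNode()

graph = {
    ("King David", "father of", "Solomon"),
    (unknown, "wife of", "Uriah"),       # [ unknown ] - wife of - Uriah
    (unknown, "mother of", "Solomon"),   # the same node - mother of - Solomon
}


def facts_about(graph, node):
    """Return every (predicate, object) pair stated about node, in sorted order."""
    return sorted((p, o) for (s, p, o) in graph if s is node)


mothers = [s for (s, p, o) in graph if p == "mother of" and o == "Solomon"]
```

Even without a name, the node can be found through what is said about it, and everything else known about it can then be read off.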

Serialisation formats

In Practical RDF, Shelley Powers points out that RDF/XML is the recommended serialisation for RDF, but "the RDF model exists independently of any representation of RDF, including RDF/XML" (p. 3). Some serialisations of RDF are better suited to some purposes than others. Notation3 is better suited to writing by hand, even though RDF/XML is better supported by RDF parsers and toolkits. Notation3 also makes it easier to see the structure of your triples than RDF/XML does. You need to try to look beyond the serialisation format to the triples hidden underneath the syntax.

Below, there are some code samples of the same triple structure expressed in different serialisation formats.

RDF/XML

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://en.wikipedia.org/wiki/Bill_Clinton">
<dc:title>Bill Clinton</dc:title>
<dc:lang>en</dc:lang>
</rdf:Description>
</rdf:RDF>
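To see the triples hidden in that XML, here is a small Python sketch that pulls them out with the standard library's ElementTree. It is not a real RDF/XML parser: it only understands the single pattern used above (rdf:Description elements with rdf:about and literal-valued child elements), and simple_triples is a name invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://en.wikipedia.org/wiki/Bill_Clinton">
<dc:title>Bill Clinton</dc:title>
<dc:lang>en</dc:lang>
</rdf:Description>
</rdf:RDF>"""


def simple_triples(xml_text):
    """Extract (subject, predicate, literal) triples from very simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


triples = simple_triples(rdf_xml)
```

The result is two triples with the same subject, one for the title and one for the language.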

Notation3

Notation3 uses a dot to end a statement. A semi-colon ends one triple and starts another with the same subject, and a comma starts another triple with the same subject and predicate but a different object.

String literals are wrapped in quotation marks, and URIs are wrapped in angle brackets. Notation3 also supports prefixes, which are similar to QNames in XML. For instance, <http://purl.org/dc/elements/1.1/title> can be written as dc:title if the matching @prefix declaration is present (as below).

@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
<http://en.wikipedia.org/wiki/Bill_Clinton> dc:title "Bill Clinton";
dc:lang "en".
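Prefix expansion is simple string substitution, as the following Python sketch suggests. The expand function and the prefixes dictionary are hypothetical; a real Notation3 parser also has to deal with full URIs in angle brackets, literals and other syntax that this ignores.

```python
# The same two prefixes as in the Notation3 sample.
prefixes = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name, prefixes):
    """Turn a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"no @prefix declared for {prefix!r}")
    return prefixes[prefix] + local


title_uri = expand("dc:title", prefixes)
```

This also shows why each prefix needs its own name: if two declarations used the same prefix, only one namespace could win.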

N-Triples

N-Triples is a restricted subset of Notation3. Each triple is written out in full and ends with ".", but prefixes are not used, nor are semi-colon or comma delimiters.

<http://en.wikipedia.org/wiki/Bill_Clinton> <http://purl.org/dc/elements/1.1/title> "Bill Clinton".
<http://en.wikipedia.org/wiki/Bill_Clinton> <http://purl.org/dc/elements/1.1/lang> "en".
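Because the format is so regular, writing N-Triples from a list of triples takes very little code. The Python below is a sketch under simplifying assumptions: each triple carries a flag saying whether its object is a literal, only the most common escapes are handled, and to_ntriples is a name made up for this example.

```python
def escape_literal(text):
    """Escape backslashes, double quotes and newlines inside a literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(triples):
    """Serialise (subject, predicate, object, object_is_literal) tuples."""
    lines = []
    for subject, predicate, obj, obj_is_literal in triples:
        if obj_is_literal:
            obj_text = '"' + escape_literal(obj) + '"'
        else:
            obj_text = "<" + obj + ">"
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


clinton = "http://en.wikipedia.org/wiki/Bill_Clinton"
dc = "http://purl.org/dc/elements/1.1/"
output = to_ntriples([
    (clinton, dc + "title", "Bill Clinton", True),
    (clinton, dc + "lang", "en", True),
])
```

Every line of the output is a complete statement, which is why N-Triples files can be split, sorted or concatenated line by line.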

TriX

TriX is an XML serialisation of RDF, but it follows the N-Triples approach of writing out every URI in full. The XML element name, such as "uri" or "plainLiteral", gives the type of the content. You can also use the xml:lang attribute to give a language code (like "en" or "fr") for a string literal.

<TriX xmlns="http://www.w3.org/2004/03/trix/trix-1/">
<graph>
<triple>
<uri>http://en.wikipedia.org/wiki/Bill_Clinton</uri>
<uri>http://purl.org/dc/elements/1.1/title</uri>
<plainLiteral xml:lang="en">Bill Clinton</plainLiteral>
</triple>
<triple>
<uri>http://en.wikipedia.org/wiki/Bill_Clinton</uri>
<uri>http://purl.org/dc/elements/1.1/lang</uri>
<plainLiteral>en</plainLiteral>
</triple>
</graph>
</TriX>
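Since each TriX triple is just three child elements in order, reading one back is straightforward. The following Python sketch uses the standard library's ElementTree; trix_triples and local_name are hypothetical helper names, and the function ignores XML namespaces, so it does not matter whether the root element declares one.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

trix = """<TriX>
<graph>
<triple>
<uri>http://en.wikipedia.org/wiki/Bill_Clinton</uri>
<uri>http://purl.org/dc/elements/1.1/title</uri>
<plainLiteral xml:lang="en">Bill Clinton</plainLiteral>
</triple>
</graph>
</TriX>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def trix_triples(xml_text):
    """Return each triple as three (kind, text, language) terms."""
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter():
        if local_name(element.tag) == "triple":
            terms = tuple(
                (local_name(term.tag), term.text, term.get(XML_LANG))
                for term in element
            )
            triples.append(terms)
    return triples


parsed = trix_triples(trix)
```

The element name tells you whether a term is a URI or a literal, so there is no need to inspect the text itself.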

Using RDF

  • It is possible to query RDF using an SQL-like language called SPARQL.
  • It is possible to use an inference engine to derive additional information from the statements in an RDF graph.
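At its core, a query is a triple pattern with some positions left open. The Python sketch below is not SPARQL and does not use a SPARQL engine; it only illustrates the pattern-matching idea, with None standing in for a variable. The match function is an invented name and the triples are plain string tuples.

```python
graph = {
    ("Abraham", "father of", "Isaac"),
    ("Judah", "father of", "Perez"),
    ("Judah", "father of", "Zerah"),
    ("Tamar", "mother of", "Perez"),
    ("Tamar", "mother of", "Zerah"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(graph)
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Roughly the idea behind a SPARQL pattern like { ?parent ?relation "Perez" }:
parents_of_perez = [(s, p) for (s, p, o) in match(graph, obj="Perez")]

# Everything stated with Judah as the subject.
about_judah = match(graph, subject="Judah")
```

A real SPARQL engine joins several such patterns together through shared variables, which this single-pattern sketch does not attempt.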
