RDF Parsing Hacks
From GetSemantic
WARNING: These are hacks! Please do not follow them, and please do not build stuff on the basis that others will follow them. They are listed as ways to get you out of sticky situations, not as everyday practices.
You should follow RDF Best Practices where possible, but sometimes it's difficult to get to the particular part of the document you want. These are the WRONG ways to do it, but sometimes the WRONG ways are useful. (Please use SPARQL / Turtle-esque syntax to describe hacks.) Feel free to also add links to data sets that prove or disprove particular hacks. The eventual goal of this document, along with RDF Best Practices, is to compile a functional specification for a simple XML+JSON API that gives mainstream, non-RDF developers access to RDF data.
Subject hunting
Finding the relevant subject in a graph can often be difficult, especially when a document doesn't make it obvious. For FOAF profiles, the { ?x a foaf:PersonalProfileDocument; foaf:primaryTopic ?subject . } pattern is the best way to do it. But documents do not always follow it: people often use blank nodes, or do not conform to the FOAF primaryTopic pattern at all. List below the hacks you use to find the subject in those cases.
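The preferred primaryTopic lookup can be sketched in a few lines of Python. This is a minimal sketch, not a real parser: it assumes the graph has already been parsed into plain (subject, predicate, object) string tuples, and the function name and example URIs are hypothetical.

```python
# Assumption: triples are plain (subject, predicate, object) string tuples,
# not the data model of any particular RDF library.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def primary_topics(triples):
    """Return subjects matching
    { ?x a foaf:PersonalProfileDocument; foaf:primaryTopic ?subject . }"""
    # First collect every ?x typed as a PersonalProfileDocument.
    documents = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == FOAF + "PersonalProfileDocument"
    }
    # Then follow foaf:primaryTopic from those documents only.
    return sorted({
        o for s, p, o in triples
        if p == FOAF + "primaryTopic" and s in documents
    })


# Hypothetical example data.
example = [
    ("http://example.org/foaf.rdf", RDF_TYPE, FOAF + "PersonalProfileDocument"),
    ("http://example.org/foaf.rdf", FOAF + "primaryTopic", "http://example.org/foaf.rdf#me"),
    ("http://example.org/foaf.rdf#me", FOAF + "name", "Alice"),
]
topics = primary_topics(example)
```

An empty result means the document does not follow the pattern, which is when the hacks come into play.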
- Count the IFPs. If you've got a FOAF document and one subject in the graph has a whole shed-load of inverse functional properties, there's a fairly good possibility that this subject is the person you are looking for.
- Count IFP and literal uniqueness. In FOAF profiles, the primary subject will generally have a fair few more unique properties than the other subjects.
- Check the relational links. If you've got a FOAF document, look for { ?person foaf:knows ?x . } relations. The primary person in the document should have a lot of these going outwards, and few, if any, coming back.
- Check URIs. If you've retrieved a FOAF document from 'example.org', regex all the foaf:homepage relations to find the subject which lists 'example.org' as its homepage. This can be deceptive, of course.
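The IFP-counting and relational-link hacks above can be combined into a rough ranking. The following is a sketch under assumptions: triples are plain (subject, predicate, object) string tuples, the IFP set is only a subset of the properties FOAF declares inverse functional (check it against the vocabulary version you target), and the function names and the way the two signals are combined are hypothetical.

```python
from collections import Counter

FOAF = "http://xmlns.com/foaf/0.1/"
# Assumption: a subset of the FOAF inverse functional properties.
FOAF_IFPS = {
    FOAF + name
    for name in ("mbox", "mbox_sha1sum", "homepage", "weblog", "openid")
}


def ifp_counts(triples):
    """Count inverse functional properties per subject."""
    return Counter(s for s, p, o in triples if p in FOAF_IFPS)


def knows_balance(triples):
    """Outgoing minus incoming { ?person foaf:knows ?x . } links per node."""
    balance = Counter()
    for s, p, o in triples:
        if p == FOAF + "knows":
            balance[s] += 1
            balance[o] -= 1
    return balance


def rank_candidates(triples):
    """Order subjects by IFP count, then by foaf:knows balance."""
    ifps = ifp_counts(triples)
    balance = knows_balance(triples)
    subjects = {s for s, p, o in triples}
    # The subject itself is the last sort key so ties are deterministic.
    return sorted(subjects, key=lambda s: (-ifps[s], -balance[s], s))


# Hypothetical example data using blank node labels.
example = [
    ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:a", FOAF + "homepage", "http://example.org/"),
    ("_:a", FOAF + "weblog", "http://example.org/blog"),
    ("_:a", FOAF + "knows", "_:b"),
    ("_:a", FOAF + "knows", "_:c"),
    ("_:b", FOAF + "name", "Bob"),
    ("_:c", FOAF + "mbox", "mailto:carol@example.org"),
]
ranking = rank_candidates(example)
```

The top-ranked entry is only a guess at the primary subject; it is not proof.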
See also The Topic Finder, a service and a list of heuristics.
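The URI check from the list above might look like the sketch below. It compares hosts with Python's URL parser rather than a regex, which is a substitution made here for robustness. It also assumes triples are plain (subject, predicate, object) string tuples; stripping a leading "www." is likewise an assumption of this sketch, and the function names are hypothetical.

```python
from urllib.parse import urlparse

FOAF = "http://xmlns.com/foaf/0.1/"


def host_of(uri):
    """Return the host of a URI, without a leading 'www.' (an assumption)."""
    host = urlparse(uri).hostname or ""
    return host[4:] if host.startswith("www.") else host


def subjects_with_homepage_on(triples, document_uri):
    """Subjects whose foaf:homepage is on the host the document came from."""
    doc_host = host_of(document_uri)
    return sorted({
        s for s, p, o in triples
        if p == FOAF + "homepage" and doc_host and host_of(o) == doc_host
    })


# Hypothetical example data.
example = [
    ("_:a", FOAF + "homepage", "http://example.org/~alice/"),
    ("_:b", FOAF + "homepage", "http://example.com/bob"),
]
matches = subjects_with_homepage_on(example, "http://www.example.org/foaf.rdf")
```

As the hack itself warns, a match can be deceptive: several people may list homepages on the same host.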
Application use
- Generally, if you are reusing data in your application following the 'additive' pattern, you should prefer false negatives to false positives: avoid guessing wrongly, even at the cost of sometimes finding nothing, and give the user the ability to correct mistakes.
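One way to lean towards false negatives is to accept a heuristic's top candidate only when it clearly beats the runner-up, and otherwise to return nothing and let the user decide. This is a minimal sketch: the scores are assumed to come from whichever heuristic you use, and the function name and both thresholds are hypothetical values chosen for illustration.

```python
def pick_subject(scores, min_score=2, min_margin=2):
    """Return the top-scoring subject, or None when the result is ambiguous.

    scores maps each candidate subject to a heuristic score.
    min_score and min_margin are illustrative thresholds, not recommendations.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if not ranked:
        return None
    best, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Refuse to guess when the evidence is weak or the lead is narrow.
    if best_score < min_score or best_score - runner_up < min_margin:
        return None
    return best


clear_winner = pick_subject({"_:a": 5, "_:b": 1})
ambiguous = pick_subject({"_:a": 3, "_:b": 3})
```

When the function returns None, the application can show the candidates to the user instead of picking one silently.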
If the data sucks
Do contact the provider of the data, if that is possible. Give them clear instructions on how to correct the RDF they are publishing, and point them to the relevant parts of the RDF specifications and RDF Best Practices. Also be sure to let them know that they can contact you if they have further questions, or

