Bibliography Project

From GetSemantic

Bibliographies seem to be the perfect place to use Semantic Web technologies, including RDF, GRDDL, Microformats and others, to build an emerging "semantic scholarly web".

We therefore propose a comprehensive bibliographic project that uses a mixture of modern Web technologies to produce a new open format for use wherever bibliographies are currently used.

What is currently inside a bibliography

  • Details about documents that are assumed to be correct
    • 'Assumed' here means that scholarly standards generally require authors not to intentionally misrepresent their sources, and to make amends where problems are detected.
  • An acknowledgment that one document quotes, or relies upon, another document - essentially a 'link' between two documents
    • Like web links, these are not necessarily symmetrical. I can state that I have been influenced by Einstein without implying that Einstein has been influenced by me.
  • People
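
The asymmetry of citation links can be illustrated with a small directed graph. The following is a minimal sketch in Python; the document identifiers and the helper names `cites` and `cited_by` are hypothetical, chosen only for illustration.

```python
# Citations as directed edges: (citing document, cited document).
# The identifiers below are made up for illustration.
citations = {
    ("my-essay", "einstein-1905"),
    ("my-essay", "review-2006"),
    ("review-2006", "einstein-1905"),
}


def cites(source, target):
    """Return True if `source` acknowledges `target`."""
    return (source, target) in citations


def cited_by(target):
    """Return the sorted list of documents that cite `target`."""
    return sorted(src for src, dst in citations if dst == target)


print(cites("my-essay", "einstein-1905"))   # the link exists in this direction
print(cites("einstein-1905", "my-essay"))   # but not in the reverse direction
print(cited_by("einstein-1905"))
```

Storing each link as an ordered pair keeps the direction explicit, so "who cites this?" and "what does this cite?" stay separate questions.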

What exists on the Semantic Web already?

  • FOAF - already provides some of the basis for a bibliographic/scholarly index.
  • Dublin Core
  • hCard - to represent people
  • HTML attribute values used as lightweight annotations, in the manner of XFN (which uses rel values on links)
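
As a rough illustration of how these vocabularies might combine, the sketch below describes one book and its author using Dublin Core terms and FOAF, and prints the result as N-Triples. The `example.org` URIs and the helper names (`uri`, `literal`, `to_ntriples`) are assumptions made for this example; the project has not settled on a model.

```python
# Namespace URIs for Dublin Core terms and FOAF.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs for a book and its author (example.org is a placeholder).
BOOK = "http://example.org/books/relativity"
AUTHOR = "http://example.org/people/einstein"


def uri(value):
    """Format a URI as an N-Triples term."""
    return "<" + value + ">"


def literal(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


triples = [
    (uri(BOOK), uri(DCTERMS + "title"),
     literal("Relativity: The Special and the General Theory")),
    (uri(BOOK), uri(DCTERMS + "creator"), uri(AUTHOR)),
    (uri(AUTHOR), uri(RDF_TYPE), uri(FOAF + "Person")),
    (uri(AUTHOR), uri(FOAF + "name"), literal("Albert Einstein")),
]


def to_ntriples(rows):
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(s + " " + p + " " + o + " ." for s, p, o in rows)


print(to_ntriples(triples))
```

Dublin Core carries the description of the document, while FOAF describes the person, so neither vocabulary needs to be extended for this simple case.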

What exists in other data already?

  • Library catalogues
    • Specifically, national research catalogues like those of the British Library and the Library of Congress
  • ISBN and ISSN numbers
  • URIs
  • Amazon ASIN numbers
  • Wikipedia and other mostly canonical URI sources
  • Card catalogues
  • CiteULike
  • Front matter data in books (e.g. title, author, Library of Congress subject classification)
  • Tag metadata attached to Amazon pages on social bookmarking services such as del.icio.us
  • Zotero databases
  • MODS XML
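
Several of these sources already provide identifiers that can double as URIs. As one possible sketch, the Python below validates an ISBN and turns it into a `urn:isbn:` identifier, normalising ISBN-10 to the 13-digit form. The function names are hypothetical, and the choice to normalise to ISBN-13 is an assumption of this example rather than a project decision.

```python
DIGITS = set("0123456789")


def clean(isbn):
    """Strip hyphens and spaces, and upper-case a trailing check 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn13_sum(digits):
    """Weighted sum used by ISBN-13: weights alternate 1, 3, 1, 3, ..."""
    return sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(digits))


def is_valid_isbn10(isbn):
    """Check the ISBN-10 weighted sum (weights 10..1, 'X' = 10) modulo 11."""
    s = clean(isbn)
    if len(s) != 10 or not set(s[:9]) <= DIGITS:
        return False
    if s[9] not in DIGITS and s[9] != "X":
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def is_valid_isbn13(isbn):
    """Check that the ISBN-13 weighted sum is a multiple of 10."""
    s = clean(isbn)
    return len(s) == 13 and set(s) <= DIGITS and isbn13_sum(s) % 10 == 0


def isbn10_to_isbn13(isbn):
    """Prefix '978', drop the old check digit, and compute the new one."""
    s = clean(isbn)
    if not is_valid_isbn10(s):
        raise ValueError("not a valid ISBN-10: " + isbn)
    body = "978" + s[:9]
    return body + str((10 - isbn13_sum(body) % 10) % 10)


def isbn_urn(isbn):
    """Return a urn:isbn: identifier using the 13-digit form."""
    s = clean(isbn)
    if is_valid_isbn13(s):
        return "urn:isbn:" + s
    return "urn:isbn:" + isbn10_to_isbn13(s)


print(isbn_urn("0-306-40615-2"))
```

Normalising before minting the identifier means that the same book entered as ISBN-10 in one catalogue and ISBN-13 in another ends up with a single subject URI.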

Work So Far

HTML Serialisation

The Bibliography Project can have any number of HTML serialisations attached using the GRDDL mechanism. If and when the Citation microformat is made available from microformats.org, it should fit naturally alongside the Bibliography Project by the same mechanism: GRDDL can be used to translate Citation data into RDF bibliographies.
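
GRDDL itself works by associating a transformation (typically XSLT) with a document. Purely to illustrate the kind of extraction such a transformation performs, here is a stand-in sketch using Python's standard `html.parser`. The class names `citation`, `title`, `author` and `year` are assumptions made for this example and are not taken from any published Citation microformat; the sample entry is placeholder data.

```python
from html.parser import HTMLParser

# Class names assumed for illustration only.
FIELDS = {"title", "author", "year"}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class CitationParser(HTMLParser):
    """Collect field text from elements nested inside class="citation"."""

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per citation element
        self._stack = []    # (inside_citation, current_field) per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        inside, field = self._stack[-1] if self._stack else (False, None)
        if "citation" in classes:
            self.records.append({})
            inside = True
        matched = sorted(classes & FIELDS)
        if matched:
            field = matched[0]
        self._stack.append((inside, field))

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        inside, field = self._stack[-1] if self._stack else (False, None)
        text = data.strip()
        if inside and field and text:
            self.records[-1].setdefault(field, []).append(text)


SAMPLE = """
<ul>
  <li class="citation">
    <span class="author">A. N. Author</span>,
    <cite class="title">An Example Book</cite>
    (<span class="year">2007</span>)
  </li>
</ul>
"""

parser = CitationParser()
parser.feed(SAMPLE)
print(parser.records)
```

The sketch assumes reasonably well-formed markup. Each extracted record could then be turned into RDF statements about the cited document.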

Some possible uses

  • Easily marking books that one sees on the web for offline retrieval. Imagine that, while surfing around, you see a book you'd like to read and 'star' or 'mark' it; later you could get back a file listing the books you've marked and which libraries have them in stock. This would save countless hours of typing book names into library databases.
  • Sharing of bibliographic data.
  • A professor may put up a publication list.
  • A book may come with its bibliography listed online, with a URL in the book saying "for a machine-readable RDF version, see our website..."
  • A university or department might be able to provide an up-to-date RDF file of a reading list for students, which agents could pull in for them.
  • Marking relationships between scholarly documents. For instance, marking up a review in a scholarly journal as being about a certain book.
  • Annotation - a complex one, but something to think about.
  • Validation of scholarly integrity on the Web - perhaps by using Web of Trust to sign documents or online identities (OpenID?) used by academic bloggers?

Where to go next? (a.k.a. How to Contribute)

  • Cracking open the library databases. Simple scraping tools to get data from university and research databases and convert it into RDF triples (Python is well suited to this).
    • Perhaps JavaScript-based parsers that can extract data from library pages as they are viewed (Greasemonkey, Firefox extensions etc.) and store it on a local server?
  • Initial thoughts on RDF modelling of bibliographic data, with the intention of keeping it as compatible as possible with existing formats. BibTeX is an important one: many other tools and formats already interoperate with BibTeX, so compatibility with it brings compatibility with them.
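
As a starting point for the BibTeX compatibility goal, the following sketch maps a handful of BibTeX fields to Dublin Core terms and back again. The mapping table, the function names and the sample entry are all assumptions made for illustration; a real model would need to cover many more fields and entry types.

```python
DCTERMS = "http://purl.org/dc/terms/"

# An assumed, illustrative mapping from BibTeX fields to Dublin Core terms.
FIELD_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "author": DCTERMS + "creator",
    "year": DCTERMS + "date",
    "publisher": DCTERMS + "publisher",
}
PROPERTY_TO_FIELD = {prop: field for field, prop in FIELD_TO_PROPERTY.items()}


def bibtex_to_triples(subject, fields):
    """Turn a dict of BibTeX fields into (subject, property, value) triples.

    BibTeX joins several authors with ' and '; each becomes its own triple.
    Fields with no mapping are skipped.
    """
    triples = []
    for field, value in fields.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue
        values = value.split(" and ") if field == "author" else [value]
        triples.extend((subject, prop, v.strip()) for v in values)
    return triples


def triples_to_bibtex(entry_type, key, triples):
    """Rebuild a BibTeX entry from triples produced by bibtex_to_triples."""
    fields = {}
    for _subject, prop, value in triples:
        field = PROPERTY_TO_FIELD.get(prop)
        if field is not None:
            fields.setdefault(field, []).append(value)
    lines = ["  " + name + " = {" + " and ".join(vals) + "}"
             for name, vals in fields.items()]
    return "@" + entry_type + "{" + key + ",\n" + ",\n".join(lines) + "\n}"


# Placeholder entry; 'note' has no mapping and is dropped.
entry = {
    "title": "An Example Book",
    "author": "A. N. Author and B. Writer",
    "year": "2007",
    "note": "unmapped field",
}
triples = bibtex_to_triples("http://example.org/books/example", entry)
print(triples_to_bibtex("book", "example2007", triples))
```

Fields without a mapping are silently lost in this round trip, which is exactly the kind of gap the modelling work would need to close.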