Open Education: Talis Incubator Proposal

Back in May, I woke up with an idea in my head which, in a slightly modified form, I’d now like to try and find funding for. ((I figure that if I repeat this idea enough times, someone will see that it’s worth funding ;-))) The idea is based on work we’re doing on our JISCPress project, which itself is based on work Tony and I have been doing with WriteToReply since February. In my original blog post, I proposed that WordPress Multi User ((and here I’ll repeat what is becoming my mantra: ‘the same software that runs six million blogs on wordpress.com’ )) and Scriblio, a set of plugins for WordPress which allows you to import an OPAC library catalogue and benefit from all the advantages of the WordPress ecosystem, would together allow libraries to host independently branded catalogues on an open, union platform.

Imagine that JISC, Talis or Eduserv offered such a platform to UK university libraries. It could be a service, not unlike wordpress.com, where authorised institutions, could self-register for a site and easily import their OPAC, apply a theme, tweak some CSS, choose from a few useful plugins, and within less than a day or two, have a branded, cutting-edge search and browse interface to their OPAC, running under their own domain.

Paul and I gave a Lightening Talk about this at Mashoop North, which I present to you below.

Slide four is the useful one. It show the various slices of the platform and, by implication, the various uses each layer offers.  The bottom slice shows the OPACs converge with WPMU to the benefit of the institution. It’s a nice, easy, hosted service that would offer an end-user experience not unlike the one that Plymouth State offer to their users. The middle slice – the WPMU bit – is where the OPACs converge together in union, under a single administrative interface that is easy to manage, widely used and supported. For $5000/year, Automattic, the company that leads the development of WordPress and runs wordpress.com, would provide support and advice with a six hour SLA. On top of that, anyone with a knowledge of PHP, can quickly learn the guts of WordPress, as Alex who’s working on JISCPress, will testify. My point is that this is a well tested and widely understood technology.

Now, once you have one or more OPACs hosted on WPMU, you bring together a lot of library catalogue data into one database and the platform’s web analytics (i.e. usage trends) can be a rich source of data for learning about what library users are looking for. Each library, would have access to their own analytics, while the analytics for the entire platform would also be collected. I do this on our university WPMU installation.

The next slice in our diagram, shows a few different ways of getting data out of the platform (and this would also apply to each individual catalogue site, too).  First, you can see that the platform as a whole could act as a union catalogue where, from a single site, users could search across library holdings. That union catalogue would have all the useful features of WordPress, too. Next to that, you can see Triplify, a nice little web application that transforms a relational database into RDF/N3, JSON and Linked Data. Triplify could re-present the data in each catalogue as semantic data and this could be subsequently hosted on the Talis platform.  We’re already doing this with JISCPress. Every night, changes to any of the library catalogue data could be pushed to Talis, where the data can be queried and mashed up using the Talis API. Finally, don’t forget good old RSS and Atom feeds, which are available for almost every WordPress endpoint URL, as I’ve previously documented.

Given the work we’ve done on JISCPress, which covers our experience with WPMU and Triplify, I think that a demonstrator prototype, using entirely open source software, could be developed within the constraints of the Talis Incubator fund. I canvassed my original idea to the Scriblio mailing list and had positive and useful feedback from Ross Singer at Talis. Leigh Dodds at Talis also sees potential in the use of WPMU and Triplify, although I understand that neither of these people are endorsing the idea for the Talis Incubator fund, but their interest has been encouraging.

So, what I’m proposing is that Paul and I work with Casey Bisson, the Scriblio developer, on a short project to get this all up and running. In my mind, Scriblio needs some more work to make the set up process easier for a variety of library catalogues and the last time I looked, it needed documenting better, too. I think that the maximum of £15,000 from the Talis fund is workable. In fact, I’d like to bring it down a little to make it more attractive to the judges. Paul would bring his knowledge and expertise from working with our university library catalogue, I would bring what we’ve learned from JISCPress and could manage the WPMU server side of things and the project in general, as well as write documentation, while Casey could be funded to spend some dedicated time fine tuning Scriblio to meet our objectives.

So what do you think? A wordpress.com like platform for library OPACs that pushes semantic data to the Talis platform. Each catalogue remains under the control of its owner institution, while contributing to a wider union OPAC that will benefit users and offer the library community some useful analytics. The platform as a technology, would be as flexible as WordPress itself is, so additional features could be developed for the platform by other future projects. Only last week, Tony was discussing on his new Arcadia project blog, how it would be useful to be able to capture library catalogue links as QR codes. Well, using WordPress in the way I’ve described, we could implement that across every UK HEI Library catalogue in a snap using this plugin. Hoorah!

Mashoop North!

Paul and I have just presented our ‘lightning talk’ on the use of WordPress MU and Scriblio to create a platform for publishing multiple OPAC catalogues and then exposing the aggregate data as RDF using Triplify. I blogged about this idea a while back and this is the first presentation we’ve given. Not sure what people made of it. Too ambitious? Threatening? Confusing? All I know is that from where I’m standing, it would require a relatively small amount of funding to show it working in principle with a handful of library catalogues. The difficult part would be scaling it to work for 100+ catalogues (though bear in mind, wordpress.com hosts 6 million sites) and satisfying the politics of each institution. Still, that shouldn’t stop us from trying.

Scriblio, Triplify and XMPP PubSub

It occured to me this morning, as I woke from my slumber, that the work I’ve been doing recently with WordPress, could also be applied to a library catalogue using Scriblio.

Scriblio (formerly WPopac) is an award winning, free, open source CMS and OPAC with faceted searching and browsing features based on WordPress. Scriblio is a project of Plymouth State University, supported in part by the Andrew W. Mellon Foundation.

Which means that you can import your library catalogue into WordPress and the user can search for and retrieve a record for The Films of Jean-Luc Goddard. Have a look around Plymouth State’s Scriblio and you’ll get a good feel for what’s possible.

Anyway, taking Scriblio’s functionality for granted, you could easily add Triplify to the mix as I have discussed before. So with very little effort, you can convert your library catalogue to RDF N-Triples (and/or JSON). My questions to you Librarians is: knowing this is possible and fairly trivial to do, is there any value to you in exposing your OPACs in this way?

Next, as I lay listening to my daughter chat to her squeaky duck, I thought about the other stuff I’ve been looking at recently with WordPress.  Once you think of your library catalogue as a WordPress site, there’s quite a lot of fun to be had.  You could ramp up the feeds that you offer from your OPAC, use the OpenCalais API to add semantic tags, plugin some more semantic addons if you wish (autodiscovery of SIOC, FOAF, OAI-ORE data??), and, perhaps most fun of all, publish OPAC records in realtime over XMPP PubSub.

Which brings me to JISCPress, our recent #jiscri project proposal, which we may or may not get funded (what are we, a week or two away from finding out??).  In that Project, we’re proposing a WordPress MU platform for publishing and discussing JISC funding calls and project reports (among other things).  There’s a lot of cross-over between the above Scriblio ideas and JISCPress. So much so, that it’s probably no more than a days work to transform the JISCPress platform, hosted as an Amazon Machine Image, to a multi-user OPAC platform where, potentially, all UK University libraries, publish their OPACs via separate Scriblio sites.

You could then, like wordpress.com has done, publish an XMPP firehose from every catalogue over PubSub for search engines or whoever is interested in realtime data from UK university library catalogues. Alternatively, instead of the WPMU set up, each University library could maintain their own Scriblio install and publish an XMPP feed to an agreed server (though that approach seems like more hassle than is necesary if you ask me. You’re bound to have some libraries falling behind and not upgrading their sites as things develop. For less than a collective £4K/year, we could all buy into commercial support for a WPMU site from Automattic to help maintain server-side stuff).

I dunno. Maybe this is all off the wall, but the building blocks are all there. Is anyone experimenting with Scriblio in this way? Don’t tell me, a bunch of you have been doing it for years…

Triplify: Make your blog mashable

Last week, I wrote about how it is relatively simple to ‘pimp your ride on the semantic web‘. Over the weekend, I stumbled upon Triplify, a small ‘plugin’ for pretty much any web publishing platform, that “reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.” What is so appealing about Triplify is how easy it is to implement, especially alongside a WordPress site.

I can confirm that the three-step installation process is all it takes, although I wouldn’t undertake implementing this blindly as you are, literally, exposing a semantic representation of your database content. In other words, you should look at the configuration file you’re using and check that it’s going to expose the right data and not clear text passwords and unpublished posts and comments. Before I  implemented it, I realised that it would expose comments on a bunch of posts that I have since made private (they were imported from an old, private blog), so I had to ‘unapprove’ those comments so the script didn’t expose them to the public. A five minute job. Alternatively, the script could probably be modified to work around my problem, by only exposing comments after a certain date, for example.

The end result is that, with a WordPress site, you expose a semantic representation of your users, posts, pages, tags, categories, comments and attachments in RDF (N-Triples) and JSON formatted data (for JSON, just add ‘?t-output=json’ to the end of the URI). Like I said though, it could be used on any database driven web application. Here’s what you get when you expose the high level links to your content:


<http://blog.josswinn.org/triplify/> <http://www.w3.org/2000/01/rdf-schema#comment> "Generated by Triplify V0.5 (http://Triplify.org)" .
<http://blog.josswinn.org/triplify/> <http://creativecommons.org/ns#license> <http://creativecommons.org/licenses/by/2.0/uk/> .
<http://blog.josswinn.org/triplify/post> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/attachment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/tag> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/category> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/user> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/comment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

Here’s an example of what you get when you expose the full content:


<http://blog.josswinn.org/triplify/post/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#has_creator> <http://blog.josswinn.org/triplify/user/1> .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/created> "2008-10-06T05:55:25"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#content> "Up early to go to Sheffield for LPI exams. The last week has left me underprepared. Never mind." .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/modified> "2008-10-06T20:12:15"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

...

<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/27> .

...

<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/41> .
<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/42> .

...

<http://blog.josswinn.org/triplify/post/154> <http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/belongsToCategory> <http://blog.josswinn.org/triplify/category/22> .

...

<http://blog.josswinn.org/triplify/tag/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/Tag> .
<http://blog.josswinn.org/triplify/tag/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/tagName> "valentine" .

You can choose to expose different levels of information in your HTML source. If you have more than a moderate amount of content, you’ll probably want to just expose the top level links as in the first example and let the users of your data dig deeper. You’ll also note that you can (and should) attach a license to your data.

A number of namespaces are recognised as well as a WordPress vocabulary.


$triplify['namespaces']=array(
'vocabulary'=>'http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/',
'rdf'=>'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'rdfs'=>'http://www.w3.org/2000/01/rdf-schema#',
'owl'=>'http://www.w3.org/2002/07/owl#',
'foaf'=>'http://xmlns.com/foaf/0.1/',
'sioc'=>'http://rdfs.org/sioc/ns#',
'sioctypes'=>'http://rdfs.org/sioc/types#',
'dc'=>'http://purl.org/dc/elements/1.1/',
'dcterms'=>'http://purl.org/dc/terms/',
'skos'=>'http://www.w3.org/2004/02/skos/core#',
'tag'=>'http://www.holygoat.co.uk/owl/redwood/0.1/tags/',
'xsd'=>'http://www.w3.org/2001/XMLSchema#',
'update'=>'http://triplify.org/vocabulary/update#',
);

So, what’s the point in doing this? Well, it’s fairly trivial and if you think that structured, linked, machine-readable licensed data is a Good Thing, why not?  The Triplify website lists an number of advantages:

Such a triplification of your Web application has tremendous advantages:

  • The installations of the Web application are better found and search engines can better evaluate the content.
  • Different installations of the Web application can easily syndicate arbitrary content without the need to adopt interfaces, content representations or protocols, even when the content structures change.
  • It is possible to create custom tailored search engines targeted at a certain niche. Imagine a search engine for products, which can be queried for digital cameras with high resolution and large zoom.

Ultimately, a triplification will counteract the centralization we faced through Google, YouTube and Facebook and lead to an increased democratization of the Web

The vision of the semantic web and semantic publishing is one of meaningfully identifying objects (and people) on the Internet and showing their relationships. This should improve searches for things on the web, but also improve how we exchange knowledge, re-use information and help clarify our identity on the web, too. It’s an ambitious task, but made easier with tools like Triplify.  The semantic web also raises questions over individual privacy and, if data is well formed and accessible, it may be easier to control and therefore censor. The creator of Triplify recently gave a technical presentation on Triplify and how it is being used to publish data collected by the OpenStreetMap project. It shows how geodata exposed in this way can result in mashup applications that directly benefit you and me.