Posts tagged rdf

I’ve spent much of today trying to finish off a site for anyone interested in learning how to sketch. Pam Locker, a Principal Teaching Fellow and lecturer on the Design for Exhibitions and Museums degree, had produced (with her colleagues) ten good-quality videos on learning how to sketch and wanted to make them widely available. We discussed Creative Commons licensing and how we might publish them on the web and, well, this is what I’ve ended up with… For me, it’s a kind of rehearsal for the HEA/JISC-funded Chemistry.fm project.

[As I write, I've still got to upload a couple of videos and podcasts, but it's complete enough to discuss here].

Pencils and Pixels

http://pencilsandpixels.blogs.lincoln.ac.uk/

All the original materials are in our Institutional Repository (IR); that’s the direct link, if you want to have a squint at it. Although the IR transcodes video for preview on the web, we weren’t satisfied with the quality of those web versions for display on the WordPress site. Because of the nature of the content, viewers need to see considerable detail on screen, so we looked around for third-party hosting that would accept videos of 50+ minutes each, encoded at 2 Mbps. I ended up opting for a Vimeo Pro account.

So, at this point, we have all the original source materials in the IR, as well as copies of the high-quality videos on Vimeo, which we could embed on a WordPress site. From the IR, you can preview and download the videos, as well as download all the accompanying support materials as a .zip file.

The WordPress site consists of fourteen pages. One, the Overview page, is the parent page for all ten videos and their related resources; it’s basically a map of resources linking to the individual files in the IR. There’s also a License page and a page with information on how to Download all the OERs. In addition to the License page, I’ve added the CC license code to the theme footer. This could have been done with a plugin, but I just hard-coded it in the theme. The site also has the Dublin Core for WordPress plugin activated (actually, it’s activated across all blogs on our WPMU platform; hell, why not!), and the OAI-ORE plugin for WordPress is activated, too. I wrote about these plugins recently and, following a comment from Andy Powell, updated the DC4WP plugin to reflect current standards.
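For anyone wondering what a Dublin Core plugin actually puts in a page, here is a minimal Python sketch of the DCMI DC-HTML convention: a `schema.DCTERMS` link plus one `meta` element per property. This is not the DC4WP plugin's code and its real output may differ; the `dc_html` helper and the sample values are hypothetical.

```python
from html import escape

# Namespace URI for DCMI Metadata Terms.
DCTERMS_NS = "http://purl.org/dc/terms/"


def dc_html(metadata):
    """Render Dublin Core terms as DC-HTML <link>/<meta> elements.

    Hypothetical helper: keys are DCMI term names such as "title" or "language".
    """
    lines = [f'<link rel="schema.DCTERMS" href="{DCTERMS_NS}" />']
    for term, value in metadata.items():
        # escape() also encodes quotes, so values are safe inside attributes.
        lines.append(f'<meta name="DCTERMS.{escape(term)}" content="{escape(value)}" />')
    return "\n".join(lines)


print(dc_html({"title": "Pencils and Pixels", "language": "en"}))
```

The output belongs in the page `<head>`; the `DCTERMS.` prefix is only a label tied to the namespace by the `schema.DCTERMS` link.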

Because much of the content is built from WordPress pages rather than posts, I also used the RSS Includes Pages plugin, which does exactly what it says. Then I adjusted the publish dates of all the pages so that they appear in the RSS feed in the right order. Actually, ‘right order’ is questionable here. I’ve opted to publish them backwards so that when someone subscribes to the feed, the whole course lands in their reader in one go, with the first video first. By default, they would be in reverse order, and I guess Tony Hirst would say that’s a better way of doing it and that I should use his daily feed plugin to drip-feed the OERs. Ack! I dunno. Personally, I like the way you can click on the feed and get the entire resource ordered correctly in Google Reader :-) Go on, tell me what I should do…

OERs in Google Reader
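The backwards-dating trick can be checked with a small Python sketch. It assumes, as most feed readers do, that items are listed newest first; it gives the first page the newest timestamp and each later page a slightly older one. The one-minute spacing, the placeholder titles and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def assign_publish_dates(titles, newest, step=timedelta(minutes=1)):
    """Give titles[0] the newest timestamp and each later title an older one."""
    return {title: newest - i * step for i, title in enumerate(titles)}


def reader_order(dates):
    """Simulate a feed reader's default newest-first listing."""
    return sorted(dates, key=dates.get, reverse=True)


course = ["Video 1", "Video 2", "Video 3"]  # placeholder titles
# The base timestamp is arbitrary; only the relative order matters.
dates = assign_publish_dates(course, newest=datetime(2009, 6, 1, 12, 0))
print(reader_order(dates))  # ['Video 1', 'Video 2', 'Video 3']
```

Publishing in course order instead would give the last video the newest date, which is the reverse order described above.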

In addition to the embedded video, I also created versions of the videos for the iPhone/iPod Touch. I used WordPress posts for these because you can add categories to posts, and each category gets its own feed, which can be subscribed to separately from the main feed (the main feed also includes the video podcasts). I added a clearly visible link to the video podcasts on the front page and a link on the Download page. The main feed and the podcast feed are both auto-discoverable.
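Auto-discovery works through `<link rel="alternate">` elements in the page head. The Python sketch below, using only the standard library's `html.parser`, shows roughly how a feed reader finds them; the example.org URLs and feed titles are placeholders, not the site's real addresses.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedFinder(HTMLParser):
    """Collect (title, href) pairs from <link rel="alternate"> feed elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and (a.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append((a.get("title"), a.get("href")))


def find_feeds(html):
    finder = FeedFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


page = """<head>
<link rel="alternate" type="application/rss+xml" title="Main feed" href="http://example.org/feed/" />
<link rel="alternate" type="application/rss+xml" title="Video podcasts" href="http://example.org/category/podcasts/feed/" />
<link rel="stylesheet" href="http://example.org/style.css" />
</head>"""
print(find_feeds(page))
```

Self-closing `<link ... />` tags still reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` calls it.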

As well as these feeds, I also created an XML/RDF block of links in the sidebar to advertise the OAI-ORE Atom resource map and other stuff to geeks. A bit pointless, but you might like to have a look.

I also thought it might be nice to watch the videos using Boxee. The site RSS feed doesn’t really work there (you can’t control the embedded video), but there’s a Vimeo app for Boxee that can subscribe to any particular user’s videos. This works nicely, although if I start using the account for other, unrelated videos, it’s going to get cluttered. I guess you could run a search for ‘Pencils and Pixels’, which would pick up only the videos we want. I need to think about Boxee and other broadcasting applications a bit more. I’d be interested to know how you might have approached this project differently.

Pencils and Pixels on Boxee

UPDATE: I forgot to add (maybe because it’s just so trivial with WordPress) that the Pencils and Pixels site also has the Media RSS and WPtouch plugins activated, so it will look really nice on your iPhone, Android, BlackBerry and maybe other phones, too. The Media RSS plugin should allow me to embed media in the RSS feed. I wonder whether it can improve the podcast feed, but I think that by default it’s only set up to work with images. I’ll work on it. I’ve also used FeedBurner to enhance the podcast feed. The nice thing about this is that you can quickly prepare the feed for submission to iTunes, which I’ve done, and it should appear in the next few days. A direct download link for iTunes is here ;-)

UPDATE: Here is the approved iTunes link.

iTunes
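What distinguishes a podcast feed from a plain one is the media attached to each item. Below is a minimal Python sketch of one RSS 2.0 item carrying both a standard `enclosure` (what podcast clients such as iTunes download) and a Media RSS `media:content` element. It is not the output of the Media RSS plugin or FeedBurner; the URLs, byte size and `video/mp4` type are assumptions.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)  # serialise as media:content


def podcast_item(title, link, video_url, size_bytes, mime="video/mp4"):
    """Build one RSS 2.0 <item> with an enclosure and a Media RSS element."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Standard RSS 2.0 enclosure: the file podcast clients download.
    ET.SubElement(item, "enclosure", url=video_url, length=str(size_bytes), type=mime)
    # Media RSS equivalent for readers that understand the media: namespace.
    ET.SubElement(item, f"{{{MEDIA_NS}}}content", url=video_url,
                  fileSize=str(size_bytes), type=mime, medium="video")
    return item


# Placeholder values, not the real Pencils and Pixels files.
item = podcast_item("Video 1", "http://example.org/video-1/",
                    "http://example.org/media/video-1.mp4", 123456789)
print(ET.tostring(item, encoding="unicode"))
```

ElementTree declares the `media` namespace on the `item` element itself; in a full feed it would normally be declared once on the `<rss>` root.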

Re-broadcasting Mike Ellis’ recent presentation

It occurred to me this morning, as I woke from my slumber, that the work I’ve been doing recently with WordPress could also be applied to a library catalogue using Scriblio.

Scriblio (formerly WPopac) is an award-winning, free, open source CMS and OPAC with faceted searching and browsing features based on WordPress. Scriblio is a project of Plymouth State University, supported in part by the Andrew W. Mellon Foundation.

Which means that you can import your library catalogue into WordPress and the user can search for and retrieve a record for The Films of Jean-Luc Godard. Have a look around Plymouth State’s Scriblio and you’ll get a good feel for what’s possible.

Anyway, taking Scriblio’s functionality for granted, you could easily add Triplify to the mix, as I have discussed before. So with very little effort, you can convert your library catalogue to RDF N-Triples (and/or JSON). My question to you librarians is: knowing this is possible and fairly trivial to do, is there any value to you in exposing your OPACs in this way?
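Triplify works by mapping the columns of SQL query results to RDF predicates. The Python sketch below imitates that idea for a single catalogue row, emitting N-Triples with Dublin Core terms. It is not Triplify's code or configuration format; the base URI, column names, predicate mapping and record values are hypothetical.

```python
BASE = "http://opac.example.ac.uk/record/"  # hypothetical URI base

# Hypothetical column-to-predicate mapping using DCMI terms.
PREDICATES = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
    "subject": "http://purl.org/dc/terms/subject",
}


def nt_literal(value):
    """Quote a plain literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_ntriples(row):
    """Turn one catalogue row (a dict with an 'id') into N-Triples lines."""
    subject = f"<{BASE}{row['id']}>"
    return [f"{subject} <{predicate}> {nt_literal(str(row[column]))} ."
            for column, predicate in PREDICATES.items()
            if row.get(column) is not None]  # skip empty (NULL) columns


# Placeholder record; the subject heading is illustrative only.
record = {"id": 42, "title": "The Films of Jean-Luc Godard", "creator": None,
          "subject": "Motion pictures"}
print("\n".join(row_to_ntriples(record)))
```

Serialising the same triples as JSON instead would only change the output step, not the mapping.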

Next, as I lay listening to my daughter chat to her squeaky duck, I thought about the other stuff I’ve been looking at recently with WordPress. Once you think of your library catalogue as a WordPress site, there’s quite a lot of fun to be had. You could ramp up the feeds that you offer from your OPAC, use the OpenCalais API to add semantic tags, plug in some more semantic add-ons if you wish (autodiscovery of SIOC, FOAF, OAI-ORE data??) and, perhaps most fun of all, publish OPAC records in real time over XMPP PubSub.

Which brings me to JISCPress, our recent #jiscri project proposal, which may or may not get funded (what are we, a week or two away from finding out??). In that project, we’re proposing a WordPress MU platform for publishing and discussing JISC funding calls and project reports (among other things). There’s a lot of crossover between the above Scriblio ideas and JISCPress. So much so that it’s probably no more than a day’s work to transform the JISCPress platform, hosted as an Amazon Machine Image, into a multi-user OPAC platform where, potentially, all UK university libraries publish their OPACs via separate Scriblio sites.

You could then, as wordpress.com has done, publish an XMPP firehose from every catalogue over PubSub for search engines or whoever is interested in real-time data from UK university library catalogues. Alternatively, instead of the WPMU setup, each university library could maintain its own Scriblio install and publish an XMPP feed to an agreed server (though that approach seems like more hassle than is necessary, if you ask me. You’re bound to have some libraries falling behind and not upgrading their sites as things develop. For less than a collective £4K/year, we could all buy into commercial support for a WPMU site from Automattic to help maintain the server-side stuff).
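To give a feel for what publishing a catalogue record over XMPP PubSub involves, here is a Python sketch that builds the publish request described in XEP-0060, with the record wrapped as an Atom entry. The service address, node name and item ID are hypothetical, and actually sending the stanza would need an XMPP client library, which is left out.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def publish_stanza(service, node, item_id, title, link, stanza_id="publish1"):
    """Return a XEP-0060 <iq type="set"> publish request as a string."""
    iq = ET.Element("iq", type="set", to=service, id=stanza_id)
    # Plain xmlns attributes keep the conventional unprefixed XMPP form.
    pubsub = ET.SubElement(iq, "pubsub", xmlns=PUBSUB_NS)
    publish = ET.SubElement(pubsub, "publish", node=node)
    item = ET.SubElement(publish, "item", id=item_id)
    entry = ET.SubElement(item, "entry", xmlns=ATOM_NS)
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "link", href=link)
    return ET.tostring(iq, encoding="unicode")


# Hypothetical PubSub service, node and record.
print(publish_stanza("pubsub.opac.example.ac.uk", "catalogue-updates",
                     "record-42", "The Films of Jean-Luc Godard",
                     "http://opac.example.ac.uk/record/42"))
```

A firehose subscriber would subscribe to the node once and then receive each new item as a message from the service.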

I dunno. Maybe this is all off the wall, but the building blocks are all there. Is anyone experimenting with Scriblio in this way? Don’t tell me, a bunch of you have been doing it for years…

A few days ago, I wrote about adding Triplify to your web application. Specifically, I wrote about adding it to WordPress, but the same information can be applied to most web publishing platforms. Earlier this month, TALIS announced their Connected Commons platform and yesterday they announced a commercial version of their platform for the structured storage of Linked Data. Storage is all very well, but more importantly they have an API for developers, so that the data can be queried and creatively re-used or mashed up.

So this got me thinking about JISCPress, our recent JISC Rapid Innovation Programme bid, which proposes a WordPress Multi-User based platform for publishing JISC funding calls and the reports of funded projects. This is based on my experience of running WriteToReply with Tony Hirst.

Although WriteToReply is a service for comment and discussion around documents, one of the things that interests me most about it and, consequently, about the JISCPress proposal, is the cumulative storage of data on the platform and how that data might be used. No surprise really, as my background is in archiving and collections management. As with the University of Lincoln blogs, WriteToReply and the proposed JISCPress platform aggregate published content into a site-wide ‘tags’ site that allows anyone to search and browse through all publicly published content. In the case of the university blogs, that’s a large percentage of blogs, but for WriteToReply and JISCPress, it would be pretty much every document hosted on the platform.

You can see from the WriteToReply tags site that over time, a rich store of public documents could be created for querying and re-use. The site design is a bit clunky right now but under the hood you’ll notice that you can search across the text of every document, browse by document type and by tag. The tags are created by publishing the content to OpenCalais, which returns a whole bunch of semantic keywords for each document section. You’ll also notice that an RSS feed is available for any search query, any category and any tag or combination of tags.
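Assuming the tags site uses WordPress's default query variables (`s`, `category_name`, `tag` and `feed`), feed URLs like these can be built by hand. In this Python sketch the site address and slugs are placeholders; in WordPress, tag slugs joined with `+` mean "all of these tags" and slugs joined with `,` mean "any of these tags".

```python
from urllib.parse import urlencode

SITE = "http://tags.example.org/"  # placeholder for the tags site


def feed_url(**query_vars):
    """Append WordPress query variables plus feed=rss2 to the site URL."""
    return SITE + "?" + urlencode({**query_vars, "feed": "rss2"})


def search_feed(terms):
    return feed_url(s=terms)


def category_feed(slug):
    return feed_url(category_name=slug)


def tag_feed(slugs, match_all=True):
    # WordPress: "a+b" = posts tagged a AND b; "a,b" = posts tagged a OR b.
    return feed_url(tag=("+" if match_all else ",").join(slugs))


print(search_feed("communications data"))
print(category_feed("consultations"))
print(tag_feed(["broadband", "bt"]))
```

`urlencode` percent-encodes the `+` and `,` separators; PHP decodes them back before WordPress parses the `tag` value.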

Last night, I was thinking about the WriteToReply site architecture (note that when I mention WriteToReply, it almost certainly applies to JISCPress, too: same technology, similar principles, different content). Currently, we categorise each document by document type, so you’ll see ‘Consultations’, ‘Action Plans’, ‘Discussion Papers’, etc. We author all documents under the WriteToReply username, too, and tag each document section both manually and via OpenCalais. However, there’s more we could do, with little effort, to mark up the documents, and I’ve started sketching it out.

You’ll see from the diagram that I’m thinking we should introduce location and subject categories. There will be formal classification schemes we could use. For example, I found a Local Government Classification Scheme, which provides some high level subjects that are the type of thing I’m thinking about. I’m not suggesting we start ‘cataloguing’ the documents, but simply borrow, at the top level, from recognised classification schemes that are used elsewhere. I’m also thinking that we should start creating a new author for each document and in the case of WriteToReply, the author would be the agency who issued the consultation, report, or whatever.

So following these changes, we would capture the following data (in bold), for example:

The Home Office created Protecting the public in a changing communications environment on April 27th which is a consultation document for England, Wales and Scotland, categorised under Information and communication technology with 18 sections.

Section one is tagged Governor, Home Department, Office of Public Sector Information, Secretary of State, Surrey.

Section two is tagged communications data, communications industry, emergency services, Home Secretary, Jacqui Smith MP, Rt Hon Jacqui Smith MP.

Section three is tagged Broadband, BT, communications, communications changes, communications data, communications data capability, communications data limits, communications environment, communications event, communications industry, communications networks, communications providers, communications service providers, communications services, emergency services, Her Majesty’s Revenue and Customs, Home Office, intelligence agencies, internet browsing, Internet Protocol, Internet Service, IP, mobile telephone system, physical networks, public telecommunications service, registered owner, Serious Organised Crime Agency, social networking, specified communications data, The communications industry, United Kingdom.

Section four is tagged …(you get the picture)

Section five, paragraph six, has the comment “‘fully compatible with the ECHR’ is, of course, an assertion made by the government, about its own legislation. Has that assertion ever been tested in a court?”, authored by Owen Blacker on April 28th at 11:32pm.

Selected text from Section five, paragraph eight, has the comment “Over my dead body!”, authored by Mr Angry on April 28th at 9:32pm.

Note that every author, document, section, paragraph, text selection, category, tag, comment and comment author has a URI, Atom, RSS and RDF end point (actually, text selection and comment author feeds are forthcoming features).

Now, with this basic architecture mapped out, we might wonder what Triplify could add. I’ve already shown in my earlier post that, with little effort, it re-publishes data from a relational database as N-Triples, so everything you see above could be published as RDF data (and as JSON, too).
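As an illustration of how the document, section and tag data above could look as RDF, here is a Python sketch that emits Talis's RDF/JSON layout (`{subject: {predicate: [objects]}}`). The URI pattern, the choice of Dublin Core terms (`hasPart`, `isPartOf`, `subject`, `spatial`) and the helper names are assumptions for illustration, not what Triplify would actually generate.

```python
import json

BASE = "http://writetoreply.example.org/"  # hypothetical URI pattern
DCT = "http://purl.org/dc/terms/"


def uri(value):
    return {"type": "uri", "value": value}


def lit(value):
    return {"type": "literal", "value": value}


def document_graph(doc):
    """Describe a document and its tagged sections as RDF/JSON:
    {subject URI: {predicate URI: [object, ...]}}."""
    doc_uri = BASE + doc["slug"]
    graph = {doc_uri: {
        DCT + "title": [lit(doc["title"])],
        DCT + "creator": [lit(doc["author"])],
        DCT + "type": [lit(doc["type"])],
        DCT + "spatial": [lit(place) for place in doc["locations"]],
        DCT + "hasPart": [],
    }}
    for number, tags in enumerate(doc["sections"], start=1):
        section_uri = f"{doc_uri}/section-{number}"
        graph[doc_uri][DCT + "hasPart"].append(uri(section_uri))
        graph[section_uri] = {
            DCT + "isPartOf": [uri(doc_uri)],
            DCT + "subject": [lit(tag) for tag in tags],
        }
    return graph


consultation = {
    "slug": "protecting-the-public",  # hypothetical
    "title": "Protecting the public in a changing communications environment",
    "author": "Home Office",
    "type": "Consultation",
    "locations": ["England", "Wales", "Scotland"],
    "sections": [  # a subset of the tags listed above
        ["Governor", "Home Department", "Secretary of State"],
        ["communications data", "emergency services", "Home Secretary"],
    ],
}
print(json.dumps(document_graph(consultation), indent=2))
```

Comments and comment authors could be added the same way, each as a subject with its own URI.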

So, in my simple view of the world, we have a data source that takes very little effort to populate and manage (JISCPress/WriteToReply/WordPress), a method of automatically publishing that data for the semantic web (Triplify) and, with TALIS, an API for data storage, access, query and augmentation. As always, my mantra is ‘I am not a developer’, but from where I’m standing, this high-level ‘workflow’ seems reasonable.
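Querying such a store would typically go through SPARQL. This Python sketch builds a SPARQL Protocol GET request asking which sections carry a given tag. The endpoint URL is a placeholder, and the query assumes tags are stored as `dcterms:subject` literals with sections linked to documents by `dcterms:isPartOf`; that is a modelling guess, not TALIS's schema.

```python
from urllib.parse import urlencode

ENDPOINT = "http://store.example.org/sparql"  # placeholder endpoint

# Assumed modelling: tags as dcterms:subject literals, sections linked by dcterms:isPartOf.
QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?section ?document WHERE {
  ?section dcterms:subject "communications data" ;
           dcterms:isPartOf ?document .
}"""


def sparql_get_url(endpoint, query):
    """SPARQL Protocol: send the query text in the 'query' parameter of a GET."""
    return endpoint + "?" + urlencode({"query": query})


print(sparql_get_url(ENDPOINT, QUERY))
```

Fetching that URL (for example with `urllib.request`, sending an `Accept: application/sparql-results+json` header) would return the result bindings, provided the endpoint supports JSON results.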

The benefits for the JISC community would primarily be felt by using the JISCPress website, in a similar way (albeit with better, more informed design) to the WriteToReply ‘tags’ site. We could search across the full text of funding calls, browse the reports by author, category and tag, and grab news feeds from favourite authors, searches, tags or categories. This is all in addition to the comment, feedback and discussion features we’ve proposed. Further benefits would come from ‘re-publishing’ the site content as semantic data to a platform such as TALIS. Not only could there be further Rapid Innovation projects that worked on this data, but it would be available for any member of the public to query and re-use, too. No longer would our final project reports, often the distillation of our research, sit idle as PDF files on institutional websites and in institutional repositories. If the documentation we produce is worth anything, then it’s worth re-publishing openly as semantic data.

Finally, in order to benefit from the (free) use of TALIS Connected Commons, the data being published needs to be licensed under a public domain or Creative Commons ‘zero’ licence. I suspect Crown Copyright is not compatible with either of these licences, although why the hell public consultation documents couldn’t be licensed this way, I don’t know. Do you? For JISCPress, this would be a choice JISC could make. The alternative is to use the commercial TALIS platform or something similar.

As usual, tell me what you think… Thanks.