Pencils and Pixels: Publishing OERs using WordPress (and EPrints)

I’ve spent much of today trying to finish off a site for anyone interested in learning how to sketch. Pam Locker, a Principal Teaching Fellow and lecturer on the Design for Exhibitions and Museums degree, had produced (with her colleagues) ten good-quality videos on learning how to sketch and wanted to make them widely available. We discussed Creative Commons licensing and how we might publish them on the web and, well, this is what I’ve ended up with… For me, it’s a kind of rehearsal for the HEA/JISC-funded Chemistry.fm project.

[As I write, I’ve still got to upload a couple of videos and podcasts, but it’s complete enough to discuss here].

Pencils and Pixels

http://pencilsandpixels.blogs.lincoln.ac.uk/

All the original materials are in our Institutional Repository (IR). <– that’s the direct link which you might want to have a squint at. Although the IR transcodes video for preview on the web, we weren’t satisfied with the quality of the web versions for display on the WordPress site. Due to the nature of the content, it’s important to be able to see considerable detail on the screen and so we looked around for third-party hosting that would accept 50 mins+ videos at 2Mbps each. I ended up opting for a Vimeo Pro account.

So, at this point, we have all the original source materials in the IR, as well as copies of the high quality videos on Vimeo which we could use to embed on a WordPress site. From the IR, you can preview and download the videos as well as download all the accompanying support materials as a .zip file.

The WordPress site consists of fourteen pages. One, the Overview page, is a parent page for all ten videos and their related resources. It’s basically a map of resources linking to the individual files in the IR. There’s also a License page and a page with information on how to Download all the OERs. In addition to the License page, I’ve added the CC license code to the theme footer. This could have been achieved with a plugin but I just hard-coded it in the theme. The site also has the Dublin Core for WordPress plugin (actually, it’s activated across all blogs on our WPMU platform – hell, why not!) and the OAI-ORE plugin for WordPress is activated, too. I wrote about these plugins recently and, following a comment from Andy Powell, updated the DC4WP plugin to reflect current standards.

Because much of the content is constructed using WordPress pages and not posts, I also used the RSS includes pages plugin, which does exactly that. Then, I adjusted the publish dates of all the pages so that they are included in the RSS feed in the right order.  Actually, ‘right order’ is questionable here. I’ve opted to publish them backwards so that when someone subscribes to the feed they get the course dumped in their reader in one go, with the first video first. By default, they would be in reverse order and I guess Tony Hirst would say that’s a better way of doing it and I should use his daily feed plugin to drip feed the OERs. Ack! I dunno. Personally, I like the way you can click on the feed and get the entire resource ordered correctly in Google Reader 🙂 Go on, tell me what I should do…
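The date trick is easy to sketch. This is only an illustration of the idea, not what WordPress or the RSS plugin does internally: the page titles and the base date are made up, and I’m assuming the feed simply lists the newest item first.

```python
from datetime import datetime, timedelta


def backdate(titles, newest):
    """Give the first title the newest date and each later title an older one."""
    return {title: newest - timedelta(minutes=i) for i, title in enumerate(titles)}


def feed_order(dates):
    """A feed lists the newest item first, so sort by date, descending."""
    return sorted(dates, key=dates.get, reverse=True)


lessons = ["Lesson 1", "Lesson 2", "Lesson 3"]  # hypothetical page titles
dates = backdate(lessons, datetime(2009, 6, 1, 12, 0))  # arbitrary base date
print(feed_order(dates))
```

Because the first lesson carries the latest date, it comes out at the top of the feed and the rest follow in course order.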

OERs in Google Reader

In addition to the embedded video, I also created versions of the videos for the iPhone/iPod Touch. I used WordPress posts for this because you can add categories to posts and from the category, you get a nice feed which can be subscribed to separately from the main feed (which also includes the video podcasts). I added a clearly visible link to the video podcasts on the front page and a link on the Download page.  The main feed and the podcast feed are both auto-discoverable.

As well as these feeds, I also created an XML/RDF block of links in the sidebar to advertise to geeks the OAI-ORE Atom resource map and other stuff. A bit pointless, but you might like to have a look.

I also thought that it might be nice to watch the videos using Boxee. The site RSS feed doesn’t really work (you can’t control the embedded video), but there’s a Vimeo app for Boxee that can be used to subscribe to any particular user’s videos. This works nicely, although if I start using the account for other, unrelated videos, it’s going to get cluttered. I guess you could run a search for ‘Pencils and Pixels’ which would only pick up the videos we want. I need to think about Boxee and other broadcasting applications a bit more. I’d be interested to know how you might have approached this project differently.

Pencils and Pixels on Boxee

UPDATE: I forgot to add (maybe because it’s just so trivial with WordPress), that the Pencils and Pixels site also has the Media RSS and WPtouch plugins activated. It will look really nice on your iPhone, Android, Blackberry and maybe other phones, too. The MediaRSS plugin should allow me to embed media in the RSS feed. I wonder whether it can improve the Podcast feed but I think that by default, it’s only set up to work on images. I’ll work on it. I’ve also used FeedBurner to enhance the podcast feed. The nice thing about this is that you can quickly prepare the feed for submitting to iTunes, which I’ve done and it should appear in the next few days. A direct download link for iTunes is here 😉 UPDATE: Here is the approved iTunes link.

iTunes

Ten reasons why you should pay attention to the geeks because actually they have something quite important to say which us non-geeky people should be listening to

Re-broadcasting Mike Ellis’ recent presentation

Scriblio, Triplify and XMPP PubSub

It occurred to me this morning, as I woke from my slumber, that the work I’ve been doing recently with WordPress could also be applied to a library catalogue using Scriblio.

Scriblio (formerly WPopac) is an award winning, free, open source CMS and OPAC with faceted searching and browsing features based on WordPress. Scriblio is a project of Plymouth State University, supported in part by the Andrew W. Mellon Foundation.

Which means that you can import your library catalogue into WordPress and the user can search for and retrieve a record for The Films of Jean-Luc Godard. Have a look around Plymouth State’s Scriblio and you’ll get a good feel for what’s possible.

Anyway, taking Scriblio’s functionality for granted, you could easily add Triplify to the mix as I have discussed before. So with very little effort, you can convert your library catalogue to RDF N-Triples (and/or JSON). My question to you librarians is: knowing this is possible and fairly trivial to do, is there any value to you in exposing your OPACs in this way?

Next, as I lay listening to my daughter chat to her squeaky duck, I thought about the other stuff I’ve been looking at recently with WordPress. Once you think of your library catalogue as a WordPress site, there’s quite a lot of fun to be had. You could ramp up the feeds that you offer from your OPAC, use the OpenCalais API to add semantic tags, plug in some more semantic add-ons if you wish (autodiscovery of SIOC, FOAF, OAI-ORE data??), and, perhaps most fun of all, publish OPAC records in realtime over XMPP PubSub.

Which brings me to JISCPress, our recent #jiscri project proposal, which we may or may not get funded (what are we, a week or two away from finding out??). In that project, we’re proposing a WordPress MU platform for publishing and discussing JISC funding calls and project reports (among other things). There’s a lot of cross-over between the above Scriblio ideas and JISCPress. So much so, that it’s probably no more than a day’s work to transform the JISCPress platform, hosted as an Amazon Machine Image, into a multi-user OPAC platform where, potentially, all UK university libraries publish their OPACs via separate Scriblio sites.

You could then, like wordpress.com has done, publish an XMPP firehose from every catalogue over PubSub for search engines or whoever is interested in realtime data from UK university library catalogues. Alternatively, instead of the WPMU set up, each university library could maintain their own Scriblio install and publish an XMPP feed to an agreed server (though that approach seems like more hassle than is necessary, if you ask me). You’re bound to have some libraries falling behind and not upgrading their sites as things develop. For less than a collective £4K/year, we could all buy into commercial support for a WPMU site from Automattic to help maintain server-side stuff.

I dunno. Maybe this is all off the wall, but the building blocks are all there. Is anyone experimenting with Scriblio in this way? Don’t tell me, a bunch of you have been doing it for years…

Getting your Triples into Talis Connected Commons

A few days ago, I wrote about adding Triplify to your web application. Specifically, I wrote about adding it to WordPress, but the same information can be applied to most web publishing platforms. Earlier this month, TALIS announced their Connected Commons platform and yesterday they announced a commercial version of their platform for the structured storage of Linked Data. Storage is all very well, but more importantly they have an API for developers, so that the data can be queried and creatively re-used or mashed up.

So this got me thinking about JISCPress, our recent JISC Rapid Innovation Programme bid, which proposes a WordPress Multi-User based platform for publishing JISC funding calls and the reports of funded projects. This is based on my experience of running WriteToReply with Tony Hirst.

Although WriteToReply is a service for comment and discussion around documents, one of the things that interests me most about it and, consequently, the JISCPress proposal, is the cumulative storage of data on the platform and how that data might be used. No surprise really, as my background is in archiving and collections management. As with the University of Lincoln blogs, WriteToReply and the proposed JISCPress platform aggregate published content into a site-wide ‘tags’ site that allows anyone to search and browse through all content that has been published to the public. In the case of the university blogs, that’s a large percentage of blogs, but for WriteToReply and JISCPress, it would be pretty much every document hosted on the platform.

You can see from the WriteToReply tags site that over time, a rich store of public documents could be created for querying and re-use. The site design is a bit clunky right now but under the hood you’ll notice that you can search across the text of every document, browse by document type and by tag. The tags are created by publishing the content to OpenCalais, which returns a whole bunch of semantic keywords for each document section. You’ll also notice that an RSS feed is available for any search query, any category and any tag or combination of tags.
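To make the ‘feed for anything’ point concrete, here is a rough sketch of how such feed URLs are put together. It assumes a standard WordPress install with pretty permalinks and the usual URL conventions (`+` between tag slugs for ‘all of these’, a comma for ‘any of these’); the base URL, slugs and helper names are invented for illustration, and the WriteToReply tags site may well be configured differently.

```python
from urllib.parse import urlencode


def tag_feed(base, tag_slugs, match_all=True):
    """Feed for posts tagged with all ('+') or any (',') of the given tag slugs."""
    separator = "+" if match_all else ","
    return f"{base.rstrip('/')}/tag/{separator.join(tag_slugs)}/feed/"


def search_feed(base, query):
    """Feed of search results: the normal search URL plus a feed parameter."""
    return f"{base.rstrip('/')}/?{urlencode({'s': query, 'feed': 'rss2'})}"


site = "http://example.com"  # hypothetical site address
print(tag_feed(site, ["broadband", "bt"]))
print(tag_feed(site, ["broadband", "bt"], match_all=False))
print(search_feed(site, "communications data"))
```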

Last night, I was thinking about the WriteToReply site architecture (note that when I mention WriteToReply, it almost certainly applies to JISCPress, too – same technology, similar principles, different content). Currently, we categorise each document by document type so you’ll see ‘Consultations’, ‘Action Plans’, ‘Discussion Papers’, etc. We author all documents under the WriteToReply username, too, and tag each document section both manually and via OpenCalais. However, there’s more that we could do, with little effort, to mark up the documents and I’ve started sketching it out.

You’ll see from the diagram that I’m thinking we should introduce location and subject categories. There will be formal classification schemes we could use. For example, I found a Local Government Classification Scheme, which provides some high level subjects that are the type of thing I’m thinking about. I’m not suggesting we start ‘cataloguing’ the documents, but simply borrow, at the top level, from recognised classification schemes that are used elsewhere. I’m also thinking that we should start creating a new author for each document and in the case of WriteToReply, the author would be the agency who issued the consultation, report, or whatever.

So following these changes, we would capture the following data (in bold), for example:

The Home Office created Protecting the public in a changing communications environment on April 27th which is a consultation document for England, Wales and Scotland, categorised under Information and communication technology with 18 sections.

Section one is tagged Governor, Home Department, Office of Public Sector Information, Secretary of State, Surrey.

Section two is tagged communications data, communications industry, emergency services, Home Secretary, Jacqui Smith MP, Rt Hon Jacqui Smith MP.

Section three is tagged Broadband, BT, communications, communications changes, communications data, communications data capability, communications data limits, communications environment, communications event, communications industry, communications networks, communications providers, communications service providers, communications services, emergency services, Her Majesty’s Revenue and Customs, Home Office, intelligence agencies, internet browsing, Internet Protocol, Internet Service, IP, mobile telephone system, physical networks, public telecommunications service, registered owner, Serious Organised Crime Agency, social networking, specified communications data, The communications industry, United Kingdom.

Section four is tagged …(you get the picture)

Section five, paragraph six, has the comment “fully compatible with the ECHR” is, of course, an assertion made by the government, about its own legislation. Has that assertion ever been tested in a court? authored by Owen Blacker on April 28th 11:32pm.

Selected text from Section five, paragraph eight, has the comment Over my dead body! authored by Mr Angry on April 28th 9:32pm

Note that every author, document, section, paragraph, text selection, category, tag, comment and comment author has a URI, Atom, RSS and RDF end point (actually, text selection and comment author feeds are forthcoming features).
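As a way of picturing the data captured above, here is a minimal sketch of it as a data structure. The class and field names are my own invention for illustration and do not correspond to anything in WordPress or the WriteToReply code; only the example values come from the consultation described above.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    paragraph: int  # paragraph number within the section


@dataclass
class Section:
    number: int
    tags: list = field(default_factory=list)
    comments: list = field(default_factory=list)


@dataclass
class Document:
    title: str
    author: str      # the issuing agency
    doc_type: str    # e.g. consultation, action plan
    locations: list  # proposed location categories
    subject: str     # proposed subject category
    sections: list = field(default_factory=list)

    def tags(self):
        """All distinct tags used anywhere in the document, sorted."""
        return sorted({tag for section in self.sections for tag in section.tags})


doc = Document(
    title="Protecting the public in a changing communications environment",
    author="The Home Office",
    doc_type="Consultation",
    locations=["England", "Wales", "Scotland"],
    subject="Information and communication technology",
    sections=[
        Section(2, tags=["communications data", "emergency services"]),
        Section(5, comments=[Comment("Mr Angry", "Over my dead body!", paragraph=8)]),
    ],
)
print(doc.tags())
```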

Now, with this basic architecture mapped out, we might wonder what Triplify could add to this. I’ve already shown in my earlier post that, with little effort, it re-publishes data from a relational database as N-Triples semantic data, so everything you see above, could be published as RDF data (and JSON, too).

So, in my simple view of the world, we have a data source that requires very little effort to generate content for and manage (JISCPress/WriteToReply/WordPress), a method of automatically publishing the data for the semantic web (Triplify) and, with TALIS, an API for data storage, data access, query, and augmentation.  As always, my mantra is ‘I am not a developer’, but from where I’m standing, this high-level ‘workflow’ seems reasonable.

The benefits for the JISC community would primarily be felt by using the JISCPress website, in a similar way (albeit with better, more informed design) to the WriteToReply ‘tags’ site. We could search across the full text of funding calls, browse the reports by author, categories and tags and grab news feeds from favourite authors, searches, tags or categories. This is all in addition to the comment, feedback and discussion features we’ve proposed, too. Further benefits would be had from ‘re-publishing’ the site content as semantic data to a platform such as TALIS. Not only could there be further Rapid Innovation projects which worked on this data, but it would be available for any member of the public to query and re-use, too. No longer would our final project reports, often the distillation of our research, sit idle as PDF files on institutional websites and in institutional repositories. If the documentation we produce is worth anything, then it’s worth re-publishing openly as semantic data.

Finally, in order to benefit from the (free) use of TALIS Connected Commons, the data being published needs to be licensed under a public domain or Creative Commons ‘zero’ licence. I suspect Crown Copyright is not compatible with either of these licenses, although why the hell public consultation documents couldn’t be licensed this way, I don’t know. Do you? For JISCPress, this would be a choice JISC could make. The alternative is to use the commercial TALIS platform or something similar.

As usual, tell me what you think… Thanks.

Triplify: Make your blog mashable

Last week, I wrote about how it is relatively simple to ‘pimp your ride on the semantic web’. Over the weekend, I stumbled upon Triplify, a small ‘plugin’ for pretty much any web publishing platform, that “reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.” What is so appealing about Triplify is how easy it is to implement, especially alongside a WordPress site.

I can confirm that the three-step installation process is all it takes, although I wouldn’t undertake implementing this blindly as you are, literally, exposing a semantic representation of your database content. In other words, you should look at the configuration file you’re using and check that it’s going to expose the right data and not clear text passwords and unpublished posts and comments. Before I  implemented it, I realised that it would expose comments on a bunch of posts that I have since made private (they were imported from an old, private blog), so I had to ‘unapprove’ those comments so the script didn’t expose them to the public. A five minute job. Alternatively, the script could probably be modified to work around my problem, by only exposing comments after a certain date, for example.
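The ‘only expose comments after a certain date’ idea boils down to a stricter query. The sketch below is not Triplify’s configuration: it uses an in-memory SQLite table as a stand-in for the MySQL database, with column names borrowed from WordPress’s `wp_comments` table, and the sample rows and cut-off date are invented.

```python
import sqlite3

# In-memory stand-in for the WordPress comments table (assumption: a real
# site would run an equivalent query against its MySQL database).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE wp_comments (
           comment_ID INTEGER PRIMARY KEY,
           comment_content TEXT,
           comment_approved TEXT,
           comment_date TEXT)"""
)
conn.executemany(
    "INSERT INTO wp_comments VALUES (?, ?, ?, ?)",
    [
        (1, "old comment from the private blog", "1", "2007-03-01 10:00:00"),
        (2, "unapproved comment", "0", "2009-02-01 10:00:00"),
        (3, "public comment", "1", "2009-03-01 10:00:00"),
    ],
)


def exposable_comments(connection, cutoff):
    """Return only approved comments made on or after the cut-off date."""
    rows = connection.execute(
        "SELECT comment_ID, comment_content FROM wp_comments "
        "WHERE comment_approved = '1' AND comment_date >= ? "
        "ORDER BY comment_ID",
        (cutoff,),
    )
    return rows.fetchall()


print(exposable_comments(conn, "2008-01-01 00:00:00"))
```

Only the third row survives both conditions, which is the behaviour you’d want before pointing any script at a live database.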

The end result is that, with a WordPress site, you expose a semantic representation of your users, posts, pages, tags, categories, comments and attachments in RDF (N-Triples) and JSON formatted data (for JSON, just add ‘?t-output=json’ to the end of the URI). Like I said though, it could be used on any database driven web application. Here’s what you get when you expose the high level links to your content:


<http://blog.josswinn.org/triplify/> <http://www.w3.org/2000/01/rdf-schema#comment> "Generated by Triplify V0.5 (http://Triplify.org)" .
<http://blog.josswinn.org/triplify/> <http://creativecommons.org/ns#license> <http://creativecommons.org/licenses/by/2.0/uk/> .
<http://blog.josswinn.org/triplify/post> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/attachment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/tag> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/category> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/user> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/comment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

Here’s an example of what you get when you expose the full content:


<http://blog.josswinn.org/triplify/post/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#has_creator> <http://blog.josswinn.org/triplify/user/1> .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/created> "2008-10-06T05:55:25"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#content> "Up early to go to Sheffield for LPI exams. The last week has left me underprepared. Never mind." .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/modified> "2008-10-06T20:12:15"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

...

<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/27> .

...

<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/41> .
<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/42> .

...

<http://blog.josswinn.org/triplify/post/154> <http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/belongsToCategory> <http://blog.josswinn.org/triplify/category/22> .

...

<http://blog.josswinn.org/triplify/tag/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/Tag> .
<http://blog.josswinn.org/triplify/tag/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/tagName> "valentine" .
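One way to read the output above is as one line per fact about a database row. The following is a rough Python sketch of that mapping, written purely for illustration: Triplify itself is a PHP script and I’m not claiming this is how it works inside. The function names and the row layout (`id`, `author`, `content`) are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"


def literal(value):
    """Quote a string as an N-Triples literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def post_triples(base, post):
    """Yield one N-Triples line per fact about a single post row."""
    subject = f"<{base}/post/{post['id']}>"
    yield f"{subject} <{RDF_TYPE}> <{SIOC}Post> ."
    yield f"{subject} <{SIOC}has_creator> <{base}/user/{post['author']}> ."
    yield f"{subject} <{SIOC}content> {literal(post['content'])} ."


row = {"id": 154, "author": 1, "content": "Never mind."}  # hypothetical row
for line in post_triples("http://blog.josswinn.org/triplify", row):
    print(line)
```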

You can choose to expose different levels of information in your HTML source. If you have more than a moderate amount of content, you’ll probably want to just expose the top level links as in the first example and let the users of your data dig deeper. You’ll also note that you can (and should) attach a license to your data.

A number of namespaces are recognised as well as a WordPress vocabulary.


$triplify['namespaces'] = array(
    'vocabulary' => 'http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/',
    'rdf'        => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'rdfs'       => 'http://www.w3.org/2000/01/rdf-schema#',
    'owl'        => 'http://www.w3.org/2002/07/owl#',
    'foaf'       => 'http://xmlns.com/foaf/0.1/',
    'sioc'       => 'http://rdfs.org/sioc/ns#',
    'sioctypes'  => 'http://rdfs.org/sioc/types#',
    'dc'         => 'http://purl.org/dc/elements/1.1/',
    'dcterms'    => 'http://purl.org/dc/terms/',
    'skos'       => 'http://www.w3.org/2004/02/skos/core#',
    'tag'        => 'http://www.holygoat.co.uk/owl/redwood/0.1/tags/',
    'xsd'        => 'http://www.w3.org/2001/XMLSchema#',
    'update'     => 'http://triplify.org/vocabulary/update#',
);
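The namespace list is what lets a short name like `sioc:Post` stand in for the full URIs seen in the output. Here is a small Python sketch of that expansion, using a few of the prefixes from the list above; the `expand` helper is my own illustration and not part of Triplify.

```python
# A subset of the prefixes from the Triplify configuration above.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "tag": "http://www.holygoat.co.uk/owl/redwood/0.1/tags/",
}


def expand(name, namespaces=NAMESPACES):
    """Turn a prefixed name such as 'sioc:Post' into a full URI."""
    prefix, separator, local = name.partition(":")
    if not separator or prefix not in namespaces:
        raise KeyError(f"unknown prefix in {name!r}")
    return namespaces[prefix] + local


print(expand("sioc:Post"))
print(expand("dcterms:created"))
```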

So, what’s the point in doing this? Well, it’s fairly trivial and if you think that structured, linked, machine-readable licensed data is a Good Thing, why not? The Triplify website lists a number of advantages:

Such a triplification of your Web application has tremendous advantages:

  • The installations of the Web application are better found and search engines can better evaluate the content.
  • Different installations of the Web application can easily syndicate arbitrary content without the need to adopt interfaces, content representations or protocols, even when the content structures change.
  • It is possible to create custom tailored search engines targeted at a certain niche. Imagine a search engine for products, which can be queried for digital cameras with high resolution and large zoom.

Ultimately, a triplification will counteract the centralization we faced through Google, YouTube and Facebook and lead to an increased democratization of the Web

The vision of the semantic web and semantic publishing is one of meaningfully identifying objects (and people) on the Internet and showing their relationships. This should improve searches for things on the web, but also improve how we exchange knowledge, re-use information and help clarify our identity on the web, too. It’s an ambitious task, but made easier with tools like Triplify.  The semantic web also raises questions over individual privacy and, if data is well formed and accessible, it may be easier to control and therefore censor. The creator of Triplify recently gave a technical presentation on Triplify and how it is being used to publish data collected by the OpenStreetMap project. It shows how geodata exposed in this way can result in mashup applications that directly benefit you and me.

Pimping your ride on the semantic web

Yesterday, I wrote about how I’d marked up my home page to create a semantic profile of myself that is both auto-discoverable and portable. A place where my identity on the web can be aggregated; not a hole I’ve dug for myself, but an identity that reaches out across the web but always leads back home.

While I enjoy polishing my text editor regularly and hand-crafting beautifully formed, structured data, we all know it’s a fool’s game and that the semantic web is about machines doing all the work for us. So here’s a quick and dirty run down of how to pimp your ride on the semantic web with WordPress and a few plugins.

You’ll need a self-hosted WordPress site that allows you to install plugins. I’ve got one on Dreamhost that costs me $6 a month. Next, you’ll want to install some plugins. I’ll explain what they do afterwards. One thing to note here is that I’m using plugins from the official plugin repository whenever possible. It means that you can install them from the WordPress Dashboard and you’ll get automatic updates (and they’re all GPL compatible). In no particular order…

I think that’s quite enough. All but the SIOC plugin are available from the official WordPress plugin repository. Here’s what they provide:

APML: Attention Profile Markup Language

APML (Attention Profiling Mark-up Language) is an XML-based format for capturing a person’s interests and dislikes. APML allows people to share their own personal attention profile in much the same way that OPML allows the exchange of reading lists between news readers.

The plugin creates an XML file like this one that marks up and weighs your WordPress tags as a measure of your interests. It also lists your blogroll/links and any embedded feeds.

Extended Profile

This plugin adds additional fields in your user profile which is encoded with hCard semantic microformat markup and can then be displayed in a page or as a sidebar widget. You can import hCard data, too. There might also be another use for this, too. (see below)

Micro Anywhere

Provides a couple of additional editor functions that allow you to create an hCard or hCalendar events page. Here’s an example.

OpenID

This plugin allows users to login to their local WordPress account using an OpenID, as well as enabling commenters to leave authenticated comments with OpenID. The plugin also includes an OpenID provider, enabling users to login to OpenID-enabled sites using their own personal WordPress account. XRDS-Simple is required for the OpenID Provider and some features of the OpenID Consumer.

This is key to your identity. You can use your blog URL as your OpenID or delegate a third-party service, such as MyOpenID or ClaimID. In fact, you’ve almost certainly got an OpenID already if you have a Yahoo!, Google, MySpace or AIM account. It’s up to you which one you choose to use as your persistent ID. Read more about OpenID here. It’s important and so are the issues it addresses.

XRDS-Simple

This is required to add further functionality to the OpenID plugin. It adds Attribute Exchange (AX) to your OpenID which basically means that certain profile information can be passed to third-party services (less form filling for you!) Like a lot of these plugins, install it and forget about it.

SIOC

Provides auto-discoverable SIOC metadata. “A SIOC profile describes the structure and contents of a weblog in a machine readable form.”

wp-RDFa

Provides an auto-discoverable FOAF (Friend of a Friend) profile, based on the members of your blog. I’ve been in touch with the author of this plugin and suggested that the extended profile information could also be pulled into the FOAF profile. This is largely dependent on the FOAF specification being finalised, but expect this plugin to do more as FOAF develops.

OAI-ORE Map

Provides an auto-discoverable OAI-ORE resource map of your blog. It conforms to version 0.9 of the specification, which recently made it to v1.0, so I imagine it will be updated in the near future. OAI-ORE metadata describes aggregated resources, so instead of seeing your blog post permalink as the single identifier for, say, a collection of text and multimedia, it creates a map of those resources and links them.

LinkedIn hResume

LinkedIn hResume for WordPress grabs the hResume microformat block from your LinkedIn public profile page allowing you to add it to any WordPress page and apply your own styles to it.

I like this plugin because you benefit from all the features of LinkedIn, but can bring your profile home. Ideal for students or anyone who wants to create a portfolio of work and offer their resume/CV on a single site. Depending on the theme you use, it does require some additional styling.

Get_OPML

This is a nice way to create an OPML file of your sidebar links. If, like on my personal blog, your links point to resources related to you, you can easily create an OPML file like this one. There’s a couple of things to note about this plugin though. The instructions mention a Technorati API key. I didn’t bother with this. When you create your links, just scroll down the page to the ‘advanced’ section and add the RSS feed there. Secondly, the plugin author has, for some stupid reason, hard-coded the feed to their own site into the plugin. Assuming you don’t want this spamming your personal OPML file, download a modified version from here or comment out line 101 in get-opml.php. I guess the plugin author thinks that you’ll be using this to import the OPML into a feed reader and from there, you can delete his feed. That’s no good to us though. Finally, you’ll want to make your OPML file auto-discoverable. You can do this by adding a line of html in your header, using the Header-Footer plugin below.

Header-Footer

This simply allows you to add code to the header and footer of your blog. In our case, you can use it to add an auto-discovery link to the header of every page of your blog.


<link rel="outline" type="text/xml+opml" title="ADD YOUR TITLE HERE" href="http://YOUR_BLOG_ADDRESS/opml.xml" />

WP Calais * + tagaroo

These three plugins use the OpenCalais API to examine your blog posts and return a bunch of semantic tags. I’ve written about this in more detail here (towards the end).

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

It’s an easy way to add relevant tags to your content and broadcast your content for indexing by OpenCalais. They place an additional link in your header that lists the tags for web crawlers and, I guess, improves the SEO for your site.

Extra Feed Links

I’ve written about this plugin previously, too. It adds additional autodiscovery links to your blog for author, category and tag feeds. WordPress feed functionality is very powerful and this plugin makes it especially easy to make those feeds visible.

Lifestream

This isn’t a semantic web plugin, but is a powerful way of aggregating all of your activity across the web into a single activity stream. See my example, here. It also produces a single RSS feed from your aggregated activity. Nice 😉

Wrapping things up

If you set all of this up, you’ll have a WordPress site that can act as your primary identity across the web, aggregates much of your activity on the web into a single site and also offers multiple ways for people to discover and read your site. You also get a ‘well-formed’ portfolio that is enriched with semantic markup and links you to the wider online community in a way that you control.

Bear in mind that some of these plugins might not appear to do anything at all. The semantic web is about machines being able to read and link data, right? If you look closely in the source of your home page, you’ll see a few lines that speak volumes about you in machine talk.


<link rel="meta" href="./wp-content/plugins/wp-rdfa/foaf.php" type="application/rdf+xml" title="FOAF"/>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<link rel="meta" type="text/xml" title="APML" href="http://blog.josswinn.org/apml/" />
<link rel="alternate" type="application/rss+xml" title="NoteStream RSS Feed" href="http://blog.josswinn.org/feed/" />
<link rel="resourcemap" type="application/atom+xml" href="http://blog.josswinn.org/wp-content/plugins/oai-ore/rem.php"/>

If you do want a way to view the data, I recommend the following Firefox add-ons:

Operator: Auto-discovers any embedded microformats and provides useful ways to search for similar data via third-party services elsewhere on the web.

OPML Reader: Auto-discovers an OPML file if you have one linked in your header. Allows you to either download the file or read it on Grazr.

Semantic Radar: Auto-discovers embedded RDF data. Displays custom icons to indicate the presence of FOAF, SIOC, DOAP and RDFa formats.

The Tabulator Extension: Auto-discovers and provides a table-based display for RDF data on the Semantic Web. Makes RDF data readable to the average person and shows how data are linked together across different sites.

As always, please let me know how this overview could be improved or if you know of other ways to add semantic functionality to your WordPress blog. Thanks.

Addicted to feeds

I’ve been a long time consumer of news feeds and spend a lot of time reading the web via 200+ feeds in Google Reader. More recently, and largely as a result of working on WriteToReply, I’ve become just as addicted to publishing feeds from any data end point I can find.

WordPress makes this quite easy for developers, providing a whole load of functions and template tags for feeds. For the rest of us, there’s also documentation, which is useful if you’re wondering what kinds of feeds can be generated from a basic WordPress site.

All the examples below assume you’re using ‘pretty URLs’. If your URLs are something like http://example.com/?p=123 then the same principles apply but you’ll use the format /?feed=feed_type, e.g. http://example.com/?feed=rss2. The documentation shows full examples.

So, here are the basic content feeds. RSS is RSS version 0.92 and RDF is RSS version 1.0, if you were wondering.

http://example.com/feed/
http://example.com/feed/rss/
http://example.com/feed/rss2/
http://example.com/feed/rdf/
http://example.com/feed/atom/
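If you'd rather generate these addresses from a script than type them out, here's a rough Python sketch of the two URL styles. It isn't WordPress code: the helper name feed_url is made up for illustration, and I'm assuming the default feed is rss2 when pretty URLs are switched off.

```python
from urllib.parse import urlencode

# Feed types listed above; 'rss' is RSS 0.92 and 'rdf' is RSS 1.0.
FEED_TYPES = ("rss", "rss2", "rdf", "atom")


def feed_url(site, feed_type=None, pretty=True):
    """Build a content feed URL for a WordPress site (hypothetical helper).

    site      -- base address, e.g. 'http://example.com'
    feed_type -- one of FEED_TYPES, or None for the default feed
    pretty    -- True for /feed/<type>/ URLs, False for /?feed=<type>
    """
    if feed_type is not None and feed_type not in FEED_TYPES:
        raise ValueError("unknown feed type: %r" % feed_type)
    site = site.rstrip("/")
    if pretty:
        if feed_type is None:
            return site + "/feed/"
        return "%s/feed/%s/" % (site, feed_type)
    # Without pretty URLs the feed type travels in the query string.
    # Falling back to rss2 here is an assumption, not something the docs promise.
    return "%s/?%s" % (site, urlencode({"feed": feed_type or "rss2"}))


print(feed_url("http://example.com", "atom"))
print(feed_url("http://example.com", "rss2", pretty=False))
```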

It’s also pretty straightforward to create a feed from a category or tag

http://example.com/category/my_category/feed/
http://example.com/tag/my_tag/feed/

You can also create feeds from combined tags

http://example.com/tag/tag1+tag2+tag3/feed/

And we know that a feed is available for site comments

http://example.com/comments/feed/

and it’s simple to grab a feed of comments from a single post by appending /feed/ to the end of the post permalink.

http://example.com/2009/01/01/my-latest-post/feed

You can also create a feed of a single post itself, by appending '?withoutcomments=1' to the end of the URL

http://example.com/2009/01/01/my-latest-post/feed/?withoutcomments=1

There is a feed for each author of the blog

http://example.com/author/joss/feed

but alas, as far as I know, no feed for the comments by any particular person.

You can also do something fancy with dates

http://example.com/2009/feed
http://example.com/2009/01/feed
http://example.com/2009/01/15/feed

and one of my favourite types of feed is from a search

http://example.com/?s=search_term&feed=rss2
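Search terms with spaces or punctuation need escaping before they go into the URL. A minimal Python sketch, assuming the same ?s= and feed= parameters shown above (search_feed_url is my own hypothetical helper, not part of WordPress):

```python
from urllib.parse import urlencode


def search_feed_url(site, term, feed_type="rss2"):
    """Return the feed URL for a search on a WordPress site (hypothetical helper)."""
    # urlencode escapes spaces and punctuation so multi-word searches survive.
    query = urlencode({"s": term, "feed": feed_type})
    return "%s/?%s" % (site.rstrip("/"), query)


print(search_feed_url("http://example.com", "open education"))
# prints http://example.com/?s=open+education&feed=rss2
```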

Now all of this is well and good, but how many readers are going to know or care about constructing the various types of feeds available? Fortunately, it’s possible to make many of these feeds auto-discoverable either by adding some simple code to your theme’s header.php or installing a plugin.

By default, two feeds are auto-discoverable on your WordPress site: an Atom feed and an RSS2 feed of your posts.

By using the Extra Feed Links plugin, you can make your comments, category, tag, author and search feeds autodiscoverable.
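To see what 'auto-discoverable' means in practice, here's a small Python sketch of what a feed reader or browser roughly does: it scans the page for link elements with rel="alternate" and a feed MIME type. The class and function names are my own, and real readers are more forgiving than this.

```python
from html.parser import HTMLParser

# MIME types that the examples in this post use for feeds.
FEED_MIME_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
}


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []  # list of (title, href) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_MIME_TYPES:
            self.feeds.append((attr.get("title"), attr.get("href")))


def find_feeds(html):
    """Return the (title, href) pairs of every feed advertised in the HTML."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


page = (
    '<head><link rel="alternate" type="application/rss+xml" '
    'title="NoteStream RSS Feed" href="http://blog.josswinn.org/feed/" />'
    '<link rel="stylesheet" href="style.css"></head>'
)
print(find_feeds(page))
```

Run against your own home page source, this should list the same feeds that the RSS icon in the browser offers.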

It’s also got a useful template tag that allows you to show the feed links in your theme, making the discovery of feeds even easier. I created a simple widget for the plugin to display the feed and an RSS icon in the sidebar.

Here’s the code. Let me know where it could be improved as I just hacked it together from looking at other widgets.

<?php
// Register the widget once WordPress has initialised.
function widget_extrafeeds_register() {
    // Output callback: prints the feed links inside the sidebar markup.
    function widget_extrafeeds($args) {
        // Provides $before_widget, $before_title, $widget_name etc.
        extract($args);
        ?>
        <br />
        <?php echo $before_widget;
        echo $before_title;
        echo $widget_name;
        echo $after_title; ?>
        <ul class="sidebarList">
        <?php extra_feed_link(); ?> <?php extra_feed_link('http://path/to/your/feed/icon/feed.png'); ?>
        </ul>
        <?php echo $after_widget; ?>
        <?php
    }
    // Register the widget once only; the callback name is passed as a string.
    register_sidebar_widget('Extra Feeds', 'widget_extrafeeds');
}
add_action('init', 'widget_extrafeeds_register');
?>

To get this to work with the plugin, save the code above as widget.php in the plugin’s folder and add this to the very bottom of the plugin’s main.php file

// widget support
require(dirname(__FILE__) . '/widget.php');

Like I said, if anyone can improve on this, do let me know. Also note that you’ll need to point the URL in the widget to a feed icon. A lot of themes include them in their /images/ directory, which makes it easy.

By using the widget or template tag, you can have these appearing on the relevant pages.

Try it by using http://writetoreply.org/tags 🙂

If you’re interested in how to add category and tag auto-discovery feeds to your theme’s source code, try adding this to your header.php

<?php if (is_category()) { ?>
<link rel="alternate" type="application/atom+xml" title="<?php bloginfo('name'); ?> &raquo; <?php single_cat_title(''); ?> Atom Feed" href="<?php echo get_category_feed_link(get_query_var('cat'), 'atom'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/atom+xml" title="<?php bloginfo('name'); ?> &raquo; <?php single_tag_title(''); ?> Atom Feed" href="<?php echo get_tag_feed_link(get_query_var('tag_id'), 'atom'); ?>" />
<?php } ?>

<?php if (is_category()) { ?>
<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> &raquo; <?php single_cat_title(''); ?> RSS2 Feed" href="<?php echo get_category_feed_link(get_query_var('cat'), 'rss2'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> &raquo; <?php single_tag_title(''); ?> RSS2 Feed" href="<?php echo get_tag_feed_link(get_query_var('tag_id'), 'rss2'); ?>" />
<?php } ?>

<?php if (is_category()) { ?>
<link rel="alternate" type="application/rdf+xml" title="<?php bloginfo('name'); ?> &raquo; <?php single_cat_title(''); ?> as RDF data" href="<?php echo get_category_feed_link(get_query_var('cat'), 'rdf'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/rdf+xml" title="<?php bloginfo('name'); ?> &raquo; <?php single_tag_title(''); ?> as RDF data" href="<?php echo get_tag_feed_link(get_query_var('tag_id'), 'rdf'); ?>" />
<?php } ?>

I learned this from the author of this related plugin, which is similar but not quite as powerful as the Extra Feed Links plugin.

Finally, if you use FeedBurner, beware that it breaks some of the above feeds. One fix for ensuring that tag and category feeds continue to work as they should is to modify the FeedBurnerFeedSmith plugin as noted here. Simply change the line

is_feed() && $feed != 'comments-rss2' && !is_single() &&

to read

is_feed() && $feed != 'comments-rss2' && !is_single() && !is_tag() &&

That’ll do for now. I intend to learn more about the RSS and Atom specifications over the next few weeks and will post anything I think relevant here. If you can add anything to this post, please do leave a comment. Thanks.

Open Calais + site-wide tags = semantic site architecture

Preamble about people

Over the last month, I’ve started to grow an embryonic social web publishing platform that can be many things but fundamentally offers a personalised and collaborative environment for research, teaching and learning. (Where? You’re looking at it!). There are a few active blogs (currently fewer than on the pilot Learning Lab blogs), nearly 70 users and the word is starting to get out at a pace that I can manage. So, now it’s time to look to the future…

By running BuddyPress, the connections between people are pretty much taken care of. Sign in to http://blogs.lincoln.ac.uk with a Lincoln username and password and you’ve joined a community that, as it grows, will increasingly and effortlessly connect people through the information they choose to add to their profile. Staff and students can click on a link and find other people who have similarly tagged their profile.

Notice the comma-separated hyper-linked data

What is of equal interest to me, and potentially very useful to the university community, is how we link the content that is being generated by staff and students and make those links accessible. It is not difficult to appreciate what the potential is when you have a revolving community of 10,000 people who, over time, document their work, their research, teaching and learning using cutting edge web publishing tools, but I’m writing this post to try and understand and sketch out how I might evolve what I have begun.

Put simply, WordPress Multi-User (WPMU) allows one person (me) to provide and manage multiple web sites which other people (staff and students) take ownership of. Typically, every action, every new user and every new page and post on every site, is recorded and held in a shared database(s). Although at this low level, the data is relational, on the surface, when you look at one of the sites, they pretty much stand alone and so they should. We’re not talking about a single website with lots of users, we’re talking about lots of websites with lots of users. They might be working collaboratively with others, but they’re working as individuals or in distinct groups that benefit from a distinct online identity. BuddyPress helps bring things together by aggregating people’s actions (i.e. posting blog updates, making friends, joining groups, posting messages) but the visibility of those connections is transient. Social networks display our actions along a timeline and the connections between people are, for the most part, buried until the next time person A interacts with person Y.

Enough about connecting people.

Site-wide content aggregation

Site content is a mixture of text, multimedia and metadata. The last thing I’ll do when completing this blog post is to categorise and tag it. Each time I write, I publish text, (sometimes images) and metadata which summarises and categorises the full text. Why am I telling you this? You know it already. What you may not know is that each post created on our university WPMU installation, by any person, providing their blog is public, is aggregated into a single site and re-published a second time. So this post exists here on this site and there, on the Community Posts site. Notice how the Community Posts version links back to the original post. We’re not creating a whole new resource, we’re creating a powerful linked resource that allows others to search, filter, browse and discover content held across multiple sites. With only a few sites up and running here at the moment, the opportunity to discover varied content is limited, but over time that will change. Look at wordpress.com, where there are 5 million sites:

Browse by user-generated metadata

Search over 5 million sites

On the university blogs, this is made possible through the use of the site-wide-tags plugin, which was developed by @donncha, the same person that develops WPMU and the wordpress.com site. By using this plugin, a WPMU installation can share similar functionality to what you see on wordpress.com. I say ‘similar’ because, as I’ll mention later, designing how people discover content is key to all of this and something I, or we as a community, would benefit from thinking about and acting on collectively.

Community Posts

On the Community Posts site, you can search the full-text of every post, filter resources by category and tag, and subscribe to feeds from any combination of tag or category. Any search can be turned into a feed by appending ‘&feed=rss’ to the end of the resulting URL.

i.e. http://tags.blogs.lincoln.ac.uk/?s=gaming&feed=rss

To create a feed from a tag or category, just click on a tag or category and append ‘/feed’ to the end of the URL.

i.e. http://tags.blogs.lincoln.ac.uk/tag/games/

You can combine tags with ‘+’, too:

http://tags.blogs.lincoln.ac.uk/tag/games+development/

You can also specify the type of feed you want by appending:

/feed/rss/
/feed/rss2/
/feed/rdf/
/feed/atom/

Mixing categories and tags is currently broken by a bug but is due to be fixed in the next version of WordPress.

So it’s not difficult to imagine, over time, an active community of thousands of university web publishers, having their content aggregated into a site-wide resource that allows full text searching, browsing and filtering with a choice of feeds to syndicate that content elsewhere. See how it’s happening at the University of Mary Washington, where over 2400 sites have been created in under three years.
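Syndicating that content elsewhere mostly comes down to reading the feed XML. Here's a minimal Python sketch that pulls the title, link and tags out of an RSS 2.0 document. The sample feed is invented for illustration, and I'm assuming (as far as I know) that WordPress lists a post's categories and tags as category elements.

```python
import xml.etree.ElementTree as ET


def read_rss_items(rss_xml):
    """Return a list of dicts with title, link and tags for each RSS 2.0 item."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Assumption: categories and tags both arrive as <category> elements.
            "tags": [c.text for c in item.findall("category") if c.text],
        })
    return items


# Invented sample feed, standing in for something like a Community Posts tag feed.
SAMPLE = """<rss version="2.0"><channel><title>Community Posts</title>
<item><title>A post about games</title>
<link>http://example.com/2009/01/01/a-post/</link>
<category>games</category><category>development</category></item>
</channel></rss>"""

for entry in read_rss_items(SAMPLE):
    print(entry["title"], entry["link"], entry["tags"])
```

In real use you would fetch the feed URL first and pass the response body to read_rss_items; I've left the network call out so the sketch runs on its own.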

Semantic technology

Yesterday, I discovered OpenCalais. It’s a semantic technology that’s been around since January 2008, so you might be tired of hearing about it, but if not, ‘Welcome to Web 3.0!’

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

Nice. And it’s installed on this site. There are three Calais plugins available for WordPress. This one allows writers to submit their blog posts to the OpenCalais web service API and fetch back a number of auto-generated tags based on the content of their post. The longer the post, the more tags are returned, and they come back in just seconds. Those tags can be added to the post in their entirety or used selectively (actually, you have to add them all and then remove those you don’t want to include – a minor irritation). This next plugin automatically goes through every post you’ve written and tags each one using the Calais web service. It’s all or nothing, but following the auto-tagging of archive content, you can then go to the ‘tags’ menu and delete any tags you don’t want to use. I’ve done that to this site and to the Community Posts site. Calais looks for names, facts and events, and the API allows for up to 40,000 transactions a day and up to four per second. It returns some predictable tags and a few odd ones, but on the whole it is fast and works like magic.

The third plugin also allows blog authors to fetch tags for the post they are writing and, in addition, it also suggests Creative Commons licensed images based on a dynamic evaluation of the chosen or suggested tags.

The tagaroo interface

Image suggestion is a nice idea, but tends to return some fairly generic images.

Having used OpenCalais to auto-tag the Community Posts site, a whole new and richer set of semantic metadata has been added with barely any effort. The challenge now is to figure out how to 1) automate this as a scheduled process, so that the Calais plugin looks for new content every hour, say, and tags whatever has been recently introduced (a cron job that calls the plugin and a modification to the plugin to look at the timestamp of the post and ignore anything older than when it was last run?); 2) present the semantic data in an accessible way and this mostly, I think, comes down to appropriate site design.  The wordpress.com screenshots above show one way of doing it. A del.icio.us style approach is a more powerful and versatile model of tag filtering. Until then, it’s a matter of constructing filters, searches and feeds in the way I’ve outlined above.
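The timestamp idea in point 1 is simple enough to sketch. This is Python rather than plugin code, and everything in it is hypothetical (the function name, the shape of the post records); it only shows the selection step a scheduled job would need before handing posts to Calais.

```python
from datetime import datetime


def posts_to_tag(posts, last_run):
    """Return posts published after the previous run, oldest first.

    posts    -- iterable of dicts with 'id' and 'published' (datetime) keys
    last_run -- datetime of the previous run, or None on the first run
    """
    # On the first run there is no timestamp, so every post qualifies.
    fresh = [p for p in posts if last_run is None or p["published"] > last_run]
    return sorted(fresh, key=lambda p: p["published"])


# Invented records standing in for rows from the posts table.
posts = [
    {"id": 1, "published": datetime(2009, 1, 1, 9, 0)},
    {"id": 2, "published": datetime(2009, 1, 3, 9, 0)},
    {"id": 3, "published": datetime(2009, 1, 2, 9, 0)},
]
last_run = datetime(2009, 1, 1, 12, 0)
print([p["id"] for p in posts_to_tag(posts, last_run)])
```

An hourly cron job would call something like this, tag what comes back, and then store the current time as the new last_run.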

So how might all of this semantically structured data be used? It seems to me that most of the advantages are proportional to the quantity of information available. For teaching and learning, it could be used by students and staff who want to find and re-use material that has been posted in the past for a specific course or subject area. Great for new students who want to measure the type and quality of work produced by students in previous years. In a similar way, it could be used by staff looking for posts by colleagues on subjects they might be teaching, and because searches and tags can be turned into feeds, past content could be aggregated into a new course site. A widely adopted, semantically tagged WPMU installation could also reveal trends in the type of work occurring at the university and, by tagging names of people, queries against references to Prof. X’s work could be made (I also wonder whether through the use of feeds, content from the institutional repository could be joined up with all of this, too – but it’s late in the day and I can’t think straight).

You’ll see from the image below that using Calais on the Community Posts site resulted in a much richer variety of tags than would have appeared if we relied on user-generated tagging alone (136 posts now have 558 tags). Some people don’t even bother to tag their work… Shame on them! Notice too, that with the Firefox Operator plugin, you can take a tag on the site and use it to find related resources elsewhere. So if you’re looking at work tagged ‘client-applications’ on WPMU, you can conveniently hop over to delicious and find further web resources or, on a whim, look at what books on this subject are available on Amazon.

Operator provides a way to use tags on one site to discover related resources on another site

Anyway, if you’re still reading, you might remember from the title of this post that my overriding interest in all of this is how it can be understood as and developed into a site-wide ‘architecture’. Again, I’m thinking how user-generated tags have determined the way delicious is designed for navigation and searching of resources. I need to learn more about how WordPress themes are constructed and consider how available functions can be best exploited and usefully presented on this type of site. If you have any ideas or want to work on a specific theme to get the most out of the site-wide-tags plugin, please do leave a comment or get in touch on Twitter @josswinn