Posts tagged XML

In my previous job as Audiovisual Archivist, I spent a lot of time examining various metadata standards in detail; hours spent poring over PBCore, METS, MODS, MIX, EXIF and IPTC/XMP, because we were designing a content model for an in-house Digital Asset Management system. I thought I had put it all behind me, yet here I am staring at Phil Barker’s informative post about ‘metadata and resource description’ and it’s all coming back to me… Arrghhh :-)

Workpackage six of the Chemistry.fm project aims to:

  1. Plan the storage, delivery and marketing of the course.
  2. Choose a metadata standard.
  3. Evaluate third-party hosting such as Flickr, Slideshare and YouTube as well as JORUM and the IR.

Ah, if only life were as simple as a series of bullet points!

As I was creating the project poster yesterday, I was reminded of the various ways that our project OERs could be ‘broadcast’. Although collaboration with our community radio station, SirenFM, is core to the approach of our project, we all know that there are many ways for anyone to be a broadcaster on the web, and part of the fun of this project for me is being able to explore the different ways that educational content can be pulled and pushed between subscribing students and members of the public.

My plan at the moment is to use our Institutional Repository as the ‘canonical reference’ for the OERs. During our JISC-funded LIROLEM project, we developed EPrints to better accommodate multimedia resources and it makes sense to use a versioned digital archive that supports embedded media enriched by copious amounts of metadata. (I know it’s a requirement to use JORUM, too, but at the first Programme Meeting, it became clear that JORUM can be used simply as a directory where we can register URIs of existing OERs, so that’s what I’ll be doing).

Anyway, Archivists, have you ever feasted your eyes on the source code of an EPrint? Of course you have. Here’s a reminder.

Looking at the (draft) Metadata Guidelines for the OER Programme, you can see that the following are covered by EPrints:

  • programme tag [there is no "DC.keyword" term, so EPrints uses name="eprints.keywords"]
  • title [name="DC.title"]
  • author [name="DC.creator"]
  • date [name="DC.date"]
  • url [name="DC.identifier"]
  • technical information [name="DC.format"]
  • language [hmmm, nowhere to be seen. Can we add that?]
  • subject classification [name="DC.subject"]
  • keywords/tags [there is no "DC.keyword" term, so EPrints uses name="eprints.keywords"]
  • comments [We use the SNEEP plugins but the comments are not showing in the source code - do we need to make sure they are crawlable? Some people aren't keen...]
  • description [name="DC.description"]

I’ve highlighted the Dublin Core terms above, but happily, the data is available in several other alternate formats:

<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/HTML/lirolem-eprint-1543.html" title="HTML Citation" type="text/html; charset=utf-8" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/Text/lirolem-eprint-1543.txt" title="ASCII Citation" type="text/plain; charset=utf-8" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/ContextObject/lirolem-eprint-1543.xml" title="OpenURL ContextObject" type="text/xml" />

<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/EndNote/lirolem-eprint-1543.enw" title="EndNote" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/BibTeX/lirolem-eprint-1543.bib" title="BibTeX" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/MODS/lirolem-eprint-1543.xml" title="MODS" type="text/xml" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/COinS/lirolem-eprint-1543.txt" title="OpenURL ContextObject in Span" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/DIDL/lirolem-eprint-1543.xml" title="DIDL" type="text/xml" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/XML/lirolem-eprint-1543.xml" title="EP3 XML" type="text/xml" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/JSON/lirolem-eprint-1543.js" title="JSON" type="text/javascript; charset=utf-8" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/DC/lirolem-eprint-1543.txt" title="Dublin Core" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/RIS/lirolem-eprint-1543.ris" title="Reference Manager" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/EAP/lirolem-eprint-1543.xml" title="Eprints Application Profile" type="text/xml" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/Simple/lirolem-eprint-1543.txt" title="Simple Metadata" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/Refer/lirolem-eprint-1543.refer" title="Refer" type="text/plain" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/METS/lirolem-eprint-1543.xml" title="METS" type="text/xml" />
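These alternate links are machine-discoverable, so a script can list the export formats an EPrint offers without scraping the visible page. What follows is only a minimal sketch using Python's standard library; the class name AlternateLinkParser and the two-line SAMPLE_HEAD are illustrative assumptions, not anything EPrints itself provides.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect <link rel="alternate"> elements as (title, type, href) tuples."""

    def __init__(self):
        super().__init__()
        self.exports = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        if "alternate" in rels:
            self.exports.append(
                (attributes.get("title"), attributes.get("type"), attributes.get("href"))
            )


# Two of the links quoted above, used as sample input.
SAMPLE_HEAD = """
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/MODS/lirolem-eprint-1543.xml" title="MODS" type="text/xml" />
<link rel="alternate" href="http://eprints.lincoln.ac.uk/cgi/export/1543/JSON/lirolem-eprint-1543.js" title="JSON" type="text/javascript; charset=utf-8" />
"""

parser = AlternateLinkParser()
parser.feed(SAMPLE_HEAD)
for title, mime_type, href in parser.exports:
    print(title, mime_type, href)
```

Feeding it the full head of an EPrint page should return one tuple per export format, which could then be filtered by title or MIME type.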

Now, we could choose to lump all the OERs that we create into one single EPrint, but that doesn’t give us much flexibility. Remember that EPrints is serving as the canonical reference for the OERs, not necessarily the final presentation layer that people will actually use to browse and download the resources. So if we were to group the OERs into sets of items, each set constituting an EPrint, and then relate those EPrints to each other using the “DC.isPartOf” property, then from the point of view of metadata we’d be creating a consistent whole while giving ourselves some flexibility in how we ‘broadcast’ the content of the course.

EPrints DC.relation
Dublin Core relationships

If we consider the course MindMap that we knocked up a while back, we might decide to create a single EPrint for each of the five major ‘nodes’ of the course. Doing this would then give us RSS 1.0 (RDF), RSS 2.0 and Atom feeds for the course, where each node is an item.

Introductory Chemistry Mindmap
Course MindMap

Before I move on with this, look at the export formats that EPrints offers for a query. Imagine that the course could be exported in each of these ways:

EPrints export formats
Exporting from EPrints

The zip export allows you to download the entire query and all its resources at once. The HTML citation format allows you to produce some HTML you could copy and paste into any web page. It could just as easily be dropped into Blackboard as it could on any other (and anybody’s) web page. BibTeX would allow you to browse the course via your preferred reference management software and JSON… I still don’t completely get it, but it’s pretty fancy, I know that much.

Anyway, if each of the mindmap nodes is an ‘item’ in the RSS feed, then perhaps we can use that to feed a WordPress site, using the FeedWordPress plugin? Nope. It doesn’t seem to work. FeedWordPress recognises the feed but doesn’t import anything. Testing it with another feed based on keywords does work, but the information included in the feed is sparse, so that’s no good. By the way, the EPrints RSS 2.0 feed does include the xmlns:media="http://search.yahoo.com/mrss" namespace and marks up the preview thumbnails accordingly:


<media:thumbnail url="http://eprints.lincoln.ac.uk/1543/thumbnails/15/small.png" type="image/png"></media:thumbnail><media:content url="http://eprints.lincoln.ac.uk/1543/thumbnails/15/preview.png" type="image/png"></media:content>
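To check what a feed consumer would actually see, the media elements can be pulled out of the feed with a namespace-aware parser. This is a rough sketch only: SAMPLE_FEED is a hand-made wrapper around the two elements quoted above, and media_urls is a hypothetical helper. The namespace URI is passed as a parameter because the Media RSS specification writes it with a trailing slash, whereas the feed quoted here does not, and the two must match exactly for the lookup to succeed.

```python
import xml.etree.ElementTree as ET

# Namespace exactly as declared in the feed quoted above (no trailing slash).
MEDIA_NS = "http://search.yahoo.com/mrss"

# Hand-made sample wrapping the two media elements shown above.
SAMPLE_FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">
<channel><title>Example</title>
<item><title>Example item</title>
<media:thumbnail url="http://eprints.lincoln.ac.uk/1543/thumbnails/15/small.png" type="image/png"></media:thumbnail>
<media:content url="http://eprints.lincoln.ac.uk/1543/thumbnails/15/preview.png" type="image/png"></media:content>
</item></channel></rss>"""


def media_urls(feed_xml, ns_uri=MEDIA_NS):
    """Return one dict per <item> with its thumbnail and content URLs (or None)."""
    root = ET.fromstring(feed_xml)
    ns = {"media": ns_uri}
    results = []
    for item in root.iter("item"):
        thumbnail = item.find("media:thumbnail", ns)
        content = item.find("media:content", ns)
        results.append({
            "title": item.findtext("title"),
            "thumbnail": thumbnail.get("url") if thumbnail is not None else None,
            "content": content.get("url") if content is not None else None,
        })
    return results


for entry in media_urls(SAMPLE_FEED):
    print(entry)
```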

(Another way to tackle this might be using our newly developed ‘EPrints2Blog’ plugin, which allows a depositor to post information about their new EPrint to a blog of their choice (using XML-RPC). As we deposit the course EPrints, each could be posted to a WordPress site. The resulting feed from the WordPress site does include some embedded media, but it’s still a bit of a hack. No, scrap this idea).

Post2Blog
Post2Blog: An XML-RPC plugin for EPrints

Podcasting from Eprints in WordPress

Right, how about this…?

Using EPrints as the canonical source for each of the files for the course, we could create a WordPress site with the addition of the Dublin Core and OAI-ORE plugins for WordPress.

For each WordPress post, this gives us the following metadata:


<meta name="DC.publisher" content="../learninglab/joss" />

<meta name="DC.publisher.url" content="http://joss.blogs.lincoln.ac.uk/" />

<meta name="DC.title" content="Thinking the unthinkable" />

<meta name="DC.identifier" content="http://joss.blogs.lincoln.ac.uk/2009/10/08/thinking-the-unthinkable/" />

<meta name="DC.date.created" scheme="WTN8601" content="2009-10-08T16:14:54" />

<meta name="DC.creator" content="Joss" />

<meta name="DC.rights.rightsHolder" content="Joss" />

<meta name="DC.subject" content="Funding" />

<meta name="DC.rights.license" content="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/" />

<link rel="alternate" type="application/rss+xml" title="Comments: Thinking the unthinkable" href="http://joss.blogs.lincoln.ac.uk/2009/10/08/thinking-the-unthinkable/feed/" />

<!-- OAI-ORE -->

<link rel="resourcemap" type="application/atom+xml" href="http://joss.blogs.lincoln.ac.uk/wp-content/plugins/oai-ore/rem.php"/>
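One way to compare a page like this against the OER Programme guidelines is to collect every DC.* meta element and report which wanted terms are absent. The sketch below assumes a hypothetical WANTED list (pick whichever terms matter to you) and a cut-down SAMPLE page; the "generator" line is invented purely to show that non-DC metadata is ignored.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*"> elements into a dict of term -> list of values."""

    def __init__(self):
        super().__init__()
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.terms.setdefault(name, []).append(attributes.get("content"))


def missing_terms(html, wanted):
    """Return the wanted Dublin Core terms that the page does not declare."""
    parser = DublinCoreParser()
    parser.feed(html)
    return [term for term in wanted if term not in parser.terms]


# Cut-down sample; the "generator" line is illustrative, not from the real page.
SAMPLE = """
<meta name="DC.title" content="Thinking the unthinkable" />
<meta name="DC.creator" content="Joss" />
<meta name="DC.subject" content="Funding" />
<meta name="generator" content="WordPress" />
"""

# Hypothetical checklist for illustration only.
WANTED = ["DC.title", "DC.creator", "DC.date", "DC.language"]

print(missing_terms(SAMPLE, WANTED))  # → ['DC.date', 'DC.language']
```

Note that the match is exact, so a page declaring only DC.date.created would still be reported as missing DC.date; whether that counts as a gap is a judgement call.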

This is more like it. Click on the oai-ore link and look at the source code. It’s too big to display here, but it does what you’d expect and produces an OAI-ORE 1.0 compliant Atom/XML file. Contained within the file is a ‘resource map’ of all the WordPress posts and pages marked up with Dublin Core and FOAF terms. Thinking about how the course site might be represented in this way, it makes sense to atomise the course even further so that each of the sub-nodes of the Mind Map is a WordPress post. Using the current course structure, that would result in about 20 separate posts to represent the course. Each post would contain one or more resources such as a PDF, video, audio, slides, etc. Is it worth atomising it even further and creating a post for each of these resources, too, I wonder? Quite possibly.

Unfortunately, the resource map does not include media that are included in each post or page – apparently it’s on the developer’s list of things to do. Maybe we could use some of the project budget to ask Alex, who’s working on the JISCPress project with me, to extend the plugin in this way…
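To see which resources a resource map actually aggregates (and so whether embedded media made it in), the aggregated URIs can be listed with a few lines of Python. This sketch assumes the map follows the ORE 1.0 Atom serialisation, in which each aggregated resource is an atom:link whose rel is the ORE "aggregates" term; the plugin's real output may be structured differently, and every example.org URL in SAMPLE_REM is made up.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Hypothetical resource map: one post plus a PDF attached to it.
SAMPLE_REM = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/course/aggregation</id>
  <title>Example course</title>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/course/node-1/" title="Node 1"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/course/node-1/slides.pdf" type="application/pdf"/>
  <link rel="alternate" href="http://example.org/course/"/>
</entry>"""


def aggregated_resources(rem_xml):
    """Return the href of every atom:link that carries the ORE 'aggregates' relation."""
    root = ET.fromstring(rem_xml)
    return [
        link.get("href")
        for link in root.iter(ATOM + "link")
        if link.get("rel") == AGGREGATES
    ]


for uri in aggregated_resources(SAMPLE_REM):
    print(uri)
```

If media were included in the map, they would show up in this list alongside the posts and pages, as the PDF does in the sample.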

Finally, there’s also a MediaRSS plugin for WordPress, which could enhance the RSS feeds to include all the media used in the course. Here’s an example that’s including images by default. I’ve already written about the various feeds that are available for WordPress. With some careful categorisation and tagging, media-rich feeds would be available for different points (‘nodes’) of entry into the course.

Once we are at this point, I guess we’re ready to think about broadcasting the course via Boxee and DeliTV (no time to dig into that now. Sorry!)

Metadata… arrghhh!

p.s. you’ve probably noticed that I’m a bit weak on the EPrints and OAI-ORE stuff, to say the least. Please do pick me up on where I’m going wrong with this. Thanks :-)

Working on the JISCPress project, I’ve been thinking quite a lot about scholarly publishing on the web, and in particular with WordPress. This morning, I read a post over on the ArchivePress blog about some WordPress plugins which are useful additions for creating a scholarly blog and it got me thinking a bit more about what features WordPress would need to support scholarly publishing.

JISCPress does away with the idea that WordPress is a blogging tool, and instead uses WordPress Multi-User as a document publishing platform, where one site or ‘blog’ is a document. The way WPMU is structured means that despite serving multiple (potentially millions of) document sites, the platform remains relatively ‘lightweight’, as each document site generates just a handful of additional database tables while sharing the same administrative core as a single WordPress install. So, running 100 WordPress blogs on WPMU is nothing like running 100 separate WordPress installs, in terms of either resource requirements or administration. In fact, quite soon, there will be no such thing as WPMU, as the two products are going to be merged, and because they share 90%+ of the same code already, it’s not too difficult to achieve.[1]

Anyway, my point here is to discuss whether WordPress can be extended to accommodate most conventions found in scholarly publishing and, where it is lacking, to identify the development work required to meet the needs of most academics who wish to write on and publish to the web.[2]

Scholarly publishing extends to a wide variety of published outputs. As a Content Management System (CMS) and technology development platform, I believe that WordPress has the potential to support any type of scholarly publishing that the web supports. It is extremely extensible, as can be seen from the 6000+ plugins that are available. However, what I’m interested in is what can be done now, by an academic wishing to publish their work through the use of WordPress acting as a CMS. What can be achieved with a few quid[3] to self-host WordPress, so that a few plugins can be installed and a well-structured, typical, scholarly paper can be published?

My Dissertation

For some time, I’ve been meaning to publish my MA dissertation. Back in 2002, I undertook some unique research which has not, to my knowledge, been repeated and I think there is some value in having it easily accessible on the web. I have an OpenOffice file and a PDF and, in the course of a morning, have published it under my own domain. The reason I did not publish it on the university WPMU platform is that I have been experimenting with different plugins and did not want to install plugins that were untested or that we may not support long-term. In this case, I’ve used a single WordPress installation, but ideally an individual researcher, group of researchers or research institution would run a WPMU installation which allowed multiple documents to be authored individually or collaboratively[4] and published directly to the web as XHTML.

BuddyPress, by the way, can make the experience even more natural, not only because it is based around a community of like-minded people writing together on the same web publishing platform, but also because, with a few tweaks here and there, we can move away from the language of blogs and towards the language of documents.


BuddyPress admin bar

Profile menu

Enough of BuddyPress on WPMU for now and back to my dissertation. I set up the site in ten minutes, without using FTP or a command line, because I use a host that provides a one-click install of WordPress, and WordPress allows you to search for and install plugins from its Dashboard rather than having to use FTP. Once the site was installed, I then made some basic changes to the settings, turning on XML-RPC and AtomPub, so that, if I decided to, I could publish to the site using my Word Processor.[5] I didn’t use this in the end, but trust me, it works very well using recent versions of MS Word, Open Office (free) and other blogging clients such as MS Live Writer (free).
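For the curious, the request a blogging client sends over XML-RPC can be sketched with Python's standard library. The example assumes the MetaWeblog API's metaWeblog.newPost method, which WordPress exposes at xmlrpc.php once XML-RPC is switched on; the credentials, title and new_post_request helper are placeholders. It only builds the request body rather than sending it, so it is safe to run anywhere.

```python
import xmlrpc.client


def new_post_request(username, password, title, html_body, publish=False, blog_id=1):
    """Build the XML-RPC request body for a metaWeblog.newPost call."""
    post = {"title": title, "description": html_body}
    params = (blog_id, username, password, post, publish)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


# Placeholder values for illustration only.
payload = new_post_request(
    "author", "secret", "Chapter 1", "<p>First section of the document.</p>"
)
print(payload)

# To send it for real (not executed here), something like:
#   server = xmlrpc.client.ServerProxy("http://example.com/xmlrpc.php")
#   server.metaWeblog.newPost(1, "author", "secret", {...}, False)
```

Leaving publish as False should create a draft, which seems the safer default while experimenting.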

So, what are the common characteristics of an academic paper? What does WordPress have to support to provide functionality that meets most scholars’ publishing requirements? I scratched my head (and asked on Twitter) and came up with the following:

  • footnotes/endnotes
  • citations
  • use of LaTeX (sciences)
  • tables
  • images
  • bibliography
  • sub-headings
  • annexes
  • appendices
  • dedication
  • abstract
  • table of contents
  • index to figures
  • introduction
  • exposition
  • conclusion

Many of these are supported in WordPress by default and don’t require any additional plugins (tables, images, sub-headings, annexes, appendices, dedication, abstract, introduction, exposition, conclusion, are all either basic literary conventions or just part of a simply structured document).

For additional support, I installed digress.it, which we have funded through the JISCPress project. This is a WordPress plugin which allows readers to comment on the paragraphs of a document, rather than at the document section level. We’re adding a lot more functionality to meet the objectives of the JISCPress project, but I chose digress.it, principally for the reason that it is designed to turn a WordPress blog into a document site. I could have used any other WordPress theme, but digress.it automatically creates a Table of Contents and allows you to re-order WordPress posts when they are read so that you don’t have to author your document in reverse or adjust the publication dates so the document sections appear in the correct order.

My dissertation published using digress.it

I added the abstract for my dissertation to the ‘about’ page, so it shows up on the front of the site. I also uploaded a PDF version so that people can download it directly. You’ll see that I also added some links to a related book and DVD, which will certainly appeal to people who are interested in my dissertation. The links pull an image and some basic metadata from Amazon, using the Amazon Machine Tags plugin. This could be used to link to the book in which your article is published and earn you money in click referrals. An alternative would be the Open Book Book Data plugin, which retrieves a book cover and metadata from Open Library, where your book may already be catalogued. If it’s not on Open Library, catalogue it!

After setting this up, I installed a few more plugins:

Dublin Core for WordPress: Automatically adds ten Dublin Core metadata elements to the document mark up.

wp-footnotes: This allows you to easily add footnotes to your document by enclosing your footnote in double parentheses.[6]
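The double-parentheses convention is simple enough to illustrate in a few lines. This is not the plugin's code (which is PHP and may behave differently); it is a hedged Python sketch of the general idea, with render_footnotes as a made-up name. It swaps each ((…)) span for a numbered marker and collects the note text for listing at the end of the document.

```python
import re

# Matches ((footnote text)); non-greedy, so adjacent footnotes stay separate.
# Footnotes that themselves end in a parenthesis are not handled by this sketch.
FOOTNOTE = re.compile(r"\(\((.+?)\)\)", re.DOTALL)


def render_footnotes(text):
    """Replace ((...)) spans with [n] markers; return (body, list_of_notes)."""
    notes = []

    def replace(match):
        notes.append(match.group(1).strip())
        return f"[{len(notes)}]"

    body = FOOTNOTE.sub(replace, text)
    return body, notes


body, notes = render_footnotes(
    "WordPress is extensible((See the plugin directory.)) and free((GPL licensed.))."
)
print(body)  # → WordPress is extensible[1] and free[2].
for number, note in enumerate(notes, start=1):
    print(f"{number}. {note}")
```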

OAI-ORE Resource Map: Automatically marks up the document sections with an OAI-ORE 1.0 resource map.

Google Analyticator: Adds Google Analytics support so you can collect statistics on the readership of your document.

WP Calais Archive Tagger: Analyses your entire document and automatically keywords each section, using the Open Calais API.

Search API: WordPress comes with search built in, but there is a new search API which will eventually make its way into the WordPress core. I’ve installed the plugin to provide full-text search across the document. It can also add Google Search to your document site.

wp-super-cache: This is simple to install and will significantly speed up your document site, making it a pleasure to navigate through and read :-)

Plugins I didn’t use

wp-latex: Although I didn’t need it for my dissertation, it’s worth noting that WordPress supports the use of LaTeX.

Academic Citation: You need to add a line of code to your theme for this to display. It supports the concept of an article being a single blog post, rather than a ‘document site’ and displays a variety of citation formats for readers to use.

Do you know of any other plugins for a scholarly blog?

The Beauty of Feeds

The other useful thing about managing a document using WordPress and in particular, using digress.it, is that you automatically get RSS/Atom feeds for the document. I’ve already discussed these in detail. It means that I was able to read my document in my feed reader, with footnotes and images displayed correctly.

Document in Google Reader

See how nicely the formatting is preserved. LaTeX is also rendered correctly in feed readers.

Document formatted nicely in Google Reader
Reading my dissertation in Google Reader

You’ll see that the document sections are listed in order; that is, first section on top. As I noted above, blogs list posts in reverse (most recent first), so I ran the feed through Yahoo Pipes and sorted the items in ascending order. Yahoo Pipes exports as RSS and it’s that feed that I subscribed to in Google Reader. Wouldn’t it be nice if I could import my document feed into an Institutional Repository? Wait a minute, I can! :-)
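The same re-ordering can be done locally without a third-party service. The sketch below is an assumption-laden illustration rather than a description of what Yahoo Pipes does internally: it takes an RSS 2.0 document as a string, sorts the items by pubDate with the oldest first, and returns the re-ordered feed. SAMPLE_RSS and the function name are invented for the example, and every item is assumed to carry a pubDate with a timezone offset.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def sort_items_oldest_first(rss_xml):
    """Return the feed with its <item> elements re-ordered by ascending pubDate."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    items = channel.findall("item")
    # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
    items.sort(key=lambda item: parsedate_to_datetime(item.findtext("pubDate")))
    for item in items:
        channel.remove(item)
    for item in items:
        channel.append(item)
    return ET.tostring(root, encoding="unicode")


# Invented two-item feed in typical blog order (newest first).
SAMPLE_RSS = """<rss version="2.0"><channel><title>Dissertation</title>
<item><title>Conclusion</title><pubDate>Fri, 09 Oct 2009 10:00:00 +0000</pubDate></item>
<item><title>Introduction</title><pubDate>Thu, 08 Oct 2009 16:14:54 +0000</pubDate></item>
</channel></rss>"""

sorted_feed = sort_items_oldest_first(SAMPLE_RSS)
titles = [item.findtext("title") for item in ET.fromstring(sorted_feed).iter("item")]
print(titles)  # → ['Introduction', 'Conclusion']
```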

Importing an RSS feed into EPrints

Click to see the item in the repository

When importing the default feed, the HTML output is accurate but in reverse order, while the RSS output from Yahoo Pipes didn’t import into EPrints very cleanly at all. I’ll work on this. UPDATE: Forget Yahoo Pipes. WordPress feeds can be sorted with a switch added to the URL: http://example.com/feed/?orderby=post_date&order=ASC
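If feed URLs are being generated by a script (for instance, to hand to a repository importer), the two query parameters can be appended programmatically so that any existing query string is preserved. A small sketch, with ascending_feed_url as a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ascending_feed_url(feed_url):
    """Add orderby=post_date&order=ASC to a feed URL, keeping existing parameters."""
    parts = urlsplit(feed_url)
    query = dict(parse_qsl(parts.query))
    query.update({"orderby": "post_date", "order": "ASC"})
    return urlunsplit(parts._replace(query=urlencode(query)))


print(ascending_feed_url("http://example.com/feed/"))
# → http://example.com/feed/?orderby=post_date&order=ASC
```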

So there it is. An academic paper, published to the web using a modern CMS which supports most authoring and publishing requirements. I would favour an institutional WPMU platform for academics to author directly to, publish their pre-print to the web for open access and detailed comment, and import their RSS feed into the repository. As a proof of concept, I’m quite pleased with this. We are currently developing a widget that can be embedded in a web page or WordPress sidebar and allow a member of staff to upload a document or zipped folder of documents to the Institutional Repository. I wonder if we can also support the import of a feed from the widget, too?

So, what would your requirements be? Tell me and I’ll do my best to test WordPress against them.

  1. Has anyone done a diff on the two code bases to measure exactly what percentage of the code is shared between WP and WPMU?
  2. Actually, I think I’ll save the discussion of its shortfalls for my next post. This one is already long enough.
  3. I pay $5/year for my domain name and as many sub-domains as I need. I pay $10/month for my hosting with unlimited storage and bandwidth.
  4. Like any decent CMS, WordPress supports role-based authoring and editing and maintains a revision history of edits, auto-saved once per minute. Revisions can be compared side by side.
  5. On a scholarly WPMU installation, plugins could be pre-installed and activated, a default theme selected and settings tweaked so very little work is required by the academic author prior to writing her document.
  6. I am using the plugin on this blog!

Each participant on the Mozilla Open Education Course has been asked to develop a project blueprint. Here is the start of mine. It’s basically a ‘Personal Learning Environment’ (PLE)[1] and I’m going to try to show how WordPress MU is a good technology platform for an institution to easily and effectively support a PLE. I’m going to place an emphasis on ‘identity’ because it’s something I want to learn more about.

Short description

University students are at least 18 years old and have spent many years unconsciously accumulating or deliberately developing a digital identity. When people enter university they are expected to accept a new digital identity, one which rarely acknowledges, or makes it easy to exploit, their preceding experience and productivity. Students are given a new email address and a university ID, are expected to submit course work using new, institutionally unique tools, and develop a portfolio of work over three to four years which is set apart from their existing portfolio of work and often difficult to fully exploit after graduation.

I think this will be increasingly questioned and resisted by individuals paying to study at university. Both students and staff will suffer this disconnect caused by institutions not employing available online technologies and standards rapidly enough. There is a legacy of universities expecting and being expected to provide online tools to staff and students. This was useful and necessary several years ago, but it’s now quite possible for individuals in the UK to study, learn and work apart from any institutional technology provision. For example, Google provides many of these tools and will have a longer relationship with the individual than the university is likely to.

Many students and staff are relinquishing institutional technology ties, and an indicator of this is the very high percentage of students who do not use their university email address (96% in one case study). In the UK, universities are keen to accept mature, work-based and part-time students. For these students, university is just a single part of their lives and should not require the development of a digital identity that mainly serves the institution, rather than the individual.

How would it work?

Students identify themselves with their OpenID, which authenticates against a Shibboleth Service Provider.[2] They create, publish and syndicate their course work, privately or publicly using the web services of their choice. Students don’t turn in work for assessment, but rather publish their work for assessment under a CC license of their choice.

It’s basically a PLE project blueprint with an emphasis on identity and data-portability. I’m pretty sure I’m not going to get a fully working model to demonstrate by the end of the course, but I will try to show how existing technologies could be stitched together to achieve what I’m aiming for. Of course, the technologies are not really the issue here, the challenge is showing how this might work in an institutional context.

I think it will be possible to show how it’s technically possible using a single platform such as WordPress which has Facebook Connect, OAuth, OpenID, Shibboleth and RPX plugins. WordPress is also microformat friendly and profile information can be easily exported in the hCard format. hResume would be ideal for developing an academic profile. The Diso project are leading the way in this area.

Similar projects:

UMW Blogs?

Open Technology:

OpenID, OAuth, RPX, Shibboleth, RSS, Atom, Microformats, XMPP, OPML, AtomPub, XML-RPC + WordPress

Open Content / Licensing:

I’ll look at how Creative Commons licensing may be compatible with our staff and student IP policies.

Open Pedagogy

No idea. This is a new area for me. I’m hoping that the Mozilla/CC Open Education course can point me in the right direction for this. Maybe you have some suggestions, too?

  1. See Personal Learning Environments: Challenging the dominant design of educational systems
  2. See the JISC Review of OpenID.

Yesterday, I wrote about how I’d marked up my home page to create a semantic profile of myself that is both auto-discoverable and portable. A place where my identity on the web can be aggregated; not a hole I’ve dug for myself, but an identity that reaches out across the web but always leads back home.

While I enjoy polishing my text editor regularly and hand-crafting beautifully formed, structured data, we all know it’s a fool’s game and that the semantic web is about machines doing all the work for us. So here’s a quick and dirty run down of how to pimp your ride on the semantic web with WordPress and a few plugins.

You’ll need a self-hosted WordPress site that allows you to install plugins. I’ve got one on Dreamhost that costs me $6 a month. Next, you’ll want to install some plugins. I’ll explain what they do afterwards. One thing to note here is that I’m using plugins from the official plugin repository whenever possible. It means that you can install them from the WordPress Dashboard and you’ll get automatic updates (and they’re all GPL compatible). In no particular order…

I think that’s quite enough. All but the SIOC plugin are available from the official WordPress plugin repository. Here’s what they provide:

APML: Attention Profiling Markup Language

APML (Attention Profiling Mark-up Language) is an XML-based format for capturing a person’s interests and dislikes. APML allows people to share their own personal attention profile in much the same way that OPML allows the exchange of reading lists between news readers.

The plugin creates an XML file like this one that marks up and weights your WordPress tags as a measure of your interests. It also lists your blogroll/links and any embedded feeds.
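How the plugin arrives at its weights isn't stated here, so the following is only one plausible scheme, offered as an assumption: count how often each tag is used and scale the counts so that the most-used tag scores 1.0. (APML concept values are, as far as I know, expressed on a scale from -1.0 to 1.0, so this sketch only covers the positive half.) The tag_weights function and the sample tags are invented for illustration.

```python
from collections import Counter


def tag_weights(posts_tags):
    """Map each tag to usage_count / highest_count, rounded to two decimal places."""
    counts = Counter(tag for tags in posts_tags for tag in tags)
    if not counts:
        return {}
    highest = max(counts.values())
    return {tag: round(count / highest, 2) for tag, count in counts.most_common()}


# One inner list per post; invented sample data.
weights = tag_weights([
    ["xml", "eprints"],
    ["xml"],
    ["xml", "wordpress"],
    ["eprints"],
])
print(weights)  # → {'xml': 1.0, 'eprints': 0.67, 'wordpress': 0.33}
```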

Extended Profile

This plugin adds additional fields to your user profile, which is encoded with hCard semantic microformat markup and can then be displayed in a page or as a sidebar widget. You can import hCard data, too. There might also be another use for this (see below).
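For anyone who hasn't looked at hCard markup, it is just ordinary HTML with agreed class names (vcard for the container, fn for the formatted name, url, org and so on). A minimal Python sketch of what such output might look like follows; the hcard function is hypothetical, the organisation is a placeholder, and the plugin's real markup will be richer.

```python
from html import escape


def hcard(name, url=None, org=None):
    """Return a small hCard snippet using the standard microformat class names."""
    lines = ['<div class="vcard">']
    if url:
        lines.append(
            f'  <a class="fn url" href="{escape(url, quote=True)}">{escape(name)}</a>'
        )
    else:
        lines.append(f'  <span class="fn">{escape(name)}</span>')
    if org:
        lines.append(f'  <span class="org">{escape(org)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


# Placeholder organisation; the URL is the blog address quoted earlier on this page.
card = hcard("Joss", url="http://joss.blogs.lincoln.ac.uk/", org="Example University")
print(card)
```

A microformat-aware tool such as the Operator add-on should be able to pick a block like this out of any page it appears on.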

Micro Anywhere

Provides a couple of additional editor functions that allow you to create an hCard or hCalendar events page. Here’s an example.

OpenID

This plugin allows users to login to their local WordPress account using an OpenID, as well as enabling commenters to leave authenticated comments with OpenID. The plugin also includes an OpenID provider, enabling users to login to OpenID-enabled sites using their own personal WordPress account. XRDS-Simple is required for the OpenID Provider and some features of the OpenID Consumer.

This is key to your identity. You can use your blog URL as your OpenID or delegate a third-party service, such as MyOpenID or ClaimID. In fact, you’ve almost certainly got an OpenID already if you have a Yahoo!, Google, MySpace or AIM account. It’s up to you which one you choose to use as your persistent ID. Read more about OpenID here. It’s important and so are the issues it addresses.
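Delegation works by placing a few link elements in the head of the page you want to use as your identifier. As a hedged sketch: OpenID 2.0 discovery looks for openid2.provider and openid2.local_id, while the older OpenID 1.x looks for openid.server and openid.delegate, so emitting both pairs covers either kind of consumer. The endpoint and identifier URLs below are placeholders; the real values come from whichever provider you delegate to.

```python
from html import escape


def delegation_links(provider_endpoint, local_id):
    """Return the <link> tags that delegate this page's OpenID to a third party."""
    pairs = [
        ("openid2.provider", provider_endpoint),  # OpenID 2.0
        ("openid2.local_id", local_id),
        ("openid.server", provider_endpoint),     # OpenID 1.x
        ("openid.delegate", local_id),
    ]
    return "\n".join(
        f'<link rel="{rel}" href="{escape(href, quote=True)}" />' for rel, href in pairs
    )


# Placeholder URLs for illustration only.
links = delegation_links("https://provider.example.org/server", "https://you.provider.example.org/")
print(links)
```

The output could be pasted into a theme's header, or added with a plugin that injects code into the head of each page.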

XRDS-Simple

This is required to add further functionality to the OpenID plugin. It adds Attribute Exchange (AX) to your OpenID which basically means that certain profile information can be passed to third-party services (less form filling for you!) Like a lot of these plugins, install it and forget about it.

SIOC

Provides auto-discoverable SIOC metadata. “A SIOC profile describes the structure and contents of a weblog in a machine readable form.”

wp-RDFa

Provides an auto-discoverable FOAF (Friend of a Friend) profile, based on the members of your blog. I’ve been in touch with the author of this plugin and suggested that the extended profile information could also be pulled into the FOAF profile. This is largely dependent on the FOAF specification being finalised, but expect this plugin to do more as FOAF develops.

OAI-ORE Map

Provides an auto-discoverable OAI-ORE resource map of your blog. It conforms to version 0.9 of the specification, which recently made it to v1.0, so I imagine it will be updated in the near future. OAI-ORE metadata describes aggregated resources, so instead of seeing your blog post permalink as the single identifier for, say, a collection of text and multimedia, it creates a map of those resources and links them.

LinkedIn hResume

LinkedIn hResume for WordPress grabs the hResume microformat block from your LinkedIn public profile page allowing you to add it to any WordPress page and apply your own styles to it.

I like this plugin because you benefit from all the features of LinkedIn, but can bring your profile home. Ideal for students or anyone who wants to create a portfolio of work and offer their resume/CV on a single site. Depending on the theme you use, it does require some additional styling.

Get_OPML

This is a nice way to create an OPML file of your sidebar links. If, like on my personal blog, your links point to resources related to you, you can easily create an OPML file like this one. There’s a couple of things to note about this plugin though. The instructions mention a Technorati API key. I didn’t bother with this. When you create your links, just scroll down the page to the ‘advanced’ section and add the RSS feed there. Secondly, the plugin author has, for some stupid reason, hard-coded the feed to their own site into the plugin. Assuming you don’t want this spamming your personal OPML file, download a modified version from here or comment out line 101 in get-opml.php. I guess the plugin author thinks that you’ll be using this to import the OPML into a feed reader and from there, you can delete his feed. That’s no good to us though. Finally, you’ll want to make your OPML file auto-discoverable. You can do this by adding a line of HTML in your header, using the Header-Footer plugin below.

Header-Footer

This simply allows you to add code to the header and footer of your blog. In our case, you can use it to add an auto-discovery link to the header of every page of your blog.


<link rel="outline" type="text/xml+opml" title="ADD YOUR TITLE HERE" href="http://YOUR_BLOG_ADDRESS/opml.xml" />
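If you would rather build the OPML file yourself than rely on a plugin, the format is small enough to generate directly. This is a sketch under assumptions: it emits OPML 2.0 with one outline per link, using the conventional text, htmlUrl and xmlUrl attributes, and the build_opml function, link data and example.org URLs are all invented.

```python
import xml.etree.ElementTree as ET


def build_opml(title, links):
    """Return an OPML 2.0 document (as a string) with one <outline> per link."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for link in links:
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=link["name"],
            htmlUrl=link["url"],
            xmlUrl=link["feed"],
        )
    return ET.tostring(opml, encoding="unicode")


# Invented sidebar links for illustration.
opml_xml = build_opml("My links", [
    {"name": "Blog", "url": "http://example.org/blog/", "feed": "http://example.org/blog/feed/"},
    {"name": "Photos", "url": "http://example.org/photos/", "feed": "http://example.org/photos/rss"},
])
print(opml_xml)
```

Saved as opml.xml at the address used in the auto-discovery link, a file like this should be picked up by tools that look for rel="outline".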

WP Calais * + tagaroo

These three plugins use the OpenCalais API to examine your blog posts and return a bunch of semantic tags. I’ve written about this in more detail here (towards the end).

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

It’s an easy way to add relevant tags to your content and broadcast your content for indexing by OpenCalais. They place an additional link in your header that lists the tags for web crawlers and, I guess, improves the SEO for your site.

Extra Feed Links

I’ve written about this plugin previously, too. It adds additional autodiscovery links to your blog for author, category and tag feeds. WordPress feed functionality is very powerful and this plugin makes it especially easy to make those feeds visible.

Lifestream

This isn’t a semantic web plugin, but is a powerful way of aggregating all of your activity across the web into a single activity stream. See my example, here. It also produces a single RSS feed from your aggregated activity. Nice ;-)

Wrapping things up

If you set all of this up, you’ll have a WordPress site that can act as your primary identity across the web, aggregates much of your activity on the web into a single site and also offers multiple ways for people to discover and read your site. You also get a ‘well-formed’ portfolio that is enriched with semantic markup and links you to the wider online community in a way that you control.

Bear in mind that some of these plugins might not appear to do anything at all. The semantic web is about machines being able to read and link data, right? If you look closely in the source of your home page, you’ll see a few lines that speak volumes about you in machine talk.


<link rel="meta" href="./wp-content/plugins/wp-rdfa/foaf.php" type="application/rdf+xml" title="FOAF"/>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<link rel="meta" type="text/xml" title="APML" href="http://blog.josswinn.org/apml/" />
<link rel="alternate" type="application/rss+xml" title="NoteStream RSS Feed" href="http://blog.josswinn.org/feed/" />
<link rel="resourcemap" type="application/atom+xml" href="http://blog.josswinn.org/wp-content/plugins/oai-ore/rem.php"/>

If you do want a way to view the data, I recommend the following Firefox add-ons:

Operator: Auto-discovers any embedded microformats and provides useful ways to search for similar data via third-party services elsewhere on the web.

OPML Reader: Auto-discovers an OPML file if you have one linked in your header. Allows you to either download the file or read it on Grazr.

Semantic Radar: Auto-discovers embedded RDF data. Displays custom icons to indicate the presence of FOAF, SIOC, DOAP and RDFa formats.

The Tabulator Extension: Auto-discovers and provides a table-based display for RDF data on the Semantic Web. Makes RDF data readable to the average person and shows how data are linked together across different sites.

As always, please let me know how this overview could be improved or if you know of other ways to add semantic functionality to your WordPress blog. Thanks.