Hacking as an academic practice

Just a note to say that I think those of us working in universities should be defending the terms ‘hack’, ‘hacker‘ and ‘hacking’. Hacking has had some really bad press recently and as a consequence is terribly mis-understood, but is a term that was born out of the work ethic of pioneering academics and a practice that’s still worth defending. The history of EdTech has its roots in hacking culture. Stand up for it! Stand up for the Hackers!*

hacker: n.[originally, someone who makes furniture with an axe]

  1. A person who enjoys exploring the details of programmable systems and how to stretch their capabilities, as opposed to most users, who prefer to learn only the minimum necessary. RFC1392, the Internet Users’ Glossary, usefully amplifies this as: A person who delights in having an intimate understanding of the internal workings of a system, computers and computer networks in particular.
  2. One who programs enthusiastically (even obsessively) or who enjoys programming rather than just theorizing about programming.
  3. A person capable of appreciating hack value.
  4. A person who is good at programming quickly.
  5. An expert at a particular program, or one who frequently does work using it or on it; as in ‘a Unix hacker’. (Definitions 1 through 5 are correlated, and people who fit them congregate.)
  6. An expert or enthusiast of any kind. One might be an astronomy hacker, for example.
  7. One who enjoys the intellectual challenge of creatively overcoming or circumventing limitations.
  8. [deprecated] A malicious meddler who tries to discover sensitive information by poking around. Hence password hacker, network hacker. The correct term for this sense is cracker.

The term ‘hacker’ also tends to connote membership in the global community defined by the net (see the network. For discussion of some of the basics of this culture, see the How To Become A Hacker FAQ. It also implies that the person described is seen to subscribe to some version of the hacker ethic (see hacker ethic).

It is better to be described as a hacker by others than to describe oneself that way. Hackers consider themselves something of an elite (a meritocracy based on ability), though one to which new members are gladly welcome. There is thus a certain ego satisfaction to be had in identifying yourself as a hacker (but if you claim to be one and are not, you’ll quickly be labeled bogus). See also geekwannabee.

This term seems to have been first adopted as a badge in the 1960s by the hacker culture surrounding TMRC and the MIT AI Lab. We have a report that it was used in a sense close to this entry’s by teenage radio hams and electronics tinkerers in the mid-1950s.

In addition to the links above, here’s some worthwhile reading if you’re interested:

* Dedicated to Jim GroomTony Hirst, Alex Bilbie and Nick Jackson. Fine Hackers working in universities today.

Books, LibraryThing and me

We recently planned, designed and built our own, small, house. Once the builders had gone, I finished off the interior, laying the finished floors, decorating and tiling. I also put up book shelves that went up one side of the back door, over it and then down the other side, so that when you walk through the door, you walk under our books. There’s about 500 or so, I guess.

Why am I writing about this? Well, those books, accumulated by my wife and I over many years, look good against the wall there, and when people visit, they often stand looking across the shelves at the range of books we’ve bought and sometimes even read. Personally, I buy books on a whim and there are many that I’ve yet to read. I rarely read books cover-to-cover and rarely read fiction. When you look across the shelves, tilting your head to read the spines, you’ll come across the fiction I started reading in my late teens, the books on Buddhism, philosophy and Japanese, that I bought as a student; you’ll see the books I bought while a post-graduate student, studying film archiving. Then there are all the books I bought in-between, while teaching myself about computers, not to mention all my wife’s books. I don’t know about you, but those books say a lot about me and about the last 20 years of my life. Many of them reflect my interests and ambitions before I sent my first email, before I first used the World Wide Web and before I knew 70% of the people I now call ‘friends’ (such an abused word these days).

The thing is, I haven’t opened many of those books since I first looked at them. I’ll never read them again and most of the people I know at work and online, will never scan my bookshelves. We can chat over Twitter, subscribe to each other’s FriendFeed and read each other’s blogs, but I’ll tell you now, that’s not even half of me.

I signed up to LibraryThing a couple of years ago, added a couple of Cormac McCarthy books and then left the account alone. I thought my wife would enjoy it more than me. She reads books cover-to-cover all the time. I don’t. I use books to learn about things that I don’t know. With the exception of a few authors, I rarely read books for relaxation. I relax in my own time online and have done for years. It suits my wandering mind.

It occurred to me a few days ago that LibraryThing could enrich my digital identity in a way that no other social networking site could. By importing my book collection into LibraryThing, I could go back over my book collection, dust off the covers and gradually enrich my online identity, and at the same time people would get to know me better, if they cared to look. Never mind Twitter, where I’ve read that we’ll get to know one-another through a glimpse of the small details of our lives. I don’t buy it. You’ll learn more about me by perusing my LibraryThing collection, I can assure you.

As I write this, I’ve added 110 books that I own, about a third of the total, I reckon.

This also got me thinking about e-portfolios and how the books I accumulated as a student reveal a lot more than my CV about my depth of study and research interests during and shortly after those periods of learning. To build a book collection over time is an achievement in itself. We tend to think of a portfolio as an accumulated, curated presentation of work that we have undertaken. It’s a product – something to show; but until recently we couldn’t include a book collection in our portfolio.

As a student, I spent more time reading than writing and I read wider than my term papers and exams reflected. Despite a good degree, my undergraduate essays aren’t worth your time today, even if I could still find them, but I still have the books I collected and they are worth your time. If I was employing me, I’d take an interest in my book collection. It’s a background check I can recommend.

Of course, I could have added books I don’t own to give a false impression of myself (I haven’t). And I could exclude books from LibraryThing that I do own, because they might give a false impression of who I think I am (I have – the exceptions are trivial). But this is my point about Identity and LibraryThing: I’ve got a collection of physical books that I’m now curating online to develop a ‘portfolio’ that better represents me.

I guess that’s what a lot of LibraryThing members do. Have you?

Facebook’s transparent use of OpenID

There was a bit of excitement last month when Facebook became an OpenID relaying party. Many of the big names such as Yahoo!, Google, MySpace, etc. have long been providers of OpenIDs to their users, but Facebook is now accepting third-party OpenIDs as a way to login to their site. What’s even more unusual and why I’m writing this post is that it wasn’t until a couple of days ago that I noticed how they’d implemented OpenID:

Existing and new users can now link their Facebook accounts with their Gmail accounts or with accounts from those OpenID providers that support automatic login. Once a user links his or her account with a Gmail address or an OpenID URL, logs in to that account, then goes to Facebook, that user will already be logged in to Facebook.

I don’t think this brief explanation on the Facebook developer blog does justice to how this works in practice. What it means is that if you link your Facebook account to your OpenID, you will automatically be logged in to Facebook if you are logged into your OpenID provider and visit http://facebook.com On any other OpenID enabled site, you click a button or type your OpenID into a login box and are then logged in to the site you’re visiting.

With Facebook, they’ve done away with the need to enter your OpenID credentials altogether. If you’re logged in to your OpenID provider, pause for three seconds on http://facebook.com and you’ll be automatically logged in. If you log out of Facebook and then visit http://facebook.com, you’ll be automatically logged in again. It doesn’t seem to work if you visit any other Facebook URL.

So, for example, if you link your Google account to your Facebook account and, like many of us, are logged in to Google throughout the day using GMail, Google Reader, Google Docs, Google Calendar, or whatever, you never have to think about logging in to Facebook. It’s the closest to a transparent single-sign-on across consumer/social sites that I can think of.

I exchanged a few comments about this with Paul Walk on Twitter, who is less impressed by this than I am. What if you want to log out of Facebook? Really log out? You’d need to log out of your OpenID provider. What if you want to stay logged in to your OpenID provider but log out of Facebook? Why would you want to do that? For most users, I can’t think of a reason. Occasionally I want to log out of a site and ensure I’m completely logged out because I’m testing something. When that happens, I open a different browser, clear cookies and/or use the private browsing mode in Safari or Chrome. The benefits to Facebook’s approach seem to outweigh the occasions when I’d want to do this.

Other than habit, can you think of a reason why you’d want to log out of Facebook but remain logged into your OpenID provider?

Surely what’s important here is whether you are logged in to the world-wide-web or logged out of the world-wide-web. It would be more secure, surely, to know if you were logged in or logged out rather than whether you were logged in to some sites and logged out of others. If I lock my front door, I know that every room in my house has been secured. I don’t need to lock every room in the house, too. When I unlock my front door, I have the freedom to move around my house. And so do guests. This is where single-sign-on becomes potentially dangerous, because it opens up multiple services that have been otherwise protected by multiple authentication credentials. If someone else uses your browser, they have access to all your accounts. That could be useful when you and your partner share accounts on some websites, but dangerous if you leave your PC unattended or have your laptop stolen from a public library.

However, I imagine that most people on the web are using one or two weak passwords across the web services they use because they can’t remember multiple login details. Surely one good password to protect multiple accounts which is used to log in and out of multiple services is better than one or two weak passwords that are used everywhere? If I have one ‘key’ that works everywhere, I’m more likely to get into the habit of using it than I am if I have to remember to log out of multiple sites.

I know of three important blog posts about Facebook’s use of OpenID, two from Luke Shepard, the principle developer of OpenID on Facebook and another from Simon Willison. A month before Facebook implemented their ‘linked accounts’ feature, Luke Shepard was discussing some ideas about OpenID login on his blog. Now that OpenID login to Facebook has been implemented, he’s been discussing the logout process. Following on from these two posts, Simon Willison provides a key overview to the current implementation in light of the new Facebook username feature:

At any rate, their consumer implementation is fascinating. It’s live right now, even though there’s no OpenID login box anywhere to be seen on the site. Instead, Facebook take advantage of the little known checkid_immediate mode. Once you’ve associated your OpenID with your Facebook account (using the “Linked Accounts” section of the settings pane) Facebook sets a cookie remembering your OpenID provider, which persists even after you log out of Facebook. When you later visit the Facebook homepage, a checkid_immediate request is silently sent to your provider, logging you in automatically if you are already authenticated there.

It’s brilliant (well, I think so), to see how a seemingly minor part of the OpenID specification, can be turned into a significant improvement (well, I think so), to the user experience and signals the way for a transparent single-sign-on experience across the web, using an OpenID provider of your choice. I look forward to the day when I login to my OpenID provider (actually, my browser does that automatically when I start it up), and from then on, I’m transparently logged in to the sites I use across the web, until I log out of my OpenID provider. One day, I’ll log in to my browser and be logged in to all the web services I use. One day, I’ll log out of my browser and be logged out of all the web services I use.

The user is in control

Just a quick nod to Andy Powell’s post yesterday about Identity in a Web 2.0 World. As I’ve said before, I’m trying to catch up with the issues Andy discusses and develop them into a blueprint for the Mozilla/Creative Commons/P2P University Open Education course, I am participating in.

Andy writes:

…identity in a Web 2.0 world is not institution-centric, as manifest in the current UK Federation, nor is it based on the currently deployed education-specific identity and access management technologies.  Identity in a Web 2.0 world is user-centric – that means the user is in control…. The important point is that learners (and staff) will come into institutions with an existing identity, they will increasingly expect to use that identity while they are there (particularly in their use of services ‘outside’ the institution) and that they will continue using it after they have left.  As a community, we therefore have to understand what impact that has on our provision of services and the way we support learning and research.

I am therefore reassured that my blueprint outline is not completely off the wall:

University students are at least 18 years old and have spent many years unconsciously accumulating or deliberately developing a digital identity. When people enter university they are expected to accept a new digital identity, one which may rarely acknowledge and easily exploit their preceding experience and productivity. Students are given a new email address, a university ID, expected to submit course work using new, institutionally unique tools and develop a portfolio of work over three to four years which is set apart from their existing portfolio of work and often difficult to fully exploit after graduation. I think this will be increasingly questioned and resisted by individuals paying to study at university.

My proposal is to show there are existing technical solutions which would allow an individual to register as a student at a university, provide the institution with their Facebook, Google, Yahoo!, OpenID, etc. identification and from then on, the student uses their existing ID to authenticate against any university online resource. There’s an example of how this might happen in the JISC Review of OpenID, which describes one of the project aims as the development of

bridging software that will allow OpenIDs from any source to be used as identities within the production UK (SAML) federation.

The University of Kent host a demonstrator of this OpenID-to-Shibboleth bridge.

The other aspect of my blueprint is institutional support of a Personal Learning Environment (PLE). I am suggesting that the WordPress Multi User platform is one technology that could support the characteristics of a PLE, being: ((Taken from, Personal Learning Environments: Challenging the dominant design of educational systems. Scott Wilson, Prof. Oleg Liber, Mark Johnson, Phil Beauvoir, Paul Sharples & Colin Milligan, University of Bolton. 2006))

  • Focus on coordinating connections between the user and services
  • Symmetric relationships
  • Individualized context
  • Open Internet standards and lightweight proprietary APIs
  • Open content and remix culture
  • Personal and global scope

The PLE implementation which I have in mind is not, like the VLE, a monolithic system but rather a platform which aggregates and co-ordinates external user-centric services into a coherent learning environment. It is a parasitic system, feeding off content from existing online services such as blogs, social bookmarking, wikis and social networks, but also a rewarding environment which supports and develops the student’s existing portfolio ((In many ways, I am thinking of ‘Identity’ and ‘Portfolio’ as being largely synonymous during the student’s period of study.)) throughout their period of study.

I’ve shown how WordPress can aggregate and archive course activity, how it can enhance the discovery and connectivity of an individual’s and institution’s online profile through the addition of semantic-web-enabling plugins, how it can syndicate filtered content to other internal and external systems (through the use of feed2js, it can also syndicate content to legacy systems like Blackboard, which don’t support embedded web feeds). I’ve also shown that it can support a lightweight social network that integrates with an institution’s LDAP/Active Directory authentication system, and that social network can be OpenID enabled, allowing users to optionally link their OpenID to their WordPress/LDAP account and login via OpenID instead. ((I’ve tested this with DiSo’s OpenID plugin, which works in principle, but I suspect that once set up, the OpenID login for the specified account, completely bypasses the LDAP authentication. Surely just a small amount of development would provide tighter integration. Incidentally, a Shibboleth plugin (by the same author of the OpenID plugin) for WordPress also exists.))

Finally, the institutional and wider benefits to the public can be found when the cumulative data of the platform is itself aggregated into a structured site that enables discovery and re-use of content. An example of this is our Community Posts site, and I have also previously discussed the potential development and exploitation of this resource. Designed and licensed carefully, such a site could provide open educational resources at both user and programmatic levels.

So what empowers the user/student and puts them in control? Data-Portability and Creative Commons licensing? ((Actually, I’m starting to think that CC licensing is little more than an interim step to a better understanding of ‘data’. See ‘You don’t nor need to own your data‘ When knowledge is transmitted online, every aspect of its representation is in a form of data. Both information and instruction become ‘data’ – isn’t it backwards to think of knowledge in terms of something ‘owned’ Do you think of instructional methods as ‘yours’?)).

Pimping your ride on the semantic web

Yesterday, I wrote about how I’d marked up my home page to create a semantic profile of myself that is both auto-discoverable and portable. A place where my identity on the web can be aggregated; not a hole I’ve dug for myself, but an identity that reaches out across the web but always leads back home.

While I enjoy polishing my text editor regularly and hand-crafting beautifully formed, structured data, we all know it’s a fool’s game and that the semantic web is about machines doing all the work for us. So here’s a quick and dirty run down of how to pimp your ride on the semantic web with WordPress and a few plugins.

You’ll need a self-hosted WordPress site that allows you to install plugins. I’ve got one on Dreamhost that costs me $6 a month. Next, you’ll want to install some plugins. I’ll explain what they do afterwards. One thing to note here is that I’m using plugins from the official plugin repository whenever possible. It means that you can install them from the WordPress Dashboard and you’ll get automatic updates (and they’re all GPL compatible). In no particular order…

I think that’s quite enough. All but the SIOC plugin are available from the official WordPress plugin repository. Here’s what they provide:

APML: Attention Profile Markup Language

APML (Attention Profiling Mark-up Language) is an XML-based format for capturing a person’s interests and dislikes. APML allows people to share their own personal attention profile in much the same way that OPML allows the exchange of reading lists between news readers.

The plugin creates an XML file like this one that marks up and weighs your WordPress tags as a measure of your interests. It also lists your blogroll/links and any embedded feeds.

Extended Profile

This plugin adds additional fields in your user profile which is encoded with hCard semantic microformat markup and can then be displayed in a page or as a sidebar widget. You can import hCard data, too. There might also be another use for this, too. (see below)

Micro Anywhere

Provides a couple of additional editor functions that allow you to create an hCard or hCalendar events page. Here’s an example.

OpenID

This plugin allows users to login to their local WordPress account using an OpenID, as well as enabling commenters to leave authenticated comments with OpenID. The plugin also includes an OpenID provider, enabling users to login to OpenID-enabled sites using their own personal WordPress account. XRDS-Simple is required for the OpenID Provider and some features of the OpenID Consumer.

This is key to your identity. You can use your blog URL as your OpenID or delegate a third-party service, such as MyOpenID or ClaimID. In fact, you’ve almost certainly got an OpenID already if you have a Yahoo!, Google, MySpace or AIM account. It’s up to you which one you choose to use as your persistent ID. Read more about OpenID here. It’s important and so are the issues it addresses.

XRDS-Simple

This is required to add further functionality to the OpenID plugin. It adds Attribute Exchange (AX) to your OpenID which basically means that certain profile information can be passed to third-party services (less form filling for you!) Like a lot of these plugins, install it and forget about it.

SIOC

Provides auto-discoverable SIOC metadata. “A SIOC profile describes the structure and contents of a weblog in a machine readable form.”

wp-RDFa

Provides an auto-discoverable FOAF (Friend of a Friend) profile, based on the members of your blog. I’ve been in touch with the author of this plugin and suggested that the extended profile information could also be pulled into the FOAF profile. This is largely dependent on the FOAF specification being finalised, but expect this plugin to do more as FOAF develops.

OAI-ORE Map

Provides an auto-discoverable OAI-ORE resource map of your blog. It conforms to version 0.9 of the specification, which recently made it to v1.0, so I imagine it will be updated in the near future. OAI-ORE metadata describes aggregated resources, so instead of seeing your blog post permalink as the single identifier for, say, a collection of text and multimedia, it creates a map of those resources and links them.

LinkedIn hResume

LinkedIn hResume for WordPress grabs the hResume microformat block from your LinkedIn public profile page allowing you to add it to any WordPress page and apply your own styles to it.

I like this plugin because you benefit from all the features of LinkedIn, but can bring your profile home. Ideal for students or anyone who wants to create a portfolio of work and offer their resume/CV on a single site. Depending on the theme you use, it does require some additional styling.

Get_OPML

This is a nice way to create an OPML file of your sidebar links. If, like on my personal blog, your links point to resources related to you, you can easily create an OPML file like this one. There’s a couple of things to note about this plugin though. The instructions mention a Technorati API key. I didn’t bother with this. When you create your links, just scroll down the page to the ‘advanced’ section and add the RSS feed there. Secondly, the plugin author has, for some stupid reason, hard-coded the feed to their own site into the plugin. Assuming you don’t want this spamming your personal OPML file, download a modified version from here or comment out line 101 in get-opml.php. I guess the plugin author thinks that you’ll be using this to import the OPML into a feed reader and from there, you can delete his feed. That’s no good to us though. Finally, you’ll want to make your OPML file auto-discoverable. You can do this by adding a line of html in your header, using the Header-Footer plugin below.

Header-Footer

This simply allows you to add code to the header and footer of your blog. In our case, you can use it to add an auto-discovery link to the header of every page of your blog.


<link rel="outline" type="text/xml+opml" title="ADD YOUR TITLE HERE" href="http://YOUR_BLOG_ADDRESS/opml.xml" />

WP Calais * + tagaroo

These three plugins use the OpenCalais API to examine your blog posts and return a bunch of semantic tags. I’ve written about this in more detail here (towards the end).

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

It’s an easy way to add relevant tags to your content and broadcast your content for indexing by OpenCalais. They place an additional link in your header that lists the tags for web crawlers and, I guess, improves the SEO for your site.

Extra Feed Links

I’ve written about this plugin previously, too. It adds additional autodiscovery links to your blog for author, category and tag feeds. WordPress feed functionality is very powerful and this plugin makes it especially easy to make those feeds visible.

Lifestream

This isn’t a semantic web plugin, but is a powerful way of aggregating all of your activity across the web into a single activity stream. See my example, here. It also produces a single RSS feed from your aggregated activity. Nice 😉

Wrapping things up

If you set all of this up, you’ll have a WordPress site that can act as your primary identity across the web, aggregates much of your activity on the web into a single site and also offers multiple ways for people to discover and read your site. You also get a ‘well-formed’ portfolio that is enriched with semantic markup and links you to the wider online community in a way that you control.

Bear in mind that some of these plugins might not appear to do anything at all. The semantic web is about machines being able to read and link data, right? If you look closely in the source of your home page, you’ll see a few lines that speak volumes about you in machine talk.


<link rel="meta" href="./wp-content/plugins/wp-rdfa/foaf.php"type="application/rdf+xml" title="FOAF"/>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<link rel="meta" type="text/xml" title="APML" href="http://blog.josswinn.org/apml/" />
<link rel="alternate" type="application/rss+xml" title="NoteStream RSS Feed" href="http://blog.josswinn.org/feed/" />
<link rel="resourcemap" type="application/atom+xml" href="http://blog.josswinn.org/wp-content/plugins/oai-ore/rem.php"/>

If you do want a way to view the data, I recommend the following Firefox add-ons

Operator: Auto-discovers any embedded microformats and provides useful ways to search for similar data via third-party services elsewhere on the web.

OPML Reader: Auto-discovers an OPML file if you have one linked in your header. Allows you to either download the file or read it on Grazr.

Semantic Radar: Auto-discovers embedded RDF data. Displays custom icons to indicate the presence of FOAF, SIOC, DOAP and RDFa formats.

The Tabulator Extension: Auto-discovers and provides a table-based display for RDF data on the Semantic Web. Makes RDF data readable to the average person and shows how data are linked together across different sites.

As always, please let me know how this overview could be improved or if you know of other ways to add semantic functionality to your WordPress blog. Thanks.

A few notes on data portability

I had a bit of fun over the weekend looking at how I could both aggregate my online presence and make it portable, all under my own domain name. I ended up touching on a bunch of interesting initiatives revolving around web and data standards. The minor output of this is over on my personal ‘home page’ at http://josswinn.org

You’ll see that there’s an Attention Profile (APML), Friend of a Friend document (FOAF), hCard generated from my contact details, an OPML file of the significant feeds I have spotted around the web (Delicious, this blog, Twitter, Last.fm, etc), an aggregated feed of my OPML file, and a link to my LinkedIn profile, which I happily learned includes hResume microformat markup. My OPML, FOAF profile and RSS feed are all auto-discoverable.

All links on the page are marked up using the XFN markup rel=”me” tag, which should help consolidated my identity on the web. There’s an interesting discussion over on Marshall Kirkpatrick’s blog about how our Twitter profiles are starting to rank higher in search engines than our personal blogs or home pages because Twitter is using the rel=”me” tag. Marshall suggests that we start using rel=”me” somewhere on our own sites to counteract that.

To add to the fun, I also tried to get the page to validate as HTML5, but in doing so, I had to remove the meta tag that provides OpenID Attribute Exchange via my OpenID Service Provider. I get the error:

Bad value X-XRDS-Location for attribute http-equiv on element meta.

Apparently the draft HTML5 spec currently disallows values for httpequiv. OpenID AX is a good thing if you want to consolidate your identity while at the same time ensure it is portable. It’s certainly more useful to me than validating as HTML5.

In addition to this, I added a Google Friend Connect (OpenSocial) widget and integrated Apture. I thought about adding the ability to leave comments via Disqus, the advantage being that comment authors could retain control over their own comments. But to be honest, I don’t think you or I need yet another method of communicating with each other. There are plenty of ways to do that already.

Other than providing a playground for fun, what this bit of tinkering on my home page has taught me is that microformats and the ethos of data portability is being embraced quite widely on the web and although I spent my time hand-crafting my new home page, there are opportunities to do much the same, quite easily, through the use of a WordPress blog and a bunch of third-party services. More on that later…

Open Calais + site-wide tags = semantic site architecture

Preamble about people

Over the last month, we’ve I’ve started to grow an embryonic social web publishing platform that can be many things but fundamentally offers a personalised and collaborative environment for research, teaching and learning. (Where? You’re looking at it!). There are a few active blogs (currently fewer than on the pilot Learning Lab blogs), nearly 70 users and the word is starting to get out at a pace that I can manage. So, now it’s time to look to the future…

By running BuddyPress, the connections between people are pretty much taken care of. Sign in to http://blogs.lincoln.ac.uk with a Lincoln username and password and you’ve joined a community that, as it grows, will increasingly and effortlessly connect people through the information they choose to add to their profile. Staff and students can click on a link and find other people who have similarly tagged their profile.

Notice the comma seprated hyper-linked data
Notice the comma-separated hyper-linked data

What is of equal interest to me, and potentially very useful to the university community, is how we link the content that is being generated by staff and students and make those links accessible. It is not difficult to appreciate what the potential is when you have a revolving community of 10,000 people who, over time, document their work, their research, teaching and learning using cutting edge web publishing tools, but I’m writing this post to try and understand and sketch out how I might evolve what I have begun.

Put simply, WordPress Multi-User (WPMU) allows one person (me) to provide and manage multiple web sites which other people (staff and students) take ownership of. Typically, every action, every new user and every new page and post on every site, is recorded and held in a shared database(s). Although at this low level, the data is relational, on the surface, when you look at one of the sites, they pretty much stand alone and so they should. We’re not talking about a single website with lots of users, we’re talking about lots of websites with lots of users. They might be working collaboratively with others, but they’re working as individuals or in distinct groups that benefit from a distinct online identity. BuddyPress helps bring things together by aggregating people’s actions (i.e. posting blog updates, making friends, joining groups, posting messages) but the visibility of those connections is transient. Social networks display our actions along a timeline and the connections between people are, for the most part, buried until the next time person A interacts with person Y.

Enough about connecting people.

Site-wide content aggregation

Site content is a mixture of text, multimedia and metadata. The last thing I’ll do when completing this blog post is to categorise and tag it. Each time I write, I publish text, (sometimes images) and metadata which summarises and categorises the full text. Why am I telling you this? You know it already. What you may not know is that each post created on our university WPMU installation, by any person, providing their blog is public, is aggregated into a single site and re-published a second time. So this post exists here on this site and there, on the Community Posts site. Notice how the Community Posts version links back to the original post. We’re not creating a whole new resource, we’re creating a powerful linked resource that allows others to search, filter, browse and discover content held across multiple sites. With only a few sites up and running here at the moment, the opportunity to discover varied content is limited, but over time that will change. Look at wordpress.com, where there are 5 million sites:

Browse by user-generated metadata

Search over 5 million sites
Search over 5 million sites

On the university blogs, this is made possible through the use of the site-wide-tags plugin, which was developed by @donncha, the same person that develops WPMU and the wordpress.com site. By using this plugin, a WPMU installation can share similar functionality to what you see on wordpress.com. I say ‘similar’ because, as I’ll mention later, designing how people discover content is key to all of this and something I, or we as a community, would benefit from thinking about and acting on collectively.

Community Posts
Community Posts

On the Community Posts site, you can search the full-text of every post, filter resources by category and tag, and subscribe to feeds from any combination of tag or category. Any search can be turned into a feed by appending ‘&feed=rss’ to the end of the resulting URL.

i.e. http://tags.blogs.lincoln.ac.uk/?s=gaming&feed=rss

To create a feed from a tag or category, just click on a tag or category and append ‘/feed’ to the end of the URL.

i.e. http://tags.blogs.lincoln.ac.uk/tag/games/

You can combine tags with ‘+’, too:

http://tags.blogs.lincoln.ac.uk/tag/games+development/

You can also specify the type of feed you want by appending:

/feed/rss/
/feed/rss2/
/feed/rdf/
/feed/atom/

Mixing categories and tags is currently broken by a bug but is due to be fixed in the next version of WordPress.

So it’s not difficult to imagine, over time, an active community of thousands of university web publishers, having their content aggregated into a site-wide resource that allows full text searching, browsing and filtering with a choice of feeds to syndicate that content elsewhere. See how it’s happening at the University of Mary Washington, where over 2400 sites have been created in under three years.

Semantic technology

Yesterday, I discovered OpenCalais. It’s a semantic technology that’s been around since January 2008, so you might be tired of hearing about it, but if not, ‘Welcome to Web 3.0!’

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

Nice. And it’s installed on this site. There are three Calais plugins available for WordPress. This one, allows writers to submit their blog posts to the OpenCalais web service API and fetch back a number of auto-generated tags based on the content of their post. The longer the post, the more tags are returned. Tags are returned in just seconds. Those tags can be added to the post in their entirety or used selectively (actually, you have to add them all and then remove those you don’t want to include – a minor irritation). This next plugin, allows you to automatically go through every post you’ve written and tags them using the Calais web service. It’s all or nothing, but following the auto-tagging of archive content, you can then go to the ‘tags’ menu and delete any tags you don’t want to use. I’ve done that to this site and to the Community Posts site. Calais looks for names, facts and events and the API allows for up to 40,000 transactions a day and up to four per second. It returns some predictable tags and a few odd ones, but on the whole is fast and works like magic.

The third plugin also allows blog authors to fetch tags for the post they are writing and, in addition, it also suggests Creative Commons licensed images based on a dynamic evaluation of the chosen or suggested tags.

The tagaroo interface
The tagaroo interface

Image suggestion is a nice idea, but tends to return some fairly generic images.

Having used OpenCalais to auto-tag the Community Posts site, a whole new and richer set of semantic metadata has been added with barely any effort. The challenge now is to figure out how to 1) automate this as a scheduled process, so that the Calais plugin looks for new content every hour, say, and tags whatever has been recently introduced (a cron job that calls the plugin and a modification to the plugin to look at the timestamp of the post and ignore anything older than when it was last run?); 2) present the semantic data in an accessible way and this mostly, I think, comes down to appropriate site design.  The wordpress.com screenshots above show one way of doing it. A del.icio.us style approach is a more powerful and versatile model of tag filtering. Until then, it’s a matter of constructing filters, searches and feeds in the way I’ve outlined above.

So how might all of this semantically structured data be used? It seems to me that most of the advantages are proportional to the quantity of information available. For teaching and learning, it could be used by students and staff who want to find and re-use material that has been posted in the past for a specific course or subject area. Great for new students who want to measure the type and quality of work produced by students in previous years. In a similar way, it could be used by staff looking for posts by colleagues on subjects they might be teaching, and because searches and tags can be turned into feeds, past content could be aggregated into a new course site. A widely adopted, semantically tagged WPMU installation could also reveal trends in the type of work occurring at the university and, by tagging names of people, queries against references to Prof. X’s work could be made (I also wonder whether through the use of feeds, content from the institutional repository could be joined up with all of this, too – but it’s late in the day and I can’t think straight).

You’ll see from the image below that using Calais on the Community Posts site, resulted in a much richer variety of tags than would have appeared if we relied on user-generated tagging alone (136 posts now have 558 tags). Some people don’t even bother to tag their work… Shame on them! Notice too, that with the Firefox Operator plugin, you can take a tag on the site and use it to find related resources elsewhere. So if you’re looking at work tagged ‘client-applications’ on WPMU, you can conveniently hop over to delicious and find further web resources or, on a whim, look at what books on this subject are available on Amazon.

Operator provides a way to use tags on one site to discover related resources on another site
Use tags on one site to discover related resources on another site

Anyway, if you’re still reading, you might remember from the title of this post that my overriding interest in all of this is how it can be understood as and developed into a site-wide ‘architecture’. Again, I’m thinking how user-generated tags have determined the way delicious is designed for navigation and searching of resources. I need to learn more about how WordPress themes are constructed and consider how available functions can be best exploited and usefully presented on this type of site. If you have any ideas or want to work on a specific theme to get the most out of the site-wide-tags plugin, please do leave a comment or get in touch on Twitter @josswinn

Outsourcing email and data storage case studies

The JISC published four case studies on Friday concerned with ‘outsourcing email and data storage’. They are quick reads and straight to the point. Pulling together all the ‘Lessons Learned’, we are told the following:

  • Handle the beta mentality – expect things to change, ask not how you can control change but how will you respond to it.
  • Web 2.0 is as much an attitude as any technical standard.
  • Ensure that your contractual and procurement processes allow for the provision of a free service. They may be designed for a traditional system of tendering with providers bidding to provide the service, and may not cope with a bidding system based on a ‘free’ service.
  • Ensure that students and staff are aware of the reasons behind the change.
  • Who is a student and who is a member of staff? If you have a high proportion of graduates who undertake various jobs and duties for the University, will they need a staff or a student email account, or both?
  • What emails and data do you need to keep private and confidential?
  • Are you aware of the jurisdiction that any external third party servers are under?

Useful observations. For me though, what the reports didn’t address was why each university was providing an email address to students in the first place. Isn’t the issue less about ’email and data storage’ and more about having a trusted and portable university identity? Providing a GMail or Windows Live hosted account still doesn’t guarantee that the majority of students would use that email address as their primary address (prior to outsourcing at the University of Westminster, “96% of students did not use the University email system”). I’m assuming that the new, third-party managed email addresses are still *.ac.uk accounts – this wasn’t clear to me from the reports. Having a *.ac.uk account is useful, primarily for online identification purposes.

Personally, I think that the benefit of having Google or Microsoft manage a trusted university identity for students, is not the email service itself (yet another address that students wouldn’t necessarily use for messaging), but the additional services that Google provide such as their online office apps, instant messaging, news reader (all accessible from mobiles) and, most importantly, the trusted identity that is used across and beyond those value-added services. Furthermore, as both Google and Microsoft embrace OpenID, that trusted identity will assume even greater ‘value’ beyond their own web services. Email addresses are well established forms of online identity and most people are happy to have that identity managed by a third-party.

I like the URI approach that OpenID currently uses although I think that adoption will be slow if users can’t alternatively use their email address (i.e. johnsmith@gmail.com, rather than http://johnsmith.id.google.com or whatever Google settles on). Some services do allow that option using Email Address to URL Translation, which highlights the value of having an email address, not for the communication of messages but for the communication of one’s identity.

Anyone with any thoughts on this? It’s pretty simple to get a message across these days but harder to manage our online identities.