Getting your Triples into Talis Connected Commons

A few days ago, I wrote about adding Triplify to your web application. Specifically, I wrote about adding it to WordPress, but the same information can be applied to most web publishing platforms. Earlier this month, TALIS announced their Connected Commons platform and yesterday they announced a commercial version of their platform for the structured storage of Linked Data. Storage is all very well, but more importantly they have an API for developers, so that the data can be queried and creatively re-used or mashed up.

So this got me thinking about JISCPress, our recent JISC Rapid Innovation Programme bid, which proposes a WordPress Multi-User based platform for publishing JISC funding calls and the reports of funded projects. This is based on my experience of running WriteToReply with Tony Hirst.

Although primarily a service for comment and discussion around documents, what interests me most about WriteToReply and, consequently, the JISCPress proposal, is the cumulative storage of data on the platform and how that data might be used. No surprise really, as my background is in archiving and collections management. As with the University of Lincoln blogs, WriteToReply and the proposed JISCPress platform aggregate published content into a site-wide ‘tags’ site that allows anyone to search and browse through all publicly published content. In the case of the university blogs, that’s a large percentage of blogs, but for WriteToReply and JISCPress, it would be pretty much every document hosted on the platform.

You can see from the WriteToReply tags site that, over time, a rich store of public documents could be created for querying and re-use. The site design is a bit clunky right now, but under the hood you’ll notice that you can search across the text of every document and browse by document type and by tag. The tags are created by submitting the content to OpenCalais, which returns a whole bunch of semantic keywords for each document section. You’ll also notice that an RSS feed is available for any search query, any category and any tag or combination of tags.
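Those feeds make the tag store easy to reuse programmatically. As a minimal sketch, here’s how you might pull the latest sections for a given tag with Python’s feedparser library (the feed URL is a hypothetical example; substitute a real tag, category or search feed from the tags site):

```python
import feedparser

# Hypothetical tag feed on the WriteToReply tags site; substitute
# the real feed URL for any tag, category or search query.
FEED_URL = "http://tags.writetoreply.org/tag/broadband/feed/"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Each entry is a document section carrying the tag
    print(entry.title, entry.link)
```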

Last night, I was thinking about the WriteToReply site architecture (note that when I mention WriteToReply, it almost certainly applies to JISCPress, too – same technology, similar principles, different content). Currently, we categorise each document by document type, so you’ll see ‘Consultations’, ‘Action Plans’, ‘Discussion Papers’, etc. We author all documents under the WriteToReply username, too, and tag each document section both manually and via OpenCalais. However, there’s more that we could do, with little effort, to mark up the documents, and I’ve started sketching it out.

You’ll see from the diagram that I’m thinking we should introduce location and subject categories. There are formal classification schemes we could use. For example, I found a Local Government Classification Scheme, which provides some high-level subjects that are the type of thing I’m thinking about. I’m not suggesting we start ‘cataloguing’ the documents, but simply borrow, at the top level, from recognised classification schemes that are used elsewhere. I’m also thinking that we should start creating a new author for each document; in the case of WriteToReply, the author would be the agency that issued the consultation, report or whatever.

So, following these changes, we would capture the following data, for example:

The Home Office created Protecting the public in a changing communications environment on April 27th: a consultation document for England, Wales and Scotland, categorised under Information and communication technology, with 18 sections.

Section one is tagged Governor, Home Department, Office of Public Sector Information, Secretary of State, Surrey.

Section two is tagged communications data, communications industry, emergency services, Home Secretary, Jacqui Smith MP, Rt Hon Jacqui Smith MP.

Section three is tagged Broadband, BT, communications, communications changes, communications data, communications data capability, communications data limits, communications environment, communications event, communications industry, communications networks, communications providers, communications service providers, communications services, emergency services, Her Majesty’s Revenue and Customs, Home Office, intelligence agencies, internet browsing, Internet Protocol, Internet Service, IP, mobile telephone system, physical networks, public telecommunications service, registered owner, Serious Organised Crime Agency, social networking, specified communications data, The communications industry, United Kingdom.

Section four is tagged …(you get the picture)

Section five, paragraph six, has the comment “‘fully compatible with the ECHR’ is, of course, an assertion made by the government about its own legislation. Has that assertion ever been tested in a court?”, authored by Owen Blacker on April 28th at 11:32pm.

Selected text from Section five, paragraph eight, has the comment “Over my dead body!”, authored by Mr Angry on April 28th at 9:32pm.

Note that every author, document, section, paragraph, text selection, category, tag, comment and comment author has a URI, Atom, RSS and RDF endpoint (actually, text selection and comment author feeds are forthcoming features).

Now, with this basic architecture mapped out, we might wonder what Triplify could add to this. I’ve already shown in my earlier post that, with little effort, it re-publishes data from a relational database as N-Triples semantic data, so everything you see above could be published as RDF (and JSON, too).
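To make that concrete, here’s a minimal sketch in Python (using rdflib) of the kind of triples the example above implies. The URIs and the WTR vocabulary are illustrative assumptions on my part; Triplify’s actual output would depend on how its database-to-RDF mapping was configured:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Hypothetical vocabulary; the real mapping is up to the Triplify config.
WTR = Namespace("http://writetoreply.org/ns/")

doc = URIRef("http://writetoreply.org/communicationsdata/")
section = URIRef("http://writetoreply.org/communicationsdata/section-three/")

g = Graph()
g.add((doc, RDF.type, WTR.ConsultationDocument))
g.add((doc, DCTERMS.creator, Literal("The Home Office")))
g.add((doc, DCTERMS.subject, Literal("Information and communication technology")))
g.add((section, DCTERMS.isPartOf, doc))
g.add((section, WTR.tag, Literal("communications data")))

# Serialise as N-Triples, the format Triplify emits
print(g.serialize(format="nt"))
```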

So, in my simple view of the world, we have a data source that requires very little effort to generate content for and manage (JISCPress/WriteToReply/WordPress), a method of automatically publishing the data for the semantic web (Triplify) and, with TALIS, an API for data storage, access, query and augmentation. As always, my mantra is ‘I am not a developer’, but from where I’m standing, this high-level ‘workflow’ seems reasonable.
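For the query end of that workflow, Talis Platform stores were exposed over SPARQL. A rough sketch of what querying such a store might look like, assuming a hypothetical store name and the illustrative vocabulary from the previous example:

```python
import requests

# Hypothetical store name; Talis Platform stores exposed a SPARQL
# endpoint at a URL of roughly this shape.
ENDPOINT = "http://api.talis.com/stores/writetoreply/services/sparql"

# Find every document section tagged 'communications data'
query = """
SELECT ?section WHERE {
  ?section <http://writetoreply.org/ns/tag> "communications data" .
}
"""

# The 'output=json' parameter is an assumption; the endpoint may
# negotiate result formats differently.
resp = requests.get(ENDPOINT, params={"query": query, "output": "json"}, timeout=30)
resp.raise_for_status()

for binding in resp.json()["results"]["bindings"]:
    print(binding["section"]["value"])
```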

The benefits for the JISC community would primarily be felt by using the JISCPress website, in a similar way (albeit with better, more informed design) to the WriteToReply ‘tags’ site. We could search across the full text of funding calls, browse the reports by author, category and tag, and grab news feeds for favourite authors, searches, tags or categories. This is all in addition to the comment, feedback and discussion features we’ve proposed, too. Further benefits would come from ‘re-publishing’ the site content as semantic data to a platform such as TALIS. Not only could there be further Rapid Innovation projects which worked on this data, but it would be available for any member of the public to query and re-use, too. No longer would our final project reports, often the distillation of our research, sit idle as PDF files on institutional websites and in institutional repositories. If the documentation we produce is worth anything, then it’s worth re-publishing openly as semantic data.

Finally, in order to benefit from the (free) use of TALIS Connected Commons, the data being published needs to be licensed under a public domain or Creative Commons ‘zero’ licence. I suspect Crown Copyright is not compatible with either of these licences, although why the hell public consultation documents couldn’t be licensed this way, I don’t know. Do you? For JISCPress, this would be a choice JISC could make. The alternative is to use the commercial TALIS platform or something similar.

As usual, tell me what you think… Thanks.

CommentPress

CommentPress is, for educators, one of the most important developments to come out of the WordPress community and one of the most significant innovations that I know of in online publishing. I first learned about it when I saw that Yale University Press were using it to invite comment on Yochai Benkler’s book, The Wealth of Networks. In its original form, CommentPress is a theme for WordPress that allows readers to comment on, annotate and discuss paragraphs of text. In fact, although installed as a theme, it transforms a site not only in design, but with functionality you’d normally expect from plugins. In CommentPress v1.x, form and function came as a single package. It’s worth reading about the background to CommentPress; you’ll see that it’s part of a larger body of research by the Institute for the Future of the Book.

Institute for the Future of the Book was founded in 2004 to [… stimulate] a broad rethinking—in publishing, academia and the world at large—of books as networked objects. CommentPress is a happy byproduct of this process, the result of a series of “networked book” experiments run by the Institute in 2006-7. The goal of these was to see whether a popular net-native publishing form, the blog, which, most would agree, is very good at covering the present moment in pithy, conversational bursts but lousy at handling larger, slow-developing works requiring more than chronological organization—whether this form might be refashioned to enable social interaction around long-form texts… We can imagine a number of possibilities: scholarly contexts: working papers, conferences, annotation projects, journals, collaborative glosses; educational: virtual classroom discussion around readings, study groups; journalism/public advocacy/networked democracy: social assessment and public dissection of government or corporate documents, cutting through opaque language and spin (like the Iraq Study Group Report, a presidential speech, the federal budget, a Walmart or Google press release); creative writing: workshopping story drafts, collaborative storytelling; recreational: social reading, book clubs.

You can also read about CommentPress in The Chronicle of Higher Education and The Journal of Electronic Publishing.

We have started to use CommentPress at the University of Lincoln for the discussion of internal documents, and feedback from staff has been good. Many are astonished at what it makes possible. A departmental research strategy paper received over 100 comments from nine staff – something we’d never have had by emailing the document out for comment. Of course, I am keen to use it to support courses, and a colleague and I have recently applied for funding to use CommentPress in a course with over 100 Criminology students, who are normally asked to critique texts and respond by emailing Word documents to their tutor. Using CommentPress allows for transparent and open, formative feedback and assessment by both staff and student peers.

Outside of my work for the university, I’ve been developing WriteToReply with Tony Hirst from the Open University. You can read about how we started WriteToReply, and you’ll see that CommentPress is fundamental to what we’re trying to achieve; we’re using it for networked democracy, as suggested above. CommentPress is, in fact, a comment engine for each document site. Two things make this possible. First, and most obvious, is the fact that readers on a document site can direct comments to specific paragraphs of text. Readers can also respond to other readers’ comments, and a happy by-product of our re-publication of the Digital Britain – Interim Report is that the discussion still continues, despite the consultation period being over. So CommentPress is an engine for on-site comment and discussion. Texts are dissected but remain whole; they also become social objects.

The second important contribution CommentPress has made is the provision of permalinks for each paragraph in the text. This provides a unique URI or URL for each paragraph of text, making linked references from third-party web sites possible. Combined with the trackback/pingback system built into decent web publishing platforms, CommentPress makes remote commenting on text possible, as Tony explains on his blog.

What this means is that the paragraph, action point, section or whatever can become a linked resource, or linked context, and can support remote commenting. And in turn, the remark made on the third party site can become a linked annotation to the corresponding part of the original report… How? Well through the judicious use of trackbacks… So even if you don’t want to comment on the Digital Britain Interim report on the WriteToReply site, but you do care, why not post your thoughts on your own blog, and link your thoughts directly back to the appropriate part of the report on WriteToReply?

It’s this feature, so easily missed, which makes CommentPress a comment engine. An engine suggests an underlying technology that drives something greater. By introducing paragraph permalinks, text can now be linked at a much more accurate and deeper level than was previously possible. Texts are transformed into uniquely identifiable resources of data. Academics can now reference paragraphs rather than page numbers, and readers can reflect, comment and participate in the analysis of texts from their own site. For the reader, CommentPress provides a fluid interface to the document as a whole but, at a technical level, explodes it across the Internet.
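Remote commenting of this kind rides on the standard Pingback protocol, which WordPress implements over XML-RPC. As a minimal sketch (both URLs are placeholders), this is roughly the call a remote blog makes when a post of yours links to a paragraph permalink:

```python
import xmlrpc.client

# Placeholders: a post on your own blog, and the paragraph permalink
# on the document site that your post links to.
source_url = "http://example.org/2009/04/my-post-on-digital-britain/"
target_url = "http://writetoreply.org/digitalbritain/section-3/#paragraph-2"

# WordPress exposes the Pingback API at its XML-RPC endpoint
server = xmlrpc.client.ServerProxy("http://writetoreply.org/xmlrpc.php")
result = server.pingback.ping(source_url, target_url)
print(result)  # a confirmation message on success
```

In practice, WordPress makes this call for you automatically when you publish a post containing the link; the sketch just shows what happens under the hood.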

In the running of WriteToReply, we’ve tested CommentPress quite hard and found it to be a complex and fragile tool. Until recently, it hadn’t been updated to reflect the fast-changing development of WordPress and, because of its extensive use of Javascript, it clashes with other plugins; so while it transforms a WordPress site, it also restricts functionality that would otherwise be possible. Fortunately, CommentPress 2 is being actively worked on and I’ve been helping to test it with Eddie Tejeda, the original developer. It’s currently in beta, but Eddie is responding to my feedback and fixing issues rapidly. There is a mailing list for CommentPress and the code is publicly accessible.

CommentPress 2.2 Beta

If you test CommentPress 2, you’ll immediately see that it’s been split into a suite of plugins and themes and that it’s now much more flexible, both in its compatibility with other WordPress plugins and in letting you select different components, options and themes. Notably, paragraph permalinks are available as a separate plugin, which means that any WordPress blog will be able to have paragraph-level URIs without necessarily supporting paragraph-level commenting. My test site is on WriteToReply. Feel free to have a look and post comments, if you wish. As I write, it’s not quite ready for everyday use, but at the speed Eddie has been working over the last few days, I’m confident that I’ll be able to use it here at the university and on WriteToReply before the month’s out. If you’re used to using v1.4.1, you’ll notice a lot of change. Remember that it’s still beta software and that not all of the features have been fully implemented yet. It would be great if other people could help test it across various browsers and with different documents. Multimedia is not something I’ve yet been able to throw at it, for example.

Finally, CommentPress needs continued support in terms of testing, reporting issues, bug fixes and feature development. This can be done voluntarily but, given its potential to support education, business and government consultations, I, for one, will be looking for ways to raise funding to help support all of this. If you know of any possible funding opportunities within UK Higher Education, please do let me know.

Facebook to the repository via SWORD

A post to note that I have successfully deposited a document into our institutional repository from my Facebook account, using the Facebook SWORD app written by Stuart Lewis.

There are a few things worth mentioning: it’s an EPrints 3.1.1 IR, hosted at our university and maintained by EPrints Services. EPrints has supported SWORD since v3.1. Originally, the FB app didn’t work, for the following reasons:

  • The ‘Depositing on behalf of’ field has to be left empty. I was told by Seb at EPrints Services that this is ‘disabled by default’.
  • The repository URL needs to point at the ‘service document’, not the base URL of the IR. For us, that is http://eprints.lincoln.ac.uk/sword-app/servicedocument
  • We use LDAP for authentication and the IR configuration needed to be tweaked to account for this when depositing via SWORD.
Once we’d overcome these issues, my ‘test.txt’ doc was successfully deposited from my desktop to the University of Lincoln IR via Facebook, with a few caveats:
  • The app announced ‘Item Deposited!’ and gave a URL which resulted in a 404 dead link: http://eprints.lincoln.ac.uk/sword-app/collections/1738/deposit. I don’t know why. I thought it was because I wasn’t logged in to the IR, but even after logging in, the link was dead.
  • The app (maybe it’s defined in the SWORD spec, I haven’t checked) zipped up my metadata and document, which resulted in two deposited items: my test.txt document and the original zip file were both showing in my item list. This could be because of the way our IR is configured to unpack zip files. I don’t know.

  • The metadata mapping was partially successful. The refereed status didn’t map across at all, and the URL reference I gave mapped to the ‘Identification number’ field in EPrints rather than the ‘Related URLs’ field, which was what I was expecting. Maybe the SWORD app field could be renamed ‘Identification URL/DOI’ or similar? The title, abstract and my name were correctly mapped. It’s a shame that my email address wasn’t autocompleted, as it would be if I were depositing through the normal EPrints workflow.
Despite these issues, it’s good to see this working in principle and I imagine that the above could be rectified quite easily. Perhaps someone can offer their solutions here?
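For reference, here’s a rough sketch of what the underlying SWORD 1.3 deposit looks like when done directly, without Facebook in the middle. The collection URL below is an assumption on my part; the real collection URLs come from the repository’s service document:

```python
import requests

SERVICE_DOC = "http://eprints.lincoln.ac.uk/sword-app/servicedocument"
auth = ("username", "password")  # repository credentials (LDAP, in our case)

# 1. Fetch the service document: an Atom document listing the
#    collections you are allowed to deposit into.
svc = requests.get(SERVICE_DOC, auth=auth, timeout=30)
svc.raise_for_status()

# 2. POST the file to a collection URL taken from the service
#    document (this URL is illustrative, not the real one).
collection = "http://eprints.lincoln.ac.uk/sword-app/deposit/inbox"
with open("test.txt", "rb") as f:
    resp = requests.post(
        collection,
        data=f,
        auth=auth,
        headers={
            "Content-Type": "text/plain",
            "Content-Disposition": "attachment; filename=test.txt",
            # SWORD 1.3 clients may also declare a packaging format via
            # the X-Packaging header; accepted values are listed in the
            # service document. Omitted here for a plain file deposit.
        },
        timeout=60,
    )

# A successful deposit returns an Atom entry describing the new item
print(resp.status_code, resp.headers.get("Location"))
```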

As Stuart notes on his blog, the main value in this kind of app is the ability to broadcast to your Facebook friends that you’ve just deposited something in an IR. My main gripe, however, would be that it doesn’t make the deposit process any easier, which is what interests me about the SWORD protocol. Working this way…

  • I have to use two applications to make my document public, the benefit being that other people are told about what I’ve just done. 
  • The EPrints URL that the app points to, even if it was working, points to a non-public space, so my friends don’t have a direct link to the document from within Facebook. 
  • The metadata fields in the present version of the app are not configurable, which means that I have to add more metadata through the EPrints interface.
  • Finally, it does seem odd to upload a document from my desktop to Facebook only to send it to another application and finish off the deposit process there. It would be more useful if I could deposit files that I already hold in Facebook. I don’t use Facebook enough to know whether there are apps that allow you to create documents within Facebook, but if there were, then perhaps Facebook could be used as a (collaborative?) working space and the SWORD app used to deposit final versions to an IR.

The Virtual Studio

I am in Venice to present a paper with two colleagues from the School of Architecture, at a two-day conference organised by the Metadata for Architectural Materials in Europe (MACE) Project. Yesterday was a significant day, for reasons I want to detail below. Skip to the end of this long post if you just want to know the outcome and why this conference has been an important and positive turning point in the Virtual Studio project.

I joined the university just over a year ago to work on the JISC-funded LIROLEM Project:

The Project aimed to lay the groundwork for the establishment of an Institutional Repository that supports a wide variety of non-textual materials, e.g. digital animations of 3-D models, architectural documentation such as technical briefings and photographs, as well as supporting text based materials. The project arose out of the coincidental demands for the University to develop a repository of its research outputs, and a specific project in the school of Architecture to develop a “Virtual Studio”, a web based teaching resource for the school of Architecture.

At the end of the JISC-funded period, I wrote a lengthy summary on the project blog, offering a personal overview of our achievements and challenges during the course of the project. Notably, I wrote:

The LIROLEM Project was tied to a Teaching Fellowship application by two members of staff in the School of Architecture. Their intentions were, and still are, to develop a Virtual Studio which complements the physical design Studio. Although the repository/archive functionality is central to the requirements of the Virtual Studio, rather than being the primary focus of the Studio, a ‘designerly’, dynamic user interface that encourages participation and collaboration is really key to the success of the Studio as a place for critical thinking and working. In effect, the actual repository should be invisible to the Architect, who has little interest, patience or time for the publishing workflow that EPrints requires. More often than not, the Architects were talking about wiki-like functionality that allowed people to rapidly generate new Studio spaces, invite collaboration, bring in multimedia objects such as plans, images and models, and offer comment, discussion and critique. As student projects developed in the Virtual Studio, finished products could be archived and showcased, inviting another round of comment, critique and possibly derivative works from a wider community outside the classroom Studio.

Our conference paper discussed the difficulties of ensuring that the (minority) interests of the Architecture staff were met while trying to gain widespread institutional support and sustainability for the Institutional Repository which the LIROLEM project aimed, and had an obligation, to achieve. During the presentation (below), we asked:

Can academics and students working in different disciplines be easily accommodated within the same archival space?

Our presentation slides. My bicycle is a reference to Bijker (1997).

The paper argues that advances in technology result from complex and often conflicting social interests. Within the context of the LIROLEM Project, it was the wider interests of the institution which took precedence, rather than the minority interests of the Architecture staff. I’m not directing criticism towards decisions made during the project; after all, I made many of them so as to ensure the long-term sustainability of the repository. But yesterday we argued that

architecture is an atypical discipline; its emphasis is more visual than literary, more practice than research-based and its approach to teaching and learning is more fluid and varied than either the sciences or the humanities (Stevens, 1998). If we accept that it is social interests that underlie the development of technology rather than any inevitable or rational progress (Bijker, 1997), the question arises as to what extent an institutional repository can reconcile architectural interests with the interests of other disciplines. Architecture and the design disciplines are marginal actors in the debate surrounding digital archive development, this paper argues, and they bring problems to the table that are not easily resolved given available software and that lie outside the interests of most other actors in academia.

Prior to the conference, I was unsure of what to do next about the Virtual Studio. I felt that the repository was the wrong application for supporting a collaborative studio environment for architects. Central to this was the unappealing deposit and cataloguing workflow in the IR and the general aesthetic of the user interface which, despite some customisation, does not appeal to designers’ expectations of a visual tool for the deposit and discovery of architectural materials.

However, the MACE Project appears to have just come to our rescue with the development of tools that query OAI-PMH data mapped to their LOM profile, enrich the harvested metadata (by using external services such as Google Maps and by collecting user-generated tags, for example) and provide a social platform for searching participating repositories. I managed to ask several questions throughout the day to clarify how the anticipated architectural content in our repository could be exposed to MACE. My main concern was our issue of having a general-purpose Institutional Repository but wanting to handle subject-specific (architecture) content in a unique way. I was told that OAI-PMH has a ‘set’ attribute which could be used to isolate the architectural content in the IR for harvesting by MACE. Another question related to the building of defined communities or groups within the larger MACE community (i.e. students on a specific course), and I was told that this is a feature they intend to implement.
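To illustrate the ‘set’ suggestion: if the architectural content were grouped into an OAI-PMH set, a harvester such as MACE could fetch just that slice of the repository. A sketch using the Python Sickle library, with a hypothetical setSpec (EPrints exposes its OAI-PMH interface at /cgi/oai2):

```python
from sickle import Sickle

# Our EPrints OAI-PMH endpoint
sickle = Sickle("http://eprints.lincoln.ac.uk/cgi/oai2")

# 'architecture' is a hypothetical setSpec; the repository would need
# to be configured to group architectural items into such a set.
records = sickle.ListRecords(metadataPrefix="oai_dc", set="architecture")

for record in records:
    # record.metadata maps Dublin Core fields to lists of values
    print(record.header.identifier, record.metadata.get("title"))
```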

Because of the work of MACE, the development of a search interface and ‘studio’ community platform has largely been done for us (at least to the level of expectation we ever had for the project). Ironically, we came to the conference questioning the use of the IR as the repository for the Virtual Studio, but now believe that we may benefit from the interoperability of the IR, despite suffering some of its other less appealing attributes. One of the things that remains for us to do is improve the deposit experience, to ensure we collect content that can be exposed to the MACE platform.

For this, I hope we can develop a SWORD tool that simplifies the deposit process for staff and students, reducing the workflow down to the two or three brief steps you find on Flickr or YouTube – repositories they are likely to be familiar with and judge others against. User profile data could be collected from their LDAP login information, and they would be asked to title, describe and tag their work. A default BY-NC-ND Creative Commons licence would be chosen for them, which they could opt out of (but consequently also opt out of MACE harvesting, too).

Boris Müller, who works on the MACE project, spoke yesterday of the “joy of interacting with [software] interfaces.” This has clearly been a central concern of the MACE project as it has been for the Virtual Studio project, too. I’m looking forward to developing a simple but appealing interface that can bring at least a little joy to my architect colleagues and their students.

Open Educational Resources

This list of open courseware resources came up on a Delicious news feed this morning. It’s quite a comprehensive list and usefully laid out. If that leaves you hungry for more, try ZaidLearn’s page, which lists many more resources.

But wait! If you’re looking for something specific and just want to search for that lesson on “Python programming“, then you’ll want to use Tony Hirst’s customised search, which searches over ZaidLearn’s collection of links.

Even better, Scott Leslie has created the Open Educational Resources Dynamic Search Engine. The OER search engine uses Google’s Custom Search widget to search all of ZaidLearn’s links and, because it’s a wiki, anyone can add to the links which the widget searches. It’s an inspired use of a wiki and it works well. Comparing the two with my ‘Python programming’ search, I get more results back from Tony Hirst’s search page. Maybe Scott stripped some of the links? I don’t know.

UPDATE: I have just found, via Twitter, ccLearn, “a division of Creative Commons which is dedicated to realizing the full potential of the Internet to support open learning and open educational resources (OER).” They have a Universal Education Search project and search engine.

EPrints Session and OR08 Reflections

Back in the office, following a week away at the Open Repositories conference.

The last couple of days were spent in EPrints sessions, as that is the repository software we use here at Lincoln. I found the first session most interesting as the new features in EPrints 3.1 were discussed. The linked page explains in detail the changes in v3.1, but in summary they provide much more control for repository managers through a web interface, rather than editing config files directly. Les’ slides give a nice overview.

The following session on EPrints and the RAE generally reflected the experience we’ve had using EPrints 2 for the RAE last year.

A session on repository analytics gave a very useful overview of using Google Analytics, AWStats and IRStats to measure the various uses of an EPrints repository. Particularly useful was IRStats, which has been developed at Southampton for EPrints. I look forward to installing it.

The final sessions were mainly aimed at developers with a knowledge of Perl. I found the session on how to write plugins for EPrints 3 clear and interesting, but not especially useful as I don’t understand Perl. Still, it was obvious, even to me, that with a basic knowledge of programming, plugins could be written quite easily. I think it’s important for repository managers to immerse themselves in the technicalities of repository development even if they don’t understand much of the detail. Just by sharing ideas and questions with developers, you get a better understanding of what is involved in rolling out new features and a sense of what can be achieved within given resources.

On the whole, the conference leaned towards the technical rather than the strategic and managerial aspects of institutional repositories. There were a lot of developers present and the number of technical projects discussed seemed high. Personally, I appreciated this and came away with a good sense of where the development of repositories is going. It would have been good to have had an event explicitly aimed at bringing both developers and repository staff together.

Finally, I do wonder whether the open access repository community would benefit from engaging with developments in Enterprise Content Management, as there is a great deal of overlap: both face similar issues around workflow, IPR and technical standards. Perhaps there are universities evaluating the open source Alfresco ECMS as a repository platform. If so, I’d like to hear about them.

Next year, the conference is in Atlanta, USA.

Session 7: Usage

This part of the conference ended with two excellent and very different presentations on measuring the usage and impact of scholarly output.

Tim Brody, from the University of Southampton, discussed his work developing IRStats, a tool to measure the use and impact of open access repositories. IRStats has been developed to answer questions such as “What is Professor Smith’s most downloaded paper?” and “Who is the most highly downloaded author in Mathematics?”. Existing tools such as Google Analytics and AWStats don’t offer this level of detail, which is useful both for positioning the repository strategically as an important tool within the University and for providing a service to individual scholars and departments. IRStats is available for EPrints and I intend to try it in our repository.

The final presentation was by Johan Bollen, from the Los Alamos National Laboratory. He took off from where Tim left us and discussed a much larger-scale project called MESUR. This project also attempts to measure the impact of scholarly output by analysing metrics from usage data. It differs from the IRStats project in both its methodology and scale, combining the evaluation of usage, citation and bibliographic data. By analysing this data, they’ve produced some fascinating graphs which show the relationships between academic disciplines. This is a project I look forward to learning more about.

As I mentioned, this was the last session in this part of the conference. For the next day-and-a-half, I will be attending an EPrints User Group session, where I hope to learn more about the new version of EPrints, the experience people had of the RAE exercise, and repository analytics. There are also a couple of training and support sessions which will be useful.