Scholarly publishing with WordPress

Working on the JISCPress project, I’ve been thinking quite a lot about scholarly publishing on the web, and in particular with WordPress. This morning, I read a post over on the ArchivePress blog about some WordPress plugins which are useful additions for creating a scholarly blog and it got me thinking a bit more about what features WordPress would need to support scholarly publishing.

JISCPress does away with the idea that WordPress is a blogging tool, and instead uses WordPress Multi-User as a document publishing platform, where one site or ‘blog’ is a document. The way WPMU is structured means that despite serving multiple (potentially millions) of document sites, the platform remains relatively ‘lightweight’ as each document site generates just a handful of additional database tables, while sharing the same administrative core as a single WordPress install. So, 100 WordPress blogs on WPMU is nothing like the equivalent of running 100 separate WordPress blogs, both from the point of resource requirements and administration. In fact, quite soon, there will be no such thing as WPMU as the two products are going to be merged and because they share 90%+ of the same code already, it’s not too difficult to achieve. ((Has anyone done a diff on the two code bases to measure exactly what percentage of the code is shared between WP and WPMU?))

Anyway, my point here is to discuss whether WordPress can be extended to accommodate most conventions found in scholarly publishing and where it is lacking, to identify the development work required to meet the needs of most academic who wish to write on and publish to the web. ((Actually, I think I’ll save the discussion of its shortfalls for my next post. This one is already long enough.))

Scholarly publishing extends to a wide variety of published outputs. As a Content Management System (CMS) and technology development platform, I believe that WordPress has the potential to support any type of scholarly publishing that the web supports. It is extremely extensible, as can be seen from the 6000+ plugins that are available. However, what I’m interested in is what can be done now, by an academic wishing to publish their work through the use of WordPress acting as a CMS. What can be achieved with a few quid ((I pay $5/year for my domain name and as many sub-domains as I need. I pay $10/month for my hosting with unlimited storage and bandwidth.)) to self-host WordPress so that a few plugins can be installed and a well structured, typical, scholarly paper can be published.

My Dissertation

For some time, I’ve been meaning to publish my MA dissertation. Back in 2002, I undertook some unique research which has not, to my knowledge, been repeated and I think there is some value in having it easily accessible on the web. I have an OpenOffice file and a PDF and, in the course of a morning, have published it under my own domain. The reason I did not publish it on the university WPMU platform is because I have been experimenting with different plugins and did not want to install plugins that were untested or we may not support long-term.  In this case, I’ve used a single WordPress installation, but ideally an individual researcher, group of researchers or research institution, would run a WPMU installation which allowed multiple documents to be authored individually or collaboratively ((Like any decent CMS, WordPress supports role-based authoring and editing and maintains a revision history of edits, auto-saved once per minute. Revisions can be compared alongside of each other.)) and published directly to the web as XHTML.

BuddyPress, by the way, can make the experience even more natural, not only because it is based around a community of like-minded people writing together  on the same web publishing platform, but also because, with a few tweaks here and there, we can move away from the language of blogs and towards the language of documents.


BuddyPress admin bar

Profile menu

Enough of BuddyPress on WPMU for now and back to my dissertation. I set up the site in ten minutes, without using FTP or a command line because I use a host that provides a one-click install of WordPress and WordPress allows you to search for and install plugins from its Dashboard, rather than having to use FTP. Once the site was installed, I then  made some basic changes to the settings, turning on XML-RPC and AtomPub, so that, if I decided to, I could publish to the site using my Word Processor. ((On a scholarly WPMU installation, plugins could be pre-installed and activated, a default theme selected and settings tweaked so very little work is required by the academic author prior to writing her document.)) I didn’t use this in the end, but trust me, it works very well using recent versions of MS Word, Open Office (free) and other blogging clients such as MS Live Writer (free).

So, what are the common characteristics of an academic paper? What does WordPress have to support to provide functionality that meets most scholars’ publishing requirements? I scratched my head (and asked on Twitter) and came up with the following:

  • footnotes/endnotes
  • citations
  • use of LaTeX (sciences)
  • tables
  • images
  • bibliography
  • sub-headings
  • annexes
  • appendices
  • dedication
  • abstract
  • table of contents
  • index to figures
  • introduction
  • exposition
  • conclusion

Many of these are supported in WordPress by default and don’t require any additional plugins (tables, images, sub-headings, annexes, appendices, dedication, abstract, introduction, exposition, conclusion, are all either basic literary conventions or just part of a simply structured document).

For additional support, I installed digress.it, which we have funded through the JISCPress project. This is a WordPress plugin which allows readers to comment on the paragraphs of a document, rather than at the document section level. We’re adding a lot more functionality to meet the objectives of the JISCPress project, but I chose digress.it, principally for the reason that it is designed to turn a WordPress blog into a document site. I could have used any other WordPress theme, but digress.it automatically creates a Table of Contents and allows you to re-order WordPress posts when they are read so that you don’t have to author your document in reverse or adjust the publication dates so the document sections appear in the correct order.

My dissertaion published using digress.it
My dissertation published using digress.it

I added the abstract for my dissertation to the ‘about’ page, so it shows up on the front of the site. I also uploaded a PDF version so that people can download it directly. You’ll see that I also added some links to a related book and DVD, which will certainly appeal to people who are interested in my dissertation. The links pull an image and some basic metadata from Amazon, using the Amazon Machine Tags plugin. This could be used to link to the book in which your article is published and earn you money in click referrals. An alternative, would be the Open Book Book Data plugin, which retrieves a book cover and metadata from Open Library, where your book may already be catalogued. If it’s not on Open Library, catalogue it!

After setting this up, I installed a few more plugins:

Dublin Core for WordPress: Automatically adds ten Dublin Core metadata elements to the document mark up.

wp-footnotes: This allows you to easily add footnotes to your document by enclosing your footnote in double parentheses. ((I am using the plugin on this blog!))

OAI-ORE Resource Map: Automatically marks up the document sections with a OAI-ORE 1.0 resource map.

Google Analyticator: Adds Google Analytics support so you can collect statistics on the readership of your document.

WP Calais Archive Tagger: Analyses your entire document and automatically keywords each section, using the Open Calais API.

Search API: WordPress comes with search built in, but there is a new search API which will eventually make its way into the WordPress core. I’ve installed the plugin to provide full-text search across the document. It can also add Google Search to your document site.

wp-super-cache: This is simple to install and will significantly speed up your document site, making it a pleasure to navigate through and read 🙂

Plugins I didn’t use

wp-latex: Although I didn’t need it for my dissertation, it’s worth noting that WordPress supports the use of \LaTeX.

Academic Citation: You need to add a line of code to your theme for this to display. It supports the concept of an article being a single blog post, rather than a ‘document site’ and displays a variety of citation formats for readers to use.

Do you know of any other plugins for a scholarly blog?

The Beauty of Feeds

The other useful thing about managing a document using WordPress and in particular, using digress.it, is that you automatically get RSS/Atom feeds for the document. I’ve already discussed these in detail. It means that I was able to read my document in my feed reader, with footnotes and images displayed correctly.

Document in Google Reader

See how nicely the formatting is preserved. \LaTeX is also rendered correctly in feed readers.

Document formatted nicely in Google Reader
Reading my dissertation in Google Reader

You’ll see that the document sections are listed in order; that is, first section on top. As I noted above, blogs list posts in reverse (most recent first), so I sorted the feed items in Yahoo Pipes and sorted it in ascending order. Yahoo Pipes exports as RSS and it’s that feed that I subscribed to in Google Reader. Wouldn’t it be nice, if I could import my document feed into an Institutional Repository? Wait a minute, I can! 🙂

Importing an RSS feed into EPrints

Click to see the item in the repository
Click to see the item in the repository

When importing the default feed, the HTML output is accurate but in reverse order, while the RSS output from Yahoo Pipes didn’t import into EPrints very cleanly at all. I’ll work on this. UPDATE: Forget Yahoo Pipes. WordPress feeds can be sorted with a switch added to the URL: http://example.com/feed/?orderby=post_date&order=ASC

So there it is. An academic paper, published to the web using a modern CMS which supports most authoring and publishing requirements. I would favour an institutional WPMU platform for academics to author directly to, publish their pre-print to the web for open access and detailed comment, and import their RSS feed into the repository. As a proof of concept, I’m quite pleased with this. We are currently developing a widget that can be embedded in a web page or WordPress sidebar and allow a member of staff to upload a document or zipped folder of documents to the Institutional Repository. I wonder if we can also support the import of a feed from the widget, too?

So, what would your requirements be? Tell me and I’ll do my best to test WordPress against them.

The FolkSemantic Widget for OER discovery

I like this. A widget that analyses the content of your web page and suggest related Open Educational Resources (OERs), using FolkSemantic, a collaborative website that allows you to “browse and search over 110,000 OERs”. You can see the widget in the sidebar of this blog under the heading of ‘Related Educational Resources’ –>

So, if I dump a load of text relating to ‘physics’, say, you should see physics-related OERs… Does it work? 🙂 Some random tests on other blog posts, suggests it is a bit hit and miss, but is certainly matching some OERs to the content. I wonder if we could use this approach to find related documents on the JISCPress project?

Physics (Greekphysis – φύσις meaning “nature“) is a natural science; it is the study of matter[1] and its motion through spacetime and all that derives from these, such as energy and force.[2] More broadly, it is the general analysis of nature, conducted in order to understand how the world and universe behave.[3][4]

Physics is one of the oldest academic disciplines, perhaps the oldest through its inclusion of astronomy.[5] Over the last two millennia, physics had been considered synonymous with philosophychemistry, and certain branches of mathematics and biology, but during the Scientific Revolution in the 16th century, it emerged to become a unique modern science in its own right.[6] However, in some subject areas such as in mathematical physics and quantum chemistry, the boundaries of physics remain difficult to distinguish.

Physics is both significant and influential, in part because advances in its understanding have often translated into new technologies, but also because new ideas in physics often resonate with the other sciences, mathematics and philosophy.

For example, advances in the understanding of electromagnetism led directly to the development of new products which have dramatically transformed modern-day society (e.g., television, computers, and domestic appliances); advances in thermodynamics led to the development of motorized transport; and advances in mechanics inspired the development of calculus.

Physics covers a wide range of phenomena, from the smallest sub-atomic particles, to the largest galaxies. Included in this are the very most basic objects from which all other things are composed, and therefore physics is sometimes said to be the “fundamental science”.[7]

Physics aims to describe the various phenomena that occur in nature in terms of simpler phenomena. Thus, physics aims to both connect the things we see around us to root causes, and then to try to connect these causes together in the hope of finding anultimate reason for why nature is as it is.

For example, the ancient Chinese observed that certain rocks (lodestone) were attracted to one another by some invisible force. This effect was later called magnetism, and was first rigorously studied in the 17th century.

A little earlier than the Chinese, the ancient Greeks knew of other objects such as amber, that when rubbed with fur would cause a similar invisible attraction between the two. This was also first studied rigorously in the 17th century, and came to be calledelectricity.

Thus, physics had come to understand two observations of nature in terms of some root cause (electricity and magnetism). However, further work in the 19th century revealed that these two forces were just two different aspects of one force – electromagnetism. This process of “unifying” forces continues today (see section Current research for more information).

Physics uses the scientific method to test the validity of a physical theory, using a methodical approach to compare the implications of the theory in question with the associated conclusions drawn from experiments and observations conducted to test it. Experiments and observations are to be collected and matched with the predictions and hypotheses made by a theory, thus aiding in the determination or the validity/invalidity of the theory.

Theories which are very well supported by data and have never failed any competent empirical test are often called scientific laws, or natural laws. Of course, all theories, including those called scientific laws, can always be replaced by more accurate, generalized statements if a disagreement of theory with observed data is ever found [8]. ((Source: Wikipedia))

JISCPress: Developing a community platform for the JISC funding process

I’m very pleased to announce that my bid with Tony Hirst at the Open University, to develop a community platform for the JISC funding call process based on WriteToReply, was successful. The original bid document is publicly available and currently offers the most information on this six month, £32,500 project.

Note that this is an open project using open source software and we welcome volunteer contributions from anyone. I’ve set up a project blog, mailing list, wiki and code repository. Feel free to join us if this WriteToReply spin-off appeals to you. If you know anyone that might be interested, please do let them know.

If you’ve been following WriteToReply, you’ll know that we use WordPress Multi-User and CommentPress. Eddie Tejeda, the developer of CommentPress will be working with us on the project and this will result in significant further development of CommentPress 2. So, if you’re interested in CommentPress (as many people are), please consider following, contributing to and testing JISCPress.

I should also note that while the project is a spin-off of our work on WriteToReply, neither Tony or I are personally receiving any funds from JISC.  The contributions from JISC to cover our time on this project are paid directly to our employers and does not result in any financial benefit to us or WriteToReply (which is in the process of being formalised as a non-profit business).  In other words, while WriteToReply is a personal project, JISCPress is part of our normal work as employees of our universities (both Tony and I are expected to bid and win project funds – you get used to it after a while!). Money has been allocated to fund dedicated developer time to the project, which will pay Eddie and Alex, a student at the University of Lincoln,  for their work.

Anyway, on with the project! Here’s the outline from the bid document:

This project will deliver a demonstrator prototype publishing platform for the JISC funding call and dissemination process. It will seek to show how WordPress Multi-User (WPMU) can be used as an effective document authoring, publishing, discussion and syndication platform for JISC’s funding calls and final project reports, and demonstrate how the cumulative effect of publishing this way will lead to an improved platform for the discovery and dissemination of grant-related information and project outputs. In so doing, we hope to provide a means by which JISC project investigators can more effectively discover, and hence build on, related JISC projects. In general, the project will seek to promote openness and collaboration from the point of bid announcements onwards.

The proposed platform is inspired and informed by WriteToReply, a service developed by the principle project staff (Joss Winn and Tony Hirst) in Spring 2009 which re-publishes consultation documents for public comment and allows anyone to re-publish a document for comment by their target community. In our view, this model of publishing meets many of the intended benefits and deliverables of the Rapid Innovation call and Information Environment Programme. The project will exploit well understood and popular open source technologies to implement an alternative infrastructure that enables new processes of funding-related content creation, improves communication around funding calls and enables web-centric methods of dissemination and content re-use. The platform will be extensible and could therefore be the object of further future development by the HE developer community through the creation of plugins that provide desired functionality in the future.

Getting your Triples into Talis Connected Commons

A few days ago, I wrote about adding Triplify to your web application. Specifically, I wrote about adding it to WordPress, but the same information can be applied to most web publishing platforms. Earlier this month, TALIS announced their Connected Commons platform and yesterday they announced a commercial version of their platform for the structured storage of Linked Data. Storage is all very well, but more importantly they have an API for developers, so that the data can be queried and creatively re-used or mashed up.

So this got me thinking about JISCPress, our recent JISC Rapid Innovation Programme bid, which proposes a WordPress Multi-User based platform for publishing JISC funding calls and the reports of funded projects. This is based on my experience of running WriteToReply with Tony Hirst.

Although a service for comment and discussion around documents, one of the things that interests me most about WriteToReply and, consequently the JISCPress proposal, is the cumulative storage of data on the platform and how that data might be used. No surprise really as my background is in archiving and collections management. As with the University of Lincoln blogs, WriteToReply and the proposed JISCPress platform, aggregate published content into a site-wide ‘tags’ site that allows anyone to search and browse through all content that has been published to the public. In the case of the university blogs, that’s a large percentage of blogs, but for WriteToReply and JISCPress, it would be pretty much every document hosted on the platform.

You can see from the WriteToReply tags site that over time, a rich store of public documents could be created for querying and re-use. The site design is a bit clunky right now but under the hood you’ll notice that you can search across the text of every document, browse by document type and by tag. The tags are created by publishing the content to OpenCalais, which returns a whole bunch of semantic keywords for each document section. You’ll also notice that an RSS feed is available for any search query, any category and any tag or combination of tags.

Last night, I was thinking about the WriteToReply site architecture (note that when I mention WriteToReply, it almost certainly applies to JISCPress, too – same technology, similar principles, different content). Currently, we categorise each document by document type so you’ll see ‘Consultations‘, ‘Action Plans‘ ‘Discussion Papers‘, etc.. We author all documents under the WriteToReply username, too and tag each document section both manually and via OpenCalais. However, there’s more that we could do, with little effort, to mark up the documents and I’ve started sketching it out.

You’ll see from the diagram that I’m thinking we should introduce location and subject categories. There will be formal classification schemes we could use. For example, I found a Local Government Classification Scheme, which provides some high level subjects that are the type of thing I’m thinking about. I’m not suggesting we start ‘cataloguing’ the documents, but simply borrow, at the top level, from recognised classification schemes that are used elsewhere. I’m also thinking that we should start creating a new author for each document and in the case of WriteToReply, the author would be the agency who issued the consultation, report, or whatever.

So following these changes, we would capture the following data (in bold), for example:

The Home Office created Protecting the public in a changing communications environment on April 27th which is a consultation document for England, Wales and Scotland, categorised under Information and communication technology with 18 sections.

Section one is tagged Governor, Home Department, Office of Public Sector Information, Secretary of State, Surrey.

Section two is tagged communications data, communications industry, emergency services, Home Secretary, Jacqui Smith MP, Rt Hon Jacqui Smith MP.

Section three is tagged Broadband, BT, communications, communications changes, communications data, communications data capability, communications data limits, communications environment, communications event, communications industry, communications networks, communications providers, communications service providers, communications services, emergency services, Her Majesty’s Revenue and Customs, Home Office, intelligence agencies, internet browsing, Internet Protocol, Internet Service, IP, mobile telephone system, physical networks, public telecommunications service, registered owner, Serious Organised Crime Agency, social networking, specified communications data, The communications industry, United Kingdom.

Section four is tagged …(you get the picture)

Section five, paragraph six, has the comment “fully compatible with the ECHR” is, of course, an assertion made by the government, about its own legislation. Has that assertion ever been tested in a court? authored by Owen Blacker on April 28th 11:32pm.

Selected text from Section five, paragraph eight, has the comment Over my dead body! authored by Mr Angry on April 28th 9:32pm

Note that every author, document, section, paragraph, text selection, category, tag, comment and comment author has a URI, Atom, RSS and RDF end point (actually, text selection and comment author feeds are forthcoming features).

Now, with this basic architecture mapped out, we might wonder what Triplify could add to this. I’ve already shown in my earlier post that, with little effort, it re-publishes data from a relational database as N-Triples semantic data, so everything you see above, could be published as RDF data (and JSON, too).

So, in my simple view of the world, we have a data source that requires very little effort to generate content for and manage (JISCPress/WriteToReply/WordPress), a method of automatically publishing the data for the semantic web (Triplify) and, with TALIS, an API for data storage, data access, query, and augmentation.  As always, my mantra is ‘I am not a developer’, but from where I’m standing, this high-level ‘workflow’ seems reasonable.

The benefits for the JISC community would primarily be felt by using the JISCPress website, in a similar way (albeit with better, more informed design) to the WriteToReply ‘tags’ site. We could search across the full text of funding calls, browse the reports by author, categories and tags and grab news feeds from favourite authors, searches, tags or categories. This is all in addition to the comment, feedback and discussion features we’ve proposed, too. Further benefits would be had from ‘re-publishing’ the site content as semantic data to a platform such as TALIS. Not only could there be further Rapid Innovation projects which worked on this data, but it would be available for any member of the public to query and re-use, too. No longer would our final project reports, often the distillation of our research, sit idle as PDF files on institutional websites and in institutional repositories. If the documentation we produce it worth anything, then it’s worth re-publishing openly as semantic data.

Finally, in order to benefit from the (free) use of TALIS Connected Commons, the data being published needs to be licensed under a public domain or Creative Commons ‘zero’ licence. I suspect Crown Copyright is not compatible with either of these licenses, although why the hell public consultation documents couldn’t be licensed this way, I don’t know. Do you? For JISCPress, this would be a choice JISC could make. The alternative is to use the commercial TALIS platform or something similar.

As usual, tell me what you think… Thanks.