Posts tagged semantic web

We finished JISCPress. If you’re interested, I’ve written a long overview of the work we’ve done with WPMU as a document discussion platform, based on WriteToReply. You’ll see that the project has, among other things, produced three plugins: digress.it, and two Linked Data plugins that run as background services across the platform, create relationships between documents and document sections and post RDF to the Talis Data Store. Fancy!

I like this: a widget that analyses the content of your web page and suggests related Open Educational Resources (OERs), using FolkSemantic, a collaborative website that allows you to “browse and search over 110,000 OERs”. You can see the widget in the sidebar of this blog under the heading ‘Related Educational Resources’ –>

So, if I dump a load of text relating to ‘physics’, say, you should see physics-related OERs… Does it work? :-) Some random tests on other blog posts suggest it is a bit hit and miss, but it is certainly matching some OERs to the content. I wonder if we could use this approach to find related documents on the JISCPress project?

Physics (Greek: physis – φύσις, meaning “nature”) is a natural science; it is the study of matter[1] and its motion through spacetime and all that derives from these, such as energy and force.[2] More broadly, it is the general analysis of nature, conducted in order to understand how the world and universe behave.[3][4]

Physics is one of the oldest academic disciplines, perhaps the oldest through its inclusion of astronomy.[5] Over the last two millennia, physics had been considered synonymous with philosophy, chemistry, and certain branches of mathematics and biology, but during the Scientific Revolution in the 16th century, it emerged to become a unique modern science in its own right.[6] However, in some subject areas such as in mathematical physics and quantum chemistry, the boundaries of physics remain difficult to distinguish.

Physics is both significant and influential, in part because advances in its understanding have often translated into new technologies, but also because new ideas in physics often resonate with the other sciences, mathematics and philosophy.

For example, advances in the understanding of electromagnetism led directly to the development of new products which have dramatically transformed modern-day society (e.g., television, computers, and domestic appliances); advances in thermodynamics led to the development of motorized transport; and advances in mechanics inspired the development of calculus.

Physics covers a wide range of phenomena, from the smallest sub-atomic particles, to the largest galaxies. Included in this are the very most basic objects from which all other things are composed, and therefore physics is sometimes said to be the “fundamental science”.[7]

Physics aims to describe the various phenomena that occur in nature in terms of simpler phenomena. Thus, physics aims to both connect the things we see around us to root causes, and then to try to connect these causes together in the hope of finding an ultimate reason for why nature is as it is.

For example, the ancient Chinese observed that certain rocks (lodestone) were attracted to one another by some invisible force. This effect was later called magnetism, and was first rigorously studied in the 17th century.

A little earlier than the Chinese, the ancient Greeks knew of other objects such as amber, that when rubbed with fur would cause a similar invisible attraction between the two. This was also first studied rigorously in the 17th century, and came to be called electricity.

Thus, physics had come to understand two observations of nature in terms of some root cause (electricity and magnetism). However, further work in the 19th century revealed that these two forces were just two different aspects of one force – electromagnetism. This process of “unifying” forces continues today (see section Current research for more information).

Physics uses the scientific method to test the validity of a physical theory, using a methodical approach to compare the implications of the theory in question with the associated conclusions drawn from experiments and observations conducted to test it. Experiments and observations are to be collected and matched with the predictions and hypotheses made by a theory, thus aiding in the determination or the validity/invalidity of the theory.

Theories which are very well supported by data and have never failed any competent empirical test are often called scientific laws, or natural laws. Of course, all theories, including those called scientific laws, can always be replaced by more accurate, generalized statements if a disagreement of theory with observed data is ever found [8]. ¹

  1. Source: Wikipedia

I’ve just re-submitted this proposal for a demonstration at ALT-C 2009. It’s called WordPress Multi-User: BuddyPress and Beyond. It won’t be confirmed until June, but for the record, here it is…

‘BuddyPress’ is a new social networking layer for WordPress Multi-User blogs. It provides familiar, easy to use social networking features in addition to a high-quality and popular blogging platform. The University of Lincoln have been trialing WordPress MU since May 2008 and have been using BuddyPress since February 2009 to promote an institutional social networking community built around personalised and collaborative web publishing.

This session will demonstrate the versatility of the WordPress MU platform. We’ll look at an installation that is enhanced with BuddyPress, LDAP authentication, mobile phone support and advanced privacy controls. You’ll see how simple it is to set up site-wide RSS syndication and aggregation, enhance your blog with semantic web tools, publish mathematical formulae with LaTeX, send realtime notifications to Facebook, Twitter and IM, publish podcasts to iTunes, and embed GPX and KML mapping files. We’ll also look at how to embed WordPress content in your VLE and other institutional websites. The use of a temporary ‘ALT-C 2009 BuddyPress’ installation will be encouraged.

There will be opportunities throughout for questions and answers and participants will leave with a good understanding of the advantages and disadvantages of WordPress and the resources and skills required to provide a social networking and blogging platform in your institution.

A few days ago, I wrote about adding Triplify to your web application. Specifically, I wrote about adding it to WordPress, but the same information can be applied to most web publishing platforms. Earlier this month, TALIS announced their Connected Commons platform and yesterday they announced a commercial version of their platform for the structured storage of Linked Data. Storage is all very well, but more importantly they have an API for developers, so that the data can be queried and creatively re-used or mashed up.

So this got me thinking about JISCPress, our recent JISC Rapid Innovation Programme bid, which proposes a WordPress Multi-User based platform for publishing JISC funding calls and the reports of funded projects. This is based on my experience of running WriteToReply with Tony Hirst.

Although it is a service for comment and discussion around documents, one of the things that interests me most about WriteToReply and, consequently, the JISCPress proposal is the cumulative storage of data on the platform and how that data might be used. No surprise really, as my background is in archiving and collections management. Like the University of Lincoln blogs, WriteToReply and the proposed JISCPress platform aggregate published content into a site-wide ‘tags’ site that allows anyone to search and browse through all content that has been published to the public. In the case of the university blogs, that’s a large percentage of blogs, but for WriteToReply and JISCPress, it would be pretty much every document hosted on the platform.

You can see from the WriteToReply tags site that over time, a rich store of public documents could be created for querying and re-use. The site design is a bit clunky right now but under the hood you’ll notice that you can search across the text of every document, browse by document type and by tag. The tags are created by publishing the content to OpenCalais, which returns a whole bunch of semantic keywords for each document section. You’ll also notice that an RSS feed is available for any search query, any category and any tag or combination of tags.

Last night, I was thinking about the WriteToReply site architecture (note that when I mention WriteToReply, it almost certainly applies to JISCPress, too – same technology, similar principles, different content). Currently, we categorise each document by document type, so you’ll see ‘Consultations’, ‘Action Plans’, ‘Discussion Papers’, etc. We also author all documents under the WriteToReply username and tag each document section both manually and via OpenCalais. However, there’s more that we could do, with little effort, to mark up the documents, and I’ve started sketching it out.

You’ll see from the diagram that I’m thinking we should introduce location and subject categories. There will be formal classification schemes we could use. For example, I found a Local Government Classification Scheme, which provides some high level subjects that are the type of thing I’m thinking about. I’m not suggesting we start ‘cataloguing’ the documents, but simply borrow, at the top level, from recognised classification schemes that are used elsewhere. I’m also thinking that we should start creating a new author for each document and in the case of WriteToReply, the author would be the agency who issued the consultation, report, or whatever.

So following these changes, we would capture the following data (in bold), for example:

The Home Office created Protecting the public in a changing communications environment on April 27th which is a consultation document for England, Wales and Scotland, categorised under Information and communication technology with 18 sections.

Section one is tagged Governor, Home Department, Office of Public Sector Information, Secretary of State, Surrey.

Section two is tagged communications data, communications industry, emergency services, Home Secretary, Jacqui Smith MP, Rt Hon Jacqui Smith MP.

Section three is tagged Broadband, BT, communications, communications changes, communications data, communications data capability, communications data limits, communications environment, communications event, communications industry, communications networks, communications providers, communications service providers, communications services, emergency services, Her Majesty’s Revenue and Customs, Home Office, intelligence agencies, internet browsing, Internet Protocol, Internet Service, IP, mobile telephone system, physical networks, public telecommunications service, registered owner, Serious Organised Crime Agency, social networking, specified communications data, The communications industry, United Kingdom.

Section four is tagged …(you get the picture)

Section five, paragraph six, has the comment “fully compatible with the ECHR” is, of course, an assertion made by the government, about its own legislation. Has that assertion ever been tested in a court? authored by Owen Blacker on April 28th 11:32pm.

Selected text from Section five, paragraph eight, has the comment Over my dead body! authored by Mr Angry on April 28th 9:32pm

Note that every author, document, section, paragraph, text selection, category, tag, comment and comment author has a URI, Atom, RSS and RDF end point (actually, text selection and comment author feeds are forthcoming features).
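To make the shape of that data a little more concrete, here is a minimal Python sketch of the Home Office example as a structured record. The class and field names are illustrative assumptions, not the WordPress schema, and only the first two of the 18 sections are filled in.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Section:
    number: int
    tags: List[str] = field(default_factory=list)


@dataclass
class Document:
    author: str           # the issuing agency rather than a shared username
    title: str
    published: str        # kept as free text; the post gives no year
    doc_type: str         # e.g. Consultation, Action Plan, Discussion Paper
    locations: List[str]  # proposed location categories
    subject: str          # proposed top-level subject category
    sections: List[Section] = field(default_factory=list)


doc = Document(
    author="Home Office",
    title="Protecting the public in a changing communications environment",
    published="April 27th",
    doc_type="Consultation",
    locations=["England", "Wales", "Scotland"],
    subject="Information and communication technology",
    sections=[
        Section(1, ["Governor", "Home Department",
                    "Office of Public Sector Information",
                    "Secretary of State", "Surrey"]),
        Section(2, ["communications data", "communications industry",
                    "emergency services", "Home Secretary",
                    "Jacqui Smith MP", "Rt Hon Jacqui Smith MP"]),
    ],
)

# Serialise the whole record, nested sections included, as JSON.
as_json = json.dumps(asdict(doc), indent=2)
print(as_json)
```

Comments and text selections could hang off each section in the same way, each with its own identifier.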

Now, with this basic architecture mapped out, we might wonder what Triplify could add to this. I’ve already shown in my earlier post that, with little effort, it re-publishes data from a relational database as N-Triples semantic data, so everything you see above could be published as RDF data (and JSON, too).
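As a rough illustration of what “a database row as N-Triples” means, here is a small Python sketch. It is not Triplify itself, which is a PHP script configured with SQL queries. The `example.org` base URI, the row id of 42 and the column-to-predicate mapping are all assumptions made for the example; the Dublin Core predicate URIs are real.

```python
def escape_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(base_uri: str, row: dict, predicates: dict) -> list:
    """Turn one database row into one N-Triples line per mapped column."""
    subject = f"<{base_uri}{row['id']}>"
    lines = []
    for column, predicate in predicates.items():
        value = row.get(column)
        if value is None:
            continue  # no triple for missing or NULL columns
        literal = escape_literal(str(value))
        lines.append(f'{subject} <{predicate}> "{literal}" .')
    return lines


# Hypothetical mapping from column names to Dublin Core terms.
PREDICATES = {
    "title": "http://purl.org/dc/terms/title",
    "subject": "http://purl.org/dc/terms/subject",
}

# Hypothetical row, as it might come back from a posts table.
row = {
    "id": 42,
    "title": "Protecting the public in a changing communications environment",
    "subject": "Information and communication technology",
}

triples = row_to_ntriples("http://example.org/post/", row, PREDICATES)
for line in triples:
    print(line)
```

Each printed line is a complete subject, predicate, object statement ending in a full stop, which is why this kind of data is straightforward to load into a triple store.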

So, in my simple view of the world, we have a data source that requires very little effort to generate content for and manage (JISCPress/WriteToReply/WordPress), a method of automatically publishing the data for the semantic web (Triplify) and, with TALIS, an API for data storage, data access, query, and augmentation. As always, my mantra is ‘I am not a developer’, but from where I’m standing, this high-level ‘workflow’ seems reasonable.

The benefits for the JISC community would primarily be felt by using the JISCPress website, in a similar way (albeit with better, more informed design) to the WriteToReply ‘tags’ site. We could search across the full text of funding calls, browse the reports by author, categories and tags and grab news feeds from favourite authors, searches, tags or categories. This is all in addition to the comment, feedback and discussion features we’ve proposed, too. Further benefits would be had from ‘re-publishing’ the site content as semantic data to a platform such as TALIS. Not only could there be further Rapid Innovation projects which worked on this data, but it would be available for any member of the public to query and re-use, too. No longer would our final project reports, often the distillation of our research, sit idle as PDF files on institutional websites and in institutional repositories. If the documentation we produce is worth anything, then it’s worth re-publishing openly as semantic data.

Finally, in order to benefit from the (free) use of TALIS Connected Commons, the data being published needs to be licensed under a public domain or Creative Commons ‘zero’ licence. I suspect Crown Copyright is not compatible with either of these licences, although why the hell public consultation documents couldn’t be licensed this way, I don’t know. Do you? For JISCPress, this would be a choice JISC could make. The alternative is to use the commercial TALIS platform or something similar.

As usual, tell me what you think… Thanks.