Data – Joss Winn

Work at web scale* on the Orbital project

Joss Winn / December 16, 2011 (updated January 10, 2012) / Data, Fun, Open Source, Projects, Software, Standards & Specs, Web

This job is now formally open for applications.

Just a heads up to say that we’ll be advertising for a Web Developer to work on Orbital, our JISC-funded ‘Managing Research Data’ project. The post, starting in March/April, will be a 12 month, full-time, grade 5 (c.£21K) position.

The Web Developer (‘you’) will be working in the Centre for Educational Research and Development, alongside Nick Jackson, Lead Developer on Orbital, and will also benefit from being in a team that includes staff in central ICT services and the Library. Orbital builds on and extends previous work we’ve been doing over the last couple of years, so if you’re interested, you should read through our projects pages.

If we were to summarise our technologies and interests, I guess they would be #agile, #opensource, #opendata, #LAMP, #php, #codeigniter, #mongoDB, #OAuth, #APIs, #HTML5, #CSS3, #github and moving towards #RDF and #LinkedData.

Just seeing these hashtags listed together should cause your heart to beat with excitement 🙂

When we advertise in January, you’ll see that the job spec is actually a pretty standard affair. What I want to emphasise here is how interesting and fun the job will be.

The key section in the Job Description is what you’d be working on with Nick:

Development and implementation of a set of web services, which re-use and develop our previous, JISC-funded work as well as other initiatives (e.g. SWORD and DataCite DOIs).
Documented source code will be made available under an open source license by the end of the project.
Development and implementation of mechanisms for managing and transferring data, including the use of MongoDB, OAuth, read/write RESTful APIs, SWORD2 interoperability, and integration with the administrative functions of EPrints.

That actually summarises a lot of work.

I’m managing the project and try to run things with as little hierarchy as possible within a university environment. You’ll always know the project priorities and will be trusted to self-organise and deliver on time, working to two-week iterations and, roughly, monthly releases. I regularly reflect on how we work and our overall working environment. For Orbital, I favour the Crystal Clear agile methodology, as does Nick. You’ll be encouraged to reflect on this with us, too.

We work hard, and not always 9-5pm, but we work at a pace that is sustainable over a long period of time. We take our work seriously but, in the spirit of hacking, are always looking for ways to have fun, too. We recognise that we’re fortunate to be working in a diverse and intellectually stimulating academic environment, but are user/product focused at the end of the day. You’ll be working directly with our users, who are Researchers in the School of Engineering and Siemens, and staff in the Library and ICT. You’ll need to be showing them refreshed, working software every couple of weeks and iteratively improving Orbital, based on their feedback and requirements. There may also be times when you’ll be asked to talk publicly about your work and you’ll be encouraged to blog about it every so often, too. I expect the project to produce one or two conference/journal papers, and you’ll be named as a contributor and can take as active a role in that as you like.

I hope this sounds like an interesting job. At £21K, I recognise that it will probably attract younger developers looking to gain experience, though of course, we welcome applications from anyone, whatever their age. By the time the post starts, we’ll have set up a decent dev/staging/production environment, hosted in the cloud, and relying on GitHub and Jenkins to keep things versioned, integrated and tested. Nick will have been developing Orbital for a couple of months or more and will have laid the groundwork for someone to start coding quickly in a supportive environment.

If you’re thinking of applying and don’t live in Lincoln, you’ll be pleased to know that it’s a decent small city, and a relatively cheap place to live. The campus is modern and sits by a Marina in the middle of the city. You can walk to work. I love the place. Oh, and you can choose your own hardware for development, within reason. Most of us use Macs, but whatever suits you. I’ll ask the successful candidate what they prefer when we offer them the job.

If, after reading around the project website, you’ve got any questions about the post, please do get in touch. Thanks.

* Wondering what the hell ‘web scale’ means? Something like this.

LNCD, DevXS and Orbital

Joss Winn / September 9, 2011 / Funding, Projects

It’s been a busy summer, to say the least.

The DIVERSE project is up and running and the Linking You and Jerome projects came to a successful end. We were joined by Jamie Mahoney, a new full-time Web Developer Intern and established a new group (LNCD). We’re co-organising a national student developer conference (DevXS) and have just been awarded a £241,500 grant for Orbital, an 18 month project to develop and pilot a new research infrastructure for the School of Engineering. Orbital is a great opportunity to build on some of our earlier work and get stuck into the challenges of managing raw research data. Which reminds me: data.lincoln.ac.uk is live, too 🙂

Please do tell your students about DevXS.

Ten reasons why you should pay attention to the geeks because actually they have something quite important to say which us non-geeky people should be listening to

Joss Winn / August 1, 2009 / Data, Mashups, Web

Re-broadcasting Mike Ellis’ recent presentation…

Don't Think Websites, think data


Triplify: Make your blog mashable

Joss Winn / April 27, 2009 (updated August 30, 2009) / Commons, Data, Fun, Mashups, Standards & Specs, Web

Last week, I wrote about how it is relatively simple to ‘pimp your ride on the semantic web’. Over the weekend, I stumbled upon Triplify, a small ‘plugin’ for pretty much any web publishing platform, that “reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.” What is so appealing about Triplify is how easy it is to implement, especially alongside a WordPress site.

I can confirm that the three-step installation process is all it takes, although I wouldn’t undertake implementing this blindly as you are, literally, exposing a semantic representation of your database content. In other words, you should look at the configuration file you’re using and check that it’s going to expose the right data and not clear text passwords and unpublished posts and comments. Before I implemented it, I realised that it would expose comments on a bunch of posts that I have since made private (they were imported from an old, private blog), so I had to ‘unapprove’ those comments so the script didn’t expose them to the public. A five minute job. Alternatively, the script could probably be modified to work around my problem, by only exposing comments after a certain date, for example.

The end result is that, with a WordPress site, you expose a semantic representation of your users, posts, pages, tags, categories, comments and attachments in RDF (N-Triples) and JSON formatted data (for JSON, just add ‘?t-output=json’ to the end of the URI). Like I said though, it could be used on any database driven web application. Here’s what you get when you expose the high level links to your content:


<http://blog.josswinn.org/triplify/> <http://www.w3.org/2000/01/rdf-schema#comment> "Generated by Triplify V0.5 (http://Triplify.org)" .
<http://blog.josswinn.org/triplify/> <http://creativecommons.org/ns#license> <http://creativecommons.org/licenses/by/2.0/uk/> .
<http://blog.josswinn.org/triplify/post> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/attachment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/tag> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/category> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/user> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://blog.josswinn.org/triplify/comment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
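
The ‘?t-output=json’ switch mentioned earlier can also be added programmatically. The following is only a minimal sketch in Python: the function name triplify_json_url is my own invention, not part of Triplify, and it assumes nothing about Triplify beyond the ‘t-output’ parameter described in this post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def triplify_json_url(uri: str) -> str:
    """Return the given URI with the 't-output=json' switch added.

    Hypothetical helper: it only rewrites the query string and does not
    contact the server.
    """
    parts = urlsplit(uri)
    # Keep any existing query parameters, but drop an earlier t-output value
    # so the switch is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "t-output"
    ]
    query.append(("t-output", "json"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(triplify_json_url("http://blog.josswinn.org/triplify/"))
```

Going through the query string rather than simply appending text means the helper still produces a valid URI if other parameters are already present.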

Here’s an example of what you get when you expose the full content:


<http://blog.josswinn.org/triplify/post/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#has_creator> <http://blog.josswinn.org/triplify/user/1> .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/created> "2008-10-06T05:55:25"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#content> "Up early to go to Sheffield for LPI exams. The last week has left me underprepared. Never mind." .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/modified> "2008-10-06T20:12:15"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

...

<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/27> .

...

<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/41> .
<http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/42> .

...

<http://blog.josswinn.org/triplify/post/154> <http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/belongsToCategory> <http://blog.josswinn.org/triplify/category/22> .

...

<http://blog.josswinn.org/triplify/tag/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/Tag> .
<http://blog.josswinn.org/triplify/tag/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/tagName> "valentine" .
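
To give a feel for how someone might consume output like this, here is a rough Python sketch that splits N-Triples lines into subject, predicate and object and groups them by subject. It is a simplification for illustration, not a full N-Triples parser: it assumes one statement per line, and it ignores blank nodes and language tags. The names TRIPLE, parse_ntriples and by_subject are my own; for real work a proper RDF library would be the safer choice.

```python
import re
from collections import defaultdict

# One statement per line: <subject> <predicate>, then either an <object URI>
# or a "literal" with an optional ^^<datatype>, then the closing full stop.
TRIPLE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_uri>[^>]*)>|"(?P<o_lit>(?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_ntriples(text):
    """Yield (subject, predicate, object) for every line that looks like a triple.

    Lines that do not match (blank lines, '...' elisions) are skipped.
    """
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            obj = match.group("o_uri")
            if obj is None:
                obj = match.group("o_lit")
            yield match.group("s"), match.group("p"), obj


# A few lines taken from the example output above.
sample = '''
<http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#has_creator> <http://blog.josswinn.org/triplify/user/1> .
<http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/created> "2008-10-06T05:55:25"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
...
<http://blog.josswinn.org/triplify/tag/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/tagName> "valentine" .
'''

# Group predicate/object pairs under the thing they describe.
by_subject = defaultdict(list)
for s, p, o in parse_ntriples(sample):
    by_subject[s].append((p, o))

for subject, statements in by_subject.items():
    print(subject, len(statements))
```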

You can choose to expose different levels of information in your HTML source. If you have more than a moderate amount of content, you’ll probably want to just expose the top level links as in the first example and let the users of your data dig deeper. You’ll also note that you can (and should) attach a license to your data.

A number of namespaces are recognised as well as a WordPress vocabulary.


$triplify['namespaces']=array(
    'vocabulary'=>'http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/',
    'rdf'=>'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'rdfs'=>'http://www.w3.org/2000/01/rdf-schema#',
    'owl'=>'http://www.w3.org/2002/07/owl#',
    'foaf'=>'http://xmlns.com/foaf/0.1/',
    'sioc'=>'http://rdfs.org/sioc/ns#',
    'sioctypes'=>'http://rdfs.org/sioc/types#',
    'dc'=>'http://purl.org/dc/elements/1.1/',
    'dcterms'=>'http://purl.org/dc/terms/',
    'skos'=>'http://www.w3.org/2004/02/skos/core#',
    'tag'=>'http://www.holygoat.co.uk/owl/redwood/0.1/tags/',
    'xsd'=>'http://www.w3.org/2001/XMLSchema#',
    'update'=>'http://triplify.org/vocabulary/update#',
);
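
As an illustration of what a prefix map like this is for, the Python sketch below expands a short prefixed name such as ‘sioc:has_creator’ into the full URI seen in the output, and shortens a full URI back again. The dictionary is copied from the configuration above; the function names expand and shrink are hypothetical and this is not how Triplify itself is implemented.

```python
# Prefix -> namespace URI, copied from the Triplify configuration above.
NAMESPACES = {
    "vocabulary": "http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "sioctypes": "http://rdfs.org/sioc/types#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "tag": "http://www.holygoat.co.uk/owl/redwood/0.1/tags/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "update": "http://triplify.org/vocabulary/update#",
}


def expand(name: str) -> str:
    """Turn 'prefix:local' into a full URI using the map above."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix in {name!r}")
    return NAMESPACES[prefix] + local


def shrink(uri: str) -> str:
    """Turn a full URI into 'prefix:local' if a known namespace matches.

    The longest matching namespace wins; unknown URIs are returned unchanged.
    """
    matches = [(ns, prefix) for prefix, ns in NAMESPACES.items() if uri.startswith(ns)]
    if not matches:
        return uri
    ns, prefix = max(matches, key=lambda pair: len(pair[0]))
    return f"{prefix}:{uri[len(ns):]}"


print(expand("sioc:has_creator"))
print(shrink("http://purl.org/dc/terms/created"))
```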

So, what’s the point in doing this? Well, it’s fairly trivial and if you think that structured, linked, machine-readable licensed data is a Good Thing, why not? The Triplify website lists a number of advantages:

Such a triplification of your Web application has tremendous advantages:

The installations of the Web application are better found and search engines can better evaluate the content.

Different installations of the Web application can easily syndicate arbitrary content without the need to adopt interfaces, content representations or protocols, even when the content structures change.

It is possible to create custom tailored search engines targeted at a certain niche. Imagine a search engine for products, which can be queried for digital cameras with high resolution and large zoom.

Ultimately, a triplification will counteract the centralization we faced through Google, YouTube and Facebook and lead to an increased democratization of the Web

The vision of the semantic web and semantic publishing is one of meaningfully identifying objects (and people) on the Internet and showing their relationships. This should improve searches for things on the web, but also improve how we exchange knowledge, re-use information and help clarify our identity on the web, too. It’s an ambitious task, but made easier with tools like Triplify. The semantic web also raises questions over individual privacy and, if data is well formed and accessible, it may be easier to control and therefore censor. The creator of Triplify recently gave a technical presentation on Triplify and how it is being used to publish data collected by the OpenStreetMap project. It shows how geodata exposed in this way can result in mashup applications that directly benefit you and me.

HEFCE HE Grant Allocations 2009-10 Visualised

Joss Winn / March 10, 2009 (updated April 9, 2009) / Fun, Mashups

In our weekly team meeting, I mentioned that I’d created some visualisations of the RAE research funding allocations. I also mentioned that Tony Hirst had previously done the same for the HEFCE teaching funding allocations. I offered to send everyone links to these, but before doing so, I thought I’d have a go at re-creating the HEFCE visualisations myself to get a bit more practice in with IBM’s Many Eyes Wikified. So this is a companion piece to my previous post. All credit to Tony for opening my eyes to this stuff.

So, HEFCE have announced the 2009/10 grant allocations for UK Higher and Further Education institutions and provided full spreadsheets of the figures. I’ve imported the data into Google Spreadsheets and made the three tables publicly accessible as CSV files (1), (2), (3). Note that I’ve stripped out all data relating to FE grant allocations, which is included in the original spreadsheets.

Next, I’ve imported the CSV files into IBM’s Many Eyes Wikified (1), (2), (3), and these wikified tables are now the data sources for the following visualisations.

Recurrent grant for academic year 2009-10

The Pie

Bar Chart

Matrix

Bubble Chart

Comparison with 2008-09 academic year recurrent grant

The Pie

Bar Chart

Matrix

Bubble

Non-recurrent funding for 2009-10

The Pie

Bar Chart

Matrix

Bubble Chart

RAE: UK research funding results visualised

Joss Winn / March 6, 2009 (updated April 9, 2009) / Mashups

Yesterday, the results of the funding allocation for research in UK Higher Education were announced and published on the Times Higher Education website.

Successive RAEs have concentrated research cash in the hands of the elite. This time around, the pie has been shared more widely.

The full spreadsheet of results being available, I thought this was a good opportunity for someone to visualise the data, so I published the data on Google Docs as a CSV file, which Tony Hirst fed into IBM’s Many Eyes wiki and now we can really see how the pie has been shared. Click on the images to view the interactive visualisations.

A pie…

RAE Funding Pie — Funding allocation by university group

Some bubbles…

FTE staff submitted to RAE by institution and coloured by group

A bar chart…

% change in total recurrent research funding by group

and a matrix…

Cash change in funding by institution and group

You can read about the University of Lincoln’s 628% increase in funding, here and here.

Web Trend Map

Joss Winn / August 8, 2008 (updated April 9, 2009) / Fun, Web

Following their predictions in January, the Web Trend Map 3 from Information Architects offers an interesting overview of the 300 most influential websites, illustrated along the lines of the Tokyo train map.

To get the full picture you need to either view the PDF or buy the poster. Cast your eye over the PDF and you’ll see that among the big names that stand out are Yahoo!, MSN, Google, Wikipedia, Amazon, YouTube, eBay, WordPress and Friendster. No real surprises there.

The layout is meaningful in that the train lines correspond to different web trends and Google sits in the centre because it is “slowly becoming a metaphor of the Internet itself”. Each of the 300 sites occupies a different train station in Tokyo, depending on the current status it is deemed to have. The cool sites can be seen in cool parts of Tokyo and likewise the boring sites (i.e. Facebook) have been moved to the boring areas of the city. The creators are clearly having fun at times, too. Yahoo News, for example, is located in Sugamo, where old ladies go shopping, because Yahoo News “recently hijacked the online advertisement revenue of around 250 local newspapers and locked them into a binding contract. Who reads local news? Old people.”

Despite the sarcasm, it is a genuinely useful and interesting illustration of who the players are on the web and what spaces they dominate. There are also two forecast and branding plates which, as the names suggest, illustrate where the weather is turning for some sites and how certain brands are resonating with users.

It’s good to see WordPress, an open source product (which the Learning Lab runs on), not far from the centre of everything, located between the Google Vatican and the News district, on the Technology and Social Networking lines. The popularity of WordPress is no doubt due to its focus on usability and good presentation, but also because, as an open source product, it attracts a large developer community who write plugins to extend the basic functionality of the blogging platform, making it attractive to people who want their blog to integrate with sites like Facebook, Bebo, YouTube, Flickr and Twitter. WordPress leverages this voluntary manpower to enhance its commercial product. Integration between sites is key as each competes for our time, so it’s not surprising that dataportability.org, despite being a recent initiative, sits in the Brains district among all the big players.

The DataPortability Project is a group created to promote the idea that individuals have control over their data by determining how they can use it and who can use it. This includes access to data that is under the control of another entity.

In practice, this means that we should expect to be able to log in to WordPress, select images from our Flickr account and publish them in a blog to Facebook, painlessly and securely. Web applications, including those sold to the Education market, that inhibit the secure but effortless portability of data are digging themselves into a hole.