Posts tagged Data

This job is now formally open for applications.

Just a heads up to say that we’ll be advertising for a Web Developer to work on Orbital, our JISC-funded ‘Managing Research Data’ project. The post, starting in March/April, will be a 12 month, full-time, grade 5 (c.£21K) position.

The Web Developer (‘you’) will be working in the Centre for Educational Research and Development, alongside Nick Jackson, Lead Developer on Orbital, and also benefit from being in a team that includes staff in central ICT services and the Library. Orbital builds on and extends previous work we’ve been doing over the last couple of years, so if you’re interested, you should read through our projects pages.

If we were to summarise our technologies and interests I guess they would be #agile, #opensource, #opendata #LAMP, #php, #codeigniter, #mongoDB, #OAuth, #APIs, #HTML5, #CSS3, #github and moving towards #RDF and #LinkedData.

Just seeing these hashtags listed together should cause your heart to beat with excitement :-)

When we advertise in January, you’ll see that the job spec is actually a pretty standard affair. What I want to emphasise here is how interesting and fun the job will be.

The key section in the Job Description is what you’d be working on with Nick:

  1. Development and implementation of a set of web services, which re-use and develop our previous, JISC-funded work as well as other initiatives (e.g. SWORD and DataCite DOIs).
  2. Documented source code will be made available under an open source license by the end of the project.
  3. Development and implementation of mechanisms for managing and transferring data, including the use of MongoDB, OAuth, read/write RESTful APIs, SWORD2 interoperability, and integration with the administrative functions of EPrints.

That actually summarises a lot of work.

I’m managing the project and try to run things with as little hierarchy as possible within a university environment. You’ll always know the project priorities and will be trusted to self-organise and deliver on time, working to two-week iterations and, roughly, monthly releases. I regularly reflect on how we work and our overall working environment. For Orbital, I favour the Crystal Clear agile methodology, as does Nick. You’ll be encouraged to reflect on this with us, too.

We work hard, and not always 9-5pm, but we work at a pace that is sustainable over a long period of time. We take our work seriously but, in the spirit of hacking, are always looking for ways to have fun, too. We recognise that we’re fortunate to be working in a diverse and intellectually stimulating academic environment, but are user/product focused at the end of the day. You’ll be working directly with our users, who are Researchers in the School of Engineering and Siemens, and staff in the Library and ICT. You’ll need to be showing them refreshed, working software every couple of weeks and iteratively improving Orbital, based on their feedback and requirements. There may also be times when you’ll be asked to talk publicly about your work and you’ll be encouraged to blog about it every so often, too. I expect the project to produce one or two conference/journal papers, and you’ll be named as a contributor and can take as active role in that as you like.

I hope this sounds like an interesting job. At £21K, I recognise that it will probably attract younger developers looking to gain experience, though of course, we welcome applications from anyone whatever your age. By the time the post starts, we’ll have set up a decent dev/staging/production environment, hosted in the cloud, and relying on Github and Jenkins to keep things versioned, integrated and tested. Nick will have been developing Orbital for a couple of months or more and laid the groundwork for someone to start coding quickly in a supportive environment.

If you’re thinking of applying and don’t live in Lincoln, you’ll be pleased to know that it’s a decent small city, and a relatively cheap place to live. The campus is modern and sits by a Marina in the middle of the city. You can walk to work. I love the place. Oh, and you can choose your own hardware for development, within reason. Most of us use Macs, but whatever suits you. I’ll ask the successful candidate what they prefer when we offer them the job.

If, after reading around the project website, you’ve got any questions about the post, please do get in touch. Thanks.

* Wondering what the hell ‘web scale’ means? Something like this.

It’s been a busy summer, to say the least.

The DIVERSE project is up and running and the Linking You and Jerome projects came to a successful end. We were joined by Jamie Mahoney, a new full-time Web Developer Intern and established a new group (LNCD). We’re co-organising a national student developer conference (DevXS) and have just been awarded a £241,500 grant for Orbital, an 18 month project to develop and pilot a new research infrastructure for the School of Engineering. Orbital is a great opportunity to build on some of our earlier work and get stuck into the challenges of managing raw research data. Which reminds me: data.lincoln.ac.uk is live, too :-)

Please do tell your students about DevXS.

    Re-broadcasting Mike Ellis’ recent presentation

    Last week, I wrote about how it is relatively simple to ‘pimp your ride on the semantic web‘. Over the weekend, I stumbled upon Triplify, a small ‘plugin’ for pretty much any web publishing platform, that “reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.” What is so appealing about Triplify is how easy it is to implement, especially alongside a WordPress site.

    I can confirm that the three-step installation process is all it takes, although I wouldn’t undertake implementing this blindly as you are, literally, exposing a semantic representation of your database content. In other words, you should look at the configuration file you’re using and check that it’s going to expose the right data and not clear text passwords and unpublished posts and comments. Before I  implemented it, I realised that it would expose comments on a bunch of posts that I have since made private (they were imported from an old, private blog), so I had to ‘unapprove’ those comments so the script didn’t expose them to the public. A five minute job. Alternatively, the script could probably be modified to work around my problem, by only exposing comments after a certain date, for example.

    The end result is that, with a WordPress site, you expose a semantic representation of your users, posts, pages, tags, categories, comments and attachments in RDF (N-Triples) and JSON formatted data (for JSON, just add ‘?t-output=json’ to the end of the URI). Like I said though, it could be used on any database driven web application. Here’s what you get when you expose the high level links to your content:

    
    <http://blog.josswinn.org/triplify/> <http://www.w3.org/2000/01/rdf-schema#comment> "Generated by Triplify V0.5 (http://Triplify.org)" .
    <http://blog.josswinn.org/triplify/> <http://creativecommons.org/ns#license> <http://creativecommons.org/licenses/by/2.0/uk/> .
    <http://blog.josswinn.org/triplify/post> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
    <http://blog.josswinn.org/triplify/attachment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
    <http://blog.josswinn.org/triplify/tag> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
    <http://blog.josswinn.org/triplify/category> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
    <http://blog.josswinn.org/triplify/user> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
    <http://blog.josswinn.org/triplify/comment> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
    

    Here’s an example of what you get when you expose the full content:

    
    <http://blog.josswinn.org/triplify/post/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .
    <http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#has_creator> <http://blog.josswinn.org/triplify/user/1> .
    <http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/created> "2008-10-06T05:55:25"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
    <http://blog.josswinn.org/triplify/post/154> <http://rdfs.org/sioc/ns#content> "Up early to go to Sheffield for LPI exams. The last week has left me underprepared. Never mind." .
    <http://blog.josswinn.org/triplify/post/154> <http://purl.org/dc/terms/modified> "2008-10-06T20:12:15"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
    
    ...
    
    <http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/27> .
    
    ...
    
    <http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/41> .
    <http://blog.josswinn.org/triplify/post/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag> <http://blog.josswinn.org/triplify/tag/42> .
    
    ...
    
    <http://blog.josswinn.org/triplify/post/154> <http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/belongsToCategory> <http://blog.josswinn.org/triplify/category/22> .
    
    ...
    
    <http://blog.josswinn.org/triplify/tag/154> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/Tag> .
    <http://blog.josswinn.org/triplify/tag/154> <http://www.holygoat.co.uk/owl/redwood/0.1/tags/tagName> "valentine" .
    

    You can choose to expose different levels of information in your HTML source. If you have more than a moderate amount of content, you’ll probably want to just expose the top level links as in the first example and let the users of your data dig deeper. You’ll also note that you can (and should) attach a license to your data.

    A number of namespaces are recognised as well as a WordPress vocabulary.

    
    $triplify['namespaces']=array(
    'vocabulary'=>'http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/',
    'rdf'=>'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'rdfs'=>'http://www.w3.org/2000/01/rdf-schema#',
    'owl'=>'http://www.w3.org/2002/07/owl#',
    'foaf'=>'http://xmlns.com/foaf/0.1/',
    'sioc'=>'http://rdfs.org/sioc/ns#',
    'sioctypes'=>'http://rdfs.org/sioc/types#',
    'dc'=>'http://purl.org/dc/elements/1.1/',
    'dcterms'=>'http://purl.org/dc/terms/',
    'skos'=>'http://www.w3.org/2004/02/skos/core#',
    'tag'=>'http://www.holygoat.co.uk/owl/redwood/0.1/tags/',
    'xsd'=>'http://www.w3.org/2001/XMLSchema#',
    'update'=>'http://triplify.org/vocabulary/update#',
    );
    

    So, what’s the point in doing this? Well, it’s fairly trivial and if you think that structured, linked, machine-readable licensed data is a Good Thing, why not?  The Triplify website lists an number of advantages:

    Such a triplification of your Web application has tremendous advantages:

    • The installations of the Web application are better found and search engines can better evaluate the content.
    • Different installations of the Web application can easily syndicate arbitrary content without the need to adopt interfaces, content representations or protocols, even when the content structures change.
    • It is possible to create custom tailored search engines targeted at a certain niche. Imagine a search engine for products, which can be queried for digital cameras with high resolution and large zoom.

    Ultimately, a triplification will counteract the centralization we faced through Google, YouTube and Facebook and lead to an increased democratization of the Web

    The vision of the semantic web and semantic publishing is one of meaningfully identifying objects (and people) on the Internet and showing their relationships. This should improve searches for things on the web, but also improve how we exchange knowledge, re-use information and help clarify our identity on the web, too. It’s an ambitious task, but made easier with tools like Triplify.  The semantic web also raises questions over individual privacy and, if data is well formed and accessible, it may be easier to control and therefore censor. The creator of Triplify recently gave a technical presentation on Triplify and how it is being used to publish data collected by the OpenStreetMap project. It shows how geodata exposed in this way can result in mashup applications that directly benefit you and me.