Encryption and Google docs

We’ve recently started providing staff training on using Google apps, and one of the questions that always comes up is about privacy and security. Following one of our sessions, a member of staff has started using Google docs to manage a large number of sensitive documents with several colleagues, and the sharing of folders and documents with different people is proving very useful. Recently, that member of staff asked me whether it was possible to encrypt files stored on Google docs, so I had a look around to see what the situation is. I knew that transport encryption is available (i.e. HTTPS) and that Google docs has no feature to encrypt a file, but I wanted to give my colleague a thorough response.

As I said, Google doesn’t provide the facility to encrypt data held in Google docs. You can, however, encrypt a file yourself and upload it to Google docs, but only for online storage: to read the file, you have to download and decrypt it. I tested this with a .pgp file.

I searched around on the web for a few more clues and found a suggestion (in the last comment of a thread I came across) that the data is ‘sharded’ across multiple servers, and that when you click on the name of a file, the data is brought back together into the file for you to work on. I haven’t found any official confirmation that this technique is used.
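To make the idea of ‘sharding’ concrete, here is a toy Python sketch. It splits a file’s bytes into fixed-size chunks, spreads them round-robin across a few hypothetical storage servers (plain dicts here), and keeps a manifest so the chunks can be reassembled in order. This is only an illustration of the concept under my own assumptions (chunk size, round-robin placement). It is not how Google docs works, which is unconfirmed. Note also that sharding is not encryption: each chunk is still readable plaintext.

```python
def shard(data: bytes, num_servers: int, chunk_size: int = 4):
    """Split data into chunks and assign them round-robin to servers.

    Returns (servers, manifest): each server is a dict mapping
    chunk index -> chunk bytes, and the manifest lists
    (chunk index, server number) pairs in the original order.
    """
    servers = [dict() for _ in range(num_servers)]
    manifest = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        server = index % num_servers
        servers[server][index] = data[start:start + chunk_size]
        manifest.append((index, server))
    return servers, manifest


def reassemble(servers, manifest) -> bytes:
    """Fetch each chunk from its server, in manifest order, and join them."""
    return b"".join(servers[server][index] for index, server in manifest)


document = b"Minutes of a confidential meeting"
servers, manifest = shard(document, num_servers=3)
print(reassemble(servers, manifest) == document)
```

No single server holds the whole document. That is a resilience and scaling property, not a confidentiality one.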

There’s a Google docs employee on Get Satisfaction who has responded a few times to people’s questions in this area. These replies offer some clarity:

In summary, there is no encryption of data on Google’s servers, but Google are using the same systems to manage their private corporate data, and they comply with international (including UK) data privacy policies. Introducing encryption is technically feasible but would have many negative consequences for the features they provide (slower performance, no collaboration, etc.).

If you’ve got any other, officially confirmed, information on the security of Google docs, please do leave a comment. Thanks.

Falling in love (with libraries)

I’ve just come from the Library, having been invited to join colleagues in a day-long strategy workshop led by a nice bloke called Ken Chad. Throughout the day, we discussed library users’ needs, took a pragmatic view of the work to be done, and looked at the barriers we face and some potential solutions. One of my contributions was about the benefits of getting to know the users of our Library better and using that knowledge to further improve our library services. There’s nothing remarkable about that. What got me thinking was a brief discussion about the role of surveys in soliciting feedback on the services we provide. It reminded me of some reading I’ve been doing recently on ‘resilience theory’, a key component of which is learning from feedback in order to adapt and survive. Resilience theory is a branch of the ecological sciences that “emphasizes non-linear dynamics, thresholds, uncertainty and surprise, how periods of gradual change interplay with periods of rapid change and how such dynamics interact across temporal and spatial scales” (Folke 2006). Folke lists the attributes of a resilient social-ecological system as:

  1. the amount of disturbance a system can absorb and still remain within the same state or domain of attraction,
  2. the degree to which the system is capable of self-organization (versus lack of organization, or organization forced by external factors), and
  3. the degree to which the system can build and increase the capacity for learning and adaptation.

It’s the last point that interests me here. That is, the degree to which something has the capacity to learn and adapt. So, resilience theory is a theory of learning, adaptation and change. It’s not a theory of preservation but rather one of sustainability. Hopkins (2008) has likewise summarised the ‘ingredients’ of resilient systems as:

  1. Diversity
  2. Modularity
  3. Tightness of feedbacks

I think resilience theory can be usefully applied to ecosystems, single organisms, individuals, even library systems: anything that has an interest in longevity or sustainability in the face of inevitable change. So it seems to me that the use of surveys is an implicit admission of failure to know the people you are surveying.

In our relationships we don’t issue quarterly or annual surveys to find out what people think about us. As I said in the workshop, I’ve never surveyed my wife. I listen to her, I get to know her as she changes and I change, adapt and respond to her needs. This is what it’s like to fall in love. In my experience, you meet someone and the first few months are a concentrated effort to get to know that person. Long days and late nights, talking to each other, discovering connections, sharing ideas and ideals, each person looking for a sense of surprise and delight as we unfold our lived experience in front of each other. In other words, we get to know that person and at the point or the period of falling in love, we commit ourselves to continually learning more about that person, listening to them, taking their feedback and adapting ourselves, growing old together. A relationship where neither or only one person takes on this commitment to listen, learn and adapt is, frankly, living hell.

And in a way, that’s what the most successful online services are engaging in. I’ve never been sent a survey by Google or Amazon. They don’t need to survey me, because they’ve been learning about me with every click, every purchase, every email, every movement and decision they can track.1 And using that feedback, that learning, they’ve adapted their services to respond to what they think are my needs.2 The ‘tightness of feedbacks’, as Hopkins puts it, is essential to long-term friendships, marriages and, yes, the sustainability of library services. We need to get to the point where the feedback we receive from surveys is no longer of any use to us, not because it is imperfect (what relationship is perfect?), but because we already know what library users need, enjoy and are interested in. By creating a library system that learns from every person who uses it and adapts over time to the environment it is part of, we create a resilient, and therefore sustainable, library system that its users fall in love with.

  1. I completely neglect to discuss privacy issues here. Needless to say, falling in love is quite different to being stalked.
  2. Sometimes they impose features on users and the technology can drive our actions and create artificial needs, and many of us recognise this manipulation or domination of the technology and begin to reject it, calling off the relationship. Sometimes people can become subservient in the relationship, too.

Jailbreaking WordPress with Web hooks

As is often the case, I struggle at first glance to see the full implications of a new development in technology, which is why I so often rely on others to kick me up the arse before I get it.1

Where I ramble about WordPress as a learning tool for the web…

I first read about web hooks while looking at WordPress, XMPP and FriendFeed’s SUP, and then again when writing about PubSubHubbub. Since then, Dave Winer’s RSSCloud has come along, too, so there’s now plenty of healthy competition in the world of the real time web, and WordPress is, predictably, a mainstream testing ground for all of it. Before I go on to clarify my understanding of the implications of web hooks+WordPress, I should note that my main interest here is neither web hooks nor specifically the real time web, which is interesting but, realistically, not something I’m going to pursue with fervour. My main interest is that WordPress is an interesting and opportunistic technology platform for users, administrators and developers alike. Whoever you are, if you want to understand how the web works and how innovations become mainstream, WordPress provides a decent space for exercising that interest. I find it increasingly irritating to explain WordPress in terms of ‘blogging’; I’ve very little interest in WordPress as a blog. I tend to treat WordPress as I did Linux ten years ago. Learning about GNU/Linux is a fascinating, addictive and engaging way to learn about operating systems and the role of server technology in the world we live in. Similarly, I have found that learning about WordPress and, perhaps more significantly, its ecosystem of plugins and themes2 is instructive in learning about the technologies of the web. I encourage anyone with an interest to sign up to a cheap shared host such as Dreamhost and use its one-click WordPress offering to set up a playground for learning about the web. The cost of a domain name and self-hosted WordPress need not exceed $9 or £7/month.3

… and back to web hooks

Within about 15 minutes of Tony tweeting about HookPress, I had watched the video, installed the plugin and sent a realtime tweet using web hooks from WordPress.

It’s pretty easy to get to grips with, and if a repository of web hook scripts develops, even non-programmers like me could make greater use of what web hooks offer.

Web hooks are user-defined callbacks over HTTP. They’re intended to, in a sense, “jailbreak” our web applications to become more extensible, customizable, and ultimately more useful. Conceptually, web applications only have a request-based “input” mechanism: web APIs. They lack an event-based output mechanism, and this is the role of web hooks. People talk about Unix pipes for the web, but they forget: pipes are based on standard input and standard output. Feeds are not a sufficient form of output for this, which is partly why Yahoo Pipes was not the game changer some people expected. Instead, we need adoption of a simple, real-time, event-driven mechanism, and web hooks seem to be the answer. Web hooks are bringing a new level of event-based programming to the web.
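To make the “event-based output mechanism” idea concrete, here is a minimal Python sketch of the sending side of a web hook. Users register callback URLs against named events. When an event fires, the application POSTs the event data to each URL. The registry, the function names and the form-encoded payload are my own assumptions for illustration. This is not how HookPress or WordPress implement it.

```python
import urllib.parse
import urllib.request

# Hypothetical registry: event name -> list of callback URLs supplied by users.
subscriptions = {}


def register_hook(event, callback_url):
    """Let a user attach a callback URL to a named event."""
    subscriptions.setdefault(event, []).append(callback_url)


def fire_event(event, payload, timeout=5):
    """POST the event payload (form-encoded) to every registered callback.

    Returns the response bodies: a filter-style hook would use them,
    an action-style hook would simply ignore them.
    """
    body = urllib.parse.urlencode(payload).encode("utf-8")
    responses = []
    for url in subscriptions.get(event, []):
        request = urllib.request.Request(
            url,
            data=body,  # supplying data makes urllib send a POST
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            responses.append(response.read().decode("utf-8"))
    return responses
```

For example, `register_hook("publish_post", "http://example.org/hooks/tweet")` followed by `fire_event("publish_post", {"post_title": "Hello", "post_url": "http://example.com/hello"})` would push the event out as soon as it happens. The receiver does not have to poll a feed.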

I think the term ‘jailbreak’ is useful in understanding what HookPress brings to the WordPress ecosystem. WordPress is an application written in PHP, and if you wish to develop a plugin or theme for WordPress, you are required to use the PHP programming language. That’s no bad thing, but the HookPress plugin ‘jailbreaks’ the requirement to work with WordPress in PHP by turning WordPress’ hooks (‘actions’ and ‘filters’) into web hooks.

WordPress actions and filters are basically inbuilt features that allow developers to ‘hook’ into WordPress with their plugins and themes. Here’s the official definition:

Hooks are provided by WordPress to allow your plugin to ‘hook into’ the rest of WordPress; that is, to call functions in your plugin at specific times, and thereby set your plugin in motion. There are two kinds of hooks:

  1. Actions: Actions are the hooks that the WordPress core launches at specific points during execution, or when specific events occur. Your plugin can specify that one or more of its PHP functions are executed at these points, using the Action API.
  2. Filters: Filters are the hooks that WordPress launches to modify text of various types before adding it to the database or sending it to the browser screen. Your plugin can specify that one or more of its PHP functions is executed to modify specific types of text at these times, using the Filter API.
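The difference between the two kinds of hook can be sketched as a small Python analogy. The real Action and Filter APIs are PHP functions in WordPress. These Python lookalikes borrow the names `add_action`, `do_action`, `add_filter` and `apply_filters` for readability, but they are simplified stand-ins that ignore priorities and argument counts. An action runs callbacks for their side effects. A filter threads a value through each callback and returns the result.

```python
from collections import defaultdict

# Python stand-ins for WordPress's PHP hook registries (illustrative only).
_actions = defaultdict(list)
_filters = defaultdict(list)


def add_action(hook, callback):
    _actions[hook].append(callback)


def do_action(hook, *args):
    """Run every callback registered for an event; return values are ignored."""
    for callback in _actions[hook]:
        callback(*args)


def add_filter(hook, callback):
    _filters[hook].append(callback)


def apply_filters(hook, value, *args):
    """Pass a value through each callback in turn and return the final result."""
    for callback in _filters[hook]:
        value = callback(value, *args)
    return value


log = []
add_action("publish_post", lambda post_id: log.append(f"published {post_id}"))
add_filter("the_content", lambda text: text.upper())
add_filter("the_content", lambda text: text + "!")

do_action("publish_post", 42)
print(log)                                          # ['published 42']
print(apply_filters("the_content", "hello world"))  # HELLO WORLD!
```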

So, if I understand all this correctly, HookPress turns WordPress hooks into web hooks. These post the output of the executed actions or filters to scripts hosted elsewhere on the web, written in other languages such as Python, Perl, Ruby and JavaScript (they can be written in PHP, too). In the example given in the HookPress video, the output of the WordPress action ‘publish_post’, along with two variables, ‘post_title’ and ‘post_url’, was posted to a script hosted on scriptlets.org. That script sends a tweet containing the title and URL of the WordPress post that has just been published. All this happens as fast as the component parts of the web allow, i.e. in ‘real time’.
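The receiving end of that example might look something like the following Python sketch. It parses a form-encoded `publish_post` payload and composes the tweet text. The field names `post_title` and `post_url` come from the video. The form encoding, the 140-character limit (Twitter’s limit at the time) and the trimming rule are my assumptions. Actually calling the Twitter API is left out.

```python
from urllib.parse import parse_qs

TWEET_LIMIT = 140  # assumed: Twitter's character limit at the time


def tweet_from_hook(body: str) -> str:
    """Turn a form-encoded publish_post payload into tweet text.

    Hypothetical handler: assumes post_title and post_url fields;
    sending the tweet to Twitter is not shown.
    """
    fields = parse_qs(body)
    title = fields.get("post_title", [""])[0]
    url = fields.get("post_url", [""])[0]
    # Shorten the title rather than the URL if the tweet would be too long.
    room = TWEET_LIMIT - len(url) - 1  # one space between title and URL
    if len(title) > room:
        title = title[: max(room - 3, 0)] + "..."
    return f"{title} {url}".strip()
```

The URL is kept intact and the title is trimmed, because a truncated link is useless to readers.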

In other words, WordPress posts data to a URL where a script lives. The script takes that data and creates an event which notifies another application. Because the scripts can be hosted elsewhere, on large cloud platforms such as Google’s AppEngine, the burden of processing events can be passed off to somewhere else. I see now why web hooks are likened to Unix pipes, in that the “output of each process feeds directly as input to the next one” and so on. In the case of HookPress, the output of the ‘publish_post’ hook feeds directly as input to the scriptlet, and the output of that feeds directly as input to the Twitter API, which outputs to the Twitter client.

Besides creating notifications from WordPress actions, the other thing that HookPress does (still with me on this ‘learning journey’? I’ve been reading, writing and revising this blog post for hours now…) is extend the functionality of WordPress through WordPress filters. Remember that filters in WordPress modify text before sending it to the database and/or displaying it on your computer screen. The example in the video shows the web hook simply reversing the text before it is rendered on the screen: ‘This is a test’ becomes ‘tset a si sihT’.

The output of the ‘the_content’ filter is posted to the web hook, which reverses the blog post content and returns it to WordPress, which renders the modified blog post.
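A filter web hook differs from an action web hook in one key way: it has to send text back. Here is a minimal Python sketch of both halves. `reverse_filter_hook` plays the remote script. `apply_remote_filter` plays WordPress, with a plain function call standing in for the HTTP round trip. The `content` field name and the fall-back-on-error behaviour are my assumptions, not HookPress’s documented behaviour.

```python
from urllib.parse import parse_qs, urlencode


def reverse_filter_hook(body: str) -> str:
    """Hypothetical remote script: reverse the 'content' field it receives.

    Unlike an action hook, a filter hook must return text, because
    WordPress renders whatever comes back in place of the original.
    """
    content = parse_qs(body).get("content", [""])[0]
    return content[::-1]


def apply_remote_filter(content: str, hook) -> str:
    """WordPress-side sketch: send content through a filter hook.

    `hook` stands in for the HTTP request; if it fails (network error,
    bad data), keep the original text rather than rendering an empty post.
    """
    try:
        return hook(urlencode({"content": content}))
    except (OSError, ValueError):
        return content


print(apply_remote_filter("This is a test", reverse_filter_hook))  # tset a si sihT
```

The fallback matters more for filters than for actions. A failed action hook only loses a notification, but a failed filter hook could blank out the post.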

Whereas the action web hooks provide event-driven notifications, the filter web hooks allow developers to extend the functionality of WordPress itself in PHP and other scripting languages. In both cases, web hooks ‘jailbreak’ WordPress by turning it into a single process in a series of piped processes where web hooks create, modify and distribute data.

Finally, I’ll leave you with this presentation, which is all about web hooks.

In the presentation, there are two quotes which I found useful. One from Wikipedia which kind of summarises what HookPress is doing to WordPress:

“In computer programming, hooking is a technique used to alter or augment the behaviour of [a programme], often without having access to its source code.”

and another from Marc Prensky, which relates back to my point about using WordPress as a way to learn about web technologies in a broader sense. WordPress+HookPress is where programming for WordPress leaves the back room:

As programming becomes more important, it will leave the back room and become a key skill and attribute of our top intellectual and social classes, just as reading and writing did in the past.

  1. I am not ashamed to admit that I’m finding that my career is increasingly influenced by following the observations of Tony Hirst. Some people are so-called ‘thought-leaders’. I am not one of them and that is fine by me. I was talking to Richard Davis about this recently and, in mutual agreement, he quoted Mario Vargas Llosa, who wrote: “There are men whose only mission is to serve as intermediaries to others; one crosses them like bridges, and one goes further.” That’ll do me.
  2. Note that themes are not necessarily a superficial makeover of a WordPress site. Like plugins, they have access to a rich and extensible set of functions.
  3. I am thinking of taking the idea of WordPress as a window on web technology further and am tentatively planning on designing such a course with online journalism lecturer, Bernie Russell. It would be a boot camp for professional journalists wanting (needing…?) to understand the web as a public space and we would start with and keep returning to WordPress as a mainstream expression of various web technologies and standards.

Scholarly publishing with WordPress

Working on the JISCPress project, I’ve been thinking quite a lot about scholarly publishing on the web, and in particular with WordPress. This morning, I read a post over on the ArchivePress blog about some WordPress plugins which are useful additions for creating a scholarly blog and it got me thinking a bit more about what features WordPress would need to support scholarly publishing.

JISCPress does away with the idea that WordPress is a blogging tool and instead uses WordPress Multi-User (WPMU) as a document publishing platform, where one site or ‘blog’ is one document. Because of the way WPMU is structured, the platform remains relatively ‘lightweight’ despite serving many (potentially millions of) document sites. Each document site generates just a handful of additional database tables while sharing the same administrative core as a single WordPress install. So 100 blogs on WPMU are nothing like the equivalent of running 100 separate WordPress blogs, in terms of either resource requirements or administration. In fact, quite soon there will be no such thing as WPMU, as the two products are going to be merged, and because they already share 90%+ of the same code, it’s not too difficult to achieve.1

Anyway, my point here is to discuss whether WordPress can be extended to accommodate most conventions found in scholarly publishing and, where it is lacking, to identify the development work required to meet the needs of most academics who wish to write on and publish to the web.2

Scholarly publishing extends to a wide variety of published outputs. As a Content Management System (CMS) and technology development platform, I believe that WordPress has the potential to support any type of scholarly publishing that the web supports. It is extremely extensible, as the 6000+ available plugins show. However, what I’m interested in is what an academic can do now to publish their work using WordPress as a CMS. What can be achieved by spending a few quid3 on self-hosting WordPress, so that a few plugins can be installed and a typical, well-structured scholarly paper can be published?

My Dissertation

For some time, I’ve been meaning to publish my MA dissertation. Back in 2002, I undertook some unique research which has not, to my knowledge, been repeated, and I think there is some value in having it easily accessible on the web. I have an OpenOffice file and a PDF and, in the course of a morning, have published it under my own domain. I did not publish it on the university WPMU platform because I have been experimenting with different plugins and did not want to install plugins that were untested or that we may not support long-term. In this case, I’ve used a single WordPress installation. Ideally, though, an individual researcher, group of researchers or research institution would run a WPMU installation which allowed multiple documents to be authored individually or collaboratively4 and published directly to the web as XHTML.

BuddyPress, by the way, can make the experience even more natural. This is not only because it is based around a community of like-minded people writing together on the same web publishing platform, but also because, with a few tweaks here and there, we can move away from the language of blogs towards the language of documents.


BuddyPress admin bar

Profile menu

Enough of BuddyPress on WPMU for now; back to my dissertation. I set up the site in ten minutes without using FTP or a command line. My host provides a one-click install of WordPress, and WordPress lets you search for and install plugins from its Dashboard rather than via FTP. Once the site was installed, I made some basic changes to the settings, turning on XML-RPC and AtomPub so that, if I decided to, I could publish to the site from my word processor.5 I didn’t use this in the end, but trust me, it works very well with recent versions of MS Word, OpenOffice (free) and other blogging clients such as MS Live Writer (free).
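For the curious, this is roughly what a blogging client does over XML-RPC. The Python sketch below only builds the request, using the long-standing `metaWeblog.newPost(blogid, username, password, struct, publish)` call that WordPress accepts at `/xmlrpc.php`. The endpoint, credentials and post content are placeholders. In WordPress versions of this era, remote publishing had to be switched on in the settings first.

```python
import xmlrpc.client

# Placeholder endpoint; WordPress serves XML-RPC at /xmlrpc.php.
ENDPOINT = "http://example.com/xmlrpc.php"


def new_post_payload(title, html, username, password, publish=True):
    """Build (without sending) the XML-RPC request a blogging client posts.

    metaWeblog.newPost takes (blogid, username, password, struct, publish);
    "title" and "description" (the post body) are standard struct fields.
    """
    struct = {"title": title, "description": html}
    return xmlrpc.client.dumps(
        ("1", username, password, struct, publish),
        methodname="metaWeblog.newPost",
    )


payload = new_post_payload("Chapter 1", "<p>Introduction...</p>", "author", "secret")

# To actually send it (placeholder credentials):
# server = xmlrpc.client.ServerProxy(ENDPOINT)
# post_id = server.metaWeblog.newPost("1", "author", "secret",
#                                     {"title": "Chapter 1", "description": "..."}, True)
```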

So, what are the common characteristics of an academic paper? What does WordPress have to support to provide functionality that meets most scholars’ publishing requirements? I scratched my head (and asked on Twitter) and came up with the following:

  • footnotes/endnotes
  • citations
  • use of LaTeX (sciences)
  • tables
  • images
  • bibliography
  • sub-headings
  • annexes
  • appendices
  • dedication
  • abstract
  • table of contents
  • index to figures
  • introduction
  • exposition
  • conclusion

Many of these are supported in WordPress by default and don’t require any additional plugins (tables, images, sub-headings, annexes, appendices, dedication, abstract, introduction, exposition, conclusion, are all either basic literary conventions or just part of a simply structured document).

For additional support, I installed digress.it, which we have funded through the JISCPress project. This WordPress plugin allows readers to comment on individual paragraphs of a document rather than on whole document sections. We’re adding a lot more functionality to meet the objectives of the JISCPress project, but I chose digress.it principally because it is designed to turn a WordPress blog into a document site. I could have used any other WordPress theme, but digress.it automatically creates a Table of Contents and lets you re-order WordPress posts for reading. This means you don’t have to author your document in reverse or adjust the publication dates to make the document sections appear in the correct order.


My dissertation published using digress.it

I added the abstract for my dissertation to the ‘about’ page, so it shows up on the front of the site. I also uploaded a PDF version so that people can download it directly. You’ll see that I also added some links to a related book and DVD, which will certainly appeal to people who are interested in my dissertation. The links pull an image and some basic metadata from Amazon, using the Amazon Machine Tags plugin. This could be used to link to the book in which your article is published and earn you money through click referrals. An alternative would be the Open Book Book Data plugin, which retrieves a book cover and metadata from Open Library, where your book may already be catalogued. If it’s not on Open Library, catalogue it!

After setting this up, I installed a few more plugins:

Dublin Core for WordPress: Automatically adds ten Dublin Core metadata elements to the document markup.
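For a sense of what such markup looks like, here is a small Python sketch. It emits Dublin Core metadata in an HTML head, following the DCMI convention of a `schema.DC` link plus `<meta name="DC.element">` tags. Which ten elements the plugin actually outputs, and its exact markup, are not stated here. This is an illustration of the convention, not the plugin’s code, and the values are placeholders.

```python
from html import escape


def dublin_core_tags(metadata: dict) -> str:
    """Render Dublin Core elements as HTML <link>/<meta> tags.

    Keys are DC element names (title, creator, date, ...); values are
    HTML-escaped so quotes and ampersands cannot break the markup.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for element, value in metadata.items():
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}" />')
    return "\n".join(lines)


print(dublin_core_tags({"title": "My MA dissertation", "date": "2002", "language": "en"}))
```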

wp-footnotes: This allows you to easily add footnotes to your document by enclosing your footnote in double parentheses.6
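The double-parentheses convention is easy to reason about as a text transformation. The following Python sketch reimplements the idea for illustration only; it is not wp-footnotes’ code. Each `((note))` is replaced with a numbered, linked superscript, and the notes are collected into a list at the end. The `fnN`/`refN` anchor ids and the `footnotes` class name are my own choices.

```python
import re

# Non-greedy match of ((...)); nested parentheses inside a note are not handled.
FOOTNOTE = re.compile(r"\(\((.+?)\)\)", re.DOTALL)


def render_footnotes(text: str) -> str:
    """Replace ((note)) markers with numbered references and append the notes."""
    notes = []

    def number(match):
        notes.append(match.group(1).strip())
        n = len(notes)
        return f'<sup><a href="#fn{n}" id="ref{n}">{n}</a></sup>'

    body = FOOTNOTE.sub(number, text)
    if not notes:
        return body
    items = "".join(f'<li id="fn{i}">{note}</li>' for i, note in enumerate(notes, 1))
    return f'{body}\n<ol class="footnotes">{items}</ol>'


print(render_footnotes("Research was done in 2002((Fieldwork in two schools.))."))
```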

OAI-ORE Resource Map: Automatically marks up the document sections with an OAI-ORE 1.0 resource map.

Google Analyticator: Adds Google Analytics support so you can collect statistics on the readership of your document.

WP Calais Archive Tagger: Analyses your entire document and automatically keywords each section, using the Open Calais API.

Search API: WordPress comes with search built in, but there is a new search API which will eventually make its way into the WordPress core. I’ve installed the plugin to provide full-text search across the document. It can also add Google Search to your document site.

wp-super-cache: This is simple to install and will significantly speed up your document site, making it a pleasure to navigate through and read :-)

Plugins I didn’t use

wp-latex: Although I didn’t need it for my dissertation, it’s worth noting that WordPress supports the use of LaTeX.

Academic Citation: You need to add a line of code to your theme for this to display. It supports the concept of an article being a single blog post, rather than a ‘document site’ and displays a variety of citation formats for readers to use.

Do you know of any other plugins for a scholarly blog?

The Beauty of Feeds

The other useful thing about managing a document using WordPress and in particular, using digress.it, is that you automatically get RSS/Atom feeds for the document. I’ve already discussed these in detail. It means that I was able to read my document in my feed reader, with footnotes and images displayed correctly.

Document in Google Reader

See how nicely the formatting is preserved. LaTeX is also rendered correctly in feed readers.

Document formatted nicely in Google Reader

Reading my dissertation in Google Reader

You’ll see that the document sections are listed in order, with the first section on top. As I noted above, blogs list posts in reverse (most recent first), so I used Yahoo Pipes to sort the feed items in ascending order. Yahoo Pipes exports RSS, and it’s that feed that I subscribed to in Google Reader. Wouldn’t it be nice if I could import my document feed into an Institutional Repository? Wait a minute, I can! :-)

Importing an RSS feed into EPrints

Click to see the item in the repository


When I imported the default feed, the HTML output was accurate but in reverse order, while the RSS output from Yahoo Pipes didn’t import into EPrints very cleanly at all. I’ll work on this. UPDATE: Forget Yahoo Pipes. WordPress feeds can be sorted by adding parameters to the feed URL: http://example.com/feed/?orderby=post_date&order=ASC
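Both approaches can be sketched in a few lines of Python. The first function builds the sorted feed URL from the `orderby=post_date&order=ASC` parameters given in the update. The second is a client-side fallback, standing in for the Yahoo Pipes step, that sorts RSS 2.0 items oldest first by their `pubDate`. The site URL is a placeholder, and the fallback assumes every item carries a well-formed RFC 822 `pubDate`.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def ascending_feed_url(site: str) -> str:
    """Build a WordPress feed URL asking for posts in ascending date order."""
    query = urlencode({"orderby": "post_date", "order": "ASC"})
    return f"{site.rstrip('/')}/feed/?{query}"


def sort_rss_items(rss_xml: str) -> list:
    """Client-side fallback: return RSS item titles oldest first, by pubDate."""
    channel = ET.fromstring(rss_xml).find("channel")
    items = channel.findall("item")
    items.sort(key=lambda item: parsedate_to_datetime(item.findtext("pubDate")))
    return [item.findtext("title") for item in items]


print(ascending_feed_url("http://example.com"))
```

Sorting on the server via the URL is simpler, because any consumer (a feed reader or a repository import) then receives the sections in reading order without an intermediate service.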

So there it is: an academic paper, published to the web using a modern CMS which supports most authoring and publishing requirements. I would favour an institutional WPMU platform that academics could author directly on. They could publish their pre-print to the web for open access and detailed comment, and then import their RSS feed into the repository. As a proof of concept, I’m quite pleased with this. We are currently developing a widget that can be embedded in a web page or WordPress sidebar and allow a member of staff to upload a document or zipped folder of documents to the Institutional Repository. I wonder if we can support the import of a feed from the widget, too?

So, what would your requirements be? Tell me and I’ll do my best to test WordPress against them.

  1. Has anyone done a diff on the two code bases to measure exactly what percentage of the code is shared between WP and WPMU?
  2. Actually, I think I’ll save the discussion of its shortfalls for my next post. This one is already long enough.
  3. I pay $5/year for my domain name and as many sub-domains as I need. I pay $10/month for my hosting with unlimited storage and bandwidth.
  4. Like any decent CMS, WordPress supports role-based authoring and editing and maintains a revision history of edits, auto-saved once per minute. Revisions can be compared side by side.
  5. On a scholarly WPMU installation, plugins could be pre-installed and activated, a default theme selected and settings tweaked so very little work is required by the academic author prior to writing her document.
  6. I am using the plugin on this blog!

There is a tension between being relevant and being reputable

…or ‘how to get read on the World-Wide-Web.’

This is a presentation about Search Engine Optimisation (SEO), but it is also about literacy and reputation in the age of the Internet. It is about how to understand and write well for the web so that like-minded people can learn about what you’ve got to say and be compelled to tell others about what you’ve got to say, too.

Although it’s not aimed at scholarly writing, that doesn’t matter. To Google’s crawlers, HTML source code is HTML source code, whether you publish articles about HIV research or have something pointless to say about the latest gadget. Whatever the content is about, there are literary as well as technical observations that can improve your communication and the impact of your writing.

Much of the presentation elaborates on this: “There is a tension between relevance and reputable.” It’s interesting.

PubSubHubbub: Realtime RSS and Atom Feeds

It’s made Dave Winer happy, which is no easy task, so I think PubSubHubbub is worth mentioning here. If it’s working as it should, this post should appear in my Google Reader almost immediately after I’ve published it. That’s because PubSubHubbub is “a simple, open, server-to-server web-hook-based pubsub (publish/subscribe) protocol as an extension to Atom [and RSS].” My blog feed is managed by FeedBurner, which has already implemented the new protocol, as have Google Reader and FriendFeed. They should therefore ‘talk’ to each other in realtime. Watch the video and you’ll see how it works. It’s pretty straightforward; it just takes a company the size of Google to push it through to adoption. The engineers say they were using it like instant messaging the night before the demo, which says something about how responsive it is. Technically, it should be another challenge to Twitter, in that it allows for a distributed method of near-realtime communication. I’d like to see that. I sometimes feel like an idiot communicating within the confines of Twitter.
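As I understand the protocol, a subscriber POSTs a form-encoded subscription request to the hub. The request names the feed (`hub.topic`) and a callback URL. The hub then verifies intent by sending a GET to that callback with a random `hub.challenge`, which the subscriber must echo back. After that, the hub POSTs new entries to the callback as they are published. The Python sketch below covers just the request body and the verification step. The URLs are placeholders. The `hub.verify` parameter reflects the early versions of the specification (later revisions dropped it), so check the spec version your hub implements.

```python
from urllib.parse import parse_qs, urlencode


def subscribe_body(topic: str, callback: str) -> bytes:
    """Form-encoded body a subscriber POSTs to the hub to subscribe to a feed."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": "sync",  # required in early spec versions only
    }).encode("utf-8")


def verify_intent(query: str, expected_topic: str):
    """Handle the hub's verification GET on our callback URL.

    Echo hub.challenge (HTTP 200) only if this is a (un)subscription we
    actually requested; otherwise refuse with 404. Returns (status, body).
    """
    params = {key: values[0] for key, values in parse_qs(query).items()}
    if (params.get("hub.mode") in ("subscribe", "unsubscribe")
            and params.get("hub.topic") == expected_topic):
        return 200, params.get("hub.challenge", "")
    return 404, ""
```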

Think you know how to use Google Search?

Following yesterday’s post about Google’s 15-second search tips, I thought it would be pretty easy to pull together and develop an ongoing series of short tutorials on how to use Google’s search engine. I was also motivated to do this because, coincidentally, at the Teaching and Learning Symposium yesterday, I attended an elective where a secondary school teacher asked, “Tell me what I can do to help develop my students’ IT skills for when they attend your university.” One of the answers he got was “teach them how to use Google search.”

Both new students and staff can spend a lot of time achieving a basic level of digital literacy. Google’s search engine is a powerful tool disguised by a very simple interface, which many of us don’t use to full effect. New features are being added rapidly, too. So I thought a blog that brought together tutorials made by Google and, in time, by me might be a useful resource for both staff and students. It’ll also give me the opportunity to learn more about Google’s search engine, which I’m sure I don’t always use to full effect either.

Think you know how to use Google Search? Google Search Tutorials