Basile Simon (blog) Data, hackery, stories

A quick word about csv,conf,v2

By the way, earlier this month I also went to Berlin to speak at csv,conf,v2.

This all-things-data conference was one of the finest I’ve seen, and you should probably consider attending next year. The organisers did a great job of keeping the event together, in a very German way: everything you needed was there, and the talks and keynotes started and ended on time.

I proposed a talk entitled “Hackers trying to stay relevant: linked data and structured journalism at the BBC.” Surely I deserved a prize for the longest talk title.

There will be a video of my ramblings online soon, I understand.

But in the meantime, I’ve uploaded some rather cryptic slides here, if you want to have a look.

The purpose of the talk was to give a brief overview of what has been going on in online journalism and publishing when it comes to data and the web. The BBC has been experimenting with linked data for a while now, under the influence of visionary and talented data architects.

But today we’re contemplating much bigger changes, which fall under the umbrella of structured journalism.

We are starting to realise that we may be wasting knowledge by publishing hundreds of articles that are, from a data architecture point of view, unstructured datasets. Editors reading this are probably wondering by now what I’m on about, since articles do have narrative and stylistic structures.

But I’m talking about the data here. Sentences such as “The new PM, David Cameron, surprisingly won a majority in the Commons” are stupidly hard for a program to parse in a way that retains their semantic information (namely, that the office of Prime Minister is now held by David Cameron MP).
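To make the contrast concrete, here is a minimal sketch of what the structured version of that sentence might look like: a couple of subject-predicate-object statements, the basic shape of linked data. The identifiers (`office:PrimeMinister`, `person:DavidCameron`, `heldBy`, `hasRole`) are hypothetical and made up for illustration; they are not the BBC’s actual vocabulary, and a real system would use an RDF store rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One subject-predicate-object statement (a 'triple')."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, for illustration only: the machine-readable
# meaning of "The new PM, David Cameron, ..." rather than its wording.
facts = [
    Fact("office:PrimeMinister", "heldBy", "person:DavidCameron"),
    Fact("person:DavidCameron", "hasRole", "role:MemberOfParliament"),
]


def who_holds(office: str, facts: list[Fact]) -> list[str]:
    """Return everyone recorded as holding the given office."""
    return [f.obj for f in facts if f.subject == office and f.predicate == "heldBy"]


print(who_holds("office:PrimeMinister", facts))
```

Once the fact is stored this way, answering “who is Prime Minister?” is a simple lookup instead of a natural-language parsing problem.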

There’s enormous value in retaining this knowledge, this collective creation of data by a whole newsroom. There are a lot of things that could be done with this information.

Anyway, enough chit chat. More on this later. Probably.