2013/08/09

adding the human element

DISCLAIMER 1.0 : This entry is related to, but not about, working with ROC/NetKernel (everything in my life is related to working with ROC/NetKernel).

DISCLAIMER 2.0 : This has the makings of a rant. It is not, but if you are easily offended ... you've been warned.

Before I actually start foaming at the mouth, I'd like to set the stage. Back in 1995, I left mainframe development for the prestigious job of Database Administrator (how and why I left is a story in itself, but this is not the forum for that). IDMS. On mainframe.

Now, while being a commercial product (from Computer Associates back then, no idea who runs that show nowadays) an IDMS database was actually pretty simple (technically speaking) and to this very moment I can (in any programming language you care to mention) build a working single user version of it in less than a day.

One day I arrive at my desk to find a dusty tape. A note attached to it says : Tom, this is a database dump from five years ago. We can't restore it because it's a version of the database software we no longer have. However, there's some data on it that the Big Man has asked for. Can you get it ? P.S. Back then we applied encryption to the specific records you need (very sensitive information as you can imagine). We've attached a printout of the assembler routine we used for the encryption.

I love shit like that. Don't you ? That's the stuff that makes IT interesting. Finding a - still working - tape drive (everything was being moved to disks, I'm not that old) that could read the tape was a challenge in itself. Getting the files on disk another. Then reading up on the specifications of that old database software. Verifying what I read about it on the files that I had. Writing a database simulator that could read the files and get the records. Then learning about the assembler and writing a decryption routine. Took me a week all in all, but I delivered the goods to the Big Man himself.

Why am I telling this old war story ? 
  • To show that I don't mind challenges. Quite the contrary.
  • To show that I've done my bit of data cleaning (literally in some cases), massaging it into a - machine or human - readable form.

Nowadays I work a lot with/around Linked Data. And I really like the ideas behind it, where we build together towards a web of all things connected taking the combined knowledge of the human race to the next level. Without a sliver of sarcasm I applaud and support that !

Note the word "together" though. For I find that that bit is missing today. Linked Data today is the realm of a class of - quite aloof - priests. They talk in a coded language which is often so obscure that they themselves need quite a bit of decoding to get it right (if they agree on the meaning at all, which is funny, because that is exactly one of the points of Linked Data : the meaning has to be clear).

Now, most of the electronic data out there is legacy (in this case meaning non-linked-data). It will be so for quite a while to come because electronic data is growing ever faster and only a relatively small portion of it is in a linked data form.

This means that there's going to be a lot of grunt work, a lot of work for simple people like myself, to massage such data into a linked data form. That in turn means we need to understand it. Linked Data is not just the realm of the Eloi who let the machines sort out the rest, there are quite a lot of Morlocks here who work hard to keep the machines running you know !

So, I read there are four rules and five stars. Excellent. Here are my extras (instead of adding them you can of course also subtract stars if that works better for you) :
  • The way you structure data is as simple as possible. Occam's razor does apply -> one additional star
  • When you make your structure publicly available, you document it so an average human being can understand it -> one additional star
  • When you make your structure publicly available, you provide examples, examples and some more examples ... and you keep those up to date with the latest version of your structure -> one additional star
  • When you make your structure publicly available, you link (that's what it is about after all, isn't it ?) to (valid) alternatives, explaining why you chose to have your own structure -> one additional star
Personally, I would go so far as to say that if you don't do these things, I'd rather not have your data and structure publicly available at all (which - if you've been subtracting stars - takes away the last star), for the only thing that you are then adding is complexity.
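
For those who prefer to see the arithmetic spelled out, here is a minimal sketch of the extra-star scheme above in Python. Everything in it is an assumption made for illustration : the names (Extras, additive_score, subtractive_score), the idea that you start from a base rating of 0 to 5 stars, and the reading that meeting none of the four extras leaves you with no stars at all.

```python
from dataclasses import dataclass


@dataclass
class Extras:
    """The four extra criteria from the list above (hypothetical field names)."""
    simple_structure: bool        # structure is as simple as possible
    documented_for_humans: bool   # an average human can understand the docs
    current_examples: bool        # plenty of examples, kept up to date
    links_to_alternatives: bool   # links to valid alternatives, with reasons

    def met(self) -> int:
        """Number of extra criteria that are satisfied."""
        return sum([
            self.simple_structure,
            self.documented_for_humans,
            self.current_examples,
            self.links_to_alternatives,
        ])


def additive_score(base_stars: int, extras: Extras) -> int:
    """Base rating (assumed 0 to 5) plus one star per extra criterion met."""
    if not 0 <= base_stars <= 5:
        raise ValueError("base_stars is assumed to be between 0 and 5")
    return base_stars + extras.met()


def subtractive_score(base_stars: int, extras: Extras) -> int:
    """Base rating minus one star per extra criterion missed.

    Assumption: missing all four extras also costs the last star,
    so the result is zero in that case.
    """
    if not 0 <= base_stars <= 5:
        raise ValueError("base_stars is assumed to be between 0 and 5")
    if extras.met() == 0:
        return 0
    missed = 4 - extras.met()
    return max(base_stars - missed, 0)


if __name__ == "__main__":
    half_done = Extras(True, True, False, False)
    print(additive_score(5, half_done))     # base 5 plus 2 extras met
    print(subtractive_score(5, half_done))  # base 5 minus 2 extras missed
```

Whether you add or subtract is a matter of taste; the point is only that each of the four extras counts for exactly one star.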