2013/08/26

poink revisited

I didn't get it. No, really, I had seen the PoiNK module in the Apposite repository several times already, but I had dismissed it (forgive me, Thomas Hicks) as a variation on Ping Pong or something.



As you can read in some of my other recent posts, the business of massaging data into other formats (a process we call transreption in Resource Oriented Computing) is booming. A lot of data today lives in spreadsheets. Or let's call it by the brand, a lot of data today lives in Excel sheets.

Natively that format is not available in NetKernel, but as my post of last week showed, that can be quickly mended. So I looked around for a likely candidate library and I found one, Apache POI.

I didn't see it. No, really. So I studied the library a bit myself and came up with the stuff I needed for the task at hand. Next project comes round and again Excel sheets are involved, so I smile happily ... until my lead says : right, I've tried PoiNK on it, but it doesn't quite seem to do the job, can you have a look at that ? PoiNK ? Apache POI for NetKernel ! Oops.

Sorry about that, Thomas. Your stuff is very good. It needed a bit of a refresh (work on it was done in 2008 and the Apache POI library has evolved a bit since then), so I started on that. The result (representations and transreptors ready) is a work in progress and can be found here.
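To give you an idea of what is under the hood, here is a minimal sketch of reading an .xls file with just the core POI jar. The class and variable names are my own, this is not the PoiNK code itself :

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class ExcelPeek {
    public static void main(String[] aArgs) throws Exception {
        InputStream vIn = new FileInputStream(aArgs[0]); // path to an .xls file
        Workbook vWorkbook = new HSSFWorkbook(vIn);      // HSSF handles the binary .xls format
        DataFormatter vFormatter = new DataFormatter();  // formats cell values the way Excel shows them
        Sheet vSheet = vWorkbook.getSheetAt(0);          // first sheet only, for the example
        for (Row vRow : vSheet) {
            for (Cell vCell : vRow) {
                System.out.print(vFormatter.formatCellValue(vCell) + "\t");
            }
            System.out.println();
        }
        vIn.close();
    }
}

The representations and transreptors in the module wrap this kind of logic behind NetKernel resources.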

Note the following things :
  • Although I renamed things in my version, Thomas Hicks still retains the original rights to the code. At this time (2013/08/26) it is still his module in the Apposite repository and this is just an alternative that you can use if you run into a problem with the version in Apposite. Regardless of license (Apache 2.0 in this case), I always render unto Caesar what is Caesar's.
  • I've only provided sources. Watch this space for a couple of posts on building NetKernel modules soon.
  • The Apache POI library has to be added in the lib subdirectory. I tested it with just the poi-3.9-20121203.jar file and that works fine.
Enjoy ! 

2013/08/16

belgian invention

It is rare in IT, but on occasion things do not have to be difficult or obscure :

String aMessage = aContext.source("arg:message", String.class);

// Build the PDF in memory, on a small landscape page (A8)
ByteArrayOutputStream vBAOS = new ByteArrayOutputStream();
Document vMessageDocument = new Document();
PdfWriter.getInstance(vMessageDocument, vBAOS);
vMessageDocument.setPageSize(PageSize.A8.rotate());

vMessageDocument.open();
Font vFont = new Font(FontFamily.COURIER, 14, Font.BOLD, BaseColor.BLACK);
vMessageDocument.add(new Paragraph(aMessage, vFont));
vMessageDocument.close();

// Wrap the PDF bytes in a NetKernel representation and set the correct mimetype
ByteArrayRepresentation vBAR = new ByteArrayRepresentation(vBAOS);
INKFResponse vResponse = aContext.createResponseFrom(vBAR);
vResponse.setMimeType("application/pdf");


The relevant imports (in case you're conditioned to see Document as org.w3c.dom.Document) are :

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.Font;
import com.itextpdf.text.Font.FontFamily;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
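
And in case you wonder where aContext comes from : the snippet lives inside a standard NetKernel accessor. A rough skeleton (the accessor class name is my own invention) :

import org.netkernel.layer0.nkf.INKFRequestContext;
import org.netkernel.module.standard.endpoint.StandardAccessorImpl;

public class MessageToPdfAccessor extends StandardAccessorImpl {
    public void onSource(INKFRequestContext aContext) throws Exception {
        // ... the snippet above goes here ...
    }
}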
 

 
iText ® is a Belgian invention, check it out ! NetKernel ® is a British invention, check it out ! Integration of the former in the latter was quick and painless.

Have a great weekend !

2013/08/09

adding the human element

DISCLAIMER 1.0 : This entry is related to (everything in my life is related to working with ROC/NetKernel) but not about working with ROC/NetKernel.

DISCLAIMER 2.0 : This has the makings of a rant. It is not, but if you are easily offended ... you've been warned.

Before I actually start foaming at the mouth, I'd like to set the stage. Back in 1995, I left mainframe development for the prestigious job of Database Administrator (how and why I left is a story in itself, but this is not the forum for that). IDMS. On mainframe.

Now, while it was a commercial product (from Computer Associates back then, no idea who runs that show nowadays), an IDMS database was actually pretty simple (technically speaking) and to this very moment I can (in any programming language you care to mention) build a working single-user version of it in less than a day.

One day I arrive at my desk to find a dusty tape. A note attached to it says : Tom, this is a database dump from five years ago. We can't restore it because it's a version of the database software we no longer have. However, there's some data on it that the Big Man has asked for. Can you get it ? P.S. Back then we applied encryption on the specific records you need (very sensitive information as you can imagine), we've attached a printout of the assembler routine we used for the encryption.

I love shit like that. Don't you ? That's the stuff that makes IT interesting. Finding a - still working - tape drive (everything was being moved to disks, I'm not that old) that could read the tape was a challenge in itself. Getting the files on disk was another. Then reading up on the specifications of that old database software. Verifying what I read against the files that I had. Writing a database simulator that could read the files and get the records. Then learning about the assembler and writing a decryption routine. Took me a week all in all, but I delivered the goods to the Big Man himself.

Why am I telling this old war story ? 
  • To show that I don't mind challenges. Quite the contrary.
  • To show that I've done my bit of data cleaning (literally in some cases), massaging it into a - machine or human - readable form.

Nowadays I work a lot with/around Linked Data. And I really like the ideas behind it, where we build together towards a web of all things connected, taking the combined knowledge of the human race to the next level. Without a sliver of sarcasm I applaud and support that !

Note that I highlighted together though. For I find that that bit is missing today. Linked Data today is the realm of a class of - quite aloof - priests. They talk in coded language which is often so obscure that they themselves require quite a bit of decoding to get it right (if they agree on the meaning at all, which is funny, because clear meaning is exactly one of the points of linked data).

Now, most of the electronic data out there is legacy (in this case meaning non-linked-data). It will remain so for quite a while to come, because electronic data is growing ever faster and only a relatively small portion of it is in a linked data form.

This means that there's going to be a lot of grunt work, a lot of work for simple people like myself, to massage such data into a linked data form. That in turn means we need to understand. Linked Data is not just the realm of the Eloi that let the machines sort out the rest, there are quite a lot of Morlocks here that work hard to keep the machines running you know !

So, I read there are four rules and five stars. Excellent. Here are my extras (instead of adding them, you can of course also subtract stars if that works better for you) :
  • The way you structure your data is as simple as possible. Occam's razor does apply -> one additional star
  • When you make your structure publicly available, you document it so an average human being can understand it -> one additional star
  • When you make your structure publicly available, you provide examples, examples and some more examples ... and you keep those up to date with the latest version of your structure -> one additional star
  • When you make your structure publicly available, you link (that's what it is about after all, isn't it), to (valid) alternatives, explaining why you chose to have your own structure -> one additional star
I would - personally - go so far as to say that otherwise I'd rather not have your data and structure publicly available at all (which - if you've been subtracting stars - takes away the last star), for the only thing you are adding then is complexity.

2013/08/02

conventions are good

It has been a while since I last blogged, so ... hello and nice to see you again ! Yesterday I was allowed (and very privileged) to help Paul Hermans build a Linked Data Server for the Dutch government.

There's a more than natural match between NetKernel and Linked Data. Data transformations (aka transreptions) are bread-and-butter, the RESTOverlay makes mapping the internet-facing URLs pretty easy, and dynamically building SPARQL queries is a breeze with a tool like the Text Recursion Language.

Now, I don't know if Paul noticed, but about ten minutes into our day, this error started popping up (in the console and cron log) : 
HDS Space Aggregation Failure, response to [res:/etc/system/CronConfig.xml] of type [java.lang.String] from [urn:com:proxml:stelselvanbasisregistraties:server] not transreptable to HDS

Strange, because there's no res:/etc/system/CronConfig.xml resource in the module. I let it rest, as it didn't impact the actual functionality and I needed all my brain capacity to understand all the Linked Data requirements.

This morning, however, I did the actual test and used the Request Resolution Trace Tool to see what happened :


Oops ... it does resolve. Ok, what is going on here ?
  • Dynamic Imports. A powerful capability of NetKernel. By adding a certain resource to your module (for example res:/etc/system/CronConfig.xml or res:/etc/system/SimpleDynamicImportHook.xml), a relevant space of your module gets imported elsewhere. Of course, linked to this is a mechanism for finding these resources, which is called Space Aggregation. A minimal example of such a hook follows below this list.
  • A wide RESTOverlay. Since the Linked Data server was to serve a whole domain, I used <basepath>/</basepath>.
  • <auto404/>, another feature of the RESTOverlay, automagically returning a 404 resource if a request into the RESTOverlay fails.
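To make the Dynamic Import mechanism a bit more tangible : this is - from memory, so do verify against the NetKernel documentation - roughly what a res:/etc/system/SimpleDynamicImportHook.xml looks like that hooks a space into the Frontend HTTP Fulcrum :

        <connection>
            <type>HTTPFulcrum</type>
        </connection>

Space Aggregation finds such resources by requesting them in every space it can see, which is exactly why my internal res:/etc/system/... requests were roaming around.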
While I had covered all bases for incoming requests through the Frontend Fulcrum, I hadn't even thought about internal requests that might get through to the - wide open - RESTOverlay.
 
The solution is very simple : add a Limiter endpoint before the RESTOverlay that catches all unwanted internal requests :

        <endpoint>
            <prototype>Limiter</prototype>
            <grammar>res:/etc/system/
                <regex type="anything"/>
            </grammar>
        </endpoint>


And this is where a convention actually comes to my rescue. In theory, there's no way of knowing what internal requests are going to try and spam my RESTOverlay. However, convention states that Dynamic Imports and such work with res:/etc/system/.* resources.


It is going to be the hottest day of the year here today (35+ Celsius), so I'm off to a place with some shade now. Enjoy your weekend !


P.S. No, I haven't forgotten about the CRUD server and all. Working on it, watch this space !