2014/05/28

back to the beginning ... xml recursion language

ROC/NetKernel was originally conceived and developed with XML in mind. Note that it was never (not then, not today) bound to this data exchange format. Everything is, after all, a resource, and another format is just a transreption away. But at the time, XML was the prevailing format. Today JSON is; tomorrow we might lose confidence in braces... actually, that whole discussion is moot.

Transreption : Lossless transformation of one representation format to another representation format.

If you're a fanboy... yes, I dropped the word isomorphic from the above definition. That word may mean something to you; to me, it would only mean that I like using difficult words.

It'll come as no surprise then that there are quite a few NetKernel batteries for slicing and dicing XML.

Battery : Normally some sort of electrochemical cell. Here it means an add-on that makes a given thing, in this case NetKernel, easier to use.

The single most powerful of these is the XML Recursion Language (XRL). In order to discuss what it does, here's a small XML snippet (that could be the footer of an HTML page) :
<div id="footer">
    <p>All rights reserved. © <span>2013</span> Elephant Bird Consulting BVBA</p>
</div>


No, my calendar is not behind. This snippet is a (file) resource that I use over and over again as the footer for my web pages. The catch is that I have to update it manually every year, on every server where I use it. It's tedious work, and I quite often forget to change it here or there.

Here's the same small XML snippet that solves my problem using XRL :
<div xmlns:xrl="http://netkernel.org/xrl" id="footer">
    <p>All rights reserved. © <span xrl:eval="text">active:widgetCurrentYear</span> Elephant Bird Consulting BVBA</p>
</div>


Now, when I use this template in an active:xrl2 request, it in turn requests active:widgetCurrentYear, a small component that returns the current year.

That's cool, but it gets even better. Consider this template :
<html xmlns:xrl="http://netkernel.org/xrl">
    <xrl:include identifier="res:/elbeesee/demo/xrl/header"/>
    <xrl:include identifier="res:/elbeesee/demo/xrl/body"/>
    <xrl:include identifier="res:/elbeesee/demo/xrl/footer"/>
</html>

Do you see? When we request active:xrl2 with this template, it requests (and includes) the specified resources. Our footer snippet could be the last one, and this is where the recursion comes in: the included footer is processed in turn, so automagically active:widgetCurrentYear gets requested as well. And so on, as deep as you care to go!
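To make the recursion a bit more tangible, here is a toy sketch in plain Python. It is not NetKernel and not the active:xrl2 implementation; it only mimics the idea under stated assumptions: resources live in a hypothetical in-memory dictionary (`RESOURCES`), `request()` is a hypothetical stand-in for issuing a resource request, `xrl:include` is replaced by the requested document (which is then processed itself), and `xrl:eval="text"` is assumed to replace an element's text with the result of requesting the identifier it contains.

```python
import datetime
import xml.etree.ElementTree as ET

XRL = "{http://netkernel.org/xrl}"

def widget_current_year():
    # Hypothetical stand-in for the active:widgetCurrentYear component.
    return str(datetime.date.today().year)

# Hypothetical toy resource space: identifier -> XML text or callable.
RESOURCES = {
    "active:widgetCurrentYear": widget_current_year,
    "res:/elbeesee/demo/xrl/header": '<div id="header"/>',
    "res:/elbeesee/demo/xrl/body": '<div id="body"/>',
    "res:/elbeesee/demo/xrl/footer": (
        '<div xmlns:xrl="http://netkernel.org/xrl" id="footer">'
        '<p>All rights reserved. © <span xrl:eval="text">active:widgetCurrentYear</span>'
        ' Elephant Bird Consulting BVBA</p></div>'
    ),
}

def request(identifier):
    """Resolve an identifier in the toy resource space (not a NetKernel API)."""
    resource = RESOURCES[identifier]
    return resource() if callable(resource) else resource

def evaluate(element):
    """Recursively handle xrl:include and xrl:eval="text"; modifies element in place."""
    for index, child in enumerate(list(element)):
        if child.tag == XRL + "include":
            # Replace the include with the requested document, then recurse into it.
            included = ET.fromstring(request(child.get("identifier")))
            included.tail = child.tail
            element.remove(child)
            element.insert(index, evaluate(included))
        else:
            evaluate(child)
    if element.get(XRL + "eval") == "text":
        # Assumed semantics: the text content is an identifier to request.
        element.text = request((element.text or "").strip())
        del element.attrib[XRL + "eval"]
    return element

TEMPLATE = (
    '<html xmlns:xrl="http://netkernel.org/xrl">'
    '<xrl:include identifier="res:/elbeesee/demo/xrl/header"/>'
    '<xrl:include identifier="res:/elbeesee/demo/xrl/body"/>'
    '<xrl:include identifier="res:/elbeesee/demo/xrl/footer"/>'
    '</html>'
)

html = ET.tostring(evaluate(ET.fromstring(TEMPLATE)), encoding="unicode")
print(html)
```

The page template only knows about the footer resource, and the footer only knows about the year widget; the nested request happens because every included document is evaluated in turn, which is the same "as deep as you care to go" behaviour described above.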

By the way, it's called active:xrl2 because NetKernel 3 contained a rather different version of the tool, which is still kept available for legacy reasons.

If you want to try the example, the basic bits (just for the footer; you can no doubt add the complete page with header and body yourself) can be found in my public GitHub archive. You'll need the following modules:
  • urn.com.elbeesee.tool.widget
  • urn.com.elbeesee.demo.xrl
Enjoy!

2014/05/14

lifesaver for batch processing

It's been a while since the last post; I've been quite occupied with KBOData. More information on that will follow soon (here and on http://practical-linkeddata.blogspot.com). Today, just a short tip.

Batch processing. At one point in my IT career it occupied all of my time. Batch processing on a mainframe (using PL/I and IDMS), to be exact. The performance we got back then is unmatched by anything I've seen since; you just can't beat big iron when it comes to batch processing.

Standard ROC processing isn't optimized for batch processing. Look at it this way: when you request the resource that is your batch process, NetKernel out of the box keeps track of every dependency that makes up that resource. In a batch process with many sub-requests, this can quickly become nasty on memory usage. And think about it: you rarely want the result of a batch process to be cached.

It is possible to do very efficient batch processing with ROC, though. You can, for example, fan out requests asynchronously. More on that another time. For now, here's the lifesaver I got from Tony Butterfield yesterday. Not only did it shorten execution time massively, it also kept memory usage down (to next to nothing):
<request>
  <identifier>active:csv2nt+filein@data:text/plain,file:/var/tmp/input/address.csv+fileout@data:text/plain,file:/var/tmp/output/address.ttl+template@data:text/plain,address</identifier>
  <header name="forget-dependencies">
    <literal type="boolean">true</literal>
  </header>
</request>


What this exact request does is not so important (it converts a huge CSV file into RDF syntax, applying a FreeMarker template to each line); what is important is the forget-dependencies header. That header makes the difference between "batch sucks on NetKernel" and "batch roc(k)s on NetKernel". Thanks, Tony!
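For readers outside NetKernel, the shape of such a job can be sketched in plain Python. This is not the active:csv2nt implementation and has nothing to do with the forget-dependencies header itself; it only illustrates the per-line, nothing-retained pattern that keeps memory flat. Assumptions: the CSV has hypothetical `id`, `street` and `city` columns, the `id` values are URI-safe, the example.org URIs are made up, and Python's `string.Template` stands in for the FreeMarker template.

```python
import csv
import io
from string import Template

# Hypothetical per-line template; string.Template stands in for FreeMarker.
ADDRESS_TEMPLATE = Template(
    '<http://example.org/address/$id> <http://example.org/vocab#street> "$street" .\n'
    '<http://example.org/address/$id> <http://example.org/vocab#city> "$city" .\n'
)

def escape_literal(value):
    # Minimal N-Triples string escaping: backslash, double quote, line breaks.
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def csv_to_ntriples(infile, outfile, template):
    """Stream rows from infile to outfile; earlier rows are never kept in memory."""
    count = 0
    for row in csv.DictReader(infile):
        escaped = {key: escape_literal(value) for key, value in row.items()}
        outfile.write(template.substitute(escaped))  # written out immediately
        count += 1
    return count

source = io.StringIO('id,street,city\n1,Main Street 1,Gent\n2,"Kerkstraat ""2""",Brugge\n')
target = io.StringIO()
rows = csv_to_ntriples(source, target, ADDRESS_TEMPLATE)
print(rows)
print(target.getvalue())
```

For real files you would pass `open(path, newline="")` for the input and `open(path, "w")` for the output. Since N-Triples is a subset of Turtle, such output would also be valid in a .ttl file like the one in the request above.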