2017/05/22

collusion - groundwork

Some of you may already have heard that I've recently joined Neo Technology. Admittedly, hardly anybody knows the company by that name, and you'd be even more surprised to learn the name of its Swedish parent company (I definitely was) ... but if I tell you that their flagship product is Neo4j, bells might start ringing.

I am not going to start a Practical Neo4j blog. That's completely unnecessary: half the company has such a blog and you can find excellent ones here, here and here (I picked three at random; I could easily have picked twenty). No lack of brains and no lack of creative writers.

Much more interesting is writing about the collusion between two worlds: resource-oriented computing and graph databases.

Admit it, you expected a Trump-related image for "collusion", didn't you?

This is actually not the first time I'm writing about Neo4j on this blog. I already did so in 2012, and that was definitely a talking point during the application process. More recently I've explored using NetKernel to publish RDF data. There is a NetKernel framework for that, and several implementations can be found on GitHub.

So, what did I do in the past couple of weeks, besides working through the basics of graph theory (check out Dr. Sarada Herke's YouTube videos and forget for a moment what a pile of crap YouTube usually is) and tons of technical Neo4j documents? Well, I started on another NetKernel framework, obviously. The idea is to build a second implementation of KBOData, this time on Neo4j. As an extra goal, I want the framework to be dynamic enough that I can point it at any dataset loaded into Neo4j.

A bit of groundwork is required (and everything can be found on GitHub). The urn.com.ebc.tool.system module now contains an EnvironmentAccessor, which turns an operating system environment variable into a resource. This was originally developed for the Flemish government's Milieuinfo site (I'll present that once it moves to production, but it is very much like KBOData or Stelselcatalogus).
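To make the idea concrete, here is a minimal sketch in Python. The real accessor is a NetKernel module whose identifier grammar isn't shown in this post; the sketch assumes a hypothetical `env:NAME` identifier and simply resolves it against the process environment, failing loudly when the variable is not set.

```python
import os
from typing import Mapping, Optional

ENV_PREFIX = "env:"  # hypothetical identifier scheme, not the accessor's actual grammar


def resolve_env_resource(identifier: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Turn a hypothetical 'env:NAME' identifier into the value of environment variable NAME."""
    if not identifier.startswith(ENV_PREFIX):
        raise ValueError(f"not an environment resource: {identifier!r}")
    name = identifier[len(ENV_PREFIX):]
    # Default to the real process environment; callers (and tests) may pass their own mapping.
    source = os.environ if environ is None else environ
    if name not in source:
        raise KeyError(f"environment variable {name!r} is not set")
    return source[name]
```

Passing the mapping explicitly keeps the function testable; in production you'd leave it out and let it read `os.environ`.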

The EnvironmentAccessor is used extensively in the urn.com.ebc.neo4j.database module (which defines the neo4j:databaseurl, neo4j:databaseuser and neo4j:databasepassword resources). We no longer need an application-specific module: the operating system environment determines where our requests are sent.
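The same pattern, applied to the three database resources, might look like the sketch below. The environment variable names are hypothetical (the post doesn't say which ones the module reads); the point is that switching databases becomes a matter of changing the environment, not the code.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names backing each resource.
RESOURCE_TO_ENV = {
    "neo4j:databaseurl": "NEO4J_DATABASEURL",
    "neo4j:databaseuser": "NEO4J_DATABASEUSER",
    "neo4j:databasepassword": "NEO4J_DATABASEPASSWORD",
}


def database_resources(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve every neo4j:database* resource from the environment, listing any missing variables."""
    source = os.environ if environ is None else environ
    missing = [var for var in RESOURCE_TO_ENV.values() if var not in source]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {resource: source[var] for resource, var in RESOURCE_TO_ENV.items()}
```

Reporting all missing variables at once, rather than the first one, saves a few restart cycles when setting up a new environment.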

Next up is the urn.org.neo4j.driver module, which repackages the Neo4j Java driver. As usual, you will not find the actual jar file in the module. You can find it here; drop it into the module's lib subdirectory.
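Forgetting that step is an easy mistake, so a small pre-flight check can help. The sketch below is a hypothetical helper, not part of the module: it looks in `<module>/lib` for a jar named after the driver's `neo4j-java-driver` artifact.

```python
from pathlib import Path
from typing import Optional


def find_driver_jar(module_dir: str) -> Optional[Path]:
    """Return a neo4j-java-driver jar from <module_dir>/lib, or None if none has been dropped in."""
    lib = Path(module_dir) / "lib"
    if not lib.is_dir():
        return None
    # If several jars match, the lexicographically last name wins (not necessarily the newest version).
    candidates = sorted(lib.glob("neo4j-java-driver-*.jar"))
    return candidates[-1] if candidates else None
```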

Built on that driver, the urn.com.ebc.neo4j.client module can issue a Cypher request to a Neo4j database server (I'm not working embedded this time round) and return the result. Currently only a bare-bones, work-in-progress RowsAccessor is available; more work is to be done here.
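What a rows representation boils down to is pairing each record's values with the result's column keys. The sketch below assumes the driver result has already been reduced to a list of keys and a list of value lists; it is not the RowsAccessor's actual code.

```python
from typing import Any, Dict, List, Sequence


def to_rows(keys: Sequence[str], records: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Zip each record's values with the result's column keys, producing one dict per row."""
    rows = []
    for index, values in enumerate(records):
        if len(values) != len(keys):
            raise ValueError(f"record {index} has {len(values)} values, expected {len(keys)}")
        rows.append(dict(zip(keys, values)))
    return rows
```

For a query such as `MATCH (c:Company) RETURN c.name AS name, c.city AS city`, the keys would be `["name", "city"]` and every row a small dict, which is easy to serialize into whatever representation the HTTP server needs.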

A urn.com.ebc.neo4j.fulcrum module has been created to provide the HTTP server for the framework. The server listens on a fixed port (8500), but as you know that can be overridden on the command line, so it does not violate our "everything should be dynamic" principle.
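The default-plus-override pattern is easy to illustrate outside NetKernel, which has its own mechanism for this. In the sketch below, the `--port` option is hypothetical and only stands in for that mechanism. Without an option you get 8500; with one, the caller decides.

```python
import argparse
from typing import List, Optional

DEFAULT_PORT = 8500  # the fixed default port from the post


def http_port(argv: Optional[List[str]] = None) -> int:
    """Return the HTTP port: 8500 unless overridden with a hypothetical --port option."""
    parser = argparse.ArgumentParser(description="default-port-with-override sketch")
    parser.add_argument("--port", type=int, default=DEFAULT_PORT)
    args = parser.parse_args(argv)  # argv=None means: read sys.argv[1:]
    if not 1 <= args.port <= 65535:
        parser.error(f"port out of range: {args.port}")
    return args.port
```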

Last but not least, I've started work on the urn.com.ebc.neo4j.server module. Note once more that this is not an application-specific module: the idea is that it can and should be completely agnostic of the underlying database. For starters, it can already determine all node labels in the database, and every node can already be queried as a resource (and is served by the HTTP server): res:/node/<label>/<id>.
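In Neo4j 3.x the labels can be listed with the built-in procedure call `CALL db.labels()`. Turning `res:/node/<label>/<id>` into a query is then mostly a matter of validation. Cypher does not accept labels as query parameters, so the sketch below only allows labels the database reported and backtick-quotes them, while the id is passed as a parameter. It assumes `<id>` is Neo4j's internal node id, which the post does not state, and it is not the module's actual code.

```python
import re
from typing import Any, Dict, Iterable, Tuple

LABELS_QUERY = "CALL db.labels()"  # built-in procedure returning every label in the database
NODE_IDENTIFIER = re.compile(r"res:/node/([^/]+)/(\d+)")


def node_query(identifier: str, known_labels: Iterable[str]) -> Tuple[str, Dict[str, Any]]:
    """Map res:/node/<label>/<id> to a parameterized Cypher query (assumes <id> is the internal id)."""
    match = NODE_IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a node resource: {identifier!r}")
    label, node_id = match.group(1), int(match.group(2))
    # Labels cannot be parameters, so accept only labels the database itself reported.
    if label not in set(known_labels):
        raise ValueError(f"unknown label: {label!r}")
    quoted = "`" + label.replace("`", "``") + "`"
    query = f"MATCH (n:{quoted}) WHERE id(n) = $id RETURN n"
    return query, {"id": node_id}
```

Checking against the result of `LABELS_QUERY` keeps the module agnostic of the dataset: whatever labels the database has, those are the resources that exist.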

Not a bad start, if I say so myself, and there's more to come in the coming weeks, so stay tuned!


P.S. I do realize that having a neo4j:databasepassword resource is not the most secure option. I'm working on a better solution that a) does not violate the "everything should be dynamic" principle and b) does not require Kerberos, LDAP or anything else that is only included in the Neo4j Enterprise Edition.

P.P.S. As with an SQL endpoint or a SPARQL endpoint, a Cypher endpoint is only as good as the queries you launch through it. If you want to bring the database down, you can. A more general solution for this is still needed. I wonder how the fragment idea is coming along ...
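To show why a general solution is needed, consider the most naive guard: rejecting queries that contain obvious write clauses. The sketch below is exactly that naive. Keywords inside string literals trigger false rejections, and it does nothing against expensive read-only queries such as unbounded path expansions, which are just as capable of bringing a server down.

```python
import re

# Write clauses, plus CALL (procedures may write) and LOAD CSV (reads files).
# Deliberately naive: the same words inside string literals or comments also cause a rejection.
FORBIDDEN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|CALL|LOAD\s+CSV)\b", re.IGNORECASE
)


def check_read_only(query: str) -> str:
    """Return the query unchanged if no forbidden keyword occurs in it, otherwise raise ValueError."""
    match = FORBIDDEN.search(query)
    if match:
        raise ValueError(f"rejected clause: {match.group(0).upper()}")
    return query
```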

P.P.P.S. If you're interested in seeing my first show-and-dance for Neo4j, you're most welcome at the Amsterdam GraphDay on June 6th. I'm doing the Introductory Training Session in the afternoon.