Packaging, pid-files and systemd

When I first built my ArangoDB package, one of the problems I had was getting ArangoDB to start after a reboot. While reworking the package for ArangoDB 3.0 I ran into this again.
The reason this can be tricky is that ArangoDB, like basically all forking processes, needs to write a pid-file somewhere. Where things get confusing is that anything you create in /var/run will be gone the next time you reboot, leading to errors like this:

-- Unit arangodb.service has begun starting up.
Aug 24 08:50:27 longshot arangod[10366]: {startup} starting up in daemon mode
Aug 24 08:50:27 longshot arangod[10366]: cannot write pid-file '/var/run/arangodb3/arangod.pid'
Aug 24 08:50:27 longshot systemd[1]: arangodb.service: Control process exited, code=exited status=1
Aug 24 08:50:27 longshot systemd[1]: Failed to start ArangoDB.
-- Subject: Unit arangodb.service has failed

If you DuckDuckGo it you can see that people stumble into this pretty regularly.

To understand what’s going on here it’s important to know what /var/run is actually for.

The Filesystem Hierarchy Standard describes it as a folder for “run-time variable data” and lays out some rules for the folder:

This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process. Programs may have a subdirectory of /var/run; this is encouraged for programs that use more than one run-time file. Process identifier (PID) files, which were originally placed in /etc, must be placed in /var/run. The naming convention for PID files is <program-name>.pid. For example, the crond PID file is named /var/run/crond.pid.

Since those words were written in 2004, evolving init systems, variations across distributions and the idea of storing pid-files (which shouldn’t survive reboot) with logs and stuff (which should) have all conspired to push for the creation of a standard place to put ephemeral data: /run.

Here in 2016, /run is a done deal, and for backwards compatibility, /var/run is now simply a symlink to /run:

mike@longshot ~/⭐  ls -l /var/
total 52
...
lrwxrwxrwx  1 root root     11 Sep 30  2015 lock -> ../run/lock
lrwxrwxrwx  1 root root      6 Sep 30  2015 run -> ../run
...

Looking back at our cannot write pid-file '/var/run/arangodb3/arangod.pid' error, two things are clear: we should probably stop using /var/run and somehow our /run/arangodb3 directory needs to exist before our systemd unit file is run.

As it happens, systemd has a subproject that deals with this: tmpfiles.d.

The well-named tmpfiles.d creates tmpfiles in /run and /tmp (and a few others). It does this by reading conf files written in a simple configuration format out of certain folders. A quick demo:

mike@longshot ~⭐  sudo bash -c "echo 'd /run/foo 0755 mike users -' > /usr/lib/tmpfiles.d/foo.conf"
mike@longshot ~⭐  sudo systemd-tmpfiles --create foo.conf
mike@longshot ~⭐  ls -l /run
...
drwxr-xr-x  2 mike     users     40 Aug 24 14:18 foo
...

While we specified an individual conf file by name, running systemd-tmpfiles --create without arguments would create the files for all the conf files that exist in /usr/lib/tmpfiles.d/:

mike@longshot ~⭐  ls -l /usr/lib/tmpfiles.d/
total 104
-rw-r--r-- 1 root root   30 Jul  5 10:35 apache.conf
-rw-r--r-- 1 root root   78 May  8 16:35 colord.conf
-rw-r--r-- 1 root root  574 Jul 25 17:10 etc.conf
-rw-r--r-- 1 root root  595 Aug 11 08:04 gvfsd-fuse-tmpfiles.conf
-rw-r--r-- 1 root root  362 Jul 25 17:10 home.conf
...

Tying all this together is a systemd service that runs just before sysinit.target and uses that exact command to create all the tmpfiles:

mike@longshot ~/⭐  systemctl cat systemd-tmpfiles-setup.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Create Volatile Files and Directories
Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=local-fs.target systemd-sysusers.service
Before=sysinit.target shutdown.target
RefuseManualStop=yes

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev

If your unit file includes After=sysinit.target, you know that the tmpfiles you specified will exist by the time your unit is run.
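
For a forking service like arangod, the relevant parts of the unit file end up looking something like this. This is a minimal sketch rather than my actual unit file; the binary path and flags are illustrative:

[Unit]
Description=ArangoDB database server
# systemd-tmpfiles-setup.service runs Before=sysinit.target,
# so /run/arangodb3 exists by the time this unit starts
After=sysinit.target

[Service]
Type=forking
# matches the path from the tmpfiles.d conf shown below
PIDFile=/run/arangodb3/arangod.pid
ExecStart=/usr/bin/arangod --daemon --pid-file /run/arangodb3/arangod.pid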

Knowing that this plumbing is in place, your package should include a conf file which gets installed into /usr/lib/tmpfiles.d/. Here is mine for ArangoDB:

mike@longshot ~/projects/arangodb_pkg (master)⭐  cat arangodb-tmpfile.conf 
d /run/arangodb3 0755 arangodb arangodb -

While this will ensure that tmpfiles are created next time the computer boots, we also need to make sure the service can be started right now. On Arch that means having a post_install hook that looks like this:

post_install() {
  systemd-tmpfiles --create arangodb.conf
}

If you are running systemd, and you probably are, this is the way to go. While it’s not hard to find people using mkdir in their unit file’s ExecStartPre section (been there, done that) or writing some sort of startup script, this is much cleaner. Make use of the infrastructure that is there.

D3 and React 3 ways

D3 and React are two of the most popular libraries out there and a fair bit has been written about using them together.
The reason this has been worth writing about is the potential for conflict between them. With D3 adding and removing DOM elements to represent data, and React tracking and diffing DOM elements, either library could end up with elements deleted out from under it, or with operations returning unexpected elements (the apparent approach on finding such an element being “kill it with fire“).

One way of avoiding this situation is simply telling a React component not to update its children via shouldComponentUpdate(){ return false }. While effective, having React manage all the DOM except for some designated area doesn’t feel like the cleanest solution. A little digging shows that there are some better options out there.
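
For reference, that escape hatch looks roughly like this. It's a hedged sketch (the component name and props are made up): React renders a single container once, and D3 owns everything inside it from then on.

class D3Island extends React.Component {

  //Never re-render; React hands this subtree over to D3
  shouldComponentUpdate() {
    return false
  }

  componentDidMount() {
    //From here on D3 is free to add and remove elements
    d3.select(this.container)
      .append('svg')
      .attr('width', this.props.width)
      .attr('height', this.props.height)
  }

  render() {
    return <div ref={(el) => { this.container = el }} />
  }
}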

To explore these, I’ve taken D3 creator Mike Bostock’s letter frequency bar chart example and used it as the example for all three cases. I’ve updated it to ES6, D3 version 4 and implemented it as a React component.

[Image: Mike Bostock’s letter frequency chart]

Option 1: Use Canvas

One nice option is to use HTML5’s canvas element. Draw what you need and let React render the one element into the DOM. Mike Bostock has an example of the letter frequency chart done with canvas. His code can be transplanted into React without much fuss.

class CanvasChart extends React.Component {

  componentDidMount() {
    //All Mike's code
  }

  render() {
    return <canvas width={this.props.width} height={this.props.height} ref={(el) => { this.canvas = el }} />
  }
}

I’ve created a working demo of the code on Plunkr.
The canvas approach is something to consider if you are drawing or animating a large amount of data. Speed is also in its favour, but React probably narrows the speed gap a bit.

Since the chart is drawn with Javascript, only a single canvas element is produced; no other elements need to be created or destroyed, avoiding the conflict with React entirely.
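
To give a sense of what lives inside that componentDidMount, here is a rough sketch of drawing the bars with the plain canvas API. This isn't Mike's actual code, just the general shape, and it assumes d3 is imported for the scale math:

componentDidMount() {
  const {data, width, height} = this.props
  const context = this.canvas.getContext('2d')

  //D3 only does the math; the canvas API does the drawing
  const x = d3.scaleBand()
    .domain(data.map((d) => d.letter))
    .rangeRound([0, width])
    .padding(0.1)

  const y = d3.scaleLinear()
    .domain([0, d3.max(data, (d) => d.frequency)])
    .range([height, 0])

  context.fillStyle = 'steelblue'
  data.forEach((d) => {
    context.fillRect(x(d.letter), y(d.frequency), x.bandwidth(), height - y(d.frequency))
  })
}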

Option 2: Use react-faux-dom

Oliver Caldwell’s react-faux-dom project creates a Javascript object that passes for a DOM element. D3 can do its DOM operations on that, and when it’s done you just call toReact() to return React elements. Updating Mike Bostock’s original bar chart demo gives us this:

import React from 'react'
import ReactFauxDOM from 'react-faux-dom'
import d3 from 'd3'

class SVGChart extends React.Component {

  render() {
    let data = this.props.data

    let margin = {top: 20, right: 20, bottom: 30, left: 40},
      width = this.props.width - margin.left - margin.right,
      height = this.props.height - margin.top - margin.bottom;

    let x = d3.scaleBand()
      .rangeRound([0, width])

    let y = d3.scaleLinear()
      .range([height, 0])

    let xAxis = d3.axisBottom()
      .scale(x)

    let yAxis = d3.axisLeft()
      .scale(y)
      .ticks(10, "%");

    //Create the element
    const div = new ReactFauxDOM.Element('div')
    
    //Pass it to d3.select and proceed as normal
    let svg = d3.select(div).append("svg")
      .attr("width", width + margin.left + margin.right)
      .attr("height", height + margin.top + margin.bottom)
      .append("g")
      .attr("transform", `translate(${margin.left},${margin.top})`);

      x.domain(data.map((d) => d.letter));
      y.domain([0, d3.max(data, (d) => d.frequency)]);

    svg.append("g")
      .attr("class", "x axis")
      .attr("transform", `translate(0,${height})`)
      .call(xAxis);

    svg.append("g")
      .attr("class", "y axis")
      .call(yAxis)
      .append("text")
      .attr("transform", "rotate(-90)")
      .attr("y", 6)
      .attr("dy", ".71em")
      .style("text-anchor", "end")
      .text("Frequency");

    svg.selectAll(".bar")
      .data(data)
      .enter().append("rect")
      .attr("class", "bar")
      .attr("x", (d) => x(d.letter))
      .attr("width", 20)
      .attr("y", (d) => y(d.frequency))
      .attr("height", (d) => {return height - y(d.frequency)});

    //DOM manipulations done, convert to React
    return div.toReact()
  }

}

This approach has a number of advantages, and as Oliver points out, one of the big ones is being able to use this with Server Side Rendering. Another bonus is that existing D3 visualizations hardly need to be modified at all to get them working with React. If you look back at the original bar chart example, you can see that it’s basically the same code.

Option 3: D3 for math, React for DOM

The final option is a full embrace of React, both the idea of components and its dominion over the DOM. In this scenario D3 is used strictly for its math and formatting functions. Colin Megill put this nicely, stating “D3’s core contribution is not its DOM model but the math it brings to the client”.

I’ve re-implemented the letter frequency chart following this approach. D3 is only used to do a few calculations and format numbers. No DOM operations at all. Creating the SVG elements is all done with React by iterating over the data and the arrays generated by D3.
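
Stripped of the axes and labels, the heart of that approach looks something like this. It's a sketch with made-up component and prop names, assuming d3 is imported as in the earlier example; d3 supplies the math and React produces the elements:

class LetterChart extends React.Component {

  render() {
    const {data, width, height} = this.props

    //D3 supplies the math...
    const x = d3.scaleBand()
      .domain(data.map((d) => d.letter))
      .rangeRound([0, width])
      .padding(0.1)

    const y = d3.scaleLinear()
      .domain([0, d3.max(data, (d) => d.frequency)])
      .range([height, 0])

    //...and React produces the elements
    return (
      <svg width={width} height={height}>
        {data.map((d) => (
          <rect
            key={d.letter}
            className="bar"
            x={x(d.letter)}
            y={y(d.frequency)}
            width={x.bandwidth()}
            height={height - y(d.frequency)}
          />
        ))}
      </svg>
    )
  }
}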

[Image: My pure React re-implementation of Mike Bostock’s letter frequency bar chart. D3 for math, React for DOM. No cheating.]

What I learned from doing this is that D3 does a lot of work for you, especially when generating axes. You can see in the code there are a fair number of “magic values”, a little +5 here or a -4 there, to get everything aligned right. Probably all of that can be cleaned up into props like “margin” or “padding”, but it will take a few more iterations (and possibly actual reuse of these components) to get there. D3 has already got that stuff figured out.

This approach is a lot of work in the short term, but has some real benefits. First, I like this approach for its consistency with the React way of doing things. Second, in the long term, once good boundaries between components are established, you can really see lots of possibilities for reuse. The modular nature of D3 version 4 probably also means this approach will lead to some reduced file sizes, since you can be very selective about what functions you include.
If you can see yourself doing a lot of D3 and React in the future, the price paid for this purity would be worth it.

Where to go from here

It’s probably worth pointing out that D3 isn’t a charting library; it’s a generic data visualisation framework. So while the examples above might be useful for showing how to integrate D3 and React, they aren’t trying to suggest that this is a great use of D3 (though it’s not an unreasonable use either). If all you need is a bar chart, there are libraries like Chart.js and react-chartjs aimed directly at that.

In my particular case I had an existing D3 visualization, and react-faux-dom was the option I used. It’s a perfect balance between purity and pragmatism and probably the right choice for most cases.

Hopefully this will save people some digging.

Graph migrations

One of the things that is not obvious at first glance is how “broad” ArangoDB is. By combining the flexibility of the document model with the joins of the graph model, ArangoDB has become a viable replacement for Postgres or MySQL, which is exactly how I have been using it; for the last two years it’s been my only database.

One of the things that falls out of that type of usage is a need to actually change the structure of your graph. In a graph, structure comes from the edges that connect your vertices. Since both the vertices and edges are just documents, that structure is actually just more data. Changing your graph structure essentially means refactoring your data.

There are definitely patterns that appear in that refactoring, and over the last little while I have been playing with putting the obvious ones into a library called graph_migrations. This is a work in progress, but there are some useful functions working already, and they could use some proper documentation.

eagerDelete

One of the first of these is what I have called eagerDelete. If you wanted to delete Bob from the graph below, Charlie and Dave would be orphaned.

[Image: Screenshot from 2016-04-06 10-55-54]

Deleting Bob with eagerDelete means that Bob is deleted as well as any neighbors whose only neighbor is Bob.

gm = new GraphMigration("test") //use the database named "test"
gm.eagerDelete({name: "Bob"}, "knows_graph")

[Image: alice_eve]

mergeVertices

Occasionally you will end up with duplicate vertices, which should be merged together. Below you can see we have an extra Charlie vertex.

[Image: extra_charlie]

gm = new GraphMigration("test")
gm.mergeVertices({name: "CHARLIE"},{name: "Charlie"}, "knows_graph")

[Image: merged_charlie]

attributeToVertex

One of the other common transformations is needing to make a vertex out of an attribute. This process of “promoting” something to be a vertex is sometimes called reifying. Let’s say Eve and Charlie are programmers.

[Image: knows_graph]

Let’s add an attribute called job to both Eve and Charlie identifying them as programmers:

[Image: adding_job_attr]

But let’s say we decide that it makes more sense for job: "programmer" to be a vertex on its own (we want to reify it). We can use the attributeToVertex function for that, but because Arango allows us to split our edge collections, and it’s good practice to do so, let’s add a new edge collection to our “knows_graph” to store the edges that will be created when we reify this attribute.

[Image: adding_works_as]

With that we can run attributeToVertex, telling it the attribute(s) to look for, the graph (knows_graph) to search and the collection to save the edges in (works_as).

gm = new GraphMigration("test")
gm.attributeToVertex({job: "programmer"}, "knows_graph", "works_as", {direction: "inbound"})

The result is this:

[Image: after_attrTo_vertex]

vertexToAttribute

Another common transformation is exactly the reverse of what we just did; folding the job: "programmer" vertex into the vertices connected to it.

gm = new GraphMigration("test")
gm.vertexToAttribute({job: "programmer"}, "knows_graph", {direction: "inbound"})

That code puts us right back to where we started, with Eve and Charlie both having a job: "programmer" attribute.

[Image: knows_graph]

redirectEdges

There are times when things are just not connected the way you want. Let’s say in our knows_graph we want all the inbound edges pointing at Bob to point instead to Charlie.
[Image: knows_graph]

We can use redirectEdges to do exactly that.

gm = new GraphMigration("test")
gm.redirectEdges({_id: "persons/bob"}, {_id: "persons/charlie"}, "knows_graph", {direction: "inbound"})

And now Eve and Alice know Charlie.

[Image: redirected_edges]

Where to go from here

As the name “graph migrations” suggests the thinking behind this was to create something similar to the Active Record Migrations library from Ruby on Rails but for graphs.

As more and more of this goes from idea to code and I get a chance to play with it, I’m less sure that a direct copy of Migrations makes sense. Graphs are actually pretty fine-grained data in the end and maybe something more interactive makes sense. It could be that this makes more sense as a Foxx app or perhaps part of Arangojs or ArangoDB’s admin interface. It feels a little too early to tell.

Beyond providing a little documentation, the hope here is to make this a little more visible to people who are thinking along the same lines and might be interested in contributing.

Back up your data, give it a try and tell me what you think.

That stonewalling thing

There is a meme in the current crypto “debate” that makes me cringe whenever I read it: the idea of “stonewalling”. It’s come up in the Apple vs FBI case as Forbes, the LA Times, Jacobin magazine and others all described Apple as “stonewalling” the FBI.

Wired’s recent WhatsApp story mentioned that “WhatsApp is, in practice, stonewalling the federal government” and while Foreign Policy magazine avoided the word, they captured the essence when they described WhatsApp as “a service willing to adopt technological solutions to prevent compliance with many types of court orders”.

All of these articles make it sound like Apple/WhatsApp has the data, but is unwilling to give it to the government.

�
  !��s�����|Ǧ�2}|q�h�J�,�^��=&/
                                    _,e�r%����/D@�1f��"�
                                                                ]�?c�,��y�l?��3�lF�'���ǘ��IA��O�Y�i�����ё�R��`�[�]�H���P�1'��������S����~tF\�������^��f@��<P�g�	!X���6eh�U�rN���=d@܉eQe���B�lk����\ҠcE��
�$�d&���_xor�s�-���l,v���44�E����n�[���1YL�o�ޜ�g�m�����Tx�f	܁�����å+e�LR�E1���ޅx
                                                                                              �a*�Զ\l�ϫ)4&���or�-�4���C���q��|-2[͘7 ��
��0�ǹ����+�5b!�wV����������3\n�꨻�R�,Ĝ�

\F����P�IJ<Ը$�`Q/���D�w��̣���v"|��z�g/I��@!�(�z������]ɹ3}+f1�
                                                                  ju��vw�y~#7�w��K������M\g�.uW�i
                                                                                                    TYc���I@�s�;�/��
                                                                                                                        �����s�c�ݮ���C�
                                                                                                                                         �6~�e

Blobs of encrypted text like the one above are useless to anyone but the holder of the decryption key. Where a company holds the decryption key and refuses to give it up, it seems reasonable to call that “stonewalling”.

Without the decryption key, you may be in possession of such a blob but you can’t meaningfully be described as “having” the data within it. Calls of “stonewalling” in cases like that are either grandstanding or reveal an opinion-disqualifying level of ignorance.

These accusations of stonewalling obscure what I think is the real appeal of encryption and tools such as Tor: it’s not that these technologies prevent compliance, it’s that companies can prevent the collection of certain types of data in the first place.

The authors of a recent paper called “Cryptopolitik and the Darknet” did exactly that when they crawled the darknet for data:

“In order to avoid illegal material, such as media files of child pornography or publications by terrorist organisations, only textual content was harvested automatically. Any other material was either filtered out or immediately discarded.”

Nobody would think to accuse them of stonewalling or adopting “technological solutions to prevent compliance” for finding a way to do their darknet crawl without accumulating a bunch of data that is going to bring with it complicated law enforcement dealings.

When WhatsApp wants to “avoid illegal material” while still remaining in the messaging business, they do it with end-to-end encryption.

 

Why end-to-end? In end-to-end encryption, the end users hold the decryption keys. Companies who decide to keep the keys themselves become a target of every spy agency on the planet and run the risk of becoming the next Gemalto.

That technologies and architectural choices exist which allow you to filter the data you are exposed to, and therefore your level of legal liability/obligation, feels new. Or maybe what’s new is companies’ willingness to actually implement them.

No-one is interested in obstructing investigations, just managing risk and legal exposure. “If you collect it, they will come” is becoming a common phrase among programmers and security people, and for companies who don’t want to end up holding data related to a crime, painting a giant target on themselves, dedicating resources to servicing government requests, or having awkward public relations moments, end-to-end encryption starts to look like good risk management. Doubly so when you are dealing with multiple governments.

In that context, governments pushing back against end-to-end encryption seems to indicate an existing idea that companies are somehow obligated to collect data on behalf of the government, and that using encryption to limit your collection is not OK. This is visible in the issue of the government conscripting companies to do its work, raised by the FBI’s recent use of the 1789 All Writs Act to try to force Apple to build software to hack its own phone.

With many years of enthusiastic support from companies like AT&T it’s easy to see where that idea might have come from. As the American government oscillates between attacking tech companies and asking them to do its job, and authoritarian governments and international customers look on, it’s not hard to see why many tech companies are far less enthusiastic about facilitating government aims. So far “stonewalling” seems to be a deliberately provocative framing for the “we’ve stopped collecting that data, leave us out of this” reality that end-to-end encryption creates.

Seeing that kind of inflammatory rhetoric from the FBI or Congress is one thing, but its widespread use by journalists is very disconcerting.

As cries of “stonewalling” turn to accusations of tech companies thinking they are “above the law” and now draft anti-encryption legislation, it’s probably good to remember that blob of encrypted text. It’s not that these companies are getting in the way of the FBI getting data, they are trying to get themselves out of the way by removing their own access to it.

Of all people, former NSA director Michael Hayden recently observed “America is simply more secure with unbreakable end-to-end encryption”. I never thought I would be hoping more people would listen to him.

Graph traversals in ArangoDB

ArangoDB’s AQL query language was created to offer a unified interface for working with key/value, document and graph data. While AQL has been easy to work with and learn, it wasn’t until the addition of AQL traversals in ArangoDB 2.8 that it really felt like it had achieved its goal.

Adding the keywords GRAPH, OUTBOUND, INBOUND and ANY suddenly made iteration using a FOR loop the central idea in the language. This one construct can now be used to iterate over everything, whether collections, graphs or documents:

//FOR loops for everything
FOR person IN persons //collections
  FOR friend IN OUTBOUND person GRAPH "knows_graph" //graphs
    FOR value in VALUES(friend, true) //documents
    RETURN DISTINCT value

AQL has always felt more like programming than SQL ever did, but the central role of the FOR loop gives a clarity and simplicity that makes AQL very nice to work with. While this is a great addition to the language, it does, however, mean that there are now four different ways to traverse a graph in AQL, and a few things are worth pointing out about the differences between them.

AQL Traversals

There are two variations of the AQL traversal syntax: the named graph and the anonymous graph. The named graph version uses the GRAPH keyword and a string indicating the name of an existing graph. With the anonymous syntax you can simply supply the edge collections:

//Passing the name of a named graph
FOR vertex IN OUTBOUND "persons/eve" GRAPH "knows_graph"
//Pass an edge collection to use an anonymous graph
FOR vertex IN OUTBOUND "persons/eve" knows

Both of these will return the same result. The traversal of the named graph uses the vertex and edge collections specified in the graph definition, while the anonymous graph uses the vertex collection names from the _to/_from attributes of each edge to determine the vertex collections.
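
One small thing worth knowing about the anonymous syntax: you can list more than one edge collection, and the traversal will follow edges from all of them. Borrowing the works_as collection from the graph migrations section above (assuming it exists alongside knows), that looks like this:

//Anonymous graph over multiple edge collections
FOR vertex IN OUTBOUND "persons/eve" knows, works_as
  RETURN vertex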

If you want access to the edge or the entire path all you need to do is ask:

FOR vertex IN OUTBOUND "persons/eve" knows
FOR vertex, edge IN OUTBOUND "persons/eve" knows
FOR vertex, edge, path IN OUTBOUND "persons/eve" knows

The vertex, edge and path variables can be combined and filtered on to do some complex stuff. The Arango docs show a great example:

FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[0].theTruth == true
  AND p.edges[1].theFalse == false
  FILTER p.vertices[1]._key == "G"
  RETURN p

Notes

Arango can end up doing a lot of work to fill in those FOR v, e, p IN variables. ArangoDB is really fast, so to show the effect these variables can have, I created the most inefficient query I could think of: a directionless traversal across a high-degree vertex with no indexes.

The basic setup looked like this except with 10000 vertices instead of 10. The test was getting from start across the middle vertex to end.

[Image: Screenshot from 2016-04-05 10-07-04]

What you can see is that adding those variables comes at a cost, so only declare the ones you actually need.

[Image: Traversing a supernode with 10000 incident edges with various traversal methods. N=5. No indexes used.]

GRAPH_* functions and TRAVERSAL

ArangoDB also has a series of “Named Operations”, a few of which also do traversals. There is also a super old-school TRAVERSAL function hiding in the “Other” section. What’s interesting is how different their performance can be while still returning the same results.

I tested all of the traversal functions on the same supernode described above. These are the queries:

//AQL traversal
FOR v IN 2 ANY "vertices/1" edges
  FILTER v.name == "end"
    RETURN v

//GRAPH_NEIGHBORS
RETURN GRAPH_NEIGHBORS("db_10000", {_id: "vertices/1"}, {direction: "any", maxDepth:2, includeData: true, neighborExamples: [{name: "end"}]})

//GRAPH_TRAVERSAL
RETURN GRAPH_TRAVERSAL("db_10000", {_id:"vertices/1"}, "any", {maxDepth:2, includeData: true, filterVertices: [{name: "end"}], vertexFilterMethod: ["exclude"]})

//TRAVERSAL
RETURN TRAVERSAL(vertices, edges, {_id: "vertices/1"}, "any", {maxDepth:2, includeData: true, filterVertices: [{name: "end"}], vertexFilterMethod: ["exclude"]})

All of these returned the same vertex, just with varying levels of nesting within various arrays. Removing the nesting did not make a significant difference in the execution time.

[Image: Traversing a supernode with 10000 incident edges with various traversal methods. N=5.]

Notes

While TRAVERSAL and GRAPH_TRAVERSAL were not stellar performers here, they both have a lot to offer in terms of customizability. For ordering, depth-first searches, and custom expanders and visitors, this is the place to look. As you explore those options, I’m sure these can get much faster.

It’s slightly less obvious but still worth pointing out that where AQL traversals require an id ("vertices/1000" or a document with an _id attribute), GRAPH_* functions just accept an example like {foo: "bar"} (I’ve passed in {_id: "vertices/1"} as the example above just to keep things comparable). Being able to find things without needing to know a specific id, or what collection to look in, is very useful. It lets you abstract away document-level concerns like collections and operate on a higher “graph” level, so you can avoid hardcoding collections into your queries.
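
As a hypothetical example (assuming the persons in knows_graph have name attributes, as in the graph migrations section above), the same kind of neighbour lookup could be written without knowing any _id at all:

//Finding neighbours by example rather than by _id
RETURN GRAPH_NEIGHBORS("knows_graph", {name: "Eve"}, {direction: "any", includeData: true})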

What it all means

The differences between these, at least superficially similar, traversals are pretty surprising. While some were faster than others, none of the options for tightening the scope of the traversal were used (edge restrictions, indexes, directionality). That suggests there is likely a lot of headroom for performance gains for all of the different methods.

The conceptual clarity that AQL traversals bring to the language as a whole is really nice, but it’s clear there is some optimization work to be done before I go and rewrite all my queries.

Where I have used the new AQL traversal syntax, I’m also going to have to check to make sure there are no unused v,e,p variables hiding in my queries. Where you need to use them, it looks like restricting yourself to v,e is the way to go. Generating those full paths is costly. If you use them, make sure it’s worth it.

Slowing Arango down is surprisingly instructive, but with 3.0 bringing the switch to Velocypack for JSON serialization, new indexes, and more, it looks like it’s going to get harder to do. :)

 

Flash messages for Mapbox GL JS

I’ve been working on an application where I’m using ArangoDB’s WITHIN_RECTANGLE function to pull up documents within the current map bounds. The obvious problem there is that the current map bounds can be very very big.

Dumping the entire contents of your database every time the map moves sounded decidedly sub-optimal to me, so I decided to calculate the area within the requested bounds using Turf.js and send back an error if it’s too big.
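
The check itself only takes a few lines. Here is a sketch of roughly what it looks like using Turf’s bboxPolygon and area helpers; the threshold and the error payload are made up for illustration:

import turf from 'turf'

//An arbitrary threshold for illustration: 50 km² in square metres
const MAX_AREA = 50000000

function checkBounds(bounds) {
  //bounds: [west, south, east, north] of the current map view
  const polygon = turf.bboxPolygon(bounds)
  const area = turf.area(polygon) //square metres

  if (area > MAX_AREA) {
    return {error: `Requested area is too large: ${Math.round(area)} m²`}
  }
  return null
}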

So far so good, but I wanted a nice way to display that error message as a notification right on the map. There are lots of ways to tackle that sort of thing, but given that this seemed very specific to the map, I thought I might take a stab at making it a mapbox-gl.js plugin.

The result is mapbox-gl-flash. Currently you would install it from github:

npm install --save mapbox-gl-flash

I’m using babel so I’ll use the ES2015 syntax and get a map going.

import mapboxgl from 'mapbox-gl'
import Flash from 'mapbox-gl-flash'

//This is Mapbox's API token that it uses for its examples
mapboxgl.accessToken = 'pk.eyJ1IjoibWlrZXdpbGxpYW1zb24iLCJhIjoibzRCYUlGSSJ9.QGvlt6Opm5futGhE5i-1kw';
var map = new mapboxgl.Map({
    container: 'map', // container id
    style: 'mapbox://styles/mapbox/streets-v8', //stylesheet location
    center: [-74.50, 40], // starting position
    zoom: 9 // starting zoom
});

// And now set up flash:
map.addControl(new Flash());

This sets up an element on the map that listens for a “mapbox.setflash” event.

Next, the element that is listening has a class of .flash-message, so let’s set up a little basic styling for it:

.flash-message {
  font-family: 'Ubuntu', sans-serif;
  position: relative;
  text-align: center;
  color: #fff;
  margin: 0;
  padding: 0.5em;
  background-color: grey;
}

.flash-message.info {
  background-color: DarkSeaGreen;
}

.flash-message.warn {
  background-color: Khaki;
}

.flash-message.error {
  background-color: LightCoral;
}

With that done, let’s fire a CustomEvent and see what it does.

document.dispatchEvent(new CustomEvent('mapbox.setflash', {detail: {message: "foo"}}))

[Image: foo_message]

Ruby on Rails has three different kinds of flash messages: info, warn and error. That seems pretty reasonable so I’ve implemented that here as well. We’ve already set up some basic styles for those classes above, and we can apply one of those classes by adding another option to our custom event detail object:

document.dispatchEvent(new CustomEvent('mapbox.setflash', {detail: {message: "foo", info: true}}))

document.dispatchEvent(new CustomEvent('mapbox.setflash', {detail: {message: "foo", warn: true}}))

document.dispatchEvent(new CustomEvent('mapbox.setflash', {detail: {message: "foo", error: true}}))

These events add the specified class to the flash message.

[Image: flash_message_classes]

One final thing that I expect is for the flash message to fade out after a specified number of seconds. This is accomplished by adding a fadeout attribute:


document.dispatchEvent(new CustomEvent('mapbox.setflash', {detail: {message: "foo", fadeout: 3}}))

Lastly, you can make the message go away by firing the event again with an empty string.
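
That is, something like this:

document.dispatchEvent(new CustomEvent('mapbox.setflash', {detail: {message: ""}}))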

With a little CSS twiddling I was able to get the nice user-friendly notification I had in mind to let people know why there is no more data showing up.

[Image: flash-message]

I’m pretty happy with how this turned out. Now I have a nice map specific notification that not only works in this project, but is going to be easy to add to future ones too.

Running Gephi on Ubuntu 15.10

A while ago I gave a talk at the Ottawa graph meetup about getting started doing graph data visualizations with Gephi. Ever the optimist, I invited people to install Gephi on their machines and then follow along as I walked through doing various things with the program.

[Image: java_install]

What trying to get a room of 20 people to install a Java program has taught me is that the installer’s “Java is found everywhere” is not advertising; it’s a warning. I did indeed experience the power of Java, and after about ten minutes of old/broken/multiple Java versions, broken classpaths and Java 7/8 compatibility drama, I gave up and completed the rest of the talk as a demo.

All of this was long forgotten until my wife and I started a little open data project recently and needed to use Gephi to visualize the data. The Gephi install she had attempted the day of the talk was still lingering on her Ubuntu system and so it was time to actually figure out how to get it going.

The instructions for installing Gephi are pretty straightforward:

  1. Update your distribution with the last official JRE 7 or 8 packages.
  2. After the download completes, unzip and untar the file in a directory.
  3. Run it by executing ./bin/gephi script file.

The difficulty was that after doing that, Gephi would show its splash screen and then hang as the loading bar said “Starting modules…“.

If you have ever downloaded plugins for Gephi, you will have noticed that they have an .nbm extension, which indicates that they, and (if you will pardon the pun) by extension Gephi itself, are built on top of the Netbeans IDE.
So the next question was: does Netbeans itself work?

sudo apt-get install netbeans
netbeans

Wouldn’t you know it, Netbeans also freezes while loading modules.

Installing Oracle’s version of Java was suggested, and the place to get that is the WebUpd8 Team’s PPA:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer oracle-java8-set-default
# The java version that got installed:
java -version
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

That finally left us with a working version of Gephi.

[Image: Gephi 0.9.1 running on Ubuntu 15.10]

Installing Gephi on Arch Linux was (thankfully) drama-free, but interestingly it installs OpenJDK, the very thing that seemed to be causing the problems on Ubuntu:

yaourt -S gephi
java -version
openjdk version "1.8.0_74"
OpenJDK Runtime Environment (build 1.8.0_74-b02)
OpenJDK 64-Bit Server VM (build 25.74-b02, mixed mode)

It’s a mystery to me why Gephi on Ubuntu seems to require Oracle’s Java but on Arch I can run it on OpenJDK.
With a little luck it can remain a mystery.