GraphQL and security

Imagine you have a web application that allows people to view widgets by name. Somewhere deep in your codebase, a programmer has thoughtfully updated the existing ES5 SQL injection to this stylish new ES6 SQL injection:

`select * from widgets where name = '${name}';`

Injection attacks like this one persist in spite of the fact that a search for “sql injection tutorial” returns around 3,740,000 results on Google. They topped the OWASP top 10 in 2010 and 2013 and probably will again in 2016, and likely for the foreseeable future. Motherboard even calls SQL injection “the hack that will never go away”.

The standard answer to this sort of problem is input sanitization. It’s standard enough that it shows up in jokes.

xkcd’s famous “Exploits of a Mom”

But when faced with such consistent failure, it’s reasonable to ask if there isn’t something systemic going on.

There is a sub-field of security research known as Language Theoretic Security (Langsec) that is asking exactly that question.

Exploitation is unexpected computation caused reliably or probabilistically by some crafted inputs. — Meredith Patterson

Dropping the students table in the comic is exactly the kind of “unexpected computation” they are talking about. To combat it, Langsec encourages programmers to consider their inputs as a language and the sum of the ad hoc checks on those inputs as a parser for that language.

They advocate a clean separation of concerns between the recognition and processing of inputs, and bringing a recognizer of appropriate strength to bear on the input you are parsing (i.e. not validating HTML/XML with a regex).

Where this starts intersecting with GraphQL is in what this actually implies:

This design and programming paradigm begins with a description of valid inputs to a program as a formal language (such as a grammar or a DFA). The purpose of such a disciplined specification is to cleanly separate the input-handling code and processing code.

A LangSec-compliant design properly transforms input-handling code into a recognizer for the input language; this recognizer rejects non-conforming inputs and transforms conforming inputs to structured data (such as an object or a tree structure, ready for type or value-based pattern matching).

The processing code can then access the structured data (but not the raw inputs or parsers’ temporary data artifacts) under a set of assumptions regarding the accepted inputs that are enforced by the recognizer.

If all that starts sounding kind of familiar, well it did to me too.


GraphQL allows you to define a formal language via its type system. As queries arrive, they are lexed, parsed, matched against the user-defined types and formed into an Abstract Syntax Tree (AST). The contents of that AST are then made available to processing code via resolve functions.
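If you want to see that recognizer step in action, graphql-js exposes its parser directly. Here is a quick sketch (assuming you have the graphql npm package installed):

import { parse } from 'graphql/language';

// The parser turns the raw query text into an AST; anything that fails to
// parse never reaches a resolve function.
let ast = parse('{ widgets(name: "Soft squishy widget") }');

// The field name and its argument are now structured data, not a raw string.
console.log(ast.definitions[0].selectionSet.selections[0].name.value); // "widgets"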

The promise of Langsec is “software free from broad and currently dominant classes of bugs and vulnerabilities related to incorrect parsing and interpretation of messages between software components”, and GraphQL seems poised to put this within reach of every developer.

Turning back to the example we started with, how could we use GraphQL to protect against that SQL injection? Let’s get this going in a test.


import expect from 'expect'
import {
  graphql,
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString,
  GraphQLScalarType,
  GraphQLNonNull
} from 'graphql'
import { GraphQLError } from 'graphql/error';
import { Kind } from 'graphql/language';

describe('SQL injection', () => {

  it('returns a SQL injected string', async () => {

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          widgets: {
            type: GraphQLString,
            args: {
              name: {
                description: 'The name of the widget',
                type: new GraphQLNonNull(GraphQLString)
              }
            },
            resolve: (source, {name}) => `select * from widgets where name = '${name}';`
          }
        })
      })
    })

    let query = `
      query widgetByName($name: String!) {
        widgets(name: $name)
      }
    `

    //args: schema, query, rootValue, contextValue, variables
    let result = await graphql(schema, query, null, null, {name: "foo'; drop table widgets; --"})
    expect(result.data.widgets).toEqual("select * from widgets where name = 'foo'; drop table widgets; --'")
  })


GraphQL brings lots of benefits (no need for API versioning, all data in a single round trip, etc…), and while those are compelling, simply passing strings into our backend systems misses an opportunity to do something different.

Let’s create a custom type that is more specific than just a “string”; an AlphabeticString.

  it('is fixed with custom types', async () => {

    let AlphabeticString = new GraphQLScalarType({
      name: 'AlphabeticString',
      description: 'represents a string with no special characters.',
      serialize: String,
      parseValue: (value) => {
        if(value.match(/^([A-Za-z]|\s)+$/)) {
          return value
        }
        return null
      },
      parseLiteral: (ast) => {
        if(ast.value.match(/^([A-Za-z]|\s)+$/)) {
          return ast.value
        }
        return null
      }
    })

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          widgets: {
            type: GraphQLString,
            args: {
              name: {
                description: 'The name of the widget',
                type: new GraphQLNonNull(AlphabeticString)
              }
            },
            resolve: (source, {name}) => {
              return `select * from widgets where name = '${name}';`
            }
          }
        })
      })
    })

    let query = `
      query widgetByName($name: AlphabeticString!) {
        widgets(name: $name)
      }
    `

    //args: schema, query, rootValue, contextValue, variables
    let result = await graphql(schema, query, null, null, {name: "foo'; drop table widgets; --"})
    expect(result.errors[0].message).toInclude("got invalid value")
  })
})

This test now passes; the SQL string is rejected during parsing before ever reaching the resolve function.

On making custom types

There are already libraries out there that provide custom types for things like URLs and datetimes, but you will definitely want to be able to make your own.

To start defining your own types you will need to understand the role
the various functions play:

    let MyType = new GraphQLScalarType({
      name: 'MyType',
      description: 'this will show up in the documentation!',
      serialize: (value) => { /* ... */ },
      parseValue: (value) => { /* ... */ },
      parseLiteral: (ast) => { /* ... */ }
    })

The serialize function is the easiest to understand: GraphQL responses are serialized to JSON, and if this type needs special treatment before being included in a response, this is the place to do it.
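As a sketch (this DateTime type is hypothetical, not something graphql-js ships), imagine a type whose internal value is a JavaScript Date and which only becomes a string at serialization time:

import { GraphQLScalarType } from 'graphql';

// Hypothetical DateTime scalar: the internal value is a Date object,
// and serialize decides how it appears in the JSON response.
let DateTime = new GraphQLScalarType({
  name: 'DateTime',
  description: 'An ISO 8601 formatted timestamp.',
  serialize: (value) => value.toISOString(), // Date -> "2016-08-24T14:18:00.000Z"
  parseValue: (value) => new Date(value),    // variable values
  parseLiteral: (ast) => new Date(ast.value) // inline literals
});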

Understanding parseValue and parseLiteral requires a quick look at two different queries:

# name is a string literal, so parseLiteral is called
query {
  widgets(name: "Soft squishy widget")
}

# name comes from a variable, so parseValue is called
query widgetByName($name: AlphabeticString!) {
  widgets(name: $name)
}

In the first query, name is a string literal, so your type’s parseLiteral function will be called. In the second query, the name is supplied as the value of a variable, so parseValue is called.

Your type could end up being used in either of those scenarios, so it’s important to do your validation in both of those functions.

The standard GraphQL types (GraphQLInt, GraphQLFloat, etc.) also implement those same functions.
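One way to keep those two code paths honest is to write the validation once and call it from both. Here is a sketch of the AlphabeticString type reworked along those lines (returning null to signal an invalid value, as above):

import { GraphQLScalarType } from 'graphql';

// Shared validator: letters and whitespace only, otherwise null.
let onlyLetters = (value) =>
  typeof value === 'string' && /^([A-Za-z]|\s)+$/.test(value) ? value : null;

let AlphabeticString = new GraphQLScalarType({
  name: 'AlphabeticString',
  description: 'represents a string with no special characters.',
  serialize: String,
  parseValue: (value) => onlyLetters(value),    // called for variables ($name)
  parseLiteral: (ast) => onlyLetters(ast.value) // called for inline literals
});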

Putting the whole thing together, the process then looks something like this:

A query arrives and gets tokenized, then the parser builds the AST, calling parseValue and parseLiteral as needed. When the AST is complete, execution recurses down it calling resolve for each field, and serialize is called on each resolved value to create the response.

Where to go from here

Langsec is a deep topic, with implications wider than just what is discussed here. While GraphQL is certainly not “Langsec in a box”, it not only seems to be making the design patterns that follow from Langsec’s insights a reality, it has a shot at making them mainstream. I’d love to see the Langsec lens turned on GraphQL and see how it can guide the evolution of the spec and the practices around it.

I would encourage you to dig into the ideas of Langsec, and the best place to start is here:

Packaging, pid-files and systemd

When I first built my ArangoDB package, one of the problems I had was getting ArangoDB to start after a reboot. While reworking it for Arango 3.0 I ran into this again.
The reason this can be tricky is that ArangoDB, like basically all forking processes, needs to write a pid file somewhere. Where things get confusing is that anything you create in /var/run will be gone the next time you reboot, leading to errors like this:

-- Unit arangodb.service has begun starting up.
Aug 24 08:50:27 longshot arangod[10366]: {startup} starting up in daemon mode
Aug 24 08:50:27 longshot arangod[10366]: cannot write pid-file '/var/run/arangodb3/'
Aug 24 08:50:27 longshot systemd[1]: arangodb.service: Control process exited, code=exited status=1
Aug 24 08:50:27 longshot systemd[1]: Failed to start ArangoDB.
-- Subject: Unit arangodb.service has failed

If you DuckDuckGo it you can see that people stumble into this pretty regularly.

To understand what’s going on here it’s important to know what /var/run is actually for.

The Filesystem Hierarchy Standard describes it as a folder for “run-time variable data” and lays out some rules for the folder:

This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process. Programs may have a subdirectory of /var/run; this is encouraged for programs that use more than one run-time file. Process identifier (PID) files, which were originally placed in /etc, must be placed in /var/run. The naming convention for PID files is <program-name>.pid. For example, the crond PID file is named /var/run/crond.pid.

Since those words were written in 2004, the evolving needs of init systems, variations across distributions and the idea of storing pid-files (which shouldn’t survive reboot) with logs and stuff (which should) have all conspired to push for the creation of a standard place to put ephemeral data: /run.

Here in 2016, /run is a done deal, and for backwards compatibility, /var/run is now simply a symlink to /run:

mike@longshot ~/$  ls -l /var/
total 52
lrwxrwxrwx  1 root root     11 Sep 30  2015 lock -> ../run/lock
lrwxrwxrwx  1 root root      6 Sep 30  2015 run -> ../run

Looking back at our cannot write pid-file '/var/run/arangodb3/' error, a few things are clear. First, we should probably stop using /var/run since /run has been standard since around 2011.

Second, our files disappear because /run is a tmpfs. While there are some subtleties, it’s basically storing your files in RAM.

So the question is: how do we ensure our /run folder is prepped with our /run/arangodb3 directory (and whatever other files) before our systemd unit file is run? As it happens, systemd has a subproject that deals with this: tmpfiles.d.

The well-named tmpfiles.d creates tmpfiles in /run and /tmp (and a few others). It does this by reading conf files written in a simple configuration format out of certain folders. A quick demo:

mike@longshot ~$  sudo bash -c "echo 'd /run/foo 0755 mike users -' > /usr/lib/tmpfiles.d/foo.conf"
mike@longshot ~$  sudo systemd-tmpfiles --create foo.conf
mike@longshot ~$  ls -l /run
drwxr-xr-x  2 mike     users     40 Aug 24 14:18 foo

While we specified an individual conf file by name, running systemd-tmpfiles --create on its own would create the files for all the conf files that exist in /usr/lib/tmpfiles.d/.

mike@longshot ~$  ls -l /usr/lib/tmpfiles.d/
total 104
-rw-r--r-- 1 root root   30 Jul  5 10:35 apache.conf
-rw-r--r-- 1 root root   78 May  8 16:35 colord.conf
-rw-r--r-- 1 root root  574 Jul 25 17:10 etc.conf
-rw-r--r-- 1 root root  595 Aug 11 08:04 gvfsd-fuse-tmpfiles.conf
-rw-r--r-- 1 root root  362 Jul 25 17:10 home.conf

Tying all this together is a systemd service that runs early in the boot process and uses that exact command to create all the tmpfiles:

mike@longshot ~/$  systemctl cat systemd-tmpfiles-setup.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup.service
#  This file is part of systemd.
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Create Volatile Files and Directories
Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
DefaultDependencies=no
After=systemd-sysusers.service

[Service]
ExecStart=/usr/bin/systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev

Because that service runs so early in the boot process, you know that the tmpfiles you specified will exist by the time your unit file is run.

Knowing that this plumbing is in place, your package should include a conf file which gets installed into /usr/lib/tmpfiles.d/. Here is mine for ArangoDB:

mike@longshot ~/projects/arangodb_pkg (master)$  cat arangodb-tmpfile.conf 
d /run/arangodb3 0755 arangodb arangodb -

While this will ensure that tmpfiles are created next time the computer boots, we also need to make sure the service can be started right now. If you are packaging software for ArchLinux that means having a post_install hook that looks like this:

post_install() {
  systemd-tmpfiles --create arangodb.conf
}

If you are running systemd, and you probably are, this is the way to go. While it’s not hard to find people using mkdir in their unit file’s ExecStartPre section (been there, done that) or writing some sort of startup script, this is much cleaner. Make use of the infrastructure that is there.

Graph traversals in ArangoDB

ArangoDB’s AQL query language was created to offer a unified interface for working with key/value, document and graph data. While AQL has been easy to work with and learn, it wasn’t until the addition of AQL traversals in ArangoDB 2.8 that it really felt like it had achieved its goal.

Adding the keywords GRAPH, OUTBOUND, INBOUND and ANY suddenly made iteration using a FOR loop the central idea in the language. This one construct can now be used to iterate over everything: collections, graphs or documents:

//FOR loops for everything
FOR person IN persons //collections
  FOR friend IN OUTBOUND person GRAPH "knows_graph" //graphs
    FOR value IN VALUES(friend, true) //documents
      RETURN value

AQL has always felt more like programming than SQL ever did, but the central role of the FOR loop gives a clarity and simplicity that makes AQL very nice to work with. While this is a great addition to the language, it does, however, mean that there are now 4 different ways to traverse a graph in AQL, and a few things are worth pointing out about the differences between them.

AQL Traversals

There are two variations of the AQL traversal syntax: the named graph and the anonymous graph. The named graph version uses the GRAPH keyword and a string indicating the name of an existing graph. With the anonymous syntax you simply supply the edge collections:

//Passing the name of a named graph
FOR vertex IN OUTBOUND "persons/eve" GRAPH "knows_graph"
//Pass an edge collection to use an anonymous graph
FOR vertex IN OUTBOUND "persons/eve" knows

Both of these will return the same result. The traversal of the named graph uses the vertex and edge collections specified in the graph definition, while the anonymous graph uses the vertex collection names from the _to/_from attributes of each edge to determine the vertex collections.

If you want access to the edge or the entire path all you need to do is ask:

FOR vertex IN OUTBOUND "persons/eve" knows
FOR vertex, edge IN OUTBOUND "persons/eve" knows
FOR vertex, edge, path IN OUTBOUND "persons/eve" knows

The vertex, edge and path variables can be combined and filtered on to do some complex stuff. The Arango docs show a great example:

FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[0].theTruth == true
  AND p.edges[1].theFalse == false
  FILTER p.vertices[1]._key == "G"
  RETURN p


Arango can end up doing a lot of work to fill in those FOR v, e, p IN variables. ArangoDB is really fast, so to show the effect these variables can have, I created the most inefficient query I could think of: a directionless traversal across a high-degree vertex with no indexes.

The basic setup looked like this except with 10000 vertices instead of 10. The test was getting from start across the middle vertex to end.


What you can see is that adding those variables comes at a cost, so only declare the ones you actually need.

Traversing a supernode with 10000 incident edges with various traversal methods. N=5. No indexes used.

GRAPH_* functions and TRAVERSAL

ArangoDB also has a series of “Named Operations”, a few of which also do traversals. There is also a super old-school TRAVERSAL function hiding in the “Other” section. What’s interesting is how different their performance can be while still returning the same results.

I tested all of the traversal functions on the same supernode described above. These are the queries:

//AQL traversal
FOR v IN 2 ANY "vertices/1" edges
  FILTER v.name == "end"
    RETURN v

RETURN GRAPH_NEIGHBORS("db_10000", {_id: "vertices/1"}, {direction: "any", maxDepth:2, includeData: true, neighborExamples: [{name: "end"}]})

RETURN GRAPH_TRAVERSAL("db_10000", {_id:"vertices/1"}, "any", {maxDepth:2, includeData: true, filterVertices: [{name: "end"}], vertexFilterMethod: ["exclude"]})

RETURN TRAVERSAL(vertices, edges, {_id: "vertices/1"}, "any", {maxDepth:2, includeData: true, filterVertices: [{name: "end"}], vertexFilterMethod: ["exclude"]})

All of these returned the same vertex, just with varying levels of nesting within various arrays. Removing the nesting did not make a significant difference in the execution time.

Traversing a supernode with 10000 incident edges with various traversal methods. N=5.


While TRAVERSAL and GRAPH_TRAVERSAL were not stellar performers here, they both have a lot to offer in terms of customizability. For ordering, depth-first searches, custom expanders and visitors, this is the place to look. As you explore those options, I’m sure these can get much faster.

Slightly less obvious, but still worth pointing out, is that where AQL traversals require an id (“vertices/1000” or a document with an _id attribute), GRAPH_* functions just accept an example like {foo: “bar”} (I’ve passed in {_id: “vertices/1”} as the example just to keep things comparable). Being able to find things without needing to know a specific id, or what collection to look in, is very useful. It lets you abstract away document-level concerns like collections and operate on a higher “graph” level, so you can avoid hardcoding collections into your queries.

What it all means

The differences between these (at least superficially) similar traversals are pretty surprising. While some were faster than others, none of the options for tightening the scope of the traversal were used (edge restrictions, indexes, directionality). That tells you there is likely a lot of headroom for performance gains in all of the different methods.

The conceptual clarity that AQL traversals bring to the language as a whole is really nice, but it’s clear there is some optimization work to be done before I go and rewrite all my queries.

Where I have used the new AQL traversal syntax, I’m also going to have to check to make sure there are no unused v,e,p variables hiding in my queries. Where you need to use them, it looks like restricting yourself to v,e is the way to go. Generating those full paths is costly. If you use them, make sure it’s worth it.

Slowing Arango down is surprisingly instructive, but with 3.0 bringing the switch to Velocypack for JSON serialization, new indexes, and more, it looks like it’s going to get harder to do. :)


Running Gephi on Ubuntu 15.10

A while ago I gave a talk at the Ottawa graph meetup about getting started doing graph data visualizations with Gephi. Ever the optimist, I invited people to install Gephi on their machines and then follow along as I walked through doing various things with the program.


What trying to get a room of 20 people to install a Java program has taught me is that the installer’s “Java is found everywhere” is not advertising; it’s a warning. I did indeed experience the power of Java, and after about ten minutes of old/broken/multiple Java versions, broken classpaths and Java 7/8 compatibility drama, I gave up and completed the rest of the talk as a demo.

All of this was long forgotten until my wife and I started a little open data project recently and needed to use Gephi to visualize the data. The Gephi install she had attempted the day of the talk was still lingering on her Ubuntu system and so it was time to actually figure out how to get it going.

The instructions for installing Gephi are pretty straightforward:

  1. Update your distribution with the last official JRE 7 or 8 packages.
  2. After the download completes, unzip and untar the file in a directory.
  3. Run it by executing the ./bin/gephi script file.

The difficulty was that after doing that, Gephi would show its splash screen and then hang as the loading bar said “Starting modules…“.

If you have ever downloaded plugins for Gephi, you will have noticed that they have an .nbm extension, which indicates that they, and (if you will pardon the pun) by extension Gephi itself, are built on top of the Netbeans IDE.
So the next question was, does Netbeans itself work?

sudo apt-get install netbeans

Wouldn’t you know it, Netbeans also freezes while loading modules.

Installing Oracle’s version of Java was suggested and the place to get that is the Webupd8 Team’s ppa:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer oracle-java8-set-default
# The java version that got installed:
java -version
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

That finally left us with a working version of Gephi.

Gephi 0.9.1 running on Ubuntu 15.10

Installing Gephi on Arch Linux was (thankfully) drama-free, but interestingly it installs OpenJDK, the very thing that seemed to be causing the problems on Ubuntu:

yaourt -S gephi
java -version
openjdk version "1.8.0_74"
OpenJDK Runtime Environment (build 1.8.0_74-b02)
OpenJDK 64-Bit Server VM (build 25.74-b02, mixed mode)

It’s a mystery to me why Gephi on Ubuntu seems to require Oracle’s Java but on Arch I can run it on OpenJDK.
With a little luck it can remain a mystery.

gpg and signing your own Arch Linux packages

One of the first things that I wanted to install on my system after switching to Arch was ArangoDB. Sadly it wasn’t in the official repos. I’ve had mixed success installing things from the AUR, and the Arango packages there didn’t improve my ratio.

Using those packages as a starting point, I did a little tinkering and got it all working the way I like. I’ve been following all the work being done on reproducible builds and why that is needed, and it seems that with all that going on, anyone dabbling with making packages should at the very least be signing them. With that as my baseline, I figured I might as well start with mine.

Of course package signing involves learning about gpg/pgp whose “ease of use” is legendary.

Before we get to package signing, a little about gpg.

gpg --list-keys
gpg -k
gpg --list-public-keys

All of these commands list the contents of ~/.gnupg/pubring.gpg.

gpg --list-secret-keys
gpg -K

Both list all keys from ~/.gnupg/secring.gpg.

The pacman package manager also has its own gpg databases which you can explore with:

gpg --homedir /etc/pacman.d/gnupg --list-keys

So the task at hand is getting my public key into the list of public keys that pacman trusts. To do that we will need to do more than just list keys; we need to reference them individually. gpg lets us do that by passing an argument to one of the list-keys commands above. I’ll do a quick search through the list of keys that pacman trusts:

mike@longshot:~/projects/arangodb_pkg☺ gpg --homedir /etc/pacman.d/gnupg -k pierre
pub   rsa3072/6AC6A4C2 2011-11-18 [SC]
uid         [  full  ] Pierre Schmitz (Arch Linux Master Key) <>
sub   rsa1024/86872C2F 2011-11-18 [E]
sub   rsa3072/1B516B59 2011-11-18 [A]

pub   rsa2048/9741E8AC 2011-04-10 [SC]
uid         [  full  ] Pierre Schmitz <>
sub   rsa2048/54211796 2011-04-10 [E]

mike@longshot:~/projects/arangodb_pkg☺  gpg --homedir /etc/pacman.d/gnupg --list-keys stephane
pub   rsa2048/AB441196 2011-10-30 [SC]
uid         [ unknown] Stéphane Gaudreault <>
sub   rsa2048/FDC576A9 2011-10-30 [E]

If you look at the output there you can see what is called an OpenPGP short key id. We can use those to refer to individual keys, but we can also use long ids and fingerprints:

gpg --homedir /etc/pacman.d/gnupg -k --keyid-format long stephane
pub   rsa2048/EA6836E1AB441196 2011-10-30 [SC]
uid                 [ unknown] Stéphane Gaudreault <>
sub   rsa2048/4ABE673EFDC576A9 2011-10-30 [E]

gpg --homedir /etc/pacman.d/gnupg -k --fingerprint stephane
pub   rsa2048/AB441196 2011-10-30 [SC]
      Key fingerprint = 0B20 CA19 31F5 DA3A 70D0  F8D2 EA68 36E1 AB44 1196
uid         [ unknown] Stéphane Gaudreault <>
sub   rsa2048/FDC576A9 2011-10-30 [E]

So we can identify Stephane’s specific key using either the short id, the long id or the fingerprint:

gpg --homedir /etc/pacman.d/gnupg -k AB441196
gpg --homedir /etc/pacman.d/gnupg -k EA6836E1AB441196
gpg --homedir /etc/pacman.d/gnupg -k 0B20CA1931F5DA3A70D0F8D2EA6836E1AB441196

Armed with a way to identify the key I want pacman to trust, I need to do the transfer. Though not initially obvious, gpg can push and pull keys from designated key servers. The file at ~/.gnupg/gpg.conf tells me which keyserver I am using, while pacman’s file at /etc/pacman.d/gnupg/gpg.conf names its own.

Using my key’s long id I’ll push it to my default keyserver and tell pacman to pull it and then sign it.

#send my key
gpg --send-key F77636AC51B71B99
#tell pacman to pull that key from my keyserver
sudo pacman-key --keyserver -r F77636AC51B71B99
#sign the key it received and start trusting it
sudo pacman-key --lsign-key F77636AC51B71B99

With all that done, I should be able to sign my package with

makepkg --sign --key F77636AC51B71B99

We can also shorten that by setting the default-key option in ~/.gnupg/gpg.conf.

# If you have more than 1 secret key in your keyring, you may want to
# uncomment the following option and set your preferred keyid.

default-key F77636AC51B71B99

With my default key set I’m able to make and install with this:

mike@longshot:~/projects/arangodb_pkg☺ makepkg --sign
mike@longshot:~/projects/arangodb_pkg☺ sudo pacman -U arangodb-2.8.1-1-x86_64.pkg.tar.xz
loading packages...
resolving dependencies...
looking for conflicting packages...

Packages (1) arangodb-2.8.1-1

Total Installed Size:  146.92 MiB
Net Upgrade Size:        0.02 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring
[##################################################] 100%
(1/1) checking package integrity
[##################################################] 100%

The ease with which I can make my own packages is a very appealing part of Arch Linux for me. Signing them was the next logical step and I’m looking forward to exploring some related topics like running my own repo, digging deeper into GPG, the Web of Trust, and reproducible builds. It’s all fun stuff, if you can only find the time.

Hello GraphQL

One of the most interesting projects to me lately has been Facebook’s GraphQL. Announced at React.conf in January, those of us that were excited by the idea have had to wait, first for the spec to be formalized and then for some actual running code.

I think the GraphQL team is on to something big (it’s positioned as an alternative to REST to give a sense of how big), and I’ve been meaning to dig in to it for a while, but it was never clear where to start. So after a little head-scratching and a little RTFM, I want to share a GraphQL hello world.

So what does that look like? Well, Facebook has released two projects: graphql-js and express-graphql. Graphql-js is the reference implementation of what is described in the spec. express-graphql is a middleware component for the express framework that lets you use graphql.

So express is going to be our starting point. First we need to create a new project using the generator:

mike@longshot:~☺  express --git -e gql_hello

create : gql_hello
create : gql_hello/package.json
create : gql_hello/app.js
create : gql_hello/.gitignore
create : gql_hello/public
create : gql_hello/routes
create : gql_hello/routes/index.js
create : gql_hello/routes/users.js
create : gql_hello/views
create : gql_hello/views/index.ejs
create : gql_hello/views/error.ejs
create : gql_hello/bin
create : gql_hello/bin/www
create : gql_hello/public/javascripts
create : gql_hello/public/images
create : gql_hello/public/stylesheets
create : gql_hello/public/stylesheets/style.css

install dependencies:
$ cd gql_hello && npm install

run the app:
$ DEBUG=gql_hello:* npm start

Let’s do as we are told and run cd gql_hello && npm install.
When that’s done we can get to the interesting stuff.
Next up will be installing graphql and the middleware using the --save option so that our app’s dependencies in our package.json will be updated:

mike@longshot:~/gql_hello☺  npm install --save express-graphql graphql babel
npm WARN install Couldn't install optional dependency: Unsupported
npm WARN prefer global babel@5.8.23 should be installed with -g

I took the basic app.js file that was generated and just added the following:

app.use('/', routes);
app.use('/users', users);

// GraphQL:
var graphqlHTTP = require('express-graphql');

import {
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString
} from 'graphql';

var schema = new GraphQLSchema({
  query: new GraphQLObjectType({
    name: 'RootQueryType',
    fields: {
      hello: {
        type: GraphQLString,
        resolve() {
          return 'world';
        }
      }
    }
  })
});

//Mount the middleware on the /graphql endpoint:
app.use('/graphql', graphqlHTTP({ schema: schema , pretty: true}));
//That's it!

// catch 404 and forward to error handler
app.use(function(req, res, next) {
  var err = new Error('Not Found');
  err.status = 404;
  next(err);
});
Notice that we are passing our GraphQL schema to graphqlHTTP as well as pretty: true so that responses from the server will be pretty printed.

One other thing is that since those GraphQL libraries make extensive use of ECMAScript 6 syntax, we will need to use the Babel Transpiler to actually be able to run this thing.

If you installed Babel with npm install -g babel you can add the following to your package.json scripts section:

    "start": "babel-node ./bin/www"

Because I didn’t install it globally, I’ll just point to it in the node_modules folder:

    "start": "node_modules/babel/bin/babel-node.js ./bin/www"

With that done we can use npm start to start the app and try things out using curl:

mike@longshot:~☺  curl localhost:3000/graphql?query=%7Bhello%7D
{
  "data": {
    "hello": "world"
  }
}
Looking back at the schema we defined, we can see that our request {hello} (or %7Bhello%7D when it’s been URL encoded) caused the resolve function to be called, which returned the string “world”.

  name: 'RootQueryType',
  fields: {
    hello: {
      type: GraphQLString,
      resolve() {
        return 'world';
      }
    }
  }

This explains what they mean when you hear that GraphQL “exposes fields that are backed by arbitrary code”. What drew me to GraphQL is that it seems to be a great solution for people with graph database backed applications, but it’s only now that I realize that GraphQL is much more flexible. That string could have just as easily pulled something out of a relational database or calculated something useful. In fact this might be the only time “arbitrary code execution” is something to be happy about.
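To make that concrete, here is a sketch of the same hello field backed by a lookup instead of a constant. The db object and its findGreeting function are stand-ins for whatever data layer you happen to have; GraphQL only cares that resolve returns a value (or a promise of one):

import { GraphQLObjectType, GraphQLString } from 'graphql';

// Hypothetical data layer; this could be a relational database,
// a graph database, or any other source of values.
var db = {
  findGreeting: (lang) => Promise.resolve({ en: 'world', fr: 'monde' }[lang])
};

var queryType = new GraphQLObjectType({
  name: 'RootQueryType',
  fields: {
    hello: {
      type: GraphQLString,
      // resolve can return a promise; graphql-js waits for it to settle.
      resolve() {
        return db.findGreeting('en');
      }
    }
  }
});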

I’m super excited to explore this further and to start using it with ArangoDB. If you want to dig deeper I suggest you check out Exploring GraphQL and Creating a GraphQL server and of course read the spec.

Running a Rails app with Systemd and liking it

Systemd has, over the strident objections of many strident people, become the default init system for a surprising number of linux distributions. Though I’ve been aware of the drama, the eye-rolling, the uh.. enterprising nature of systemd, I really have only just started using it myself. All the wailing and gnashing of teeth surrounding it left me unsure what to expect.

Recently I needed to get a Proof-Of-Concept app I built running so a client could use it on their internal network to evaluate it. Getting my Rails app to start on boot was pretty straightforward, and since I’m going to be using this again, I thought I would document it here.

First I created a “rails” user and group, and in /home/rails I installed my usual Rbenv setup. The fact that only root is allowed to listen on ports below 1024 conflicts with my plan to run my app with the “rails” user and listen on port 80. The solution is setcap:

setcap 'cap_net_bind_service=+ep' .rbenv/versions/2.2.2/bin/bundle

With that capability added, I set up my systemd unit file in /usr/lib/systemd/system/myapp.service and added the following:


ExecStart=/usr/bin/bash -lc 'bundle exec rails server -e production --bind --port 80'


The secret sauce that makes this work with rbenv is the “bash -l” in the ExecStart section. This means that the bash will execute as though it was a login shell, meaning that the .bashrc file with all the PATH exports and rbenv init stuff will be sourced before the command I give it will be run. In other words, exactly what happens normally.
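For reference, here is roughly what the whole unit file looks like. Treat it as a sketch rather than gospel: the user, paths and app name are particular to my setup and yours will differ.

[Unit]
Description=My Rails proof-of-concept app
After=network.target

[Service]
User=rails
Group=rails
WorkingDirectory=/home/rails/myapp
ExecStart=/usr/bin/bash -lc 'bundle exec rails server -e production --port 80'
Restart=on-failure

[Install]
WantedBy=multi-user.target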

From there, I just start the service like all the rest of them:

systemctl enable myapp.service
systemctl start myapp.service

This Just Works™ and got the job done, but in the process I find I am really starting to appreciate Systemd. Running daemons is complicated, and with the dropping of privileges, ordering, isolation and security options, there is a lot to get right… or wrong.

What I am liking about Systemd is that it is taking the same functions that Docker is built on, namely cgroups and namespacing, and giving you a declarative way of using them while starting your process. Doing so puts some really nice (and otherwise complicated) security features within reach of anyone willing to read a man page.

PrivateTmp=yes is a great example of this. Simply adding that to the unit file above (which you should if your app uses temp files) closes off a bunch of security problems because systemd “sets up a new file system namespace for the executed processes and mounts private /tmp and /var/tmp directories inside it that is not shared by processes outside of the namespace”.

Could I get the same effect as PrivateTmp=yes with unshare? With some fiddling, but Systemd makes it a zero cost option.

There is also ProtectSystem=full to mount /boot, /usr and /etc as read-only, which “ensures that any modification of the vendor supplied operating system (and optionally its configuration) is prohibited for the service”. Systemd can even handle running setcap for me, resulting in beautiful stuff like this, and there is a lot more in man systemd.exec besides.
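As a sketch (the option names are real systemd directives, but check man systemd.exec and your systemd version before copying), the hardening described above amounts to a few extra lines in the [Service] section:

[Service]
# Give the app its own private /tmp and /var/tmp.
PrivateTmp=yes
# Mount /usr, /boot and /etc read-only for this service.
ProtectSystem=full
# On newer systemd (229+), this replaces the manual setcap step
# by letting the process bind to port 80 directly.
AmbientCapabilities=CAP_NET_BIND_SERVICE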

For me, I think one of the things that has become clear over the last few years is that removing “footguns” from our software is really important. All the work that is going into making the tools (like rm -rf) and languages (Rust!) we use less spectacularly dangerous is critical to raising the security bar across the industry.

The more I learn about Systemd the more it seems to be a much needed part of that.