ArangoDB and GraphQL

For a while now I’ve been wondering about what might be the minimal set of technologies that allows me to tackle the widest range of projects. The answer I’ve arrived at, for backend development at least, is GraphQL and ArangoDB.

Both of these tools expand my reach as a developer. Projects involving integrations, multiple clients and complicated data that would have been extremely difficult are now within easy reach.

But the minimal set idea is that I can enjoy this expanded range while juggling far fewer technologies than before. Tools that apply in more situations mean fewer things to learn, fewer moving parts and more depth in the learning I do.

While GraphQL and ArangoDB are both interesting technologies individually, it’s in using them together that I’ve been able to realize those benefits; one of those moments where the whole is different from the sum of it’s parts.

Backend Minimalism

My embrace of Javascript has definitely been part of creating that minimal set. A single language for both front and back end development has been a big part of simplifying my tech stack. Both GraphQL and ArangoDB can be used in many languages, but Javascript support is what you might describe as “first among equals” for both projects.

GraphQL can replace, and for me has replaced, server side frameworks like Rails or Django, leaving me with a handful of Javascript functions and more modular, testable code.

GraphQL also replaces ReST, freeing me from thinking about HATEOAS, bike-shedding over the vagaries of ReST, or needing pages and pages of JSON API documentation to save me from bike-shedding over the vagaries of ReST.

ArangoDB has also reduced the number of things I need to need to know. For a start it has removed the “need” for an ORM (no relational database, no need for Object Relational Mapping), which never really delivered on it’s promise to free you from knowing the underlying SQL.

More importantly it’s replaced not just NoSQL databases with a razor-thin set of capabilities like Mongodb (which stores nested documents but can’t do joins) or Neo4j (which does joins but can’t store nested documents), but also general purpose databases like MySQL or Postgres. I have one query language to learn, and one database whose quirks and characteristics I need to know.

It’s also replaced the deeply unpleasant process of relational data modeling with a seamless blend of documents and graphs that make modeling even really ugly connected datasets anticlimactic. As a bonus, in moving the schema outside the database GraphQL lets us enjoy all the benefits of a schema (making sure there is at least some structure I can rely on) and all the benefits of schemalessness (flexibility, ease of change).

Tools that actually reduce the number of things you need to know don’t come along very often. My goal here is to give a sense of what it looks like to use these two technologies together, and hopefully admiring the trees can let us appreciate the forest.

Show me the code

First we need some data to work with. ArangoDB’s administrative interface has some example graphs it can create, so lets use one to explore.

example_graphs

If we select the “knows” graph, we get a simple graph with 5 vertices.

knows_graph

This graph is going to be the foundation for our little exploration.

Next, the only really meaningful information these vertices have is a name attribute. If we are wanting to create a GraphQL type that represents one of these objects it would look like this:

  let Person = new GraphQLObjectType({
    name: 'Person',
    fields: () => ({
      name: {
        type: GraphQLString
      }
    })
  })

Now that we have a type that describes what a Person object looks like we can use it in a schema. This schema has a field called person which has two attributes: type, and resolve.

let schema = new GraphQLSchema({
    query: new GraphQLObjectType({
      name: 'Query',
      fields: () => ({
        person: {
          type: Person,
          resolve: () => {
            return {name: 'Mike'}
          },
        }
      })
    })
  })

The resolve is a function that will be run whenever graphql is asked to produce a person object. type is a type that describes the object that the resolve function returns, which in this this case is our Person type.

To see if this all works we can write a test using Jest.

import {
  graphql,
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString,
  GraphQLList,
  GraphQLNonNull
} from 'graphql'

describe('returning a hardcoded object that matches a type', () => {

  let Person = new GraphQLObjectType({
    name: 'Person',
    fields: () => ({
      name: {
        type: GraphQLString
      }
    })
  })

  let schema = new GraphQLSchema({
    query: new GraphQLObjectType({
      name: 'Query',
      fields: () => ({
        person: {
          type: Person,
          resolve: () => {
            return {name: 'Mike'}
          },
        }
      })
    })
  })

  it('lets you ask for a person', async () => {

    let query = `
      query {
        person {
          name
        }
      }
    `;

    let { data } = await graphql(schema, query)
    expect(data.person).toEqual({name: 'Mike'})
  })

})

This test passes which tells us that we got everything wired together properly, and the foundation laid to talk to ArangoDB.

First we’ll use arangojs and create a db instance and then a function that allows us to get a person using their name.

//src/database.js
import arangojs, { aql } from 'arangojs'

export const db = arangojs({
  url: `http://${process.env.ARANGODB_USER}:${process.env.ARANGODB_PASSWORD}@127.0.0.1:8529`,
  databaseName: 'knows'
})

export async function getPersonByName (name) {
  let query = aql`
      FOR person IN persons
        FILTER person.name == ${ name }
          LIMIT 1
          RETURN person
    `
  let results = await db.query(query)
  return results.next()
}

Now lets use that function with our schema to retrieve real data from ArangoDB.

import {
  graphql,
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString,
  GraphQLList,
  GraphQLNonNull
} from 'graphql'
import {
  db,
  getPersonByName
} from '../src/database'

describe('queries', () => {

  it('lets you ask for a person from the database', async () => {

    let Person = new GraphQLObjectType({
      name: 'Person',
      fields: () => ({
        name: {
          type: GraphQLString
        }
      })
    })

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          person: {
            args: { //person now accepts args
              name: { // the arg is called "name"
                type: new GraphQLNonNull(GraphQLString) // name is a string & manadatory
              }
            },
            type: Person,
            resolve: (root, args) => {
              return getPersonByName(args.name)
            },
          }
        })
      })
    })

    let query = `
        query {
          person(name "Eve") {
            name
          }
        }
      `

    let { data } = await graphql(schema, query)
    expect(data.person).toEqual({name: 'Eve'})
  })
})

Here we have modified our schema to accept a name argument when asking for a person. We access the name via the args object and pass it to our database function to go get the matching person from Arango.

Let’s add a new database function to get the friends of a user given their id.
What’s worth pointing out here is that we are using ArangoDB’s AQL traversal syntax. It allows us to do a graph traversal across outbound edges get the vertex on the other end of the edge.

export async function getFriends (id) {
  let query = aql`
      FOR vertex IN OUTBOUND ${id} knows
        RETURN vertex
    `
  let results = await db.query(query)
  return results.all()
}

Now that we have that function, instead of adding it to the schema, we add a field to the Person type. In the resolve for our new friends field we are going to use the root argument to get the id of the current person object and then use our getFriends function to do the traveral to retrieve the persons friends.

    let Person = new GraphQLObjectType({
      name: 'Person',
      fields: () => ({
        name: {
          type: GraphQLString
        },
        friends: {
          type: new GraphQLList(Person),
          resolve(root) {
            return getFriends(root._id)
          }
        }
      })
    })

What’s interesting is that because of GraphQL’s recursive nature, this change lets us query for friends:

        query {
          person(name: "Eve") {
            name
            friends {
              name
            }
          }
        }

and also ask for friends of friends (and so on) like this:

        query {
          person(name: "Eve") {
            name
            friends {
              name
              friends {
                name
              }
            }
          }
        }

We can show that with a test.

import {
  graphql,
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString,
  GraphQLList,
  GraphQLNonNull
} from 'graphql'
import {
  db,
  getPersonByName,
  getFriends
} from '../src/database'

describe('queries', () => {

  it('returns friends of friends', async () => {

    let Person = new GraphQLObjectType({
      name: 'Person',
      fields: () => ({
        name: {
          type: GraphQLString
        },
        friends: {
          type: new GraphQLList(Person),
          resolve(root) {
            return getFriends(root._id)
          }
        }
      })
    })

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          person: {
            args: {
              name: {
                type: new GraphQLNonNull(GraphQLString)
              }
            },
            type: Person,
            resolve: (root, args) => {
              return getPersonByName(args.name)
            },
          }
        })
      })
    })

    let query = `
        query {
          person(name: "Eve") {
            name
            friends {
              name
              friends {
                name
              }
            }
          }
        }
      `

    let result = await graphql(schema, query)
    let { friends } = result.data.person
    let foaf = [].concat(...friends.map(friend => friend.friends))
    expect([{name: 'Charlie'},{name: 'Dave'},{name: 'Bob'}]).toEqual(expect.arrayContaining(foaf))
  })

})

This test has running a query three levels deep and walking the entire graph. Because we can ask for any combination of any of the things our types defined, we have a whole lot of flexibility with very little code. The code that’s there is just a few simple functions, modular and easy to test.

But what did we trade away to get all that? If we look at the queries that get sent to Arango with tcpdump we can see how that sausage was made.

// getPersonByName('Eve') from the person resolver in our schema 
{"query":"FOR person IN persons
  FILTER person.name == @value0
  LIMIT 1 RETURN person","bindVars":{"value0":"Eve"}}
// getFriends('persons/eve') in Person type -> returns Bob & Alice.
{"query":"FOR vertex IN OUTBOUND @value0 knows
  RETURN vertex","bindVars":{"value0":"persons/eve"}}
// now a new request for each friend:
// getFriends('persons/bob')
{"query":"FOR vertex IN OUTBOUND @value0 knows
  RETURN vertex","bindVars":{"value0":"persons/bob"}}
// getFriends('persons/alice')
{"query":"FOR vertex IN OUTBOUND @value0 knows
  RETURN vertex","bindVars":{"value0":"persons/alice"}}

What we have here is our own version of the famous N+1 problem. If we were to add more people to this graph things would get out of hand quickly.

Facebook, which has been using GraphQL in production for years, is probably even less excited about the prospect of N+1 queries battering their database than we are. So what are they doing to solve this?

Using Dataloader

Dataloader is a small library released by Facebook that solves the N+1 problem by cleverly leveraging the way promises work. To use it, we need to give it a batch loading function and then replace our calls to the database with calls call Dataloader’s load method in all our resolves.

What, you might ask, is a batch loading function? The dataloader documentation offers that “A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values.”

We can write one of those.

async function getFriendsByIDs (ids) {
  let query = aql`
    FOR id IN ${ ids }
      let friends = (
        FOR vertex IN OUTBOUND id knows
          RETURN vertex
      )
      RETURN friends
  `
  let response = await db.query(query)
  return response.all()
}

We can then use that in a new test.

import {
  graphql
} from 'graphql'
import DataLoader from 'dataloader'
import {
  db,
  getFriendsByIDs
} from '../src/database'
import schema from '../src/schema'

describe('Using dataloader', () => {

  it('returns friends of friends', async () => {

    let Person = new GraphQLObjectType({
      name: 'Person',
      fields: () => ({
        name: {
          type: GraphQLString
        },
        friends: {
          type: new GraphQLList(Person),
          resolve(root, args, context) {
            return context.FriendsLoader.load(root._id)
          }
        }
      })
    })

    let query = `
        query {
          person(name: "Eve") {
            name
            friends {
              name
              friends {
                name
              }
            }
          }
        }
      `
    const FriendsLoader = new DataLoader(getFriendsByIDs)
    let result = await graphql(schema, query, {}, { FriendsLoader })
    let { person } = result.data
    expect(person.name).toEqual('Eve')
    expect(person.friends.length).toEqual(2)
    let names = person.friends.map(friend => friend.name)
    expect(names).toContain('Alice', 'Bob')
  })

})

The key section of the above test is this:

    const FriendsLoader = new DataLoader(getFriendsByIDs)
    //                         schema, query, root, context
    let result = await graphql(schema, query, {}, { FriendsLoader })

The context object is passed as the fourth parameter to the graphql function which is then available as the third parameter in every resolve function. With our FriendsLoader attached to the context object, you can see us accessing it in the resolve function on the Person type.

Let’s see what effect that batch loading has on our queries.

// getPersonByName('Eve') from the person resolver in our schema 
{"query":"FOR person IN persons
  FILTER person.name == @value0
  LIMIT 1 RETURN person","bindVars":{"value0":"Eve"}}
// getFriendsByIDs(["persons/eve"]) -> returns Bob & Alice.
{"query":"FOR id IN @value0
   let friends = (
    FOR vertex IN  OUTBOUND id knows
      RETURN vertex
    )
  RETURN friends","bindVars":{"value0":["persons/eve"]}}
// getFriendsByIDs(["persons/alice","persons/bob"])
{"query":"FOR id IN @value0
   let friends = (
    FOR vertex IN  OUTBOUND id knows
      RETURN vertex
    )
  RETURN friends","bindVars":{"value0":["persons/alice","persons/bob"]}}

Now for a three level query (Eve, her friends, their friends) we are down to just 1 query per level and the N+1 problem is no longer a problem.

When it’s time to serve your data to the world, express-graphql supplies a middleware that we can pass our schema and loaders to like this:

import express from 'express'
import graphqlHTTP from 'express-graphql'
import schema from './schema'
import DataLoader from 'dataloader'
import { getFriendsByIDs } from '../src/database'

const FriendsLoader = new DataLoader(getFriendsByIDs)
const app = express()
app.use('/graphql', graphqlHTTP({ schema, context: { FriendsLoader }}))
app.listen(3000)
// http://localhost:3000/graphql is up and running!

What we just did

With just those few code examples we’ve built a backend system that provides a query-able API for clients backed by a graph database. Growing it would look like adding a few more functions and a few more types. The code stays modular and testable. Dataloader has ensured that we didn’t even pay a performance penalty for that.

A perfect combination

While geeking out on the technology is fun, it loses sight of what I think is the larger point: The design of both GraphQL and ArangoDB allow you to combine and recombine some really simple primitives to tackle anything you can think of.

With ArangoDB, it’s all just documents, whether you use them like that or treat them as key/value or a graph is up to you. While this approach is marketed as “multi-model” database, the term is unfortunate since it makes the database sound like it’s trying to do lots of things instead of leveraging some fundamental similarity between these types of data. That similarity becomes the “primitive” that makes all this flexibility possible.

For GraphQL, my application is just a bunch of functions in an Abstract Syntax Tree which get combined and recombined by client queries. The parser and execution engine take care of what gets called when.

In each case what I need to understand is simple, the behaviour I can produce is complex.

I’m still honing my minimal set for front end development, but for backend development this is now how I build. These days I’m refocusing my learning to go narrow and deep and it feels good. Infinite width never felt sustainable. It’s not apparent at first, but once that burden is lifted off your shoulders you will realize how heavy it was.

Advertisements

Making requests in vanilla js with Apollo

There are lots of good reasons to be running GraphQL on the server. It’s clean, no ORM‘s or frameworks needed and has some interesting security properties too. But just because you are rockin’ the new hotness on the server side doesn’t mean you want it on the client side too. Sometimes the right thing is the simplest thing that can possibly work.

The Apollo Client is a GraphQL client made by the people behind Meteor. It aims to be an advanced and capable client that plays nice with the rest of the ecosystem. It has a lot going on, and sadly doesn’t seem to spend much time advertising that it’s actually a pretty great fit for those “simplest thing that can possibly work” moments as well.

Installing it is roughly what you might expect, but you also need the graphql-tag library so you can create queries Javascript’s new tagged template literals.

npm install --save apollo-client graphql-tag

So here, in all it’s glory, “simplest thing that can possibly work”:

import ApolloClient from 'apollo-client'
import gql from 'graphql-tag'

const client = new ApolloClient();

let query = gql`
  query {
    foo {
      bar
    }
  }
`
client.query({query}).then((results) => {
  //do something useful
})

I think this is actually even more simple than Lokka, which actually bills itself as the “Simple JavaScript Client for GraphQL”.

If you need to specify your endpoint as something other than the host the js came from, then you get to add just a little extra:

import ApolloClient, { createNetworkInterface } from 'apollo-client'

const opts = {uri: 'http://example.com:8080/graphql'}
const networkInterface = createNetworkInterface(opts)
const client = new ApolloClient({
  networkInterface,
});

But simple doesn’t mean we are restricted to queries only. Mutations can be simple too:

let mutation = gql`
  mutation ($foo: [FooInput] $bar: String!) {
    addFoo(
      foo: $foo
      bar: $bar
    ){
      foo
      bar
    }
  }
`

client.mutate({mutation, variables: {foo: [1,2,3], bar: "baz"}}).then((results) => {
  //do something with result
})

Obviously you will need the server side schema to support that, but that is all that is needed on the client.

Apollo has a tonne of features and integrates with Redux nicely (it does caching with it’s own internal Redux store unless you want it to use yours). While simplicity doesn’t appear to be it’s focus, the Apollo client is certainly capable of it. You’d just never guess from the documentation. Hopefully this will make it a little easier to appreciate the simple side of Apollo.

Hello GraphQL

One of the most interesting projects to me lately has been Facebook’s GraphQL. Announced at React.conf in January, those of us that were excited by the idea have had to wait, first for the spec to be formalized and then for some actual running code.

I think the GraphQL team is on to something big (it’s positioned as an alternative to REST to give a sense of how big), and I’ve been meaning to dig in to it for a while, but it was never clear where to start. So after a little head-scratching and a little RTFM, I want to share a GraphQL hello world.

So what does that look like? Well Facebook as released two projects: graphql-js and express-graphql. Graphql-js is the reference implementation of what is described in the spec. express-graphql is a middleware component for the express framework that lets you use graphql.

So express is going to be our starting point. First we need to create a new project using the generator:

mike@longshot:~☺  express --git -e gql_hello

create : gql_hello
create : gql_hello/package.json
create : gql_hello/app.js
create : gql_hello/.gitignore
create : gql_hello/public
create : gql_hello/routes
create : gql_hello/routes/index.js
create : gql_hello/routes/users.js
create : gql_hello/views
create : gql_hello/views/index.ejs
create : gql_hello/views/error.ejs
create : gql_hello/bin
create : gql_hello/bin/www
create : gql_hello/public/javascripts
create : gql_hello/public/images
create : gql_hello/public/stylesheets
create : gql_hello/public/stylesheets/style.css

install dependencies:
$ cd gql_hello && npm install

run the app:
$ DEBUG=gql_hello:* npm start

Lets do as we are told and run cd gql_hello && npm install.
When that’s done we can get to the interesting stuff.
Next up will be installing graphql and the middleware using the –save option so that our app’s dependencies in our package.json will be updated:

mike@longshot:~/gql_hello☺  npm install --save express-graphql graphql babel
npm WARN install Couldn't install optional dependency: Unsupported
npm WARN prefer global babel@5.8.23 should be installed with -g
...

I took the basic app.js file that was generated and just added the following:

app.use('/', routes);
app.use('/users', users);

// GraphQL:
var graphqlHTTP = require('express-graphql');

import {
  graphql,
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString,
} from 'graphql';

var schema = new GraphQLSchema({
  query: new GraphQLObjectType({
    name: 'RootQueryType',
    fields: {
      hello: {
        type: GraphQLString,
        resolve() {
          return 'world';
        }
      }
    }
  })
});

//Mount the middleware on the /graphql endpoint:
app.use('/graphql', graphqlHTTP({ schema: schema , pretty: true}));
//That's it!

// catch 404 and forward to error handler
app.use(function(req, res, next) {
  var err = new Error('Not Found');
  err.status = 404;
  next(err);
});

Notice that we are passing our GraphQL schema to graphqlHTTP as well as pretty: true so that responses from the server will be pretty printed.

One other thing is that since those GraphQL libraries make extensive use of ECMAScript 6 syntax, we will need to use the Babel Transpiler to actually be able to run this thing.

If you installed Babel with npm install -g babel you can add the following to your package.json scripts section:

  {
    "start": "babel-node ./bin/www"
  }

Because I didn’t install it globally, I’ll just point to it in the node_modules folder:

  {
    "start": "node_modules/babel/bin/babel-node.js ./bin/www"
  }

With that done we can use npm start to start the app and try things out using curl:

mike@longshot:~☺  curl localhost:3000/graphql?query=%7Bhello%7D
{
  "data": {
    "hello": "world"
  }
}

Looking back at the schema we defined, we can see that our request {hello} (or %7Bhello%7D when its been url encoded) caused the resolve function to be called, which returned the string “world”.

{
  name: 'RootQueryType',
  fields: {
    hello: {
      type: GraphQLString,
      resolve() {
        return 'world';
      }
    }
  }
}

This explains what they mean when you hear that GraphQL “exposes fields that are backed by arbitrary code”. What drew me to GraphQL is that it seems to be a great solution for people with graph database backed applications, but it’s only now that I realize that GraphQL is much more flexible. That string could have just as easily pulled something out of a relational database or calculated something useful. In fact this might be the only time “arbitrary code execution” is something to be happy about.

I’m super excited to explore this further and to start using it with ArangoDB. If you want to dig deeper I suggest you check out Exploring GraphQL and Creating a GraphQL server and of course read the spec.