ArangoDB’s geo-spatial functions

I’ve been playing with ArangoDB a lot lately. As a document database it looks to be a drop-in replacement for MongoDB, but it goes further, allowing graph traversals and geo-spatial queries.

Since I have a geo-referenced data set in mind, I wanted to get to know its geo-spatial functions. I found the documentation kind of unclear, so I thought I would write up my exploration here.

At the moment there are only two geo-spatial functions in Arango: WITHIN and NEAR. Let’s make some test data using the arango shell. Run arangosh and then the following:

db._create('cities')
db.cities.save({name: 'Ottawa', lat: 45.4215296, lng: -75.69719309999999})
db.cities.save({name: 'Montreal', lat: 45.5086699, lng: -73.55399249999999})
db.cities.save({name: 'São Paulo', lat: -23.5505199, lng: -46.63330939999999})

We will also need a geo-index for the functions to work. You can create one by passing in the name(s) of the fields that hold the latitude and longitude. In our case I just called them lat and lng so:

db.cities.ensureGeoIndex('lat', 'lng')

Alternatively, I could have stored both values in a single array field and indexed that:

db.cities.save({name: 'Ottawa', location: [45.4215296, -75.69719309999999]})
db.cities.ensureGeoIndex('location')

As long as the values are of type double, life is good. If some documents in the collection don’t have the key(s) you specified for the index, it will just ignore them.

First up is the WITHIN function. It’s pretty much what you might expect: you give it a lat/lng and a radius and it gives you the records within the area you specified. What is a little unexpected is that the radius is given in meters. So I am going to ask for the documents around the lat/lng of my favourite coffee shop (45.42890720357919, -75.68796873092651). To make the results more interesting I’ll ask for a 170000 meter radius (I know that Montreal is about 170 kilometers from Ottawa) so I should see those two cities in the result set:

arangosh [_system]> db._createStatement({query: 'FOR city in WITHIN(cities, 45.42890720357919, -75.68796873092651, 170000) RETURN city'}).execute().toArray()
[ 
  {
    "_id" : "cities/393503132620",
    "_rev" : "393503132620",
    "_key" : "393503132620",
    "lat" : 45.4215296,
    "lng" : -75.69719309999999,
    "name" : "Ottawa"
  },
  {
    "_id" : "cities/393504967628",
    "_rev" : "393504967628",
    "_key" : "393504967628",
    "lat" : 45.5086699,
    "lng" : -73.55399249999999,
    "name" : "Montreal"
  }
]

There is also an optional “distancename” parameter which, when given, prompts Arango to add each document’s distance in meters from your target point, stored under the attribute name you supply. We can use that like this:

arangosh [_system]> db._createStatement({query: 'FOR city in WITHIN(cities, 45.42890720357919, -75.68796873092651, 170000, "distance_from_artissimo_cafe") RETURN city'}).execute().toArray()
[ 
  {
    "_id" : "cities/393503132620",
    "_rev" : "393503132620",
    "_key" : "393503132620",
    "distance_from_artissimo_cafe" : 1091.4226157106734,
    "lat" : 45.4215296,
    "lng" : -75.69719309999999,
    "name" : "Ottawa"
  },
  {
    "_id" : "cities/393504967628",
    "_rev" : "393504967628",
    "_key" : "393504967628",
    "distance_from_artissimo_cafe" : 166640.3086328647,
    "lat" : 45.5086699,
    "lng" : -73.55399249999999,
    "name" : "Montreal"
  } 
]

Arango’s NEAR function returns a set of documents ordered by their distance in meters from the lat/lng you provide. The number of documents in the set is controlled by the optional “limit” argument (which defaults to 100), and it accepts the same optional “distancename” argument as above. I am going to limit the result set to 3 (I only have 3 records in there anyway), and use my coffee shop again:

arangosh [_system]> db._createStatement({query: 'FOR city in NEAR(cities, 45.42890720357919, -75.68796873092651, 3, "distance_from_artissimo_cafe") RETURN city'}).execute().toArray()
[ 
  {
    "_id" : "cities/393503132620",
    "_rev" : "393503132620",
    "_key" : "393503132620",
    "distance_from_artissimo_cafe" : 1091.4226157106734,
    "lat" : 45.4215296,
    "lng" : -75.69719309999999,
    "name" : "Ottawa"
  },
  {
    "_id" : "cities/393504967628",
    "_rev" : "393504967628",
    "_key" : "393504967628",
    "distance_from_artissimo_cafe" : 166640.3086328647,
    "lat" : 45.5086699,
    "lng" : -73.55399249999999,
    "name" : "Montreal"
  },
  {
    "_id" : "cities/393506343884",
    "_rev" : "393506343884",
    "_key" : "393506343884",
    "distance_from_artissimo_cafe" : 8214463.292795454,
    "lat" : -23.5505199,
    "lng" : -46.63330939999999,
    "name" : "São Paulo"
  } 
]
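
To convince myself I understood what the two functions are doing, I sketched them in plain JavaScript (this runs in Node, no ArangoDB needed). This is only my mental model, not Arango's implementation: `distanceInMeters`, `withinInMemory` and `nearInMemory` are names I made up, the haversine formula and the 6371000 meter Earth radius are my assumptions, and I sort the WITHIN results by distance purely for convenience. With those assumptions the distances come out within a fraction of a percent of the ones Arango reported above.

```javascript
// Mean Earth radius in meters. ASSUMPTION: the Earth model ArangoDB
// actually uses is not something I have confirmed.
const EARTH_RADIUS_M = 6371000;

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle (haversine) distance in meters between two lat/lng points.
function distanceInMeters(lat1, lng1, lat2, lng2) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Pair every document with its distance from the target, nearest first.
function rankByDistance(docs, lat, lng) {
  return docs
    .map((doc) => ({ doc, meters: distanceInMeters(lat, lng, doc.lat, doc.lng) }))
    .sort((a, b) => a.meters - b.meters);
}

// Copy each document, adding the distance under distanceName when one is given.
function attachDistance(ranked, distanceName) {
  return ranked.map(({ doc, meters }) =>
    distanceName ? { ...doc, [distanceName]: meters } : { ...doc });
}

// Hypothetical stand-in for WITHIN: keep documents inside the radius (meters).
function withinInMemory(docs, lat, lng, radius, distanceName) {
  const inside = rankByDistance(docs, lat, lng).filter((r) => r.meters <= radius);
  return attachDistance(inside, distanceName);
}

// Hypothetical stand-in for NEAR: the `limit` nearest documents.
function nearInMemory(docs, lat, lng, limit = 100, distanceName) {
  return attachDistance(rankByDistance(docs, lat, lng).slice(0, limit), distanceName);
}

const cities = [
  { name: 'Ottawa', lat: 45.4215296, lng: -75.69719309999999 },
  { name: 'Montreal', lat: 45.5086699, lng: -73.55399249999999 },
  { name: 'São Paulo', lat: -23.5505199, lng: -46.63330939999999 },
];
const cafe = { lat: 45.42890720357919, lng: -75.68796873092651 };

const withinResult = withinInMemory(cities, cafe.lat, cafe.lng, 170000, 'distance');
const nearResult = nearInMemory(cities, cafe.lat, cafe.lng, 3, 'distance');

console.log(withinResult.map((city) => city.name));
console.log(nearResult.map((city) => [city.name, Math.round(city.distance)]));
```

The only real difference between the two is the cut-off: WITHIN stops at a distance, NEAR stops at a count.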

As you can see, ArangoDB’s geo-spatial functionality is sparse but certainly enough to do some interesting things. Being able to act as a graph database AND do geo-spatial queries places Arango in a really interesting position, and I am hoping to see its capabilities in both those areas expand. I’ve sent a feature request for WITHIN_BOUNDS, which I think would make working with Leaflet.js or Google Maps really nice, since it would save me doing a bunch of calculations with the map centre and the current zoom level to figure out a radius in meters for my query. I’ll keep my fingers crossed…
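
For the curious, here is roughly the calculation I mean. It is a sketch under a few assumptions: the map uses the usual Web Mercator projection with 256 pixel tiles, the function names `metersPerPixel` and `coveringRadiusInMeters` are my own, and the ground resolution at the map centre is treated as good enough for the whole viewport (fine at city-level zooms, increasingly wrong when zoomed far out).

```javascript
// Ground resolution at the equator at zoom 0: Earth's equatorial
// circumference (about 40075016.686 m) spread across one 256-pixel tile.
// ASSUMPTION: Web Mercator with 256-pixel tiles.
const EQUATOR_METERS_PER_PIXEL_ZOOM_0 = 40075016.686 / 256;

// Approximate meters covered by one screen pixel at a latitude and zoom level.
function metersPerPixel(lat, zoom) {
  const latRadians = (lat * Math.PI) / 180;
  return (EQUATOR_METERS_PER_PIXEL_ZOOM_0 * Math.cos(latRadians)) / 2 ** zoom;
}

// Radius in meters of a circle that just covers the whole viewport:
// half of the viewport's pixel diagonal times the resolution at the centre.
function coveringRadiusInMeters(centreLat, zoom, widthPx, heightPx) {
  const halfDiagonalPx = Math.hypot(widthPx, heightPx) / 2;
  return halfDiagonalPx * metersPerPixel(centreLat, zoom);
}

// Example: an 800x600 map centred on the coffee shop at zoom level 12.
const radius = coveringRadiusInMeters(45.42890720357919, 12, 800, 600);
console.log(Math.round(radius));
```

The resulting number is what I would hand to WITHIN as the radius. Since the circle is bigger than the rectangular viewport, some of the documents that come back will sit just off-screen and need to be filtered out on the client.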

Update: My WITHIN_BOUNDS suggestion was actually implemented as WITHIN_RECTANGLE, and there is more geo stuff coming soon according to the roadmap.

2 thoughts on “ArangoDB’s geo-spatial functions”

  1. Hi Mike,

    we are indeed planning to continue working on the geo features.

    Currently the syntax of (and minor details in the algorithm behind) WITHIN allow only circles. In the future we will allow other areas: squares, rectangles, ellipses, convex and arbitrary polygons.

    Rectangles will give you exactly the WITHIN_BOUNDS functionality that you are asking for.

    We also plan to rewrite the index code to make insertions faster (lookup times are fine, but inserting millions of documents in a geo index can be rather slow).

    It is in our plan – in two parts: general work on indices and general work on queries. However for those two parts we cannot tell you an exact release date – there are other higher priority features that we need to address first ;-).

    If you have further questions, comments, requests or suggestions I would be happy to hear them.

    Regards, martin
