ArangoDB’s geo-spatial functions

I’ve been playing with ArangoDB a lot lately. As a document database it looks like a close alternative to MongoDB, but it goes further, allowing graph traversals and geo-spatial queries.

Since I have a geo-referenced data set in mind, I wanted to get to know its geo-spatial functions. I found the documentation a bit unclear, so I thought I would write up my exploration here.

At the moment there are only two geo-spatial functions in Arango: WITHIN and NEAR. Let’s make some test data using the arango shell. Run arangosh and then the following:

db._create('cities')
db.cities.save({name: 'Ottawa', lat: 45.4215296, lng: -75.69719309999999})
db.cities.save({name: 'Montreal', lat: 45.5086699, lng: -73.55399249999999})
db.cities.save({name: 'São Paulo', lat: -23.5505199, lng: -46.63330939999999})

We will also need a geo-index for the functions to work. You can create one by passing in the name(s) of the fields that hold the latitude and longitude. In our case I just called them lat and lng so:

db.cities.ensureGeoIndex('lat', 'lng')

Alternatively, I could have stored both values in a single array attribute and indexed that:

db.cities.save({name: 'Ottawa', location: [45.4215296, -75.69719309999999]})
db.cities.ensureGeoIndex('location')

As long as the values are of type double, life is good. If some documents in the collection don’t have the key(s) you specified for the index, Arango will just leave them out of the index.
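To make the two document layouts concrete, here is a small plain-JavaScript sketch (it runs in Node, no database needed) of how coordinates could be read from a document for either index style. The name extractCoordinates is my own, not part of ArangoDB, and the function only mirrors the rule described above; it is not how the server actually implements it.

```javascript
// Hypothetical helper, not part of ArangoDB. Returns [lat, lng] for a document,
// or null when the document lacks usable values for the given index fields.
// `fields` is ['lat', 'lng'] for the two-attribute style, ['location'] for the array style.
function extractCoordinates(doc, fields) {
  const isNumber = (v) => typeof v === 'number' && Number.isFinite(v);
  if (fields.length === 2) {
    const lat = doc[fields[0]];
    const lng = doc[fields[1]];
    return isNumber(lat) && isNumber(lng) ? [lat, lng] : null;
  }
  const pair = doc[fields[0]];
  if (Array.isArray(pair) && pair.length === 2 && pair.every(isNumber)) {
    return pair;
  }
  return null;
}

const sampleDocs = [
  { name: 'Ottawa', lat: 45.4215296, lng: -75.69719309999999 },
  { name: 'Montreal', location: [45.5086699, -73.55399249999999] },
  { name: 'Nowhere' }, // no coordinates at all
];

// Which documents would a ('lat', 'lng') index see? Which would a ('location') index see?
console.log(sampleDocs.map((d) => extractCoordinates(d, ['lat', 'lng'])));
console.log(sampleDocs.map((d) => extractCoordinates(d, ['location'])));
```

Documents that come back as null here correspond to the ones the index would skip under that rule.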

First up is the WITHIN function. It’s pretty much what you might expect: you give it a lat/lng and a radius, and it gives you the records within the area you specified. What is a little unexpected is that the radius is given in meters. So I am going to ask for the documents within range of my favourite coffee shop (45.42890720357919, -75.68796873092651). To make the results more interesting I’ll ask for a 170000 meter radius (I know that Montreal is about 170 kilometers from Ottawa), so I should see those two cities in the result set:

arangosh [_system]> db._createStatement({query: 'FOR city in WITHIN(cities, 45.42890720357919, -75.68796873092651, 170000) RETURN city'}).execute().toArray()
[ 
  {
    "_id" : "cities/393503132620",
    "_rev" : "393503132620",
    "_key" : "393503132620",
    "lat" : 45.4215296,
    "lng" : -75.69719309999999,
    "name" : "Ottawa"
  },
  {
    "_id" : "cities/393504967628",
    "_rev" : "393504967628",
    "_key" : "393504967628",
    "lat" : 45.5086699,
    "lng" : -73.55399249999999,
    "name" : "Montreal"
  }
]

There is also an optional “distancename” parameter. When given, Arango adds an attribute with that name to each returned document, holding its distance in meters from your target point. We can use that like this:

arangosh [_system]> db._createStatement({query: 'FOR city in WITHIN(cities, 45.42890720357919, -75.68796873092651, 170000, "distance_from_artissimo_cafe") RETURN city'}).execute().toArray()
[ 
  {
    "_id" : "cities/393503132620",
    "_rev" : "393503132620",
    "_key" : "393503132620",
    "distance_from_artissimo_cafe" : 1091.4226157106734,
    "lat" : 45.4215296,
    "lng" : -75.69719309999999,
    "name" : "Ottawa"
  },
  {
    "_id" : "cities/393504967628",
    "_rev" : "393504967628",
    "_key" : "393504967628",
    "distance_from_artissimo_cafe" : 166640.3086328647,
    "lat" : 45.5086699,
    "lng" : -73.55399249999999,
    "name" : "Montreal"
  } 
]
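As a sanity check on those distances, here is a plain-JavaScript sketch of the haversine great-circle formula, runnable in Node. I am assuming a spherical Earth with a mean radius of 6371000 meters; I don’t know which formula or radius ArangoDB uses internally, so treat this as an independent cross-check rather than a description of the server.

```javascript
// Assumed mean Earth radius in meters; ArangoDB's internal value may differ.
const EARTH_RADIUS_M = 6371000;

// Great-circle distance in meters between two lat/lng points (haversine formula).
function haversineMeters(lat1, lng1, lat2, lng2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

const cafe = { lat: 45.42890720357919, lng: -75.68796873092651 };
const ottawa = { lat: 45.4215296, lng: -75.69719309999999 };
const montreal = { lat: 45.5086699, lng: -73.55399249999999 };

console.log(haversineMeters(cafe.lat, cafe.lng, ottawa.lat, ottawa.lng));
console.log(haversineMeters(cafe.lat, cafe.lng, montreal.lat, montreal.lng));
```

Under that assumption the two numbers should land close to the distance_from_artissimo_cafe values in the output above, which suggests the reported distances are straight-line distances over the Earth’s surface rather than driving distances.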

Arango’s NEAR function returns a set of documents ordered by their distance in meters from the lat/lng you provide. The number of documents in the set is controlled by the optional “limit” argument (which defaults to 100), and it accepts the same optional “distancename” argument as above. I am going to limit the result set to 3 (I only have 3 records in there anyway), and use my coffee shop again:

arangosh [_system]> db._createStatement({query: 'FOR city in NEAR(cities, 45.42890720357919, -75.68796873092651, 3, "distance_from_artissimo_cafe") RETURN city'}).execute().toArray()
[ 
  {
    "_id" : "cities/393503132620",
    "_rev" : "393503132620",
    "_key" : "393503132620",
    "distance_from_artissimo_cafe" : 1091.4226157106734,
    "lat" : 45.4215296,
    "lng" : -75.69719309999999,
    "name" : "Ottawa"
  },
  {
    "_id" : "cities/393504967628",
    "_rev" : "393504967628",
    "_key" : "393504967628",
    "distance_from_artissimo_cafe" : 166640.3086328647,
    "lat" : 45.5086699,
    "lng" : -73.55399249999999,
    "name" : "Montreal"
  },
  {
    "_id" : "cities/393506343884",
    "_rev" : "393506343884",
    "_key" : "393506343884",
    "distance_from_artissimo_cafe" : 8214463.292795454,
    "lat" : -23.5505199,
    "lng" : -46.63330939999999,
    "name" : "São Paulo"
  } 
]
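If it helps to see the semantics spelled out, the behaviour of NEAR as described above can be mimicked in plain JavaScript on an in-memory array. This is only a sketch: nearInMemory and distanceMeters are names I made up, the distance is a haversine calculation on an assumed 6371 km sphere, and the real function uses the geo-index instead of scanning every document.

```javascript
// Great-circle distance in meters (haversine), assuming a 6371 km mean Earth radius.
function distanceMeters(lat1, lng1, lat2, lng2) {
  const rad = (d) => (d * Math.PI) / 180;
  const a =
    Math.sin(rad(lat2 - lat1) / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(rad(lng2 - lng1) / 2) ** 2;
  return 2 * 6371000 * Math.asin(Math.sqrt(a));
}

// Hypothetical in-memory stand-in for NEAR: the nearest `limit` documents, closest first.
// When `distanceName` is given, the distance is added to each result under that key.
function nearInMemory(docs, lat, lng, limit = 100, distanceName) {
  return docs
    .map((doc) => ({ doc, dist: distanceMeters(lat, lng, doc.lat, doc.lng) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, limit)
    .map(({ doc, dist }) => (distanceName ? { ...doc, [distanceName]: dist } : doc));
}

const sampleCities = [
  { name: 'São Paulo', lat: -23.5505199, lng: -46.63330939999999 },
  { name: 'Montreal', lat: 45.5086699, lng: -73.55399249999999 },
  { name: 'Ottawa', lat: 45.4215296, lng: -75.69719309999999 },
];

const nearest = nearInMemory(sampleCities, 45.42890720357919, -75.68796873092651, 2, 'distance');
console.log(nearest.map((c) => c.name)); // [ 'Ottawa', 'Montreal' ]
```

Swapping the sort-and-slice step for a filter on dist <= radius would give the equivalent sketch of WITHIN.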

As you can see, ArangoDB’s geo-spatial functionality is sparse but certainly enough to do some interesting things. Being able to act as a graph database AND do geo-spatial queries places Arango in a really interesting position, and I am hoping to see its capabilities in both those areas expand. I’ve sent a feature request for WITHIN_BOUNDS, which I think would make working with Leaflet.js or Google Maps really nice, since it would save me doing a bunch of calculations with the map centre and the current zoom level to figure out a radius in meters for my query. I’ll keep my fingers crossed…
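For the curious, the radius calculation I am trying to avoid looks roughly like the sketch below. It assumes a Web Mercator map with 256-pixel tiles, which is the usual Leaflet and Google Maps setup; the function names and the 800 by 600 viewport are made up for illustration.

```javascript
// Ground resolution of a Web Mercator map with 256-pixel tiles, in meters per pixel.
// 156543.03392 is the equatorial circumference (about 40075016.686 m) divided by 256.
function metersPerPixel(lat, zoom) {
  return (156543.03392 * Math.cos((lat * Math.PI) / 180)) / Math.pow(2, zoom);
}

// Hypothetical helper: radius in meters of a circle around the map centre that
// covers the whole viewport (half the pixel diagonal times the ground resolution).
function viewportRadiusMeters(centreLat, zoom, widthPx, heightPx) {
  const halfDiagonalPx = Math.hypot(widthPx, heightPx) / 2;
  return halfDiagonalPx * metersPerPixel(centreLat, zoom);
}

// An 800x600 pixel map centred near the coffee shop at zoom level 10.
const radius = viewportRadiusMeters(45.42890720357919, 10, 800, 600);
console.log(Math.round(radius));
```

The result could be passed as the radius argument to WITHIN. It is only an approximation, since the resolution is taken at the centre latitude and the circle covers more than the rectangular viewport, so it gets less accurate as you zoom out.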

Update: My WITHIN_BOUNDS suggestion was actually implemented as WITHIN_RECTANGLE, and there is more geo stuff coming soon according to the roadmap.
