Querying the OpenStreetMap Dataset

While much has been written about putting data into OpenStreetMap (OSM), it doesn’t feel like much has been said about getting data out. For those familiar with GIS software, grabbing a “metro extract” is a reasonable place to start, but for developers or regular users it’s not quite as clear how to get at the data we can see is in there.

The first way to get at the data is with the Overpass API. Overpass was started by Roland Olbricht in 2008 as a way to ask for some specified subset of the OSM data.

Let’s say I was curious about the number of bike racks that could hold 8 bikes in downtown Ottawa. The first thing to know is that OSM data is XML, which means that each element (node/way/relation) looks something like this:

  <node id="3046036633" lat="45.4168480" lon="-75.7016922">
    <tag k="access" v="public"/>
    <tag k="amenity" v="bicycle_parking"/>
    <tag k="bicycle_parking" v="rack"/>
    <tag k="capacity" v="8"/>
  </node>

Basically, any OSM element may be associated with a bunch of tags, each made up of a key and a value.
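To make the key/value idea concrete, here is a minimal sketch that reads the node above with Python's standard library and collects its tags into a dictionary. The function name parse_node is my own, not part of any OSM tooling.

```python
import xml.etree.ElementTree as ET

# The example node from above, copied verbatim.
NODE_XML = """
<node id="3046036633" lat="45.4168480" lon="-75.7016922">
  <tag k="access" v="public"/>
  <tag k="amenity" v="bicycle_parking"/>
  <tag k="bicycle_parking" v="rack"/>
  <tag k="capacity" v="8"/>
</node>
"""

def parse_node(xml_text):
    """Return (id, lat, lon, tags) for a single OSM <node> element.

    parse_node is a hypothetical helper written for this example.
    """
    node = ET.fromstring(xml_text.strip())
    # Each <tag> child carries one key (k) and one value (v).
    tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
    return int(node.get("id")), float(node.get("lat")), float(node.get("lon")), tags

node_id, lat, lon, tags = parse_node(NODE_XML)
print(node_id, lat, lon)
print(tags["amenity"], tags["capacity"])
```

Note that tag values are always strings, so the capacity comes back as "8" rather than the number 8.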

You specify which elements of the OSM dataset are interesting to you by creating an Overpass query in XML format or using a query language called Overpass QL. You can use either one, but I’m using XML here.

Here is a query asking for all the elements of type “node” that have both a tag with a key of “amenity” and a value of “bicycle_parking” and a tag with a key of “capacity” and a value of “8”. The query also includes a bbox-query element with coordinates for north, east, south, and west. Together these give the two corners of a bounding box, so the search is limited to that geographic area.

<osm-script output="json">
  <query type="node">
    <has-kv k="amenity" v="bicycle_parking"/>
    <has-kv k="capacity" v="8"/>
    <bbox-query e="-75.69105863571167" n="45.42274779392456" s="45.415714100972636" w="-75.70568203926086"/>
  </query>
  <print/>
</osm-script>
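For comparison, the same request can be written in Overpass QL. The sketch below assembles the QL string from a dictionary of tags and the four bounding box edges; build_ql is a hypothetical helper of mine, and it assumes the keys and values contain no quote characters since they are inserted verbatim. The one thing to watch is that QL wants the box as (south, west, north, east).

```python
def build_ql(element_type, tags, south, west, north, east):
    """Build an Overpass QL query for elements carrying every given tag.

    build_ql is a hypothetical helper. Keys and values are inserted
    verbatim, so this assumes they contain no double quotes.
    """
    # One ["key"="value"] filter per tag; all of them must match.
    filters = "".join('["{}"="{}"]'.format(k, v) for k, v in tags.items())
    # Overpass QL orders the bounding box as (south, west, north, east).
    bbox = "({},{},{},{})".format(south, west, north, east)
    return "[out:json];{}{}{};out;".format(element_type, filters, bbox)

query = build_ql(
    "node",
    {"amenity": "bicycle_parking", "capacity": "8"},
    south=45.415714100972636,
    west=-75.70568203926086,
    north=45.42274779392456,
    east=-75.69105863571167,
)
print(query)
```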

I’ve saved that query into a file named “query” and I am using cat to read the file and pass the text to curl which sends the query.

mike@longshot:~/osm☺  cat query | curl -X POST -d @- http://overpass-api.de/api/interpreter
{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-08-27T18:47:01Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [

{
  "type": "node",
  "id": 3046036633,
  "lat": 45.4168480,
  "lon": -75.7016922,
  "tags": {
    "access": "public",
    "amenity": "bicycle_parking",
    "bicycle_parking": "rack",
    "capacity": "8"
  }
},
{
  "type": "node",
  "id": 3046036634,
  "lat": 45.4168354,
  "lon": -75.7017258,
  "tags": {
    "access": "public",
    "amenity": "bicycle_parking",
    "capacity": "8",
    "covered": "no"
  }
},
{
  "type": "node",
  "id": 3046036636,
  "lat": 45.4168223,
  "lon": -75.7017618,
  "tags": {
    "access": "public",
    "amenity": "bicycle_parking",
    "bicycle_parking": "rack",
    "capacity": "8"
  }
}

  ]
}
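If you would rather do this from a script than from the shell, the following is a rough Python equivalent using only the standard library. It is a sketch, not the one true way: build_request and run_query are names I made up, and instead of piping the raw query as curl does, it sends the query in a form field named "data", which the interpreter also accepts. The block only builds the request; run_query is what would actually hit the network.

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "http://overpass-api.de/api/interpreter"

def build_request(query, url=OVERPASS_URL):
    """Prepare a POST request carrying the query in a form field named 'data'.

    build_request is a hypothetical helper written for this example.
    """
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")

def run_query(query, url=OVERPASS_URL, timeout=60):
    """Send the query to Overpass and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Build (but do not send) a request, to show what goes over the wire.
example_query = (
    '[out:json];node["amenity"="bicycle_parking"]["capacity"="8"]'
    "(45.415714100972636,-75.70568203926086,45.42274779392456,-75.69105863571167);out;"
)
request = build_request(example_query)
print(request.get_method(), request.full_url)
```

Calling run_query(open("query").read()) with the XML file from above should return the same structure as the curl output, with the matching nodes under the "elements" key.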

This is pretty exciting, but it’s worth pointing out that the response is JSON, not the GeoJSON you will probably want for doing things with Leaflet. The author is certainly aware of it and apparently working on it, but in the meantime you will need to use the npm module osmtogeojson to convert what Overpass gives you into what Leaflet accepts.
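To show what that conversion involves, here is a small sketch that turns the nodes in an Overpass JSON response into a GeoJSON FeatureCollection of points. It is not a replacement for osmtogeojson: it ignores ways and relations entirely, and nodes_to_geojson and the "node/<id>" feature id format are my own choices for the example.

```python
import json

def nodes_to_geojson(overpass_result):
    """Convert the nodes of an Overpass JSON response to a GeoJSON FeatureCollection.

    Hypothetical helper for illustration; ways and relations are skipped.
    """
    features = []
    for element in overpass_result.get("elements", []):
        if element.get("type") != "node":
            continue
        features.append({
            "type": "Feature",
            "id": "node/{}".format(element["id"]),
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [element["lon"], element["lat"]]},
            # OSM tags become the feature's properties.
            "properties": dict(element.get("tags", {})),
        })
    return {"type": "FeatureCollection", "features": features}

# One node taken from the response above.
sample = {
    "elements": [
        {
            "type": "node",
            "id": 3046036633,
            "lat": 45.4168480,
            "lon": -75.7016922,
            "tags": {"amenity": "bicycle_parking", "capacity": "8"},
        }
    ]
}
geojson = nodes_to_geojson(sample)
print(json.dumps(geojson, indent=2))
```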

So what might that get you? Well, let’s say you are trying to calculate the total amount of bike parking in downtown Ottawa. With a single API call (this time using Overpass QL, so it’s cut-and-paste friendly), we can tally up the capacity tags:

mike@longshot:~/osm☺  curl -s -g 'http://overpass-api.de/api/interpreter?data=[out:json];node["amenity"="bicycle_parking"](45.415714100972636,-75.70568203926086,45.42274779392456,-75.69105863571167);out;' | grep capacity | tr -d ',":' | sort | uniq -c
      2     capacity 10
      7     capacity 2
      6     capacity 8

Looks like more bike racks need to be tagged with “capacity”, but it’s a good start on coming up with a total.
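The tally above works out to 2×10 + 7×2 + 6×8 = 82 tagged spaces. If you have the JSON response loaded in a script, a sketch like the one below does the same counting, adds up the total, and also counts the racks with no capacity tag, which grep silently skips. The name tally_capacity and the sample elements are mine, made up for the example.

```python
from collections import Counter

def tally_capacity(elements):
    """Count elements per capacity value and sum the numeric capacities.

    Hypothetical helper; elements without a capacity tag are counted
    under the key "untagged" and add nothing to the total.
    """
    counts = Counter()
    total = 0
    for element in elements:
        capacity = element.get("tags", {}).get("capacity")
        if capacity is None:
            counts["untagged"] += 1
            continue
        counts[capacity] += 1
        # Tag values are free-form strings, so only sum the plain numbers.
        if capacity.isdigit():
            total += int(capacity)
    return counts, total

# Made-up sample in the shape of the "elements" list Overpass returns.
sample_elements = [
    {"type": "node", "id": 1, "tags": {"amenity": "bicycle_parking", "capacity": "8"}},
    {"type": "node", "id": 2, "tags": {"amenity": "bicycle_parking", "capacity": "8"}},
    {"type": "node", "id": 3, "tags": {"amenity": "bicycle_parking", "capacity": "2"}},
    {"type": "node", "id": 4, "tags": {"amenity": "bicycle_parking"}},
]
counts, total = tally_capacity(sample_elements)
print(dict(counts), total)
```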

Building on the Overpass API is the web-based Overpass-turbo. If you are a regular user trying to get some “how many of X in this area” type questions answered, this is the place to go. It’s also helpful for developers looking to work the kinks out of a query.

Using Overpass-Turbo to display my edits in the Ottawa area.

It’s really simple to get started using the wizard, which helps write a query for you. With a little fooling around with the styles you can do some really interesting stuff. As an example, we can colour the bicycle parking according to its capacity so we can see which ones have a capacity tag and which ones don’t. The query ends up looking like this:

<osm-script timeout="25">
  <!-- gather results -->
  <union>
    <!-- query part for: “amenity=bicycle_parking” -->
    <query type="node">
      <has-kv k="amenity" v="bicycle_parking"/>
      <bbox-query {{bbox}}/>
    </query>
    {{style:
      node[amenity=bicycle_parking]{ fill-opacity: 1; fill-color: grey;color: white;}
      node[capacity=2]{ fill-color: yellow; }
      node[capacity=8]{ fill-color: orange;}
      node[capacity=10]{fill-color: red;}
    }}
  </union>
  <print mode="body"/>
  <recurse type="down"/>
  <print mode="skeleton" order="quadtile"/>
</osm-script>

Bike racks with no capacity attribute will be grey. You can see the result here.

While Overpass-turbo might not be as sophisticated as CartoDB, it is really approachable and surprisingly capable. Highlighting certain nodes and picking out the edits of a particular user are just two of many interesting applications.

Being able to query the OSM data easily opens some interesting possibilities. If you are gathering data for whatever reason, you are going to run into the problems of where to store it, and how to keep it up to date. One way of dealing with both of those is to store your data in OSM.

With all the thinking that has gone into what attributes can be attached to things like trees, bike racks, and public art, you can store a surprising amount of information in a single point. Once saved into the OSM dataset, you will always know where to find the most current version of your data, and backups are dealt with for you.

This approach also opens the door to other people helping you keep it up to date. Asking for volunteers or running hackathons to help you update your data is pretty reasonable when it also means improving a valuable public resource, instead of just enriching the owner. Once the data is in OSM, the maintenance burden is easy to distribute.

When it’s time to revisit your question, fresh data will only ever be an Overpass query away…

Something to think about.
