Getting started with graph databases

I have a personal project I have been chipping away at for a little while now. I’ve been slowly adding more and more test data to it, and as I do it’s become increasingly clear that while the data itself is neat, the interesting stuff is the relationships between the various entities, not so much the entities themselves. This realization led me to do some reading about graph databases. O’Reilly (as usual) has an interesting book on graph databases written by Ian Robinson, Jim Webber, and Emil Eifrem. It’s a good intro, but given that the authors hold the positions of engineer, chief scientist and CEO at the company that makes the Neo4j graph database, it’s unsurprisingly focused on Neo4j.

Unfortunately the ‘j’ part of Neo4j refers to Java, which is a whole can of worms that I would rather not open. So I set off to look for a graph database that would not force me onto the JVM or trap me with open-core licensing, and ultimately found ArangoDB.

Licensed under Apache 2, ArangoDB (formerly AvocadoDB) is a document database, and so far looks like it could be a drop-in replacement for MongoDB. What’s interesting is that it can also do key/value storage like Redis and graphs like Neo4j.
Since it’s written in C++, I don’t have to worry about the JVM. So, let’s get started with it!

Installation is painless on Ubuntu:

wget -qO - http://www.arangodb.org/repositories/arangodb2/xUbuntu_13.10/Release.key | sudo apt-key add -
sudo sh -c "echo 'deb http://www.arangodb.org/repositories/arangodb2/xUbuntu_13.10/ /' > /etc/apt/sources.list.d/arangodb.list"
sudo apt-get update && sudo apt-get install arangodb

Since this is my first contact with graph databases, I want a dataset I can use to get a feel for working with graphs. Fortunately the company behind ArangoDB (triAGENS) has put some sample data up on GitHub to get people started:

$> git clone https://github.com/triAGENS/ArangoDB-Data.git
Cloning into 'ArangoDB-Data'...
...
$> cd ArangoDB-Data/Graphs/IMDB
$> ./import.sh

The import script loads a bunch of IMDB data into ArangoDB, which means we can start exploring with the ArangoDB shell:

$> arangosh

                                       _     
  __ _ _ __ __ _ _ __   __ _  ___  ___| |__  
 / _` | '__/ _` | '_ \ / _` |/ _ \/ __| '_ \ 
| (_| | | | (_| | | | | (_| | (_) \__ \ | | |
 \__,_|_|  \__,_|_| |_|\__, |\___/|___/_| |_|
                       |___/                 

Welcome to arangosh 2.0.2 [linux]. Copyright (c) triAGENS GmbH
Using Google V8 3.16.14 JavaScript engine, READLINE 6.2, ICU 4.8.1.1

Pretty printing values.
Connected to ArangoDB 'tcp://localhost:8529' version: 2.0.2, database: '_system', username: 'root'

use 'help' to see common examples
arangosh [_system]>

Tab completion works super well here to give a sense of what your options are, but the first thing we care about is figuring out what that import did for us. You can see it created two collections (imdb_vertices and imdb_edges) with the db._collections() function:

arangosh [_system]> db._collections()
[ 
  [ArangoCollection 3021163, "_aal" (type document, status loaded)], 
  [ArangoCollection 1317227, "_graphs" (type document, status loaded)], 
  [ArangoCollection 3545451, "_replication" (type document, status loaded)], 
  [ArangoCollection 137579, "_users" (type document, status loaded)], 
  [ArangoCollection 1513835, "_cluster_kickstarter_plans" (type document, status loaded)], 
  [ArangoCollection 940644715, "vertices" (type document, status loaded)], 
  [ArangoCollection 3414379, "_aqlfunctions" (type document, status loaded)], 
  [ArangoCollection 1382763, "_modules" (type document, status loaded)], 
  [ArangoCollection 3610987, "_statistics" (type document, status loaded)], 
  [ArangoCollection 1160255851, "imdb_vertices" (type document, status loaded)], 
  [ArangoCollection 940710251, "edges" (type edge, status loaded)], 
  [ArangoCollection 3479915, "_trx" (type document, status loaded)], 
  [ArangoCollection 266194196843, "imdb_edges" (type edge, status loaded)], 
  [ArangoCollection 1448299, "_routing" (type document, status loaded)] 
]

We can also pick random documents out of the vertices collection with the .any() function to get a sense of what’s in there.

arangosh [_system]> db.imdb_vertices.any()
{ 
  "_id" : "imdb_vertices/40233", 
  "_rev" : "6407199083", 
  "_key" : "40233", 
  "version" : 21, 
  "id" : "65952", 
  "type" : "Person", 
  "birthplace" : "", 
  "biography" : "", 
  "label" : "Jude Poyer", 
  "lastModified" : "1301901667000", 
  "name" : "Jude Poyer" 
}

If you have spent any time on the internet you will of course know that the obvious use for an IMDB graph is to calculate Bacon numbers. So let’s see if we can find Kevin in here:

arangosh [_system]> db._query('FOR Person IN imdb_vertices FILTER Person.name == "Kevin Bacon" RETURN Person').toArray()
[ 
  { 
    "_id" : "imdb_vertices/759", 
    "_rev" : "1218713963", 
    "_key" : "759", 
    "version" : 146, 
    "id" : "4724", 
    "type" : "Person", 
    "biography" : "", 
    "label" : "Kevin Bacon", 
    "lastModified" : "1299491319000", 
    "name" : "Kevin Bacon", 
    "birthplace" : "Philadelphia", 
    "profileImageUrl" : "http://cf1.imgobject.com/profiles/3e0/4bed49cf017a3c37a30003e0/kevin-bacon-profi...", 
    "birthday" : "-362451600000" 
  } 
]

And let’s see if we can connect him to, say, Kate Winslet. We know that Kevin’s id is imdb_vertices/759, and a little digging shows that Kate’s is imdb_vertices/1088. We can pass those ids, along with the imdb_vertices and imdb_edges collections, to the SHORTEST_PATH function ArangoDB supplies to make the link between them:

arangosh [_system]> db._query('RETURN SHORTEST_PATH(imdb_vertices, imdb_edges, "imdb_vertices/759", "imdb_vertices/1088", "any", { maxIterations: 100000})').toArray()
[ 
  [ 
    { 
      "vertex" : { 
        "_id" : "imdb_vertices/759", 
        "_rev" : "1218713963", 
        "_key" : "759", 
        "version" : 146, 
        "id" : "4724", 
        "type" : "Person", 
        "biography" : "", 
        "label" : "Kevin Bacon", 
        "lastModified" : "1299491319000", 
        "name" : "Kevin Bacon", 
        "birthplace" : "Philadelphia", 
        "profileImageUrl" : "http://cf1.imgobject.com/profiles/3e0/4bed49cf017a3c37a30003e0/kevin-bacon-profi...", 
        "birthday" : "-362451600000" 
      } 
    }, 
    { 
      "vertex" : { 
        "_id" : "imdb_vertices/35451", 
        "_rev" : "5779626347", 
        "_key" : "35451", 
        "runtime" : 87, 
        "version" : 186, 
        "id" : "9692", 
        "genre" : "Drama", 
        "language" : "en", 
        "type" : "Movie", 
        "homepage" : "", 
        "tagline" : "", 
        "title" : "The Woodsman", 
        "label" : "The Woodsman", 
        "description" : "A pedophile returns to his hometown after 12 years in prison and attempts to sta...", 
        "imdbId" : "tt0361127", 
        "lastModified" : "1301903901000", 
        "imageUrl" : "http://cf1.imgobject.com/posters/3c1/4bc9281e017a3c57fe0103c1/the-woodsman-mid.j...", 
        "studio" : "Dash Films", 
        "releaseDate" : "1103842800000", 
        "released" : "2000-2010" 
      } 
    }, 
    { 
      "vertex" : { 
        "_id" : "imdb_vertices/1179", 
        "_rev" : "1274747243", 
        "_key" : "1179", 
        "version" : 90, 
        "id" : "335", 
        "type" : "Person", 
        "biography" : "", 
        "label" : "Michael Shannon", 
        "lastModified" : "1299902807000", 
        "name" : "Michael Shannon", 
        "profileImageUrl" : "http://cf1.imgobject.com/profiles/01c/4c2a3dc87b9aa15e9900001c/michael-shannon-p..." 
      } 
    }, 
    { 
      "vertex" : { 
        "_id" : "imdb_vertices/21077", 
        "_rev" : "3892517227", 
        "_key" : "21077", 
        "runtime" : 119, 
        "version" : 339, 
        "id" : "4148", 
        "genre" : "Drama", 
        "language" : "en", 
        "type" : "Movie", 
        "homepage" : "", 
        "tagline" : "", 
        "title" : "Revolutionary Road", 
        "label" : "Revolutionary Road", 
        "description" : "A young couple living in a Connecticut suburb during the mid-1950s struggle to c...", 
        "imdbId" : "tt0959337", 
        "trailer" : "http://www.youtube.com/watch?v=af01__Kvvr8", 
        "lastModified" : "1301907499000", 
        "imageUrl" : "http://cf1.imgobject.com/posters/627/4d4f8e275e73d617b7003627/revolutionary-road...", 
        "studio" : "BBC Films", 
        "releaseDate" : "1229641200000", 
        "released" : "2000-2010" 
      } 
    }, 
    { 
      "vertex" : { 
        "_id" : "imdb_vertices/1088", 
        "_rev" : "1262754155", 
        "_key" : "1088", 
        "version" : 102, 
        "id" : "204", 
        "type" : "Person", 
        "label" : "Kate Winslet", 
        "lastModified" : "1299746700000", 
        "name" : "Kate Winslet", 
        "birthplace" : "Reading, UK", 
        "profileImageUrl" : "http://cf1.imgobject.com/profiles/59f/4c022d0e017a3c702d00159f/kate-winslet-prof...", 
        "biography" : "<meta charset=\"utf-8\"><span style=\"font-family: sans-serif; font-size: 18px; lin...", 
        "birthday" : "181695600000" 
      } 
    } 
  ] 
]

So what we can see here is that it takes two hops (from Kevin to Michael Shannon via “The Woodsman”, and from Michael to Kate via “Revolutionary Road”) to connect Kevin Bacon to Kate Winslet, giving her a Bacon number of 2.
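Since the path alternates person, movie, person, the Bacon number is just the number of movie hops: half the number of edges in the path. A quick Ruby sketch (with the path hard-coded from the result above) makes the arithmetic explicit:

# The shortest path returned 5 vertices, alternating people and movies.
path = ["Kevin Bacon", "The Woodsman", "Michael Shannon", "Revolutionary Road", "Kate Winslet"]

# 5 vertices means 4 edges; each person-to-person hop is 2 edges.
bacon_number = (path.length - 1) / 2
puts bacon_number # => 2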

For the moment that is as far as I have gotten, but I am pretty excited to explore the possibilities here. The more I think about graphs as a data model, the more they seem to be a good fit for a lot of problems that I would normally be forcing into tables. Given that I can also just do straight document storage, and the fact that there is an Object Document Mapper that works with Rails, I can tell you that ArangoDB and I will be spending a lot of time together.

Why Virtualenv?

Virtualenv comes up often when learning about Python. It’s a Python library that creates a folder into which you install all the libraries your project will need. While it’s often stated that you should use it, it’s not often explained why. I recently stumbled upon a good intro that gives an example of creating an application that uses requests, and then describes the scenario where running sudo pip install --upgrade requests while working on a separate project breaks the first application.
The idea that updating a library in one project would/could break some/all of my other projects that rely on that library is bizarre and kind of terrifying. It’s nice that the solution to the problem is apparently Virtualenv, but why is this a problem to begin with?

The root of this problem seems to be Pip. If I install version 1.0 of the testing library nose, it gets placed (since I am using pyenv) in ~/.pyenv/versions/3.4.0/lib/python3.4/site-packages/. Looking in there, I can see folders for both the code and the metadata (the egg-info folder):

nose/                      nose-1.0.0-py3.4.egg-info/

If I run the pip install --upgrade command you can see the problem unfold:

mike@sleepycat:~/projects/play/python_play☺  pip install --upgrade nose
Downloading/unpacking nose from https://pypi.python.org/packages/source/n/nose/nose-1.3.1.tar.gz#md5=672398801ddf5ba745c55c6eed79c5aa
  Downloading nose-1.3.1.tar.gz (274kB): 274kB downloaded
...
Installing collected packages: nose
  Found existing installation: nose 1.0.0
    Uninstalling nose:
      Successfully uninstalled nose
  Running setup.py install for nose
...

Yup, Pip only installs a single version of a library on your system. A quick look back in the ~/.pyenv/versions/3.4.0/lib/python3.4/site-packages/ folder confirms what Pip’s insanely verbose output told us: our nose 1.0 is gone:

nose/                      nose-1.3.1-py3.4.egg-info/

This is pretty surprising for someone whose expectations have been shaped by Ruby’s package manager, RubyGems. You can see multiple versions of the same library coexisting in the interpreter’s gems folder, meaning that my old projects can keep using the old version while my new projects use the newer one without carnage:

ls ~/.rbenv/versions/rbx-2.2.5/gems/gems/arel-
arel-3.0.3/ arel-4.0.1/
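
This coexistence works because Ruby can activate a specific installed version at runtime. A minimal sketch (in practice Bundler does this for you based on your Gemfile.lock):

# Kernel#gem activates a particular installed version before require.
gem 'arel', '3.0.3'
require 'arel'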

Returning to the reason for needing Virtualenv: at first glance it seems you need Virtualenv to protect you from Pip’s inability to install multiple versions of a library. What’s interesting is that both Virtualenv and Pip were written by the same person, Ian Bicking, Virtualenv in 2007 and Pip in 2008. This suggests that installing a single version is a deliberate design decision, made because Pip assumes the existence/use of something like Virtualenv. That reading is especially plausible when you realize that Pip was aimed at replacing easy_install, an earlier tool which actually could install multiple versions of the same library, as RubyGems has done since 2003.

So if you have ever wondered why you need Virtualenv, it seems we have an answer. Pip has pretty much completely replaced previous package managers, it was developed assuming Virtualenv or something similar is being used… and its assumptions essentially force you to use it.

For those of us starting out with Python, sorting out the ins and outs of the messy world of Python packaging is a pain. The old system seems to be broken, the current one using Virtualenv/Pip is hacky and far from ideal, and the future seems to be largely ignored. Fortunately the beginnings of a clean solution appear to be coming from the world of Docker, so we will have to watch that space carefully. In the meantime, I guess I will have to install Virtualenv…
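
For the record, the workflow is mercifully short; something like this (assuming virtualenv itself is already installed):

virtualenv venv
source venv/bin/activate
pip install nose==1.0

Everything installed while the environment is active lands in the venv folder, safely out of reach of other projects’ upgrades.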

The more things change

The more they stay the same. Microsoft has changed tack, from casting people who hate Internet Explorer as internet trolls (rather than every person who has ever tried to build a cross-browser website for themselves or a client) to casting IE as a new browser.
Of course it’s a product of the same old Microsoft, and a quick comparison on caniuse shows it: Shadow DOM? Nope. Server-sent events? Nope. WebRTC? Nope.

Everything looks the same:

People are still writing shims to bring IE up to speed with the rest of the web.

IE is still adopting standards as slowly as possible, trying to stave off the inevitable moment where web apps compete directly with desktop apps and the OS ceases to mean anything.


I’m curious about the direction IE will take once Microsoft’s new CEO Satya Nadella has been around long enough to make his mark. Releasing Office for iPad in his first appearance as CEO is definitely a statement, as is rechristening “Windows Azure” as “Microsoft Azure” (since it’s aimed at more than just Windows…). We’ll have to wait to see what this more collaborative attitude means for IE. For the moment, the “new” IE, in spite of its rapid release cycle, looks a lot like the old IE when you compare it to Firefox and Chromium.

Changing keyboard layout options in Ubuntu 14.04

Back in 2012 I switched my caps-lock key to act as a second ESC key. This made a big impact on my Vim usage, and you can understand why when you see the keyboard vi was created with. Since I now rely on that little tweak, it was a little disconcerting to realize that the keyboard layout options I had used to switch my caps-lock were nowhere to be found in Ubuntu 14.04. It turns out that Gnome (upstream from Ubuntu) removed these settings from the system settings entirely.

Fortunately this is still accessible via the Gnome Tweak Tool.
You can install that like this:

sudo apt-get install gnome-tweak-tool

Then you can find all the old options under the “Typing” section.
gnome tweak tool
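
If you would rather not install a GUI tool just for this, the same remap can be done from a terminal with setxkbmap, though it only lasts for the current X session unless you add it to your startup scripts:

setxkbmap -option caps:escape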

It’s a little weird to have such useful stuff suddenly removed from the system settings. Hopefully it will find its way back in a future version; for the moment, my Vim crisis has been averted and that’s enough.

On working with Capybara

I’ve been writing Ruby since 2009, and while TDD as a process has long been clear to me, the morass of testing terminology has not. For a recent project I made pretty significant use of Capybara, and through it, Selenium. While it solved some problems I am not sure I could have solved any other way, it created others, and along the way shed some light on some murky terminology.

I think mine was a pretty common pathway to working with Capybara: finding it via Rails and RSpec and a need to do an integration test.

The RSpec equivalent of an integration test is the request spec, and it’s often pointed out that its intended use is API testing, which wasn’t what I was doing. What’s held up as the user-focused complement to request specs is the feature spec, built on Capybara.

The sudden appearance of “the feature” as the focus of these specs, and the brief description of “Acceptance test framework for web applications” at the top of the Capybara GitHub page, should be our first signs that things have shifted a little.

This shift in terminology has technical implications which are not immediately obvious. The intent of acceptance testing “is to guarantee that a customer’s requirements have been met and the system is acceptable”. Importantly, since “acceptance tests are black box system tests”, this means testing the system from the outside, via the UI.

It’s this “via the UI” part that should stand out, since it’s a far cry from the other kinds of tests common with Rails. Uncle Bob has said that testing via the UI “is always a bad idea”, and I got a taste of why pretty much right away. Let’s take a feature spec like this as an example:

    it "asks for user email" do
      visit '/'
      fill_in "user_email", with: "foo@example.com"
      click_button "user_submit"
      expect(page).to have_content "Thanks for your email!"
    end

Notice that suddenly I am stating expectations about the HTML page contents and looking for and manipulating page elements like forms and buttons.
The code for the user_submit button above would typically look like this in most Rails apps:

<%= f.submit "Submit", :class => "btn snazzy" %>

In Rails 3.0.19 that code would use the User class and the input type to create the id attribute automatically. Our click_button 'user_submit' from above finds the element by id and our test passes:

<input class="btn snazzy" id="user_submit" name="commit" type="submit" value="Submit">

In Rails 3.1.12, the same code outputs this:

<input class="btn snazzy" name="commit" type="submit" value="Submit">

There are patterns like page objects that can reduce the brittleness, but testing via the UI is something that only really makes sense when the system under test is a black box.
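
A minimal page object for the form above might look something like this (a sketch; the class and method names are my own invention):

class SignupPage
  include Capybara::DSL

  def visit_page
    visit '/'
    self
  end

  def submit_email(email)
    fill_in 'user_email', with: email
    click_button 'user_submit'
  end

  def thanked?
    page.has_content?('Thanks for your email!')
  end
end

The spec then talks to SignupPage, and when the markup changes only the page object needs updating.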

Contrary to the black box assumption of Acceptance tests, Rails integration tests access system internals like cookies, session and the assigns hash as well as asserting specific templates have been rendered. All of that is done via HTTP requests without reliance on clicking UI elements to move between the controllers under test.

Another problem comes from the use of Selenium. By default Capybara uses rack-test as its driver, but adding js: true switches to the JavaScript driver, which defaults to Selenium:

    it "asks for user email", js: true do
      visit '/'
      fill_in "user_email", with: "foo@example.com"
      click_button "user_submit"
      expect(page).to have_content "Thanks for your email!"
    end

The unexpected consequences of this seemingly innocuous option come a month or two later when I try to run my test suite:

     Failure/Error: visit '/'
     Selenium::WebDriver::Error::WebDriverError:
       unable to obtain stable firefox connection in 60 seconds (127.0.0.1:7055)

What happened? Well, my package manager has updated my Firefox version to 26.0, and the selenium-webdriver version specified in my Gemfile.lock is 2.35.1.

Yes, with “js: true” I have made that test dependent on a particular version of Firefox, which is living outside of my version control and gets auto-updated by my package manager every six weeks.

While workarounds like bundle update selenium-webdriver, or simply skipping tests tagged with js: true using rspec -t ~js:true, are available, your default rspec command will always run all the tests. The need to use special options to avoid breakage is unlikely to be remembered/known by future selves/developers, so the solution seems to be keeping some sort of strict separation between your regular test suite and, minimally, any test that uses js, or ideally all acceptance tests. I’m not sure what that might look like yet, but I’m thinking about it.
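
One option I am experimenting with is excluding js-tagged specs unless they are explicitly requested, so running them becomes a deliberate act. A sketch using standard RSpec configuration (the JS environment variable is my own convention):

# spec/spec_helper.rb
RSpec.configure do |config|
  # A bare `rspec` skips Selenium-backed specs; run them with JS=true rspec
  config.filter_run_excluding js: true unless ENV['JS'] == 'true'
end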

Acceptance testing differs far more than I initially appreciated from other types of testing, and like most things, when used as intended it’s pretty fabulous. The unsettling thing in all this was how easy it was to drift into acceptance testing without explicitly meaning to. Including the Capybara DSL in integration tests certainly blurs the boundaries between two things that are already pretty easily confused. In most other places Capybara’s ostensible raison d’être seems to be largely glossed over. Matthew Robbins’ otherwise great book “Application Testing with Capybara” is not only conspicuously not called “Acceptance Testing with Capybara”, it only mentions the words “acceptance testing” twice.

Capybara is certainly nice to work with, and being able to translate a client’s “when I click here it breaks” almost verbatim into a test is amazing. I feel like I now have a better idea of how to enjoy that upside without the downside.

Web scraping with Ruby

Writing a web scraper seems to be almost a rite of passage as a programmer, and recently I found it was my turn to write one. Judging from the libraries I looked at, the term “web scraping” seems to refer to a spectrum of functionality: web crawling on one end, processing the results on the other, and often some combination of the two.

I looked at a few libraries and all of them seemed to either do too much or not enough relative to what I had in mind, so, as millions of programmers before me have done, I wrote my own.

The idea was very much oriented around extracting a set of structured information from a given page, and to that end I wrote a little DSL to get it done.
The library is called (very creatively) Skrape.

Assuming you have a page like this at the address example.com:

<html><body><h1>I am a title</h1></body></html>

You can scrape the title off the page with this:

results = Skrape::Page.new("http://example.com").extract do
  extract_title with: 'h1'
end

The results would be:

{title: "I am a title"}

The calls to “extract_*” are caught with method_missing, and whatever follows the “extract_” is used as the key in the hash of results that is returned.
To deal with the inevitable edge cases that come up so often in scraping, you can also pass a block, which will be handed whatever the CSS selector found, so you can do some further processing.

I’ve needed to use it for picking out the href attribute of a link:

results = Skrape::Page.new(url).extract do
  extract_link_href with: 'a', and_run: proc {|link| link.attr('href').value }
end

And also removing problematic <br> tags from elements:

results = Skrape::Page.new(url).extract do
  extract_locations with: '.address', and_run: proc {|address| address.map{|a| a.inner_html.gsub('<br>', ', ')} }
end
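
For the curious, the heart of such a DSL is only a few lines. Here is a rough sketch (an illustration, not Skrape’s actual source) of how method_missing can power it:

require 'nokogiri'
require 'open-uri'

class Page
  def initialize(url)
    @doc = Nokogiri::HTML(URI.open(url))
    @results = {}
  end

  def extract(&block)
    instance_eval(&block) # extract_* calls inside the block land in method_missing
    @results
  end

  def method_missing(name, options = {}, &block)
    return super unless name.to_s.start_with?('extract_')
    key = name.to_s.sub('extract_', '').to_sym
    nodes = @doc.css(options[:with]) # whatever the CSS selector found
    @results[key] = options[:and_run] ? options[:and_run].call(nodes) : nodes.text
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('extract_') || super
  end
end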

While there are still some improvements I would like to make, so far I am pretty happy with the library. It feels readable and does not do too much. If you have some scraping to do, check it out. Pull requests welcome.

An intro to injection attacks

I’ve found myself explaining SQL injection attacks to people a few times lately and thought I would write up something I can just point to instead.

For illustration purposes let’s make a toy authentication system.

Let’s say you have a database with a table for all your users that looks like this:

The structure of table "users":
+-----+-----------+----------------+
| uid | username  | password       |
+-----+-----------+----------------+
|   1 | mike      | catfish        |
+-----+-----------+----------------+
|   2 | sally     | floride        |
+-----+-----------+----------------+
|   3 | akira     | pickles        |
+-----+-----------+----------------+

Let’s say a user wants to use your application, so you ask them for their username and password, and they give you ‘akira’ and ‘pickles’.

The next step in authenticating this user is to check the database to see if we have a user with both a username of ‘akira’ and a password of ‘pickles’, which we can do with a SQL statement like this:

select * from users where username = 'akira' and password = 'pickles';

Since we know we have one row that satisfies both of the conditions we set (the value in username must equal ‘akira’ and the value in password must equal ‘pickles’), if we hand that string of text to the database and ask it to execute it, we would expect the database to return the following data:

+-----+-----------+----------------+
| uid | username  | password       |
+-----+-----------+----------------+
|   3 | akira     | pickles        |
+-----+-----------+----------------+

If the database returns a row we can let the user go ahead and use our application.

Of course we need our SQL statement to work for everyone, so we can’t just write ‘akira’ in there. Let’s replace the username and password with variables (PHP style):

select * from users where username = '$username' and password = '$password';

Now if someone logs in with ‘mike’ and ‘catfish’, our application is going to place the value ‘mike’ in the variable $username and ‘catfish’ in the variable $password, and the PHP interpreter will substitute the variable names with the actual values to create the finished string:

select * from users where username = 'mike' and password = 'catfish';

This will be passed to the database, which executes the command and returns a row.

Unfortunately, mixing data supplied by the user with our pre-existing SQL commands and passing the result to the database as one single string has set the stage for some bad behaviour:

$username = "";
$password = "' or username like '%m%";

select * from users where username = '$username' and password = '$password';

Once the final string is assembled, the meaning is suddenly very different:

select * from users where username = '' and password = '' or username like '%m%';

Now our database will return a row if the username AND password are both empty strings OR if any of the usernames contains the letter ‘m’. Suddenly the bar for logging into our application is a lot lower.

There are tonnes of possible variations on this, and it’s a common enough problem that it’s the stuff of jokes among programmers:

XKCD: Little bobby tables.

The root of the problem is the commingling of user-supplied data with commands intended to be executed. The user-supplied password value of "' or username like '%m%" entirely changed the meaning of our SQL command. If we had been able to make it clear that this was just a string to search the database for, we would have had the expected behaviour: comparing the string "' or username like '%m%" to the strings in our list of passwords (‘pickles’, ‘catfish’ and ‘floride’).

If you think about it like that you realize that this is not just a problem with SQL, but a problem that shows up everywhere data and commands are mixed.

Keeping these things separate is the only way to stay safe. When the data is kept separate it can be sanitized, or singled out for whatever special treatment is appropriate for the situation by libraries/drivers or whatever else.

Separating data from commands can look pretty different depending on the what you are working on. Keeping data and commands separate when executing system commands in Ruby or Node.js looks different from keeping them separate using prepared statements to safely query a database using JRuby/Java. The same rules apply to NoSQL things like Mongodb as well.
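
To make that concrete, here is the toy login check done with a parameterized query via Ruby’s sqlite3 gem (a sketch; any driver with placeholder support works the same way, and the database file name is made up):

require 'sqlite3'

db = SQLite3::Database.new('toy_auth.db')

username = ''
password = "' or username like '%m%" # the attack string from above

# The ? placeholders hand the values to the driver as pure data,
# so they can never be re-interpreted as SQL commands.
rows = db.execute(
  'select * from users where username = ? and password = ?',
  [username, password]
)

puts rows.empty? ? 'login rejected' : 'login ok' # => login rejected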

People far smarter than I am still get caught by this stuff pretty regularly, so there is no magic bullet to solve it. If you are working with Rails, security tools like Brakeman can help catch things, but the only real solution is awareness and practice.

USB tethering – that was easy

I’ve had my Google Nexus phone for a while now and am only just starting to explore some of the stuff it can do. A recent pleasant surprise was using tethering so I could connect to the internet from my laptop via my phone. I connected my phone to my laptop with the USB cable and dug into the settings:

Android settings screen
Android wireless settings
Tethering and portable hotspot

As you can see, you don’t need to dig very far. When I tapped the “USB tethering” option, Ubuntu immediately popped up a notification saying that it had connected to a wired network… and then I checked my email. It happened so fast that I almost feel cheated out of doing something clever to get it going… almost.

If you don’t have your cable you can explore setting up a Wi-Fi hotspot. You’ll just have to add a name and a password for your network.

I have so many memories of installing Windows and having to use a second computer to download drivers for my network card (and every other card)… Ubuntu is making those memories feel very distant indeed.


NetworkManager to the rescue

I work in coffee shops reasonably often, and it’s surprisingly difficult to find one that has the right combination of comfy chairs, decent coffee and unrestricted wifi.
Recently one of my favourites had Bell come in and monkey with their router. Of course Bell set up the router to use their nameservers, which, I suspect, are some of the slowest in the world (entirely what you would expect if you have had any dealings with Bell).
So when I show up, grab a coffee, and get ready to do some work, I notice that everything is really slow.

mike@sleepycat:~☺  time curl www.google.ca > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10947    0 10947    0     0    698      0 --:--:--  0:00:15 --:--:--  3481
real	0m15.675s
user	0m0.013s
sys	0m0.000s

Making the same request using just the IP address so as to skip the DNS name resolution step makes it pretty clear where the problem lies:

mike@sleepycat:~☺  time curl 74.125.131.94 > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10586    0 10586    0     0  47366      0 --:--:-- --:--:-- --:--:-- 47470

real	0m0.234s
user	0m0.009s
sys	0m0.003s

Now this coffee shop is run by some really lovely people, but my attempt to explain DNS resolution and router settings to the girl behind the cash register, while she frothed milk and customers came in behind me, went about as well as you might expect.
So back at my table, I got to thinking: “If only there was a way to accept just the IP address from their router but not their nameserver settings…”

Sure enough, Gnome’s NetworkManager has exactly that:

NetworkManager settings

You will notice I am using Google’s public DNS addresses in there. And the result:

mike@sleepycat:~☺  time curl www.google.ca > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10995    0 10995    0     0  62273      0 --:--:-- --:--:-- --:--:-- 62471

real	0m0.187s
user	0m0.011s
sys	0m0.003s

I love that they thought to add that in there. Well played Gnome! :)

Zero downtime Rails redeploys with Unicorn

Like any self-respecting Ruby hipster, I am using Unicorn as my app server in production. Unlike most Ruby hipsters, my deployment process is pretty manual at the moment. While the high-touch manual deployment I am currently doing is far from ideal long term, short term it’s giving me a close-up look at the care and feeding of a production Rails app. Think of it as wanting to get my hands dirty after years of being coddled by Heroku. :)

Much ado has been made of how unixy Unicorn is, and one of the ways that manifests itself is how Unicorn uses signals to let you talk to a running server process. What has been interesting about this has been a reintroduction to the “kill” command. It’s pretty commonly known that “kill -9 1234” is a quick way to kill process 1234, but it turns out that the kill command has much more going on. The mysterious -9 option is significantly less mysterious once you know that kill can send ANY signal, and you finally look at the list of options:

mike@sleepycat:~☺  kill -l
 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
 6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX	

So with that knowledge, let’s send some signals to Unicorn to get it serving up the latest version of our code. First we need its process id. We are really just interested in the process id of the master process, which we can see is 26465:

mike@sleepycat:/myapp$ ps aux | grep unicorn
503       7995   0:00 grep unicorn
503      26465   0:07 unicorn_rails master -c config/unicorn.rb --env production -D
503      26498   0:11 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503      26502   2:37 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503      26506   0:06 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503      26510   0:06 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503      26514   0:06 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503      26518   0:06 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503      26522   0:06 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503      26526   0:07 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503      26530   0:07 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503      26534   0:06 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503      26538   0:09 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503      26542   0:07 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503      26546   0:07 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503      26550   0:08 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503      26554   0:10 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503      26558   0:08 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503      26562   0:05 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503      26566   0:08 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503      26570   0:07 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503      26574   0:06 unicorn_rails worker[19] -c config/unicorn.rb --env production -D         

Since I have just pulled down some new code, I want to restart the master process. I can get Unicorn to launch a new master process by sending the master the USR2 signal. After that you can see that there is now a new master (7996) with its set of workers, alongside the old master (26465) and its set of workers:

mike@sleepycat:/myapp$ kill -USR2 26465
mike@sleepycat:/myapp$ ps aux | grep unicorn
503       7996  0:07 unicorn_rails master -c config/unicorn.rb --env production -D
503       8035  0:00 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503       8038  0:00 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503       8041  0:00 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503       8044  0:00 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503       8046  0:00 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503       8050  0:00 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503       8052  0:00 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503       8056  0:00 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503       8059  0:00 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503       8062  0:00 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503       8064  0:00 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503       8069  0:00 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503       8073  0:00 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503       8075  0:00 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503       8079  0:00 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503       8082  0:00 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503       8085  0:00 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503       8088  0:00 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503       8091  0:00 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503       8094  0:00 unicorn_rails worker[19] -c config/unicorn.rb --env production -D
503       8156  0:00 grep unicorn
503      26465  0:07 unicorn_rails master (old) -c config/unicorn.rb --env production -D
503      26498  0:11 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503      26502  2:37 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503      26506  0:06 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503      26510  0:06 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503      26514  0:06 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503      26518  0:06 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503      26522  0:06 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503      26526  0:07 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503      26530  0:07 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503      26534  0:06 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503      26538  0:09 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503      26542  0:07 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503      26546  0:07 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503      26550  0:08 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503      26554  0:10 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503      26558  0:08 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503      26562  0:06 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503      26566  0:08 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503      26570  0:07 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503      26574  0:06 unicorn_rails worker[19] -c config/unicorn.rb --env production -D

Now I want to shut down the old master process and its workers. I can do that with the QUIT signal:

mike@sleepycat:/myapp$ kill -QUIT 26465
mike@sleepycat:/myapp$ ps aux | grep unicorn
503       7996  0:07 unicorn_rails master -c config/unicorn.rb --env production -D
503       8035  0:00 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503       8038  0:00 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503       8041  0:00 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503       8044  0:00 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503       8046  0:00 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503       8050  0:00 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503       8052  0:00 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503       8056  0:00 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503       8059  0:00 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503       8062  0:00 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503       8064  0:00 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503       8069  0:00 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503       8073  0:00 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503       8075  0:00 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503       8079  0:00 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503       8082  0:00 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503       8085  0:00 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503       8088  0:00 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503       8091  0:00 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503       8094  0:00 unicorn_rails worker[19] -c config/unicorn.rb --env production -D
503       8161  0:00 grep unicorn

So now we have a Unicorn serving up the latest version of our code without dropping a single request. Really slick.
Ryan Bates has a screencast that has a broader look at the subject of zero downtime deployments, automated with Capistrano (probably a more sustainable approach), but if you look closely you will see these signals lurking in the code.
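
If you would rather not send that QUIT by hand on every deploy, the pattern from Unicorn’s example configuration automates it: the new master’s before_fork hook quits the old one. A sketch (the pid path is an assumption; match it to your setup):

# config/unicorn.rb
worker_processes 20
preload_app true
pid '/myapp/tmp/pids/unicorn.pid' # assumed location

before_fork do |server, worker|
  # On USR2, Unicorn renames the old master's pidfile to *.oldbin.
  # Once the new master starts forking workers, tell the old one to QUIT.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && old_pid != server.pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # old master already gone
    end
  end
end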

If you are interested in digging into more Unix fundamentals (from a Rubyist’s perspective!) I would recommend checking out Jesse Storimer’s books.