Category Archives: linux

gpg and signing your own Arch Linux packages

One of the first things that I wanted to install on my system after switching to Arch was ArangoDB. Sadly it wasn’t in the official repos. I’ve had mixed success installing things from the AUR and the Arango packages there didn’t improve my ratio.

Using those packages as a starting point, I did a little tinkering and got it all working the way I like. I’ve been following the work being done on reproducible builds, and why it is needed, and it seems that with all that going on, anyone dabbling in making packages should at the very least be signing them. With that as my baseline, I figured I might as well start with mine.

Of course package signing involves learning about gpg/pgp, whose “ease of use” is legendary.

Before we get to package signing, a little about gpg.

gpg --list-keys
gpg -k
gpg --list-public-keys

All of these commands list the contents of ~/.gnupg/pubring.gpg.

gpg --list-secret-keys
gpg -K

Both list all keys from ~/.gnupg/secring.gpg.

The pacman package manager also has its own gpg databases which you can explore with:

gpg --homedir /etc/pacman.d/gnupg --list-keys

So the task at hand is getting my public key into the list of public keys that pacman trusts. To do that we will need to do more than just list keys; we need to reference them individually. gpg has a few ways to do that, by passing an argument to one of the list-keys commands above. I’ll do a quick search through the list of keys that pacman trusts:

mike@longshot:~/projects/arangodb_pkg☺ gpg --homedir /etc/pacman.d/gnupg -k pierre
pub   rsa3072/6AC6A4C2 2011-11-18 [SC]
uid         [  full  ] Pierre Schmitz (Arch Linux Master Key) <pierre@master-key.archlinux.org>
sub   rsa1024/86872C2F 2011-11-18 [E]
sub   rsa3072/1B516B59 2011-11-18 [A]

pub   rsa2048/9741E8AC 2011-04-10 [SC]
uid         [  full  ] Pierre Schmitz <pierre@archlinux.de>
sub   rsa2048/54211796 2011-04-10 [E]

mike@longshot:~/projects/arangodb_pkg☺  gpg --homedir /etc/pacman.d/gnupg --list-keys stephane
pub   rsa2048/AB441196 2011-10-30 [SC]
uid         [ unknown] Stéphane Gaudreault <stephane@archlinux.org>
sub   rsa2048/FDC576A9 2011-10-30 [E]

If you look at the output there you can see what is called an OpenPGP short key ID. We can use those to refer to individual keys, but we can also use long IDs and fingerprints:

gpg --homedir /etc/pacman.d/gnupg -k --keyid-format long stephane
pub   rsa2048/EA6836E1AB441196 2011-10-30 [SC]
uid                 [ unknown] Stéphane Gaudreault <stephane@archlinux.org>
sub   rsa2048/4ABE673EFDC576A9 2011-10-30 [E]


gpg --homedir /etc/pacman.d/gnupg -k --fingerprint stephane
pub   rsa2048/AB441196 2011-10-30 [SC]
      Key fingerprint = 0B20 CA19 31F5 DA3A 70D0  F8D2 EA68 36E1 AB44 1196
uid         [ unknown] Stéphane Gaudreault <stephane@archlinux.org>
sub   rsa2048/FDC576A9 2011-10-30 [E]

So we can identify Stéphane’s specific key using either the short ID, the long ID or the fingerprint:

gpg --homedir /etc/pacman.d/gnupg -k AB441196
gpg --homedir /etc/pacman.d/gnupg -k EA6836E1AB441196
gpg --homedir /etc/pacman.d/gnupg -k 0B20CA1931F5DA3A70D0F8D2EA6836E1AB441196

Armed with a way to identify the key I want pacman to trust, I need to do the transfer. Though not initially obvious, gpg can push and pull keys from designated key servers. The file at ~/.gnupg/gpg.conf tells me that my keyserver is keys.gnupg.net, while pacman’s file at /etc/pacman.d/gnupg/gpg.conf says it is using pool.sks-keyservers.net.

Using my key’s long ID, I’ll push it to my default keyserver, tell pacman to pull it, and then sign it.

#send my key
gpg --send-key F77636AC51B71B99
#tell pacman to pull that key from my keyserver
sudo pacman-key --keyserver keys.gnupg.net -r F77636AC51B71B99
#sign the key it received and start trusting it
sudo pacman-key --lsign-key F77636AC51B71B99
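
To double-check that the key actually landed in pacman’s keyring, you can search for it the same way we searched for Pierre’s and Stéphane’s keys above:

gpg --homedir /etc/pacman.d/gnupg -k F77636AC51B71B99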

With all that done, I should be able to sign my package with

makepkg --sign --key F77636AC51B71B99
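
makepkg --sign leaves a detached signature next to the package, so you can sanity-check the result with gpg directly:

gpg --verify arangodb-2.8.1-1-x86_64.pkg.tar.xz.sig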

We can also shorten that by setting the default-key option in ~/.gnupg/gpg.conf.

# If you have more than 1 secret key in your keyring, you may want to
# uncomment the following option and set your preferred keyid.

default-key F77636AC51B71B99

With my default key set I’m able to make and install with this:

mike@longshot:~/projects/arangodb_pkg☺ makepkg --sign
mike@longshot:~/projects/arangodb_pkg☺ sudo pacman -U arangodb-2.8.1-1-x86_64.pkg.tar.xz
loading packages...
resolving dependencies...
looking for conflicting packages...

Packages (1) arangodb-2.8.1-1

Total Installed Size:  146.92 MiB
Net Upgrade Size:        0.02 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring
[##################################################] 100%
(1/1) checking package integrity
[##################################################] 100%

The ease with which I can make my own packages is a very appealing part of Arch Linux for me. Signing them was the next logical step and I’m looking forward to exploring some related topics like running my own repo, digging deeper into GPG, the Web of Trust, and reproducible builds. It’s all fun stuff, if you can only find the time.

Hello GraphQL

One of the most interesting projects to me lately has been Facebook’s GraphQL. It was announced at React.js Conf in January, and those of us who were excited by the idea have had to wait, first for the spec to be formalized and then for some actual running code.

I think the GraphQL team is on to something big (it’s positioned as an alternative to REST to give a sense of how big), and I’ve been meaning to dig in to it for a while, but it was never clear where to start. So after a little head-scratching and a little RTFM, I want to share a GraphQL hello world.

So what does that look like? Well, Facebook has released two projects: graphql-js and express-graphql. graphql-js is the reference implementation of what is described in the spec, while express-graphql is a middleware component for the Express framework that lets you serve GraphQL over HTTP.

So express is going to be our starting point. First we need to create a new project using the generator:

mike@longshot:~☺  express --git -e gql_hello

create : gql_hello
create : gql_hello/package.json
create : gql_hello/app.js
create : gql_hello/.gitignore
create : gql_hello/public
create : gql_hello/routes
create : gql_hello/routes/index.js
create : gql_hello/routes/users.js
create : gql_hello/views
create : gql_hello/views/index.ejs
create : gql_hello/views/error.ejs
create : gql_hello/bin
create : gql_hello/bin/www
create : gql_hello/public/javascripts
create : gql_hello/public/images
create : gql_hello/public/stylesheets
create : gql_hello/public/stylesheets/style.css

install dependencies:
$ cd gql_hello && npm install

run the app:
$ DEBUG=gql_hello:* npm start

Let’s do as we are told and run cd gql_hello && npm install.
When that’s done we can get to the interesting stuff.
Next up is installing graphql and the middleware, using the --save option so that our app’s dependencies in package.json will be updated:

mike@longshot:~/gql_hello☺  npm install --save express-graphql graphql babel
npm WARN install Couldn't install optional dependency: Unsupported
npm WARN prefer global babel@5.8.23 should be installed with -g
...

I took the basic app.js file that was generated and just added the following:

app.use('/', routes);
app.use('/users', users);

// GraphQL:
var graphqlHTTP = require('express-graphql');

import {
  graphql,
  GraphQLSchema,
  GraphQLObjectType,
  GraphQLString,
} from 'graphql';

var schema = new GraphQLSchema({
  query: new GraphQLObjectType({
    name: 'RootQueryType',
    fields: {
      hello: {
        type: GraphQLString,
        resolve() {
          return 'world';
        }
      }
    }
  })
});

//Mount the middleware on the /graphql endpoint:
app.use('/graphql', graphqlHTTP({ schema: schema , pretty: true}));
//That's it!

// catch 404 and forward to error handler
app.use(function(req, res, next) {
  var err = new Error('Not Found');
  err.status = 404;
  next(err);
});

Notice that we are passing our GraphQL schema to graphqlHTTP as well as pretty: true so that responses from the server will be pretty printed.

One other thing is that since those GraphQL libraries make extensive use of ECMAScript 6 syntax, we will need to use the Babel Transpiler to actually be able to run this thing.

If you installed Babel with npm install -g babel you can add the following to your package.json scripts section:

  {
    "start": "babel-node ./bin/www"
  }

Because I didn’t install it globally, I’ll just point to it in the node_modules folder:

  {
    "start": "node_modules/babel/bin/babel-node.js ./bin/www"
  }

With that done we can use npm start to start the app and try things out using curl:

mike@longshot:~☺  curl localhost:3000/graphql?query=%7Bhello%7D
{
  "data": {
    "hello": "world"
  }
}
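
The same query can also be sent as a POST body, which spares you the URL encoding (express-graphql accepts the application/graphql content type):

curl -H 'Content-Type: application/graphql' -d '{hello}' localhost:3000/graphql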

Looking back at the schema we defined, we can see that our request {hello} (or %7Bhello%7D once it’s been URL-encoded) caused the resolve function to be called, which returned the string “world”.

{
  name: 'RootQueryType',
  fields: {
    hello: {
      type: GraphQLString,
      resolve() {
        return 'world';
      }
    }
  }
}

This explains what they mean when you hear that GraphQL “exposes fields that are backed by arbitrary code”. What drew me to GraphQL is that it seems to be a great solution for people with graph-database-backed applications, but it’s only now that I realize that GraphQL is much more flexible. That resolve function could just as easily have pulled something out of a relational database or calculated something useful. In fact this might be the only time “arbitrary code execution” is something to be happy about.

I’m super excited to explore this further and to start using it with ArangoDB. If you want to dig deeper I suggest you check out Exploring GraphQL and Creating a GraphQL server and of course read the spec.

Running a Rails app with Systemd and liking it

Systemd has, over the strident objections of many strident people, become the default init system for a surprising number of Linux distributions. Though I’ve been aware of the drama, the eye-rolling, the uh.. enterprising nature of systemd, I have really only just started using it myself. All the wailing and gnashing of teeth surrounding it left me unsure what to expect.

Recently I needed to get a proof-of-concept app I built running so a client could use it on their internal network to evaluate it. Getting my Rails app to start on boot was pretty straightforward, and since I’m going to be using this again I thought I would document it here.

First I created a “rails” user and group, and in /home/rails I installed my usual rbenv setup. The fact that only root is allowed to listen on ports below 1024 conflicts with my plan to run my app as the “rails” user and listen on port 80. The solution is setcap:

setcap 'cap_net_bind_service=+ep' .rbenv/versions/2.2.2/bin/bundle
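
You can confirm the capability took with getcap, which should print something like cap_net_bind_service+ep next to the path:

getcap .rbenv/versions/2.2.2/bin/bundle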

With that capability added, I set up my systemd unit file in /usr/lib/systemd/system/myapp.service and added the following:

[Unit]
Description=MyApp
Requires=network.target
Requires=arangodb.service

[Service]
Type=simple
User=rails
Group=rails
WorkingDirectory=/home/rails/myapp
ExecStart=/usr/bin/bash -lc 'bundle exec rails server -e production --bind 0.0.0.0 --port 80'
TimeoutSec=30
RestartSec=15s
Restart=always

[Install]
WantedBy=multi-user.target

The secret sauce that makes this work with rbenv is the “bash -l” in the ExecStart line. This means bash will execute as though it were a login shell, so the profile files (which typically source .bashrc, with all the PATH exports and rbenv init stuff) are read before the command I give it is run. In other words, exactly what happens normally.

From there, I just start the service like all the rest of them:

systemctl enable myapp.service
systemctl start myapp.service
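
If the app doesn’t come up, systemd’s usual diagnostics apply:

systemctl status myapp.service
journalctl -u myapp.service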

This Just Works™ and got the job done, but in the process I find I am really starting to appreciate Systemd. Running daemons is complicated, and with all the privilege dropping, ordering, isolation and security options, there is a lot to get right… or wrong.

What I am liking about Systemd is that it is taking the same functions that Docker is built on, namely cgroups and namespacing, and giving you a declarative way of using them while starting your process. Doing so puts some really nice (and otherwise complicated) security features within reach of anyone willing to read a man page.

PrivateTmp=yes is a great example of this. Simply adding that to the unit file above (which you should if you call Tempfile.new in your app) closes off a bunch of security problems, because systemd “sets up a new file system namespace for the executed processes and mounts private /tmp and /var/tmp directories inside it that is not shared by processes outside of the namespace”.

Could I get the same effect as PrivateTmp=yes with unshare? With some fiddling, but Systemd makes it a zero cost option.

There is also ProtectSystem=full to mount /boot, /usr and /etc as read only which “ensures that any modification of the vendor supplied operating system (and optionally its configuration) is prohibited for the service”. Systemd can even handle running setcap for me, resulting in beautiful stuff like this, and there is a lot more in man systemd.exec besides.
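
Both of those go straight into the [Service] section of the unit file above:

[Service]
# give this service its own private /tmp and /var/tmp
PrivateTmp=yes
# mount /boot, /usr and /etc read-only for this service
ProtectSystem=full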

For me, one of the things that has become clear over the last few years is that removing “footguns” from our software is really important. All the work going into making the tools (like rm -rf) and languages (Rust!) we use less spectacularly dangerous is critical to raising the security bar across the industry.

The more I learn about Systemd the more it seems to be a much needed part of that.

Zero downtime Rails redeploys with Unicorn

Like any self-respecting Ruby hipster I am using Unicorn as my app server in production. Unlike most Ruby hipsters, my deployment process is pretty manual at the moment. While the high-touch manual deployment I am currently doing is far from ideal long term, short term it’s giving me a close-up look at the care and feeding of a production Rails app. Think of it as wanting to get my hands dirty after years of being coddled by Heroku. :)

Much ado has been made of how unixy Unicorn is, and one of the ways that manifests itself is how Unicorn uses signals to let you talk to a running server process. What has been interesting about this has been a reintroduction to the “kill” command. It’s pretty common to know that “kill -9 1234” is a quick way to kill process 1234, but it turns out the kill command has much more going on. The mysterious -9 option is significantly less mysterious once you know that kill can send ANY signal, and you finally look at the list of options:

mike@sleepycat:~☺  kill -l
 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
 6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX	

So with that knowledge, let’s send some signals to Unicorn to get it serving up the latest version of our code. First we need its process id. We are really just interested in the process id of the master process, which we can see is 26465:

mike@sleepycat:/myapp$ ps aux | grep unicorn
503       7995   0:00 grep unicorn
503      26465   0:07 unicorn_rails master -c config/unicorn.rb --env production -D
503      26498   0:11 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503      26502   2:37 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503      26506   0:06 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503      26510   0:06 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503      26514   0:06 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503      26518   0:06 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503      26522   0:06 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503      26526   0:07 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503      26530   0:07 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503      26534   0:06 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503      26538   0:09 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503      26542   0:07 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503      26546   0:07 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503      26550   0:08 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503      26554   0:10 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503      26558   0:08 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503      26562   0:05 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503      26566   0:08 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503      26570   0:07 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503      26574   0:06 unicorn_rails worker[19] -c config/unicorn.rb --env production -D         

Since I have just pulled down some new code I want to restart the master process. I can get Unicorn to launch a new master process by sending the old master the USR2 signal. After that you can see that there is now a new master (7996) with its set of workers, alongside the old master (26465) and its set of workers:

mike@sleepycat:/myapp$ kill -USR2 26465
mike@sleepycat:/myapp$ ps aux | grep unicorn
503       7996  0:07 unicorn_rails master -c config/unicorn.rb --env production -D
503       8035  0:00 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503       8038  0:00 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503       8041  0:00 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503       8044  0:00 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503       8046  0:00 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503       8050  0:00 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503       8052  0:00 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503       8056  0:00 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503       8059  0:00 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503       8062  0:00 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503       8064  0:00 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503       8069  0:00 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503       8073  0:00 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503       8075  0:00 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503       8079  0:00 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503       8082  0:00 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503       8085  0:00 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503       8088  0:00 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503       8091  0:00 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503       8094  0:00 unicorn_rails worker[19] -c config/unicorn.rb --env production -D
503       8156  0:00 grep unicorn
503      26465  0:07 unicorn_rails master (old) -c config/unicorn.rb --env production -D
503      26498  0:11 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503      26502  2:37 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503      26506  0:06 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503      26510  0:06 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503      26514  0:06 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503      26518  0:06 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503      26522  0:06 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503      26526  0:07 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503      26530  0:07 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503      26534  0:06 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503      26538  0:09 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503      26542  0:07 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503      26546  0:07 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503      26550  0:08 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503      26554  0:10 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503      26558  0:08 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503      26562  0:06 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503      26566  0:08 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503      26570  0:07 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503      26574  0:06 unicorn_rails worker[19] -c config/unicorn.rb --env production -D
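
An aside before we clean up: if your config/unicorn.rb sets a pid file (the tmp/pids/unicorn.pid path below is just an example; use whatever path your config names), you can address the master without grepping through ps:

kill -USR2 "$(cat tmp/pids/unicorn.pid)"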

Now I want to shutdown the old master process and its workers. I can do that with the QUIT signal:

mike@sleepycat:/myapp$ kill -QUIT 26465
mike@sleepycat:/myapp$ ps aux | grep unicorn
503       7996  0:07 unicorn_rails master -c config/unicorn.rb --env production -D
503       8035  0:00 unicorn_rails worker[0] -c config/unicorn.rb --env production -D
503       8038  0:00 unicorn_rails worker[1] -c config/unicorn.rb --env production -D
503       8041  0:00 unicorn_rails worker[2] -c config/unicorn.rb --env production -D
503       8044  0:00 unicorn_rails worker[3] -c config/unicorn.rb --env production -D
503       8046  0:00 unicorn_rails worker[4] -c config/unicorn.rb --env production -D
503       8050  0:00 unicorn_rails worker[5] -c config/unicorn.rb --env production -D
503       8052  0:00 unicorn_rails worker[6] -c config/unicorn.rb --env production -D
503       8056  0:00 unicorn_rails worker[7] -c config/unicorn.rb --env production -D
503       8059  0:00 unicorn_rails worker[8] -c config/unicorn.rb --env production -D
503       8062  0:00 unicorn_rails worker[9] -c config/unicorn.rb --env production -D
503       8064  0:00 unicorn_rails worker[10] -c config/unicorn.rb --env production -D
503       8069  0:00 unicorn_rails worker[11] -c config/unicorn.rb --env production -D
503       8073  0:00 unicorn_rails worker[12] -c config/unicorn.rb --env production -D
503       8075  0:00 unicorn_rails worker[13] -c config/unicorn.rb --env production -D
503       8079  0:00 unicorn_rails worker[14] -c config/unicorn.rb --env production -D
503       8082  0:00 unicorn_rails worker[15] -c config/unicorn.rb --env production -D
503       8085  0:00 unicorn_rails worker[16] -c config/unicorn.rb --env production -D
503       8088  0:00 unicorn_rails worker[17] -c config/unicorn.rb --env production -D
503       8091  0:00 unicorn_rails worker[18] -c config/unicorn.rb --env production -D
503       8094  0:00 unicorn_rails worker[19] -c config/unicorn.rb --env production -D
503       8161  0:00 grep unicorn

So now we have Unicorn serving up the latest version of our code without dropping a single request. Really slick.
Ryan Bates has a screencast that takes a broader look at zero-downtime deployments, automated with Capistrano (probably a more sustainable approach), but if you look closely you will see these same signals lurking in the code.

If you are interested in digging into more Unix fundamentals (from a Rubyist’s perspective!) I would recommend checking out Jesse Storimer’s books.

Understanding Docker

Docker has generated a lot of buzz lately and seems poised to fundamentally change how apps get deployed. With the apps I work on increasingly dependent on their environment (environment variables, cron jobs and additional libraries), having a way of encapsulating my app and its environment is pretty appealing. With that in mind I’ve been playing with Docker for a little while, but I found I had a hard time building a clear picture in my head of what is actually going on.

The tutorials all feel a little magical, and a lot of the docs for the commands end up being stuff like “docker pull: pulls an image”, which is pretty unsatisfying. So while I am still just getting started with Docker, I thought I would share what I have pieced together so far and use it as an opportunity to explain this to myself as well.

The first thing to point out is that Docker is built on top of AuFS, Linux Containers (LXC), and cgroups (lots of details here). Doing some reading about those things first really helps understand what is going on.

While that is neat, I am a pretty visual person so for me to feel like I have any idea of what is going on I need to see it. So to do that I created my own image using the traditional debootstrap command:

☺  sudo debootstrap raring raring64
I: Retrieving InRelease
I: Failed to retrieve InRelease
I: Retrieving Release
I: Retrieving Release.gpg
....

You can see it created a folder with the following sub-folders:

ls raring64/
bin   dev  home  lib64  mnt  proc  run   selinux  sys  usr
boot  etc  lib   media  opt  root  sbin  srv      tmp  var

Then we tar up the folder and pipe it into Docker’s import command. This creates the image and prints out the hash id of the image before exiting:

☺ sudo tar -C raring64 -c . | sudo docker import - raring64
9a6984a920c9

If I dig I can then find those folders in the docker graph folder:

☺ sudo ls /var/lib/docker/graph/9a6984a920c9badcaed6456bfdef2f20a414b08ed09acfd9140f2124065697b2/layer
bin   dev  home  lib64	mnt  proc  run	 selinux  sys  usr
boot  etc  lib	 media	opt  root  sbin  srv	  tmp  var

I can then log into that image by asking Docker to run interactively (-i) and give me a pseudo-tty (-t). But notice the host name in the root prompt you get when you get Docker to run bash (it changes each time):

☺ sudo docker run -i -t raring64 /usr/bin/env bash
WARNING: Docker detected local DNS server on resolv.conf. Using default external servers: [8.8.8.8 8.8.4.4]
root@b0472d03f134:/# exit
exit
☺ sudo docker run -i -t raring64 /usr/bin/env bash
WARNING: Docker detected local DNS server on resolv.conf. Using default external servers: [8.8.8.8 8.8.4.4]
root@76c7860cf94e:/# exit
exit

If I run some commands that change the state of that image and I want to keep the changes, I need to use the hash shown in the host name to commit them back to the graph directory. So for example, I installed git (with “apt-get install git”) and afterwards committed the change:

☺ sudo docker commit 76c7860cf94e raring_and_git
c153792e04b4

Sure enough, this creates a new directory inside /var/lib/docker/graph/ that contains just the difference between the original image (my raring64 image) and my new one with git:

☺ sudo ls /var/lib/docker/graph/
27cf784147099545						  9a6984a920c9badcaed6456bfdef2f20a414b08ed09acfd9140f2124065697b2  c153792e04b4a164b9eb981e0f59a82c8775cad90a7771045ba3c6daabc41f23  :tmp:
8dbd9e392a964056420e5d58ca5cc376ef18e2de93b5cc90e868a1bbc8318c1c  b750fe79269d2ec9a3c593ef05b4332b1d1a02a62b4accb2c21d589ff2f5f2dc  checksums

☺ sudo ls /var/lib/docker/graph/c153792e04b4a164b9eb981e0f59a82c8775cad90a7771045ba3c6daabc41f23/layer
dev  etc  lib  tmp  usr  var

It is the job of AuFS to take all the folders and files in the graph directory and sort of sum them into a single filesystem with all the files from raring64 + the new files that changed when I installed git. Docker can then use that filesystem as the base from which to run its namespaced process (similar to chroot).

All of this creates a pretty “git-like” experience where each hash represents a change set applied to a base set of files.

From here, building out images takes one of two forms: give yourself an interactive bash session, make your changes and then commit them, or use a Dockerfile.
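
A Dockerfile version of what I just did interactively might look something like this (a sketch, starting from the raring64 image imported above):

# Dockerfile: the scripted equivalent of the commit workflow above
FROM raring64
RUN apt-get update && apt-get install -y git

Building it with sudo docker build -t raring_and_git . produces the same kind of layered image, one hash per step.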

So this feels like a solid starting point in the world of Docker, and it’s a pretty exciting world. In fact I am looking forward to deploying my next app… how often can you say that?

EuRuKo 2013

Some of the talks from EuRuKo sounded really good, but I missed most of them because of the time difference. Fortunately the live stream was saved and posted, so now I can watch them whenever!

Update: Ustream has thoughtfully decided to ignore the autoplay=false parameter that WordPress adds to all the videos to prevent them autoplaying. So rather than embedding them and having them all play at the same time every time the page loads, I am just going to link to them. Thanks, Ustream.

Day 1 Part 1

Day 1 Part 2

Day 1 Part 3

Day 2 Part 1

Day 2 Part 2

Day 2 Part 3

Contributing to Musicbrainz

Hacking on code is not the only way to contribute to the Free Software / Open Source Software community. Many applications rely on external datasets to provide some or most of their functionality and a contribution there helps all the projects downstream.

Ubuntu’s default music application Rhythmbox, as well as Banshee, KDE’s Amarok and a host of smaller programs like Sound Juicer, all offer the ability to rip CDs. For this feature to be seen to “work” in the eyes of the user, the software needs to correctly identify the CD and add the appropriate album details and track names. To do this, these programs all query the Musicbrainz database, and the quality of that response essentially decides the user’s experience of the ripping process: will it be a one-click “import”, or 10 minutes of filling in track names while squinting at the CD jacket? God help you if you have tracks with accented characters and an English keyboard.

What all this means is that every contribution to Musicbrainz results in a better experience for users of ALL of these programs. When someone somewhere decides to rip their favourite obscure CD and the software magically fills in the track names and album details, it’s a pretty happy moment. So if you have wanted to contribute to the Free/Open Source Software community but don’t have the programming chops to do it, contributing to Musicbrainz is one way to help out.

The Musicbrainz dataset is under the Creative Commons CC0 license, which places all the data in the public domain. This means that your contributions will stay open to the public and we won’t have another CDDB/Gracenote situation, where people contributed to a database that ended up charging for access. All you need to get started is to create a Musicbrainz account.

A typical contribution looks something like this: I’ll decide to rip one of my CDs and pop it in the drive. I launch Rhythmbox, which will tell me if it’s not recognized:

[Screenshot: Rhythmbox reporting an unknown album]

When you click the “Submit Album” button, the application sends you to the Musicbrainz website with the Table Of Contents (TOC) information from the CD in the URL. Once on the site you will need to search for the Artist or Release to add the TOC to:

[Screenshot: searching Musicbrainz]

Most of the time there will be an entry in the database for the artist, and all that needs to happen is to add the TOC (which you are still carrying from page to page in the URL) to one of the artist’s CDs. In cases where the search returns multiple matches I usually explore the different options by ctrl+clicking on the results to open them in separate tabs.

[Screenshot: Musicbrainz search results]

Click through the artist’s albums until you find the one you are looking for, or add one if you need to. In this case there was one already there (and all the details, including the catalog number, matched) so my next step was to click “Attach CD TOC”. This takes the TOC you can see in the address bar and attaches it to that release.

[Screenshot: attaching the CD TOC to a release]

You will be asked to add a note describing where this information is coming from. In this case it’s coming from the CD. Add the note and you are done. What makes contributing to Musicbrainz particularly gratifying is that the next time you put in the CD, it is recognized right away. My favourite kind of gratification: instant. You can also immediately see the results in Rhythmbox, as well as Banshee and any other application that uses Musicbrainz.

[Screenshot: Rhythmbox recognizing the album]

[Screenshot: Banshee looking up the same album]

It’s pretty great thinking that the few minutes invested in this process not only solve your immediate problem of having an unrecognised CD, but also make software all over the Linux ecosystem just a tiny bit better. That’s a win/win situation if I’ve ever seen one.

Getting to know SQLite3

I’m finding SQLite3 super useful lately. It’s great for any kind of experimentation and a quick, painless way to persist data. There were just a few things I needed to wrap my head around before I started to feel comfortable with it.

As with most things on Debian based systems, installing is really easy:
sudo apt-get install sqlite3 libsqlite3-dev

My first real question was about datatypes. What does SQLite support? It was a bit mysterious to read that SQLite has 5 datatypes (null, integer, real (i.e. float), text, blob) but then see a MySQL-style create table statement like this work:

create table people(
  id integer primary key autoincrement,
  name varchar(30),
  age integer,
  awesomeness decimal(5,2)
);

How are varchar and decimal able to work? Worse still, why does something like this work:

create table people(
  id integer primary key autoincrement,
  name foo(30),
  age bar(100000000),
  awesomeness baz
);

As it happens SQLite maps certain terms to its internal datatypes:

If the declared type contains the string “INT” then it is assigned INTEGER affinity.

If the declared type of the column contains any of the strings “CHAR”, “CLOB”, or “TEXT” then that column has TEXT affinity. Notice that the type VARCHAR contains the string “CHAR” and is thus assigned TEXT affinity.

If the declared type for a column contains the string “BLOB” or if no type is specified then the column has affinity NONE.

If the declared type for a column contains any of the strings “REAL”, “FLOA”, or “DOUB” then the column has REAL affinity.

Otherwise, the affinity is NUMERIC.

So the foo, bar and baz columns above, being unrecognized, would have received an affinity of NUMERIC, and would try to convert whatever was inserted into them into a numeric format. You can read more about the ins and outs of type affinities in the docs. The main thing to grasp up front is that, syntax-wise, you can usually write whatever you are comfortable with and it will probably work; just keep in mind that affinities are being set, and you will know where to look when you see something strange happening. For the most part this system of affinities does a good job of not violating your expectations, regardless of what database you are used to using.

The other thing to get is that SQLite determines the datatype from the values themselves. Anything in quotes is assumed to be a string, unquoted digits are integers (or, if they have a decimal point, “real”), and a blob is a string of hex digits prefixed with an x: x’00ff’.

So the safest/easiest thing might just be to leave the column definitions out altogether so they will all have an affinity of none and let the values speak for themselves.
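
You can watch affinity at work with the typeof() function, which reports the storage class SQLite actually used (the table name here is just for the demo):

sqlite> create table affinity_demo(n numeric);
sqlite> insert into affinity_demo values ('42');
sqlite> select n, typeof(n) from affinity_demo;
42|integer

The quoted string ‘42’ came back as an integer, because the column’s NUMERIC affinity converted it on the way in.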

The rest of my learning about SQLite is really a grab bag of little goodies:

Getting meta information about tables, indexes or the database itself is done with a pragma statement.
For example, if I want information about the people table:

sqlite> pragma table_info(people);
0|id|integer|0||1
1|name|foo(30)|0||0
2|age|bar(100000000)|0||0
3|awesomeness|baz|0||0
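
If the bare pipe-separated output is hard on the eyes, the sqlite3 shell’s built-in dot commands can dress it up:

sqlite> .headers on
sqlite> .mode column
sqlite> pragma table_info(people);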

You can get that same list of info within Ruby like so (after running “gem install sqlite3”):

require 'sqlite3'
@db = SQLite3::Database.new("cats.db")
table_name = "cats"
@db.table_info(table_name)

A complete list of pragma statements can be found in the docs.

To open or create a database simply run sqlite3 with the name of the file:

mike@sleepycat:~☺ sqlite3 cats.db

And finally if you have a file with sql statements you would like to run on a database:

mike@sleepycat:~☺ sqlite3 cats.db < insert_all_the_cats.sql

It’s been good to get to know SQLite3 a little better. Before this I had only really come in contact with it through my Rails development work, and knew it only as the in-memory test database or the one I would use when I couldn’t be bothered to set up a “real” database. The more I look at it, the more it seems like a really powerful and useful tool.

Opening the results of a grep in Vim

This week, while removing some deprecated syntax from our old tests, I found myself wondering how to take the results of my grep command and open the files returned in Vim. Moments like this are great for getting to know your tools better, so I decided to figure it out. Opening the files turned out to be not that difficult:

vim -o `grep -ril Factory.build test/`

This did exactly the right thing: a case-insensitive (-i), recursive (-r) search of the test directory for the problematic syntax (Factory.build), returning only filenames (-l), and then opening the results in Vim. I was able to quickly make my changes with a search and replace. With that done, you can write all buffers and close Vim with:

:xa

I really love finding stuff like that. While that solved the immediate need, as written it would not be able to deal with file names that have spaces in them. Since that sort of thing bugs me, I set up some test files and gave it a try:

mike@sleepycat:~/projects/vimtest☺  tail ./*
==> ./fileone <==
blah
blah
cats
blah

==> ./filethree <==
blah
rabbits
blah

==> ./file two <==
blah
racoons
dogs and cats
blah

After a little tinkering I figured it out, with a little help from the xargs man page. It’s not something I would want to type often, but it will make a very useful alias or script some time:

grep -Zril cats . | xargs -0 sh -c 'vim -O "$@" < /dev/tty' vim
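
Since it would make a useful alias, here is the same command wrapped up as a shell function (the grepvim name is mine; pick whatever you like):

# drop this in ~/.bashrc; usage: grepvim cats
grepvim() {
  grep -Zril "$1" . | xargs -0 sh -c 'vim -O "$@" < /dev/tty' vim
}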

Bash Boilerplate

I’ve ended up doing a lot of bash scripting lately and am really shocked at how difficult it is to sort out even the most mundane things, like handling long and short options, and other cases like ensuring the script is run as root. So here is my bash boilerplate: 45 lines of time- and sanity-saving bash script. It has been lovingly assembled in Frankensteinian fashion from about a dozen blogs and Stack Overflow answers, and owes a great debt to missiondata’s excellent post on getopt. Thank you, internet.

I have several scripts using variations on this now that I use reasonably regularly, and it’s a real time saver to use this as my starting point. If you notice a mistake or something that is flat out a bad idea, please let me know. I’d like to make this as solid as possible.


#!/usr/bin/env bash
# set the script to exit immediately on error
set -e

usage(){
cat <<'EOT'
Call this script with...
EOT
exit 0;
}

# exit if there are no arguments
[ $# -eq 0 ] && usage

set -- `getopt -n$0 -u -a --longoptions "help dogs: cats:" "hd:c:" "$@"`

# $# is the number of arguments
while [ $# -gt 0 ]
do
case "$1" in
-d|--dogs) dogs="$2"; shift;;
-c|--cats) cats="$2"; shift;;
-h| --help) usage;;
--) shift;break;;
*) break;;
esac
shift
done

cleanup_before_exit () {
# this code is run before exit
echo -e "Running cleanup code and exiting"
}
# trap catches the exit signal and runs the specified function
trap cleanup_before_exit EXIT

# if this script needs to be run as root
# we can check like this:
# if [ ! `id -u` -eq 0 ]; then
#  echo "You need to call this script using sudo."
#  exit 1
# fi

# OK now do stuff
echo "$cats and $dogs living together, mass hysteria!"