Packaging, pid-files and systemd

When I first built my ArangoDB package, one of the problems I had was getting ArangoDB to start after a reboot. While reworking it for Arango 3.0 I ran into this again.
The reason this can be tricky is that ArangoDB, like basically all forking processes, needs to write a pid file somewhere. Where things get confusing is that anything you create in /var/run will be gone the next time you reboot, leading to errors like this:

-- Unit arangodb.service has begun starting up.
Aug 24 08:50:27 longshot arangod[10366]: {startup} starting up in daemon mode
Aug 24 08:50:27 longshot arangod[10366]: cannot write pid-file '/var/run/arangodb3/arangod.pid'
Aug 24 08:50:27 longshot systemd[1]: arangodb.service: Control process exited, code=exited status=1
Aug 24 08:50:27 longshot systemd[1]: Failed to start ArangoDB.
-- Subject: Unit arangodb.service has failed

If you DuckDuckGo it you can see that people stumble into this pretty regularly.

To understand what’s going on here, it’s important to know what /var/run is actually for.

The Filesystem Hierarchy Standard describes it as a folder for “run-time variable data” and lays out some rules for the folder:

This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process. Programs may have a subdirectory of /var/run; this is encouraged for programs that use more than one run-time file. Process identifier (PID) files, which were originally placed in /etc, must be placed in /var/run. The naming convention for PID files is <program-name>.pid. For example, the crond PID file is named /var/run/crond.pid.

Since those words were written in 2004, the evolving needs of init systems, variations across distributions and the idea of storing pid-files (which shouldn’t survive reboot) with logs and stuff (which should) have all conspired to push for the creation of a standard place to put ephemeral data: /run.

Here in 2016, /run is a done deal, and for backwards compatibility, /var/run is now simply a symlink to /run:

mike@longshot ~/$  ls -l /var/
total 52
...
lrwxrwxrwx  1 root root     11 Sep 30  2015 lock -> ../run/lock
lrwxrwxrwx  1 root root      6 Sep 30  2015 run -> ../run
...

Looking back at our cannot write pid-file '/var/run/arangodb3/arangod.pid' error, a few things are clear. First, we should probably stop using /var/run since /run has been standard since around 2011.

Second, our files disappear because /run is a tmpfs. While there are some subtleties, it’s basically storing your files in RAM.
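If you want to check this on your own machine, here is a minimal sketch. The fs_type helper is a name I made up for illustration, and it assumes a stat that supports -f and -c (GNU coreutils does).

```shell
# Print the filesystem type backing a path.
# Assumes GNU coreutils stat; fs_type is a hypothetical helper name.
fs_type() {
  stat -f -c %T "$1"
}

# On a typical systemd machine this should report tmpfs for /run.
echo "/run is on: $(fs_type /run)"
```

If the answer is tmpfs, anything written there lives in memory and is gone after a reboot.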

So the question is: how do we ensure our /run folder is prepped with our /run/arangodb3 directory (and whatever other files) before our systemd unit file is run? As it happens, systemd has a subproject that deals with this: tmpfiles.d.

The well-named tmpfiles.d creates tmpfiles in /run and /tmp (and a few others). It does this by reading conf files written in a simple configuration format out of certain folders. A quick demo:

mike@longshot ~$  sudo bash -c "echo 'd /run/foo 0755 mike users -' > /usr/lib/tmpfiles.d/foo.conf"
mike@longshot ~$  sudo systemd-tmpfiles --create foo.conf
mike@longshot ~$  ls -l /run
...
drwxr-xr-x  2 mike     users     40 Aug 24 14:18 foo
...

While we specified an individual conf file by name, running systemd-tmpfiles --create with no arguments would process every conf file in /usr/lib/tmpfiles.d/ (along with /etc/tmpfiles.d/ and /run/tmpfiles.d/).

mike@longshot ~$  ls -l /usr/lib/tmpfiles.d/
total 104
-rw-r--r-- 1 root root   30 Jul  5 10:35 apache.conf
-rw-r--r-- 1 root root   78 May  8 16:35 colord.conf
-rw-r--r-- 1 root root  574 Jul 25 17:10 etc.conf
-rw-r--r-- 1 root root  595 Aug 11 08:04 gvfsd-fuse-tmpfiles.conf
-rw-r--r-- 1 root root  362 Jul 25 17:10 home.conf
...

Tying all this together is a systemd service that runs just before sysinit.target and uses that same command to create all the tmpfiles:

mike@longshot ~/$  systemctl cat systemd-tmpfiles-setup.service
# /usr/lib/systemd/system/systemd-tmpfiles-setup.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Create Volatile Files and Directories
Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=local-fs.target systemd-sysusers.service
Before=sysinit.target shutdown.target
RefuseManualStop=yes

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev

If your unit file includes After=sysinit.target, you know that the tmpfiles you specified will exist when your unit is started.

Knowing that this plumbing is in place, your package should include a conf file which gets installed into /usr/lib/tmpfiles.d/. Here is mine for ArangoDB:

mike@longshot ~/projects/arangodb_pkg (master)$  cat arangodb-tmpfile.conf 
d /run/arangodb3 0755 arangodb arangodb -
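
For anyone who finds that line cryptic, the columns are described in man tmpfiles.d as type, path, mode, user, group and age. Here is a small sketch that labels them; the describe_tmpfiles_line function is my own hypothetical helper, not something systemd ships.

```shell
# Split one tmpfiles.d line into its documented columns and label them.
# describe_tmpfiles_line is a hypothetical helper for illustration only.
describe_tmpfiles_line() (
  # Feed the line to read via a here-document so this works in plain sh.
  read -r type path mode user group age <<EOF
$1
EOF
  printf 'type=%s path=%s mode=%s user=%s group=%s age=%s\n' \
    "$type" "$path" "$mode" "$user" "$group" "$age"
)

describe_tmpfiles_line 'd /run/arangodb3 0755 arangodb arangodb -'
# prints: type=d path=/run/arangodb3 mode=0755 user=arangodb group=arangodb age=-
```

The d type means "create this directory if it is missing", and the - in the age column means no time-based cleanup is applied to its contents.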

While this will ensure that tmpfiles are created next time the computer boots, we also need to make sure the service can be started right now. If you are packaging software for Arch Linux, that means having a post_install hook that looks like this:

post_install() {
  systemd-tmpfiles --create arangodb.conf
}

If you are running systemd, and you probably are, this is the way to go. While it’s not hard to find people using mkdir in their unit file’s ExecStartPre section (been there, done that) or writing some sort of startup script, this is much cleaner. Make use of the infrastructure that is there.

Running a Rails app with Systemd and liking it

Systemd has, over the strident objections of many strident people, become the default init system for a surprising number of Linux distributions. Though I’ve been aware of the drama, the eye-rolling, the uh... enterprising nature of systemd, I really have only just started using it myself. All the wailing and gnashing of teeth surrounding it left me unsure what to expect.

Recently I needed to get a Proof-Of-Concept app I built running so a client could use it on their internal network to evaluate it. Getting my Rails app to start on boot was pretty straightforward, and I’m going to be using this again, so I thought I would document it here.

First I created a “rails” user and group, and in /home/rails I installed my usual Rbenv setup. The fact that only root is allowed to listen on ports below 1024 conflicts with my plan to run my app as the “rails” user and listen on port 80. The solution is setcap:

setcap 'cap_net_bind_service=+ep' .rbenv/versions/2.2.2/bin/bundle

With that capability added, I set up my systemd unit file in /usr/lib/systemd/system/myapp.service and added the following:

[Unit]
Description=MyApp
Requires=network.target
Requires=arangodb.service

[Service]
Type=simple
User=rails
Group=rails
WorkingDirectory=/home/rails/myapp
ExecStart=/usr/bin/bash -lc 'bundle exec rails server -e production --bind 0.0.0.0 --port 80'
TimeoutSec=30
RestartSec=15s
Restart=always

[Install]
WantedBy=multi-user.target

The secret sauce that makes this work with rbenv is the “bash -l” in the ExecStart line. This makes bash behave as though it were a login shell, meaning that the login profile (~/.bash_profile or ~/.profile, which usually sources .bashrc) with all the PATH exports and rbenv init stuff will be sourced before the command I give it is run. In other words, exactly what happens normally.
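
To see the difference the -l makes, here is a minimal sketch. It builds a throwaway HOME with a .bash_profile that exports a marker variable, standing in for the real PATH and rbenv init lines. The names demo_home, DEMO_MARKER and marker_seen_by are all made up for the demo.

```shell
# Throwaway HOME whose login profile exports a marker variable.
# demo_home and DEMO_MARKER are hypothetical names for this demo.
demo_home="$(mktemp -d)"
echo 'export DEMO_MARKER=from-login-profile' > "$demo_home/.bash_profile"

# Run bash with a clean environment and report whether it saw the marker.
# $1 is "-c" for a plain shell or "-lc" for a login shell.
marker_seen_by() {
  env -i HOME="$demo_home" PATH="$PATH" \
    bash "$1" 'echo "${DEMO_MARKER:-unset}"' 2>/dev/null | tail -n 1
}

echo "bash -c  sees: $(marker_seen_by -c)"
echo "bash -lc sees: $(marker_seen_by -lc)"
```

The plain shell should report unset while the login shell reports from-login-profile. The same thing happens under systemd: without the -l, a service starts with a bare environment and never finds the rbenv shims.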

From there, I just start the service like all the rest of them:

systemctl enable myapp.service
systemctl start myapp.service

This Just Works™ and got the job done, but in the process I find I am really starting to appreciate Systemd. Running daemons is complicated, and with the dropping of privileges, ordering, isolation and security options, there is a lot to get right… or wrong.

What I am liking about Systemd is that it is taking the same functions that Docker is built on, namely cgroups and namespacing, and giving you a declarative way of using them while starting your process. Doing so puts some really nice (and otherwise complicated) security features within reach of anyone willing to read a man page.

PrivateTmp=yes is a great example of this. Simply adding that to the unit file above (which you should if you call Tempfile.new in your app) closes off a bunch of security problems because systemd “sets up a new file system namespace for the executed processes and mounts private /tmp and /var/tmp directories inside it that is not shared by processes outside of the namespace”.

Could I get the same effect as PrivateTmp=yes with unshare? With some fiddling, but Systemd makes it a zero cost option.

There is also ProtectSystem=full to mount /boot, /usr and /etc as read only which “ensures that any modification of the vendor supplied operating system (and optionally its configuration) is prohibited for the service”. Systemd can even handle running setcap for me, resulting in beautiful stuff like this, and there is a lot more in man systemd.exec besides.
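
As a rough sketch of how these options could be layered onto the unit above, here is a drop-in file written from the shell. This is my own illustration rather than the example linked above: the file name hardening.conf and the DROPIN_DIR variable are assumptions, and AmbientCapabilities= needs systemd 229 or newer.

```shell
# Where to write the drop-in. A real install would use
# /etc/systemd/system/myapp.service.d; the default here is a scratch
# directory so the sketch can be run without root.
# DROPIN_DIR and hardening.conf are hypothetical names.
dropin_dir="${DROPIN_DIR:-$(mktemp -d)}"

cat > "$dropin_dir/hardening.conf" <<'EOF'
[Service]
# Private /tmp and /var/tmp, invisible to other processes
PrivateTmp=yes
# Mount /usr, /boot and /etc read-only for this service
ProtectSystem=full
# Allow binding to ports below 1024 without root (systemd 229 or newer)
AmbientCapabilities=CAP_NET_BIND_SERVICE
EOF

cat "$dropin_dir/hardening.conf"
```

Once a file like this is in the real drop-in directory, a systemctl daemon-reload followed by a restart of the service should pick it up.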

For me, one of the things that has become clear over the last few years is that removing “footguns” from our software is really important. All the work that is going into making the tools (like rm -rf) and languages (Rust!) we use less spectacularly dangerous is critical to raising the security bar across the industry.

The more I learn about Systemd the more it seems to be a much needed part of that.