A look at Overlay FS

Lots has been written about how Docker combines linux kernel features like namespaces and cgroups to isolate processes. One overlooked kernel feature that I find really interesting is Overlay FS.

Overlay FS was built into the kernel back in 2014, and provides a way to “present a filesystem which is the result over overlaying one filesystem on top of the other.”

To explore what this means, lets create some files and folders to experiment with.

$ for i in a b c; do mkdir "$i" && touch "$i/$i.txt"; done
$ mkdir merged
$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
└── merged

4 directories, 3 files

At this point we can use Overlay FS to overlay the contents of a, b and c and mount the result in the merged folder.

$ sudo mount -t overlay -o lowerdir=a:b:c none merged
$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
└── merged
    ├── a.txt
    ├── b.txt
    └── c.txt

4 directories, 6 files
$ sudo umount merged

With merged containing the union of a,b and c suddenly the name “union mount” makes a lot of sense.

If you try to write to the files in our union mount, you will discover they are not writable.

$ echo a > merged/a.txt
bash: merged/a.txt: Read-only file system

To make them writable, we will need to provide an “upper” directory, and an empty scratch directory called a “working” directory. We’ll use c as our writable upper directory.

$ mkdir working
$ sudo mount -t overlay -o lowerdir=a:b,upperdir=c,workdir=working none merged

When we write to a file in one of the lower directories, it is copied into a new file in the upper directory. Writing to merged/a.txt creates a new file with a different inode than a/a.txt in the upper directory.

$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
├── merged
│   ├── a.txt
│   ├── b.txt
│   └── c.txt
└── working
    └── work [error opening dir]

6 directories, 6 files
$ echo a > merged/a.txt
$ tree --inodes
.
├── [34214129]  a
│   └── [34214130]  a.txt
├── [34217380]  b
│   └── [34217392]  b.txt
├── [34217393]  c
│   ├── [34737071]  a.txt
│   └── [34211503]  c.txt
├── [34217393]  merged
│   ├── [34214130]  a.txt
│   ├── [34217392]  b.txt
│   └── [34211503]  c.txt
└── [34737069]  working
    └── [34737070]  work [error opening dir]

6 directories, 7 files

Writing to merged/c.txt modifies the file directly, since c is our writable upper directory.

$ echo c > merged/c.txt
$ tree --inodes
.
├── [34214129]  a
│   └── [34214130]  a.txt
├── [34217380]  b
│   └── [34217392]  b.txt
├── [34217393]  c
│   ├── [34737071]  a.txt
│   └── [34211503]  c.txt
├── [34217393]  merged
│   ├── [34214130]  a.txt
│   ├── [34217392]  b.txt
│   └── [34211503]  c.txt
└── [34737069]  working
    └── [34737070]  work [error opening dir]

6 directories, 7 files

After a little fooling around with Overlay FS, the GraphDriver output from docker inspect starts looking pretty familiar.

$ docker inspect node:alpine | jq .[].GraphDriver.Data
{
  "LowerDir": "/var/lib/docker/overlay2/b999fe6781e01fa651a9cb42bcc014dbbe0a9b4d61e242b97361912411de4b38/diff:/var/lib/docker/overlay2/1c15909e91591947d22f243c1326512b5e86d6541f83b4bf9751de99c27b89e8/diff:/var/lib/docker/overlay2/12754a060228233b3d47bfb9d6aad0312430560fece5feef8848de61754ef3ee/diff",
  "MergedDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/merged",
  "UpperDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/diff",
  "WorkDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/work"
}

We can use these like Docker does to mount the file system for the node:alpine image into our merged directory, and then take a peek to see the nodejs binary that image includes.

$ lower=$(docker inspect node:alpine | jq .[].GraphDriver.Data.LowerDir | tr -d \")
$ upper=$(docker inspect node:alpine | jq .[].GraphDriver.Data.UpperDir | tr -d \")
$ sudo mount -t overlay -o lowerdir=$lower,upperdir=$upper,workdir=working none merged
$ ls merged/usr/local/bin/
docker-entrypoint.sh  node  nodejs  npm  npx  yarn  yarnpkg

From there we could do a partial version of what Docker does for us, using the unshare command to give a process it’s own mount namespace and chroot it to the merged folder. With our merged directory as it’s root, running ls /usr/local/bin command should give us those node binaries again.

$ sudo unshare --mount --root=./merged ls /usr/local/bin
docker-entrypoint.sh  nodejs                npx                   yarnpkg
node                  npm                   yarn

Seeing Overlay FS and Docker’s usage of it has really helped flesh out my mental model of containers. Watching docker pull download layer after layer has taken on a whole new significance.