A look at Overlay FS

A lot has been written about how Docker combines Linux kernel features like namespaces and cgroups to isolate processes. One overlooked kernel feature that I find really interesting is Overlay FS.

Overlay FS was merged into the mainline kernel back in 2014 (version 3.18), and provides a way to “present a filesystem which is the result of overlaying one filesystem on top of the other.”

To explore what this means, let’s create some files and folders to experiment with.

$ for i in a b c; do mkdir "$i" && touch "$i/$i.txt"; done
$ mkdir merged
$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
└── merged

4 directories, 3 files

At this point we can use Overlay FS to overlay the contents of a, b and c and mount the result in the merged folder.

$ sudo mount -t overlay -o lowerdir=a:b:c none merged
$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
└── merged
    ├── a.txt
    ├── b.txt
    └── c.txt

4 directories, 6 files
$ sudo umount merged

With merged containing the union of a, b and c, the name “union mount” suddenly makes a lot of sense.
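One detail worth knowing: when the same filename exists in more than one lower directory, the leftmost entry in lowerdir wins. Here is a quick sketch to confirm this (same.txt is just a throwaway name for the test, and I clean up afterwards so the rest of the walkthrough is unaffected):

$ echo "from a" > a/same.txt
$ echo "from b" > b/same.txt
$ sudo mount -t overlay -o lowerdir=a:b:c none merged
$ cat merged/same.txt
from a
$ sudo umount merged
$ rm a/same.txt b/same.txt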

If you try to write to the files in our union mount, you will discover they are not writable; with only lower directories, the whole overlay is mounted read-only.

$ echo a > merged/a.txt
bash: merged/a.txt: Read-only file system

To make them writable, we will need to provide an “upper” directory, and an empty scratch directory called a “working” directory, which must live on the same filesystem as the upper directory. We’ll use c as our writable upper directory.

$ mkdir working
$ sudo mount -t overlay -o lowerdir=a:b,upperdir=c,workdir=working none merged

When we write to a file that comes from one of the lower directories, Overlay FS first copies it up into the upper directory and applies the write to the copy. Writing to merged/a.txt creates a new file in the upper directory with a different inode than a/a.txt.

$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
├── merged
│   ├── a.txt
│   ├── b.txt
│   └── c.txt
└── working
    └── work [error opening dir]

6 directories, 6 files
$ echo a > merged/a.txt
$ tree --inodes
.
├── [34214129]  a
│   └── [34214130]  a.txt
├── [34217380]  b
│   └── [34217392]  b.txt
├── [34217393]  c
│   ├── [34737071]  a.txt
│   └── [34211503]  c.txt
├── [34217393]  merged
│   ├── [34214130]  a.txt
│   ├── [34217392]  b.txt
│   └── [34211503]  c.txt
└── [34737069]  working
    └── [34737070]  work [error opening dir]

6 directories, 7 files

Writing to merged/c.txt modifies the file in place, since c is our writable upper directory. Note that c/c.txt keeps the same inode (34211503) and no new file appears.

$ echo c > merged/c.txt
$ tree --inodes
.
├── [34214129]  a
│   └── [34214130]  a.txt
├── [34217380]  b
│   └── [34217392]  b.txt
├── [34217393]  c
│   ├── [34737071]  a.txt
│   └── [34211503]  c.txt
├── [34217393]  merged
│   ├── [34214130]  a.txt
│   ├── [34217392]  b.txt
│   └── [34211503]  c.txt
└── [34737069]  working
    └── [34737070]  work [error opening dir]

6 directories, 7 files
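Deletion is handled with a related trick. Removing a file that lives in a lower directory can’t actually touch the lower layer, so Overlay FS records the deletion as a “whiteout” in the upper directory: a character device with device number 0/0 that masks the file below. A quick sketch, assuming the overlay from above is still mounted:

$ rm merged/b.txt
$ ls -l c/b.txt    # a character device with device numbers 0, 0 — the whiteout
$ ls merged        # b.txt is gone from the union

This is also how a file deleted in one Docker image layer stays hidden, even though it still exists in a lower layer.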

After a little fooling around with Overlay FS, the GraphDriver output from docker inspect starts looking pretty familiar.

$ docker inspect node:alpine | jq .[].GraphDriver.Data
{
  "LowerDir": "/var/lib/docker/overlay2/b999fe6781e01fa651a9cb42bcc014dbbe0a9b4d61e242b97361912411de4b38/diff:/var/lib/docker/overlay2/1c15909e91591947d22f243c1326512b5e86d6541f83b4bf9751de99c27b89e8/diff:/var/lib/docker/overlay2/12754a060228233b3d47bfb9d6aad0312430560fece5feef8848de61754ef3ee/diff",
  "MergedDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/merged",
  "UpperDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/diff",
  "WorkDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/work"
}

We can use these paths just like Docker does to mount the filesystem for the node:alpine image into our merged directory, and then take a peek at the node binaries the image includes.

$ lower=$(docker inspect node:alpine | jq -r '.[].GraphDriver.Data.LowerDir')
$ upper=$(docker inspect node:alpine | jq -r '.[].GraphDriver.Data.UpperDir')
$ sudo mount -t overlay -o lowerdir=$lower,upperdir=$upper,workdir=working none merged
$ ls merged/usr/local/bin/
docker-entrypoint.sh  node  nodejs  npm  npx  yarn  yarnpkg

From there we could do a partial version of what Docker does for us, using the unshare command to give a process its own mount namespace and chroot it to the merged folder. With our merged directory as its root, running ls /usr/local/bin should give us those node binaries again.

$ sudo unshare --mount --root=./merged ls /usr/local/bin
docker-entrypoint.sh  nodejs                npx                   yarnpkg
node                  npm                   yarn

Seeing Overlay FS and Docker’s usage of it has really helped flesh out my mental model of containers. Watching docker pull download layer after layer has taken on a whole new significance.

The problem of selling software on Linux

Recently I listened to Bryan Lunduke’s talk “Why Linux Sucks”. One of the arguments he made was that the open source/free software development model has produced a lot of great software, but has struggled to produce highly specialised and sophisticated software like Photoshop or some of the top audio/video editing apps.

The argument is certainly food for thought, and he points out that projects like the Linux kernel that are making progress very quickly are largely commercially backed. From there he talks about the need for developers to get paid, and lauds the Ubuntu Software Centre for offering the ability to sell software as a potential solution. I have been digesting this for a little while now, and I think the concept of selling software can be deeply problematic.

The difficulty I have is that the economic incentives push the developer to maximise the amount of money they collect by treating each version as a separate product. When you sell someone a particular version, they will keep that version until they feel it is worth shelling out for the next one. My Dad, for instance, ran Office 97 for almost 10 years, because he never felt the new features were worth the money.

The problem with this situation lies in the fact that old software lingering on your system is a vector for attack. Microsoft’s failure to entice my Dad into upgrading means his computer is now at risk of infection and compromise by all sorts of malware. Once he is infected and his computer starts attacking other people, this becomes everyone’s problem.

I think one of the strengths of the Linux platform is that all this “free as in speech” software has largely translated into “free as in beer” software, which in turn means the barrier to staying up-to-date is nearly non-existent. Your package manager keeps installing the latest version for free, you don’t need to worry about viruses and malware, and I don’t need to worry that your computer is attacking mine. Throwing paid-for software into this ecosystem seems like a way to compromise one of the best things about the platform. While Bryan Lunduke’s talk was good for stimulating some discussion, I have come to feel that he has misdiagnosed the problem.

At a recent Android meetup I met a guy who was being paid by IBM to work on PhoneGap (now known as Apache Cordova). When I asked him why IBM would do this, his answer was that IBM has hundreds of people trained and familiar with web technologies, and almost none familiar with Objective-C and the rest of the Apple technologies. With a need to make a bunch of apps for a variety of mobile devices and a bunch of trained web developers on staff, they looked around for a technology that could bridge that gap and found PhoneGap. After a little poking around they determined that it didn’t do everything they wanted, but was close enough that paying someone to work on improving PhoneGap was cheaper than retraining all their web developers in Java and Objective-C.

Coming back to the lack of good image/video/audio editors on Linux, it seems to me, then, that we have a chicken-and-egg type problem. The editors on Linux don’t attract financial backing because they haven’t yet crossed the “good enough” threshold for basic professional use. While that sounds a little circular, I think it’s sound. Just like IBM looking at PhoneGap and deciding it was close enough to what they wanted that a minor investment could make it fit their needs, GIMP and other programs could find that both monetary and code contributions start rolling in once they cross that “close enough” threshold for their industry.

A good example is the GIMP. Much maligned for its interface, its real problem is that its lack of CMYK support has meant that anyone who deals with printing presses has been unable to use it. The GIMP team recently announced that 90% of the GIMP core has been ported to GEGL, which allows it to operate in different colour spaces like CMYK. This means that price-sensitive people in the printing industry can finally consider using GIMP in their daily workflow, and may just find that it’s “good enough”. When they do, they too may find that throwing a few dollars towards the project to smooth out some rough edges makes the same kind of sense to them as it did to IBM.

For my part, I took Bryan’s advice and checked out the Ardour audio editor he used as one of his examples. He mentioned that it was getting some traction in the industry and was in need of donations to keep the developer working on it full time. I decided to donate monthly to the project, hoping that it too will cross that threshold into “good enough” and start attracting some investment from professionals looking for improvements. Here’s hoping!

Dear Canonical: Need Data.

Dell’s partnership with Canonical in 2007 was pretty exciting news. In my professional life I have spent close to half a million dollars on computer hardware, with the majority of it going to Dell. All my personal computers have come from them as well. When I decided to get myself a new laptop, I went straight to Dell.ca/ubuntu to pick out my dream machine.

After a little poking around I realised that they only sell Ubuntu on the low end machines. At the time (almost a year and a half ago now) they sold Ubuntu preinstalled on at least one desktop machine and on maybe three different laptops. Their best one was the lowest end of the XPS line and had very few customisation options.

While I really wanted to buy a system with Ubuntu preinstalled so Dell would know that a market for this stuff actually exists, I also knew I would be doing myself a disservice by buying something less powerful than I needed. After agonising over it I bit the bullet and bought the top of the range XPS with most of the bells and whistles… and Windows Vista.

Looking again, Dell seems to have reduced even that meagre offering and is now only offering Ubuntu preinstalled on the Dell Mini netbook and a low-end Inspiron laptop. Aside from the fact that Ubuntu is almost impossible to find for anyone not typing in the URL directly, the worst part is that the Inspiron 15 with Ubuntu is $579 while the Inspiron 15 with Windows 7 is $569. Perhaps I’m old-fashioned, but a computer with a $110 copy of Windows 7 on it should be more expensive than one without. It would be nice if one of the most obvious benefits of Linux were a little more obvious in the pricing. For a partner, Dell seems to have some funny ideas about what will help Ubuntu sell well.

While I believe Dell was correct in sensing that there is a market out there, and gutsy enough to try it out, its Ubuntu offerings seem to be languishing. Partly it’s pricing silliness and poor marketing, but mostly it seems like they are misreading the market. It’s been my experience that Linux users don’t buy low-end hardware. Low-end hardware in the Linux community seems to be something that is either gifted or salvaged, not something you purchase. When purchasing, it’s mid-range to high-end systems they are after. With that in mind, I can’t help but feel that Dell has missed the mark with their offering.

I suspect they missed it for the same reason most other companies still think there is no money to be made in the Linux world: there’s very little data. While Canonical is starting to gather a little data for their servers, I think the desktop probably needs it more. Perhaps it’s time for an Ubuntu version of the Steam Hardware Survey. Maybe they should consider some demographic surveys as well. I think it’s time we find out how big the GNU dollar is.

I think this needs to happen before companies fall prey to a self-fulfilling prophecy: offer a product blindly, receive an underwhelming response, and conclude that there really is no money to be made from the 12 million+ Ubuntu machines and the unknown number of users of other distros. This kind of data would be invaluable not only to developers considering a Linux version of an existing program, but also to partners like Dell who just seem to need a bit of a nudge in the right direction.

What say you Canonical?

Steam on Linux will be a very big deal.

Today Phoronix “officially” announced that Steam is coming to Linux. While their definition of “official” doesn’t seem to require any input from Valve (the company that makes Steam), it does seem likely that they are correct given the screenshots and whatnot circulating on their site.

It’s been pointed out before (I can’t remember where) that gamers are a perfect target market for Linux: they are more technical than the average user, hungry for performance, and geeky. I also think Linux and gaming are a match made in heaven, and now that it seems like it may actually happen, it does make you wonder about the ripple effect this will have across the industry.

For Microsoft there are a few interesting implications. First, one of the top games companies has gone cross-platform. To do this, Valve would have had to ditch Microsoft’s DirectX and code against OpenGL. Losing a company like Valve is bad, but it’s made worse since Steam is now cross-platform (currently on Windows and Mac, Linux rumours aside) and as such incentivises other developers using Steam as a sales platform to follow suit.

Judging from the recent success of Wolfire’s Humble Indie Bundle, Mac and Linux users are worth the trouble of reaching out to. Had Valve been a little faster at making Steam cross-platform, some of the nearly $1.2 million Wolfire raised could have gone to Valve. All the games in the bundle are available via Steam already, but only for Windows. Hopefully other developers were paying attention.

Also, pretty much every Linux user I know keeps a Windows machine around for gaming, and there is a pretty sizeable number of Mac users out there who do the same. If you can get all (or even most) of your favourite games for your chosen OS, why keep Windows around?

Linux users may have some additional reasons to keep a Windows box around (Photoshop jumps to mind), but Mac users will likely just jettison Windows entirely the first chance they get. This effect will probably be noticeable only in some indirect ways: more pressure on OEMs like Dell or HP to deliver machines without Windows, sluggish sales of the next version of Windows. While it won’t be huge, I think this effect would probably be big enough to be noticed by Microsoft. That said, I don’t think you could ever make a direct causal link.

The effect on the Linux community is also interesting to think about. Will we see stripped-down gaming distros, tweaked to get the highest possible frames per second, running Xfce (or even twm), Steam and not much more? Imagine having both game and OS compiled from source specifically to squeeze every bit of performance out of your processor. Gentoo… I think we have found your calling. Of course, if gamers come to expect the ability to significantly optimise their operating systems, they may start demanding that of their video drivers as well…

Even though Steam on Linux is still vaporware at the moment, it’s awfully fun to speculate about. While it’s still possible that it won’t actually happen, one thing is for sure: Steam on Linux would be a very big deal.

The budding business case for Linux

Wolfire Games has teamed up with some of the other top indie game developers and put together a cross-platform game bundle that is selling like hotcakes. It includes the games World of Goo, Aquaria, Gish, Lugaru and Penumbra. Their “pay what you want” game bundle has made them half a million dollars (and counting), to be split amongst the developers from the participating studios as well as selected charities (the EFF and Child’s Play). While the sale of the bundle is generating some good money, it’s also generating some fascinating statistical data:

Platform   Market share   Donations   Revenue
Windows    ~90%           65%         52%
Mac        ~6%            21%         25%
Linux      ~1%            14%         23%

There is plenty of conjecture about why the numbers shake out that way but, pretty much any way you look at them, it looks like a pretty solid business case for supporting Mac and Linux. Until recently there was precious little data about what it might be like selling into the Linux community. Most simply wrote it off, assuming that the ~1% market share was all there was to the story. While not every company will have the same experience as Wolfire, I think they have proved there is more going on than the 1% suggests.

This bodes particularly well for Valve, with the imminent release of their Steam platform for the Mac (currently in beta) and their upcoming version for Linux. It will be really interesting to see what kind of numbers they come up with after running Steam across all three platforms. If they are anything like the numbers from Wolfire, the next couple of years are going to be pretty interesting for Linux.

I also wonder if the bundle method could be harnessed as a broader way of funding Free Software development. Imagine a web of cross-promotion where developers with easily monetised software (like a game) include a donation to a less easily monetised project (like a BitTorrent client), exactly as was done with the EFF in the Humble Bundle. Wolfire has done something inspirational, and shown there are lots of possibilities to explore. Go and support them!

Netstat

Netstat is one of those programs that most computer people use but very few understand. Because I am one of those people, I decided to write this to change that. Netstat displays a listing of network connections and their status, which can be very useful for anyone concerned with the security of their machine. Not only does it tell you who your machine is currently talking to, it also tells you if there are programs listening to accept connections from foreign computers. Typically the output of the command is pretty alarming because of the startling number of connections and the pretty arcane descriptions that go with them:

C:\>netstat -ano

Active Connections

Proto  Local Address        Foreign Address     State        PID
TCP    0.0.0.0:135          0.0.0.0:0           LISTENING    1104
TCP    0.0.0.0:445          0.0.0.0:0           LISTENING    4
TCP    0.0.0.0:1025         0.0.0.0:0           LISTENING    1336
TCP    0.0.0.0:2996         0.0.0.0:0           LISTENING    2912
TCP    0.0.0.0:3172         0.0.0.0:0           LISTENING    2912
TCP    0.0.0.0:3173         0.0.0.0:0           LISTENING    2912
TCP    0.0.0.0:5000         0.0.0.0:0           LISTENING    1672
TCP    74.104.77.xxx:139    0.0.0.0:0           LISTENING    4
TCP    74.104.77.xxx:3071   12.120.5.14:80      TIME_WAIT    0
TCP    74.104.77.xxx:3172   72.14.207.99:443    CLOSE_WAIT   2912
TCP    74.104.77.xxx:3173   72.14.205.83:443    CLOSE_WAIT   2912
TCP    127.0.0.1:2995       0.0.0.0:0           LISTENING    2912
TCP    127.0.0.1:2995       127.0.0.1:2996      ESTABLISHED  2912
TCP    127.0.0.1:2996       127.0.0.1:2995      ESTABLISHED  2912

Probably the most confusing column is the Local Address column. Your computer always has at least two (and sometimes more) IP addresses that it will answer to. The above example shows that the computer will answer to 74.104.77.xxx and 127.0.0.1 (the computer’s equivalent of “me”). The three addresses shown have different and special meanings.

127.0.0.1:port# – programs listening on this address will accept connections originating only from the local computer.

74.104.77.xxx:port# – programs listening on this address will accept connections originating from computers on the network/internet.

0.0.0.0:port# – programs listening on this address will accept connections from anywhere, local or remote, sent to any of the addresses the computer will answer to (in this case 127.0.0.1 and 74.104.77.xxx).
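The PID column (which the -o flag adds) is what lets you tie each connection back to a program. On Windows, the stock tasklist command can filter the process list by PID; for example, to identify the process that owns most of the connections in the listing above:

C:\>tasklist /FI "PID eq 2912"

The image name in the resulting row tells you which program is behind all those sockets.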

The State column refers to the state of the TCP connection. You won’t see this for UDP connections because they don’t have state like TCP does. Here is the list of options (plagiarised from some site I don’t remember; the definitions originally come from RFC 793):

LISTEN – represents waiting for a connection request from any remote TCP and port.

SYN-SENT – represents waiting for a matching connection request after having sent a connection request.

SYN-RECEIVED – represents waiting for a confirming connection request acknowledgment after having both received and sent a connection request.

ESTABLISHED – represents an open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.

FIN-WAIT-1 – represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.

FIN-WAIT-2 – represents waiting for a connection termination request from the remote TCP.

CLOSE-WAIT – represents waiting for a connection termination request from the local user.

CLOSING – represents waiting for a connection termination request acknowledgment from the remote TCP.

LAST-ACK – represents waiting for an acknowledgment of the connection termination request previously sent to the remote TCP (which includes an acknowledgment of its connection termination request).

TIME-WAIT – represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

CLOSED – represents no connection state at all.
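In practice, most of what you will see is LISTENING, ESTABLISHED and TIME-WAIT. If the full listing is too noisy, you can filter it with findstr (Windows’ rough equivalent of grep); for example, to show only the listening sockets:

C:\>netstat -ano | findstr LISTENING

On Linux, netstat -tlnp does roughly the same job: TCP listeners only, numeric addresses, with the owning program shown.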

Hopefully that will help make sense of the output netstat gives. It helped me at least :)