Why Virtualenv?

Virtualenv comes up often when learning about Python. It’s a Python library that creates a folder into which you install all the libraries your project will need. While it’s often stated that you should use it, it’s not often explained why. I recently stumbled upon a good intro that gives the example of creating an application that uses requests, and then describes the scenario where running sudo pip install --upgrade requests while working on a separate project breaks the first application.
The idea that upgrading a library for one project could break some or all of my other projects that rely on it is bizarre and kind of terrifying. It’s nice that the solution to the problem is apparently Virtualenv, but why is this a problem to begin with?

The root of this problem seems to be Pip. If I install version 1.0 of the testing library nose, it gets placed (because I am using pyenv) in ~/.pyenv/versions/3.4.0/lib/python3.4/site-packages/. Looking in there, I can see folders for both the code and the metadata (the egg-info folder):

nose/                      nose-1.0.0-py3.4.egg-info/

If I run the pip install --upgrade command, you can see the problem unfold:

mike@sleepycat:~/projects/play/python_play☺  pip install --upgrade nose
Downloading/unpacking nose from https://pypi.python.org/packages/source/n/nose/nose-1.3.1.tar.gz#md5=672398801ddf5ba745c55c6eed79c5aa
  Downloading nose-1.3.1.tar.gz (274kB): 274kB downloaded
Installing collected packages: nose
  Found existing installation: nose 1.0.0
    Uninstalling nose:
      Successfully uninstalled nose
  Running setup.py install for nose

Yup, Pip only installs a single version of a library on your system. A quick look back in the ~/.pyenv/versions/3.4.0/lib/python3.4/site-packages/ folder confirms what Pip’s insanely verbose output told us; our nose 1.0 is gone:

nose/                      nose-1.3.1-py3.4.egg-info/
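Only one copy is ever importable anyway, because Python resolves an import purely by name, taking the first match it finds on sys.path. A quick sketch of how that resolution works (using the stdlib json module as a stand-in, since nose may not be installed):

```python
import importlib.util

# "import nose" is resolved by name alone: Python walks sys.path and
# loads the first match it finds, so only one installed version of a
# package can ever win.  json stands in for nose here, since it ships
# with the interpreter.
spec = importlib.util.find_spec("json")
print(spec.origin)  # the single location an "import json" would load from
```

Whatever single version pip left in site-packages is the one every project on that interpreter gets.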

This is pretty surprising for someone whose expectations have been shaped by Ruby’s package manager, RubyGems. There you can see multiple versions of the same library coexisting in the interpreter’s gems folder, meaning that my old projects can keep using the old version while my new projects use the newer one without carnage:

ls ~/.rbenv/versions/rbx-2.2.5/gems/gems/arel-
arel-3.0.3/ arel-4.0.1/

Returning to the reason for needing Virtualenv: at first glance, it seems you need Virtualenv to protect you from Pip’s inability to handle multiple versions of a library. What’s interesting is that both Virtualenv and Pip were written by the same person, Ian Bicking: Virtualenv in 2007 and Pip in 2008. This seems to suggest that installing a single version was a design decision, made because Pip assumes the existence and use of something like Virtualenv. That seems especially likely when you realize that Pip was aimed at replacing easy_install, an earlier tool which actually could handle multiple versions of the same library, as RubyGems had done since 2003.

So if you have ever wondered why you need Virtualenv, it seems we have an answer. Pip has pretty much completely replaced previous package managers, and it was developed assuming Virtualenv or something similar is being used… and its assumptions essentially force you to use it.

For those of us starting out with Python, sorting out the ins and outs of the messy world of Python packaging is a pain. The old system seems to be broken, the current one using Virtualenv/Pip is hacky and far from ideal, and the future seems to be largely ignored. Fortunately, the beginnings of a cleaner solution appear to be coming from the world of Docker, so we will have to watch that space carefully. In the meantime, I guess I will have to install Virtualenv…
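It’s worth noting that Python 3.3+ ships a stripped-down take on Virtualenv in the standard library as the venv module. A minimal sketch of creating an isolated environment programmatically (the directory name here is arbitrary):

```python
import os
import tempfile
import venv

# Each environment gets its own site-packages folder, so installing or
# upgrading a package in one project's environment can't clobber the
# copy another project depends on.
env_dir = os.path.join(tempfile.mkdtemp(), "demo-env")
venv.create(env_dir, with_pip=False)  # with_pip=True would also bootstrap pip

# The environment is just a directory with its own interpreter config.
print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))  # True
```

In everyday use you would run python3 -m venv <dir> from the shell and activate it, but the effect is the same: a per-project site-packages that Pip’s one-version-only behaviour can’t reach across.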


1 thought on “Why Virtualenv?”

  1. Hi,

    I think you’re right about pip deliberately eschewing multiple versions, but you could also be missing a point.

    The problem isn’t installing multiple versions of a package, it’s importing the modules from them.

    You can do a pkg_resources.require("mypackage==1.2.3"), then import a module from that package to insist on a version. This is often used for entry scripts; that is, at the very top level. In my opinion, it would be an antipattern to use this all over the place, effectively forcing you to hardcode all your dependencies. The way import works, by name only, is the correct way. It lets Python load the correct one from sys.path, not unlike the Java classloader.

    Looked at this way, priming sys.path (PYTHONPATH, virtualenv, whatever) is the correct way to load the correct modules. It gives the packager/user control.

    The flipside is that if you do use easy_install to get multiple versions of a package onto sys.path, you then have to be very careful not to load the wrong one. It’s not unreasonable for pip to do what it does, instead pushing the isolation up one level to the virtualenv.

    In most cases, if you find you need to select dependency versions at runtime, you’re probably doing something wrong (note that this on its own doesn’t mean virtualenv or pip are worthwhile).
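The entry-script pinning the commenter describes looks roughly like this (a sketch: setuptools stands in for a real pinned dependency, since a specific nose version may not be installed, and pkg_resources is guaranteed to be able to find it):

```python
import pkg_resources

# pkg_resources.require() places a specific distribution (and its
# dependencies) onto sys.path before any import happens -- the
# entry-script pattern described in the comment.  A real script would
# pin something like "nose==1.3.1".
dists = pkg_resources.require("setuptools")
print(dists[0].project_name, dists[0].version)
```

After the require() call, a plain import of the package picks up the pinned version, which is why this belongs only at the very top level of a script.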
