Productionizing a Zope 3 App with Buildout

I recently went through the process of putting a Zope 3 application into production. I learned a lot about buildout that I thought might be useful for other people to know. Here are the goals that I wanted to accomplish:

  • One step deployment
  • Externalized deployment configuration
  • Easy dependency management
  • Reliability (don’t choke when pypi goes down)
  • Reproducibility

Server Setup

It’s important to decide early on what your base server deployment is going to look like. Most of the time, a bare-bones installation of CentOS or Ubuntu Server Edition is not going to cut it for the specialized software you will be running (unless you *really* want to go through the trouble of building debs or rpms). Before using buildout for the more detailed configuration requirements, we have to set up the server with the basics.

Non-system Python

Anyone who has been using buildout or eggs for a while knows that it’s generally not a good idea to use the system’s python when installing and building custom software. Different OS vendors don’t always ship the standard library you’d expect, nor do they always package it in a way that is compatible with setuptools and eggs.

We are installing python2.5 under /opt/python which creates a number of binaries under /opt/python/bin/. Once you have a custom python setup, you want to run the ez_setup.py script to get the latest setuptools. You can do this in one shot on most linux systems with the command:

curl http://peak.telecommunity.com/dist/ez_setup.py | /opt/python/bin/python

Now you will have the easy_install command available, which you can use to install buildout:

/opt/python/bin/easy_install zc.buildout

At this point you are probably asking “what happened to the one step deployment?” Fortunately, you really only have to install easy_install and buildout once. It’s almost always a good idea to keep all your servers running on the exact same hardware and OS so that any binaries generated by one of the servers can be run on all the others. Once you have installed easy_install and buildout, you can just tar up the entire /opt/python directory and make it part of your standard deployment on all machines. If you are using python libraries with C extensions like scipy or lxml, it would be a good idea to build those now and include them in the base system tarball.
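As a rough sketch of that tarball step, Python's tarfile module can pack the directory. The function name and the output filename below are hypothetical; only /opt/python comes from the setup described above.

```python
import os
import tarfile


def make_base_tarball(source_dir, output_path):
    """Pack source_dir into a gzip-compressed tarball at output_path.

    Paths in the archive start with the last component of source_dir,
    so extracting the archive inside /opt recreates /opt/python.
    """
    source_dir = os.path.abspath(source_dir)
    arcname = os.path.basename(source_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(source_dir, arcname=arcname)
    return output_path


# Example call (hypothetical output filename):
#   make_base_tarball("/opt/python", "python-base.tar.gz")
```

On a new machine the archive can then be unpacked with `tar -xzf python-base.tar.gz -C /opt`.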

Building and Publishing eggs

When you deploy software, it is immensely useful to have all the code bundled together into a single easy to distribute file. Python eggs provide a great way for bundling your software with all the metadata you need to manage complex distributions. Let’s consider an example application that uses a number of different packages.

Working with Multiple Packages

If you are building a large python application, chances are that you have split out reusable functionality into standalone python packages that are all combined and integrated with a primary application package. For example, you might have the following package structure:

packages/
    company.app/
        buildout.cfg
        setup.py
        setup.cfg
    company.foo/
        buildout.cfg
        setup.py
        setup.cfg
    company.bar/
        buildout.cfg
        setup.py
        setup.cfg

In this example, company.app depends on company.foo and company.bar. There are pros and cons to separating your software project into more than one package. For small projects, it’s probably not worth it, but for large projects it can be a really helpful way to separate functionality into more easily controllable blocks. As the project grows and more developers come on board, each package can take on its own life and development pace. Since dependencies across packages can be versioned in setuptools, each package can be developed against stable versions of the other packages at the same time that unstable versions of the other packages continue to be worked on. This is generally how large frameworks like Zope 3 are developed. On the flip side, the entire twisted framework exists in one package hierarchy. You can’t get one part of twisted without getting the rest of it.

Some projects start off by developing all of their packages in tandem using buildout’s develop eggs option, which allows you to specify source directories to use as eggs when finding dependencies. Once you are closer to deployment, it’s a good idea to start making snapshot releases, as needed – sometimes even several per day. This will get you into the mindset of the formal release process you will use in production.

Creating Snapshot Releases

Every release you make should have a unique version number. When you start making several snapshot releases a day, updating the version number of a package can become a chore. To get around this, you can create a setup.cfg in the same directory as setup.py that looks like this:

  [egg_info]
  tag_svn_revision = 1

This tells setuptools to append the svn revision number of the current checkout to the version number. You’ll be able to make releases willy-nilly and each one is guaranteed to have a unique version number.

We then have a setup.py file for each of our packages that looks like this:

from setuptools import setup, find_packages
setup(
    name='company.foo',
    version = '0.1.0',
    author='Company Inc.',
    description='The foo package, which does foo.',
    license = "proprietary",
    packages = find_packages('src'),
    package_dir = {'': 'src'},
    namespace_packages = ['company'],
    install_requires=[
        'setuptools',
        'zope.interface',
        'zope.schema',
        'zope.component',
    ],
    include_package_data = True,
    zip_safe = False)

We can generate a nicely bundled tarball with the sdist setuptools command. Remember to use the python you installed manually.

/opt/python/bin/python setup.py sdist

This command produces a file like dist/company.foo-0.1.0-r4245.tar.gz. The tar.gz file is the egg you’ll use to distribute the company.foo package. The next step after building the egg is uploading it to a central repository where it can be accessed from any server that requests it.
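When you make several snapshots a day, the dist/ directory fills up quickly. Here is a minimal sketch for picking the most recent snapshot to upload, assuming every filename follows the name-version-rREVISION.tar.gz pattern from the example above. The function name is hypothetical.

```python
import os
import re


def latest_snapshot(dist_dir, name):
    """Return the filename of the highest-revision snapshot for name, or None."""
    pattern = re.compile(
        r"^%s-.+-r(?P<revision>\d+)\.tar\.gz$" % re.escape(name)
    )
    best_name, best_revision = None, -1
    for filename in os.listdir(dist_dir):
        match = pattern.match(filename)
        if match is None:
            continue
        # Compare revisions as integers so that r999 sorts below r4245.
        revision = int(match.group("revision"))
        if revision > best_revision:
            best_name, best_revision = filename, revision
    return best_name
```

It compares only the svn revision, which is enough as long as all snapshots come from the same repository.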

If you are developing an open source project, then the tools are in your favor. The Python Package Index (PyPI) provides free hosting for any open source python package. It is the default place where setuptools will look to find dependencies.

If for some strange reason you do not want to release your software to the world (your boss telling you so is pretty much the only acceptable one), then you’ll have to set up your own eggs repository.

Setting up an Egg Repository

Setting up an eggs repository couldn’t be simpler. You can even just use subversion if you want. All you need for an eggs repository is a web server (apache works just fine) that publishes index.html pages with links to all the eggs in the directory. Setuptools can parse the links from the index.html page to find the latest version of all the packages you need. Then you just have to upload the tar.gz eggs to the repository to make them officially released.
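If you would rather generate the index.html yourself than rely on the web server's automatic directory listing, something like the following would do. It is a minimal sketch: the function name and the set of file extensions are my assumptions, and it expects to be run in the directory that the web server publishes.

```python
import html
import os


def build_index(egg_dir):
    """Write egg_dir/index.html with one link per distribution file."""
    names = sorted(
        name for name in os.listdir(egg_dir)
        if name.endswith((".tar.gz", ".zip", ".egg"))
    )
    lines = ["<html><body>"]
    for name in names:
        escaped = html.escape(name, quote=True)
        lines.append('<a href="%s">%s</a><br/>' % (escaped, escaped))
    lines.append("</body></html>")
    index_path = os.path.join(egg_dir, "index.html")
    with open(index_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return names
```

Rerun it after every upload so that new releases show up in the index.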

Alternatively, if you are feeling fancy, you can run your own instance of PyPI on your own hardware. This has the added benefit of allowing you to use setuptools’ built-in commands for uploading releases to your remote repository.

For the ongoing example I’m using, let’s assume you’ve gone with an apache setup that is available at http://deploy.company.com/eggs/. To make sure setuptools checks http://deploy.company.com/eggs/ when looking for dependencies, you have to add a find-links option to your buildout configuration. Since this is project specific, I put the find-links option in the buildout configuration for each of the python packages I’m developing. So the buildout configuration I use for development of the company.app package might look like this:

  [buildout]
  develop = .
  parts = app test python
  find-links = http://deploy.company.com/eggs/

One of the goals we had was secure access to our eggs repository. There are a few options here. You could only allow access to the eggs repository on a local intranet, but this is not enjoyable if your team is distributed or you like to work from home. (I don’t care what you say, VPN is not an answer.) The other option is to put Basic HTTP Authentication in front of the eggs repository with ssl encryption.

Buildout and setuptools do not support http authentication out of the box but Lovely Systems has developed an extension to add support. lovely.buildouthttp is very easy to use.

There are two steps.

  1. First tell buildout to use lovely.buildouthttp by specifying it in the extensions option. Your buildout.cfg will look like:

    [buildout]
    develop = .
    parts = app test python
    find-links = http://deploy.company.com/eggs/
    extensions=lovely.buildouthttp
        
  2. Next you need to specify the username and password needed to access the eggs repository. This is done by creating a file at ~/.buildout/.httpauth that looks like this:

          your realm, https://deploy.company.com/eggs, username, password
        

Now you’ll be able to securely release and retrieve all your packages from anywhere with a web connection, including most importantly all your production servers.
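Since the .httpauth file holds a plain-text password, it is worth creating it with restrictive permissions, especially on production servers. A minimal sketch, assuming the single-line "realm, url, username, password" format shown above; the function name is hypothetical.

```python
import os


def write_httpauth(path, realm, url, username, password):
    """Write one credentials line in the 'realm, url, username, password' form."""
    line = ", ".join([realm, url, username, password]) + "\n"
    # Create the file readable and writable by its owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(line)
    # Tighten the mode as well in case the file already existed.
    os.chmod(path, 0o600)
    return line


# Example call:
#   write_httpauth(os.path.expanduser("~/.buildout/.httpauth"),
#                  "your realm", "https://deploy.company.com/eggs",
#                  "username", "password")
```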

Not relying on PyPI

As I mentioned earlier, setuptools uses pypi.python.org as the default location to look for dependencies. Most of the time this works great. You just run buildout install and everything you need is magically downloaded. Occasionally, the PyPI servers go down (this happens less and less) and you will be unable to download dependencies for an unknown period of time. This is not the situation you want to be in two hours before you have to give a demo to customers while frantically trying to deploy your software into production. It’s almost always better to rely on your own infrastructure to provide these services. Even though running services yourself might be just as error prone as relying on external services, chances are you’ll actually be able to do something about failures when they happen.

Mirroring PyPI

The brute force option for making sure you can always download the dependencies you need is to completely mirror PyPI. The z3c.pypimirror package provides a command line utility that does just this. You can install it on the same server as the egg repository and run it periodically with a cron job. The first time you mirror PyPI, it will take quite a while. There are over 4 GB of software on PyPI, and the mirror will copy everything, including all versions of all software, even the software you are never going to use. The upside is that when you add a new dependency, you’ll already have it available in your mirror!

Using a Caching PyPI Proxy

The lighter-weight option, and the one I decided to go with in the end, is collective.eggproxy. From the collective.eggproxy PyPI page:

“It will collect packages on PyPI only when a program like easy_install or zc.buildout asks for it. In other words, unlike some mirrors that act like rsync and get the whole PyPI base (more than 5 gigs) collective.eggproxy will only get what you need.”

The PyPI page has enough documentation to get you started. I set up collective.eggproxy to run on the same server as the private egg repository with apache mapping it to a url like http://deploy.company.com/pypi/

Whichever option you choose, a full mirror or a caching proxy, you will have to add the index option to your buildout configuration. It would now look like this:

[buildout]
develop = .
parts = app test python
find-links = http://deploy.company.com/eggs/
extensions=lovely.buildouthttp
index = http://deploy.company.com/pypi/
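A quick way to confirm that a configuration like this no longer depends on PyPI is to check where its index and find-links options point. The sketch below uses Python's configparser, which only approximates buildout's own configuration syntax, so treat it as a sanity check and not as a validator. The function name is hypothetical.

```python
import configparser
from urllib.parse import urlparse


def pypi_dependencies(config_text, allowed_host):
    """List reasons why the [buildout] section may still reach outside allowed_host."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []
    index = parser.get("buildout", "index", fallback="").strip()
    if not index:
        problems.append("index is not set, so the default PyPI index is used")
    for option in ("index", "find-links"):
        value = parser.get("buildout", option, fallback="")
        # find-links may list several URLs separated by whitespace.
        for url in value.split():
            if urlparse(url).hostname != allowed_host:
                problems.append("%s points elsewhere: %s" % (option, url))
    return problems
```

Calling pypi_dependencies(open("buildout.cfg").read(), "deploy.company.com") on the configuration above should return an empty list.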

Finally we can move on to the really fun part, which is writing a production buildout configuration file.

Using Buildout for External Configuration

A lesser-known feature of buildout is that you can specify a remote configuration file that buildout should use. Buildout will fetch this configuration file (and any others that it depends on) from the web and run without hiccups. This makes it easy to manage all of your configuration files in one place.

I’ve added another simple repository to my deployment infrastructure at a url like http://deploy.company.com/buildouts/. This can host any number of deployment configurations I want. The buildout configuration for a production deployment then has only one additional option specified, which is the directory option. By default, buildout will use the directory in which the buildout configuration is located as the location to write all the files it generates. When using an external configuration file, however, you must specify a directory to use on the server that is running the buildout command. This option can always be overridden on the command line, but it is a good idea to specify a default. The buildout.cfg file in question would now look like this:

[buildout]
develop = .
parts = app
find-links = http://deploy.company.com/eggs/
extensions=lovely.buildouthttp
index = http://deploy.company.com/pypi/
directory = /opt/company-app

I then upload this configuration file to http://deploy.company.com/buildouts/production-app.cfg along with any other useful config files (like versions-3.4.0.cfg), and run it on the servers with the command:

  buildout -c http://deploy.company.com/buildouts/production-app.cfg

Any binaries associated with the application will then be available at /opt/company-app/bin/ on the server. When you want to update your server with the latest release of your software, all you have to do is rerun the above command and restart your application. It can’t get much easier than that!

When I am testing out my production buildout configurations on my local development machine, I will override the buildout directory to point to the current working directory. The command would then be:

  buildout -c http://deploy.company.com/buildouts/production-app.cfg buildout:directory=`pwd`

I use `pwd` instead of just “.” because you must specify an absolute path. On a side note, if you are deploying a zope 3 application using zserver and the regular zope configuration system (as opposed to paster), you will probably see a lot of example buildout configurations with a section like this:

[zope3]
location = .

For reasons I don’t completely understand, this does not work and you must leave the location option blank, as in

[zope3]
location =

It seems silly, but that’s the way it is. If you know better, I am eager for you to explain.
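Coming back to the buildout command itself: if you script your deployments in python, a small helper can take care of the absolute-path requirement for the directory override. This is a minimal sketch; the function name and its parameters are hypothetical, and it only builds the argument list.

```python
import os


def buildout_command(config_url, directory=None, buildout="buildout"):
    """Build the argument list for running buildout against a remote config.

    If directory is given, it is converted to an absolute path and passed
    as a buildout:directory override, which is what `pwd` does in the shell.
    """
    command = [buildout, "-c", config_url]
    if directory is not None:
        command.append("buildout:directory=%s" % os.path.abspath(directory))
    return command
```

The result can be handed to subprocess.check_call, for example subprocess.check_call(buildout_command("http://deploy.company.com/buildouts/production-app.cfg", ".")) when testing locally.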

That is pretty much all there is to productionizing your python software deployments. To recap, here is a brief summary.

Summary

Here are the steps for setting up a production python deployment infrastructure:

  1. Set up your production deployment servers.

    • Install non-system python to /opt/python
    • Install setuptools with the ez_setup.py script
    • Install zc.buildout using easy_install
  2. Set up a repository for your proprietary eggs (if applicable).

    • Use apache with ssl encryption and Basic HTTP Auth
    • Use lovely.buildouthttp to add HTTP Auth support to buildout
  3. Start doing frequent release snapshots.

    • Use tag_svn_revision = 1 to automatically add revision numbers to your version numbers.
  4. Create a safety net for PyPI.

    • Mirror PyPI using z3c.pypimirror or
    • Use a caching proxy of PyPI with collective.eggproxy
  5. Put buildout configuration files in a centralized repository.
  6. Deploy deploy deploy!

I have been using this setup for about a week now, and it has been working very well so far. In the future I may explore building tarballs that also contain all dependencies, using something like zc.sourcerelease. It can be a huge pain to generate new massive source releases with all the dependencies every time anything has a minor version change, but the flip side is that you can bring up a new machine more quickly. Bringing up more machines in a cloud as load increases is not a use case I have optimized yet using the buildout method, although I’m sure it can be done.
