UPDATED: fbconsole Pypi Package and Github Repository

Sometimes you just want to write a little script using Facebook’s API that updates your status, or downloads all your photos, or deletes all those empty albums you accidentally created. In order to streamline my writing of one-off Facebook scripts, I created a micro API client that implements the client-side authentication flow and has a few utility functions for accessing the Graph API and FQL.

To use this mini api client, all you have to do is put 4 lines of code at the top of your python script:

from urllib import urlretrieve
import imp
urlretrieve('https://raw.github.com/gist/1194123/fbconsole.py', '.fbconsole.py')
fb = imp.load_source('fb', '.fbconsole.py')
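The four lines above are Python 2 code (`urllib.urlretrieve` and the `imp` module, which was removed in Python 3.12). As a rough sketch for newer interpreters, assuming the module file is already saved locally, the same load-from-path trick can be done with `importlib`. The helper name `load_source` and the throwaway demo file are illustrative only and are not part of fbconsole.

```python
import importlib.util
import os
import tempfile


def load_source(name, path):
    """Load the Python file at `path` as a module named `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway file standing in for the downloaded .fbconsole.py
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_module.py")
with open(demo_path, "w") as f:
    f.write("AUTH_SCOPE = []\n")

demo = load_source("demo", demo_path)
demo.AUTH_SCOPE = ["publish_stream"]  # module attributes can be set as usual
```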

Now you can specify the permissions you’ll need for your script (from the list of available api permissions) and authenticate yourself:

fb.AUTH_SCOPE = ['publish_stream']
fb.authenticate()

By default, the api client makes requests as the “fbconsole” app. You can use your own app by setting fb.APP_ID. When you authenticate, a browser window will open asking for whatever permissions were requested by your script. After you go through the permission dialog, the script will continue running. The access token used is stored in a local file when you authenticate so the next time around you won’t be presented with a dialog in your browser.

Once authenticated, you can make whatever calls to the graph api or fql that you want. For example:

Post a status update

status = fb.graph_post("/me/feed", {"message":"Hello from my awesome script"})

Fetch likes on a status update

likes = fb.graph("/"+status["id"]+"/likes")

Delete a status update

fb.graph_delete("/"+status["id"])

Upload a photo (why does python make this so hard?)

fb.graph_post("/me/photos", {"message":"My photo", "source":open("my-photo.jpg", "rb")})

Query FQL tables

friends = fb.fql("SELECT name FROM user WHERE uid IN "
                 "(SELECT uid2 FROM friend WHERE uid1 = me())")

If you download https://raw.github.com/gist/1194123/fbconsole.py and run it, you’ll be dropped into a python shell so you can just play around with api calls in an interactive environment. An IPython shell will be used if you have IPython installed.

The code is just in a gist on github at https://gist.github.com/1194123. Feel free to comment on this post or on the gist if you have questions.

Ever since Facebook launched graph.facebook.com, I’ve been wanting to check it out and see just how superior it is to Facebook Connect. It turns out it was pretty easy to authenticate myself from python using OAuth 2. So I wrote a little script that spits out my feed on the command line.

If you’ve ever felt frustrated implementing an OAuth client before, rest assured that OAuth 2 is one million times easier to work with than OAuth 1. You don’t have to keep track of all these different tokens or worry about generating signatures in just the right way. The one gotcha here is that you need a web server for the user to be redirected to after they authorize your application with Facebook. Fortunately, python makes it really easy to start up a mini webserver for the purposes of OAuth.

Here is the script in 75 lines of python. If you want to try it out yourself, you’ll have to register an application with Facebook at http://developers.facebook.com/setup/.

#!/usr/bin/python2.6
import os.path
import sys
import json
import urllib2
import urllib
import urlparse
import BaseHTTPServer
import webbrowser

APP_ID = 'your-app-id-here'
APP_SECRET = 'your-app-secret-here'
ENDPOINT = 'graph.facebook.com'
REDIRECT_URI = 'http://127.0.0.1:8080/'
ACCESS_TOKEN = None
LOCAL_FILE = '.fb_access_token'
STATUS_TEMPLATE = u"{name}\033[0m: {message}"

def get_url(path, args=None):
    args = args or {}
    if ACCESS_TOKEN:
        args['access_token'] = ACCESS_TOKEN
    if 'access_token' in args or 'client_secret' in args:
        endpoint = "https://"+ENDPOINT
    else:
        endpoint = "http://"+ENDPOINT
    return endpoint+path+'?'+urllib.urlencode(args)

def get(path, args=None):
    return urllib2.urlopen(get_url(path, args=args)).read()

class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_GET(self):
        global ACCESS_TOKEN
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()

        code = urlparse.parse_qs(urlparse.urlparse(self.path).query).get('code')
        code = code[0] if code else None
        if code is None:
            self.wfile.write("Sorry, authentication failed.")
            sys.exit(1)
        response = get('/oauth/access_token', {'client_id':APP_ID,
                                               'redirect_uri':REDIRECT_URI,
                                               'client_secret':APP_SECRET,
                                               'code':code})
        ACCESS_TOKEN = urlparse.parse_qs(response)['access_token'][0]
        open(LOCAL_FILE,'w').write(ACCESS_TOKEN)
        self.wfile.write("You have successfully logged in to facebook. "
                         "You can close this window now.")

def print_status(item, color=u'\033[1;35m'):
    print color+STATUS_TEMPLATE.format(name=item['from']['name'],
                                       message=item['message'].strip())

if __name__ == '__main__':
    if not os.path.exists(LOCAL_FILE):
        print "Logging you in to facebook..."
        webbrowser.open(get_url('/oauth/authorize',
                                {'client_id':APP_ID,
                                 'redirect_uri':REDIRECT_URI,
                                 'scope':'read_stream'}))

        httpd = BaseHTTPServer.HTTPServer(('127.0.0.1', 8080), RequestHandler)
        while ACCESS_TOKEN is None:
            httpd.handle_request()
    else:
        ACCESS_TOKEN = open(LOCAL_FILE).read()
    for item in json.loads(get('/me/feed'))['data']:
        if item['type'] == 'status':
            print_status(item)
            if 'comments' in item:
                for comment in item['comments']['data']:
                    print_status(comment, color=u'\033[1;33m')
            print '---'
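The script above targets Python 2.6. For anyone trying the idea on Python 3, the two pure helpers (building a request URL and pulling the `code` out of the redirect) might look roughly like this. It is a sketch, not a port of the whole script: `extract_code` is a name made up here, the token is passed as an argument instead of read from a global, and unlike the original this version always uses HTTPS.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "graph.facebook.com"


def get_url(path, args=None, access_token=None):
    """Build a Graph API URL (always HTTPS in this sketch)."""
    params = dict(args or {})  # copy so the caller's dict is not mutated
    if access_token:
        params["access_token"] = access_token
    return "https://" + ENDPOINT + path + "?" + urlencode(params)


def extract_code(request_path):
    """Return the OAuth `code` query parameter from a redirect path, or None."""
    values = parse_qs(urlparse(request_path).query).get("code")
    return values[0] if values else None
```

In a handler like `do_GET` above, `extract_code(self.path)` would replace the two `urlparse` lines.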

There has been some talk about using “class based views” in Django to make view code more reusable. Apparently, there was even a presentation given about it. At Divvyshot, our code base is growing quickly and we are starting to reuse view code a lot. We’ve been refactoring all of our view code into classes, which makes them much easier to customize and mash together. Today I worked on some pretty exciting stuff that makes harnessing class-based views a snap.

Here’s a scenario we run into a lot.

  1. We have a view that displays information about a person with a url like /people/{id}/ where the id is the person object’s id field
  2. We have another view that displays information about an event with a url like /event/{slug}/ where the slug is some small number of alphanumeric characters uniquely identifying the event.
  3. We have a third view that shows information about an event relating to a person with a url like /event/{event_slug}/person/{person_id}/

The third piece to the above combination is where class-based views really pay off. We already have a bunch of code for working with a person’s data and a bunch of code for working with an event’s data. Wouldn’t it be great if we could just magically combine those two pieces of code and get all the data about both an event and a person and their relationship spit out onto a page? Well, we can and here is a simplified example of how it would look in our code base.


First there is the code for displaying a page about a person. I’ll explain in detail what’s going on.

class PersonDetail(Handler):
    template = "myapp/person/detail.html"
    person = fromurl("id").model(Person)
    def update(self):
        # do a bunch of stuff with self.person, for example
        if self.request.user.get_person() == self.person:
            self.context['page_title'] = "This is you"
        else:
            self.context['page_title'] = "%s %s" % (self.person.first_name, self.person.last_name)

In Detail

First you’ll notice that PersonDetail is a class and not a function. Django does not require views to be functions, just to be callable. PersonDetail subclasses Handler, which provides the __call__ method that’s necessary to make an instance of PersonDetail callable. In case you are jumping to conclusions, we do not use an instance of PersonDetail directly as a callable view for thread safety reasons that I will explain later.

The next thing you’ll see is that the template is specified as a class attribute with the path used by a template loader. The actual template rendering with a proper request context and all that jazz is abstracted away for us by a render method defined in the Handler class.

The next cool thing is the line person = fromurl("id").model(Person) which declaratively spells out the mapping from a url parameter to a Person model object. In particular, this says to pull out the id from the keyword arguments passed to the view function (based on the regex in the url conf) and use it to look up a Person object. By default, a 404 response is returned if no such object is found. This is sort of a replacement for person = get_object_or_404(Person, id=some_id) that works better with class-based views.
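The Divvyshot implementation of `fromurl` is not shown in this post, so the following is only a guess at how such a declaration could work, written as a plain Python descriptor with no Django dependency. `NotFound` stands in for Django's `Http404`, the in-memory `Person` class with its `find` method stands in for a model lookup, and the view is assumed to keep the url keyword arguments in `self.kwargs`. All of those names are assumptions.

```python
class NotFound(Exception):
    """Stand-in for Django's Http404 in this sketch."""


class fromurl(object):
    """Descriptor that maps a named URL parameter to a looked-up object."""

    def __init__(self, param, model=None, field="id"):
        self.param = param
        self._model = model
        self.field = field

    def model(self, model, field="id"):
        # Return a new descriptor bound to a model class and lookup field.
        return fromurl(self.param, model, field)

    def __get__(self, view, owner=None):
        if view is None:
            return self  # accessed on the class, not on an instance
        value = view.kwargs[self.param]
        obj = self._model.find(self.field, value)  # hypothetical lookup API
        if obj is None:
            raise NotFound("%s with %s=%r" % (self._model.__name__, self.field, value))
        return obj


class Person(object):
    """In-memory stand-in for a Django model."""
    rows = [{"id": "1", "first_name": "Ada"}]

    @classmethod
    def find(cls, field, value):
        for row in cls.rows:
            if row[field] == value:
                return row
        return None


class PersonDetail(object):
    person = fromurl("id").model(Person)

    def __init__(self, **kwargs):
        self.kwargs = kwargs  # keyword arguments captured by the url regex
```

With this in place, `PersonDetail(id="1").person` returns the matching row, and an unknown id raises `NotFound`, which a real handler would turn into a 404 response.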

Next we have an update method, which gets called before the template is rendered. The purpose of the update method is just to prepare the view, and not to render a template to a response. That means adding stuff to the template context, adding additional attributes to the view instance, creating and processing forms, handling post data, etc. By putting all this logic in a standalone method, it is easy to modify the view’s behavior without having to worry about how the HttpResponse is created.

In this example, we put variables that should be made available to the template into self.context, which is just a dictionary. Alternatively, we could set attributes on the view instance itself, which is made available to the template. For example, having {{view.person.name}} in the template would yield the desired result. The request is also made available as the self.request instance attribute. By setting attributes in the view instance, it becomes much easier to share data between multiple helper methods of the view instance. For example, you might have a method that processes a GET request and a separate one for POST requests. Subclasses of your view can then selectively override just one of the methods and all the while you don’t have to worry about passing around any required data, like the request object itself.


Next we have the code for displaying stuff about an event. This is a lot like the PersonDetail class. The only thing to note is that the event attribute has an additional piece of metadata which says that the "slug" url parameter corresponds to the "url_slug" field of the Event model.

class EventDetail(Handler):
    template = "myapp/event/detail.html"
    event = fromurl("slug").model(Event, "url_slug")
    def update(self):
        # do a bunch of stuff with self.event
        self.context['page_title'] = self.event.name


As the final section of the scenario I outlined above, we will combine these two classes using python’s multiple inheritance support. Strictly speaking, it’s not necessary to use multiple inheritance to combine the functionality of the previous two classes, and frankly I haven’t decided yet whether it is a good idea. But as long as you are careful and know what’s going on in the base classes, it should be OK. This is python after all and we don’t do hand holding.

class EventForPerson(EventDetail, PersonDetail):
    template = "myapp/event/person.html"
    def update(self):
        # do a bunch more stuff with self.event and self.person
        EventDetail.update(self)
        PersonDetail.update(self)
        self.context['page_title'] = "%s and %s" % (self.person.first_name,
                                                     self.event.name)

This example is a bit contrived because the only thing any of the update methods do is set the same variable in the template context to something different. But the idea you should take home from this is that the views could have arbitrarily complex business logic that can be easily extended and customized through subclassing, just as can be done with Model objects, admin views, HttpResponse objects, or anything else that is object oriented. With the multiple inheritance setup we have, our template, myapp/event/person.html can access the person object, the event object, and anything else provided by the update methods from EventDetail and PersonDetail. We could even {% include %} the other two templates in myapp/event/person.html and they would just work. In creating the EventForPerson class, we didn’t even have to worry about how the Event and Person objects get looked up from the parameterized url. If we refactor the object lookup later (for example, switching from person ids to person slugs), we’ll only have to change the code in one place.
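One alternative to calling each base class's `update` by name is cooperative `super()`, which walks the method resolution order so that every class in the hierarchy runs exactly once. The sketch below uses bare stand-in classes (no templates, no url lookups) and a `calls` list added purely to make the order visible. It is not how the Divvyshot code is written.

```python
class Handler(object):
    def __init__(self):
        self.context = {}
        self.calls = []  # only here to record the call order

    def update(self):
        self.calls.append("Handler")


class PersonDetail(Handler):
    def update(self):
        super(PersonDetail, self).update()
        self.calls.append("PersonDetail")
        self.context["page_title"] = "person"


class EventDetail(Handler):
    def update(self):
        super(EventDetail, self).update()
        self.calls.append("EventDetail")
        self.context["page_title"] = "event"


class EventForPerson(EventDetail, PersonDetail):
    def update(self):
        # One call reaches EventDetail, then PersonDetail, then Handler.
        super(EventForPerson, self).update()
        self.calls.append("EventForPerson")
        self.context["page_title"] = "event and person"


view = EventForPerson()
view.update()
```

The trade-off is that every class in the chain has to call `super()` for this to work, whereas the explicit calls in the example above make the order obvious at the cost of repeating the base class names.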

Url confs

Now for a quick note about how these get hooked up in a url conf file. You might be tempted to do something like this:

urlpatterns += patterns('',
    url(r'^event/(?P<slug>[\d\w\-]+)/person/(?P<id>\d+)/', EventForPerson()),
)

where the EventForPerson class is instantiated so as to provide the url conf with a callable object. But this means you would have a single instance of EventForPerson shared by every request that gets processed. Besides this not being thread safe, it’s just plain confusing because the update methods might “dirty up” the instance while processing one request, and that might affect the next request that gets processed. To avoid that, our urlconf looks like this:

urlpatterns += patterns('',
    url(r'^event/(?P<slug>[\d\w\-]+)/person/(?P<id>\d+)/', EventForPerson.view),
)

where EventForPerson.view is just a class method that instantiates and calls a brand new instance of EventForPerson for each request, passing in whatever parameters it receives and returning whatever result it gets. Unfortunately, due to a limitation of Django, you cannot use the handy string notation url(r'^some-regex', "myapp.views.EventForPerson.view") to achieve the same result. So you have to import the view classes into the url conf.
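To make the per-request instantiation concrete, here is a minimal sketch of what a `Handler` base class with a `view` class method could look like. It is an assumption-laden stand-in, not the Divvyshot code: `render` returns a plain dict instead of an HttpResponse, and the `Counter` subclass exists only to show that state does not leak between requests.

```python
class Handler(object):
    template = None

    def __init__(self, request, **kwargs):
        self.request = request
        self.kwargs = kwargs
        self.context = {}

    def update(self):
        """Hook for subclasses: prepare self.context before rendering."""

    def render(self):
        # A real version would render self.template into an HttpResponse.
        return {"template": self.template, "context": self.context}

    def __call__(self):
        self.update()
        return self.render()

    @classmethod
    def view(cls, request, **kwargs):
        # Fresh instance per request, so nothing carries over between requests.
        return cls(request, **kwargs)()


class Counter(Handler):
    template = "counter.html"

    def update(self):
        self.context["hits"] = self.context.get("hits", 0) + 1


first = Counter.view("request-1")
second = Counter.view("request-2")

shared = Counter("request")  # one instance reused, as in the tempting urlconf
shared()
leaked = shared()
```

Both `first` and `second` report one hit, while the reused instance reports two on its second call, which is exactly the kind of carried-over state the class method avoids.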

Dealing with conflicting regex groups in a urlconf

The last feature I want to briefly mention is how we deal with conflicting groups in a urlconf. Suppose that both our base classes, PersonDetail and EventDetail, looked up objects based on a regex group named “id”. If we wanted to combine these two view classes into one, the url regex pattern would have to use different group names. The pattern might look like ^event/(?P<event_id>\d+)/person/(?P<person_id>\d+)/. Even though the base classes are looking for the “id” group, we can override their behavior in a subclass. It would look like this:

class EventForPerson(EventDetail, PersonDetail):
    template = "myapp/event/person.html"
    event = EventDetail.event.fromurl("event_id")
    person = PersonDetail.person.fromurl("person_id")
    def update(self):
        # do a bunch more stuff with self.event and self.person
        pass

Without having to know which models are used to look up person and event, I can still reconfigure which parts of the url get used to look them up.

Conclusion

If you don’t need to reuse your view code, you shouldn’t bother writing them as classes. If you do need to reuse view code, writing them as classes is the only sane way to do it. The utility classes we use at Divvyshot for all our class-based views are still baked into the code base but I hope to open source the useful bits soon. If you are interested in using a similar class-based view implementation, let me know and I’ll move the open sourcing of these utilities higher up on my to-do list.

If you are familiar with writing Django applications, you have probably run across the problem of extending the builtin User authentication model. Django does not yet have the hooks necessary for modifying the User object in a nice way, so you more or less have to resort to monkey patching.

Here is the basic monkey patching pattern I have seen:

def user_get_name(self):
    # do something with the user object which is self
    return "%s, %s" % (self.last_name, self.first_name)

User.get_name = user_get_name

Or if it is really just a one liner you can use a lambda, which avoids dirtying up the local namespace of wherever you are performing the monkey patching:

User.get_name = lambda self: "%s, %s" % (self.last_name, self.first_name)

The first monkey patching pattern makes reading the code incredibly painful (at least to me) and the lambda pattern isn’t much better either.

Decorator Pattern

You can perform the same operations in a more readable manner using decorators. Here is what it would look like:

def monkeypatch(cls):
    def decorator(f):
        setattr(cls, f.__name__, f)
        # return the function so its module-level name is not rebound to None
        return f
    return decorator

Now to monkey patch the get_name method of a User object, you would do this:

@monkeypatch(User)
def get_name(self):
    return "%s, %s" % (self.last_name, self.first_name)

I personally think this is a bit more readable. The real advantage to using a monkeypatch decorator though, is that you call out the fact that you are monkey patching. While reading the above code, it is very clear that monkey patching business is going on.

Monkey patching is almost never the best way to accomplish what you’re trying to do, but it will often get the job done fast. To remind yourself that you should revisit any monkey patching code later and think of a better way to do it, consider renaming the decorator to XXXmonkeypatch.
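For anyone who wants to try the function decorator without a Django project at hand, here is a small self-contained sketch. The `User` class is a bare stand-in for Django's model, and this version of the decorator returns the function so that the module-level name `get_name` stays usable.

```python
def monkeypatch(cls):
    def decorator(f):
        setattr(cls, f.__name__, f)
        return f  # keep the function available under its own name
    return decorator


class User(object):
    """Stand-in for django.contrib.auth.models.User."""

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name


@monkeypatch(User)
def get_name(self):
    return "%s, %s" % (self.last_name, self.first_name)


name = User("Ada", "Lovelace").get_name()
```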

Class decorators with python 2.6

If you are using python2.6, you can also use monkey patching decorators on entire classes. Here is an example of such a decorator:

def monkeypatch(cls_to_patch):
    def decorator(cls):
        cls_to_patch.__bases__ += (cls,)
        return cls
    return decorator

You would use this decorator like so:

@monkeypatch(User)
class MyUser:
    def get_name(self):
        return "%s, %s" % (self.last_name, self.first_name)

    def get_initials(self):
        return self.first_name[0]+self.last_name[0]

The main caveat with this method is that MyUser actually becomes a base class to User so if User ever gets a new method of the same name as one of your monkey patch methods, the User version will take precedence. This might be a feature depending on what exactly it is you are doing.
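The precedence caveat is easy to see in a small experiment. This sketch assumes the patched class already has a base class other than `object` (Django's User derives from a model base class); `Model` and `User` here are bare stand-ins, and `User` is given its own `get_name` on purpose so the clash is visible.

```python
def monkeypatch(cls_to_patch):
    def decorator(cls):
        cls_to_patch.__bases__ += (cls,)
        return cls
    return decorator


class Model(object):
    """Stand-in for a framework base class."""


class User(Model):
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def get_name(self):
        return self.first_name  # the class's own method


@monkeypatch(User)
class MyUser(object):
    def get_name(self):
        return "%s, %s" % (self.last_name, self.first_name)

    def get_initials(self):
        return self.first_name[0] + self.last_name[0]


user = User("Ada", "Lovelace")
own_name = user.get_name()      # User.get_name wins over MyUser.get_name
initials = user.get_initials()  # only MyUser defines this one
```

Because `MyUser` is appended to the end of the bases, it sits after both `User` and `Model` in the method resolution order, so it only fills in methods that neither of them defines.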

With my recent move to Divvyshot, I’ve started learning a lot more than I ever planned about Django, which to the chagrin of many a python web framework has become more or less the de facto standard for developing web applications in Python (for better or worse), to the extent of being among the few 3rd party libraries included in Google App Engine.

Being a long time Zope 3 guy, you can imagine the shock I had encountering all the differences between Zope 3 and Django. ORMs? What are those? And what do you mean I have to save the object explicitly after changing it? You mean I have to implement a get_absolute_url method for every single object I want to publish? How come I can’t easy_install Django? Where are all the doctests? How do I generate test coverage reports? Security? What security? And I’m sorry, but my editor just doesn’t understand your non-xml templating language. There’s more, but I’ll leave it at that.

So now begins my long trek to making Django work for me and hopefully for others as well. At the top of my list are deployability (eggs, externalized configuration, buildout?!) and maintainability (more tests and test automation). Part of this trek will involve discovering solutions that have already become standard practice in the Django community. Another part will be writing new tools to fill the gaps. And one last part will be extracting some fantastic solutions from the Zope community and making them work for Django.

As a first step, I’m going to look at just getting a development environment set up. The installation instructions and tutorial on djangoproject.com are easy enough to follow but lack a certain level of repeatability and encapsulation that I’m used to with Zope 3 projects. To those who don’t know, I’m talking about buildout. Let’s look at an example.

Say I want to start hacking on this project called z3c.formdemo, which is a full fledged Zope 3 web application. Here’s what I do:

  
    $ svn checkout svn://svn.zope.org/repos/main/z3c.formdemo/trunk z3c.formdemo
    $ cd z3c.formdemo
    $ python bootstrap.py
    $ ./bin/buildout

    ... go get a cup of tea if you haven't done this before ...
              ... I never said Zope 3 was small ...

    $ ./bin/demo fg
  

Just like that I’ve got everything installed including third party packages, test scripts, database configuration, web server, yadda yadda. It didn’t even touch my system python.

Now I need a similar setup for my Django project. Enter Paver, a general utility for performing simple tasks using python. Paver is kind of like a Make for python, but with a few steroids. Buildout would be another very good option, and I wouldn’t be surprised if eventually I find myself using Buildout instead of Paver. However, after having used both a fair amount, I get the feeling that Paver is more likely to jibe with the Django community. After all, it does use python as its configuration language whereas buildout strictly uses the INI format. It’s also a bit easier to create “paver tasks” than “buildout recipes” as you don’t need a python egg to house your code, though you can use one easily. (Someone please correct me if you do not need a python package to house a buildout recipe.)

Paver uses pavement.py files to define “tasks” you want to perform. The first task I want to perform is to “bootstrap” the project. That means installing everything needed to start the server, and preferably without mucking up my system python.

Paver comes with a task for generating a bootstrap.py script that will:

  • Create a virtual environment where you can install things without messing up system python.
  • Install any number of 3rd party packages your project depends on into the virtual environment.
  • Run a function in your pavement.py file that can do anything else that needs to be done.

There is a bit of a catch-22 in that you have to have Paver and virtualenv installed to generate the bootstrap.py but thereafter anyone else who uses the bootstrap.py script will not need Paver to run it.

Here is what the initial pavement.py file will look like:

# /path/to/my/django/project/pavement.py

from paver.easy import *
options(
    virtualenv=dict(
        script_name="bootstrap.py",
        packages_to_install = [
            # Project dependencies
            'flickrapi',
            'BeautifulSoup',
            'Pygments',
            'Markdown',
            'gdata',
            'twitter',
            'Django',
            ],
        paver_command_line="init",
        ))

INSTRUCTIONS = """
Run
   $ source bin/activate
to enter the virtual environment and
   $ deactivate
to exit the environment.
"""

@task
def init():
    """Initializing everything so you can start working"""
    print "virtual environment successfully bootstrapped."
    print INSTRUCTIONS

With the initial pavement.py file created in my project directory, I can now generate the bootstrap.py script by running the command:


$ paver paver.virtual.bootstrap


If you get an error about an unknown task “paver.virtual.bootstrap” it means you don’t have virtualenv installed.

Now for the cool part (note the file paths):

  $ python bootstrap.py
  ...
  $ source bin/activate
  $ which python
  /path/to/my/django/project/bin/python
  $ which easy_install
  /path/to/my/django/project/bin/easy_install
  $ python
  Python 2.5.4 (r254:67916, Aug  9 2009, 20:26:50)
  >>> import flickrapi
  >>> print flickrapi.__file__
  /path/to/my/django/project/lib/python2.5/site-packages/flickrapi-1.2-py2.5/flickrapi/__init__.py

virtualenv gives you your very own python executable and easy_install script that will install things directly to your very own site-packages folder. No need to have sudoer privileges. Having a dedicated site-packages folder for each project makes it really easy to work on multiple projects with different dependencies (different versions of the same dependencies too!) without installing anything system wide.
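Besides looking at file paths, a script can check from inside Python whether it is running in a virtual environment. The sketch below relies on two interpreter attributes: older virtualenv releases set `sys.real_prefix`, while the standard library `venv` and newer virtualenv releases set `sys.base_prefix`. The function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Old virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases set sys.base_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


active = in_virtualenv()
```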

So that is the virtualenv step.

With this in place, getting started on an existing django project is easy. Here is what the entire process might look like on the command line:

  $ svn co svn://svn.someproject.com/someproject/trunk someproject
  $ cd someproject
  $ python bootstrap.py
  $ source bin/activate
  (someproject)$ python manage.py runserver

At first I thought I would be annoyed by having to type source bin/activate all the time, but in practice it hasn’t been a problem. You could skip the activate step and always run scripts directly from the virtual environment’s bin/ directory (i.e. ./bin/python) and get the same effect. You would have to use the bin/* scripts explicitly with buildout anyhow.

I’m now at djangocon in Portland and have been hearing some interesting things about pip, an easy_install replacement. Hopefully @ianbicking will be able to explain why it is worth using. Given what appears to be a trend towards using git and other DVCS both for code development and package distribution, pip’s built-in support for installing packages from svn/bzr/git/hg repositories could be extremely useful.

I am happy to announce the initial publication of a new tool called ZBoiler.

ZBoiler is a collection of tools for generating boilerplate code for starting python projects. ZBoiler is made up of three main pieces: a web application/service (http://prealpha.zboiler.com), the boil command line program, and a few libraries of code generators.

Brief Architecture

The main problem with boilerplate code generators used in python today is that they work completely from file templates. If you want to modify and improve upon a template, your only option is to fork the template. They are inherently non-pluggable and inflexible.

ZBoiler improves on template based code generators by providing an abstract and pluggable representation of code snippets, called builders, which handle the actual code generation while providing a clear API for modification. Instead of writing “class Foo:\n pass” to a file, you construct a class builder, modify it however you want, then render it to a file.
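To give a feel for the builder idea, here is a toy sketch. It is not the z3c.builder.core API: `ClassBuilder`, `MethodBuilder`, `add_method`, and `render` are hypothetical names chosen for illustration. The point is only that the code is held as objects that can be modified before anything is written out.

```python
class MethodBuilder(object):
    """Hypothetical builder for a single method."""

    def __init__(self, name, args=("self",), body=("pass",)):
        self.name = name
        self.args = list(args)
        self.body = list(body)

    def render(self, indent="    "):
        lines = [indent + "def %s(%s):" % (self.name, ", ".join(self.args))]
        lines += [indent * 2 + line for line in self.body]
        return "\n".join(lines)


class ClassBuilder(object):
    """Hypothetical builder for a class statement."""

    def __init__(self, name, bases=("object",)):
        self.name = name
        self.bases = list(bases)
        self.methods = []

    def add_method(self, method):
        self.methods.append(method)
        return method

    def render(self):
        header = "class %s(%s):" % (self.name, ", ".join(self.bases))
        if not self.methods:
            return header + "\n    pass\n"
        return header + "\n" + "\n\n".join(m.render() for m in self.methods) + "\n"


foo = ClassBuilder("Foo")
foo.add_method(MethodBuilder("greet", body=['return "hello"']))
source = foo.render()  # a string that could be written to a file
```

Because several pieces of code can each call `add_method` on the same `ClassBuilder` before it is rendered, two independent features can contribute to one generated class, which is hard to do with plain file templates.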

ZBoiler provides an additional layer on top of builders that configures a collection of builders into high level “features” like documentation, egg-based distribution, unit testing, or anything else you might want your python project to do. Since we’re not using templates, multiple “features” can modify the same collection of builders safely, allowing you to mix and match high level features however you want. Once you have decided on your features, you generate all the boilerplate code in one step.

As a final level of abstraction, we also group features into project templates. Project templates are useful for getting started on a larger project that uses a framework. For example, you might want to start a Zope 3 project, or Grok project, or Django project, or PyGame project. Each of these projects will typically have their own solutions for testing, deployment, etc. that correspond to different features.

zboiler.com

zboiler.com website.

The zboiler.com website provides a pluggable web interface to all the project templates and individual features that are available. Each feature can be configured through the web and once you are satisfied, you can download a tarball of the generated code. At the moment we have project templates for creating egg-based python packages, python command line programs, and full fledged Zope 3 applications. You can see an early screencast of how it works here: http://zboiler.com/demo.html. (I plan on doing an updated one soon.)

The boil command

For those who are not interested in clicking around on a website, there is also a relatively intelligent command line interface to the same project templates and features available on the zboiler.com website. The easiest way to use the boil command is with predefined templates. Here is what a typical session might look like:

We can start by listing the available project templates.

$ boil -l
Available Templates:

  zope-project   "Zope 3 Web Application"
                   Includes all the features you would want for a Zope 3 Web Application.
  command-line   "Command Line Program"
                   Includes all the features you would want for a command line program.
  python-package "Python Package"
                   Just a simple python package with few bells and whistles.

ZBoiler is completely pluggable using setuptools entry points so it is relatively easy to add more templates to this list.

You can then boil a template interactively, which will prompt you for any values missing from the template:

$ boil -t python-package
Enter the name for this project: z3c.foobar

The python-package project template will prompt us for a lot of data used by setuptools.

Options for: z3c.feature.core:meta-data
---------------------------------------
Project Description (? for help): The Foo Bar Project
License [GNU General Public License (GPL)]:
using default value: GNU General Public License (GPL)
Author(s) (? for help): Paul Carduner
Author Email (? for help):
URL (? for help):
Version [0.1.0] (? for help): 0.5.0
Namespace Packages (? for help): ?
A list of namespace packages that should be created, one per line (i.e. zope or zc or z3c or collective)
Namespace Packages (? for help): z3c
Namespace Packages (? for help):

The interactive editor handles validation and complex data types like list entry automatically.

Keywords (? for help): simple
Keywords (? for help): zboiler
Keywords (? for help): example
Keywords (? for help):
Install Requires (? for help): 

Finished creating xml definition.

Once you finish going through the interactive wizard, you can see the xml project definition, which is what the features use to configure themselves. Notice that each feature has a type that points to an entry point.

Do you want to see the generated xml definition? (y/[n]): y

<project name="z3c.foobar">
  <feature type="z3c.feature.core:meta-data">
    <author>Paul Carduner</author>
    <author-email></author-email>
    <description>The Foo Bar Project</description>
    <version>0.5.0</version>
    <license>GNU General Public License (GPL)</license>
    <url></url>
    <keywords><item>simple</item><item>zboiler</item><item>example</item></keywords>
    <namespace-packages><item>z3c</item></namespace-packages>
    <install-requires/>
  </feature>
  <feature type="z3c.feature.core:python-interpreter"/>
  <feature type="z3c.feature.core:unit-testing"/>
  <feature type="z3c.feature.core:documentation"/>
</project>
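Since the project definition is plain XML, it is easy to inspect with the standard library. The sketch below lists the feature types in a trimmed copy of the definition above and splits each one at the colon. Reading the part before the colon as the providing package and the part after as the feature name is an assumption based on how the type strings look, not something taken from the ZBoiler code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the generated definition shown above.
DEFINITION = """
<project name="z3c.foobar">
  <feature type="z3c.feature.core:meta-data">
    <version>0.5.0</version>
  </feature>
  <feature type="z3c.feature.core:unit-testing"/>
</project>
"""


def feature_types(xml_text):
    """Return (package, feature name) pairs for each <feature> element."""
    root = ET.fromstring(xml_text)
    return [tuple(f.get("type").split(":", 1)) for f in root.findall("feature")]


features = feature_types(DEFINITION)
```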

Finally the complete boiler plate code for a new egg-based python package is generated.

Does this look right? ([y]/n): y
INFO - Creating directory ./z3c.foobar
INFO - Creating file ./z3c.foobar/bootstrap.py
INFO - Creating file ./z3c.foobar/setup.py
INFO - Creating file ./z3c.foobar/buildout.cfg
INFO - Creating directory ./z3c.foobar/src
INFO - Creating directory ./z3c.foobar/src/z3c
INFO - Creating directory ./z3c.foobar/src/z3c/foobar
INFO - Creating directory ./z3c.foobar/src/z3c/foobar/tests
INFO - Creating file ./z3c.foobar/src/z3c/foobar/tests/test_doc.py
INFO - Creating file ./z3c.foobar/src/z3c/foobar/tests/__init__.py
INFO - Creating file ./z3c.foobar/src/z3c/foobar/README.txt
INFO - Creating file ./z3c.foobar/src/z3c/foobar/index.txt
INFO - Creating file ./z3c.foobar/src/z3c/foobar/__init__.py
INFO - Creating file ./z3c.foobar/src/z3c/__init__.py
INFO - Creating file ./z3c.foobar/ZBOILER.txt
INFO - Build finished

The Python API

Finally, there is also the Python API. Rather than describe it all here, I will let you read the doctests, which are nicely rendered using sphinx here: http://docs.carduner.net/. Probably one of the more relevant sections covers the python code builders: http://carduner.net/docs/z3c.builder.core/python.html. There is also a long example that walks through the configuration of all the builders necessary for a zope 3 application: http://carduner.net/docs/z3c.builder.core/example.html.

Get the Code

All the code for ZBoiler, including the web application, is kept in the Zope subversion repository. Check it out:

$ svn co svn://svn.zope.org/repos/main/z3c.builder.core/trunk z3c.builder.core
$ svn co svn://svn.zope.org/repos/main/z3c.feature.core/trunk z3c.feature.core
$ svn co svn://svn.zope.org/repos/main/z3c.feature.zope/trunk z3c.feature.zope
$ svn co svn://svn.zope.org/repos/main/z3c.boiler/trunk z3c.boiler
$ svn co svn://svn.zope.org/repos/main/z3c.boilerweb/trunk z3c.boilerweb

Initial releases are available on pypi. To get the boil command, you can use easy_install:

$ easy_install z3c.boiler

Get Involved

ZBoiler is very new and so far not tested much in the wild. But the foundation is in place for anyone to start contributing new features and project templates to ZBoiler. Here is a short list of features I’d like to see:

  • Google App Engine / Django
  • Paver – automatically generate the paver bootstrap.py and pavement.py files
  • Other unit testing harnesses
  • PyGame Projects
  • Other cool stuff!!!

Introduction

Python doesn’t enforce information hiding. That is, you can access any attribute or method of a class without restriction. The only way that python programmers know whether they can rely on an attribute or method being part of the supported interface of a class is by naming convention.

For example, if I create a class for maintaining an ordered mapping I might implement it like this:

class OrderedMap:
    def __init__(self):
        self._order = []
        self._map = {}

    def set(self, key, val):
        if key not in self._map.keys():
            self._map[key] = val
            self._order.append(key)
        else:
            self.remove(key)
            self.set(key, val)

    def get(self, key, default=None):
        return self._map.get(key, default)

    def getAt(self, index):
        return self._map[self._order[index]]

    def remove(self, key):
        del self._map[key]
        self._order.remove(key)

    def removeAt(self, index):
        del self._map[self._order[index]]
        del self._order[index]

Ignore for a moment that this implementation is far from feature complete. The interface I want to provide to programmers using this class should just include the methods: set, get, getAt, remove, and removeAt.

However, since this is python, and python does not do information hiding, someone using this class could ignore the “public” interface and directly modify the _order and _map attributes. If the client programmer doesn’t know what they’re doing, they could screw up the data integrity of the object. What’s more, if I modify the OrderedMap class and remove or change the _map or _order attributes then the client code that is mucking with those two attributes will break.
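To make the risk concrete, here is a small sketch using a trimmed-down copy of the class above (only set, getAt and remove are kept). The client code appends to _order directly, and the object ends up in a state the public interface could never produce.

```python
class OrderedMap:
    """Trimmed-down copy of the class above, enough for the demo."""

    def __init__(self):
        self._order = []
        self._map = {}

    def set(self, key, val):
        if key not in self._map:
            self._map[key] = val
            self._order.append(key)
        else:
            self.remove(key)
            self.set(key, val)

    def getAt(self, index):
        return self._map[self._order[index]]

    def remove(self, key):
        del self._map[key]
        self._order.remove(key)


m = OrderedMap()
m.set("a", 1)

# Client code bypassing the public interface: "b" is now in the
# ordering but was never stored in the mapping.
m._order.append("b")

try:
    m.getAt(1)
    broken = False
except KeyError:
    # The two internal structures no longer agree with each other.
    broken = True

print(broken)
```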

But python programmers are generally trusting of each other and know that when a method or attribute has an underscore in front of it, it means don’t touch this! In rare circumstances, though, when you really know what you are doing, you may need to break the trust bond and access underscored methods anyway, such as when using a sys._getframe hack.

For those who are not so trusting but still want to write python, it’s possible to really hide attributes and methods from consumers of your python classes.

Metaclass Madness

So Python (1.5) has this cool (or uncool, depending on who you ask) feature called metaclasses, which allows you to redefine the way classes are defined (confused yet?).

It turns out that by redefining class definitions, we can enforce the underscore naming convention and make it impossible for code to access these “private” methods and attributes.

Here is the basic idea. At class definition time, we are provided with a list of all the class members and their names. Instead of defining one class, we will define two, a private version and a public version. The class being defined becomes the private version, and the public version gets created dynamically using only the class members that do not start with underscores. We then wrap each of the members of the public class in special accessors that give them access to the private class’s members. What you end up with is a public class that is just a facade over the private class.

So without further ado, here is a basic implementation in fewer than 70 lines of code:

from types import FunctionType

class PrivacyClass(object):
    def __init__(self, name, bases=(), attrs=None, __doc__=None,
                 __module__=None, prefix="_"):
        attrs = attrs or {}
        # XXX: this does not preserve python's multiple inheritance resolution order
        for base in bases:
            for attrName, attr in base.attrs.items():
                attrs.setdefault(attrName, attr)
            prefix = base.__private_prefix
        # keep the merged, unwrapped attributes so subclasses can inherit them
        self.attrs = attrs
        self.__private_prefix = prefix

        publicAttrs = {}
        for attrName, attr in attrs.items():
            if isinstance(attr, FunctionType):
                attr = self._getPrivateAccessWrapper(attr)
            else:
                attr = self._getPublicAttributeWrapper(attrName)
            if attrName.startswith('__') and attrName.endswith('__'):
                # all operator overloads should be public
                publicAttrs[attrName] = attr
            elif prefix is not None and not attrName.startswith(prefix):
                publicAttrs[attrName] = attr
            elif prefix is None:
                publicAttrs[attrName] = attr

        if not attrs.has_key('__init__'):
            attrs['__init__'] = lambda self: None
        publicAttrs['__init__'] = self._getInitWrapper(attrs['__init__'])

        self.privateClass = type(name, (object,), attrs)
        self.publicClass = type(name, (object,), publicAttrs)

    def __repr__(self):
        return self.privateClass.__name__

    def __call__(self, *args, **kwargs):
        private = self.privateClass(*args, **kwargs)
        return self.publicClass(private, *args, **kwargs)

    def _getInitWrapper(self, init):
        def wrapped(instance, *args, **kwargs):
            private = args[0]
            instance.__private__ = private
            args = args[1:]
            init(private, *args, **kwargs)
        return wrapped

    def _getPrivateAccessWrapper(self, method):
        def wrapped(instance, *args, **kwargs):
            return method(instance.__private__, *args, **kwargs)
        return wrapped

    def _getPublicAttributeWrapper(self, attrName):
        def getter(instance):
            return getattr(instance.__private__, attrName)
        def setter(instance, val):
            setattr(instance.__private__, attrName, val)
        def deleter(instance):
            delattr(instance.__private__, attrName)
        return property(getter, setter, deleter)

Privacy = PrivacyClass("Privacy", prefix="_")

Using metaclasses can be a bit tricky and if you don’t do it correctly, the modified class definitions can break some common OOP patterns that your average python developer would expect to work right out of the box. At this point, I’ve just covered single inheritance and method overriding. Problems will probably be encountered if you try to use multiple-inheritance in any complicated manner or if you try to dynamically create public instance members that weren’t in the original class definition.

Now all you have to do is inherit from the Privacy class and attempts to access private members will raise AttributeErrors.

    >>> class Example(Privacy):
    ...     def talk(self):
    ...         print "I am talking"
    ...
    ...     def _whisper(self):
    ...         print "I am whispering"

    >>> e = Example()
    >>> e.talk()
    I am talking
    >>> e._whisper()
    Traceback (most recent call last):
    ...
    AttributeError: 'Example' object has no attribute '_whisper'

I’ve created a small python package with this code in it along with some documentation and tests. I doubt I will ever release it on PyPI unless someone comes along and tells me it is actually useful, but if you would like to peruse the different use cases I tried to cover, documentation with runnable examples can be found at http://www.carduner.net/docs/privacy/README.html. If anyone is interested, I’m also happy to write up a more detailed explanation of what’s going on in the metaclass, as it’s not exactly the most straight-forward thing to grok.
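In the meantime, the facade idea can be shown on its own, without the metaclass. The following is a simplified Python 3 sketch and not the implementation above: `make_facade` and `make_forwarder` are hypothetical names, it wraps an existing instance rather than hooking class definition, it skips every name starting with the prefix (including operator overloads), and it only forwards callables.

```python
def make_forwarder(private, name):
    """Return a method that forwards calls to the attribute `name` of `private`."""
    def forward(self, *args, **kwargs):
        return getattr(private, name)(*args, **kwargs)
    forward.__name__ = name
    return forward


def make_facade(private, prefix="_"):
    """Build an object exposing only the non-prefixed methods of `private`."""
    namespace = {}
    for name in dir(private):
        if name.startswith(prefix):
            continue  # private by naming convention (this also skips dunders)
        if callable(getattr(private, name)):
            namespace[name] = make_forwarder(private, name)
    # The facade class borrows the name of the class it hides.
    facade_class = type(type(private).__name__, (object,), namespace)
    return facade_class()


class Example:
    def talk(self):
        return "I am talking"

    def _whisper(self):
        return "I am whispering"


e = make_facade(Example())
print(e.talk())
try:
    e._whisper()
except AttributeError as error:
    print(error)
```

The private instance lives only in the closures of the forwarding methods, so the facade object itself carries no reference to it as an attribute.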

Suppose you have a python package/program called foobar that is distributed as a python egg and you want to programmatically discover the version number for the foobar package/program from inside its own code. This can be easily achieved using the pkg_resources module.

import pkg_resources
pkg_resources.get_distribution("foobar").version

There is lots of other good information to be found in the distribution object that represents a python egg. But I’ll leave the rest for you to discover!
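On Python 3.8 and later, the standard library offers a similar lookup through importlib.metadata. A minimal sketch, where `get_version` is a hypothetical helper and "foobar" still stands in for your own distribution name:

```python
from importlib import metadata


def get_version(dist_name, default=None):
    """Return the installed version of dist_name, or default if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return default


# "foobar" is the placeholder name from the text above.
print(get_version("foobar", default="not installed"))
```

Returning a default rather than raising keeps the call safe when the code runs from a plain source checkout that was never installed.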

I recently went through the process of putting a Zope 3 application into production. I learned a lot about buildout that I thought might be useful for other people to know. Here are the goals that I wanted to accomplish:

  • One step deployment
  • Externalized deployment configuration
  • Easy dependency management
  • Reliability (don’t choke when pypi goes down)
  • Reproducibility

Server Setup

It’s important to decide early on what your base server deployment is going to look like. Most of the time, a bare bones installation of CentOS or Ubuntu Server edition is not going to cut it for the specialized software you will be running (unless you *really* want to go through the trouble of building debs or rpms). As a precursor to using buildout for some of the more detailed configuration requirements, we have to set up the server with the basics.

Non-system Python

Anyone who has been using buildout or eggs for a while knows that it’s generally not a good idea to use the system’s python when installing and building custom software. Different OS vendors don’t always provide the standard library that you’d expect, and when they do, it is not necessarily packaged in a way that is compatible with setuptools and eggs.

We are installing python2.5 under /opt/python which creates a number of binaries under /opt/python/bin/. Once you have a custom python setup, you want to run the ez_setup.py script to get the latest setuptools. You can do this in one shot on most linux systems with the command:

curl http://peak.telecommunity.com/dist/ez_setup.py | /opt/python/bin/python

Now you will have the easy_install command available to you, with which you can install buildout:

/opt/python/bin/easy_install zc.buildout

At this point you are probably asking “what happened to the one step deployment?” Fortunately, you really only have to install easy_install and buildout once. It’s almost always a good idea to keep all your servers running on the exact same hardware and OS so that any binaries generated by one of the servers can be run on all the others. Once you have installed easy_install and buildout, you can just tar up the entire /opt/python directory and make it part of your standard deployment on all machines. In the event you are using python libraries with C extensions like scipy or lxml, it would be a good idea to have those built now and be included in the base system tarball.
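If you would rather script the tarball step than type the tar command by hand, a minimal sketch with the standard tarfile module could look like this. The `make_base_tarball` helper and the output file name are made up for illustration.

```python
import os
import tarfile


def make_base_tarball(source_dir, output_path):
    """Pack source_dir into a gzipped tarball, keeping its directory name."""
    # Store paths relative to the directory itself ("python/bin/...")
    # instead of the absolute path on the build machine.
    arcname = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(output_path, "w:gz") as archive:
        archive.add(source_dir, arcname=arcname)
    return output_path


# Example call on the build server (not executed here):
# make_base_tarball("/opt/python", "python-base.tar.gz")
```

Unpacking the result under /opt on a new machine then recreates /opt/python.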

Building and Publishing eggs

When you deploy software, it is immensely useful to have all the code bundled together into a single easy to distribute file. Python eggs provide a great way for bundling your software with all the metadata you need to manage complex distributions. Let’s consider an example application that uses a number of different packages.

Working with Multiple Packages

If you are building a large python application, chances are that you have split out reusable functionality into standalone python packages that are all combined and integrated with a primary application package. For example, you might have the following package structure:

packages/
    company.app/
        buildout.cfg
        setup.py
        setup.cfg
    company.foo/
        buildout.cfg
        setup.py
        setup.cfg
    company.bar/
        buildout.cfg
        setup.py
        setup.cfg

In this example, company.app depends on company.foo and company.bar. There are pros and cons to separating your software project into more than one package. For small projects, it’s probably not worth it, but for large projects it can be a really helpful way to separate functionality into more easily controllable blocks. As the project grows and more developers come on board, each package can take on its own life and development pace. Since dependencies across packages can be versioned in setuptools, each package can be developed against stable versions of the other packages at the same time that unstable versions of the other packages continue to be worked on. This is generally how large frameworks like Zope 3 are developed. On the flip side, the entire twisted framework exists in one package hierarchy. You can’t get one part of twisted without getting the rest of it.

Some projects start off by developing all of their packages in tandem using buildout’s develop eggs option, which allows you to specify source directories to use as eggs when finding dependencies. Once you are closer to deployment, it’s a good idea to start making snapshot releases, as needed – sometimes even several per day. This will get you into the mindset of the formal release process you will use in production.

Creating Snapshot Releases

Every release you make should have a unique version number. When you start making several snapshot releases a day, updating the version number of a package can become a chore. To get around this, you can create a setup.cfg in the same directory as setup.py that looks like this:

  [egg_info]
  tag_svn_revision = 1

This tells setuptools to append the svn revision number of the current checkout to the version number. You’ll be able to make releases willy-nilly and each one is guaranteed to have a unique version number.

We then have a setup.py file for each of our packages that looks like this:

from setuptools import setup, find_packages
setup(
    name='company.foo',
    version = '0.1.0',
    author='Company Inc.',
    description='The foo package, which does foo.',
    license = "proprietary",
    packages = find_packages('src'),
    package_dir = {'': 'src'},
    namespace_packages = ['company'],
    install_requires=[
        'setuptools',
        'zope.interface',
        'zope.schema',
        'zope.component',
    ],
    include_package_data = True,
    zip_safe = False)

We can generate a nicely bundled tarball with the sdist setuptools command. Remember to use the python you installed manually.

/opt/python/bin/python setup.py sdist

This command produces a file like dist/company.foo-0.1.0-r4245.tar.gz. The tar.gz file is the egg you’ll use to distribute the company.foo package. The next step after building the egg is uploading it to a central repository where it can be accessed from any server that requests it.

If you are developing an open source project, then the tools are in your favor. The Python Package Index (PyPI) provides free hosting for any open source python package. It is the default place where setuptools will look to find dependencies.

If for some strange reason you do not want to release your software to the world (your boss telling you so is pretty much the only acceptable one), then you’ll have to set up your own eggs repository.

Setting up an Egg Repository

Setting up an eggs repository couldn’t be simpler. You can even just use subversion if you want. All you need for an eggs repository is a web server (apache works just fine) that publishes index.html pages with links to all the eggs in the directory. Setuptools can parse the links from the index.html page to find the latest version of all the packages you need. Then you just have to upload the tar.gz eggs to the repository to make them officially released.
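If your web server does not produce such a page for you, it is easy to generate one. The following is a sketch under a few assumptions: the `build_index` and `write_index` helpers are hypothetical, the list of file suffixes is my own guess at what you will upload, and the page is regenerated by hand (or by a cron job) after each upload.

```python
import html
import os

# Assumed set of release file types; adjust to what you actually upload.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".egg")


def build_index(directory):
    """Return an HTML page with one link per release file in directory."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.endswith(ARCHIVE_SUFFIXES)
    )
    links = [
        '<a href="%s">%s</a><br/>' % (html.escape(name, quote=True),
                                      html.escape(name))
        for name in names
    ]
    return "<html><body>\n%s\n</body></html>\n" % "\n".join(links)


def write_index(directory):
    """Write index.html next to the release files and return its path."""
    path = os.path.join(directory, "index.html")
    with open(path, "w") as index_file:
        index_file.write(build_index(directory))
    return path
```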

Alternatively, if you are feeling fancy, you can run your own instance of PyPI on your own hardware. This has the added benefit of allowing you to use setuptools’ built-in commands for uploading releases to your remote repository.

For the ongoing example I’m using, let’s assume you’ve gone with an apache setup that is available at http://deploy.company.com/eggs/. To make sure setuptools checks http://deploy.company.com/eggs/ when looking for dependencies, you have to add a find-links option to your buildout configuration. Since this is project specific, I put the find-links option in the buildout configuration for each of the python packages I’m developing. So the buildout configuration I use for development of the company.app package might look like this:

  [buildout]
  develop = .
  parts = app test python
  find-links = http://deploy.company.com/eggs/

One of the goals we had was secure access to our eggs repository. There are a few options here. You could only allow access to the eggs repository on a local intranet, but this is not enjoyable if your team is distributed or you like to work from home. (I don’t care what you say, VPN is not the answer.) The other option is to put Basic HTTP Authentication in front of the eggs repository with ssl encryption.

Buildout and setuptools do not support http authentication out of the box but Lovely Systems has developed an extension to add support. lovely.buildouthttp is very easy to use.

There are two steps.

  1. First tell buildout to use lovely.buildouthttp by specifying it in the extensions option. Your buildout.cfg will look like:

    [buildout]
    develop = .
    parts = app test python
    find-links = http://deploy.company.com/eggs/
    extensions=lovely.buildouthttp
        
  2. Next you need to specify the username and password needed to access the eggs repository. This is done by creating a file at ~/.buildout/.httpauth that looks like this:

          your realm, https://deploy.company.com/eggs, username, password
        

Now you’ll be able to securely release and retrieve all your packages from anywhere with a web connection, including most importantly all your production servers.
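To double check what you have put in the .httpauth file, you can read it back with a few lines of Python. This sketch assumes the format is exactly the four comma-separated fields shown above; it is not the parser lovely.buildouthttp uses, and `parse_httpauth` is a hypothetical helper.

```python
import csv


def parse_httpauth(text):
    """Parse lines of the form 'realm, url, username, password'."""
    entries = []
    for row in csv.reader(text.splitlines(), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        realm, url, username, password = (field.strip() for field in row)
        entries.append({
            "realm": realm,
            "url": url,
            "username": username,
            "password": password,
        })
    return entries


sample = "your realm, https://deploy.company.com/eggs, username, password\n"
for entry in parse_httpauth(sample):
    # Never print the password itself when checking a real file.
    print(entry["realm"], entry["url"], entry["username"])
```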

Not relying on PyPI

As I mentioned earlier, setuptools uses pypi.python.org as the default location to look for dependencies. Most of the time this works great. You just run buildout install and everything you need is magically downloaded. Occasionally, the PyPI servers go down (this happens less and less) and you will be unable to download dependencies for an unknown period of time. This is not the situation you want to be in 2 hours before you have to give a demo to customers while frantically trying to deploy your software into production. It’s almost always better to rely on your own infrastructure to provide these services. Even though running services yourself might be just as error prone as relying on external services, chances are you’ll actually be able to do something about failures when they happen.

Mirroring PyPI

The brute force option for making sure you can always download the dependencies you need is to completely mirror PyPI. The z3c.pypimirror package provides a command line utility that does just this. You can install it on the same server as the egg repository and run it periodically with a cron job. The first time you mirror pypi, it will take quite a while. There is over 4gb of software on pypi, and the mirror will mirror everything including all versions of all software, even the software you are never going to use. The up side is that when you add a new dependency, you’ll already have it available in your mirror!

Using a Caching PyPI Proxy

The lighter weight option, and the one I decided to go with in the end, is to use collective.eggproxy. From the collective.eggproxy PyPI page:

It will collect packages on PyPI only when a program like easy_install or zc.buildout asks for it. In other words, unlike some mirrors that act like rsync and get the whole PyPI base (more than 5 gigs) collective.eggproxy will only get what you need.

The PyPI page has enough documentation to get you started. I set up collective.eggproxy to run on the same server as the private egg repository, with apache mapping it to a url like http://deploy.company.com/pypi/.
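The difference from a full mirror is simply "download on first request, serve from disk afterwards". Here is a toy sketch of that idea; it is not collective.eggproxy's code, and both `fetch_cached` and the `download` callable (anything that takes a file name and returns bytes) are hypothetical.

```python
import os


def fetch_cached(filename, cache_dir, download):
    """Return the path of filename in cache_dir, downloading it only once.

    download is any callable that takes the file name and returns bytes.
    """
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        # First request for this file: go upstream and keep a copy.
        data = download(filename)
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as cached:
            cached.write(data)
    # Later requests are served from the local copy.
    return path
```

Packages nobody asks for are never fetched, which is why the cache stays small compared to a full mirror.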

Whichever option you choose to go with – a full mirror or a caching proxy, you will have to add the index option to your buildout configuration. It would now look like this:

[buildout]
develop = .
parts = app test python
find-links = http://deploy.company.com/eggs/
extensions=lovely.buildouthttp
index = http://deploy.company.com/pypi/

Finally we can move on to the really fun part, which is writing a production buildout configuration file.

Using Buildout for External Configuration

A lesser known feature of buildout is that you can specify a remote configuration file that buildout should use. Buildout will fetch this configuration file (and any others that it depends on) from the web and run without hiccups. This makes it easy to manage all of your configuration files in one place.

I’ve added another simple repository to my deployment infrastructure at a url like http://deploy.company.com/buildouts/. This can host any number of deployment configurations I want. The buildout configuration for a production deployment then has only one additional option specified, which is the directory option. By default, buildout will use the directory in which the buildout configuration is located as the location to write all the files it generates. When using an external configuration file however, you must specify a directory to use on the server that is running the buildout command. This option can always be overwritten on the command line, but it is a good idea to specify a default. The buildout.cfg file in question would now look like this:

[buildout]
develop = .
parts = app
find-links = http://deploy.company.com/eggs/
extensions=lovely.buildouthttp
index = http://deploy.company.com/pypi/
directory = /opt/company-app

I then upload this configuration file to http://deploy.company.com/buildouts/production-app.cfg along with any other useful config files (like versions-3.4.0.cfg), and run it on the servers with the command:

  buildout -c http://deploy.company.com/buildouts/production-app.cfg

Any binaries associated with the application will then be available at /opt/company-app/bin/ on the server. When you want to update your server with the latest release of your software, all you have to do is rerun the above command and restart your application. It can’t get much easier than that!

When I am testing out my production buildout configurations on my local development machine, I will override the buildout directory to point to the current working directory. The command would then be:

  buildout -c http://deploy.company.com/buildouts/production-app.cfg buildout:directory=`pwd`
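The same invocation can be assembled from a Python script, with os.path.abspath playing the role of `pwd`. A small sketch; `buildout_command` is a hypothetical helper and the url is the example one used above.

```python
import os


def buildout_command(config_url, directory="."):
    """Build the buildout command line with an absolute directory override."""
    return [
        "buildout",
        "-c", config_url,
        "buildout:directory=" + os.path.abspath(directory),
    ]


command = buildout_command(
    "http://deploy.company.com/buildouts/production-app.cfg")
print(" ".join(command))
# To actually run it: subprocess.check_call(command)
```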

I use `pwd` instead of just “.” because you must specify an absolute path. On a side note, if you are deploying a zope 3 application using zserver and the regular zope configuration system (as opposed to paster), you will probably see a lot of example buildout configurations with a section like this:

[zope3]
location = .

For reasons I don’t completely understand, this does not work and you must leave the location option blank, as in

[zope3]
location =

It seems silly, but that’s the way it is. If you know better, I am eager for you to explain.

That is pretty much all there is to productionizing your python software deployments. To recap, here is a brief summary.

Summary

Here are the steps for setting up a production python deployment infrastructure:

  1. Set up your production deployment servers.

    • Install non-system python to /opt/python
    • Install setuptools with the ez_setup.py script
    • Install zc.buildout using easy_install
  2. Set up a repository for your proprietary eggs (if applicable).

    • Use apache with ssl encryption and Basic HTTP Auth
    • Use lovely.buildouthttp to add HTTP Auth support to buildout
  3. Start doing frequent release snapshots.

    • Use tag_svn_revision = 1 to automatically add revision numbers to your version numbers.
  4. Create a safety net for PyPI.

    • Mirror PyPI using z3c.pypimirror or
    • Use a caching proxy of PyPI with collective.eggproxy
  5. Put buildout configuration files in a centralized repository.
  6. Deploy deploy deploy!

I have been using this setup for about a week now, and it has been working very well so far. In the future I may explore building tarballs that contain all dependencies as well using something like zc.sourcerelease. It can be a huge pain to generate new massive source releases with all the dependencies every time anything has a minor version change, but the flip side is that you can bring up a new machine more quickly. Bringing up more machines in a cloud as load increases is not a use case I have optimized yet using the buildout method – although I’m sure it can be done.
