Our choices for python web applications

So, at work, we’re doing some “next generation” versions of a bunch of our backoffice tooling. That involves producing a bunch of cute little web applications that often control not so cute and not so little processes (like transcoding and publishing and whatnot). The coarse-grained architecture pattern is pretty simple and familiar: a database with information about files, jobs, tasks and metadata, some common libraries for interacting with the database, some web application middleware using those libraries, and a web server frontend serving up the middleware.

Pretty much normal bread-and-butter stuff. It’s not quite like document-based CMS work (you don’t really want to store many-gigabyte video in a JCR repo), but a lot of the technology choices are still similar.

Customizing a snake

Based on the various tech we have deployed today, and the skills of the people working on this kind of thing, we’re trying to standardize around two main server-side technologies: java and python. This post explains the choices we made for the python universe. At the moment those choices are actually not so easy, since there’s so much happening and so many projects are moving so fast. We scouted the web quite a bit to figure out what to do.

Lower layers

  • OS: ubuntu 7.10 (still some nodes on 6.10)
  • Database: MySQL 5.0.45 (comes with ubuntu, a bit reconfigured of course) with some little bits of replication
  • Python: Python 2.4.4 (2.5 is not on ubuntu 6.10 and not on all our developer workstations, but we’re testing with it and will upgrade eventually)
  • WSGI server: right now we have a slightly customized cherrypy wsgi server (so that it accepts signals, restarts itself, runs from /etc/init.d, logs in all the right places, etc) behind an apache httpd 2.2 ProxyPass, which also handles SSL/AAA. We want to try and move to mod_wsgi but first we need its mac install to suck a bit less, and so far, cherry is not quite falling over on us. If mod_wsgi doesn’t work out it’ll probably be back to twisted, probably also behind apache for SSL reasons.

Application glue

  • Database access layer: storm, our own slightly modified version. We really like storm, and every now and then we find we are pushing it a bit beyond its limits, which leads to some bits of patching (by people smarter than me!). Fortunately it seems the guys working on it are quite responsive on IRC. I expect there’ll be a few (more) patches from us that flow back upstream. I really hope someone implements support for forking out reads and writes to different nodes (like you get for free with MySQL Connector/J), either in MySQLdb or inside storm.
  • Python web glue: We’re trying to do everything completely WSGI-based, though most everything at the moment is actually inside CherryPy 3.1b1 handlers. The WSGI pattern works just fine and scales nicely enough in our tests.
  • Templating: Genshi 0.4.4 (we had to pick one, there’s a few good choices here)
  • XML bits and pieces: lxml 1.3.6. It’s the best XML support in python so far, but it still isn’t quite as good as what you get in java. All the various bits and pieces just aren’t quite as mature, and the underlying libxml2 doesn’t quite do XML schema support as well, and I also miss something like XMLBeans for python.
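
For anyone who hasn't seen the WSGI pattern from the "Python web glue" item, here is a minimal sketch of what "completely WSGI-based" boils down to. It uses only the standard library's wsgiref helpers, not our CherryPy setup. The `list_jobs` application, the `call_app` helper and the hard-coded `JOBS` data are made up for illustration, and the code is written for a current Python rather than the 2.4 we run.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for data that would normally come from the database layer.
JOBS = {"1": "transcode", "2": "publish"}


def list_jobs(environ, start_response):
    """A WSGI application: any callable taking (environ, start_response)."""
    lines = ["%s %s" % (job_id, kind) for job_id, kind in sorted(JOBS.items())]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings.
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app directly, without any server, and collect the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI environ keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Since `list_jobs` depends on nothing but the WSGI calling convention, the same callable should be hostable by the cherrypy wsgi server, mod_wsgi or twisted without changes, and `call_app(list_jobs)` is enough to exercise it in a test.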

Out of the box?

We took a look at a bunch of the web frameworks out there. We didn’t seriously consider zope, but we took a long stare at pylons, turbogears and django before deciding not to bother with them. We’re not using much of paste either. Basically, with each of them we missed one or more of:

  • good support for storm out of the box
  • doing everything the WSGI way
  • good and correct documentation
  • easy to scale / make efficient
  • stable core with excellent compatibility and bugfixing

And perhaps a few other things. On balance we guessed it would be easier to roll our own and integrate components than to strip something else down and maintain lots of vendor branches.

Key point: standardization good

Two years ago I would’ve picked twisted without blinking and invented another fancy wheel on top of it, but I’m happy I don’t have to do that anymore. Twisted has quite a learning curve, not just for app developers, but also for the people that need to deploy and scale the beast.

Two good things happened to the python webapp world: competition and standardization. Now things are progressing rapidly.

Progress is good, but it can result in various kinds of chaos that don’t help the application developer who likes to plan ahead a bit. The new scripting-language-based mega frameworks seem to attract a certain kind of developer and they probably work for a certain set of use cases, but standardizing on patterns and interfaces is much more useful for (opinionated!) people like us (with subtly deviating use cases). So framework authors: please do keep working on bridging the gap between all of them by cutting ’em down into tiny little WSGI middleware bits and pieces, and turn frameworks into libraries where you can.
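
To make "tiny little WSGI middleware bits and pieces" a bit more concrete, here is a rough sketch of the kind of piece I mean. It is not taken from any framework or from our code: `TimingMiddleware`, the `X-Elapsed-Ms` header and the `hello` app are all invented for the example, and it is written for a current Python.

```python
import time


class TimingMiddleware(object):
    """Wrap any WSGI app and add an X-Elapsed-Ms response header.

    The header reports the time between the request entering the middleware
    and the wrapped app calling start_response. It does not cover the time
    spent iterating over the response body.
    """

    def __init__(self, app, clock=time.time):
        self.app = app
        self.clock = clock  # injectable so the middleware can be tested

    def __call__(self, environ, start_response):
        started = self.clock()

        def timed_start_response(status, headers, exc_info=None):
            elapsed_ms = int((self.clock() - started) * 1000)
            headers = list(headers) + [("X-Elapsed-Ms", str(elapsed_ms))]
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)

        return self.app(environ, timed_start_response)


def hello(environ, start_response):
    """Hypothetical application to wrap; any WSGI callable would do."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# The wrapped result is itself a WSGI app, so pieces like this stack freely.
application = TimingMiddleware(hello)
```

The middleware only knows about the WSGI calling convention, so it can sit in front of a CherryPy handler tree, a framework's app object, or a bare function like `hello`, which is exactly why this granularity is more useful to us than a monolithic framework.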