Getting a feel for the cost of using mysql with WSGI

So is the database the bottleneck?

Note that compared to the last post, I enabled KeepAlive, i.e. I added the -k flag to ab:

ab -k -n 10000 -c 100 http://localhost/hello.txt

I also toned down the MPM limits a bunch:

StartServers         16
MinSpareServers      16
MaxSpareServers     128
MaxClients           64
MaxRequestsPerChild   0

(Apache will now complain about reaching MaxClients, but given the db is on the same machine, things end up faster overall, this way)

And then I decided to just run ‘ab’ locally since I couldn’t really see that much difference vs running it over the network.

Then I proceeded to get myself a MySQL 5 server with a single InnoDB table inside with a single row in it that reads (key:1, value:'Hello world!'). I also set up a memcached server. Now, look at this…

What? Requests per second relative performance hit CPU usage
hello.txt 9700 httpd 2.2% per process
python string 7500 23% httpd 2% per process
memcached lookup 2100 23% * 72% memcached 10%, httpd 2% per process
_mysql 1400 23% * 82% MySQL 77%, httpd 1.2% per process
mysql 880 23% * 82% * 37% MySQL 65%, httpd 4.5% per process
SQLAlchemy 700 23% * 82% * 37% * 20% MySQL 19%, httpd 5.5% per process

Using a mysql database backend costs 82% req/s compared to the “serve python string from global variable”. Totally expected of course, and we have also been taught how to deal with it (caching).

In this example, we’re not actually getting a lot out of using memcached – the memcached approach (I’m using python-memcached) is still 72% slower than having the in-process memory, though if you look at it the other way around, it is 50% faster than using the _mysql driver. Nevertheless, this shows that the mysql query cache is pretty fast, too, esp. if you have a 100% cache hit ratio 🙂

Using the SQLAlchemy ORM framework makes things 20% slower, and it gobbles up a lot of CPU. We also expected that, and we know how to deal with (hand-optimize the SQL in performance-sensitive areas, or perhaps use a faster ORM, like storm).

But, did you know that you can take a 37% performance hit just by using the _mysql bindings directly instead of the MySQLdb object oriented wrappers? Looks like some optimizations should be possible there…

Now, in “real life”, imagine a 90% hit ratio on the memcache, and for the other 10% there’s some expensive ORM processing. Further imagine a 5% performance hit in the logic to decide between memcache and ORM layer. You’d end up doing about 1850 requests per second.

This leads to Leo’s WSGI performance rule of thumb number 1: Even if your python webapp is really really really fast, it will still be about 5 times slower than apache serving up static files.

3 thoughts on “Getting a feel for the cost of using mysql with WSGI”

  1. Don’t underestimate the value of issuing fewer queries to the database, which overall will save a lot more time than just reducing the time spent for each of thousands of small SELECT statements. SQLA was designed around the model of issuing as little SQL as possible to the database, including the “eagerload” model, maintaining the state of loaded collections in-memory, and lots of fluency with joins and subqueries. unit of work options like autoflush and expire on commit can be disabled, and network traffic can be reduced by using deferred columns. These are all performance features that are specific to SQLAlchemy. Per-column queries instead of full object loads also save a ton of resources.

  2. Also worth highlighting is that there is a marked difference even between a ‘hello world’ application and your more typical real application. This is significant as it in part shows that it doesn’t matter what WSGI hosting mechanism you use, any additional throughput you may see in a ‘hello world’ test using one hosting solution over another are going to be almost insignificant in the greater overheads of your actual application.

    Thus, it is more important to find a hosting solution that meets your requirements of ease of setup and management plus features rather than outright speed. To that end, that is why many people settle for mod_wsgi daemon mode. It is more than adequate for most peoples sites and you get features like ability to run as separate user, along with good integration with virtual environments and easy application reloading.

    It would therefore be interesting to see the above done but comparing mod_wsgi embedded mode vs mod_wsgi daemon mode for each test.

Comments are closed.