Bootstrapping a web application platform

Some rules for building a web application platform:

Use a configuration management system.
Put all configuration in version control.
Have backups of everything important. Verify all backups.
Have documentation of configuration and process.
Reliability and availability of configuration management is critical.
Have monitoring of everything important. Verify all monitoring.

Unfortunately, it is quite hard to get an existing system that violates these rules into this state.

I’ve seen a few organizations deploy a first version of a platform “by hand” and then spend painful months re-imaging all their servers based on some chosen configuration management tool, causing all sorts of destabilization and pain.

Similarly, time and time again I also see people fail to consider backups from the start. Sorting out backups properly once they have been burnt by data loss is a very costly process, too.

It is well worth avoiding such a mess by making sure you start in this clean state and then never leave it.

Before you begin

Some of the choices that you should have made:

hardware and/or hosting platform
operating system
version control system
backup system
configuration management system
monitoring system
availability / redundancy requirements

That last bullet might be unexpected, but you need some idea of how much availability you want right away. The reason is that when all your servers are managed through centralized configuration management backed by a version control system, those components become critically important. Imagine a scenario where a newly found security hole in one of your software components is being exploited for a DDoS attack. To deal with this, you need to upgrade the software component, and to do that, you need your configuration management and version control functioning.

Because there are a lot of scenarios that you haven’t planned for, and because extensive downtime is expensive, it is probably worth significantly over-provisioning for your configuration management, version control systems, and monitoring systems.

Bootstrapping configuration management

Install a repo server
Install version control on the repo server
Install configuration management on the repo server
Put the configuration of the repo server into version control
Install a backup server
Install configuration management on the backup server
Put the configuration of the backup server into version control
Configure backups of the repo server
Install a fallback repo server using the versioned configuration
Restore a backup of the repo server onto the fallback repo server
Fail over to the fallback repo server
Reinstall the repo server using the versioned configuration
Restore a backup of the fallback repo server onto the repo server
Fail over to the repo server
Install a secondary backup server using the versioned configuration
Restore a backup from the secondary backup server onto the fallback repo server
Fail over to the fallback repo server
Fail over to the repo server
Reinstall the backup server using the versioned configuration
Restore a backup from the backup server onto the fallback repo server
Fail over to the fallback repo server
Fail over to the repo server
Reboot every server in turn
Shut down all servers then start up all the servers
Document everything you’ve done so far in version control:
- Document the precise bootstrap procedure
- Document the precise failover procedure
- Document the precise backup restoration procedure
- Document the precise procedure to add a new server
- Document the precise procedure to reboot a server
Test all the documented procedures (by executing them)
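
Part of testing the backup restoration procedure is confirming that what came back matches what went in. A minimal sketch in Python, assuming the original and the restored data are both reachable as directory trees; the function names are hypothetical and not tied to any particular backup tool:

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(original, restored):
    """Return (missing, unexpected, changed) relative paths, each sorted."""
    before = tree_digests(original)
    after = tree_digests(restored)
    missing = sorted(set(before) - set(after))
    unexpected = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return missing, unexpected, changed
```

A restore test passes when all three lists are empty. This reads whole files into memory, so it suits configuration repositories better than large data sets.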

Bootstrapping essential services

With basic configuration management in place, we can now work on setting up things like DNS (zone files obviously go in version control), NTP (which should run on every machine), SSH (make sure you’re secure enough), logrotate (make sure you don’t fill up your disks).

What services should be considered essential varies a bit. For example, you might use some external DNS provider if all you have are public IPs. Similarly, if you want to use X.509 client certificates to talk to SVN, then this is probably a good time to set up your certificate authority. Obviously, you should make sure to back up your certificate authority root certificate and the server certs for your essential servers somewhere really safe, like on a couple of dedicated, labelled, high quality USB keys.

Again, make sure you have enough instances of all these services to satisfy your availability guarantees. 2 is a good minimum.

Bootstrapping package management

Up until now you’ve probably been downloading and installing packages from somewhere external. It is a good idea to set up a mirror (and a backup mirror) of the package servers you use so that you’re not dependent on the 3rd party’s server availability. Make sure all your own servers then use these package mirrors.

This is also a good point to consider how you will handle installation of your own software. If you’re going to be packaging it up, also set up your own package management servers at this point.

Bootstrapping security

Security is a rather big topic and I’ll ignore most of it here, but there are a few places where it connects specifically to bootstrapping. Your configuration management tools should help ensure you can relatively easily roll out software updates. However, you still have to put in place a process to ensure that you will roll out required security updates in a timely fashion.

For example, you should subscribe to security announcements from the vendors of the software that you use. It can be a good idea to keep your own log of security updates in version control. Keep a timestamp of when the update was received and what action was taken (where a common action is “no action”). This allows you to audit your vulnerability windows later on.
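
As an illustration of auditing such a log, here is a small Python sketch. The CSV layout (columns `received`, `resolved`, `component`, `action`) and the sample rows are assumptions made up for this example; any format that records both timestamps would do.

```python
import csv
import io
from datetime import datetime

# Hypothetical log format: one row per announcement, ISO 8601 timestamps.
SAMPLE_LOG = """\
received,resolved,component,action
2024-03-01T09:00:00,2024-03-01T15:30:00,httpd,upgraded
2024-03-04T08:00:00,2024-03-04T08:20:00,ntpd,no action
"""


def vulnerability_windows(log_text):
    """Yield (component, action, time between receipt and resolution)."""
    for row in csv.DictReader(io.StringIO(log_text)):
        received = datetime.fromisoformat(row["received"])
        resolved = datetime.fromisoformat(row["resolved"])
        yield row["component"], row["action"], resolved - received


def longest_window(log_text):
    """Return the entry with the largest window, or None for an empty log."""
    return max(vulnerability_windows(log_text), key=lambda e: e[2], default=None)
```

Because the log lives in version control, the commit history gives an independent check on the recorded timestamps.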

Bootstrapping a base build

If you are using managed hosting or a cloud service, presumably you’re starting off from a base install of your chosen operating system and so you don’t have to build your own servers. But if you’ve got your own hardware, you need a good process in place for building or rebuilding a server, which needs to be documented and repeatable, and hopefully it’s quick, too.

If you’re using your own hardware, remember that some hardware can be faulty so whenever you add new hardware you should have burn-in tests, which can usually double as a basic hardware benchmark.

Setting up initial monitoring

A monitoring setup is probably the first complicated piece of infrastructure to set up. It’s pretty reasonable to assume that most of the currently installed systems behave reasonably well, and that individual performance metrics on them are not very important, so you don’t need that much monitoring. Perhaps:

Alerts:
- server unreachable
- service unreachable
- backup failed
- configuration management operation failed
- partition usage above threshold (80%)
- swap usage above threshold (1%)
- 15 minute load average above threshold (10.0)
Per-server graphs at 5 minute granularity:
- load average
- CPU usage
- Memory usage
- Network usage and/or I/O
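
In practice a monitoring tool evaluates these thresholds for you, but the logic is simple enough to sketch. The following Python is an illustration only: the function names are hypothetical, and the metrics are passed in as plain numbers so the check itself does not depend on any platform.

```python
import shutil

# Thresholds taken from the alert list above.
PARTITION_USAGE_MAX = 0.80  # fraction of a partition in use
SWAP_USAGE_MAX = 0.01       # fraction of swap in use
LOAD15_MAX = 10.0           # 15 minute load average


def evaluate(partition_usage, swap_usage, load15):
    """Return alert messages for every metric above its threshold.

    partition_usage maps a mount point to the fraction of it in use.
    """
    alerts = []
    for mount, used in sorted(partition_usage.items()):
        if used > PARTITION_USAGE_MAX:
            alerts.append(f"partition {mount} at {used:.0%}")
    if swap_usage > SWAP_USAGE_MAX:
        alerts.append(f"swap at {swap_usage:.0%}")
    if load15 > LOAD15_MAX:
        alerts.append(f"15 minute load average at {load15:.1f}")
    return alerts


def partition_fraction(mount):
    """Fraction of the partition holding `mount` that is currently in use."""
    usage = shutil.disk_usage(mount)
    return usage.used / usage.total
```

On a Unix host the load figure could come from `os.getloadavg()[2]`; how to read swap usage is platform specific and is left out here.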

With monitoring it is particularly important to make sure that alerts are working, so make sure to test this. It’s also important that newly added servers get monitored, so automate (and test) that, or make sure the procedure documentation explains how to add a server to the monitoring.
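
One cheap automated check is to compare the list of hosts known to configuration management with the list known to monitoring. A sketch in Python, assuming you can export both lists as plain host names (how you obtain them depends entirely on your tools):

```python
def monitoring_gaps(managed_hosts, monitored_hosts):
    """Compare the two inventories.

    Returns (unmonitored, stale): hosts that are managed but not monitored,
    and hosts that are monitored but no longer managed. Both lists are sorted.
    """
    managed = set(managed_hosts)
    monitored = set(monitored_hosts)
    return sorted(managed - monitored), sorted(monitored - managed)
```

Running this on a schedule and alerting on a non-empty first list catches servers that were added without following the documented procedure.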

Oh, and once you build an actual web application, it is a really good idea to get 3rd party monitoring of your web application(s), so that you are not dependent on your own monitoring to know when they go down. You might want to use one of those 3rd party services to monitor your monitoring tool.

A dummy service

You should now create a very simple service as an integration test of all the basic infrastructure. I like setting up a vanilla Apache httpd with a trivial CGI that writes data you give it to a file.

Make sure to test that you can install, upgrade, downgrade, uninstall, and reinstall the service
Make sure that you get alerts when it goes down
Make sure 3rd party monitoring and your monitoring correlate
Make sure log rotation of the Apache logs is working
Make sure backups of the written data are working and that you can restore from them
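
For reference, the trivial CGI can be very small indeed. The sketch below is one possible version in Python; the data file location and the `DUMMY_DATA_FILE` environment variable are assumptions for this example, and the file must be writable by the web server user and covered by your backups.

```python
#!/usr/bin/env python3
"""Trivial CGI: append the request body to a file and confirm."""
import os
import sys

# Hypothetical location, overridable through a made-up environment variable.
DATA_FILE = os.environ.get("DUMMY_DATA_FILE", "/var/tmp/dummy-service.log")


def store(data, path=DATA_FILE):
    """Append one newline-terminated record; return the bytes written."""
    record = data.rstrip(b"\n") + b"\n"
    with open(path, "ab") as handle:
        handle.write(record)
    return len(record)


def main():
    # The web server passes the body length in CONTENT_LENGTH.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length) if length else b""
    written = store(body) if body else 0
    # CGI response: headers, a blank line, then the body.
    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write(f"stored {written} bytes\n")


if __name__ == "__main__":
    main()
```

It does no validation or locking, which is fine for an integration test but not for anything exposed to real traffic.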

Bootstrapping complete!

That’s it. Keep following the same discipline that’s exemplified in the description above. As a consequence you will know that things are safe and secure, because you will always follow the base rules mentioned above.

You can now either set up a real service, or start setting up the development infrastructure, such as issue tracker, wiki, and continuous integration servers. I don’t think those are strictly necessary at first – in my experience it is much easier to change how you do issue tracking or software builds than it is to change how you do deployments.

Is this really necessary?

Well, maybe not, but I tend to think it’s really a worthwhile investment. Consider what you now have:

Peace of mind because you have full backups
Peace of mind because you have full monitoring
Peace of mind because you will get an alert when something important breaks
Peace of mind because you can quickly fail over if anything breaks
Peace of mind that you could rebuild everything in the case of disaster

Also consider that if you already know what you are doing, the above is actually less work than you may think. I haven’t tried in quite a few years, but I imagine it’d take me about a week to get it all set up from scratch.

Cloud computing to the rescue?

If you use an application-as-a-service platform like Google App Engine, you don’t have to do any of this, as long as you trust your cloud provider to have done a good enough job themselves that you are confident you don’t need a backup “just in case”!

If you use an elastic computing platform like EC2, you really still have to do most of this work. More importantly, because you don’t get very good uptime guarantees on EC2 instances themselves, you still have to find some dedicated managed hosting for your configuration management systems.

That said, it is pretty easy to use EC2 to do most of the bootstrapping work. EC2 is also a pretty cheap way to keep your backup servers, which you can mostly keep turned off until you actually need them, especially if having a few minutes of unscheduled downtime is acceptable.