When should you deploy a new build?
When you work for a small company doing ongoing contract work for a large one, there can often be disagreements over how to solve problems. Large companies like to solve problems with “process”: a series of steps that must be performed to carry out some task.
The key thing about a process is that it absolves everyone of responsibility: if you encounter failure after carrying out the process, it is the process that is at fault. Unfortunately, process failure leads to process review, and the end result is typically more steps added to the process.
One process that is familiar to many developers is that of website deployment: the process of actually changing what’s online from an old release to a new release. At Mammoth we firmly believe in many of the tenets of agile development and so have a release every one to two weeks. We store our releases in Subversion so the actual mechanics of getting a new release onto the site are both easy and repeatable.
The question of when during the day (or night) to actually deploy a build seems at first to have an easy answer. Let’s say, for example, that you have spent the day testing your application; it’s now 4pm and time to deploy. The deployment is carried out, and your database immediately falls over: it turns out your application has a bug that causes 1000 extra queries on a popular page.
This outrageous mistake causes ten minutes of downtime before the deployment is rolled back to the prior release (you do have a rollback plan, don’t you?). At this point, the deployment process comes under attack: “Why is there a deployment at 4pm? We should deploy at 4am when no one is using the site!”
At first (and second) glance, this line of thought is hard to argue with, and indeed it is the current process used by our client. It does appear to work, too: the site essentially never crashes immediately after being deployed.
Upon closer analysis though, this is not the least bit surprising. The load on our website at 4am is essentially the same as that created during testing. If the website were going to crash under this level of load, it would already have done so during testing, and the defect would already have been fixed.
So at this point everyone pats themselves on the back for another successful deployment and goes to bed. The reality is somewhat different, however: what has been deployed is essentially a ticking time bomb. The site may very well still have a bug in it that, when exposed to the level of load that 4pm entails, causes the entire site to crash.
This, I think, is the crux of why 4am deployments (on their own) are bad: they foster a false sense of safety. Recall that our actual problem was that a defect caused the site to crash under the 4pm user load. There is only one way to actually prevent this problem: ensure that such a defect is not in the site at 4pm. A couple of methods spring to mind:
- Utilise a load testing process prior to each deployment. This should catch most of your problems.
- Ensure there is adequate monitoring of your database, CPU, memory and other resources, so that it is obvious when a defect causes a sudden day-over-day spike.
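To make the monitoring point a little more concrete, here is a minimal sketch of a day-over-day check in Python. Everything in it is an assumption for illustration rather than what we actually run: the `Sample` type, the `day_over_day_spikes` function, the "queries per second" metric and the 2x threshold are all hypothetical, and a real setup would pull these numbers from whatever monitoring tool you already have.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    hour: int     # hour of day, 0-23
    value: float  # hypothetical metric, e.g. database queries per second


def day_over_day_spikes(yesterday, today, ratio_threshold=2.0, min_baseline=1.0):
    """Return (hour, yesterday_value, today_value) for every hour where
    today's value is at least ratio_threshold times yesterday's."""
    baseline = {s.hour: s.value for s in yesterday}
    spikes = []
    for s in today:
        previous = baseline.get(s.hour)
        if previous is None:
            continue  # no reading for this hour yesterday, nothing to compare
        # Clamp tiny baselines so that very quiet hours do not trigger on noise.
        floor = max(previous, min_baseline)
        if s.value >= ratio_threshold * floor:
            spikes.append((s.hour, previous, s.value))
    return spikes


# Made-up numbers: 4am looks healthy, 4pm is eleven times yesterday's load.
yesterday = [Sample(4, 5.0), Sample(16, 100.0)]
today = [Sample(4, 6.0), Sample(16, 1100.0)]
for hour, before, after in day_over_day_spikes(yesterday, today):
    print(f"{hour:02d}:00 went from {before} to {after}")
```

Comparing the same hour across days matters here: a 4am deployment only shows its true cost once the 4pm reading can be set against the previous day's 4pm reading.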
A secondary problem is that our client is rigid about the one-release-per-week rule: if a release doesn’t “take”, the only option is to roll back. This creates something of a quandary when combined with 4am releases: if you do not roll back until, say, 4pm, customers have been exposed for 12 hours to new functionality that suddenly disappears.
What’s my answer then? My preference is for 10am deployments: our load peaks at night, so 10am is still relatively quiet, and the development team is at work and on hand to look at any issues. But whatever the time of the release, I think being able to make a secondary update later in the day is crucial: with a limited budget and weekly releases, the occasional defect will make it onto the site; that, I think, is simply reality. What is important then is to focus on keeping the customer happy by both avoiding peak-time deployments (potential downtime) and avoiding rollbacks (feature loss).