How GOV.UK Reduced their Incidents and Alerts
CEO & Founder of Server Density.
Published on the 25th October, 2016.
Keep reading, gentle reader: this is not some Friends episode potboiler joke. We just can’t help getting pumped up about all the amazing HumanOps work that’s happening out there. Independent third-party events are now taking place around the world (San Francisco and Poznan most recently).
So we decided to host another one closer to home in London.
In the meantime, let’s take a look at the recent GOV.UK HumanOps talk. GOV.UK is the UK government’s digital portal. Millions of people access GOV.UK every single day whenever they need to interact with the UK government.
Bob Walker, Head of Web Operations, spoke about their recent efforts to reduce incidents and alerts (a core tenet of HumanOps). What follows are the key takeaways from his talk. You can also watch the entire video, or download the transcript in PDF format and read it in your own time (see right below the article).
GOV.UK does HumanOps
After extensive rationalisation, GOV.UK has reached a stage where only 6 types of incident can alert the team (wake them up) out of hours. The rest can wait until the next morning.
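To make the idea concrete, here is a minimal sketch of how an "only a few things may wake us up" rule could look in code. It is not GOV.UK's actual setup: the alert type names, the working hours and the `route` function are all assumptions made up for illustration, since the talk summary does not list the six real incident types.

```python
from datetime import datetime, time

# Hypothetical alert types. The real list of six is not given in this post.
PAGE_OUT_OF_HOURS = {"site_down", "cdn_failing"}

# Assumed working hours (not stated in the source).
WORK_START = time(9, 0)
WORK_END = time(17, 30)


def in_hours(now: datetime) -> bool:
    """True on weekdays between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def route(alert_type: str, now: datetime) -> str:
    """Decide what to do with an alert.

    'notify' - someone is at their desk, so tell the team as normal
    'page'   - out of hours and on the short allowlist, so wake someone up
    'queue'  - out of hours and not urgent, so it waits until the morning
    """
    if in_hours(now):
        return "notify"
    if alert_type in PAGE_OUT_OF_HOURS:
        return "page"
    return "queue"
```

The point of keeping the allowlist as one small, explicit set is that adding a new out-of-hours alert becomes a deliberate decision rather than a default.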
GOV.UK mirrors their website across disparate geographical locations and operates a managed CDN at the front. As a result, even if parts of their infrastructure fail, most of their website should remain available.
Once issues are resolved, GOV.UK carries out incident reviews (their own flavour of postmortems). In reiterating the importance of blameless postmortems, Bob said:
Every Wednesday at 11:00 AM they test their paging system. The purpose of this exercise is not only to test their monitoring system but also to ensure people have configured their phones to receive alerts!
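A weekly test like this can be sketched in a few lines. The example below is only an illustration under stated assumptions: `next_paging_test` and `missed_test_page` are hypothetical helper names, and the idea of collecting acknowledgements from the on-call roster is our guess at how you might check that phones are set up, not something described in the talk.

```python
from datetime import datetime, timedelta

TEST_WEEKDAY = 2  # Wednesday (Monday is 0)
TEST_HOUR = 11    # 11:00 AM, as mentioned in the talk


def next_paging_test(now: datetime) -> datetime:
    """Return the next Wednesday 11:00 strictly after `now`."""
    days_ahead = (TEST_WEEKDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=TEST_HOUR, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Today's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate


def missed_test_page(roster: list[str], acknowledged: set[str]) -> list[str]:
    """Hypothetical check: who on the roster did not acknowledge the test page?"""
    return [person for person in roster if person not in acknowledged]
```

Anyone returned by `missed_test_page` is a prompt to go and check that person's phone settings before a real incident does it for you.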
Want to find out more? Watch Bob Walker’s talk. And if you want the full transcript, go ahead and use the download link right below this post.
See you at a HumanOps event!