
How we use Puppet – infrastructure, config, failover, deploys

By David Mytton,
CEO & Founder of Server Density.

Published on the 19th June, 2014.

We’ve been using Puppet to manage the infrastructure behind Server Density for several years now. It helps us in a number of different ways and although we use it as standard config management, that’s actually only about 25% of our use case.

We have 4 main uses for Puppet – infrastructure, config management, failover and deploys – each of which I’ll go through here.

How we use Puppet – Infrastructure

We first started using Puppet when we moved our environment to SoftLayer, where we have a mixture of bare metal servers and public cloud instances, totalling around 75–100 servers. When this was set up, we ordered the servers from SoftLayer then manually installed Puppet before applying our manifests to get things configured.

Although we recently evaluated moving to running our own environment in colo data centres, we have also started evaluations with other providers, including Google Cloud. My general view remains that colo is significantly cheaper in the long run, but it comes with an initial capital outlay we don't want to take on right now. We also want to make use of some of the Google products like BigQuery.

Using Google Cloud (specifically, Google Compute Engine), or indeed any one of the other major cloud providers, means we can make use of Puppet modules to define the resources within our code. Instead of having to manually order them through the control panels, we can define them in the Puppet manifests alongside the configuration. We’re using the gce_compute module but there are also modules for Amazon and others.

For example, defining an instance plus a 200GB volume:

gce_instance { 'mms-app1':
  ensure         => present,
  machine_type   => 'n1-highmem-2',
  zone           => 'us-central1-a',
  network        => 'private',
  tags           => ['mms-app', 'mongodb'],
  image          => 'projects/debian-cloud/global/images/backports-debian-7-wheezy-v20140605',
}

gce_disk { 'mms-app1-var-lib-mongodb':
  ensure      => present,
  description => 'mms-app1:/var/lib/mongodb',
  size_gb     => '200',
  zone        => 'us-central1-a',
}

The key here is that we can define instances in code, next to the relevant configuration for what’s running on them, then let Puppet deal with creating them.

How we use Puppet – Config management

This is the original use case for Puppet – defining everything we have installed on our servers in a single location. It makes it easy to deploy new servers and keep everything consistent.

It also means any unusual changes, fixes or tweaks are fully version controlled and documented, so we don't lose things over time. For example, we have a range of MongoDB fixes and optimisations, built up over time and through support requests, to work around various issues, all of which are documented in Puppet.

Puppet in GitHub

We use the standard module layout as recommended by Puppet Labs, contained within a GitHub repo and checked with puppet-lint before commit, so we have a nicely formatted, well structured library describing our setup. Changes go through our usual code review process and get deployed by the Puppet Master picking up the changes and rolling them out.

Previously, we wrote our own custom modules to describe everything but more recently where possible we use modules from the Puppet Forge. This is because they often support far more options and are more standardised than our own custom modules. For example, the MongoDB module allows us to install the server and client, set options and even configure replica sets.

  include site::mongodb_org

  class {'::mongodb::server':
    ensure    => present,
    bind_ip   => '',
    replset   => 'mms-app',
  }

  mount {'/var/lib/mongodb':
    ensure  => mounted,
    atboot  => true,
    device  => '/dev/sdb',
    fstype  => 'ext4',
    options => 'defaults,noatime',
    require => Class['::mongodb::server'],
  }

  mongodb_replset { 'mms-app':
    ensure  => present,
    members => ['mms-app1:27017', 'mms-app2:27017', 'mms-app3:27017']
  }

We pin specific versions of packages to ensure the same version always gets installed and we can control upgrades. This is particularly important to avoid sudden upgrades of critical packages, like databases!
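As an illustration (the package name and version below are placeholders rather than our actual pins), pinning a version in a manifest looks something like this:

package { 'mongodb-org-server':
  ensure => '2.6.3',   # pin an exact version instead of 'latest' so upgrades only happen when we change this line
}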

The Server Density monitoring agent is also available as a Puppet Forge module to automatically install the agent, register it and even define your alerts.
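A minimal sketch of what that looks like, assuming the Forge module exposes a serverdensity_agent class (the parameter names and values below are illustrative placeholders – the real ones are listed in the module's README):

class { 'serverdensity_agent':
  # illustrative placeholders for the account URL and agent key; see the module docs for the actual parameters
  sd_url    => 'https://example.serverdensity.io',
  agent_key => 'YOUR_AGENT_KEY',
}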

All combined, this means we have our MongoDB backups running on Google Compute Engine, deployed using Puppet and monitored with Server Density.


How we use Puppet – Failover

We use Nginx as a load balancer and use Puppet variables to list the members of the proxy pool. This is deployed using a Puppet Forge nginx module we contributed some improvements to.
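For example, with the Forge nginx module the pool can be defined as an upstream whose members come from a variable, so adding or removing a backend is a one line change (the pool name and hostnames below are illustrative, not our real configuration):

$app_pool_members = ['app1.example.com:8080', 'app2.example.com:8080']

nginx::resource::upstream { 'app_pool':
  ensure  => present,
  members => $app_pool_members,
}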

When we need to remove nodes from the load balancer rotation, we can do this using the Puppet web UI as a manual process, or by using the console rake API. The UI makes it easy to apply the changes so a human can do it with minimal chance of error. The API allows us to automate failover in particular conditions, such as if one of the nodes fails.

How we use Puppet – Deploys

This is a more unusual way of using Puppet, but it has allowed us to concentrate on building only a small portion of the deployment mechanism, taking advantage of the Puppet agent which already runs on all our servers. It saves us having to use custom SSH commands or write our own agent, and allows us to customise the deploy workflow to suit our requirements.

It works like this:

  1. Code is committed in Github into master (usually through merging a pull request, which is how we do our code reviews).
  2. A new build is triggered by Buildbot which runs our tests, then creates the build artefacts – the stripped down code that is actually copied to the production servers.
  3. Someone presses the deploy button in our internal control panel, choosing which servers to deploy to (branches can also be deployed), and the internal version number is updated to reflect what should be deployed.
  4. /opt/puppet/bin/mco puppetd runonce -I is triggered on the selected hosts and the Puppet run notices that the deployed version is different from the requested version (a rough sketch of this check follows the list).
  5. The new build is copied onto the servers.
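On the Puppet side, that version check can be modelled roughly like this (the variable name, paths and the fetch_build helper are hypothetical, for illustration only, not our actual manifests):

# Release requested via the internal control panel, e.g. exposed through Hiera or an ENC
$requested_release = hiera('app_release')

# Only fetch and unpack a new build artefact when the version on disk differs
exec { 'deploy-app-release':
  command => "/usr/local/bin/fetch_build ${requested_release} /srv/app",
  unless  => "/bin/grep -qx '${requested_release}' /srv/app/VERSION",
}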


Status messages are posted into HipChat throughout the process and any one of our engineers can deploy code at any time, although we have a general rule not to deploy non-critical changes after 5pm on weekdays and after 3pm on Fridays.

There are some disadvantages to using Puppet for this. Firstly, the Puppet agent can be quite slow on low-spec hardware; our remote monitoring nodes around the world are generally low power machines, so the agent runs very slowly on them. Secondly, deploys are eventually consistent: they won't necessarily happen on every server at the same time, so you need to account for that in the code you deploy.

Puppet is most of your documentation

These four use cases mean that a lot of how our infrastructure is set up and used is contained within text files. This has several advantages:

  • It’s version controlled – everyone can see changes and they are part of our normal review process.
  • Everyone can see it – if you want to know how something works, you can read through the manifests and understand more, quickly.
  • Everything is consistent – it’s a single source of truth, one place where everything is defined.

It’s not all of our docs, but certainly makes up a large proportion because it’s actually being used, live. And everyone knows how much they hate keeping the docs up to date!

Puppet is the source of truth

It knows all our servers, where they are, their health status, how they are configured and what is installed.
