Many projects with Vagrant and Puppet

By David Mytton,
CEO & Founder of Server Density.

Published on the 7th May, 2013.

When we started Server Density v2, one of the main ideas was to build it as a collection of RESTful services, all talking over HTTP.

Initially, these were installed locally on a developer’s machine and set up via Apache vhosts running each component separately.

This soon became unmaintainable on a daily basis without a lot of work. We were spending too much time discussing whether a certain component was up to date and fighting bugs caused by API incompatibilities between versions. As we added more services to deal with things beyond the core of the product, this just got worse.

The answer was Vagrant.

What is Vagrant?

Essentially, Vagrant is a command-line tool for managing VirtualBox instances (other backends are available in the latest version). You use a pre-built box (or package your own) to create a fresh virtual machine with all your tools installed, accessible via SSH from the host machine.

Once a Vagrant box is configured, it’s just a case of running `vagrant up` and waiting a while until the system is up and running.

Our vagrant box

We went through some iterations and experimentation to find something suitable for the way we wanted to work, combined with what we could actually achieve in the box. This is the end result for now, and I’ll list some modifications that we have planned but haven’t found the time to do.

The box is split in two, a base and an environment box:

The Base Box

The first stage in the vagrant build was to build a customised base box. We based it on the 64-bit build of Ubuntu 12.04, as that’s what we planned to use in production. Once that was chosen, I collected the set of dependencies and development tools required by each service (MongoDB, Apache, Node.js, vim, screen, etc.). These were then deployed into the base box using a fairly standard puppet manifest with some available modules. At this stage, it just installs the tooling and dependencies and makes the required system-level config changes (networking setup, DNS entries).

Once this box is built, it’s uploaded to a development webserver so the whole team has access to it. The Vagrantfile and the puppet manifests live in git, alongside the development box. This means that the base box can be recreated, tweaked or reviewed by anyone at any time if needed.

Using a base box like this loses some flexibility. Every time you want to add something new at the base level, you have to rebuild and re-upload the entire box. But it saves deployment time in the next stage, which is a win overall.

You can temporarily work around this by adding a dependency into the environment box, but if you’re not strict about how you manage this, it becomes hard to tell where everything is installed, and you lose the advantage of having a pre-built base box.

The development environment

This is a little less straightforward than the base box build.

Starting with a Vagrantfile with the base box imported/declared, we added puppet modules to handle installing our agent into the box so it can report to itself, and a module to install apache vhosts.

Once that was done, we created a puppet module that can handle installing all of the services from git. This covers a clone/update, a build process (buildout/composer), and finally running any test or development data scripts that we have. This was mostly a mash of already available bits and abuse of Puppet’s exec resource.
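The per-service sequence described above (clone or update, then build, then load development data) can be sketched as follows. This is a plain Ruby illustration of the ordering only, not the Puppet module itself, which the article does not show. The method names, the service hash keys (`:repo`, `:dest`, `:build`, `:seed`) and the example commands are all assumptions made for the sketch.

```ruby
# Hypothetical sketch: work out the commands needed to deploy one service.
# Nothing is executed here; each step is returned as an argv array.

# Clone on the first run, otherwise update the existing checkout.
def checkout_command(repo_url, dest)
  if File.directory?(File.join(dest, '.git'))
    ['git', '-C', dest, 'pull', '--ff-only']
  else
    ['git', 'clone', repo_url, dest]
  end
end

# Ordered steps for one service: checkout, optional build, optional data script.
# `service` is a hash with assumed keys :repo, :dest, :build and :seed.
def deploy_steps(service)
  steps = [checkout_command(service[:repo], service[:dest])]
  steps << service[:build] if service[:build]
  steps << service[:seed] if service[:seed]
  steps
end
```

A PHP service might pass `build: ['composer', 'install']`, while a Python one might pass `build: ['bin/buildout']`. Running each returned step with `system(*step)` from inside the checkout directory would roughly mirror what a chain of exec resources does on each provision.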

The Vagrantfile is just a file of Ruby code, so it was easy to add something that set the box hostname based on the username of the host, and import a separate settings file for overriding the defaults for the vagrant settings (code checkout locations being the main one).
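As a rough illustration of those two additions, the helpers below derive a hostname from the host user’s login name and merge an optional per-developer settings file over a set of defaults. This is a minimal sketch under stated assumptions: the article does not show the real code, so the method names, the YAML file format, the `dev-box` suffix and the default values are all hypothetical.

```ruby
require 'etc'
require 'yaml'

# Hypothetical defaults; the real settings are not listed in the article.
DEFAULTS = {
  'checkout_dir' => '~/code',
  'memory_mb'    => 2048,
  'cpus'         => 2
}.freeze

# Build a box hostname from the host user's login name, replacing any
# character that is not a lowercase letter, digit or hyphen.
def box_hostname(username = Etc.getlogin, suffix = 'dev-box')
  safe = username.to_s.downcase.gsub(/[^a-z0-9-]/, '-')
  "#{safe}-#{suffix}"
end

# Merge per-developer overrides over the defaults.
# A missing or empty file simply yields the defaults.
def load_settings(path, defaults = DEFAULTS)
  return defaults.dup unless File.exist?(path)

  overrides = YAML.load_file(path) || {}
  defaults.merge(overrides)
end
```

In a current Vagrantfile, the result of `box_hostname` would typically be assigned to `config.vm.hostname`, and the merged settings would feed the shared folder paths and provider options.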

The final stage for this was to add a script that will kill and then start all the code that runs as a service (tornado/celery mainly). This grew out of an ugly hack involving starting lots of things in screen, and hasn’t really been updated to anything else. It does have a convenient advantage that `screen -list` will tell you exactly what is running, and a total number at the bottom for quick verification that everything started okay.
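A kill-then-start script of this kind could look something like the sketch below, which gives each service its own named, detached screen session. The actual script is not shown in the article, so the service names, their commands and the method names are placeholders; only the `screen` options (`-S`, `-X quit`, `-dmS`) are standard.

```ruby
# Hypothetical service list; the names and commands are placeholders.
SERVICES = {
  'api'    => 'python run_api.py',
  'worker' => 'python run_worker.py'
}.freeze

# Command that asks a named screen session to quit.
# It exits non-zero if no such session exists, which is ignored below.
def stop_command(name)
  ['screen', '-S', name, '-X', 'quit']
end

# Command that starts a detached, named screen session running the service.
def start_command(name, command)
  ['screen', '-dmS', name, 'sh', '-c', command]
end

# All stop commands first, then all start commands.
def restart_plan(services = SERVICES)
  services.keys.map { |name| stop_command(name) } +
    services.map { |name, command| start_command(name, command) }
end

# Run the plan; failures from stopping sessions that do not exist are ignored.
def restart_all(services = SERVICES)
  restart_plan(services).each { |argv| system(*argv) }
end
```

After `restart_all`, `screen -list` should show one session per entry in `SERVICES`, which is the quick check described above.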

The code is checked out into a directory shared between the host and the guest, ultimately living on the host. The share uses NFS for performance. Because the code lives on the host, we can edit it using the host’s editors and tools, but it still runs inside the vagrant box.

Debugging for the PHP services is handled by xdebug, configured to point at the host IP. For the Python services, we did some work with WingIDE remote debugging, again with Wing configured to connect into the vagrant box.

Once the box was up and running, we added settings, configs and a custom domain (using vagrant-dns) to enable decent separation and ensure we don’t accidentally hardcode production/development URLs (or at least, that these are easier to catch if it does happen).

The main feature of this box is that the puppet manifests run with each provision. These update and redeploy the code each time, simplifying the update process to a single command, across every service and repository that we have deployed.

The Advantages

  • Reproducible environment for everyone involved.
  • The puppet manifests mean that running `vagrant provision` and waiting is enough to bring everything up to date with the latest master.
  • Self-contained stack: you can see what is running at any point.
  • Closer to production. We mostly develop on OS X but deploy to Linux; this gives us both.
  • Shared URLs for testing. We can pop a URL from our vagrant machines into Hipchat, and other members of the team can use it locally, without having to change it. vagrant-dns is a big win for us there.
  • Easy to install. Install VirtualBox and Vagrant, get the Vagrantfiles, run `vagrant up`.
  • Nothing in the vagrant configuration couldn’t be done just as easily with some scripting on the host machine, which would remove the need for virtualisation. But it’s handy that when something goes wrong, a fresh rebuild is just a `vagrant destroy` and `vagrant up` away. Extremely useful for testing system-wide settings.

The Disadvantages

  • A from-nothing `vagrant up` takes 25 minutes and downloads about 2GB (1.2GB for the base box, the rest for code + dependencies + extras).
  • `vagrant provision` to update to latest can take up to 10 minutes depending on the speed of the connection and the host machine, so it’s not that easy to ‘just test’ something. You can update the individual services manually, but then you have the problem that we started with: making sure that everyone has the same code.
  • It’s hard work for the host machine. The box we have configured has 2 cores and 2GB of RAM allocated. On a 4GB MacBook Air, that can start getting a little close to resource starvation, particularly with an IDE and a debugger running.
  • Debugging isn’t as easy as I’d like. Setting up the debugger involves a fair amount of configuration, and you can only really debug one thing at a time. That can be awkward when you’re trying to trace values across multiple services.
  • The configuration we have isn’t as close to production as I’d like (no nginx, no caching, no centralised logging) but this is just a matter of spending more time.
  • It doesn’t entirely solve ‘it works for me’. It just becomes ‘it works on my vagrant’. Fortunately, instances of that seem to be a lot less common.
  • Random virtualbox/vagrant/host problems. We’ve had boxes crash, networks go away and all manner of strange things. At least with the code living on the host machine, we’ve not lost work when that happens.

While it seems that there are more disadvantages than advantages, overall the reproducibility and the simplicity of reducing updates to a single command far outweigh the drawbacks of working in a virtualised environment.

Future plans

Most of the future plans for this revolve around gradually bringing it in line with the production environment without losing the flexibility that we have gained.

  • Use the production puppet manifests where possible. Our infrastructure is entirely puppet controlled, so I’d like to increase the reuse where possible.
  • Create a version that uses the multi-VM capability of vagrant to simulate a cluster, with each service running in its own VM. Could be handy for looking at scaling/communication problems.
  • See if we can reduce provision time even further, with possible build optimisations. This may then transfer to our deployment system.
  • Move to vagrant 1.2 and test out some other backends.
  • Remove the screen based development start script and move to something more production-like. (This is only used in the vagrant box, the live deployments use proper init scripts.)

Overall, the use of vagrant has been a big win for us as a company, and has reduced a lot of the problems we were having. There’s still some work to be done until we’re completely happy with it, but I’d recommend that anyone looking at building this type of project take a serious look to see if it suits them.

  • omouse

    Why not have a separate machine that only runs vagrant/puppet and creates images and then others can download them from the file server? If even that is too slow, you can move it to a USB stick.

    I never understood this attitude of, “oh here’s some scripts and instructions (either using vagrant, shell scripts, or manual instructions), get your dev environment setup”.

    • TomW

      We use a hybrid approach. The base box is prebuilt and uploaded to a file server, then with vagrant installed and the config files checked out on the local machine, just `vagrant up` is enough to checkout the code, build it and configure the environment box to run the project using the puppet manifests.

      The main reason for the split approach is the filesize of the downloaded box, with a remote team, uploading and downloading multiple gigabytes regularly is impractical.

  • Sgoettschkes

    What we did is mostly similar, but we use the base box from vagrantup directly and install everything through chef. The upside is that nobody needs to download the VM more than once. Of course provisioning takes some time, but that’s OK I guess. The downside is of course that a destroy/up takes about 1–2 hours (mainly because we have a dependency on the whole LaTeX package from Debian, which is about 1GB). But that’s something we normally don’t do, so if you really want a clean box, you do that in the evening or over lunch.

  • Cyril Martin

    Very interesting post. I have had similar experiences and asked myself the same questions.

    But I disagree with the practice of putting all the tools and software in the box ^^ I think you should use a bare OS and write more complex scripts (Puppet, Chef, Cloudify, Bash, whatever you want…) to install all the dependencies.
    The main reason is not flexibility, but being able to apply a security patch to production with no delay (I don’t want to have to modify the box and wait for the next application release… even if it’s an option).
    Another benefit: it is easier to diff two post-installation scripts than two box files.

    My two cents,

  • jippignu

    How does vagrant-dns allow you to share the URLs? Wouldn’t each user see their own local install and not the setup of the user sharing the link, since vagrant-dns is “all local”?

    • It puts a resolver in your local host (OS X) which causes generic names to be resolved to the vagrant box. E.g. we use http://svcname.honshuu.dev where honshuu.dev is our internal name which gets resolved to vagrant. In most cases it’s based on the TLD as a custom resolver.

      • jippignu

        Sure – but say there are two developers, each with their own setup.

        If Developer 1 shares a link to “honshuu.dev” and Developer 2 has the same setup, then isn’t his ‘honshuu.dev’ a different box from Developer 1’s?

        Or does each developer have their own TLD or domain – and static LAN ip?

        • This is really for local development. You probably could set it up to share over a LAN but that’s what a staging environment is for.

  • Docker is another great tool that fits like a glove around Vagrant that may help you guys.
