How do you document your ops infrastructure?

By David Mytton,
CEO & Founder of Server Density.

Published on the 14th April, 2016.


Editor’s note: For a detailed look at how we systematically unearth productivity black-holes in our Ops team, join the webinar at the end of this page. Note, this is a new version of an article originally published on 03/15/2014.

As your team infrastructure grows, one of the most important things is how it’s documented. Anyone new joining the team, existing members working on new areas, or even the on-call team needs to know how things work.

The first line of documentation is essentially config management, and for this we use Puppet. This defines things like packages, config files, server roles, etc. However, it only defines the “state.” In addition to this, documentation needs to cover things like emergency response, how to deal with alerts, failover procedures, processes, checklists and vendor information.

What do we want from our ops documentation?

I recently started a project at Server Density to revamp all our docs. We’ve had some problems which could have been avoided or resolved faster if our docs were better. As our infrastructure continues to grow, this is important to address properly, and then keep well maintained.

Historically, we used Confluence as a wiki, but we gradually transitioned to Markdown-formatted files in GitHub, stored alongside our code. However, both approaches have some problems:

  • Search. GitHub search is designed primarily for code, and requires filters for the organisation and repository. We’d need to split the docs into a separate repo to avoid also searching the code alongside them. In Confluence, search was never accurate and was also quite slow.
  • Editing. The biggest challenge for any documentation is keeping it up to date. Being able to edit the docs quickly is important, and both a wiki format and having to commit code add some overhead. It’s minor, but it is an extra step. Formatting is also inflexible.
  • Collaboration. Being able to work on a doc simultaneously, or to discuss changes and comment on areas of a doc, is much better in GitHub than in Confluence, but it is still focused around individual commits, or pull requests combining specific changes. This works well for a specific body of work but not for ongoing discussions.
  • Speed. GitHub performs well, but Confluence is really slow at everything. We used their hosted version rather than the on-premise install.

In summary, we want a system that has minimal barriers to creating / editing docs, can be searched quickly and accurately, is easy to collaborate on and ideally it should also be available offline and/or downloadable.

How do other people document their infrastructure?

I asked on Twitter to see what other people were doing, having looked online and not found much about what other companies are doing (other than a brief mention of Confluence by Etsy).

You can click through to see the range of replies. They included things like MediaWiki, GitHub Wiki, OneNote, Hackpad (since acquired by Dropbox), Confluence, and some more complex tools with offline sync. Also noted was how GitHub does this, using Markdown files which are synced offline too.

What did we pick for Ops documentation?

Having already tried Confluence and Markdown files in GitHub, I decided to try Google Docs. The whole team already had access to it through the web, offline and via mobile; documents can be created and edited very quickly, in-line, and collaborated on by multiple team members; it has a built-in drawing tool so we can create system diagrams; it’s very fast to load; and crucially, search is incredibly fast and accurate. Indeed, it is Google search after all! You can also download documents in multiple formats to store offline if you prefer.

Are you doing something different, or do you have a good way to address the documentation problem? Please do comment!

Also, make sure you join the Running Better Ops Teams webinar (see below). It reveals the ins and outs of how to systematically unearth engineer-time black holes, eliminate knowledge silos, and save time for things that matter: Improving your product and growing your business.

  • One challenge with Google Docs will be inter-document linking. This is why I happen to prefer wikis (Dokuwiki or Mediawiki). Many wikis seem to have rather robust communities with lots of plugins to extend features. But if your org doesn’t really need inter-document linking and is fine with the existing feature set, then I’d imagine Google Docs would actually have the upper hand, especially with its ability to index and search a large document base.

    • This is a good point. Google Docs does have the ability to do linking within Google Drive (although I could only make this work when the link anchor text exactly matched the name of the destination of the link) but it has to be created as a manual hyperlink within the doc. In most cases this will be one of the big, long Google Docs links. The advantage of a wiki is it does all that for you (when you use the right syntax).

      Coming from where we are now, using Markdown files in GitHub, we have to create the links manually anyway.

      Using a wiki just seems very heavy to me, and search isn’t very good in my experience. The text editors built into them often break and it’s quite rigid in how it is structured.

      • nowthatsamatt

        What do you do about keeping everything organized in Google Apps? Ours is untenable now because we have so many documents. I rarely go in there because all I see is this: http://puu.sh/7ldWH.png

        I’d love to have something in plaintext for speed and organized into categories of documents maybe?

        • My approach has been to use a well organised directory structure to do some basic categorisation, and have a top level folder that is shared rather than the individual files. Otherwise I rely on search to find the docs rather than browsing.

  • bicofino

    In the past I just used MediaWiki, but today all documentation (network drawings, documents, procedures and so on) is on GitLab, as a wiki page or in a repository. What I like about it is that I can quickly check when someone changed a file.
    And markdown is nice!

    • Markdown is nice. I actually use a plugin in Dokuwiki that lets you write your pages in Markdown.

      • sirgilot

        What’s the name of this plugin? Thanks!

    • Can you do all the diagrams easily in Gitlab? Presumably that’s outside the abilities of Markdown?

      One cool thing in Google Docs is the ability to subscribe to a document to get notified of changes. Unfortunately this is only on Spreadsheets currently and also not in their new version of spreadsheets yet. https://support.google.com/drive/answer/91588?hl=en

      • bicofino

        Outside Markdown, I just insert images on wiki pages, or upload the files to the repository and link to them from the wiki.

      • Pir

        You can also have this in Confluence, when you “watch” the page you get a mail upon change.

  • sam_benne

    I like using Google Docs. My issue would be the data that should never go online, like passwords. With Google Docs you would have to keep them in another location. We currently use Confluence, which does allow us to keep all the information together. It’s just too big and missing a few key features.

    • Isn’t it just as risky in Confluence, if not more because you have to deal with all the security? Seems like everyone should have their own account for things rather than a shared password, all of which should be stored in something like 1Password (or equivalent).

      • sam_benne

        We keep it all inside the company not hosted online.

        • Do you allow external access for people who are on call or if there’s an emergency?

  • One thing to watch out for with Google Docs is file owners. When somebody leaves, you must make sure they transfer all their ownership first. If you are using Apps for Business, a year after they leave Google will delete all the files they own.
    We mostly use Confluence for our docs because, when you are in a rush trying to fix something, digging into Puppet code can be too slow. We also do static dumps of the site for backup.
    We quite often develop larger documents in Google but then transfer them to other places for storage.

    • Yeh that’s a good point. You can get the user to transfer ownership but from the Google Admin console you can do this too, when deleting the user.

      • transcordia

        Which is good if your authors are part of a Google Domain you have admin rights over. But if you are using Google Docs and your company domain is not a Google Apps domain, you do not have control.

  • MeRoBo

    https://readthedocs.org/ does everything you listed with GitHub. I stumbled on it today after reading your post.

  • We are using MediaWiki at the shop I work for. The wiki has grown slowly but steadily. It has become the main source of information interchange between infrastructure and tech support / help desk. Currently our main problem is the lack of structure.

    I’m working on the creation of some new templates to address this problem. I’m also evaluating the installation of a new CSS template to improve navigation and motivate the team to improve the documentation.

    @dmytton Thanks for sharing your experience on this subject. From my point of view, it’s one of the most neglected areas. Everybody talks about culture, tools, cloud computing, containers … But very few people talk about their approach to documenting the infrastructure. I think I’ll use part of Confluence’s navigation menu to improve my wiki.

  • If anyone is looking for ideas, blueprints about how to approach the creation of documentation, please take a look at http://docs.writethedocs.org/

    Very few people like documenting. Most of us prefer implementing, but I am a firm believer that it can be very rewarding if approached using the right tools and especially the right mindset.

  • SAMER MACHARA

    Hi, I have the great idea of using WordPress to document my network infrastructure. With WP I have private and public zones, version control, document versioning, and Markdown editing.

    What do you think?
