Picking server hostnames

By David Mytton,
CEO & Founder of Server Density.

Published on the 23rd January, 2014.

[Editor’s note: We just published a sequel here.]

Last night we had an issue in our failover data center in San Jose, USA. We use our data center vendor's (Softlayer) internal DNS resolvers in /etc/resolv.conf on all our servers, and those resolvers went down. This didn't cause any customer impact, but it did mean that all the servers in that data center were unable to resolve remote hostnames.

Our workaround was to change the resolvers to something else, and because we're using Puppet we could roll that change out very quickly across a large cluster of machines. However, we only wanted to apply the change to the SJC data center.

Our hostname policy is fairly straightforward. An example web server is:

hcluster3-web1.sjc.sl.serverdensity.net

This is made up of several parts:

  • hcluster3 – this describes what the server is used for. In this case, it's cluster 3, which hosts our alerting and notification service (all of Server Density is built using a service-oriented architecture). Other examples could be mtx2 (our time series metrics storage cluster, version 2) or sdcom (servers which power our website).
  • web1 – this is a web server (either Apache or nginx) and is number 1 in the cluster. We have multiple load balanced web servers.
  • sjc – this is the data center location code, San Jose in this case. We also have locations like wdc (Washington DC) or tyo (Tokyo).
  • sl – this is the facility vendor name, Softlayer in this case. We also have vendors like rax (Rackspace) and aws (Amazon Web Services).
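
Because every part of the name has a fixed position, the scheme is also trivially machine-parseable. As a rough illustration, a name in this scheme can be pulled apart with a few lines of Ruby (a minimal sketch; the helper and its output format are hypothetical, not anything we actually ship):

# Minimal sketch: split an FQDN following the
# <cluster>-<role><n>.<location>.<vendor>.<domain> scheme into its parts.
def parse_hostname(fqdn)
  host, location, vendor, *rest = fqdn.split('.')
  cluster, node = host.split('-', 2)
  {
    cluster:  cluster,               # e.g. "hcluster3"
    role:     node.sub(/\d+\z/, ''), # e.g. "web"
    number:   node[/\d+\z/].to_i,    # e.g. 1
    location: location,              # e.g. "sjc"
    vendor:   vendor,                # e.g. "sl"
    domain:   rest.join('.')         # e.g. "serverdensity.net"
  }
end

parse_hostname('hcluster3-web1.sjc.sl.serverdensity.net')
# => { cluster: "hcluster3", role: "web", number: 1,
#      location: "sjc", vendor: "sl", domain: "serverdensity.net" }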

When we had a much smaller number of servers, the naming convention was based on characters in His Dark Materials by Philip Pullman. So, for example, a master database server was Lyra, with the slave being Pan. Picking names like this doesn't scale past 10 or so servers, although at that point you can transition to other themes: names of stars, lakes, rivers, etc.

xkcd: Permanence

We moved to the current naming structure a few years ago. It allows us to quickly identify key information about our servers, and it also helps when we want to filter by provider or by specific locations.

In our Puppet /etc/resolv.conf template we can then do things like:

<% if (domain =~ /sl/) -%>
<% if (domain =~ /sjc\.sl/) -%>
# google DNS - temp until SL fixed
nameserver 8.8.8.8
nameserver 8.8.4.4
<% else -%>
# Internal Softlayer DNS
nameserver 10.0.80.11
nameserver 10.0.80.12
<% end -%>
...
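
The domain variable in that template is Facter's built-in domain fact, derived from each server's FQDN, which is exactly why the location and vendor codes in our hostnames are usable in templates at all. You can check what a given host reports either with facter domain on the command line or from Ruby (a quick sketch using Facter's Ruby API):

require 'facter'

# Prints the same value the template's `domain` variable sees,
# e.g. "sjc.sl.serverdensity.net" for the web server above.
puts Facter.value(:domain)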

And when it comes to using the Puppet console, we can quickly trigger Puppet runs or take actions against specific nodes:

[Screenshot: filtering nodes in the Puppet console]

How do other people do this?

If we dissect a traceroute to the Server Density website we can pick out locations (lon, nyc, wdc) and router names:

22/01/14 12:28:18 david@dm-mba ~: mtr serverdensity.com
 1. 10.0.0.1
 2. c.gormless.thn.aa.net.uk
 3. a.aimless.thn.aa.net.uk
 4. bbr01.lon01.networklayer.com
 5. ae1.bbr02.tl01.nyc01.networklayer.com
 6. ae7.bbr01.tl01.nyc01.networklayer.com
 7. ae1.bbr01.eq01.wdc02.networklayer.com
 8. ae0.dar02.sr01.wdc01.networklayer.com
 9. po2.fcr03.sr02.wdc01.networklayer.com
10. gip.wdc.sl.serverdensity.net

Our ISP names its routers after their role with "-less" appended (gormless, aimless), because when they ordered their first piece of equipment they took the naming structure from the Dilbert cartoon of the day!

Looking at how Amazon routes traffic for the static website hosting feature of S3, you can even look at network maps to visualise the packets flowing from the UK to Osaka and then on to Tokyo in Japan.

22/01/14 12:28:50 david@dm-mba ~: mtr serverdensity.jp
 1. 10.0.0.1
 2. c.gormless.thn.aa.net.uk
 3. d.aimless.thn.aa.net.uk
 4. 83.217.238.93
 5. ae-7.r23.londen03.uk.bb.gin.ntt.net
 6. as-0.r22.osakjp01.jp.bb.gin.ntt.net
 7. ae-0.r22.osakjp02.jp.bb.gin.ntt.net
 8. 61.200.91.154
 9. 27.0.0.199
10. 27.0.0.2011
11. 103.246.151.32
12. 103.246.151.38
13. s3-website-ap-northeast-1.amazonaws.com

This works nicely for a relatively small number of servers which are not transient and have defined workloads, but it could be problematic if you have a huge number of machines and/or frequently launch and destroy transient cloud instances or VMs. You can see how Amazon deals with this by setting the hostname to the internal IP address, but I could see other solutions where you use an instance ID instead (sketched below). I'd be interested to learn how other people are doing this.
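
For example, on EC2 a launch script could derive the hostname from the instance metadata service, keeping only the parts of the scheme that still make sense (a hypothetical Ruby sketch; the hostname layout and the use1 location code are made up for illustration):

require 'net/http'

# 169.254.169.254 is the standard EC2 instance metadata endpoint.
instance_id = Net::HTTP.get(URI('http://169.254.169.254/latest/meta-data/instance-id'))

# Let the instance ID replace the per-node name, keeping a
# location/vendor/domain suffix, e.g. "i-0abc1234-web.use1.aws.example.net"
hostname = "#{instance_id}-web.use1.aws.example.net"
puts hostname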

  • There are some interesting naming conventions being highlighted over on Reddit: http://www.reddit.com/r/sysadmin/comments/1w0v5p/how_do_you_pick_your_server_hostnames/

    • Jeremy Wilson

      I got into a rather heated discussion on reddit about hostnames – not in that particular thread but another one. I’m of the opinion that “cute” names are unprofessional and they really tried hard to argue otherwise.

      I’m glad ServerDensity uses a logical naming scheme instead of typical Lord of the Rings characters. I’ve been at this 20 years – so many “gandalf”s… Too many.

      • Agreed. Using fun names is probably fine if you have one or two of your own servers at home but using sensible elements makes things much easier for managing them, particularly as your infrastructure grows!

  • Hey David, nice post! At Transloadit we pick random Scottish girl names from a list of 10000. It works better for us as our platform is highly volatile (numbering could lead to conflicts when launching in parallel), and when debugging problems it’s much easier to form a mental picture when talking about alice and anneroy, vs encoder8894 and encoder8849.

    As for resolving DNS outages, we had similar problems, and to counter them I wrote a little bash script that changes resolv.conf for as long as there is an outage. It's more robust than plugging local proxies in between, which can also break imho. It's been working well for us so maybe it helps: http://kvz.io/blog/2013/03/27/poormans-way-to-decent-dns-failover/
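
    For illustration, the approach described (swap in fallback resolvers while the primary is down, restore them when it recovers) could be sketched like this, in Ruby rather than bash; the resolver IPs and probe domain are placeholders:

    require 'resolv'

    PRIMARY   = '10.0.80.11'          # placeholder internal resolver
    FALLBACKS = %w[8.8.8.8 8.8.4.4]   # public fallbacks

    # True if the resolver answers a simple query.
    def resolver_up?(ip)
      Resolv::DNS.open(nameserver: [ip]) { |dns| dns.getaddress('example.com') }
      true
    rescue StandardError
      false
    end

    # While the primary is down, point resolv.conf at the fallbacks.
    # (Needs root to write /etc/resolv.conf.)
    loop do
      servers = resolver_up?(PRIMARY) ? [PRIMARY] : FALLBACKS
      File.write('/etc/resolv.conf',
                 servers.map { |s| "nameserver #{s}" }.join("\n") + "\n")
      sleep 60
    end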

    • Isn't there a possibility of name conflicts too? If you use a random number generator, that should have the same effect as picking the names?

      • We cycle until we get a lock on a non-existing name. This would be much harder to implement for sequential numbering, though, as two launchers would always fight over the next number in line, whereas with these random names there's a ~9999:1 chance of acquiring a name in one go.
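
        For illustration, that claim-a-name loop could look something like this (a Ruby sketch; the name list and the shared "taken" store are stand-ins for the real thing):

        require 'set'

        # Stand-in for the real list of ~10,000 candidate names.
        NAMES = %w[alice anneroy bonnie catriona]

        def claim_hostname(taken)
          loop do
            candidate = NAMES.sample
            # In a real system this check-and-claim must be atomic (e.g. a
            # unique-key insert), or two parallel launchers can still collide.
            return candidate if taken.add?(candidate)
          end
        end

        taken = Set.new(['alice'])
        puts claim_hostname(taken)   # => a name other than "alice"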

  • Zoran

    Here is how we did it – somebody likes it, somebody thinks it is too complicated and non-human. :)

    Before that, it was names of gods, animals, and so on.

    But we ended up having so many servers that those names became hard to make sense of, and it was also hard to find an unused server name.

    In the "nomenclature", the hostname was composed of the following:

    “Importance”:
    – P = production
    – T = test
    – S = staging
    – D = development

    Physical placement:
    – 1 = data center 1
    – 2 = data center 2
    – 3 = backup location

    “Owner”:
    – CIT = Corporate IT
    – FIN = Department of Finance
    – ….

    Application hosted:
    – DC = domain controller
    – FW = firewall
    – ….

    Host number:
    – the number that identifies a unique host among others of its kind: 01, 02, …

    OS:
    – W = MS Windows
    – L = Linux
    – U = Unix

    Functionality:
    – PTL – portal
    – SS – shared services
    – …

    So, for example, a server named "P2CITSHP07WPTL" would be a production server located in data center 2, belonging to Corporate IT, with MS SharePoint installed on a Windows OS, and serving as the frontend for a portal.

    Of course, one could rearrange the order of the parts to suit other priorities, and it was not problem-free (a lot of machines were virtual and could be moved from one DC to another, after which they would have the wrong data center number, and so on…).
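
    A scheme this rigid is at least easy to take apart programmatically. A Ruby sketch (the three-letter owner codes and two-digit host numbers are assumptions based on the examples above):

    # Parse the P2CITSHP07WPTL-style scheme described above.
    PATTERN = /\A
      (?<importance>[PTSD])   # production, test, staging, development
      (?<location>[123])      # data center 1, 2 or backup location
      (?<owner>[A-Z]{3})      # CIT, FIN, ...
      (?<app>[A-Z]+?)         # DC, FW, SHP, ...
      (?<number>\d{2})        # 01, 02, ...
      (?<os>[WLU])            # Windows, Linux, Unix
      (?<function>[A-Z]+)     # PTL, SS, ...
    \z/x

    m = PATTERN.match('P2CITSHP07WPTL')
    # m[:importance] => "P"   m[:location] => "2"   m[:owner] => "CIT"
    # m[:app] => "SHP"        m[:number] => "07"    m[:os] => "W"
    # m[:function] => "PTL"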

    • Quite a lot of information contained within the hostname there. Have you considered splitting them up into subdomain components, as that might make it easier to read?

  • serverhorror

    Have you considered removing any meaning from hostnames and instead deciding to perform actions against hosts that have certain attributes?

    Something along the lines of:

    facter is_mysql … is_apache is_production ….

    • That would be a good way to deal with large numbers of servers and works well for things like web pools, but I’m not sure it’s suitable for database servers which tend to stay around for a longer period. If you did it this way then you’d also need to ensure you split them up into groups so you don’t update them all at once.

      • serverhorror

        Well, current naming schemes just display the fact that a host is thought to have certain attributes, so why not go all the way and actually rely on attributes that stay updated?

        I've seen too many "…db-master…" hosts that indeed were a db master for something, but not primarily (apart from the fact that calling something a master implies it can fail over to something else…). Maybe the name matched at some point in time; it didn't by the time I got my hands on it.

        As for not updating everything at once…

        facter is_canary

        or use something that provides an election method (etcd leaders could be of use here)
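
        For reference, a custom fact like that is only a few lines of Ruby (a sketch; the marker file is an assumed convention, any source of truth would do):

        # Custom Facter fact: true on hosts marked as canaries
        # via a (hypothetical) marker file.
        Facter.add(:is_canary) do
          setcode do
            File.exist?('/etc/canary')
          end
        end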

        • serverhorror

          On a side note: if you are using Puppet you are doing that anyway whenever you use any kind of fact. Maybe not fully, but relying on the information facter presents to you is exactly that: forget hostnames, test in a certain environment and let it apply to everything that matches.

          • Indeed. Hostnames are really something that came from years ago when infrastructure was more static, which is why hostnames in auto scaling environments tend to just be random IDs. However, there is still some relevance to at least part of the hostname, e.g. the domain, location, data center, and even the cluster use, e.g. [randomString]-web1.wdc.sl.serverdensity.net

        • Agreed, putting deeper metadata like "is master" into the hostname is not a good idea, because that state can change, so it's better stored as metadata or even queried directly from the DB. We don't do this; we name the members of a DB cluster just numerically and with letters, e.g. 1a, 1b, 1c, 1d, etc.

          I think naming needs to be generic enough to help with understanding the architecture without sometimes being wrong, as is the case when master state changes.

  • Back in the early '90s we had a nice Ren & Stimpy theme going. Log was one of my favorites.

  • VaidikKapoor

    How do you guys differentiate between machines running in different environments? Your naming strategy does not have a provision for that, so I'm wondering how you solve that problem.

    • Easy: you should only differentiate on knowledge relevant to the team that knows it. Meaning, don't encode the fact that the developers consider a certain server "PROD" into your directory or hostname. Now, you can add environments to your website names, but those would hopefully be provisioned by a different team than the one who set up the host… thus you don't differentiate, as the host doesn't have ENVs… the app does.
