Picking server hostnames

By David Mytton,
CEO & Founder of Server Density.

Published on the 23rd January, 2014.

[Editor’s note: We just published a sequel here.]

Last night we had an issue in our failover data center in San Jose, USA. We use our data center provider's (Softlayer) internal DNS resolvers in /etc/resolv.conf on all our servers, and those resolvers went down. This didn't cause any customer impact, but it meant that all the servers in that data center were unable to resolve remote hostnames.

Our workaround was to change the resolvers to something else, and because we're using Puppet we could push the change out very quickly across a large cluster of machines. However, we only wanted to apply the change to the SJC data center.

Our hostname policy is fairly straightforward. An example web server is:

hcluster3-web1.sjc.sl.serverdensity.net

This is made up of several parts:

  • hcluster3 – this describes what the server is used for. In this case, it's cluster 3, which hosts our alerting and notification service (all of Server Density is built using a service-oriented architecture). Other examples could be mtx2 (our time series metrics storage cluster, version 2) or sdcom (servers which power our website).
  • web1 – this is a web server (either Apache or nginx) and is number 1 in the cluster. We have multiple load balanced web servers.
  • sjc – this is the data center location code, San Jose in this case. We also have locations like wdc (Washington DC) or tyo (Tokyo).
  • sl – this is the facility vendor name, Softlayer in this case. We also have vendors like rax (Rackspace) and aws (Amazon Web Services).
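As an illustration (a short Python sketch of my own, not something from the original post), the same structure is trivial to pull apart programmatically, which is what makes the filtering described below possible:

# Sketch: decompose a hostname following the scheme above.
# The field names are my own labels.
def parse_hostname(fqdn):
    host, location, vendor, *domain = fqdn.split(".")
    role, _, node = host.partition("-")
    return {
        "role": role,            # hcluster3 - what the cluster is used for
        "node": node,            # web1 - server type and number
        "location": location,    # sjc - data center location code
        "vendor": vendor,        # sl - facility vendor
        "domain": ".".join(domain),
    }

print(parse_hostname("hcluster3-web1.sjc.sl.serverdensity.net"))
# {'role': 'hcluster3', 'node': 'web1', 'location': 'sjc', 'vendor': 'sl',
#  'domain': 'serverdensity.net'}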

When we had a much smaller number of servers, the naming convention was based on characters in His Dark Materials by Philip Pullman. So, for example, a master database server was Lyra, with the slave being Pan. Picking names like this doesn't scale beyond 10 or so servers, but then you can transition to other things, like names of stars, lakes, rivers, etc.

xkcd: Permanence

We moved to the current naming structure a few years ago. It now allows us to quickly identify key information about our servers, and it also helps when we want to filter by provider or by specific locations.

In our Puppet /etc/resolv.conf template we can then match against the node's domain (sjc.sl.serverdensity.net for the example host above) and do things like:

<% if (domain =~ /sl/) -%>
<% if (domain =~ /sjc\.sl/) -%>
# google DNS - temp until SL fixed
nameserver 8.8.8.8
nameserver 8.8.4.4
<% else -%>
# Internal Softlayer DNS
nameserver 10.0.80.11
nameserver 10.0.80.12
<% end -%>
...
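On an SJC host both conditions match, so the rendered /etc/resolv.conf picks up the temporary Google resolvers; roughly (a hand-written illustration of the output, not copied from a real server):

# google DNS - temp until SL fixed
nameserver 8.8.8.8
nameserver 8.8.4.4

Any other Softlayer host falls through to the internal 10.0.80.11 and 10.0.80.12 resolvers instead.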

And when it comes to using the Puppet console, we can quickly trigger puppet runs or take actions against specific nodes:

Puppet Filtering
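Outside of the console, the same naming scheme makes ad-hoc filtering easy anywhere you have a list of hostnames. A rough Python sketch (the first hostname is the example from above; the others are made up):

from fnmatch import fnmatch

nodes = [
    "hcluster3-web1.sjc.sl.serverdensity.net",
    "hcluster3-web2.sjc.sl.serverdensity.net",   # hypothetical
    "mtx2-db1.wdc.sl.serverdensity.net",         # hypothetical
    "sdcom-web1.wdc.rax.serverdensity.net",      # hypothetical
]

# All Softlayer nodes in San Jose, whatever their role:
print([n for n in nodes if fnmatch(n, "*.sjc.sl.*")])
# ['hcluster3-web1.sjc.sl.serverdensity.net',
#  'hcluster3-web2.sjc.sl.serverdensity.net']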

How do other people do this?

If we dissect a traceroute to the Server Density website we can pick out locations (lon, nyc, wdc) and router names:

22/01/14 12:28:18 david@dm-mba ~: mtr serverdensity.com
 1. 10.0.0.1
 2. c.gormless.thn.aa.net.uk
 3. a.aimless.thn.aa.net.uk
 4. bbr01.lon01.networklayer.com
 5. ae1.bbr02.tl01.nyc01.networklayer.com
 6. ae7.bbr01.tl01.nyc01.networklayer.com
 7. ae1.bbr01.eq01.wdc02.networklayer.com
 8. ae0.dar02.sr01.wdc01.networklayer.com
 9. po2.fcr03.sr02.wdc01.networklayer.com
10. gip.wdc.sl.serverdensity.net

Our ISP uses names related to the use case with “-less” appended (gormless and aimless above), because when they ordered their first piece of equipment they took the structure from the Dilbert cartoon of the day!

Looking at how Amazon routes traffic for the static website hosting feature of S3, you can even use network maps to visualise where packets flow from the UK to Osaka and then on to Tokyo in Japan.

22/01/14 12:28:50 david@dm-mba ~: mtr serverdensity.jp
 1. 10.0.0.1
 2. c.gormless.thn.aa.net.uk
 3. d.aimless.thn.aa.net.uk
 4. 83.217.238.93
 5. ae-7.r23.londen03.uk.bb.gin.ntt.net
 6. as-0.r22.osakjp01.jp.bb.gin.ntt.net
 7. ae-0.r22.osakjp02.jp.bb.gin.ntt.net
 8. 61.200.91.154
 9. 27.0.0.199
10. 27.0.0.2011
11. 103.246.151.32
12. 103.246.151.38
13. s3-website-ap-northeast-1.amazonaws.com

This works nicely for a relatively small number of servers which are not transient and have defined workloads, but it could be problematic if you have a huge number of machines and/or frequently launch and destroy transient cloud instances or VMs. You can see how Amazon deals with this by setting the hostname to the internal IP address, but I could see other solutions where you use an instance ID instead (a rough sketch of that idea follows). I'd be interested to learn how other people are doing this.
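
For example, a Python sketch of the instance ID approach, using the standard EC2 metadata endpoint but an entirely made-up naming scheme:

# Sketch: derive a hostname from the EC2 instance ID at boot time.
# The id.location.vendor.domain layout here is hypothetical.
from urllib.request import urlopen

METADATA = "http://169.254.169.254/latest/meta-data/instance-id"

def transient_hostname(location="use1", vendor="aws", domain="example.net"):
    instance_id = urlopen(METADATA, timeout=2).read().decode()
    return "{0}.{1}.{2}.{3}".format(instance_id, location, vendor, domain)

# e.g. "i-0abc1234.use1.aws.example.net"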
