Picking server hostnames

By David Mytton, CEO & Founder of Server Density. Published on the 23rd January, 2014.
[Editor’s note: We just published a sequel here.]
Last night we had an issue in our failover data center in San Jose, USA. We use our data center provider's (Softlayer) internal DNS resolvers in the /etc/resolv.conf on all our servers, and those resolvers went down. This didn't cause any customer impact, but it meant that all the servers in that data center were unable to resolve remote hostnames.
Our workaround was to change the resolvers to something else, and because we're using Puppet we could roll the change out very quickly across a large cluster of machines. However, we only wanted to apply it to the SJC data center.
Our hostname policy is fairly straightforward. An example web server is:
hcluster3-web1.sjc.sl.serverdensity.net
This is made up of several parts:
hcluster3 – this describes what the server is used for. In this case, it's cluster 3, which hosts our alerting and notification service (all of Server Density is built using a service orientated architecture). Other examples could be mtx2 (our time series metrics storage cluster, version 2) or sdcom (servers which power our website).
web1 – this is a web server (either Apache or nginx) and is number 1 in the cluster. We have multiple load balanced web servers.
sjc – this is the data center location code, San Jose in this case. We also have locations like wdc (Washington DC) or tyo (Tokyo).
sl – this is the facility vendor name, Softlayer in this case. We also have vendors like rax (Rackspace) and aws (Amazon Web Services).
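To make the structure concrete, here is a minimal sketch of splitting such a hostname into its parts. This is illustrative only, not our actual tooling, and the field names are made up:

def parse_hostname(fqdn)
  host, location, vendor, *rest = fqdn.split('.')
  cluster, role = host.split('-', 2)
  {
    :cluster  => cluster,        # e.g. "hcluster3"
    :role     => role,           # e.g. "web1"
    :location => location,       # e.g. "sjc"
    :vendor   => vendor,         # e.g. "sl"
    :domain   => rest.join('.')  # e.g. "serverdensity.net"
  }
end

parse_hostname('hcluster3-web1.sjc.sl.serverdensity.net')
# => {:cluster=>"hcluster3", :role=>"web1", :location=>"sjc", :vendor=>"sl", :domain=>"serverdensity.net"}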
When we had a much smaller number of servers, the naming convention was based on characters in His Dark Materials by Philip Pullman. So for example, a master database server was Lyra with the slave being Pan. Picking names like this doesn’t scale after 10 or so servers but then you can transition to other things, like names of stars, lakes, rivers, etc.
We moved to the current naming structure a few years ago. It allows us to quickly identify key information about our servers and also helps when we want to filter by provider or by specific locations.
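For example, because the provider and location are part of the name, filtering becomes a simple string match. The hostnames below are illustrative, not a real inventory:

hosts = %w[
  hcluster3-web1.sjc.sl.serverdensity.net
  hcluster3-web2.sjc.sl.serverdensity.net
  mtx2-db1.wdc.sl.serverdensity.net
  sdcom-web1.lon.rax.serverdensity.net
]

# All Softlayer nodes in San Jose:
hosts.select { |h| h.include?('.sjc.sl.') }

# All Rackspace nodes, in any location:
hosts.select { |h| h.include?('.rax.') }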
In our Puppet /etc/resolv.conf template we can then do things like:
<% if (domain =~ /sl/) -%>
<% if (domain =~ /sjc.sl/) -%>
# google DNS - temp until SL fixed
nameserver 8.8.8.8
nameserver 8.8.4.4
<% else %>
# Internal Softlayer DNS
nameserver 10.0.80.11
nameserver 10.0.80.12
<% end -%>
...
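The domain variable in that template comes from Facter's standard domain fact (e.g. sjc.sl.serverdensity.net), which is why matching on it selects only the nodes in that facility. As a hedged sketch (not our actual code), the data center code could also be exposed as its own custom fact so manifests and templates can check it directly instead of regexing the domain:

# Hypothetical custom Facter fact (illustrative only): take the first
# label of the domain fact, e.g. "sjc" from "sjc.sl.serverdensity.net".
Facter.add(:datacenter) do
  setcode do
    Facter.value(:domain).to_s.split('.').first
  end
end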
And when it comes to using the Puppet console, we can quickly trigger puppet runs or take actions against specific nodes.
How do other people do this?
If we dissect a traceroute to the Server Density website we can pick out locations (lon, nyc, wdc) and router names:
22/01/14 12:28:18 david@dm-mba ~: mtr serverdensity.com
 1. 10.0.0.1
 2. c.gormless.thn.aa.net.uk
 3. a.aimless.thn.aa.net.uk
 4. bbr01.lon01.networklayer.com
 5. ae1.bbr02.tl01.nyc01.networklayer.com
 6. ae7.bbr01.tl01.nyc01.networklayer.com
 7. ae1.bbr01.eq01.wdc02.networklayer.com
 8. ae0.dar02.sr01.wdc01.networklayer.com
 9. po2.fcr03.sr02.wdc01.networklayer.com
10. gip.wdc.sl.serverdensity.net
Our ISP uses names related to the use case with “-less” appended, because when they ordered their first piece of equipment they took the structure from the Dilbert cartoon of the day!
Looking at how Amazon routes traffic for the static website hosting feature of S3, you can even look at network maps to visualise where packets are flowing: from the UK to Osaka and then Tokyo in Japan.
22/01/14 12:28:50 david@dm-mba ~: mtr serverdensity.jp
 1. 10.0.0.1
 2. c.gormless.thn.aa.net.uk
 3. d.aimless.thn.aa.net.uk
 4. 83.217.238.93
 5. ae-7.r23.londen03.uk.bb.gin.ntt.net
 6. as-0.r22.osakjp01.jp.bb.gin.ntt.net
 7. ae-0.r22.osakjp02.jp.bb.gin.ntt.net
 8. 61.200.91.154
 9. 27.0.0.199
10. 27.0.0.2011
11. 103.246.151.32
12. 103.246.151.38
13. s3-website-ap-northeast-1.amazonaws.com
This works nicely for a relatively small number of servers which are not transient and have defined workloads, but it could be problematic if you have a huge number of machines and/or frequently launch and destroy transient cloud instances or VMs. You can see how Amazon deals with this by setting the hostname to the internal IP address, but I could see other solutions where you use an instance ID instead (a rough sketch of that is below). I'd be interested to learn how other people are doing this.
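As a rough sketch of the instance ID approach on EC2 (the region code, vendor code and domain below are made-up examples; the metadata endpoint is the standard EC2 one):

# Illustrative only: name a transient instance after its instance ID
# rather than a hand-picked hostname.
require 'net/http'

instance_id = Net::HTTP.get(URI('http://169.254.169.254/latest/meta-data/instance-id'))
fqdn = "#{instance_id}.use1.aws.example.net"   # e.g. "i-0a1b2c3d.use1.aws.example.net"

system('hostname', fqdn)                 # set it for the running system
File.write('/etc/hostname', fqdn + "\n") # persist it across reboots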