Global elastic IPs – multi-region routing

On Jul 24 we received an e-mail from our hosting provider, SoftLayer, announcing the availability of a new feature called “Global IP Addresses”. I’m surprised SoftLayer haven’t made more noise about this feature, because it looks set to prove incredibly useful and is something that even the likes of Amazon don’t provide.

Elastic IPs are a feature of Amazon EC2 which provide region-level static IPs that can be rerouted to any instance at any time. This means that if one instance goes down, you can repoint the IP to another instance with minimal downtime – no waiting for DNS updates or for replacement instances to come up. However, they are limited to a specific AWS region: they cross availability zones, but if a whole region goes down you can’t point the IP at instances in another region.
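
As a rough sketch of how that repointing works in practice (assuming the modern AWS CLI; the allocation and instance IDs below are placeholders, not real resources), the failover is a single API call:

# Re-point an existing Elastic IP at a standby instance in the same region.
# The allocation ID and instance ID are placeholders.
aws ec2 associate-address \
    --allocation-id eipalloc-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210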

Cross-region failover therefore still has to be handled with DNS: set a low TTL and update the IPs in the zone when there is an outage. However, this relies on ISPs honouring the TTL. There are reports (source 1, source 2) of TTLs being ignored, which means some users may still try to reach your old, down IP. This can be mitigated with round robin DNS, but there’s still the possibility of hitting the old IP.
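
As a quick, hedged way to check whether a particular resolver honours your TTL (the hostname and resolver address below are placeholders), query the same record twice and watch the TTL column count down:

# Query the record twice via the same resolver and compare the TTL column;
# a resolver that honours your TTL will show it counting down towards zero.
dig +noall +answer www.example.com A @8.8.8.8
sleep 30
dig +noall +answer www.example.com A @8.8.8.8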

SoftLayer’s global IP address option allows you to work around this. For $20/month per IP, you get an IP address which can be routed to any cloud or dedicated instance within the SoftLayer network, which covers all of their data centres (Dallas, Seattle, Washington, Houston, San Jose, Amsterdam and Singapore). So in the event of a local failure of your server (which might be a load balancer), you can re-route the IP within a single DC, and if the whole DC goes down, you can re-route it to a completely different DC.
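
To make the failover concrete, here is a minimal watchdog sketch: the health check uses standard ping, while reroute_global_ip.sh is a hypothetical placeholder for whatever mechanism actually moves the IP (the SoftLayer portal or API), not a real command:

#!/bin/sh
# Hypothetical watchdog: if the global IP stops answering, trigger a re-route.
GLOBAL_IP="108.168.255.185"   # the example global IP from SoftLayer's docs

# Three pings with a two-second timeout each; a non-zero exit means the check failed.
if ! ping -c 3 -W 2 "$GLOBAL_IP" > /dev/null 2>&1; then
    # Placeholder step: re-point the global IP at the standby data centre.
    ./reroute_global_ip.sh "$GLOBAL_IP" standby-dc
fi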

On a network level this works by routing traffic for the IP to the user’s closest SoftLayer point of presence, which then forwards it to the allocated data centre. For example, the IP address provided in their documentation – 108.168.255.185 – is allocated to a node in their Dallas data centre. From one of our AWS nodes in US East, traffic enters at Washington and reaches Dallas via Atlanta:

traceroute to 108.168.255.185 (108.168.255.185), 30 hops max, 60 byte packets
...
3 10.1.53.14 (10.1.53.14) 0.497 ms 10.1.43.14 (10.1.43.14) 0.602 ms 10.1.53.14 (10.1.53.14) 0.744 ms
4 216.182.224.88 (216.182.224.88) 21.862 ms 21.835 ms 21.839 ms
5 205.251.245.57 (205.251.245.57) 0.876 ms 72.21.220.241 (72.21.220.241) 0.690 ms 0.972 ms
6 72.21.222.154 (72.21.222.154) 1.838 ms 1.382 ms 1.368 ms
7 72.21.220.30 (72.21.220.30) 2.026 ms 17.269 ms 1.722 ms
8 72.21.221.50 (72.21.221.50) 1.695 ms 2.184 ms 4.305 ms
9 ae7.bbr01.eq01.wdc02.networklayer.com (173.192.18.194) 17.115 ms 17.098 ms 17.540 ms
10 ae0.bbr01.tl01.atl01.networklayer.com (173.192.18.153) 14.465 ms 14.445 ms 14.448 ms
11 ae13.bbr02.eq01.dal03.networklayer.com (173.192.18.134) 35.344 ms 36.102 ms 35.154 ms
12 ae6.dar01.sr01.dal05.networklayer.com (50.97.18.195) 35.143 ms 35.661 ms 34.963 ms
13 po2.fcr03.sr03.dal05.networklayer.com (173.192.118.145) 34.394 ms po1.fcr03.sr03.dal05.networklayer.com (173.192.118.143) 35.931 ms 35.928 ms
14 * * *
15 108.168.255.185-static.reverse.softlayer.com (108.168.255.185) 36.230 ms 36.216 ms 36.387 ms

And from US West, it goes through San Jose, Los Angeles and then Dallas:

traceroute to 108.168.255.185 (108.168.255.185), 30 hops max, 60 byte packets
...
3 216.182.236.107 (216.182.236.107) 0.730 ms 216.182.236.109 (216.182.236.109) 0.725 ms 216.182.236.107 (216.182.236.107) 4.761 ms
4 72.21.222.20 (72.21.222.20) 4.744 ms 4.738 ms 72.21.222.16 (72.21.222.16) 2.070 ms
5 205.251.229.184 (205.251.229.184) 2.299 ms 2.037 ms 2.275 ms
6 te1-7.bbr01.eq01.sjc01.networklayer.com (206.223.116.176) 9.212 ms 205.251.229.184 (205.251.229.184) 1.989 ms te1-7.bbr01.eq01.sjc01.networklayer.com (206.223.116.176) 5.768 ms
7 te1-7.bbr01.eq01.sjc01.networklayer.com (206.223.116.176) 2.825 ms 2.803 ms ae7.bbr02.eq01.sjc02.networklayer.com (173.192.18.165) 3.223 ms
8 ae0.bbr02.cs01.lax01.networklayer.com (173.192.18.151) 11.286 ms 11.280 ms ae7.bbr02.eq01.sjc02.networklayer.com (173.192.18.165) 3.199 ms
9 ae7.bbr01.cs01.lax01.networklayer.com (173.192.18.166) 11.762 ms ae0.bbr02.cs01.lax01.networklayer.com (173.192.18.151) 11.243 ms 10.697 ms
10 ae19.bbr01.eq01.dal03.networklayer.com (173.192.18.140) 80.711 ms ae7.bbr01.cs01.lax01.networklayer.com (173.192.18.166) 11.713 ms 11.235 ms
11 ae6.dar02.sr01.dal05.networklayer.com (50.97.18.193) 78.446 ms ae19.bbr01.eq01.dal03.networklayer.com (173.192.18.140) 78.208 ms ae6.dar02.sr01.dal05.networklayer.com (50.97.18.193) 80.954 ms
12 po2.fcr03.sr03.dal05.networklayer.com (173.192.118.145) 78.568 ms 78.570 ms po1.fcr03.sr03.dal05.networklayer.com (173.192.118.143) 80.660 ms
13 po1.fcr03.sr03.dal05.networklayer.com (173.192.118.143) 80.899 ms * 108.168.255.185-static.reverse.softlayer.com (108.168.255.185) 77.595 ms

We will be testing this new functionality over the next few weeks and will report back any interesting findings.

  • http://twitter.com/nonuby Matt Freeman (@nonuby)

    How has this worked out in practice? I remember reading there were many pitfalls of this and to avoid using it for anything other than DNS, although I’ve lost the link (typical). You make a point about providers not respecting DNS TTL, which I’ve certainly observed before, but presuming a global IP works on providers using your latest advertised routes, what if they tweak the route for optimal performance? (i.e. when you call up your provider and say hey, this is going from NY to Washington via LA, wtf? and they say ah, I see, and pin the route over a more suitable transit provider)

    • https://blog.serverdensity.com David Mytton

      We’ve been using this for over a month now and it’s working well. We’ve rerouted the IP several times for maintenance and we regularly test the automated failover of the load balancer too. Routing always happens within a few seconds and we don’t see any loss of traffic. So everything is working nicely. The big test will be when our primary data centre fails and we have to reroute to another DC. We have tested this with no issues but it’s always difficult to replicate that kind of failure (source of the failure, load on their management systems, etc).