High availability with Pound load balancer

By Pedro Pessoa, Operations Engineer at Server Density.
Published on 21st August 2012.

Following our migration to SoftLayer and their release of global IPs, we started to implement our multi-datacenter plan by replacing our dedicated (static IP) load balancers with Pound and a global IP. Pound itself is very easy to deploy and manage. There’s a package for Ubuntu 12.04, the distribution we use for our load balancer servers, so it can be up and running in no time out of the box, particularly if whatever you’re load balancing has no special requirements.

Pound load balancer

Unfortunately, this was not the case for our Server Density monitoring web app, which requires a somewhat longer Cookie header. The Pound Ubuntu package is compiled with the default MAXBUF = 4096, but we needed roughly twice that to let our header through. We didn’t discover this limit until testing Pound, because our hardware load balancers didn’t impose it; it also highlights something to fix in the next version of Server Density. We don’t particularly like recompiling distribution packages, mostly because it diverges from general usage, and those changes will be the first suspects if a problem later arises with that package.

Presented with no other option that wouldn’t break existing customer connections (the cookie is sent before we can truncate it), we decided to start a PPA for our modified Pound package. This carries two advantages we appreciate: it’s shared with the world, and we can make use of Launchpad’s build capabilities.

Pound load balancer configuration

Besides the application-specific change above, our Pound configuration is quite simple and is managed from Puppet Enterprise – hence the Ruby (ERB) template syntax below. From the defaults, we changed:

  • Use of the dynamic rescaling code. Pound will periodically try to modify the back-end priorities in order to equalize the response times from the various back-ends. Although our back-end servers are all exactly the same, they are deployed as virtualised instances on a “public” cloud at SoftLayer, so each can independently suffer a performance impact from its host.
    DynScale 1
  • A “redirect service”. Anything not *serverdensity.com* is immediately redirected to http://www.serverdensity.com and doesn’t even hit the back-end servers. The HeadDeny rule excludes requests whose Host header matches serverdensity.com from this service, so only requests for other hosts fall through to the redirect.
    Service "Direct Access"
      HeadDeny "Host: .*serverdensity.com.*"
      Redirect "http://www.serverdensity.com"
    End
  • Each back-end relies on a co-located mongos router process to reach our MongoDB data cluster. We use the HAport configuration option to make sure a back-end is taken out of rotation when there is a problem with the database, even while its webserver is still responding on port 80.
    HAport <%= hAportMongo %>
  • Finally, because we have a cluster of load balancers, we needed to be able to trace which load balancer handled the request. For this we add an extra header.
    AddHeader "X-Load-Balancer: <%= hostname %>"

Redundancy and automated failover

The SoftLayer Global IP (GIP) is the key that gives us failover capability with minimal lost connections to our service. By deploying two load balancers per data center, targeted by a single GIP, we can effectively route traffic to any load balancer.

We deploy the load balancers in an active-standby configuration. While the active is targeted by the GIP, receiving all traffic, the standby load balancer monitors the active’s health. If the active load balancer stops responding (ICMP, HTTP or HTTPS), the GIP is automatically re-routed to the standby load balancer using the SoftLayer API. The event is then alerted through PagerDuty, also using their API, to allow the on-call engineer to respond. There’s no automatic recovery attempt, both to avoid flapping and to allow investigation of the event.
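
The following is a minimal sketch of the kind of health-check loop the standby runs, assuming a Python script (the real script has more heuristics, including an external Pingdom check, as discussed in the comments below). The SoftLayer re-route and PagerDuty trigger are left as placeholders rather than real API calls, and all names and addresses here are illustrative.

    #!/usr/bin/env python
    # Illustrative standby-side failover monitor (Python 2, as shipped with
    # Ubuntu 12.04). The SoftLayer and PagerDuty calls are placeholders.
    import os
    import subprocess
    import time
    import urllib2

    ACTIVE_LB = "10.1.2.2"          # assumed address of the active load balancer
    STANDBY_LB = "10.1.2.3"         # this host; the GIP is re-routed here
    CHECK_INTERVAL = 5              # seconds between health checks
    FAILURES_BEFORE_FAILOVER = 3    # consecutive failures before acting

    DEVNULL = open(os.devnull, "w")

    def icmp_ok(host):
        """One ping with a two second timeout."""
        return subprocess.call(["ping", "-c", "1", "-W", "2", host],
                               stdout=DEVNULL, stderr=DEVNULL) == 0

    def http_ok(url):
        """GET returns a non-5xx status."""
        try:
            return urllib2.urlopen(url, timeout=5).getcode() < 500
        except urllib2.HTTPError as e:
            return e.code < 500
        except Exception:
            return False

    def active_healthy():
        # The three checks mentioned above: ICMP, HTTP and HTTPS
        return (icmp_ok(ACTIVE_LB)
                and http_ok("http://%s/" % ACTIVE_LB)
                and http_ok("https://%s/" % ACTIVE_LB))

    def reroute_gip_to(address):
        # Placeholder: re-point the Global IP at `address` via the SoftLayer API.
        raise NotImplementedError

    def page_on_call(description):
        # Placeholder: open a PagerDuty incident via their API.
        raise NotImplementedError

    failures = 0
    while True:
        failures = 0 if active_healthy() else failures + 1
        if failures >= FAILURES_BEFORE_FAILOVER:
            reroute_gip_to(STANDBY_LB)
            page_on_call("GIP failed over to the standby load balancer")
            break   # one failover only; a human investigates before re-arming
        time.sleep(CHECK_INTERVAL)

The single failover (the break) is the important design choice: the script acts once and then stops, so a misbehaving health check cannot flap the GIP back and forth, and the paged engineer decides when to re-arm it.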

Next

For the upcoming version of Server Density, we’ll be deploying nginx because we’ll be using Tornado. Another blog post will be in order by then.

We’ll also be presenting the integration of Pound and this automation with Puppet Enterprise at PuppetConf 2012 on September 27th and 28th.

  • Why Pound and not HAProxy?

    • pessoa

      We had previous experience with Pound and needed to deploy quickly.

    • HAProxy has a lot more features but Pound is simpler so easier to learn and deploy. We don’t have any complex requirements right now.

    • Looks like HAProxy only just started to allow SSL Offloading in the latest dev release. We use SSL offloading in Pound so our web servers just serve HTTP and the LB deals with SSL. Simplifies things.

      http://blog.exceliance.fr/2012/09/04/howto-ssl-native-in-haproxy/

  • Hi David,

    We also use SoftLayer, have two proxy servers and are trying to make them highly available. I contacted support and they said that the GIP doesn’t automatically switch between servers and that a switch will take several minutes. Could you write more about how you use the GIP? Did you set up some application which pings the proxy server and switches the GIP when it crashes?

    Thanks, Pavel

    • Oops, I think I found the answer on the SL forum:
      https://forums.softlayer.com/showthread.php?p=49450#post49450
      Sorry for the flood. And many thanks for a great article.

    • We wrote a script which has a number of heuristics. We use an internal ping and HTTP request plus an external monitoring service (Pingdom) to determine whether we should fail over. If we should, we use the SoftLayer API to do this, then trigger an alert through the PagerDuty API so we can investigate. Importantly, we only ever trigger one failover and then get a human to check everything is OK (on-call is immediately paged). This allows us to have automated failover to reduce downtime while avoiding flapping. See http://blog.serverdensity.com/avoiding-flapping/

      SoftLayer advertise that the switch of the IP may take a few minutes for the routing to update. In all our tests and real failover scenarios it has always updated instantly. I think they’re being conservative to allow for routers to update across their worldwide data centres.

  • Assos Panos

    Hi David,
    I would like to ask if there is any guide, or if you can guide us on how to cluster the load balancers, e.g. in case we have 2 LBs.
    Thank you.

    • We deploy the load balancers in an active-standby configuration. While the active is being targeted by the GIP, receiving all traffic, the standby load balancer monitors the active’s health. If the active load balancer stops responding (ICMP, HTTP or HTTPS), the GIP is automatically re-routed to the standby load balancer using the SoftLayer API.

      We wrote a custom script to do this so don’t have any guides on how to do it I’m afraid.
