Introducing the new Server Density infrastructure

By David Mytton,
CEO & Founder of Server Density.

Published on the 10th September, 2010.

Since we moved to Rackspace from our very first servers with Slicehost, our hosting requirements have grown significantly. Using MongoDB has helped us scale fairly well but, as some of our users have been telling us, the performance of our server monitoring application, Server Density, has been slow at times. This has been caused by the service growing faster than we can provision new hardware – our MongoDB servers average between 1,500 and 5,000 inserts per second.

We’ve been working on deploying an entirely new infrastructure for some time and, although it has taken longer than we’d hoped, we’re finally ready to make the transition. This has been a lot of work: not only negotiating the right deal but also changing our backend to take advantage of the sharding introduced in MongoDB 1.6.

This means we have 2 major changes being implemented at the same time:

  • Migrating to a completely new server environment with a new provider.
  • Migrating the application to use sharding.

It makes sense to do them both together because we’d have to migrate the old data anyway.

We’re moving (back) to the cloud! Kinda.

We currently have physical dedicated servers across 2 data centres with Rackspace. Our new environment is on the Terremark Enterprise Cloud which, although entirely virtual, provides us with functionality you wouldn’t normally associate with the cloud, such as hardware load balancers and network-based SANs.

Terremark Cloud Architecture

And unlike “public” clouds like EC2 (which we use for testing), with Terremark we’ve purchased an allocation of resources which we can then deploy as we wish.

You might purchase 80GB of RAM and 100GHz of CPU and then deploy that across 5 VMs with 16GB of RAM each, or 70 VMs with 1GB of RAM each. This makes it very flexible: if we see a sudden spike of traffic, we can launch a new web server from our templates and add it into the load balancer within a few minutes, without paying any extra. But there’s also “burst mode”, so if we suddenly need more capacity than we’ve bought, it’s available at the push of a button!
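To make the arithmetic of a resource pool concrete, here is a minimal sketch in Python. The `Pool` class and both helper functions are hypothetical names used for illustration only, not part of any Terremark API, and the even CPU split is an assumption; the 80GB and 100GHz figures are the example numbers from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """A purchased allocation of resources (hypothetical model)."""
    ram_gb: int
    cpu_ghz: int


def max_vms(pool: Pool, ram_per_vm_gb: int) -> int:
    """How many VMs of a given RAM size fit in the pool, counting RAM only."""
    return pool.ram_gb // ram_per_vm_gb


def cpu_share_ghz(pool: Pool, vm_count: int) -> float:
    """CPU per VM if the pool's CPU is split evenly (an assumption)."""
    return pool.cpu_ghz / vm_count


pool = Pool(ram_gb=80, cpu_ghz=100)

# A few large VMs or many small ones, drawn from the same allocation.
print(max_vms(pool, 16))       # VMs that fit at 16GB each
print(max_vms(pool, 1))        # VMs that fit at 1GB each
print(cpu_share_ghz(pool, 5))  # GHz per VM across 5 VMs
```

Anything beyond what the pool allows would be the case the “burst mode” is meant to cover.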

Enterprise Cloud Resources

The problem we’ve found with physical dedicated servers is that it’s too slow to provision new machines, while public cloud providers like EC2 suffer from inconsistent performance. We can combine the advantages of both with a private cloud – flexibility, consistent performance and the support & SLAs that we need.

Security is also very important. Given Terremark’s government clients, and having visited their (extremely cool!) data centre in Miami, we’re certain they know what they’re doing. Indeed, we have multiple internal networks allocated by purpose, behind firewalls and completely private from other cloud users.

“We set out to build the most secure data center you’ve ever seen,” said Norm Laudermilch, Managing Director of the NAP of the Capital Region for Terremark. “We set out to find all the security requirements the government and security agencies have, and built this facility to meet or exceed every requirement imaginable.”

– Data Center Knowledge, “Inside Terremark’s Culpeper Data Fortress”

Terremark Miami

We also now have much more redundancy. Replica sets in MongoDB 1.6 are much more robust, allowing for automated failover with multiple nodes. And we have 2 environments with Terremark across 2 of their data centres in the US – one in Miami and the other in Virginia. Privately linked, load-balanced clouds ensure multiple machine failures won’t take the service offline.

Sharding with MongoDB

Although MongoDB 1.6 is a very new release, our testing has shown sharding to work extremely well. We will be starting with 3 shards, each one a 4-node replica set (2 servers per data centre), plus 3 config servers. That’s a total of 15 servers in our MongoDB cluster, plus a mongos router process on each of our 7 web servers. Data will be split across shards automatically, which will allow for much faster parallel queries and easy scaling – we can just add new shards and data will automatically rebalance.
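As a rough illustration of the idea, the sketch below models range-based routing and rebalancing in plain Python. It is a toy, not MongoDB’s actual balancer: the `ToyCluster` class, the split points and the shard names are all hypothetical, and the rebalancing rule (move chunks until the new shard holds an even share) is a simplifying assumption.

```python
from bisect import bisect_right
from collections import Counter


class ToyCluster:
    """Toy model: the key space is cut into chunks, and each chunk lives on one shard."""

    def __init__(self, split_points, shards):
        # Chunk i covers keys from split_points[i-1] (inclusive)
        # up to split_points[i] (exclusive).
        self.split_points = sorted(split_points)
        self.shards = list(shards)
        chunk_count = len(self.split_points) + 1
        # Start with chunks dealt out round-robin across the shards.
        self.owner = [self.shards[i % len(self.shards)] for i in range(chunk_count)]

    def shard_for(self, key):
        """Route a shard-key value to the shard holding its chunk (the router's job)."""
        return self.owner[bisect_right(self.split_points, key)]

    def add_shard(self, name):
        """Add a shard and move chunks to it until it holds an even share."""
        self.shards.append(name)
        target = len(self.owner) // len(self.shards)
        counts = Counter(self.owner)
        for i, current in enumerate(self.owner):
            if counts[name] >= target:
                break
            if counts[current] > target:
                self.owner[i] = name
                counts[current] -= 1
                counts[name] += 1


# 11 split points give 12 chunks, spread over the 3 starting shards.
cluster = ToyCluster(range(100, 1200, 100), ["shard0", "shard1", "shard2"])
before = Counter(cluster.owner)

cluster.add_shard("shard3")
after = Counter(cluster.owner)
print(before, after)
```

Queries that span several chunks can then be served by several shards at once, which is where the parallelism comes from.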

So what does it mean for you?

Everything will be much faster: the web UI and API will be much more responsive and alerts will be delivered even faster. Our benchmarks showed a huge performance increase – we tested a web + MongoDB cluster half the size of the one we will be deploying at over 200 requests per second, and load went no higher than 0.5!

The migration process

We’ve already switched over our website but the main application will be switched over on Monday 13th September 2010. This is what you need to know:

  • Most users will not need to make any changes – everything will just work.
  • If you have allowed our IP address for outgoing connections, you will need to add a new rule for our new IP, which is 72.46.232.168.
  • All your settings, servers, alert history, etc will be available right away but historical metric data will not.
  • After the switch is completed on 13th Sept, historical data will be migrated for each account in alphabetical order of the hostname, unless priority migration was requested (see below). This will be migrated gradually over the following week. This is because we have to re-insert all the historical data from our old servers into our new database structure, and there’s a lot of it!
  • Only historical data prior to 13th Sept will be initially unavailable (because it’s on our old servers). Data that is posted to us from 13th Sept will go into the new system and will appear immediately.
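If you keep an outgoing allowlist, a quick check like the following sketch can tell you whether it already covers the new address. The `is_allowed` helper and the sample allowlist entries are hypothetical; only the IP 72.46.232.168 comes from this post, and your firewall will have its own rule syntax.

```python
import ipaddress

# The new IP announced in this post.
NEW_IP = ipaddress.ip_address("72.46.232.168")


def is_allowed(allowlist, ip=NEW_IP):
    """Return True if any entry (a single IP or a CIDR range) covers the given IP."""
    return any(
        ip in ipaddress.ip_network(entry, strict=False)
        for entry in allowlist
    )


# Hypothetical existing rules; 192.0.2.10 is a documentation-only address.
current_rules = ["192.0.2.10"]
print(is_allowed(current_rules))                       # new IP not yet covered
print(is_allowed(current_rules + ["72.46.232.168"]))   # covered once the rule is added
```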

Priority historical data migration

If you need access to your historical data sooner, we can perform a priority migration. Otherwise, your data will become available over the week following the 13th Sept. We will be accepting requests for priority historical data migration up until 19:00:00 UTC on 13th September 2010. This is free of charge, but available to paying customers only. Just e-mail us to make your request.
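The resulting order can be sketched as follows. This is an illustration rather than our actual migration script: the `migration_order` function, its arguments and the sample hostnames are hypothetical, and it assumes that priority accounts are themselves processed alphabetically and that a request made exactly at the deadline still counts.

```python
from datetime import datetime, timezone

# Cut-off for priority requests, as stated in this post.
PRIORITY_DEADLINE = datetime(2010, 9, 13, 19, 0, 0, tzinfo=timezone.utc)


def migration_order(hostnames, priority_requests, paid):
    """Order accounts for migration: valid priority requests first, then alphabetical.

    hostnames:         iterable of account hostnames
    priority_requests: dict of hostname -> time the request was received (UTC)
    paid:              set of hostnames belonging to paying customers
    """
    def is_priority(host):
        requested_at = priority_requests.get(host)
        return (
            requested_at is not None
            and requested_at <= PRIORITY_DEADLINE
            and host in paid
        )

    # False sorts before True, so priority accounts come first.
    return sorted(hostnames, key=lambda h: (not is_priority(h), h.lower()))


hosts = ["zeta.example.com", "alpha.example.com", "mid.example.com", "beta.example.com"]
requests = {
    "zeta.example.com": datetime(2010, 9, 13, 12, 0, tzinfo=timezone.utc),  # in time
    "mid.example.com": datetime(2010, 9, 13, 20, 0, tzinfo=timezone.utc),   # too late
    "beta.example.com": datetime(2010, 9, 12, 9, 0, tzinfo=timezone.utc),   # not a paying account
}
paying = {"zeta.example.com", "mid.example.com", "alpha.example.com"}

order = migration_order(hosts, requests, paying)
print(order)
```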

Status

We’ll be posting further updates and status reports on our service status blog.

Questions?

We’ve been frustrated by the performance of our application over the last few months (even though it has been much better recently, following a number of upgrades) so we’re excited to get the new environment online. Not only will it dramatically improve our service, but it’s also pretty cool!

Let us know if you have any questions.
