
Saving $500k per month buying your own hardware – cloud vs colocation


This post was originally published on GigaOm on 7th Dec.

Last week I compared cloud instances against dedicated servers and showed that for long-running uses such as databases, it’s significantly cheaper to stay off the cloud – but that’s not the end of it. Because you are still paying on a monthly basis, projecting the costs out over 1 or 3 years means you end up paying much more than it would have cost to purchase the hardware outright. This is where buying your own hardware and colocating it becomes the better option.

Continuing the comparison with the same specs for a long-running database instance: if we price a basic Dell R415 with two processors of 8 cores each, 32GB RAM, a 500GB SATA system drive and a 400GB SSD, the one-time list price is around $4,000 – less than half the price of a single year of the SoftLayer server at $9,468/year in the previous article.

Dell PowerEdge R415 front

Of course, the price you pay SoftLayer includes power and bandwidth and these are fees which depend on where you locate your server. Power usage is difficult to calculate because you need to actually stress test the server to figure out the maximum draw and then run real workloads to see what your normal usage is.

My company, Server Density, has just started experimenting with running our own hardware in London. Our 1U Dell, with very similar specs to those discussed above, drew 0.6A in normal use but hit 1.2A when stress tested with everything maxed out. Hosting it with the ISP that supplies our office works out at $161/month, or $1,932/year (a whole rack at a big data centre would work out cheaper, but this was just our first step).
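Those current measurements can be turned into a rough energy bill. A minimal sketch follows, assuming a 230V single-phase supply and an illustrative $0.15/kWh rate (both are my assumptions, not figures from the post); real colo billing often charges for the provisioned peak rather than metered usage:

```python
# Rough power-cost sketch for the 1U server above. The 0.6A/1.2A
# figures are from the article; the 230V supply and $0.15/kWh rate
# are illustrative assumptions, not figures from the post.

VOLTAGE = 230            # volts, typical UK single-phase supply (assumed)
RATE_PER_KWH = 0.15      # USD per kWh, illustrative only
HOURS_PER_YEAR = 24 * 365

def annual_energy_cost(amps: float) -> float:
    """Annual energy cost for a constant current draw at VOLTAGE."""
    kw = amps * VOLTAGE / 1000.0
    return kw * HOURS_PER_YEAR * RATE_PER_KWH

normal = annual_energy_cost(0.6)  # measured normal draw
peak = annual_energy_cost(1.2)    # stress-tested maximum (what you provision for)

print(f"normal draw: ${normal:,.0f}/year, provisioned peak: ${peak:,.0f}/year")
```

Under these assumptions, even the provisioned 1.2A peak amounts to only a few hundred dollars of raw energy a year; the balance of the $1,932 colo fee covers rack space, bandwidth, cooling and the provider’s margin.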

This makes the total annual cost look as follows:

[Chart: total annual cost – Rackspace vs Amazon vs SoftLayer vs colocation]

Remember, again, that this is a database server: with Rackspace, Amazon and SoftLayer you pay that price every year, whereas with colocation the annual cost drops to $1,932 after the first year because you already own the hardware. Further, the hardware can be treated as an asset, which has tax benefits.
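The crossover is easy to see if you sketch the cumulative spend using the article’s figures (roughly $4,000 for the server, $1,932/year in colo fees, and the $9,468/year SoftLayer dedicated server from the previous article):

```python
# Cumulative spend over N years, using the article's figures: roughly
# $4,000 to buy the Dell R415, $1,932/year in colo fees, vs the
# $9,468/year SoftLayer dedicated server from the previous article.

HARDWARE_COST = 4000
COLO_PER_YEAR = 1932
SOFTLAYER_PER_YEAR = 9468

def colo_total(years: int) -> int:
    """One-off hardware purchase plus recurring colo fees."""
    return HARDWARE_COST + COLO_PER_YEAR * years

def softlayer_total(years: int) -> int:
    """Flat annual fee, paid every year."""
    return SOFTLAYER_PER_YEAR * years

for years in (1, 2, 3):
    print(f"year {years}: colo ${colo_total(years):,} "
          f"vs SoftLayer ${softlayer_total(years):,}")
```

On these numbers colo is ahead even in year one, and the gap widens every year afterwards.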

Server Density is still experimenting on a small scale, but I spoke to Mark Schliemann, VP of Technical Operations at Moz.com, because they run a hybrid environment. They recently moved the majority of their environment off AWS and into a colo facility with Nimbix, but still use AWS for processing batch jobs (the perfect use case for elastic cloud resources).

Moz worked on detailed cost comparisons to factor in the cost of the hardware leases (routers, switches, firewalls, load balancers, SAN/NAS storage and VPN), virtualization platforms, miscellaneous software, monitoring software/services, connectivity/bandwidth, vendor support, colo fees and even travel costs. Calculated per server, this means they would spend $3,200/month on AWS vs $668/month on their own hardware. Projected out a year, that is $8,096 vs $38,400 on AWS.

Moz’s goal for the end of Q1 2014 is to be paying $173,000/month for their own environment plus $100,000/month for elastic AWS cloud usage. If they remained entirely on AWS it would work out at $842,000/month.

Optimizing utilization is much harder in the cloud because of the fixed instance sizes. Moz found they were far more efficient running their own systems virtualized because they could create exactly the instance sizes they needed. Cloud providers often increase CPU allocation alongside memory, when in real-world use you tend to need one or the other. Running your own environment lets you optimize this, and it was one of the big ways Moz improved their utilization and became much more efficient with spend.
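A toy sketch of that coupling problem: the instance shapes below are hypothetical, not any provider’s actual catalogue. The point is that a fixed catalogue forces you to take the smallest shape covering both your CPU and RAM needs, stranding whichever dimension you didn’t want more of:

```python
# Hypothetical fixed instance catalogue: (vCPUs, GB RAM), with CPU and
# RAM scaled together, as the article describes. Not real provider shapes.
FIXED_SHAPES = [(2, 4), (4, 8), (8, 16), (16, 32)]

def smallest_fitting(cpu_needed: int, ram_needed: int):
    """Smallest catalogue shape covering both requirements."""
    for cpu, ram in FIXED_SHAPES:
        if cpu >= cpu_needed and ram >= ram_needed:
            return cpu, ram
    raise ValueError("no shape fits")

# A memory-heavy database workload: needs only 2 vCPUs but 16 GB RAM.
cpu, ram = smallest_fitting(2, 16)
print(f"forced to buy {cpu} vCPUs / {ram} GB; {cpu - 2} vCPUs stranded")
# Virtualizing your own hardware, you could carve out exactly 2 vCPUs / 16 GB.
```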

Right now we are able to demonstrate that our colo is about 1/5th the cost of Amazon but with RAM upgrades to our servers to increase capacity we are confident we can drive this down to something closer to 1/7th the cost of Amazon.

Colocation has its benefits once you’re established

Colocation looks like a winner but there are some important caveats:

  • First and foremost, you need in-house expertise, because you have to build and rack your own equipment and design the network. Networking hardware can be expensive, and if things go wrong you need the knowledge to deal with the problem yourself. This can involve support contracts with vendors and/or training your own staff. However, it does not usually require hiring new people, because the same team that deals with cloud architecture, redundancy, failover, APIs, programming, etc. can work the ops side of running your own environment.
  • The data centers chosen have to be easily accessible 24/7, because you may need to visit at unusual times. This means having people on call and available to travel, or paying the data center’s remote hands high hourly fees to fix things.
  • You have to purchase the equipment upfront, which means a large capital outlay, although this can be mitigated by leasing.

So what does this mean for the cloud? On a pure cost basis, buying your own hardware and colocating it is significantly cheaper. Many will say that the real cost is hidden with staffing requirements but that’s not the case because you still need a technical team to build your cloud infrastructure.

At a basic level, compute and storage are commodities. The way the cloud providers differentiate is with supporting services. Amazon has been able to iterate very quickly on innovative features, offering a range of supporting products like DNS, mail, queuing, databases, auto scaling and the like. Rackspace has been slower to do this but is now starting to offer similar features.

The flexibility of the cloud needs to be highlighted again, too. Once you buy hardware you’re stuck with it for the long term; the example above only works because the workload was known in advance.

Considering the hybrid model

Perhaps a hybrid model makes sense, then? This is where I believe a good middle ground is and we can see Moz making good use of such a model. You can service your known workloads with dedicated servers and then connect to the public cloud when you need extra flexibility. Data centers like Equinix offer Direct Connect services into the big cloud providers for this very reason, and SoftLayer offers its own public cloud to go alongside dedicated instances. Rackspace is placing bets in all camps with public cloud, traditional managed hosting, a hybrid of the two and support services for OpenStack.

And when should you consider switching? Dell cloud exec Nnamdi Orakwue said companies often start looking at alternatives when their monthly AWS bill hits $50,000 – but is even this too high?

  • Vincent Janelle

    Are you including the costs of having your own DC team(s), hardware upgrades (3 year cycles of depreciation), bandwidth, network admins, maintaining multiple points of presence, etc?

    • http://blog.serverdensity.com/ David Mytton

      Yes. The Moz costs include all that.

      • Vincent Janelle

        Ah, because some of the software/hardware renewal costs I have exceed that amount, even at low volumes. Seems a bit low, for annual pricing :)

  • mzzs

    a 500GB SATA system drive and a 400GB SSD

    Stopped reading, no credibility. If you’re going to talk about on-prem vs. cloud, compare apples to apples. A server with a single (SATA) OS disk and a single data disk is amateur stuff.

    Edit: Yes, you can build application-level redundancy so that the hardware underneath doesn’t matter. Yes, there are use cases for no RAID like the QA example below. No, just because Google does something doesn’t mean that the average company should do it. Google saves millions on hardware, but they’ve spent millions in engineering effort to do so and use in-house filesystems, cluster tools, etc. You’re not Google.

    Azure, AWS, and others have availability at the VM level, so that if the hardware underneath your instance fails, it keeps running (unless you’re in US-EAST-1 ;) ). A single colocated server doesn’t have the hardware independence that these cloud-based services provide. There’s value in that which isn’t calculated here.

    This article is talking about renting a single cloud instance vs purchasing and colocating a single server. With a single server with a single disk, you need to buy more than one of them to get some level of redundancy, and your application has to support it at the application layer. This throws off all of the calculations in the article. Seriously guys, it’s silly.

    • http://blog.serverdensity.com/ David Mytton

      This article is a continuation from https://blog.serverdensity.com/cloud-pricing-vs-dedicated-pricing-cheaper/ where I was specifically comparing the cloud compute instances to dedicated. This spec is completely valid for a large number of workloads from database servers to tools servers. Not every long running workload requires storage level redundancy.

      Comparing the compute costs is simplified by looking at on-instance storage, which is a completely legitimate way to get good performance and run databases when you get redundancy from having multiple nodes, especially if you couple this with deploying across zones/regions. You assume multi-server redundancy is something you have to implement yourself in your application, but the example workload here is databases, and they’re very good at dealing with replication and failover, which you’d need anyway. So it’s no extra work.

      You’re expecting cloud providers to have magical redundancy on the host level so if a disk fails then you can simply migrate to a new one with no impact. This is a big misconception with the cloud – that it handles redundancy and scaling magically. That’s simply not true and you have to consider host level failures, which happen frequently.

      Your criticism would be partially valid if that’s where I’d stopped, but I went into much more detail through the Moz example, where their costs do consider single server level redundancy as well as multi-server, multi-region, etc. The cost analysis with a single server is indicative and is a good, simple example of the cost differences, but is just the introduction to the article. If you’d continued reading you could’ve learned more about what Moz are doing and what the various tradeoffs were with colo.

      • masasuka

        As someone who works for a company that maintains an enterprise-level cloud environment for hundreds of thousands of customers, this statement: “You’re expecting cloud providers to have magical redundancy on the host level so if a disk fails then you can simply migrate to a new one with no impact. This is a big misconception with the cloud” is beyond wrong.

        A ‘proper’ cloud setup (Amazon has this, Google, Microsoft, etc…, including the one we run) has redundancy, meaning that if a host drive dies, another host picks up the slack; this happens automatically and takes less than 10 seconds to kick over. Storage in the cloud is floating, meaning that any host can access it on any drive array. If a front-end host dies, the controller takes it out of the ‘load balanced’ group and the load of your site’s control is handled by another host. If one of the controllers dies, then the auto failover kicks in and the backup host takes over the switching job. The ONLY ways to take down a good cloud system are to flood it with network traffic (rather hard to do, but as Amazon has had this happen, it’s possible), knock out the power (if this happens, it doesn’t matter if you’re in the cloud, on a rented server, or in a colo, your server is down), or kill the network (again, as with power, you’re out a server regardless of type).

        On to the actual article. One of the things you pay for with a dedicated server, as opposed to colocation, is hardware and software ‘always on’ guarantees. If you have a colocation and a drive dies, you have to go to a store, buy a new drive (hopefully they have it in stock), then head to the datacentre and replace the drive. If you have a dedicated server, you only have to let the guys know the best time to do the RAID rebuild (usually a time when the load on the server is low and the disk I/O of a RAID rebuild wouldn’t impact performance). If you have a non-technical company, then you also have to factor in the costs of a systems admin as well as a website developer for a colocated unit, whereas for dedicated servers the company you purchase from will provide OS management and ensure it stays up and running. Also, you didn’t mention whether you include OS licensing costs (Red Hat, Windows, Ubuntu Advantage, etc.) as well as additional services that may be offered for free (backups, external firewall, load balancing, DNS management, etc.)

        • http://blog.serverdensity.com/ David Mytton

          You might be describing how your cloud works, but that’s not how EC2 instances work and it’s not how Google Compute Engine instances work (in Europe; the US ones do have live migration).

          You’re also confusing compute instance local storage with SAN-based network storage. The comparison was specifically with local instance storage on the host itself, with no network communication, because that’s most optimal for database workloads.

          All the points you mentioned about dedicated vs colo are mentioned in the article and are valid. You do have to be responsible for hardware and maintenance which is part of what you’re paying for from a dedicated provider – those replacement time guarantees, spare parts, etc.

          One point I did concede on the original article comments posted at GigaOm was that both Server Density and Moz are technical companies so already have tech teams. If you’re a non-tech company then it’s more difficult because you have to hire the team to run things (you’d have to do this anyway with the cloud). Dedicated is more appropriate here as you can pay for managed service so you don’t have to do anything.

          The costs for licenses and sysadmins were included in the Moz costs. Again, my single server example was supposed to compare the pure compute costs, avoiding an academic discussion of all the extra costs because they’re much more difficult to compare like for like. That’s why I included the Moz analysis, which did consider absolutely everything vs AWS.

        • Dan

          That’s absolutely not how Amazon/Rackspace/Google works. When an Amazon server dies, your instance is gone. There is no redundancy unless you design it into your application.

  • cloudy mcjones

    If you are running an instance for a year you’d be silly to pay on-demand rates. Amazon’s reserved instance rates cut costs by up to 70%, which definitely makes it a more cost-effective solution.

    You say that electricity is hard to quantify but it really isn’t because you provision and pay for the maximum you can pull. Those costs are fixed and can be easily quantified. Sure, this changes if you pay metered but you still pay for the outlet coming into your rack.

    The server, ultimately, isn’t the real discussion here. It is what is running on the server that is important. You run your business on the server. You run applications on the server. When you have an idea for something new, you want to quickly spin up that idea into an application and see how well it works. If it doesn’t work, you want to turn it off and cut your losses. If it does work, then you need to, hopefully, quickly scale it. This doesn’t work in a hardware model. Also, these applications require a proper SDLC environment, so when you buy one, you usually buy two or three or four (dev, QA, stage, production). When your developers go home at night, your hardware keeps running. In the cloud, you turn the environment off and don’t pay for it.

    Some folks are also not looking to optimize for cost. Optimizing for agility is sometimes paramount. You can’t be very agile on hardware.

    Looking at it on a unit-by-unit basis is too simplistic and doesn’t really get to the root of why you’re buying that server or instance. You are buying these things because you have to run an application and solve a business problem. These business problems are not all the same and require different approaches. Simplifying it to say that when you spend $50,000 per month you want to move to hardware is silly. Talk to Netflix, Airbnb, Pinterest, etc. and ask them why they’re still in the cloud when their spend is way more than $50,000 per month.

    • http://blog.serverdensity.com/ David Mytton

      You’re correct that if you’re doing long running workloads then you’d use the reserved pricing, which is what my figures are from. See the first article: https://blog.serverdensity.com/cloud-pricing-vs-dedicated-pricing-cheaper/

      It’s easy to say how much you pay per unit of electricity, but it’s not easy to know how many units you’re using. And units can vary between facilities – it could be kWh, it could be amps, etc. The difficulty is that you have to run your actual workload on the hardware. You don’t pay for the maximum you could ever draw; it’s some calculation derived from that. For example, Equinix charges 70% of your maximum. There are also capacity reservation fees, utilisation fees and, in the UK at least, carbon offset charges.

      A unit by unit comparison is supposed to be simplistic, to show the base compute costs. That’s why I included the real world figures from Moz, too. They consider everything.

      You’re correct about the hardware flexibility – that’s a perfect use case for the cloud: a new project with unknown requirements, or things with truly flexible workloads, e.g. batch processing or handling traffic spikes.

  • Mark

    First off, I would disagree with your assessment that you need to use Amazon’s dedicated host option for a database to be able to utilize the IOPS from your SSDs. Amazon puts a lot of effort into preventing a noisy neighbor from affecting your performance. Dedicated hardware, in my experience, is more commonly used to support audit requirements.

    Secondly, your pricing only uses a 1-year reserve, while your article is discussing keeping the physical hardware for a long time. It seems like it would make more sense to do your calculations with a 3-year reserve.

    So if you go with a 3-year heavy reserve without dedicated hardware, you are looking at $4,185.99 a year ($5,804 upfront / 3 years + $0.257 hourly rate × 8,760 hours in a year).

    Note: this does not include AWS bandwidth charges, because those would vary based on load and architecture.

    When you priced out the Dell server, does that include support costs on the hardware? If so, how long is the support contract, and what does it cover?

    For the sake of this comparison, let’s assume the Dell server comes with top-tier support, including quick onsite response and replacement of failed hardware. That means over a 3-year period AWS will cost $2,761.96 more than hosting the physical hardware.

    So for an annual saving of $920.65 you get a physical server that can never change size. If the hardware dies, you will have to wait for it to be fixed before you can use it again, since this pricing does not allow for spare hardware. You have to know proactively how much hardware you will require, and it will take at least 2 weeks for Dell to deliver the hardware and for you to configure it in your data center.

    You will also need networking equipment, which I don’t know was included in your colocation price, and someone who knows how to configure the networking for your environment. The argument that a cloud admin could/would know how to configure the network and other components of a physical data center is a bit off the mark. They are two very different skill sets.

    At the end of the day, all of the additional work and time your employees will have to spend configuring and setting up the on-premises servers will more than likely cost you more than the $920.65 difference in price between AWS and on-premises.

    I’m not sure if Moz’s figures were also using dedicated hardware and 1-year reserves for AWS, but if they are, I don’t see that comparison as very valid either.

    We also have not discussed how you are backing up your database, and how much that storage is going to cost you.

    All in all, you seem to be taking a very narrow view of the pricing difference between on-premises and cloud pricing.

    • http://blog.serverdensity.com/ David Mytton

      It’s true that the AWS pricing is 1year and that’s because that’s what I was looking at in the first article before I had the Moz figures: https://blog.serverdensity.com/cloud-pricing-vs-dedicated-pricing-cheaper/

      However, the Moz figures are based on 3 years for their hardware and for the AWS comparisons. They also include all the support costs, hardware, replacements, etc. So I think that addresses all your points.
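As an aside, the reserved-instance arithmetic in Mark’s comment above does check out, and it lines up with the article’s colo figures. A quick sketch (the $5,804 upfront and $0.257/hour rates come from the comment and may not reflect current AWS pricing):

```python
# 3-year heavy reserved instance: upfront cost amortised over the term,
# plus the hourly rate. Rates are from the comment above, not current pricing.
UPFRONT = 5804
TERM_YEARS = 3
HOURLY = 0.257
HOURS_PER_YEAR = 8760

aws_annual = UPFRONT / TERM_YEARS + HOURLY * HOURS_PER_YEAR   # ≈ $4,185.99

# Against the article's colo figures: $4,000 hardware + $1,932/year fees.
colo_3yr = 4000 + 1932 * TERM_YEARS
diff_3yr = aws_annual * TERM_YEARS - colo_3yr                 # ≈ $2,761.96

print(f"AWS: ${aws_annual:,.2f}/year; 3-year difference: ${diff_3yr:,.2f}")
```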

  • http://www.scriptcrafty.com/ david karapetyan

    It would be more interesting if you had more numbers on all the other infrastructure, e.g. networking equipment, and expertise required to run your own virtual compute infrastructure on top of dedicated hardware. You sweep that under the rug but networking in general is not easy and a misconfigured router here and there can be disastrous. You also need to know enough to make sure there are no security holes in how you are virtualizing the network and all the virtual machines that will run on it. Putting all that together no longer paints such a rosy picture.

    Docker and other lightweight virtualization technologies can go a long way towards mitigating some of these issues but there is still no way around getting the right people and expertise to manage all that hardware.

    • http://blog.serverdensity.com/ David Mytton

      Networking is a lot more difficult to compare directly because AWS are at such a massive scale they need completely different networking infrastructure than if you were deploying your own setup in a colo facility. But numbers are useful to have and I’ll be posting the figures from the setup we choose as we continue the colo experiment at Server Density.

      That said, the Moz figures do include everything, especially networking and the team requirements. So it is considered, just not separated into the component costs.

  • Chris Beck

    David, I can 100% agree with this. We moved from AWS to dedicated and saved a fortune in the process. This was cloud to dedicated, but colo would provide even further savings. We have 10TB of SSD across 20 servers, each with 72GB RAM, plus 4 application servers, all good specs, a 1 gigabit dedicated line out, a 48-port dedicated switch and an LB for ~$4,000/mo.

    • http://mag.entropy.be mag

      Interesting, can you recommend a good dedicated server provider with modern hardware and SSDs?

      • http://blog.serverdensity.com/ David Mytton

        We are currently with SoftLayer so I’d recommend them.

  • Martino Io

    I fully agree with the article – while some concepts are explained in the wrong way, colocation is still the best option as long as you have the qualified technical skills to manage your own hardware and software. As long as you satisfy those requirements you will always save an incredible amount of money, as cloud providers usually have large margins (even if prices are falling continuously) and of course they don’t work for free.
    Hardware can be obtained from refurbished stock for incredibly low prices (slightly decreasing computing density), and implementing OSS software with support contracts from the developers will give you first-class support for a fair price.
    I manage such an infrastructure, currently 4 racks (1 full of storage, 3 of compute nodes), and the expenses we pay are peanuts compared to any “managed / cloud / PaaS whatever you want” contracts.
    With 70TB of storage over FC, 550 Xeon cores at 2.8GHz, 10TB of RAM and around 500 active VMs, our average expenses for colo, including bandwidth and power, are 75K euros per year; I don’t even want to know how much it would be on a hosted infra…

    • http://blog.serverdensity.com/ David Mytton

      Which concepts aren’t correctly explained?

      • Martino Io

        For instance, the numbers in the comparison of the 1U server: if you want a fair comparison then compare apples with apples – price a fully working minimal deployment of OpenStack and compare against that, and you will definitely see more interesting numbers.
        It’s like comparing storage prices (per GB) between enterprise storage boxes from EMC/IBM and USB HDDs; both contain data, and the USB HDD will probably cost a thousand times less per GB, although both achieve the same goal of storing the data.
        Then you do your maths assuming that the hardware will be leased; while this is a viable option today, there are businesses that use either second-hand HW or just buy what minimally accomplishes the task and add more resources later. This should shift the maths toward calculating (and comparing) the TCO over a period of 3 to 5 years.

  • dtooke

    A hybrid model works best for us. The cloud is very convenient, but dedicated servers save money in the long run. You have more control with dedicated and don’t get mysterious system reboots. I would like to see a cost comparison including support.

  • https://www.colocationauthority.com/ Tory Ziebell

    Most data centers have different resiliency than the web-scale architectures deployed at Google, Amazon, Facebook, and eBay. Start by investigating and documenting the resiliency of the enterprise, colocation, or cloud deployment. Make sure to align server, network, and storage system resiliency with the appropriate data center Tier rating.