この記事は日本語でもご覧頂けます。 詳細はこちら。

The tech behind our time series graphs – 2bn docs per day, 30TB per month

MongoDB Tech behind graphs

By David Mytton,
CEO & Founder of Server Density.

Published on the 14th March, 2014.

This post was also published on the MongoDB Blog.


Server Density processes over 30TB/month of incoming data points from the servers and web checks we monitor for our customers, ranging from simple Linux system load average to website response times from 18 different countries. All of this data goes into MongoDB in real time and is pulled out when customers need to view graphs, update dashboards and generate reports.

We’ve been using MongoDB in production since mid-2009 and have learned a lot over the years about scaling the database. We run multiple MongoDB clusters but the one storing the historical data does the most throughput and is the one I shall focus on in this article, going through some of the things we’ve done to scale it.

1. Use dedicated hardware, and SSDs

All our MongoDB instances run on dedicated servers across two data centers at Softlayer. We’ve had bad experiences with virtualisation because you have no control over the host, and databases need guaranteed performance from disk i/o. When running on shared storage (e.g a SAN) this is difficult to achieve unless you can get guaranteed throughput from things like AWS’s Provisioned IOPS on EBS (which are backed by SSDs).

MongoDB doesn’t really have many bottlenecks when it comes to CPU because CPU bound operations are rare (usually things like building indexes), but what really causes problem is CPU steal – when other guests on the host are competing for the CPU resources.

The way we have combated these problems is to eliminate the possibility of CPU steal and noisy neighbours by moving onto dedicated hardware. And we avoid problems with shared storage by deploying the dbpath onto locally mounted SSDs.

I’ll be speaking in-depth about managing MongoDB deployments in virtualized or dedicated hardware at MongoDB World this June.

2. Use multiple databases to benefit from improved concurrency

Running the dbpath on an SSD is a good first step but you can get better performance by splitting your data across multiple databases, and putting each database on a separate SSD with the journal on another.

Locking in MongoDB is managed at the database level so moving collections into their own databases helps spread things out – mostly important for scaling writes when you are also trying to read data. If you keep databases on the same disk you’ll start hitting the throughput limitations of the disk itself. This is improved by putting each database on its own SSD by using the directoryperdb option. SSDs help by significantly alleviating i/o latency, which is related to the number of IOPS and the latency for each operation, particularly when doing random reads/writes. This is even more visible for Windows environments where the memory mapped data files are flushed serially and synchronously. Again, SSDs help with this.

The journal is always within a directory so you can mount this onto its own SSD as a first step. All writes go via the journal and are later flushed to disk so if your write concern is configured to return when the write is successfully written to the journal, making those writes faster by using an SSD will improve query times. Even so, enabling the directoryperdb option gives you the flexibility to optimise for different goals e.g. put some databases on SSDs and some on other types of disk (or EBS PIOPS volumes) if you want to save cost.

It’s worth noting that filesystem based snapshots where MongoDB is still running are no longer possible if you move the journal to a different disk (and so different filesystem). You would instead need to shut down MongoDB (to prevent further writes) then take the snapshot from all volumes.

3. Use hash-based sharding for uniform distribution

Every item we monitor (e.g. a server) has a unique MongoID and we use this as the shard key for storing the metrics data.

The query index is on the item ID (e.g. the server ID), the metric type (e.g. load average) and the time range but because every query always has the item ID, it makes it a good shard key. That said, it is important to ensure that there aren’t large numbers of documents under a single item ID because this can lead to jumbo chunks which cannot be migrated. Jumbo chunks arise from failed splits where they’re already over the chunk size but cannot be split any further.

To ensure that the shard chunks are always evenly distributed, we’re using the hashed shard key functionality in MongoDB 2.4. Hashed shard keys are often a good choice for ensuring uniform distribution but if you end up not using the hashed field in your queries, could actually hurt performance because then a non-targeted scatter/gather query has to be used.

4. Let MongoDB delete data with TTL indexes

The majority of our users are only interested in the highest resolution data for a short period and more general trends over longer periods, so over time we average the time series data we collect then delete the original values. We actually insert the data twice – once as the actual value and once as part of a sum/count to allow us to calculate the average when we pull the data out later. Depending on the query time range we either read the average or the true values – if the query range is too long then we risk returning too many data points to be plotted. This method also avoids any batch processing so we can provide all the data in real time rather than waiting for a calculation to catch up at some point in the future.

Removal of the data after a period of time is done by using a TTL index. This is set based on surveying our customers to understand how long they want the high resolution data for. Using the TTL index to delete the data is much more efficient than doing our own batch removes and means we can rely on MongoDB to purge the data at the right time.

Inserting and deleteing a lot of data can have implications for data fragmentation but using a TTL index helps because it automatically activates PowerOf2Sizes for the collection, making disk usage more efficient. Although as of MongoDB 2.6 this storage option will become the default.

5. Take care over query and schema design

The biggest hit on performance I have seen is when documents grow, particularly when you are doing huge numbers of updates. If the document size increases after it has been written then the entire document has to be read and rewritten to another part of the data file with the indexes updated to point to the new location, which takes significantly more time than simply updating the existing document.

As such, it’s important to design your schema and queries to avoid this, and to use the right modifiers to minimise what has to be transmitted over the network and then applied as an update to the document. A good example of what you shouldn’t do when updating documents is to read the document into your application, update the document, then write it back to the database. Instead, use the appropriate commands – such as set, remove, and increment – to modify documents directly.

This also means paying attention to the BSON data types and pre-allocating documents, things I wrote about in MongoDB schema design pitfalls.

6. Consider network throughput & number of packets

Assuming 100Mbps networking is sufficient is likely to cause you problems, perhaps not during normal operations but probably when you have some unusual event like needing to resync a secondary replica set member.

When cloning the database, MongoDB is going to use as much network capacity as it can to transfer the data over as quickly as possible before the oplog rolls over. If you’re doing 50-60Mbps of normal network traffic, there isn’t much spare capacity on a 100Mbps connection so that resync is going to be held up by hitting the throughput limits.

Also keep an eye on the number of packets being transmitted over the network – it’s not just the raw throughput that is important. A huge number of packets can overwhelm low quality network equipment – a problem we saw several years ago at our previous hosting provider. This will show up as packet loss and be very difficult to diagnose.


Scaling is an incremental process – there’s rarely one thing that will give you a big win. All of these tweaks and optimisations together help us to perform thousands of write operations per second and get response times within 10ms whilst using a write concern of 1.

Ultimately, all this ensures that our customers can load the graphs they want incredibly quickly, whilst behind the scenes we know that data is being written quickly, safely and that we can scale it as we continue to grow.

Free eBook: The 9 Ingredients of Scale

From two students with pocket money, to 20 engineers and 80,000 servers on the books, our eBook is a detailed account of how we scaled a world-class DevOps team from the ground up. Download our definitive guide to scaling DevOps and how to get started on your journey.

Help us speak your language. What is your primary tech stack?

What infrastructure do you currently work with?

  • ebottabi

    Hey David, good points here. I believe one of the reasons i had hard times scaling mongodb. Thanks for sharing.

  • Avi

    It is so confusing what to choose… the WEB has tones of pros and cons articles about MONGO, cassandra has only good points, but not as much….

  • Charity Majors

    great piece David! I would also note that too-high blockdev readahead settings are the quickest way to saturate a nic.

  • Very interesting stuff here. Just a note on the links though, they don’t seem to work.

    • Fixed, thanks!

      • Cool. Also just noticed, not sure if it’s something you wrote yourself or a third party sharing tool, +1 works but if you try to share from it, as soon as you leave the “share this page” mouse over, it disappears not really allowing you to share with Google+ unless you tab navigate the form. Where Tweet pops up a new window so isn’t affected by it.

        • Thanks, we’ll take a look at fixing that too!

  • Very useful, thanks, mongodb is greate! We deployed our sharded cluster recently, and found that the balance process was really really slow.

    • The balancer will throttle to avoid affecting performance where possible, so you may be at the limits of your hardware if it’s really slow and things are becoming unbalanced.

      • Yes, maybe that’s because our disk io, this disk utilization is alway around 90%, and it’s also the reason why we shard, when the shards are balanced, it should be ok.

        BTW, could you introduce some automatic tools or scripts you use in your production for deploy and maintain mongo sharded cluster, in our production, we use fabric and puppet.
        and how do you usually maintain and backup your cluster?

        Thanks a lot.

        • 90% is too high – you have no headroom for migrations. If you need to add a new shard it’s going to take a long time to migrate the data to it and that could be too late if you need to quickly add a new shard.

          We use Puppet to manage everything too. Our backups are currently full mongodumps from the mongos at varying intervals depending on the importance of the data. We take the assumption that if we have to restore a backup it’s because there’s a catastrophic failure so we would be restoring the entire cluster, not a single server.

          • Thanks, before we deployed the sharded cluster, it’s 80% approximately, now it’s 90% because of migrating data, we should deploy earlier, fortunately it’s not too late, just the migrate process is slow.

          • Are you migrating data off the shard with 90% disk i/o utilisation then? If not then you need to add another shard.

          • Yes, it’s 90% when migrating data because of adding a new shard. Thanks!

  • Yeh that’s correct – it has the same effect.

  • Van T Nguyen

    Hi thanks for the post. May I ask how many shards you have in the cluster and how large is a document?

    • We have 4 shards but we split up the data into 2 databases:

      – Hours. This is the hourly data we keep forever. We insert 1 document per metric per server per day and it gets updated with the latest value every minute. We use sum and increment to store the number of updates and the sum of the value for those updates, then use that to calculate the mean value. This means the document is quite large at an average of 11470 bytes.
      – Minutes. This is 1 document per metric per server per minute so is written frequently (inserts) but never updated. They are smaller documents at an average size of 405 bytes.

      • Van T Nguyen

        Hi David. According to this, you mean you’re not saving 30TB/month to the mongodb cluster but you’re processing these large number of documents and update the pre-aggregation value to database, is it correct?

        Because there are too many requests, I’m curios to know whether you’re using some kind of queuing platform for those messages?

        Thank David.

        • That’s right, we’re not storing 30TB a month – currently it’s about 2TB in total size.

          We don’t do any queueing at the moment. The request volume is very high but it’s all exposed internally over HTTP APIs so we can easily load balance it. Queuing is something we’ll look at if we need to.

      • Floris Weerman

        Hi David, are you storing the hourly and minute data each into a single collection on a separate DB, or are you using separate collections for each metric?

Articles you care about. Delivered.

Help us speak your language. What is your primary tech stack?

Maybe another time