MongoDB Benchmarks

By David Mytton, CEO & Founder of Server Density.

Published on the 18th April, 2013.

There are no official MongoDB benchmarks because the developers don’t believe they accurately represent real world usage, and that’s fair: you only really get an idea of performance by testing your own queries on your own hardware. Raw figures can seem impressive but they’re not representative of how your own application is likely to perform. Benchmarks are useful for indicating how different hardware specs might perform, but they’re only really worth doing with real world queries.

For Server Density v2 I have been benchmarking MongoDB with different tweaks so we can get maximum performance from our high throughput clusters and make cost savings on our less important systems. A lot has been said about the various write concern choices, deploying to SSDs and replication lag, but there aren’t really any numbers to base your decisions on.

This set of MongoDB benchmarks is not about the absolute numbers but is designed to give you an idea of how each of the different options affects performance. Your own queries will differ but the idea is to prove general assumptions and principles about the relative differences between each of the write options.

Test methodology

These MongoDB benchmarks test various options for configuring and querying MongoDB. I wrote a simple Python script to issue 200 queries and record the execution time for each. It was run with Python 2.7.3 and Pymongo 2.5 against MongoDB 2.4.1 on an Ubuntu 12.04 dedicated server with an Intel Xeon E3-1270 (Sandy Bridge) quad-core 3.4GHz CPU, 32GB RAM, a Western Digital WD Caviar RE4 500GB spinning disk and a Smart XceedIOPS 200GB SSD.

The script was run twice, taking the results from the second execution. This avoids the slowdown caused by initially allocating files, collections, etc. MongoDB only creates databases when they’re first written to, which adds a bit of time to the first call but isn’t really relevant to real world usage.


import time

import pymongo

# Connect to the local mongod (Pymongo 2.5 against MongoDB 2.4.1)
m = pymongo.MongoClient()

# Deliberately tiny dummy document - document size and indexes are not
# what is being tested here
doc = {'a': 1, 'b': 'hat'}

# Issue 200 inserts and print the execution time of each one
for i in range(200):
    start = time.time()

    # manipulate=False stops the driver adding an _id client side;
    # w=1 is the default write concern (acknowledged by the primary)
    m.tests.insertTest.insert(doc, manipulate=False, w=1)

    executionTime = (time.time() - start) * 1000  # Convert to ms

    print executionTime

This is a dummy document because I’m not trying to simulate a real application here. Document size, number/size of indexes and the type of operation will all play a part in the actual numbers. This is only testing inserts but there are other optimisations you can make with updates, particularly ensuring documents don’t grow. However, this is sufficient for what I’m trying to show in these tests – the relative difference between the write options.
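
The script prints each individual timing; the average, min and max figures quoted in the results below were then derived from those 200 samples. As a rough sketch of that aggregation step (an assumption, not the exact code used), the timings could be collected into a list and summarised like this:

import time

import pymongo

m = pymongo.MongoClient()
doc = {'a': 1, 'b': 'hat'}

# Collect the 200 timings (in ms) into a list instead of printing each one
timings = []
for i in range(200):
    start = time.time()
    m.tests.insertTest.insert(doc, manipulate=False, w=1)
    timings.append((time.time() - start) * 1000)

# Summary statistics as reported in the raw results table below
print 'Average %.2fms' % (sum(timings) / len(timings))
print 'Min %.2fms' % min(timings)
print 'Max %.2fms' % max(timings)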

Write concern

The write concern lets you trade write performance against knowing the status of the write. If you’re doing high throughput logging but aren’t concerned about possibly losing some writes (e.g. if the mongod crashes or there is a network error) then you can set the write concern low. Your write calls will return quickly but you won’t know whether they were successful. The write concern can be dialed up so that writes are acknowledged (the default), although an acknowledged write is not necessarily safe on disk.

It’s important to know that an acknowledgement is not the same as a successful write – it simply gives you a receipt that the server accepted the write to process. If you need to know that writes were actually successful one option is to require confirmation the write has hit the journal. This is essentially a safe write to the single node with the option to go further to request acknowledgement from replica slaves. It’s much slower to do this but guarantees your data is replicated.
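
To make those levels concrete, here is roughly how each one is expressed with Pymongo 2.5. Only the w=1 form is taken from the benchmark script itself; the w=0 and j=True calls are a sketch of the equivalent per-operation options:

import pymongo

m = pymongo.MongoClient()
collection = m.tests.insertTest

# Fire and forget: no acknowledgement, only socket and network errors surface
collection.insert({'a': 1, 'b': 'hat'}, w=0)

# Acknowledged by the primary (the default), but not necessarily on disk yet
collection.insert({'a': 1, 'b': 'hat'}, w=1)

# Only returns once the write has been committed to the journal on the primary
collection.insert({'a': 1, 'b': 'hat'}, w=1, j=True)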

MongoDB insert() Performance (w flag)

  • w=0 is the fastest way to issue writes, with an average execution time of 0.07ms, max of 0.11ms and min of 0.06ms. This setting disables basic acknowledgement of write operations, but still returns information about socket exceptions and networking errors to the application.
  • w=1 takes double the time to return, with an average execution time of 0.13ms, max of 0.32ms and min of 0.11ms. This guarantees that the write has been acknowledged but doesn’t guarantee that it has reached disk (the journal), so there is still potential for the write to be lost – there’s a 100ms window where the journal might not be flushed to disk. Setting j=1 protects against this.
  • j=1 (spinning disk) is several orders of magnitude slower than even w=1, with an average execution time of 34.19ms, max of 34.28ms and min of 34.10ms. The mongod will confirm the write operation only after it has written the operation to the journal. This confirms that the write operation can survive a mongod shutdown and ensures that the write operation is durable.
  • j=1 (SSD) is around 3x faster than j=1 on the spinning disk, with an average execution time of 11.18ms, max of 11.24ms and min of 11.11ms.
  • There is an interesting ramp up for the initial few queries every time the script is run. This is likely to do with connection pooling and opening the initial connection to the database, whereas subsequent queries can use the already open connection.
  • Some spikes appear during the script execution. This could be the connection closing and being recreated.

This means you can reasonably use the default w=1 as a safe starting point, but if you need to be sure data has reached disk on a single node, j=1 is the option you need. And for high throughput you can halve query times by going down to w=0.
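
If you settle on one of these tradeoffs for a whole application, you can also set the write concern once on the connection rather than on every call. A sketch with Pymongo 2.5 (assuming the connection-level keyword arguments and URI option behave the same as the per-operation flags):

import pymongo

# Acknowledged writes by default (this is already the default behaviour)
safe = pymongo.MongoClient(w=1)

# High throughput logging where losing the odd write is acceptable
fast = pymongo.MongoClient(w=0)

# Journaled writes by default for the more important systems
durable = pymongo.MongoClient(w=1, j=True)

# The same durable setting expressed as a connection URI
durable_uri = pymongo.MongoClient('mongodb://localhost/?w=1&journal=true')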

SSD vs Spinning Disk

It’s a safe assumption that SSDs will always be faster than spinning disks, but the question is by how much, and whether that’s worth paying for. The more data you store, the more expensive the SSDs will be; higher capacity SSDs are available but they are fairly cost prohibitive. However, MongoDB supports storing each database in its own directory, which can be mounted on its own device, giving you the option of putting certain databases on SSDs.

Putting your journal on an SSD and then using the j=1 flag is a good optimisation. You need the --directoryperdb config flag and you can then mount the databases on their own disks. The journal is always in its own directory so you can mount it separately without any changes if you wish.
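
As a quick sanity check from the driver side, you can ask the running mongod how it was started and whether journaling is active. This is just an assumed verification sketch using the getCmdLineOpts and serverStatus admin commands, not something from the benchmark itself:

import pymongo

m = pymongo.MongoClient()

# Confirm the mongod was started with --directoryperdb so each database
# (and the journal directory) can live on its own mount point
opts = m.admin.command('getCmdLineOpts')
print '--directoryperdb' in opts['argv']

# The dur section of serverStatus is only present when journaling is enabled
status = m.admin.command('serverStatus')
print 'dur' in status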

MongoDB insert() Performance (j flag)

Replication

If you specify a number greater than 1 for the w flag then that many replica set members must acknowledge the write before the query completes. I tested this on a 4 node replica set with the primary and one slave in the same data centre (San Jose, USA) as the machine running the script, and the remaining 2 nodes in a different data centre (Washington DC, USA).

The average round trip time between the nodes in the same data centre is 0.864ms and between different data centres is 71.187ms.

MongoDB insert() Performance (w > 1 flag)

  • w=2 required acknowledgement from the primary and one of the 3 slaves. Average execution time was 14ms, max of 867ms and min of 1.6ms.
  • w=3 required acknowledgement from the primary plus 2 slaves. Average execution time was 310ms, max of 1329ms and min of 96ms. The killer here is the range in response times, which is affected by network latency and congestion, the communication overhead between 3 nodes and having to wait for each one.

Using an integer for the w flag lets MongoDB decide which nodes must acknowledge. My replica set has 4 nodes and I specified 2 and 3, but I didn’t get to choose which ones took part in the acknowledgement. These could be local slaves but could also be remote, which probably explains the range in response times: sometimes a remote slave happened to return before the local one. More control is possible using tags.
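
For reference, this is roughly how the replicated write concerns above are issued from Pymongo 2.5, plus a wtimeout and a w='majority' variant that were not part of this benchmark but bound or name the acknowledgement instead of relying on a bare integer:

import pymongo

m = pymongo.MongoClient()  # assumed to be connected to the replica set primary
collection = m.tests.insertTest

# Primary plus one other member must acknowledge the write
collection.insert({'a': 1, 'b': 'hat'}, w=2)

# Primary plus two other members; latency depends on the slowest of the three
collection.insert({'a': 1, 'b': 'hat'}, w=3)

# Cap how long to wait for replication acknowledgement (in ms) so a slow
# remote slave raises an error rather than blocking the application
collection.insert({'a': 1, 'b': 'hat'}, w=3, wtimeout=1000)

# Let MongoDB work out what constitutes a majority of the replica set
collection.insert({'a': 1, 'b': 'hat'}, w='majority')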

Conclusion

It’s fairly clear that these MongoDB benchmark results validate the general assumptions: SSDs are faster, and replicating over a network introduces fairly variable latency, particularly over long distances. What this experiment shows is the difference between the write concern options so you can make the right tradeoff between durability and performance. It also highlights that if you need journal based durability, you can significantly improve performance by adding SSDs.

MongoDB benchmarks raw results

            w=0       w=1       j=1 (spinning)   j=1 (SSD)   w=2 (same DC)   w=3 (multi-DC)
Average     0.07ms    0.13ms    34.19ms          11.18ms     14.26ms         311ms
Min         0.06ms    0.11ms    34.10ms          11.11ms     1.65ms          97ms
Max         0.11ms    0.32ms    34.28ms          11.24ms     867.29ms        1,329ms


Comments

  • swap that with 100,000 or 1M writes and I think you’ll gain more insight, 200 writes for a doc that tiny isn’t revealing :)

    • Thanks for your comment! 200 writes is sufficient for what I’m trying to demonstrate in this post – the differences between the write concern options, SSDs and replication.

      Ramping up the number of writes has a different effect because it’ll start to demonstrate the differences between faulting and non-faulting writes. This is an interesting test to do because you could see what effect SSDs would have on faulting writes, but is outside the scope of this post. I did something similar in http://blog.serverdensity.com/goodbye-global-lock-mongodb-2-0-vs-2-2/

      The document size is another test and is mostly interesting for updates, because you can then look at updates in place and padding factors. Again, another test to do and something I touched on in http://blog.serverdensity.com/mongodb-schema-design-pitfalls/

      If you do any of your own benchmarks, feel free to comment with the link!

      • Bala Nair

        interesting, but I’d be more interested in seeing the numbers from running the script in parallel – say, start at a parallelism of 2 and ramp up to 10. Then run that test against a single mongo server vs replica slaves. Our experience is that you get very different numbers from running multiple clients because of lock contention, which drops performance radically. We tried this with both safe saves on and off. Performance of mongo in these kind of insert/update tests was only about half that of mysql and roughly an order of magnitude slower than redis.

        • Good point – client concurrency is important with web apps and is something not tested here. The idea was to illustrate the differences with the write options to help towards estimating real world differences when deciding what you want to use. Client concurrency is a little more difficult because it depends on your load and even your driver (different connection pool sizes, etc).

  • Dan Pasette

    One thing to point out regarding your testing of j=true.

    The default journalCommitInterval is 100ms unless the journal is on a separate volume, in which case it is set to 30ms by default. See: http://docs.mongodb.org/manual/reference/configuration-options/#journalCommitInterval

    When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value. So, 33ms or 10ms, depending on where your journal is.

    So, how does this impact your benchmark? Since you are testing SSDs on a separate volume and the spinning disk on the same volume as your data, you are really testing j=true with journalCommitInterval at 30ms vs 100ms. I imagine you would see very similar performance on both spinning disk and SSD if both were separate drives.

  • Nuno Castro

    Nice work! It would also be very interesting to analyze the impact of these parameters tested with YCSB.

  • CK

    I would suspect that the primary reasons for SSD would be 1) initialization of data into memory when needed and 2) instances when your indexes and data do not fit neatly into your RAM.

    There are some notable issues with providers out there that ran out of RAM on their mongo systems and ended up suffering horribly. The answer when you just can’t shove any more RAM into those systems is either to find a way to shard [which in our case would be a nightmare until 2.5 features for db auto sharding are done] or SSD.

    Oddly enough, we’ve found that the lifetime of our [enterprise class] SSD drives is substantially larger than that of our similarly classed spinning drives. We’ve replaced 8x more spinning drives. Though most were under warranty, the time lost accounts very quickly for the increased expense of drives.

    The new Intel S3700 SSDs are faster and cheaper than ever before, making it almost a no-brainer for any database system. You can also RAID0 to get the size you need [and performance]. If you’re properly replica’d, failure of a RAID0 should be recoverable within 30-60 minutes.

    • SSDs get more expensive when you go for the higher capacity ones. It’s fairly cheap to get the lower ones like 32GB, 64GB, etc…which admittedly are entirely appropriate for the journal (only needs a few GB). But for storing your entire DB on SSDs like we do with the Server Density graphing backend, we need significantly higher capacity.

  • tinye

    what impact does replication lag across a long distance have on the cluster? i.e. let’s say we have a primary and 10 secondaries locally, then put an additional secondary thousands of miles away. Would the replication speed across the WAN govern updates for the entire cluster? Or can Mongo handle the nodes individually?

    • It doesn’t affect the cluster but it could affect your application depending on what write concern you have set. By default it won’t acknowledge writes to any slaves but you can configure that if you need to know a write was successful and replicated.

  • dougrollins

    Newbie question: I seem to get the same response time (using j=1) whether the database is placed on HDD (15K RPM SAS, direct IO mode, write through) or enterprise SSD (100GB, 22,000+ 4K random writes in steady state). I set which drive to use via mongod --dbpath datadb for each run, then start mongo.exe (confirming the connection is live), then start the Python script.

    I expected a much larger difference, so I must be doing something wrong.

    Any suggestions would be appreciated.
