
Does everyone hate MongoDB?


For a guaranteed surge of traffic and a spot on the Hacker News homepage, all you need to do is write about why you hate MongoDB and/or why you migrated to some other database. We’ve been using it to power our server monitoring service, Server Density, for over 3 years now, and with that experience, many of the problems cited in these posts look like basic mistakes in deployment and understanding.

With any product, if you decide to deploy it to production you need to be sure you fully understand its architecture and scaling profile. This is even more important with newer products like MongoDB because there is less community knowledge and understanding. This is partly the responsibility of the developers using those tools but also the responsibility of the vendor to ensure that major gotchas are highlighted.

When we originally switched to MongoDB back in 2009, one of the plus points was the detailed documentation. It’s even more detailed now and there is a large project underway rewriting the existing wiki docs. 10gen, the company behind MongoDB, also runs a huge number of conferences around the world so others can share knowledge, offers commercial training and free webinars, and recently announced free online courses.

However, there seem to have been quite a few “don’t use MongoDB” posts over the last few months so is there actually a real problem with MongoDB itself? Let’s take a look at a few of them to see what the issues were:

I’ll Give MongoDB Another Try. In Ten Years – 24 Sept 2012

Headline problem: Deployed on a 32-bit server, so the database was limited to 2GB. Writes were being silently discarded.

Mistake: Deployed to 32-bit servers without knowledge of the limit. Did not use safe writes and didn’t check for errors after writes.

Comments: The 32-bit limit is noted (perhaps it should be a warning) on the download page, but the main problem was that the author did not know when writes started to fail. MongoDB uses unsafe writes by default in the sense that, from the driver, you do not know if the write has succeeded without a further call to getLastError. This is because one of the often-cited use cases for MongoDB is fast writes, which is achieved by fire-and-forget queries.
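
To illustrate the difference, here is a rough sketch using the current Python driver (hostnames, database and collection names are made up); older drivers exposed the same choice as a safe flag plus an explicit getLastError call:

```python
from pymongo import MongoClient
from pymongo.errors import PyMongoError
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")  # hypothetical deployment
db = client.metrics                                # hypothetical database

# Fire-and-forget: w=0 sends the write without waiting for acknowledgement,
# so failures (e.g. hitting the 2GB limit on a 32-bit build) are silently lost.
unsafe = db.get_collection("events", write_concern=WriteConcern(w=0))
unsafe.insert_one({"type": "ping", "ok": True})

# Acknowledged write: the server confirms it (the equivalent of calling
# getLastError after the insert) and any error surfaces as an exception.
safe = db.get_collection("events", write_concern=WriteConcern(w=1))
try:
    safe.insert_one({"type": "ping", "ok": True})
except PyMongoError as exc:
    print("write failed:", exc)
```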

There has been much discussion about whether this is a sensible default and here we’ve seen someone caught out by it. I’ve spoken to quite a few people who didn’t understand this, so if it isn’t to be changed, the documentation should highlight it. The PHP docs do this but the quick start tutorials for Ruby and Python don’t. With 10gen controlling all the official drivers, this inconsistency could be rectified.

Many suggest this default is a good way to get favourable benchmarks, but there are no official benchmarks so I don’t think that’s relevant.

Migrating to Riak at Shareaholic – 31 Aug 2012

Problems: Working set needs to fit into memory, global write lock blocks all queries, slave replication not hot.

Comments: Getting your working set in memory is one of the most difficult things to calculate and plan for with MongoDB. There are currently no tools and no visibility from Mongo itself as to which collections are queried the most, and there are few hints as to what should be considered the working set. It has to be estimated based on your understanding of your query patterns and by looking at the slow query log for queries that aren’t hitting indexes (or have no indexes at all), or by seeing which queries produce slower responses – figuring out your working set through inference.
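
The profiler’s slow query log is about the only window you get. A minimal sketch of pulling the slowest operations from Python (the database name and the 100ms threshold are arbitrary):

```python
from pymongo import MongoClient

db = MongoClient().metrics  # hypothetical database

# Profile operations slower than 100ms (profiling level 1).
db.command("profile", 1, slowms=100)

# The profiler writes into the capped system.profile collection; sorting by
# execution time gives a rough view of which namespaces are missing indexes
# or falling out of memory.
for op in db["system.profile"].find().sort("millis", -1).limit(10):
    print(op.get("ns"), op.get("millis"), op.get("query") or op.get("command"))
```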

A general guideline is to provide as much RAM as you can to fit all your data plus indexes or if that’s not possible, at least your indexes. But this isn’t much different from other databases – the more memory the better and disk i/o is bad (mitigated by using SSDs). There was no further clarification of how this is different in Riak, which they migrated to.
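
As a quick sanity check against that guideline, dbStats reports data and index sizes you can compare with available RAM (the database name below is hypothetical):

```python
from pymongo import MongoClient

stats = MongoClient().metrics.command("dbstats")  # hypothetical database

gib = 1024 ** 3
print("data:    %.1f GiB" % (stats["dataSize"] / gib))
print("indexes: %.1f GiB" % (stats["indexSize"] / gib))
# Rule of thumb: RAM >= dataSize + indexSize, or at the very least
# >= indexSize, to keep disk i/o out of the hot path.
```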

The global lock in MongoDB <= 2.0 is an oft-cited problem, and pre-2.0 it was an issue that required workarounds, such as throttling of inserts. The way MongoDB yields locks was significantly improved in 2.0 and this was taken further in 2.2 with the complete removal of the global lock as a step towards more granular concurrency. Saying “that’s fixed in the latest version” is only partly acceptable in the sense that new users don’t need to worry about this any more, but even for older users we found the problem was usually exaggerated.

Keeping your replicas “hot” is also a difficult problem, and not unique to MongoDB. Indeed, the recent GitHub outages included a similar problem with MySQL. This can be worked around by sending queries to your slaves or by using the new touch command in 2.2 (previously you could do this at the filesystem level before starting MongoDB).
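
For illustration, the touch command can be issued from any driver; a sketch in Python (hostname and collection name are invented; the command applies to 2.2-era MMAPv1 servers and was dropped from newer server versions):

```python
from pymongo import MongoClient

db = MongoClient("mongodb://secondary1.example.com:27017").metrics  # hypothetical

# Pull a collection's data and indexes into memory before sending the node
# reads or promoting it.
db.command("touch", "metrics_daily", data=True, index=True)
```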

From MongoDB to Riak – 14 May 2012

This post doesn’t explain why they moved from MongoDB other than some general hand waving about “operational qualities”:

Now we no longer care if one of the nodes kernel panics in the middle of the night; as has happened a few times already. Nagios will email us instead of page us, and over coffee the next morning we’ll fire up IPMI, reboot the machine, and Riak will read-repair as necessary. No longer will we have to do any master-slave song and dance, nor will we fret about performance, capacity, or scalability; if we need more, we’ll just add nodes to the cluster.

This seems to imply they had problems with the replication in MongoDB, in particular how failover happens. We’ve found that replica sets in MongoDB are a very robust way to handle replication and automated failover. We rarely have instances fail but when they have (and when we regularly test failover), it is generally seamless. Failover happens very quickly (within seconds) and all the drivers we use to connect to MongoDB handle this internally by reconnecting to the new master. This triggers an alert and we then investigate what happened.
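
For example, this is roughly what driver-level failover handling looks like with a replica set connection string (hostnames and set name here are invented):

```python
from pymongo import MongoClient

# Listing several members plus the replica set name lets the driver discover
# the current primary and transparently reconnect after a failover.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=rs0"
)
client.admin.command("ping")   # force the initial connection
print(client.primary)          # (host, port) of whichever member is primary now
```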

We have also found we can easily scale MongoDB either vertically by adding more resources (memory, SSDs) or by adding new shards. Adding a new shard requires some work to get a new replica set deployed but with all our servers managed using Puppet this doesn’t actually take long.
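
Once the new replica set is up, registering it as a shard is a single admin command against a mongos router; a sketch with invented hostnames:

```python
from pymongo import MongoClient

# Connect to a mongos router and register the new replica set as a shard;
# the balancer then starts migrating chunks to it.
mongos = MongoClient("mongodb://mongos1.example.com:27017")
mongos.admin.command("addShard", "rs2/shard2a.example.com:27017")
```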

A year with MongoDB – April 2012

Problems: Non-counting B-trees, memory management, uncompressed field names, global write lock, safe off by default, table compaction, hot slaves.

Comments: Using OS memory management has its advantages, such as maintaining the cache through process restarts and leaving the OS to decide what is best with knowledge of the whole system, but it does mean that some optimisations can’t be implemented by the database itself. I don’t have enough knowledge of the internals to comment further, but this goes back to the comments above regarding the difficulties of calculating the working set.

Uncompressed field names is a problem I’ve written about in the past and is an issue for huge data sets where you’re trying to optimise memory usage (working set) because those duplicated field names can take up a significant amount of space.
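
A contrived illustration of the trade-off:

```python
# The long key names are repeated in every one of millions of documents...
verbose = {"timestamp": 1349049600, "cpu_usage_percent": 12.5, "hostname": "web1"}

# ...so shortening them (and documenting the mapping in application code)
# noticeably shrinks both the data files and the working set.
compact = {"t": 1349049600, "c": 12.5, "h": "web1"}
```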

Compaction remains a problem if you do a large volume of inserts and removes. Compaction is a manual process and blocks (at the database rather than the server level in 2.2). MongoDB uses a padding factor to avoid having to move the data on disk if you do a lot of updates, but you may need to consider strategies such as pre-populating documents if you know they are going to be updated/grown in the future.
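
For illustration, a common pre-population pattern for time series data (the field layout and names here are invented for the example, not how we actually store metrics):

```python
from pymongo import MongoClient

db = MongoClient().metrics  # hypothetical database

# Pre-allocate a day's worth of per-minute slots so later updates overwrite
# values in place rather than growing the document and forcing a move on disk.
db.cpu.insert_one({
    "host": "web1",
    "day": "2012-10-09",
    "values": {str(minute): None for minute in range(1440)},
})

# Subsequent updates stay within the pre-allocated document size.
db.cpu.update_one(
    {"host": "web1", "day": "2012-10-09"},
    {"$set": {"values.42": 17.3}},
)
```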

Goodbye MongoDB – no date

This post is much more of a rant than a reasoned technical analysis of why certain things do not work. A number of the points are covered above but some are just wrong. For example:

It is still not possible to express arbitrary queries like in SQL using JSON. One would argue: not needed – but in reality there are always cases where you need more complex queries. The only way around is to implement something client-side or use the server-side JS code execution (single-threaded, slow). Having no option to perform an operation comparable to UPDATE table SET foo=bar WHERE…. (which is possibly a low-hanging fruit).

This is the update query with the multi parameter set to true.
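
For example, with the Python driver (the filter below is a made-up condition, since the original WHERE clause wasn’t given; update_many is the modern spelling of update with multi set to true):

```python
from pymongo import MongoClient

db = MongoClient().test  # hypothetical database

# Equivalent of UPDATE table SET foo = 'bar' WHERE <condition>.
db.table.update_many({"qty": {"$lt": 10}}, {"$set": {"foo": "bar"}})
```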

There is also a complaint about map/reduce, which has historically been a weak point due to the single threaded JS engine. New in MongoDB 2.2 is the aggregation framework, which is supposed to offer an easier way into analysing data, although it is not a map/reduce replacement. I have not used it, so cannot comment further on how it compares to map/reduce.
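
A minimal example of the kind of query the aggregation framework handles without JavaScript map/reduce (collection and field names are invented):

```python
from pymongo import MongoClient

db = MongoClient().metrics  # hypothetical database

# Count events per type server-side.
pipeline = [
    {"$match": {"ok": True}},
    {"$group": {"_id": "$type", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
for row in db.events.aggregate(pipeline):
    print(row["_id"], row["count"])
```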


For Server Density, MongoDB has been an excellent tool. We really understand how it works and it works very well. We use it for many different things including storing historical time series data for server metrics, our core app data store and for simple queuing. It’s also benefited us on a marketing front as we have grown with new MongoDB releases and have been able to talk about them at conferences and user groups.

As more and more people use a technology there will be those who make mistakes, get burnt or find use cases where it’s not suitable. I think that characterises all of the “MongoDB hate” posts, and many of the problems are solved in newer versions so need not concern people thinking about using MongoDB for new projects. When you switch technologies there is often a valid reason at the time, but with such fast-paced development those issues are often fixed in the next release.

Both MongoDB and 10gen are incredibly successful, with a huge number of deployments large and small, so what we’re really seeing is the hype cycle in action rather than everyone hating MongoDB.


  • jwilson

    I manage a fairly significant MongoDB installation – dozens of 24-core machines – and by far the largest issue I deal with on a daily basis is MongoDB. Even with the very expensive enterprise support package, I have to open way too many JIRA tickets for my liking.

    I suppose the biggest issue I have with it is the lack of thought that seems to go into the overall design of it. For example, if you have only two servers in a replica set and one goes down, the remaining server won’t become primary – and there is *no way to manually force it*. I’ve had arguments with the programmers at 10gen about that and they stick to the party line that it has to remain automated to avoid split-brain.

    How about giving me the tools to make that decision myself? I should be able to configure the cluster to say “these servers can never be primary” as well as “make this host primary now”. But that seems to be against their philosophy.

There was that fun time when we found a bug that brought our site down to its knees for 5 hours: when initializing a replica, it was reporting correct indexes, and running the index command from mongos reported success, but the indexes weren’t actually created.

Don’t get me started on the insanity that is the config server design. Oh, if one server (of three!) goes down you can’t migrate chunks? Oh, if one server (of three!!!) is down you can’t start mongos? Oh, if you change the name of a config server you need to shut down and reboot your entire cluster? That’s seriously messed up.

    How about when 2.2 came out and I wanted to take advantage of the database-level locking (finally, after years of development!) but you can’t just create a db and rename a collection into it, you need to dump it and re-import it, and re-create all the indexes. That meant over 8 hours of downtime.

    Or, how about all those new awesome 2.2 features like data center awareness? Wow, I can finally get rid of those stupid iptables rules to stop a mongos in Europe from connecting to a replica in California! Oh wait – the PHP driver won’t be released until the *end of October*.

However, probably my biggest concern right now is their arbitrary limit of 12 replicas. Instead of just adding a couple more servers to handle reads, now I need to add *12* (i.e. another shard) to maintain read parity, but I now have an unneeded extra primary to handle a write load I don’t have. It’s extremely wasteful and limiting.

    Don’t get me wrong – MongoDB is way faster and easier than MySQL and it’s made our development cycle much faster, and makes changes and updates way easier. But I really genuinely question how ready for primetime it is, based on the constant streams of issues I deal with on a daily basis.

    • Ryan

The answer to split-brain is an arbiter, plain and simple. The rule requiring greater than 50% of nodes to be up screwed us many times until we added arbiters, but it’s not as simple as we’d hoped.

      There is no “make me master” command, which is annoying, but you can get the same behavior with freeze() and stepDown().

I have to agree on the ridiculousness of config servers. There are a lot of things wrong with how they’re implemented. Start with the fact that they’re not a replica set, but you have three of them and they share the same data. Then, as you said, if one goes down they all freak out. That can include your DB servers, if they’re in the middle of moving a chunk and can’t confirm it moved – the DB “fails safe” and shuts down instead of serving possibly stale content. All of that, plus the fact that ANY config change drops all open connections, adds up to a huge serving of WTF.

Complaints and agreement aside, it sounds like you’re doing something wrong. If you have 12 replicas and are read-limited then your hardware isn’t keeping up. We had this problem on a shared storage array, and that was with a 3×3 shard/replica setup. Just having to make 12 copies of EVERY change is not going to work at some point. Split that into 4×3 sets (or better yet, 6×2 with arbiters), and use the mongos router to give you extra read throughput almost for free. Sure, you have extra write capacity, but don’t bother splitting reads and writes at that point; use the 12 hosts in quarters (or sixths) to get more capacity. Obviously YMMV, but we had nothing but trouble until we understood ALL the bottlenecks, and we even fixed the problem without resorting to SSDs or hundreds of gigs of RAM.

      • jwilson

        My main cluster is 3 shards with 3 local replicas. The 12 come in to support remote DCs, analytics, etc. Also, 10gen’s “solution” for my read issues was to run two copies of mongo – basically having two replicas on each physical host. Kinda janky if you ask me.

        It’s easy to eat up the 12 replicas this way.

    • thesmart

      *THIS* is the painful process in which DBs get better. Remember, MySQL and the rest had their trial-by-fire, too.

    • David Mytton

      Sounds like you’ve had a lot of problems – perhaps a candidate for a technical analysis post outlining the issues you’re having, your workarounds and links to 10gen JIRA cases for others to vote on? It’s always good to hear about known issues – that’s the whole point of the community because 10gen can only test so much.

I’d argue their decision to require a replica set to have a majority is a well-designed decision for most people as it avoids the network split issue. Perhaps this is a candidate for keeping the safe default (as it is) but allowing a command to let the user temporarily force a master in a non-majority set. Or maybe they want to completely avoid support issues where this goes wrong (as it inevitably would). Adding an arbiter would help you here.

Having to rewrite data into a new database is because there are separate data files for each DB, which probably helps with being able to remove the global lock.

      Seems like your use case for over 12 replicas is having nodes in different locations for localised reads?

      • jwilson

        Using an arbiter means losing a spot for a useful replica in a remote DC. I shouldn’t have to run an arbiter just in case I reboot a server.

    • Sanuj S.S

Why don’t you use an arbiter? I had the same issue here and solved it using an arbiter. You don’t need server-class hardware to run an arbiter.

      • Jeremy Wilson

        See below. An arbiter uses up one of the 12 replica spots that I need for other uses.

        • Sanuj S.S

Why can’t you set up mongo services on the application servers themselves? It doesn’t need much configuration as it doesn’t hold data, and it doesn’t affect the performance of the application servers.

          • Jeremy Wilson

            If you mean run arbiters on the app servers, it’s the same answer. There’s a hardcoded limit of 12 replicas, which includes arbiters. I’m using all 12 to handle global replication so there’s no *space* for an arbiter.

            Trust me, there’s not many people in the world more familiar with the operation of MongoDB than myself and my team. We manage clusters with more than 100 nodes.

          • Sanuj S.S

            Ohk. That’s a problem.

Nice to see you here. Was looking for such a person with massive experience in Mongo production operations. Expecting your help in future… Are you on LinkedIn?

  • Mike Bartlett

    Great article. Perhaps either yourself or jwilson could write a good basic guide to running Mongo in production. We’re looking to deploy into EC2 soon, and will need to consider future scale (sharding vs slaving). Would be great to have someone who’s experienced this first hand write something vs the instructive but less informative Mongo documentation.

    • Eran Medan (@eranation)

All I can say is that EC2 is not MongoDB-friendly: there’s no trivial way to automate backups, so you’ll have to write your own crontab entries, command line scripts etc. Some may say it’s easy, but I didn’t find it so. Besides that, MongoDB is a pleasure to work with from the developer perspective, and I think this is why it’s popular, not because it’s DBA friendly (or not).

      • thesmart

I wonder if the new provisioned IOPS helps with MongoDB. There are also the High-I/O On-Demand Instances, which may be suitable.

      • Charity Majors

        It’s easy to automate backups if you use EBS snapshots.

      • Pete

Where have you found the best place to host your mongodb instances if not on EC2? Would love to know as I’m in a similar boat to the OP.

        • David Mytton

          We run our infrastructure on Softlayer and use a variety of nodes depending on the performance requirements. This ranges from high memory + SSD dedicated hardware to simple cloud instances.

    • jwilson

      EC2 is ill-suited for hosting DBs in general – even running on 6GB SAS drives on dedicated hardware gives me issues, and the poor disk performance in EC2 might prove problematic.

  • Konstantine Rybnikov (@ko_bx)

Sorry, but I’m horrified by you asking for a way to force something inside a broken replica set to become master. You can make such a huge mess with a thing like that. Just add a third arbiter process on the first machine and you’ll be fine when the second machine is down. If you want to be able to take any machine down, add an arbiter on a third machine.

    • jwilson

      I’m talking about large, sharded server setups – having an arbiter using up a replica slot means I can’t use that slot for a replica somewhere else, and all it’s doing is providing an election decision during an emergency outage, when I am already logged into the machine and able to make a conscious decision as to which server I want to be primary.

The ease of development and deployment of mongodb is ultimately its greatest weakness when it comes to large scale infrastructure – where fine-grained control is most important. Kind of ironic given their purported “it scales!” tagline.

  • JB

Thank you for explaining in detail the real issues surrounding these “I hate MongoDB” blog posts. Every database technology has a purpose, best practices and pitfalls. To single one out due to your own lack of knowledge or misunderstanding of a technology seems petty, if not completely foolish.

  • warmwaffles

    Can you sum up your experiences in a post, especially the ones covering all the gotchas and how you made your systems run more reliably?

    • David Mytton

      There’s quite a few posts over the years indexed at – anything in particular you’d like to hear about?

      • Jack Frost

I’d love to hear more about scaling, how you plan for large read/write volumes over time, and the best ways to automate such a system.

        Thank you for a wonderful article.

  • Charity Majors

    Provisioned IOPS made a big difference for us. Latency spikes were a big problem for us on EBS, but PIOPS flattened out nearly all the spikes. Dropped our end-to-end latency in half too.

  • Calvin French-Owen (@calvinfo)

    Great article David. I read a lot of your series when first getting into MongoDB. They were definitely super useful in terms of getting off the ground and understanding how to operate a cluster in production – so thanks for that!

    I feel like a lot of the articles hating on MongoDB don’t really take the time to understand what’s going on under the covers: the memory-mapped files, how B-Tree indexing actually works, and how documents are stored on disk vs. having a total working set. I think like any other database you have to do a certain amount of homework to get it working at a very large scale. If you aren’t reading as much of the documentation as you can for a db you’re using in production, you’ll probably be in trouble.

    We followed all the best practices outlined by MongoDB regarding document size, RAID setup, XFS, sharding and replicas. We still had some problems with disk seeks even after trying to tune our setup.

The best article I’ve seen that showcased problems similar to what we had is one from the SoundCloud engineering guys: The fact that you have little control over where documents are laid out on disk was pretty problematic for us. Pretty much the only grouping you get is at the document level, and it’s generally bad to grow these significantly past their padding.

    Don’t get me wrong, I think 10gen has done a good job with MongoDB in terms of building something nice for developers. If your query patterns are more or less random anyway or your app has a nicely shaped working set I believe it will work very well as a DB. It’s tricky to tune these things because each application is so different.

    I’m planning on putting out a more in-depth article once we’ve fully benchmarked our data, but I like Cassandra’s ability to read lots of similar data sequentially off disk. It doesn’t work for every application, but I’m hopeful that it will work for us.

    Anyway, thanks for all the articles!

    • David Mytton

      Position of the raw data on disk is a valid concern and you don’t have any control of that with MongoDB. It’s even more of a problem if you do updates which cause the doc to grow as that will rewrite the data. This is the risk of using a non-specialised database for things where that matters e.g. sequential storage of time series data. I’m not sure what product you’d choose to address this but you’d need something specialised for storing time series data.

  • Ben Zittlau (@benzittlau)

Props for the effort and thought put into this article. I’ve recently dealt with 3 separate products that were MongoDB based and we’ve been looking to move off of it in all 3. Arguably I can believe that Mongo could work great if the proper investment in configuration and docs reading were made, but personally I’d rather spend my time thinking more about my product and less about my database. I think the primary conclusion out of my personal experiences is to stick with a standard SQL DB by default, and only consider using one of these less mature NoSQL DBs if there’s a really strong argument for doing so.

    • David Mytton

      I think after a certain scale you’re going to have to think a lot about ops and your DB whatever product you use. In that sense, MongoDB works well because it does work out of the box so easily for development.

  • Steve

    Nice article.
I have been using MongoDB for more than a year, holding somewhere over 100M documents. It’s been doing well so far, and we managed to leverage the ease of deployment initially. Really, the hard part of scaling mongodb is to evaluate the access pattern and keep the entire hot data set in memory. If those criteria are met, mongodb is amazing.

*Though, there was a time when the entire cluster was brought down due to a seg fault, which I guess has been fixed for versions >= 1.8.6.

  • Mike Gagnon

    What are you waiting for? Write a book on deploying MongoDB

  • Andrew Dike

Nice post. I’m evaluating MongoDB for a project and this post was a useful response to common issues that I’ve seen around the internet.

  • sdhull


    IMO MongoDB was praised by a bunch of people who jumped on the bandwagon without ever really doing their homework to understand the technology or even acknowledge that the bleeding edge is called “bleeding” for a reason. Then they deployed to production (“We’ll do it live!”) and got burned and subsequently wrote a bunch of blog posts about why they hate Mongo. People should approach any new/untested technology with trepidation when considering using it to power their production environment.

    That said, I’m a developer and I think that the feature set it offers developers is easily one of the best of any datastore available. People talk about switching to Riak and I think it’s hilarious — it offers literally NO features to developers and it is NOT a devops silver bullet (btw, when did devops start dictating tech choices?). If my devops guy(s) can’t figure out how to deploy/manage MongoDB, then (no offense) I’ll be hiring new devops guys who can. Period.

  • Alejandro Ñext

Like you, I think it is an excellent tool. But I could not create a stream and edit documents simultaneously. Any examples?

  • Ryan Pringnitz

    Great Article. Does anyone have an updated version of this document?

The URL is dead, and I am looking for a best practices document.

  • Tiago Luz

Hey David! We’re SD customers and we’re starting a migration from MySQL to MongoDB. Your articles/posts have been a great help in deciding on the best practices/approaches, and in not suffering the same structural/architectural mistakes. Thanks man!! (with lots of !’s)