Reclaiming memory from MongoDB indexes

By David Mytton,
CEO & Founder of Server Density.

Published on the 10th December, 2010.

As of the current version of MongoDB (1.7.3), deleting data does not compact indexes or release the RAM they use. Indeed, this was the root cause of the problem Foursquare experienced several months ago. This means that once your indexes exceed memory (or the working set exceeds memory), deleting data (or moving it to another shard) will not alleviate any memory problems you might be having.

MongoDB is fairly intelligent about what it puts into memory when there’s not enough to store everything, but this limitation means that you may still see performance issues even after pruning data.

The best way to avoid this situation is to properly monitor your database to ensure that you always have sufficient RAM. However, until online compacting functionality is implemented (expected in Q1 2011), if you hit this limit you can work around the problem by running a repair on the database. This compacts everything (data files on disk and indexes) because it rebuilds everything from scratch, but it can take a very long time. Alternatively, you can re-sync your slaves and, once they have completed, promote another server to master so you can compact the last remaining one.
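As a rough sketch of the repair workaround, the commands look something like this (the dbpath is a placeholder; repair needs free disk space on the order of your current data size, and the database is unavailable while it runs):

```shell
# Offline repair: stop mongod first, then rebuild all data files
# and indexes from scratch, compacting them in the process.
mongod --repair --dbpath /data/db

# Or, against a running server from the mongo shell (blocks the
# database while it runs):
db.repairDatabase()
```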

  • Luca

    Will MongoDB support a sort of virtual memory like Redis? It seems like an interesting feature.

    From the Redis documentation on virtual memory:

    “Redis Virtual Memory allows users to grow their dataset beyond the limits of their RAM.”

    • You can already do this. There’s no limit to the amount of data/indexes but there’s a performance impact if you go over.

  • Another solution is to run repairDatabase() on each node, one by one. You have to stop the mongod replica set node, restart it as a stand-alone mongod, run repairDatabase(), and then restart it as a replica set member.

    A local repairDatabase() + oplog catchup seems to be faster than a re-sync. It also avoids the extra load on the master for the initial sync.

    • A resync was faster for us than a repair.

      • Yeah, I suppose it depends on I/O performance. You have to read the existing database and write a fresh one at the same time. If you have multiple disks you can use --repairpath to offload some of the I/O.

      • gatesvp

        The resync being faster actually seems to make sense. For most people, the drive is going to be the big bottleneck in the “rebuild chain”.

        If you try to re-sync, you get to pull data down from the network and max out the I/O throughput with just writing.

        If you try a rebuild then you have to both read and write on that same drive. So you’re basically just putting more traffic on the existing bottleneck.
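The per-node repair procedure discussed in the comments above can be sketched roughly as follows (paths, the port, the database name and the replica set name are all placeholders for illustration):

```shell
# 1. Shut down the replica set member cleanly.
mongod --dbpath /data/db --shutdown

# 2. Restart it as a stand-alone mongod (no --replSet), on a
#    different port so clients and other members don't connect.
mongod --dbpath /data/db --port 27018

# 3. Run the repair, which rebuilds and compacts the data files
#    and indexes for that database.
mongo --port 27018 --eval "db.getSiblingDB('mydb').repairDatabase()"

# 4. Restart it as a replica set member; it catches up from the
#    master's oplog rather than doing a full initial sync.
mongod --dbpath /data/db --replSet myReplSet
```

Whether this beats a full resync depends on your disk and network throughput, as the thread above notes: repair reads and writes on the same drive, while a resync trades that for network transfer plus write I/O.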
