Queueing MongoDB using MongoDB

By David Mytton,
CEO & Founder of Server Density.

Published on the 13th April, 2011.

Update Nov 2015: We have since reached the limits of MongoDB’s atomic findAndModify and have moved over to using Kafka for queuing.

Several months ago we started seeing increased response times in our server monitoring application, Server Density. The problem was growing slowly over time and it quickly became clear that MongoDB was the bottleneck – we were seeing large queue spikes as data was being inserted into the database.

[Graph: MongoDB queue spikes in Server Density]

Every time a server posts its monitoring payload back, a number of things happen: alerts are triggered, the dashboard and API are updated with the latest values, and the metrics data that powers our time series graphs is inserted into the database. It’s that insertion we’re concerned with here. At the time we had the sharding balancer disabled because of a number of bugs (which also prevented us from doing manual chunk migrations), so all the data resided on a single shard. This meant a huge number of inserts were hitting one massive collection on a single server.

We use PHP for our frontend web UI, and this is what handles incoming agent postbacks. The Mongo PHP driver includes connection pooling, and we have a mongos router on each of our web nodes, but even so there were several hundred threads continuously inserting data into the large metrics collection. This caused concurrency problems: threads waited in line to complete their inserts, and as the queue built up, the whole cluster slowed down waiting for locks to be released.

With rebalancing data not an option, we needed something to sit in the middle to take all the incoming inserts and simply pass them on to the main MongoDB metrics collection. This would be isolated from the main cluster so read queries could continue unhindered, and instead of hundreds of insert processes, this middle layer would run just a few. That would reduce the queuing, as Mongo handles concurrency much better with only a few threads inserting.

A proper queuing system

What we needed was exactly what queuing systems like RabbitMQ and ZeroMQ are designed to do: you have dumb consumers (clients) which request queue items and then perform a task. We already use RabbitMQ for our alert processing, so we decided to try it as the frontline for metrics inserts too. Data would come into RabbitMQ, then get processed and inserted into the main Mongo database by several consumers.

Unfortunately this didn’t work out. All of the PHP AMQP libraries we tried were too unstable: either they didn’t work at all, or they worked in low-load testing but performed terribly at high loads. We saw excellent response times, and then suddenly there would be a massive spike and the entire postbacks process would hang. This appeared to be caused by the connection pool resetting and every insert trying to recreate its connection.

So we moved to ZeroMQ to see if that would be any better. This time, inserting into one queue worked perfectly whilst a second queue on the same connection would mysteriously lose all the data; nothing would ever get inserted.

Evidently PHP isn’t widely used with these queuing systems (probably for good reason, since it’s not designed for long-running processes), so we considered rewriting our postback handling in Python, but that would’ve meant redoing a lot of fairly complex code.

How about Redis?

We really wanted to use a proper queuing system so we didn’t have to build our own to handle things like queuing messages, acknowledging receipt and handling multiple dumb consumers. So we started looking at Redis, because it supports pub/sub, which is a kind of queuing mechanism. Then we thought: why not use MongoDB? We have built up a significant body of knowledge about Mongo, and using another instance of it would mean we wouldn’t need to learn to administer a different database system, particularly one that is still “new”.

MongoDB as a (very simple) queue

We were able to get a new MongoDB replica set up and running quickly, and it only required changing a few lines in our main application code to direct the relevant inserts to this instance instead of the main one.

Data is inserted into capped collections in this new instance, so all we need to do is poll the database for unprocessed documents, insert them into the main database and then mark them as processed. Because the collections are capped, purging old documents is handled automatically and very efficiently by the database. Every document includes a single-character field called p, which is set to False by default.
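As a rough sketch of the producer side: the hostnames and collection names below are illustrative (only the capped collection and the p flag come from this post), and it uses the modern PyMongo API rather than the 2011-era driver.

```python
from pymongo import MongoClient

# Connect to the dedicated queue replica set (hostname is an assumption).
queue_db = MongoClient("mongodb://queue-rs.example.com")["queue"]

# Capped collection: once it hits its size limit, the oldest documents
# are purged automatically, which is what makes cleanup "free".
if "metrics_queue" not in queue_db.list_collection_names():
    queue_db.create_collection("metrics_queue", capped=True, size=512 * 1024 * 1024)

def enqueue_metrics(payload):
    """Insert an incoming metrics payload with the unprocessed flag set."""
    doc = dict(payload)
    doc["p"] = False  # single-character "processed" flag described above
    queue_db["metrics_queue"].insert_one(doc)
```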

A couple of Python daemons continually poll the database for all documents where p is False and insert them into the main cluster, before finally doing a { $set: { p: True } }, which is an extremely fast in-place update (the document size doesn’t grow, so it doesn’t need to be rewritten on disk). It is also atomic.
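A minimal sketch of one of those consumer daemons, again with assumed connection details and collection names, and using the modern PyMongo API:

```python
import time
from pymongo import MongoClient

# Queue instance and main cluster (hostnames and names are assumptions).
queue_coll = MongoClient("mongodb://queue-rs.example.com")["queue"]["metrics_queue"]
main_coll = MongoClient("mongodb://main-rs.example.com")["sd"]["metrics"]

while True:
    found = False
    # Grab everything not yet copied over to the main cluster.
    for doc in queue_coll.find({"p": False}):
        found = True
        # Copy the payload across, dropping the queue-only fields.
        main_coll.insert_one({k: v for k, v in doc.items() if k not in ("_id", "p")})
        # Atomic, in-place flag flip: the document doesn't grow,
        # so it isn't rewritten on disk.
        queue_coll.update_one({"_id": doc["_id"]}, {"$set": {"p": True}})
    if not found:
        time.sleep(1)  # back off briefly when the queue is empty
```

One caveat with this sketch: with more than one daemon, a plain find-then-update can let two consumers pick up the same document. Using find_one_and_update (findAndModify) to flip the flag while claiming the document closes that gap, which fits the atomic findAndModify mentioned in the 2015 update above.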

[Diagram: capped collection]

Using this setup we can even pause the insertion of metrics into the main cluster (in the event of a problem, or for maintenance) with minimal impact to the user: alerts are processed separately, so the user would only notice graphs no longer updating. And when we switch it back on again, we can process the backlog in just a few minutes with no data loss.

With the fixes in MongoDB 1.8, we have since enabled the balancer, so these large collections are now spread across the cluster. This helps performance even more because inserts are distributed across multiple machines.

And so we are now queuing MongoDB inserts using MongoDB itself! With only a few threads doing inserts, our queue spikes have been completely eliminated.
