Queueing MongoDB using MongoDB

Several months ago we started seeing increased response times in our server monitoring application, Server Density. The problem was growing slowly over time and it quickly became clear that MongoDB was the bottleneck – we were seeing large queue spikes as data was being inserted into the database.

Server Density graph

Every time a server posts its monitoring payload back, a number of things happen. One of these is updating our time series graphs. Other events include triggering alerts and updating the dashboard and API with the latest values, but what we’re concerned with here is the insertion of the metrics data. At the time we had the sharding balancer disabled because of a number of bugs (which also prevented us from doing manual chunk migrations), so all the data resided on a single shard. This meant that a huge number of inserts were occurring on a massive collection on a single server.

We use PHP for our frontend web UI and this is what handles incoming agent postbacks. The Mongo PHP driver includes connection pooling and we have a mongos router on each of our web nodes, but even with this there were several hundred threads continuously inserting data into the large metrics collection. This resulted in concurrency problems as the threads were waiting in line to complete the insert and as the queue built up, the whole cluster would slow down because it was waiting for the locks to be released.

With rebalancing the data not an option, we needed something to sit in the middle to take all the incoming inserts and simply pass them on to the main MongoDB metrics collection. This would be isolated from the main cluster so read queries could continue unhindered, and instead of hundreds of insert processes, this middle layer would run just a few. This would reduce the queuing needed, as Mongo handles concurrency much better with just a few threads inserting.

A proper queuing system

What we needed was exactly what queuing systems like RabbitMQ and ZeroMQ are designed to do. You have dumb consumers (clients) which request queue items and then perform a task. We already use RabbitMQ for our alert processing, so we decided to try it as the frontline for metrics inserts too. Data would come into RabbitMQ, then get processed out and inserted into the main Mongo database by several consumers.

Unfortunately this didn’t work out. All of the PHP AMQP libraries we tried were too unstable – either they didn’t work at all, or they worked in low-load testing but performed terribly at high loads. We saw excellent response times and then suddenly there would be a massive spike and the entire postbacks process would hang. This appeared to be caused by the connection pool resetting and every insert trying to recreate its connection.

So we moved to ZeroMQ to see if that would be any better. This time inserting into one queue worked perfectly whilst a second queue in the same connection would mysteriously lose all the data – nothing would ever get inserted.

Evidently PHP isn’t widely used with these queuing systems (probably for good reason, since it’s not designed for long-running processes), so we considered rewriting our postbacks handling in Python, but that would’ve meant redoing a lot of fairly complex code.

How about Redis?

We really wanted to use a proper queuing system so we didn’t have to build our own to handle things like queuing messages, acknowledging receipt and handling multiple dumb consumers. So we started looking at Redis because it supports pub/sub, which is a kind of queuing mechanism. Then we thought – why not use MongoDB? We have built up a significant body of knowledge about Mongo, and just being able to use another instance would mean we wouldn’t need to learn how to administer another database system, particularly one that is still “new”.

MongoDB as a (very simple) queue

We were able to get a new MongoDB replica set up and running quickly and it only required a few lines changed in our main application code to direct the relevant inserts into this instance instead of the main one.

Data is inserted into capped collections in this new instance, so all we need to do is poll the database for unprocessed documents, insert them into the main database and then mark them processed. Purging the old documents is handled very efficiently by the database. Every document includes a single character field called p, which is set to False by default.
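
As a rough sketch of the queue side (using pymongo; the hostname, collection name, size cap and payload fields are illustrative, and our real postback handler is PHP, but the mechanics are the same):

    from pymongo import MongoClient

    # Connect to the separate queue instance (hostname is hypothetical).
    queue_db = MongoClient("queue-mongo.internal")["queue"]

    # Create the capped collection if it doesn't already exist. Once the size
    # cap is reached, MongoDB discards the oldest documents automatically.
    if "metrics_queue" not in queue_db.list_collection_names():
        queue_db.create_collection("metrics_queue", capped=True, size=1024 * 1024 * 1024)

    # Each incoming postback is stored with the processed flag p set to False.
    queue_db.metrics_queue.insert_one({
        "serverId": "abc123",          # illustrative payload fields
        "metrics": {"loadAvg": 0.42},
        "p": False,
    })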

A couple of Python daemons continually poll the database for all documents where p is False and insert them into the main cluster before finally doing a { $set : { p : True } } which is an extremely fast in-place update (the document size doesn’t grow so it doesn’t need to be rewritten on disk). It is also atomic.
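
A minimal sketch of what one of those daemons does (modern pymongo syntax and hypothetical host, database and collection names – the real code differs):

    import time
    from pymongo import MongoClient

    queue = MongoClient("queue-mongo.internal")["queue"]["metrics_queue"]   # queue instance
    metrics = MongoClient("mongos.internal")["serverdensity"]["metrics"]    # main cluster via mongos

    while True:
        handled = 0
        for doc in queue.find({"p": False}):
            # Copy the payload into the main cluster, dropping queue-only fields.
            payload = {k: v for k, v in doc.items() if k not in ("_id", "p")}
            metrics.insert_one(payload)

            # Mark the queue document as processed. The value doesn't change the
            # document's size, so this is a fast, atomic in-place update.
            queue.update_one({"_id": doc["_id"]}, {"$set": {"p": True}})
            handled += 1

        if handled == 0:
            time.sleep(1)  # queue drained; wait briefly before polling again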

Capped Collection

Using this setup we can even pause the insertion of metrics into the main cluster (in the event of a problem, or maintenance) with minimal impact to the user – alerts are processed separately, so the user would only notice graphs no longer updating. And when we switch it back on again we can process the backlog in just a few minutes with no data loss.

With the fixes in MongoDB 1.8, we have since enabled the balancing so these large collections are now spread out across the cluster. This helps with performance even more because inserts are spread across multiple machines.
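
For reference, re-enabling the balancer and sharding a collection are both done through a mongos. Roughly, with illustrative database, collection and shard key names:

    from pymongo import MongoClient

    mongos = MongoClient("mongos.internal")

    # The balancer state lives in the config database; clearing "stopped"
    # lets chunk migrations run again.
    mongos["config"]["settings"].update_one(
        {"_id": "balancer"}, {"$set": {"stopped": False}}, upsert=True
    )

    # Shard the large metrics collection so inserts spread across the shards.
    mongos.admin.command("enableSharding", "serverdensity")
    mongos.admin.command("shardCollection", "serverdensity.metrics",
                         key={"serverId": 1, "ts": 1})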

And so we are now queuing MongoDB inserts using MongoDB itself! There are only a few threads doing inserts, and this has completely eliminated our queue spikes.

  • Wouter

    I remember asking about using MongoDB for a queue before on this blog, found my comment @ http://blog.boxedice.com/2010/08/17/the-exciting-adventures-of-an-alert-notification/#comment-1176.

    I’ve used MongoDB as a queue and never tried other systems, and based on your experiences, I guess I’ll stay with what I know!

  • Benjamin

    Do you have acknowledgement for tasks, or is this already covered with the open cursors in the capped collection? I’m currently checking out options to queue requests and MongoDB would definitely be welcome since it’s the data backend too.

  • http://myCloudWatcher.com/ Edward M. Goldberg

    David,

    Once again a great post. You have “hit the nail on the head”.

    Several points:

    1) Keep the number of programs used as small as you can. Just use more MongoDB in more places.

    2) Use all of the features of the new MongoDB. Like Capped Collections. New features can help.

    3) The update of a “Document” in MongoDB that “does not change size” and is indexed is fast.

    I do have one open question David. You say that you now use “the same MongoDB Array” for both the queue and the store. Why? What is added by “over-lapping” the use of an array for two uses?

    I would assume that the very different use of memory by the Shard Store Array and the Queue server would “clash” and waste resources. How do they “share nice”?

    Do you process the data from the Queue Collections –> the Store Collection “inside the server” and avoid the data transfers?

    Great post David!! Glad to have you on the MongoDB Team!!

    Please call any time:
    Cell:     916-202-1600
    Skype:  EdwardMGoldberg

    Edward M. Goldberg
    http://myCloudWatcher.com/
    e.m.g.

    • http://www.serverdensity.com David Mytton

      We have 2 separate clusters – one is the main store for the app and the other is the queue discussed here. They’re on separate servers and the queue acts as a middle layer whereby postbacks come in via HTTP, then to the queue, then to the main MongoDB cluster instead of direct to the main MongoDB cluster. This avoids large numbers of inserts onto the main cluster which were affecting read performance.

      There is an additional data transfer. Instead of app server -> main database it is now app server -> queue database -> main database. But this is over a fast internal network so has no real cost (at this stage).

  • Tom

    Did you consider Beanstalkd with the PHP client library called Pheanstalk? I have used them over the last year and they have been very stable.

    • http://www.serverdensity.com David Mytton

      We didn’t look at that – wasn’t aware it existed.

    • http://99designs.com Lachlan Donald

      Yup, we run a PHP stack and make heavy use of beanstalkd on 99designs.com.

      Paul wrote Pheanstalk because there weren’t really any decent PHP queuing libraries available:

      https://github.com/pda/pheanstalk

    • http://socialping.com Joel Strellner

      We use beanstalkd at Socialping as well. We push hundreds of thousands of messages per min, and it is rock solid. If you use it, make sure to tune the settings a bit for your needs.

      We’ve been using it for about 2 years now. Initially there were a number of stability issues, but, like Tom, we haven’t had any beanstalkd related issues for about a year now. Highly recommend.

  • http://videlalvaro.github.com/ Alvaro

    When you say that you have high load on the frontend… how many requests per second are we talking about? I remember using this library https://github.com/tnc/php-amqplib at a dating website to send notifications via RabbitMQ. We used it to process hundreds of thousands of jobs per day.

    • http://www.serverdensity.com David Mytton

      The library we tested “broke” in our production tests at around 20 req/s.

      • http://www.rabbitmq.com alexis

        That’s extremely weird – can you post the client code you were using? Even ‘slow’ PHP libraries should be happy with much higher throughput rates than that.

      • http://www.serverdensity.com David Mytton

        I believe it was http://code.google.com/p/php-amqp/ that we tried, and possibly another one too. The code was taken directly from the docs, i.e. it was just a couple of lines required to open the connection, define the queue and do the insert.

  • http://chr.ishenry.com Chris Henry

    I’m surprised you guys didn’t try Gearman. We have a similar situation where we need to do a large number of inserts based on incoming requests. We were able to use Gearman to ‘background’ tasks as soon as they came in. There’s no locking, the php client just drops the new job in the queue. I can imagine that even with a completely separate, capped collection, there was some sort of locking, as Mongo does have that nasty global write lock. I would even suspect that using a capped collection would be somewhat dangerous. If your jobs got far enough behind, wouldn’t mongo start dropping jobs that had potentially not been processed?

    Working with Gearman in php is pretty straightforward, as there is a php extension that’s relatively easy to install, and is very reliable, as is the Gearman server. I would highly recommend you take a look at it.

  • David Tinker

    I work for BrandsEye, an online monitoring company processing streams of tweets and other brand mentions. We had a different experience using MongoDB as a queue. We found that when the capped collection got full it used a lot of IO on the machine and it took a long time to replay messages. Now we use QDB (http://qdb.io/) to buffer messages and push/replay them to RabbitMQ. Much more efficient. Disclosure: I wrote QDB.