
Notes from a production MongoDB Deployment

Back in July last year I wrote about our migration from MySQL to MongoDB. We have been running MongoDB in production for our server monitoring service, Server Density, since then – 8 months – and have learnt quite a few things about it. These are the kind of things that you only experience once you reach a certain level of usage, and it is surprising how few issues we’ve had. Where a problem or bug has cropped up, it has been quickly resolved by the MongoDB dev team, so we’re very happy.

A few stats

Taken from our MongoDB instances as of Fri 26th Feb:

  • Collections (tables): 17,810
  • Indexes: 43,175
  • Documents (rows): 664,158,090

We currently have a manual failover setup with a single master and slave. The master has 72GB RAM and the slave is in a different DC. Given disk space limits, we are in the final stages of migrating to automated replica pairs with manual sharding across 4 database servers (2 masters and 2 slaves, in different DCs). There were some delays deploying our new servers but we’re expecting all data to finish syncing this coming week so we can switch over. This is a classic move from initial hardware (vertical) scaling to adding more servers (horizontal) as we grow. Although manually sharded for now, we’re planning to switch to the auto-sharding coming in MongoDB soon.

Namespace limits

We split our customers across (currently) 3 MongoDB databases because there is a namespace limit of 24,000 per database. This is essentially the number of collections + number of indexes.

You can check the total number of namespaces in use for a particular DB by running db.system.namespaces.count() from the MongoDB console. You can also increase this limit by using the --nssize parameter, but there are tradeoffs and limits to this. See the MongoDB docs for details.
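
For example, something along these lines will do it (a sketch; the database name and dbpath are just illustrative):

# count namespaces (collections + indexes) for one database
mongo serverdensity_1 --eval "db.system.namespaces.count()"

# start mongod with a larger namespace file, specified in MB (the default is 16MB)
mongod --dbpath /data/db --nssize 32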

Durability

There is no full single server durability in MongoDB. This has been highlighted by the MongoDB developers themselves but essentially means you must run multiple servers in replication. If you suffer a power loss and/or MongoDB exits uncleanly then you will need to repair your database. If it’s small this is a simple process from the console, but it involves MongoDB going through every single document and rebuilding it. When you have databases at the size of ours, this would take hours.
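
For reference, a repair after an unclean shutdown is roughly this (a sketch; the dbpath and database name are illustrative):

# repair all databases under the given dbpath (mongod must not already be running)
mongod --repair --dbpath /data/db

# or repair a single database from the console once mongod is back up
mongo serverdensity_1 --eval "db.repairDatabase()"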

You can use master/slave replication (ideally with the slave server in a different data centre) but if there’s a failure on the master you need to failover manually. Instead, you can use replica pairs which handle deciding which is the master between them, and in the event of a switchover will become eventually consistent when they both come back online.
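
To illustrate, basic master/slave replication is just a couple of startup flags (a sketch; hostnames and paths are illustrative, and replica pairs use their own pairing options – see the MongoDB docs):

# on the master
mongod --master --dbpath /data/db

# on the slave in the other DC, pointing back at the master
mongod --slave --source master.example.com:27017 --dbpath /data/db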

Initial sync/replication of large databases

Our databases are very large and it takes about 48-72 hours to fully sync all our current data onto a new slave in a different DC (via a site-to-site VPN for security). During this time you’re at risk because the slave is not up to date.

Further, you need to ensure that your op log is large enough to hold all actions from the time the sync begins. MongoDB takes a snapshot at the time you start the sync and then, when it’s done, uses the op log to catch up on everything that has happened since.
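
You can check the configured oplog size and the time window it currently covers from the console on the master, roughly like this:

# prints the oplog size and the time range it currently spans
mongo --eval "db.printReplicationInfo()"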

We found we needed a 75GB op log, which can be specified with the --oplogSize command line parameter. However, these files are allocated before MongoDB starts accepting connections. You therefore have to wait whilst the OS creates ~37 2GB files for the logs. On our fast 15k SAS disks this takes between 2 and 7 seconds per file (around 5 minutes in total), but on some of our older test boxes it took up to 30 seconds per file (~20 minutes). During this time your database will be unavailable.

This is particularly a problem if you’re restarting an existing single server in replication mode, or are setting up a replica pair – the restart will take time as the files are allocated.

To avoid this problem, you can create the files yourself and MongoDB will then not have to do it. Run the following from the shell – as soon as you hit return after “done” it’ll start making the files:

# pre-allocate the 2GB oplog data files (local.0 .. local.40) so mongod doesn't have to
for i in {0..40}
do
  echo $i
  head -c 2146435072 /dev/zero > local.$i
done

Stop MongoDB, make sure any existing local.* files are removed and then move these new ones into your data directory, and restart MongoDB.

This will work for --oplogSize=75000. Note that creating the files will hog the filesystem I/O and slow everything down, but it won’t take your database offline. If you prefer, make the files on another system, then transfer them over the network.
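
For completeness, the restart that picks up the pre-allocated files is just the normal startup with the matching oplog size (the path is illustrative):

# restart the master with the 75GB oplog files already in place in the data directory
mongod --master --oplogSize 75000 --dbpath /data/db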

Initial sync “slows” things

When doing a fresh sync from a master to a slave, we have observed a “slowdown” in our application response times. We’ve not looked into this in detail because it’s still within acceptable response time ranges but it seems to be caused by a slightly higher wait time when our application is connecting and querying MongoDB. This causes http processes to hang around a little longer for each request and so CPU load increases.

My theory is that because the slave is reading all data from every collection in every database, the cache is being invalidated often and so reads can’t take advantage of it as well. But I’m not sure – it’s not been reported to MongoDB as it’s not really a problem. It would, however, be a problem if your server was unable to handle the load from the Apache processes. It’s quite an edge case though.

Running MongoDB as a daemon/forked process + logging

Back in the olden days we ran MongoDB in a screen session but you can now just specify the --fork parameter and MongoDB will start as a forked process. If you do this, be sure to specify --logpath so you can check for any errors. We also use the --quiet option, otherwise too much gets logged and rapidly fills up the file. There is no inbuilt log rotation.
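
Our startup command is essentially along these lines (a sketch; the paths are illustrative), with log rotation handled outside MongoDB:

# run mongod as a daemon, logging to a file with reduced verbosity
mongod --dbpath /data/db --fork --logpath /var/log/mongodb/mongod.log --quiet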

OS tweaks

We came across a problem with too many open files which was caused by the default OS file descriptor limit. This is set at 1024 which can be too low. MongoDB now has documentation about this but you can set your limit higher on Red Hat systems by editing the /etc/security/limits.conf file. You also need to set UsePAM to yes in /etc/ssh/sshd_config to have it take effect when you log in as your user.
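
On a Red Hat box the relevant pieces look something like this (a sketch; the user name and limit are illustrative):

# /etc/security/limits.conf: raise the open file limit for the user running mongod
mongodb soft nofile 64000
mongodb hard nofile 64000

# verify after logging back in over SSH
ulimit -n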

We also disabled atime on our database servers so that the filesystem isn’t caught up updating timestamps every time MongoDB reads from all its files.
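
That is just the noatime mount option on the data partition, roughly like this (the device, filesystem and mount point are illustrative):

# /etc/fstab: mount the MongoDB data partition without atime updates
/dev/sdb1  /data  ext3  defaults,noatime  0 0

# apply without a reboot
mount -o remount /data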

Index creation blocks

We create all our indexes at the very beginning so the initial index process is pretty much instantaneous. However, if you have an existing collection and create a new index on it then that process will block the database until the index is created.

This is resolved in the development version of MongoDB (1.3) which introduces background indexing as an option.
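
With 1.3+ you can pass the background option when creating the index, something like this from the console (the database, collection and field names are illustrative):

# build the index in the background so the database isn't blocked
mongo serverdensity_1 --eval "db.metrics.ensureIndex({t: 1}, {background: true})"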

Efficiency of reclaiming diskspace

The nature of our server monitoring application means we collect a lot of data, but it is deleted after a set retention period. We have found that there is a massive discrepancy in disk usage between a master and a freshly copied slave. Obviously the slave is copying the data and storing it as compactly as possible (no gaps left where documents have been deleted), so the very first sync will use less disk space than the master. However, we have seen a master using almost 900GB where a fresh slave uses only 350GB.

We have an open issue with MongoDB commercial support about this.

Support is excellent

Even before 10gen (the company behind MongoDB) got their $1.5m venture funding, their support was excellent. And it still is. We use their gold commercial support service and it has proved useful when we encountered many of the issues above. Opening tickets is easy and we get extremely fast responses from developers. I can also call and speak to them 24/7 in emergencies.

And if you don’t want / need to pay then their open mailing list is still very good. We need the ability to get support very quickly 24/7 but I’ve always had very fast replies from the mailing list.

Conclusion – was it the right move?

Yes. MongoDB has been an excellent choice. Administration of massive databases is easy, it scales extremely well and support is top notch. The only thing I’m missing and waiting for now is auto sharding. We’re doing manual sharding but having it handled by MongoDB is going to be very cool!

  • http://blog.paulbetts.org Paul Betts

    Great article, I’m very interested in MongoDB and it’s great to see how people are using it in production (and what problems they’re seeing)

    One question though, why are you generating the zero-byte files in such a strange way? It’s *got* to be faster to do:

    dd if=/dev/zero of=local.$i bs=1M

  • http://www.saiyine.com Saiyine

    Have you tried using sparse files for the logs?

    dd if=/dev/zero of=file.tmp bs=1 count=0 seek=2G

    • http://www.serverdensity.com David Mytton

      Some comments here http://news.ycombinator.com/item?id=1157856 about sparse files.

      If my understanding is correct then this would defeat the point. As per the MongoDB docs:

These files are prefilled with zero bytes. This initialization can take up to a minute (less on a fast disk subsystem) for larger datafiles; without prefilling in the background this could result in significant delays when a new file must be prepopulated.

      So if you create the files without filling them, you don’t get the advantages of pre-allocation.

  • http://smartic.us bryanl

    On Linux, creating a sparse file using the aforementioned dd method is immensely faster. These tests were run on an off the shelf linux box running ubuntu karmic:


    $ time head -c 2146435072 /dev/zero > slow

    real 0m25.308s
    user 0m0.220s
    sys 0m19.390s


    $ time dd if=/dev/zero of=file.tmp bs=1 count=0 seek=2G
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 1.6775e-05 s, 0.0 kB/s

    real 0m0.010s
    user 0m0.010s
    sys 0m0.010s

    There really is no comparison there.

    • jlhm

      The dd command you are using does not fill the file with zeros the way the head command does. Your dd only allocates the space on disk for the file.

      MongoDB requires that the file actually contains zeros, so your dd command must be modified to write zeros to the file.

  • EH

    Your 0 byte files won’t sort properly in a directory listing as you will have local.1 local.10 local.11, etc and then local.2. Try:

    for i in 0{1..9} {10..40}

    • http://www.serverdensity.com David Mytton

      The filenames are not arbitrary, they have to be local.0, local.1, local.2 etc because that is what MongoDB creates/expects. You’re creating the files so MongoDB doesn’t have to, so they need to be the same. It doesn’t matter what the files look like in a directory listing anyway.

    • Dude

      Directory listing? What difference could that possibly make, and why was this your only take-away from this article?

      Great post David!

  • http://www.saiyine.com Saiyine

    They won’t be 0 bytes long to the filesystem, but anyway, the wrong order doesn’t come from that; it comes from the well known difference between the way computers (and programmers!) think things should be ordered and real world ordering.

    Think ZZZZ.txt coming before aaaa.txt.

  • Patrickg

    I am curious if you can avoid the oplog creation files problem by using either a different filesystem such as XFS, a compressed loopback filesystem mount, or even an iSCSI-mounted directory with the files on a separate server. XFS has sparse file support, and a compressed filesystem mounted on loopback should greatly compress the file data that needs to be written to disk on the initial creation.

  • MikeBo

    Thanks for the great write up. I’d love to hear some more about your schema design. Did you choose to have lots of small collections vs. fewer but much larger collections based on perf, ease of sharding by hand (before it was included in mongodb), or something else?

    With the new sharding support I’m tempted to go for fewer but much larger collections, include “customer_id” on all documents, and let the system shard by that.

    • http://www.serverdensity.com David Mytton

      We decided not to have a database per customer because of the way MongoDB allocates its data files. Each database uses its own set of files:

      The first file for a database is dbname.0, then dbname.1, etc. dbname.0 will be 64MB, dbname.1 128MB, etc., up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB.

      Thus if the last datafile present is say, 1GB, that file might be 90% empty if it was recently reached.

      from the manual.

      As users sign up to the trial and give things a go, we’d get more and more databases that were at least 2GB in size, even if the whole of the data file wasn’t used. We found this used a massive amount of disk space compared to having several databases shared across all customers, where the disk space can be used to maximum efficiency.

      Sharding will be on a per collection basis as standard which presents a problem where the collection never reaches the minimum size to start sharding, as is the case for quite a few of ours (e.g. collections just storing user login details). However, we have requested that this will also be able to be done on a per database level. See http://jira.mongodb.org/browse/SHARDING-41

      There are no performance tradeoffs using lots of collections. See http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections

      • MikeBo

        I actually wasn’t thinking a database per customer was a good idea. I was more wondering why you have so many collections – surely they’re not all representing different logical data. My guess is that each customer gets some number of collections. After all, your average row count per collection is only ~37k rows, fairly small.

        With the new built-in sharding support, couldn’t you have a ‘customer_id’ field and collapse many of those collections into larger collections, then shard by ‘customer_id’?

        Not saying your approach is wrong at all, I’m just trying to understand tradeoff between these two types of schemas. I’d love to hear if you guys have any experience with collections with a large # of documents (10MM rows or higher) and what the performance was like.

      • http://www.serverdensity.com David Mytton

        This is a relic from when we used MySQL. We just did a 1-to-1 translation from MySQL tables to MongoDB collections. Back then the application was designed to be run independently as a “download”, i.e. we’d install it on customer servers. We decided to drop that idea and offer just a hosted service, but the architecture to do that is still there for the most part.

        We have a collection that stores the process list that is posted back as part of our agent postback every 60 seconds. For customers with many servers (10+, some with 80+), this collection can hold many 10s of millions of documents even though we only retain that data for a month.

        In these cases, there is no performance issue. Inserts are extremely fast and we do very rare reads from them – the process list is only read when it’s pulled out in the server snapshot functionality (see the state of a server at a given timestamp) and these can be very random in terms of timestamp. Again, no performance issues even reading a random document amongst 10s of millions.

  • http://sbtourist.blogspot.com Sergio Bossa

    Hi,

    thanks for your writeup.

    Anyway, it seems you had almost the same problems you would have had with a MySQL solution:
    - Huge data to deal with.
    - Manual sharding.
    - Sync/replication delays.

    So why didn’t you evaluate switching to a more “large-scale” NoSQL solution like Cassandra or Riak?

  • http://elfsternberg.com Elf M. Sternberg

    Is there any reason you create them all using the same process? Why not just create local.1 and then copy it to local.[2-40]?

  • http://kost-bebix.ya.ru kost BebiX

    What about transactions and many-to-many relationships? Do you deal with that? How? Any problems? Thank you.

    • http://www.serverdensity.com David Mytton

      It’s easy to deal with these because they don’t exist :) We don’t need them or use them.

      We maintain a separate database for customer account details and invoicing using MySQL for transaction support, but don’t really need it and are considering moving everything to MongoDB in the future.

  • Mike

    Hi,

    Do you have pre-defined indexes on every collection?

    I’ve been thinking of using MongoDB to store my product data in one collection (i.e. dimensions, color, …), which is dynamic because the product data is different* for each of our products. But I’d like to avoid pre-indexing every dynamically created field, because we can’t know in advance what properties the collection will come to have. Do you think this is a bad design?

    Even if MongoDB creates an index when a new property is added, how well does it perform when you want to query the data via properties that were created on the fly?

    /Mike

    * Example: Product A have these properties: a, b, c, d and Product B: a, c, p, q and Product C: b, q, m, n.

    • http://www.serverdensity.com David Mytton

      If you’re just adding more fields in the same collection and indexing them, then the documents without those fields just won’t have anything in the index. As far as indexing speed is concerned, I don’t know how this would be handled, e.g. would MongoDB loop over every document? Best to post on their mailing list to get a definitive answer.

  • Julie

    Would you mind sharing how you structure your Product, Invoice and Billing schemas? It does not need to be detailed. I just want to know, if I sell digital goods or a web subscription service, how I would design the schema in MongoDB.

    Thanks again

    • http://www.serverdensity.com David Mytton

      All our invoicing is done in MySQL. We only store very basic records and use http://www.xero.com and their API to handle sending invoices, PDFs, recording payments, etc.

      • Julie

        Thanks David. It is really useful.

        However, your comment makes me reconsider using MongoDB for my invoices. It seems like a lot of other sites also use a mix of databases: less valuable data in MongoDB and important data in MySQL.

      • http://www.serverdensity.com David Mytton

        That’s right. MongoDB is designed for high performance but has tradeoffs in terms of durability and atomicity. You should use a relational database for invoicing.

      • Julie

        Excellent info David. Do you know what the possible worst case is if I use invoices in MongoDB? For example:
        – MongoDB crashes? Invoices gone, just restore the database?
        – MongoDB is not atomic ==> the PC shuts down and the user sees their balance become 0?

      • http://www.serverdensity.com David Mytton

        The lack of atomicity means you cannot guarantee writes will succeed and a crash may result in data loss (although unlikely). See http://www.mongodb.org/display/DOCS/Durability+and+Repair

      • Julie

        Thanks again for the really helpful feedback. You are a very helpful person. It really helps me in my research since MongoDB is still very young. And thanks also for the xero.com link.

  • http://hasmanyquestions.wordpress.com Gavin Stark

    Can you speak a bit about how you chose to do backups?

    • http://www.serverdensity.com David Mytton

      We have replica pairs which act as immediate “backup” in terms of automated failover. As for actual backup in terms of being able to restore data in case of deletion or massive failure, we wrote our own script to use the mongodump utility to dump customer collections: http://www.mongodb.org/display/DOCS/Import+Export+Tools

  • http://www.appliedeye.com Adi

    I was wondering if having the slave in another data center is the reason the slave takes so long to catch up with the master... or did I miss something?

    • http://www.serverdensity.com David Mytton

      Yes that’s right.

  • Joseph

    What is the avg document size of your largest table? How many documents are in it? We have more than 500M documents in one table, each 100-500 bytes. We think we’ll have a few thousand updates per second. What is your system’s update rate? Wondering if MongoDB can handle such a big update volume. What is the typical query performance that you see on MongoDB? 200 QPS? 1000 QPS?

    • http://www.serverdensity.com David Mytton

      Our largest collections have between 20 and 50m documents. We are seeing between 20 and 60 inserts per second. You’d need to run your own tests on the kind of hardware you’re expecting to use to see how well it performs at those levels of inserts/data.

      I’d think it’ll really depend on what indexes you have as they have to be updated on every insert unless you use background indexing (http://www.mongodb.org/display/DOCS/Indexing+as+a+Background+Operation). Of course, a background index means it won’t necessarily be available right away for reads.

      You also want to ensure your indexes will fit in memory otherwise you’ll get a lot of swapping.

      • Joseph

        Thanks for providing performance information.

        http://blog.mongodb.org/post/472834501/mongodb-1-4-performance shows a much higher rate for inserts and updates. Am I correct in assuming that the 20-60 inserts per second are for non-trivial documents and that you have multiple index keys? Do you see any performance degradation (for query and insert) when a collection reaches a certain size? B-trees have O(log n) performance. Do you do many in-place updates in your application? How about deletion?

        Your monitoring system generates fewer than 60 documents per second? Otherwise, how can Mongo keep up with all the inserts if you are seeing 20-60 inserts/second?

      • http://www.serverdensity.com David Mytton

        We haven’t seen any performance issues with increasing numbers of documents so long as the index stays in RAM. If it doesn’t then you get performance problems.

        We do no updates on our larger collections but do many deletes. See “Efficiency of reclaiming diskspace” in the post above and the MongoDB case that is tracking this: http://jira.mongodb.org/browse/SERVER-366

  • max

    Hi, where can I download a test MongoDB database (a large one, with more than a million entries)?
    Thanks

    • http://www.serverdensity.com David Mytton

      You have to make one yourself.

    • Henning

      You can download the stackoverflow.com data dump. Not sure what the exact URL is, though.

  • http://www.braintapper.com Steven Ng

    This is a great article. I’ve just made the decision to migrate a multi-tenant app I’m working on from MySQL to MongoDB and I had some questions in the area of approach, and this article (and its comments) answered them all.

    Thanks, David!

  • http://danweinreb.org/blog Dan Weinreb

    Thank you so much for taking the time and effort to share all this useful information!

  • http://www.mattinsler.com Matt Insler

    Hey, I was just wondering, where are you hosting your databases? I’m looking at running MongoDB on EC2 and trying to evaluate the speed and cost. Do you have any numbers you’d be willing to share?

  • Jaco van Staden

    I’m currently reviewing MongoDB as a possibility for going forward in our company. I’m specifically interested in setting up a location based sharding environment and running our domain code from a .Net environment. I went through your article around how to set up the MongoDB environment with sharding and replica sets, but was wondering whether you have any advice on how to set up locale based sharding across multiple international data-centers?

    Thanks for your posts, they’re really useful and helpful and made life a bit easier :-)

    • http://www.serverdensity.com David Mytton

      You can manually migrate data to different shards but it could always be moved again. You could create a shard key which is the location so that data with the same key will exist on the same shard but there’s no way to specify which shard data should exist on.

      I think the best option would be to create a separate cluster within each data centre and send queries to the appropriate cluster within the application logic.

  • hackeron

    Why not just use CouchDB? It seems all of these issues (database unavailability, database repair, lack of master-master replication, namespace limits, index creation blocking, etc.) are non-issues in CouchDB. Why stick with MongoDB over CouchDB?

    • http://www.serverdensity.com David Mytton
      • hackeron

        I’ve seen that. You can always use a megaview (https://wiki.mozilla.org/Raindrop/Megaview) or create multiple views on the keys you’re interested in and join at the app level or something? – Can you give me an example of an adhoc query you need to run? – When you have monitoring data coming in from all your clients, isn’t it a problem that you can’t store it for hours?

        For my app, which is a web-based surveillance system, see: http://demo1.xanview.com (user/pass: demo/demo) – I plan to predict some of the views like “large object”, “long event”, etc and when an adhoc query is needed (which I don’t think will be often), a user will be able to generate an extra view with a name and description which will show up in the interface in gray with a progress spinner until it’s finished processing, as adding views doesn’t block.

        All this is theory though, I’m still evaluating which database technology to choose. However it seems the MongoDB disadvantages are pretty huge: database unavailable for hours at a time when you repair, add an index, initial sync/replicate, inefficiency reclaiming disk space, inability to do master/master replication, namespace limits, no incremental map/reduce so queries are expensive.

        CouchDB’s disadvantages seem to be no adhoc queries and slower with smaller loads or am I missing something?

    • http://www.serverdensity.com David Mytton

      I’ve not looked at CouchDB again since we migrated to Mongo. It was easy to migrate at the beginning because the service was so new but we’re now heavily integrated into it and moving isn’t really an option.

      To address some of the disadvantages:

      - repair database only needs to happen where the DB was shut down improperly e.g. power failure. You’d use replication to mitigate this so you can repair a database without loss of service. This lack of single server durability will be “fixed” in 1.8 later this year. We’ve never had to do a repair in over a year of usage.

      - Indexing blocks but you can use background indexes to prevent that.

      - Initial sync/replication is an issue with any database where you have to do a full resync

      - Disk space inefficiency is resolved in the latest version and further compaction will be introduced in a future release.

      - Namespace limits – do you need 24k indexes and/or collections per database?

      - Map reduce is fairly slow right now – this is perhaps the only real problem for which you might need to choose CouchDB over MongoDB.

      It’s also important to note that MongoDB has commercial support behind it, and they are very responsive. This has been extremely useful for us.

  • http://www.allaboutbalance.com/ Rob S.

    We’re in a similar situation – bringing up a slave against a new master with a huge database (100GB+).

    Hugely helpful man, thanks!

    • Sanuj S.S

      100GB is nothing for Mongo. We could set up a slave in barely 10 hours with 2 TB. :)