Server Density Blog

Interesting devops tech stuff

Pssst… Server Density v2 is coming soon!

Switching to MongoDB and auto increment

Written by David Mytton

I originally prototyped our server monitoring service, Server Density, using MySQL, a database I’d been using for years on other projects. One standard feature is the auto_increment field property which lets MySQL manage IDs for each row you insert. You can set this to unique and be sure that the ID will increment by one each time – essentially a unique ID for each row. This works fine for small scale systems and although I worked on some high traffic sites, I never got to the scale where we had to consider sharding or issues with unique identifiers across huge systems.

Document IDs

So when we switched over to MongoDB, I simply ported over the integer fields and put the auto increment logic into the application code, rather than using the MongoDB object IDs. This was partially because the system already had a fairly large amount of data already using the integer IDs that I didn’t want to update but mostly because I didn’t consider the scaling requirements and the implications of things like race conditions.

3 years later, this choice has been an ongoing problem in our codebase for a while. The issue is to do with figuring out how to calculate the next number to increment across clusters and maintaining counts when documents are deleted. For example, this could be done based on the current highest ID + 1 but that is dependant on items not being deleted (otherwise you reuse IDs).

You could also set an ID in your code, check to make sure it doesn’t exist then write it, but in-between the check and the write another write may have come in instead. The application code has no way to check this because it’s not managed by the database.

One solution here is the use of a central registry. This could keep track of the IDs and use atomic modifiers to ensure that they are unique. That might work right now but as things continue to grow, you’d likely hit locking concerns on that field. This can be done using findAndModify:

function counter(name) {
    var ret = db.counters.findAndModify({query:{_id:name}, update:{$inc : {next:1}}, "new":true, upsert:true});
    // ret == { "_id" : "users", "next" : 1 }
    return ret.next;
}

db.users.insert({_id:counter("users"), name:"Sarah C."}) // _id : 1
db.users.insert({_id:counter("users"), name:"Bob D."}) // _id : 2

Essentially, having learnt a significant amount about scaling high volume data services, I can see the mistake that I made way back in the olden days. I now consider auto increment and integer IDs completely inappropriate for any use case. Instead, you need to use a proper UUID method to ensure that you don’t hit race conditions and the ID is truly unique across clusters.

MongoDB ObjectID

And this is what the MongoID is designed for. It’s designed to be cluster unique (not worldwide unique like a proper UUID) and is provided by default for every document under the _id field:

A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter.

We have a major new release of Server Density coming out next year where this has been solved at the core (rather than the workarounds we have in place now). This is really only a problem if you are migrating existing code to MongoDB – new projects should really use MongoIDs from the beginning.

I’m often asked what my biggest mistake / gotcha has been with MongoDB and I usually have trouble thinking of a good reply (MongoDB hasn’t been without it’s problems but they are usually bugs which get fixes very quickly)…so I think this is now a good response. Everyone makes mistakes and we learn as we go, especially when building a startup to the level of data we process each month now (over 17TB).

Enjoy this post? You may also like MongoDB Benchmarks