Checking if a document exists – MongoDB slow findOne vs find

server-density-and-mongodb

By David Mytton,
CEO & Founder of Server Density.

Published on the 14th March, 2013.

One of the biggest optimisations to make MongoDB writes faster is to avoid moving documents by doing updates in place to a preallocated document. You only want to preallocate if the document doesn’t exist and to check this you need to do a query.

The findOne() method seems like the right choice for this because you can query the relevant index and check to see if a document or None is returned, ideally using a covered index.

However, it is significantly faster to use find() + limit() because findOne() will always read + return the document if it exists. find() just returns a cursor (or not) and only reads the data if you iterate through the cursor.

So instead of:

[sourcecode]db.collection.findOne({_id: "myId"}, {_id: 1})[/sourcecode]

you should use:

[sourcecode]db.collection.find({_id: "myId"}, {_id: 1}).limit(1)[/sourcecode]

By making this change I saw a change in performance of 2 orders of magnitude:

Query performance after switching from findOne() to find(). Can't even see the response time of the find() call.

Query performance after switching from findOne() to find(). Can’t even see the response time of the find() call.

Same graph with the findOne series hidden. Less than 1ms response time.

Same graph with the findOne series hidden. Less than 1ms response time.

  • I measured the exact same problem and got a patch submitted to MongoMapper/Plucky to fix this as the default behavior for exists? and other methods:

    https://github.com/jnunemaker/plucky/pull/26

  • Has anyone every tried using using the findOne command and then only requesting the _id back, my only fear with the find command is that the cursor means extra waiting file descriptors.

    • Yes. Although it’s not shown in the example above the actual code that produced the graphs was db.collection.findOne({_id: “myId”}, {_id: 1})

      • David,
        You talk about using covered indexes but then your examples clearly don’t and could account for even more of the difference. In the comment here you identify the actual difference, but I read this first through an RSS reader which didn’t pull in the comments.

        Other than that, this is a good post and quite informative.

        • I’ve adjusted the code in the post so it includes the {_id: 1} projection since that is what actually produced the graphs, and yes otherwise covered indexes won’t be used!

  • Has anyone gotten this to work with the PHP MongoDB driver? It seems that that MongoCollection::find() always returns a MongoCursor object, regardless of whether any items were found, and that any method I use to test whether any items were found (hasNext(), count(true), etc), make it slower than using findOne. Am I missing something here?

    • This post was written based on the latest PHP MongoDB driver at the time of writing: 1.3.4. The find() just returning the cursor is quicker because it doesn’t have to return the document for you to discover if it exists or not. findOne() always returns a document so has the additional overhead. Even with the additional call to count() and checking if that is empty or not, it’s faster.

      • So you do an additional call to count( true ) after the mongo cursor is returned? Any chance you can share the code you used for benchmarking? My test, using, I think, the same configuration, didn’t show any improvement, but I wouldn’t be surprised if I was doing it wrong.

        • The graphs are measuring the actual call to find() vs findOne() rather than any additional calls afterwards. The code is:

          $hours = $hoursDB->selectCollection('hours');
          $exists = $hours->find(array('_id' => $_id), array('_id'))->limit(1);
          $exists = $exists->count();
          if (empty($exists))
          {

          ...

          Note this is querying on _id and only returning _id so can use a covered index.

          • Hmm… Well, it seems to me that the graphs are comparing apples and oranges. If you are suggesting that using find() is a good way to speed up querying whether a document exists, you should include all of the queries needed to find out if it exists. I’m not even sure that find() alone even goes to the db before you’ve called count() or hasNext() or something–it might just make the MongoCursor object locally. That would explain why the benchmark was so fast for find alone.

          • Even combining the find() with count() was faster than the findOne(). I’d need to instrument each call separately to find out exactly how much but the performance change on the entire execution time was significant.

          • So I did run a test comparing findOne() vs find() with count(), and I found that find() with count() was very slightly slower, but my test setup was pretty crude. I’d be interested in seeing how much better the performance is overall when you include the count() in your benchmarks.

          • I came to the same conclusion with this benchmark: https://gist.github.com/jmikola/5208918

            For covered-index queries, findOne() was a tiny bit faster, but both methods perform roughly the same. The projection is actually irrelevant for find()->limit(1)->count(true) case, because it’s going to issue a count command. A non-covered query using findOne() will always perform worse, and very much so in my benchmark where each document contains a 4MB string.

          • So the important thing is using a covered index or not, which applies generally anyway. But in the case where you just want to find whether a document exists it’s particularly important.

          • The MongoCursor is constructed locally, and the driver will not actually communicate with the server until you begin iterating, count, or explain. Along those lines, you’re allowed to apply sort, limit, skip and other such flags up until the point we hit the server and actually execute something.

  • Cang Do

    how Can i check document is exists with find command ? with findOne I need check is null. But With find I need hasNext or else ?

    Do you know that hasNext is also very heavy ?
    Please help

    • Nathaniel Witmer

      Looks like this was asked a while back but maybe it will help someone so I’ll still reply- if there are no matches to your query find() will just return an empty array. So you can simply check if the length of the array is greater than zero.

  • Is this faster even if there is a significant probability that the document will exist?

    • Not really sure that makes a difference if you still have to check.

  • shellbye

    I found that if you simply use find vs findOne, find is really fast, but just find alone did’t tell you if the document is exist. You have to use cursor.hasNext(), and then you do use it, it well slow down the find, and even slower than findOne. If you’re interesting my test is http://shellbye.com/blog/tech_world/mongodb_findOne_vs_find/ . You can read the code even you don’t know Chinese.

  • great piece. thanks!