Premium Hosted Website & Server Monitoring Tool.

(Sysadmin / Devops blog)

visit our website

Blog   >   MongoDB   >   Checking if a document exists – MongoDB slow findOne vs find

Checking if a document exists – MongoDB slow findOne vs find

server-density-and-mongodb

One of the biggest optimisations to make MongoDB writes faster is to avoid moving documents by doing updates in place to a preallocated document. You only want to preallocate if the document doesn’t exist and to check this you need to do a query.

The findOne() method seems like the right choice for this because you can query the relevant index and check to see if a document or None is returned, ideally using a covered index.

However, it is significantly faster to use find() + limit() because findOne() will always read + return the document if it exists. find() just returns a cursor (or not) and only reads the data if you iterate through the cursor.

So instead of:

db.collection.findOne({_id: "myId"}, {_id: 1})

you should use:

db.collection.find({_id: "myId"}, {_id: 1}).limit(1)

By making this change I saw a change in performance of 2 orders of magnitude:

Query performance after switching from findOne() to find(). Can't even see the response time of the find() call.

Query performance after switching from findOne() to find(). Can’t even see the response time of the find() call.

Same graph with the findOne series hidden. Less than 1ms response time.

Same graph with the findOne series hidden. Less than 1ms response time.

  • http://twitter.com/wpeterson Winfield
  • http://twitter.com/xXstandstillXx Machika Kara Kuro

    Has anyone every tried using using the findOne command and then only requesting the _id back, my only fear with the find command is that the cursor means extra waiting file descriptors.

    • http://blog.serverdensity.com/ David Mytton

      Yes. Although it’s not shown in the example above the actual code that produced the graphs was db.collection.findOne({_id: “myId”}, {_id: 1})

      • http://twitter.com/spf13 Steve Francia

        David,
        You talk about using covered indexes but then your examples clearly don’t and could account for even more of the difference. In the comment here you identify the actual difference, but I read this first through an RSS reader which didn’t pull in the comments.

        Other than that, this is a good post and quite informative.

        • http://blog.serverdensity.com/ David Mytton

          I’ve adjusted the code in the post so it includes the {_id: 1} projection since that is what actually produced the graphs, and yes otherwise covered indexes won’t be used!

  • http://twitter.com/karptonite Daniel Karp

    Has anyone gotten this to work with the PHP MongoDB driver? It seems that that MongoCollection::find() always returns a MongoCursor object, regardless of whether any items were found, and that any method I use to test whether any items were found (hasNext(), count(true), etc), make it slower than using findOne. Am I missing something here?

    • http://blog.serverdensity.com/ David Mytton

      This post was written based on the latest PHP MongoDB driver at the time of writing: 1.3.4. The find() just returning the cursor is quicker because it doesn’t have to return the document for you to discover if it exists or not. findOne() always returns a document so has the additional overhead. Even with the additional call to count() and checking if that is empty or not, it’s faster.

      • http://twitter.com/karptonite Daniel Karp

        So you do an additional call to count( true ) after the mongo cursor is returned? Any chance you can share the code you used for benchmarking? My test, using, I think, the same configuration, didn’t show any improvement, but I wouldn’t be surprised if I was doing it wrong.

        • http://blog.serverdensity.com/ David Mytton

          The graphs are measuring the actual call to find() vs findOne() rather than any additional calls afterwards. The code is:

          $hours = $hoursDB->selectCollection('hours');
          $exists = $hours->find(array('_id' => $_id), array('_id'))->limit(1);
          $exists = $exists->count();
          if (empty($exists))
          {

          ...

          Note this is querying on _id and only returning _id so can use a covered index.

          • http://twitter.com/karptonite Daniel Karp

            Hmm… Well, it seems to me that the graphs are comparing apples and oranges. If you are suggesting that using find() is a good way to speed up querying whether a document exists, you should include all of the queries needed to find out if it exists. I’m not even sure that find() alone even goes to the db before you’ve called count() or hasNext() or something–it might just make the MongoCursor object locally. That would explain why the benchmark was so fast for find alone.

          • http://blog.serverdensity.com/ David Mytton

            Even combining the find() with count() was faster than the findOne(). I’d need to instrument each call separately to find out exactly how much but the performance change on the entire execution time was significant.

          • http://twitter.com/karptonite Daniel Karp

            So I did run a test comparing findOne() vs find() with count(), and I found that find() with count() was very slightly slower, but my test setup was pretty crude. I’d be interested in seeing how much better the performance is overall when you include the count() in your benchmarks.

          • http://twitter.com/jmikola Jeremy Mikola

            I came to the same conclusion with this benchmark: https://gist.github.com/jmikola/5208918

            For covered-index queries, findOne() was a tiny bit faster, but both methods perform roughly the same. The projection is actually irrelevant for find()->limit(1)->count(true) case, because it’s going to issue a count command. A non-covered query using findOne() will always perform worse, and very much so in my benchmark where each document contains a 4MB string.

          • http://blog.serverdensity.com/ David Mytton

            So the important thing is using a covered index or not, which applies generally anyway. But in the case where you just want to find whether a document exists it’s particularly important.

          • http://twitter.com/jmikola Jeremy Mikola

            The MongoCursor is constructed locally, and the driver will not actually communicate with the server until you begin iterating, count, or explain. Along those lines, you’re allowed to apply sort, limit, skip and other such flags up until the point we hit the server and actually execute something.

  • Cang Do

    how Can i check document is exists with find command ? with findOne I need check is null. But With find I need hasNext or else ?

    Do you know that hasNext is also very heavy ?
    Please help

  • http://craftycode.co.za/ Pieter Willem Jordaan

    Is this faster even if there is a significant probability that the document will exist?

    • http://blog.serverdensity.com/ David Mytton

      Not really sure that makes a difference if you still have to check.