
MongoDB Monitoring: db.serverStatus()

[Image: Map of traffic in central Tokyo]

MongoDB includes a number of built-in tools and commands for getting important information out of the database, but because it is relatively new, it can be difficult to know what you should be doing from an operational perspective to make sure everything runs smoothly.

This is the 3rd in a series of 6 posts about MongoDB monitoring based on a talk I gave at the MongoSV 2010 conference. View the series index here.

MongoDB Monitoring dashboard + alerting

We provide a MongoDB monitoring service that automatically keeps an eye on the health and performance of your MongoDB cluster and alerts you when something goes wrong. Find out more here.

db.serverStatus()

The server status command provides a lot of different statistics that can help you, like the map of traffic in central Tokyo (above).

> db.serverStatus()
{
	"version" : "1.7.3-pre-",
	"uptime" : 3250008,
	"uptimeEstimate" : 3235318,
	"localTime" : ISODate("2010-12-17T21:42:43.104Z"),
	"globalLock" : {
		"totalTime" : 3250006656275,
		"lockTime" : 362596659531,
		"ratio" : 0.11156797443196338,
		"currentQueue" : {
			"total" : 0,
			"readers" : 0,
			"writers" : 0
		},
		"activeClients" : {
			"total" : 2,
			"readers" : 1,
			"writers" : 1
		}
	},
	"mem" : {
		"bits" : 64,
		"resident" : 6895,
		"virtual" : 479302,
		"supported" : true,
		"mapped" : 450709
	},
	"connections" : {
		"current" : 2510,
		"available" : 5490
	},
	"extra_info" : {
		"note" : "fields vary by platform",
		"heap_usage_bytes" : 168490704,
		"page_faults" : 86613306
	},
	"indexCounters" : {
		"btree" : {
			"accesses" : 51390266,
			"hits" : 51330074,
			"misses" : 60192,
			"resets" : 0,
			"missRatio" : 0.001171272396216046
		}
	},
	"backgroundFlushing" : {
		"flushes" : 54164,
		"total_ms" : 46544657,
		"average_ms" : 859.3282807768998,
		"last_ms" : 262,
		"last_finished" : ISODate("2010-12-17T21:42:06.728Z")
	},
	"cursors" : {
		"totalOpen" : 3,
		"clientCursors_size" : 3,
		"timedOut" : 2488
	},
	"repl" : {
		"setName" : "set1",
		"ismaster" : true,
		"secondary" : false,
		"hosts" : [
			"rs1a:27018",
			"rs1d:27018",
			"rs1c:27018",
			"rs1b:27018"
		],
		"arbiters" : [
			"rs1arbiter:27018"
		]
	},
	"opcounters" : {
		"insert" : 171934553,
		"query" : 4774811,
		"update" : 76778015,
		"delete" : 10308,
		"getmore" : 157244849,
		"command" : 86606450
	},
	"asserts" : {
		"regular" : 0,
		"warning" : 11,
		"msg" : 0,
		"user" : 2311130,
		"rollovers" : 0
	},
	"writeBacksQueued" : true,
	"ok" : 1
}

Connections

Every connection to the database has an overhead. You want to keep this number down by using persistent connections through the drivers. The total number of connections available is limited by the number of file descriptors, and how many you need can be estimated with this formula:

Total file descriptors needed = (1 + total nodes) * incoming connections

where total nodes is every mongod the process connects to (e.g. 3 replica sets of 4 members each = 12 nodes).
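You can see where you stand at any time by pulling just the connections sub-document out of serverStatus() in the shell (the figures below are from the example output above):

> db.serverStatus().connections
{ "current" : 2510, "available" : 5490 }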

If you run out of available connections you’ll have a problem, which will look something like this in the logs:

Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname 
Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 } 
MessagingPort say send() errno:9 Bad file descriptor (NONE) 
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception 
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE) 
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception 
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE) 
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception 
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname 
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018 
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname 
Fri Nov 19 17:24:34 [conn2343] reconnect rs2d:27018 failed 
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)

On Red Hat systems this limit is set to 1024 by default, which can be too low. MongoDB has documentation about this, but in short you can raise the limit by editing the /etc/security/limits.conf file. You also need to set UsePAM to yes in /etc/ssh/sshd_config for the new limit to take effect when you log in as your user.
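As a rough sketch (assuming mongod runs under a user called mongod, and picking an arbitrary limit of 20000), the entries in /etc/security/limits.conf would look something like this; after logging back in, ulimit -n should report the new limit:

# Raise the open file limit for the mongod user (example values; adjust for your deployment)
mongod    soft    nofile    20000
mongod    hard    nofile    20000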

Index counters

	"indexCounters" : {
		"btree" : {
			"accesses" : 51390266,
			"hits" : 51330074,
			"misses" : 60192,
			"resets" : 0,
			"missRatio" : 0.001171272396216046
		}
	},

The miss ratio is the number to look at here. If you’re seeing a lot of index misses, you need to look at your queries to see whether they’re making optimal use of the indexes you’ve created. Consider adding new indexes and seeing if your queries run faster as a result. You can use explain() to see which indexes your queries are hitting, along with the total execution time, so you can benchmark them before and after.
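For example, in the shell (using a hypothetical users collection with an index on email):

> db.users.find({ email : "bob@example.com" }).explain()
{
	"cursor" : "BtreeCursor email_1",
	"nscanned" : 1,
	"nscannedObjects" : 1,
	"n" : 1,
	"millis" : 0
}

A cursor of BtreeCursor email_1 means the query used the index on email; BasicCursor would mean a full collection scan.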

Background flushing

	"backgroundFlushing" : {
		"flushes" : 54164,
		"total_ms" : 46544657,
		"average_ms" : 859.3282807768998,
		"last_ms" : 262,
		"last_finished" : ISODate("2010-12-17T21:42:06.728Z")
	},

The server status output lets you see the last time data was flushed to disk and how long that took. This is useful for seeing whether you’re causing high disk load, but it also lets you monitor how often data is being written. Remember that until data has been synced to disk, you could lose it in the event of a crash or power outage.
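average_ms is just total_ms divided by flushes, so you can recalculate it (and watch how it moves over time) directly from the shell:

> var bf = db.serverStatus().backgroundFlushing
> bf.total_ms / bf.flushes
859.3282807768998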

[Image: Alethiometer]

Opcounters

The op counters – inserts, updates, deletes and queries – are fun to look at, especially if the numbers are high. But be careful: on their own they’re just vanity metrics. There are some things you can use them for though.
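The counters are totals since the server started, so to get something actionable you need to sample them over an interval and look at the difference – mongostat does this for you, but a rough sketch in the shell looks like this:

// Sample the opcounters twice, 10 seconds apart, and print the rate for each type
var before = db.serverStatus().opcounters;
sleep(10000); // sleep() takes milliseconds in the mongo shell
var after = db.serverStatus().opcounters;
for (var op in after) {
	print(op + ": " + (after[op] - before[op]) / 10 + " per second");
}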

If you have a high number of inserts and updates, i.e. writes, then you may want to look at your fsync time setting. By default MongoDB will flush to disk every 60 seconds, but if you’re doing thousands of writes per second you might want to flush sooner for durability. Of course, you can also ensure the write has happened from within the driver by using the safe option, which calls getLastError.
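The flush interval is controlled by mongod’s --syncdelay option (60 seconds by default). From the shell, the equivalent of a safe write is running getLastError yourself after the insert – adding fsync : true also forces a flush to disk before it returns (the collection name below is just an example, and the exact response fields vary by version):

> db.events.insert({ type : "example" })
> db.runCommand({ getlasterror : 1, fsync : true })
{ "err" : null, "n" : 0, "ok" : 1 }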

Queries can show whether you need to offload reads to your slaves, which can be done through the drivers, so that you’re spreading the load across your servers and only writing to the master.
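How you mark reads as slave-ok depends on the driver; in the mongo shell it’s a one-liner (the users query is just an illustration):

> rs.slaveOk()
> db.users.find({ email : "bob@example.com" })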

Deletes can also cause concurrency problems if you’re doing a large number of them and the database keeps having to yield.

The others

We’ve skipped a few sections here – globalLock and cursors are more useful to watch in real time using mongostat, and repl is covered by the rs.status() command. All of these are covered in other posts in this series.

Stay tuned

This is the 3rd post in a series of 6 on MongoDB Monitoring. View the full post index and don’t forget to subscribe via RSS or Twitter to be notified of new posts and other cool stuff!