How to Monitor Memcached
Last Modified: 12th Jul 2016
Memcached is one of the most popular key-value databases out there. Its main use case is to serve as an intermediary “cache” between the front end (apps and APIs) and the back end (external databases).
Unlike other key-value stores such as Redis, Memcached is completely volatile, i.e. all data is held only in memory and never persisted to disk. As a direct consequence, the entire cache is emptied when you restart the Memcached service.
With that in mind, here is our very own checklist of best practices (#howto), including key Memcached metrics and alerts we monitor with Server Density.
#howto Monitor Memcached: Metrics and Alerts
Memcached is, by far, one of the simpler databases. Even so, there is no shortage of metrics you can monitor. A prerequisite for successful monitoring is to select the few metrics you care about enough to let them pester you with alerts and notifications.
Our rule of thumb here at Server Density is, “collect all metrics that help with troubleshooting, alert only on those that require an action.”
As with any other database, you need to monitor a few broad conditions:
- All required processes are running as expected
- Resource usage is within limits
- Queries are executed successfully
- Typical failure points
Let’s take a look at each of those categories and “flesh them out” with some specifics.
Memcached process running
|Metric|Comments|Suggested alert|
|---|---|---|
|Memcached process|Is the right binary daemon process running?|When process /usr/bin/memcached count != 1.|
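To script the process check above, a minimal Python sketch could look like the following. It assumes a Linux host (it reads /proc/<pid>/comm) and matches on the process name memcached rather than the full binary path; list_process_names and process_alert are hypothetical helper names. If you deliberately run several instances per host, change the expected count.

```python
import os


def list_process_names(proc_root: str = "/proc") -> list[str]:
    """Return the command name of every running process (Linux /proc only)."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # the process exited (or is unreadable) while we scanned
    return names


def process_alert(names: list[str], binary: str = "memcached") -> bool:
    """True when the number of processes named `binary` is not exactly one."""
    return sum(1 for n in names if n == binary) != 1


if __name__ == "__main__" and os.path.isdir("/proc"):
    if process_alert(list_process_names()):
        print("ALERT: expected exactly one memcached process")
```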
Monitoring Memcached resource usage
The metrics listed below are the “usual suspects” behind most issues and bottlenecks. They also correspond to the top system resources you should monitor on pretty much any database server.
|Metric|Comments|Suggested alert|
|---|---|---|
|Load|An all-in-one performance metric.|When load is > factor x (number of cores). Our suggested factor is 4.|
|CPU usage|High CPU usage is not necessarily a bad thing, as long as you don’t reach the limit.|None|
|Memory usage|Memcached runs entirely in RAM. Make sure you reserve at least limit_maxbytes of RAM for it.|None|
|Swap usage|If your system resorts to disk swapping, you lose the main benefit of Memcached. Don’t swap.|When used swap is > 128MB.|
|Network bandwidth|Memcached servers can potentially incur high network usage. Keep an eye on this, especially if you notice any performance degradation. Also look out for dropped packets and network errors.|None|
|Disk usage|Make sure you have free space for system temporary files, snapshots and backups.|When disk is > 85% usage.|
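The three alerting thresholds in the table (load, swap and disk) can be checked with a short Python sketch that uses only the standard library. It assumes Linux (used swap is read from /proc/meminfo) and that the root filesystem / is the disk you care about; resource_alerts and swap_used_bytes are hypothetical helper names, not part of any monitoring tool.

```python
import os
import shutil

# Thresholds taken from the table above; adjust to taste.
LOAD_FACTOR = 4                      # alert when load > 4 x number of cores
SWAP_LIMIT_BYTES = 128 * 1024 ** 2   # 128MB of used swap
DISK_LIMIT_PCT = 85.0                # percent of the filesystem in use


def resource_alerts(load1: float, cores: int, swap_used: int,
                    disk_used_pct: float) -> list[str]:
    """Return the names of the resource checks that should fire."""
    alerts = []
    if load1 > LOAD_FACTOR * cores:
        alerts.append("load")
    if swap_used > SWAP_LIMIT_BYTES:
        alerts.append("swap")
    if disk_used_pct > DISK_LIMIT_PCT:
        alerts.append("disk")
    return alerts


def swap_used_bytes(meminfo_path: str = "/proc/meminfo") -> int:
    """Used swap in bytes, from Linux /proc/meminfo (which reports kB)."""
    kb = {}
    with open(meminfo_path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("SwapTotal", "SwapFree"):
                kb[key] = int(rest.split()[0])
    return (kb["SwapTotal"] - kb["SwapFree"]) * 1024


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    load1, _, _ = os.getloadavg()        # 1-minute load average
    disk = shutil.disk_usage("/")        # assumes / is the relevant disk
    print(resource_alerts(load1, os.cpu_count() or 1, swap_used_bytes(),
                          100.0 * disk.used / disk.total))
```

CPU, memory and network usage have no alert in the table, so the sketch only collects the three values that do.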
Hard disk is the most common bottleneck in database servers. That is not the case for Memcached, of course, since everything is stored in RAM. Still, we need to keep memory usage under control: if RAM is depleted, the operating system starts swapping to disk, an operation that takes a serious performance toll on the entire server.
Monitoring Memcached availability and connections
These metrics will inform you if Memcached is working and accepting commands as expected.
|Metric|Comments|Suggested alert|
|---|---|---|
|uptime|Seconds since the server was started. You can use this to detect respawns.|When uptime is < 180 seconds.|
|curr_connections|Number of connected clients. If there are none (or too many), something is wrong.|None|
|listen_disabled_num|Quite an elusive name, don’t you think? In any case, this metric counts the number of times (i) the maximum number of connections was reached and (ii) a new connection had to wait in the queue as a result. By the way, accepting_conns is another metric with the same goal.|When new queued connections per minute > 5 (adjust depending on your latency requirements).|
|conn_yields|Number of times a client connection was throttled. When sending GETs in batch mode, a connection that contains too many requests (limited by the -R parameter) might be throttled to prevent starvation of other clients.|When yielded connections per minute > 5 (adjust depending on your latency requirements).|
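Because listen_disabled_num and conn_yields are cumulative counters, the per-minute alerts above compare two stats snapshots taken some seconds apart. The sketch below assumes both snapshots are already parsed into dicts of integers keyed by stat name; connection_alerts and per_minute are hypothetical helpers, and the limits simply mirror the table.

```python
# Limits mirror the suggested alerts in the table above.
RESPAWN_SECONDS = 180        # uptime below this suggests a recent restart
QUEUED_PER_MIN_LIMIT = 5     # listen_disabled_num increase per minute
YIELDS_PER_MIN_LIMIT = 5     # conn_yields increase per minute


def per_minute(before: dict, after: dict, key: str, elapsed_s: float) -> float:
    """Increase of a cumulative counter between two snapshots, scaled to one minute."""
    return (after[key] - before[key]) * 60.0 / elapsed_s


def connection_alerts(before: dict, after: dict, elapsed_s: float) -> list[str]:
    """Compare two parsed `stats` snapshots taken elapsed_s seconds apart."""
    alerts = []
    if after["uptime"] < RESPAWN_SECONDS:
        alerts.append("uptime")
        # A low uptime means the counters may have been reset between
        # snapshots, so the rate checks below would be meaningless.
        return alerts
    if per_minute(before, after, "listen_disabled_num", elapsed_s) > QUEUED_PER_MIN_LIMIT:
        alerts.append("listen_disabled_num")
    if per_minute(before, after, "conn_yields", elapsed_s) > YIELDS_PER_MIN_LIMIT:
        alerts.append("conn_yields")
    return alerts
```

For example, if listen_disabled_num grows from 10 to 20 over 60 seconds, that is 10 queued connections per minute and the check fires.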
Memcached is a simple service with just a few failure points. Make sure you have enough allocated RAM (limit_maxbytes). Also keep an eye on network traffic and errors.
Monitoring Memcached queries
These metrics track whether the database does what it’s meant to do: cache keys and values (no surprises there).
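|Metric|Comments|Suggested alert|
|---|---|---|
|Commands|Number of executed commands (GET/SET), reported as cmd_get and cmd_set.|None|
|Reads / writes|Bytes read from and written to the network by the server (bytes_read / bytes_written).|None|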
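cmd_get, cmd_set, bytes_read and bytes_written are also cumulative since the server started, so they are usually graphed as per-second rates. A minimal sketch, assuming two parsed stats snapshots: it uses Memcached's own time stat (the server clock, in seconds) for the interval and returns an empty result when uptime went down, since a restart resets every counter. The throughput function is a hypothetical name.

```python
COUNTERS = ("cmd_get", "cmd_set", "bytes_read", "bytes_written")


def throughput(before: dict, after: dict) -> dict:
    """Per-second rates of cumulative counters between two stats snapshots."""
    if after["uptime"] < before["uptime"]:
        return {}  # server restarted in between: counters were reset
    elapsed = after["time"] - before["time"]
    if elapsed <= 0:
        return {}  # snapshots too close together to compute a rate
    return {key: (after[key] - before[key]) / elapsed for key in COUNTERS}
```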
Monitoring Memcached cache performance
Here are some key performance metrics to keep an eye on. Have we missed a metric? Let us know in the comments.
|Metric|Comments|Suggested alert|
|---|---|---|
|Hit rate|This is a calculated metric: get_hits / cmd_get. It indicates how efficient your Memcached server is.|None|
|Evictions|An eviction happens when an item that still has time to live is removed from the cache because a brand-new item needs to be allocated. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate means your application is setting a large number of keys that are never used again.| |
|Fill percent|This is a calculated metric: used bytes (the bytes stat) / limit_maxbytes. If you are getting close to 100%, you will probably start experiencing evictions. Consider increasing the cache size, although you can still get evictions when the slab class an item belongs to is full, even if there is free space in other slabs.|None|
|Command flush (cmd_flush)|The flush_all command invalidates all items in the cache. This operation incurs a performance penalty and shouldn’t take place in production, so check your debug scripts.|None|
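The two calculated metrics in this table are simple ratios over a single stats snapshot. A hedged Python sketch, assuming the snapshot is already parsed into a dict of integers; cache_health is a hypothetical name. Note that evictions and cmd_flush are cumulative since startup, so in practice you would watch how they change between snapshots rather than their absolute values.

```python
def cache_health(stats: dict) -> dict:
    """Derived cache metrics from a single parsed `stats` snapshot."""
    gets = stats["cmd_get"]
    return {
        # Fraction of GET requests served from cache; None before any GETs arrive.
        "hit_rate": stats["get_hits"] / gets if gets else None,
        # How full the cache is relative to its configured size (limit_maxbytes).
        "fill_percent": 100.0 * stats["bytes"] / stats["limit_maxbytes"],
        # Cumulative counters, passed through for trend tracking.
        "evictions": stats["evictions"],
        "flushes": stats["cmd_flush"],
    }
```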
Memcached Monitoring Tools
There are several contenders out there. Below are the most popular ones.
Connect to your Memcached server via telnet and run the stats command. What you get in response is a list of the main metrics:
```
$ telnet localhost 11211
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
STAT pid 1602
STAT uptime 3995057
[...]
```
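If you want the same numbers in a script rather than an interactive telnet session, the stats command can be sent over a plain TCP socket using Memcached's text protocol: the server answers with one STAT <name> <value> line per metric and finishes with END. A minimal Python sketch, assuming an unauthenticated server on localhost:11211 (the default port used above); fetch_stats and parse_stats are hypothetical helper names.

```python
import socket


def parse_stats(raw: str) -> dict:
    """Parse 'STAT <name> <value>' lines from a memcached `stats` response."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) != 3 or parts[0] != "STAT":
            continue  # skips the trailing END line and anything unexpected
        name, value = parts[1], parts[2]
        try:
            stats[name] = int(value)
        except ValueError:
            try:
                stats[name] = float(value)   # e.g. rusage_user
            except ValueError:
                stats[name] = value          # e.g. version "1.4.25"
    return stats


def fetch_stats(host: str = "localhost", port: int = 11211,
                timeout: float = 2.0) -> dict:
    """Send `stats` over the text protocol and read until the END marker."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))
```

With a server running, fetch_stats()["uptime"] returns the same value as the STAT uptime line in the telnet session.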
This might suffice. For anything more advanced, however, our humble “poor man’s script” won’t cut it.
So let’s take a look at some tools and solutions the Memcached community has come up with.
memcache-top displays statistics about items usage, hit rates, connections, wait times, evicts, commands and read/writes in a top-like interface.
```
memcache-top v0.6 (default port: 11211, color: on, refresh: 3 seconds)

INSTANCE            USAGE   HIT %   CONN    TIME    EVICT/s GETS/s  SETS/s  READ/s  WRITE/s
10.50.11.5:11211    88.9%   69.7%   1661    0.9ms   0.3     47      9       13.9K   9.8K
10.50.11.5:11212    88.8%   69.9%   2121    0.7ms   1.3     168     10      17.6K   68.9K
10.50.11.5:11213    88.9%   69.4%   1527    0.7ms   1.7     48      16      14.4K   13.6K
10.50.12.5:11211    89.4%   81.9%   1406    1.6ms   1.0     26      11      7800    4059
10.50.12.5:11212    89.5%   69.5%   2066    1.8ms   0.7     149     8       8892    153.8K
10.50.12.5:11213    89.4%   69.4%   1430    1.4ms   2.0     25      12      6564    6386
10.50.15.5:11211    89.5%   71.9%   2359    0.8ms   1.3     46      11      13.4K   18.5K
10.50.15.5:11212    89.5%   69.3%   1298    0.8ms   1.0     24      5       6976    9140
10.50.15.5:11213    89.4%   85.0%   1412    0.9ms   2.3     30      15      13.6K   26.4K
10.50.9.90:11211    88.1%   68.3%   1471    0.7ms   3.7     39      14      22.5K   16.0K
10.50.9.90:11212    64.4%   91.2%   2321    0.7ms   0.0     191     11      28.4K   16.5K
10.50.9.90:11213    61.0%   58.7%   1380    0.7ms   0.0     32      12      9707    21.1K

AVERAGE:            84.7%   72.9%   1704    1.0ms   1.3     69      11      13.5K   30.3K

TOTAL:              19.9GB/ 23.4GB  20.0K   11.7ms  15.3    826     132     162.6K  363.6K
```
Press ctrl-c to quit.
Not enough? Need more details? Here’s the good news: the monitoring hackers at Etsy developed mctop. It provides a top-like way to monitor calls and bandwidth usage for the most used Memcached keys. The not-so-good news is that it doesn’t look like they maintain it anymore.
Looking for something more graphical? You probably know phpMyAdmin for MySQL and phpPgAdmin for PostgreSQL. Well, someone decided to build phpMemcachedAdmin for Memcached. What you get is a web interface featuring performance statistics and graphs for single servers and clusters alike.
Last but certainly not least, if you are looking for a hosted, robust monitoring solution that covers your entire infrastructure (including Memcached), you should sign up for a 2-week trial of Server Density (yes, that’s us!).
Did this article pique your interest in Memcached? Nice, keep reading. We’ve found Memcached for dummies to be a great starting point. It provides a solid understanding of memory allocation, so we think it’s a must-read. Scaling Memcached at Facebook is a good read for the more advanced users out there.
What about you? Do you have a checklist of best practices for monitoring Memcached? What databases do you have in place and how do you monitor them? Any books you can suggest?