How to Monitor Memcached


Last Modified: 12th Jul 2016

Memcached is one of the most popular key-value databases out there. Its main use case is to serve as the intermediary "cache" between the front end (apps and APIs) and the back end (external databases).

Unlike some other key-value databases, such as Redis, which can persist data to disk, Memcached is completely volatile: all data lives in memory only. As a direct consequence, the entire cache is emptied whenever you restart the Memcached service.

With that in mind, here is our very own checklist of best practices (#howto), including key Memcached metrics and alerts we monitor with Server Density.

#howto Monitor Memcached: Metrics and Alerts

Memcached is by far one of the simpler databases. Even so, there is no shortage of metrics you can monitor. A prerequisite for successful monitoring is selecting the few metrics you care about enough to let them pester you with alerts and notifications.

Our rule of thumb here at Server Density is, “collect all metrics that help with troubleshooting, alert only on those that require an action.”

As with any other database, you need to monitor a few broad conditions:

  1. All required processes are running as expected
  2. Resource usage is within limits
  3. Queries are executed successfully
  4. Typical failure points

Let’s take a look at each of those categories and “flesh them out” with some specifics.

Memcached process running

Metric | Comments | Suggested alert
Memcached process | Is the right binary daemon process running? | When the count of /usr/bin/memcached processes != 1.
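A minimal sketch of the process check, assuming a Linux host where every running process exposes its short name in /proc/<pid>/comm. It matches on the process name "memcached" rather than the full binary path. The helper names count_processes and process_alert are hypothetical and are not part of any Memcached tooling.

```python
from pathlib import Path


def count_processes(name: str, proc_root: str = "/proc") -> int:
    """Count processes whose short command name equals `name`.

    Linux-only: every numeric directory under /proc is a PID, and its
    `comm` file holds the process name (truncated by the kernel to 15 chars).
    """
    count = 0
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            count += 1
    return count


def process_alert(count: int, expected: int = 1) -> bool:
    """True when the number of running daemons differs from the expected count."""
    return count != expected
```

On a single-instance server, process_alert(count_processes("memcached")) is True both when the daemon is down and when a stray second copy is running. Hosts that deliberately run several instances on different ports would pass a larger `expected`.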

System Metrics

The metrics listed below are the “usual suspects” behind most issues and bottlenecks. They also correspond to the top system resources you should monitor on pretty much any database server.

Metric | Comments | Suggested alert
Load | An all-in-one performance metric. | When load > factor × number of cores. Our suggested factor is 4.
CPU usage | High CPU usage is not necessarily a bad thing, as long as you don't hit the limit. | None
Memory usage | Memcached runs entirely in RAM. Make sure you reserve at least limit_maxbytes of RAM for it. | None
Swap usage | If your system resorts to swapping to disk, you lose the main benefit of Memcached. Don't swap. | When used swap > 128 MB.
Network bandwidth | Memcached servers can generate heavy network traffic. Keep an eye on this, especially if you notice any performance degradation, and look out for dropped-packet errors. | None
Disk usage | Make sure you have free space for system temporary files, snapshots and backups. | When disk usage > 85%.

Disk is the most common bottleneck in database servers. That is not the case for Memcached, of course, since everything is stored in RAM. Still, you need to keep memory usage under control: if RAM is depleted, the server starts swapping to disk, which takes a serious performance toll on the entire server.
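The alert thresholds in the system metrics table can be evaluated with a few lines of Python. This is a sketch that assumes a Linux host. os.getloadavg and shutil.disk_usage are standard library calls, and swap usage is read from /proc/meminfo as SwapTotal minus SwapFree. The function names and constants are illustrative only.

```python
import os
import shutil

# Thresholds from the system metrics table; tune them for your own servers.
LOAD_FACTOR = 4
SWAP_LIMIT_BYTES = 128 * 1024 * 1024  # 128 MB
DISK_LIMIT_PCT = 85.0


def load_alert(load_1min: float, cores: int, factor: float = LOAD_FACTOR) -> bool:
    """Alert when the 1-minute load average exceeds factor x number of cores."""
    return load_1min > factor * cores


def swap_alert(swap_used: int, limit: int = SWAP_LIMIT_BYTES) -> bool:
    """Alert when more than `limit` bytes of swap are in use."""
    return swap_used > limit


def disk_alert(used: int, total: int, limit_pct: float = DISK_LIMIT_PCT) -> bool:
    """Alert when used / total (as reported by shutil.disk_usage) exceeds limit_pct."""
    return 100.0 * used / total > limit_pct


def swap_used_bytes(meminfo_path: str = "/proc/meminfo") -> int:
    """Linux-only: SwapTotal - SwapFree from /proc/meminfo (reported in kB)."""
    fields = {}
    with open(meminfo_path) as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])
    return (fields["SwapTotal"] - fields["SwapFree"]) * 1024


def check_system(path: str = "/") -> dict:
    """Collect current values and evaluate each alert (Linux only)."""
    load_1min, _, _ = os.getloadavg()
    disk = shutil.disk_usage(path)
    return {
        "load": load_alert(load_1min, os.cpu_count() or 1),
        "swap": swap_alert(swap_used_bytes()),
        "disk": disk_alert(disk.used, disk.total),
    }
```

Keeping the threshold checks as pure functions, separate from data collection, makes them easy to test and to reuse with values collected by another agent.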

Memcached Metrics

Monitoring Memcached availability and connections

These metrics will inform you if Memcached is working and accepting commands as expected.

Metric | Comments | Suggested alert
uptime | Seconds since the server was started. You can use this to detect respawns. | When uptime < 180 seconds.
curr_connections | Number of connected clients. If there are none (or too many), something is wrong. | None
listen_disabled_num | Quite an elusive name, don't you think? This counter increases each time the server reaches its maximum number of connections and stops accepting new ones, so new connections have to wait in the queue. A related metric, accepting_conns, reports whether the server is currently accepting new connections (1) or not (0). | When new queued connections per minute > 5 (adjust to your latency requirements).
conn_yields | Number of times a client connection was throttled. When GETs are sent in batch mode and a connection carries too many requests (limited by the -R parameter), the connection may be throttled to prevent starving other clients. | When yielded connections per minute > 5 (adjust to your latency requirements).
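Both listen_disabled_num and conn_yields are cumulative counters, so the per-minute alerts need two samples. The following sketch assumes each sample is already a dict of numeric stats values, for example parsed from the output of the stats command. It uses the difference in uptime as the elapsed time, so a restart (which resets all counters) is detected instead of being misread as a negative rate. The function names are hypothetical.

```python
from __future__ import annotations

UPTIME_MIN_SECONDS = 180   # alert threshold from the table above
PER_MINUTE_LIMIT = 5       # adjust to your latency requirements


def per_minute(prev: dict, curr: dict, counter: str) -> float | None:
    """Per-minute rate of a cumulative counter between two stats samples.

    Elapsed time is taken from the uptime difference. Returns None when the
    server restarted between samples (uptime or the counter went backwards).
    """
    elapsed = curr["uptime"] - prev["uptime"]
    delta = curr[counter] - prev[counter]
    if elapsed <= 0 or delta < 0:
        return None
    return delta * 60 / elapsed


def availability_alerts(prev: dict, curr: dict) -> list[str]:
    """Evaluate the uptime, listen_disabled_num and conn_yields alerts."""
    alerts = []
    if curr["uptime"] < UPTIME_MIN_SECONDS:
        alerts.append(f"uptime < {UPTIME_MIN_SECONDS}s: memcached restarted recently")
    for counter in ("listen_disabled_num", "conn_yields"):
        rate = per_minute(prev, curr, counter)
        if rate is not None and rate > PER_MINUTE_LIMIT:
            alerts.append(f"{counter}: {rate:.1f}/min")
    return alerts
```

For example, if listen_disabled_num grows from 10 to 20 over 60 seconds of uptime, the rate is 10 per minute and the alert fires.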

Typical Problems

Memcached is a simple service with just a few failure points. Make sure you have enough allocated RAM (limit_maxbytes). Also keep an eye on network traffic and errors.

Monitoring Memcached queries

These metrics track whether the database does what it’s meant to do: cache keys and values (no surprises there).

Metric | Comments | Suggested alert
Commands | Number of executed GET and SET commands (cmd_get, cmd_set). | None
Reads / writes | Bytes read from and written to the network (bytes_read, bytes_written). | None
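These counters only grow, so they are most useful as per-second rates. A sketch under the same assumption as before: each sample is a dict of numeric stats values, and the uptime difference serves as elapsed time. The function name throughput is hypothetical.

```python
from __future__ import annotations

# Cumulative counters reported by the stats command.
THROUGHPUT_COUNTERS = ("cmd_get", "cmd_set", "bytes_read", "bytes_written")


def throughput(prev: dict, curr: dict) -> dict[str, float] | None:
    """Per-second rates of GET/SET commands and network bytes between two samples.

    Returns None if the server restarted in between (counters reset to zero).
    """
    elapsed = curr["uptime"] - prev["uptime"]
    if elapsed <= 0:
        return None
    rates = {}
    for name in THROUGHPUT_COUNTERS:
        delta = curr[name] - prev[name]
        if delta < 0:
            return None  # counter went backwards: treat as a restart
        rates[name] = delta / elapsed
    return rates
```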

Monitoring Memcached cache performance

Here are some key performance metrics to keep an eye on. Have we missed a metric? Let us know in the comments.

Metric | Comments | Suggested alert
Hit rate | A calculated metric: get_hits / cmd_get. It indicates how efficiently your Memcached server serves reads. | None
Evictions | An eviction happens when an item that still has time to live is removed from the cache to make room for a new item. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate suggests your application is setting a large number of keys that are never used again. | None
Fill percent | A calculated metric: bytes / limit_maxbytes. If you get close to 100%, you will probably start seeing evictions; consider increasing the cache size. Note that you can still get evictions when one slab class is full even though other slab classes have free space. | None
Flush commands (cmd_flush) | The flush_all command invalidates all items in the cache. It incurs a performance penalty and shouldn't happen in production, so check your debug scripts. | None
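The calculated metrics above can be derived from two stats samples. This sketch assumes dicts of numeric stats values containing uptime, cmd_get, get_hits, bytes, limit_maxbytes, evictions and cmd_flush. As in the table, hit rate is computed over the server's lifetime; the function names are hypothetical.

```python
from __future__ import annotations


def hit_rate(stats: dict) -> float | None:
    """get_hits / cmd_get as a percentage; None before any GET was served."""
    if stats["cmd_get"] == 0:
        return None
    return 100.0 * stats["get_hits"] / stats["cmd_get"]


def fill_percent(stats: dict) -> float:
    """bytes / limit_maxbytes as a percentage."""
    return 100.0 * stats["bytes"] / stats["limit_maxbytes"]


def cache_report(prev: dict, curr: dict) -> dict:
    """Combine the calculated cache metrics from two stats samples."""
    report = {"hit_rate_pct": hit_rate(curr), "fill_pct": fill_percent(curr)}
    elapsed = curr["uptime"] - prev["uptime"]
    if elapsed > 0:  # skip interval metrics if the server restarted
        report["evictions_per_s"] = (curr["evictions"] - prev["evictions"]) / elapsed
        # Any increase in cmd_flush means someone ran flush_all.
        report["flush_all_seen"] = curr["cmd_flush"] > prev["cmd_flush"]
    return report
```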


Memcached Monitoring Tools

There are several contenders out there. Below are the most popular ones.

Memcached stats

Connect to your Memcached server via telnet and run the stats command. The response contains the main metrics:

$ telnet localhost 11211

Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
STAT pid 1602
STAT uptime 3995057
[...]
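The same stats request can be scripted with Python's standard socket module instead of typing it into telnet. This is a sketch: it sends stats over the text protocol, reads until the final END line, and converts numeric values. localhost:11211 is assumed as the default address, and the helper names are hypothetical. It does not handle error replies, so a non-memcached peer that never sends END would block until the timeout.

```python
from __future__ import annotations

import socket


def _convert(value: str) -> int | float | str:
    """Turn '42' into 42 and '0.5' into 0.5; leave e.g. '1.4.25' as a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_stats(sock: socket.socket) -> dict:
    """Send 'stats' over an open connection and parse the reply.

    The text protocol answers with one 'STAT <name> <value>' line per metric,
    followed by a final 'END' line; every line ends in CRLF.
    """
    sock.sendall(b"stats\r\n")
    buf = b""
    while not buf.endswith(b"END\r\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before END")
        buf += chunk
    stats = {}
    for line in buf.decode().splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = _convert(parts[2])
    return stats


def fetch_stats(host: str = "localhost", port: int = 11211, timeout: float = 5.0) -> dict:
    """Open a TCP connection to memcached and return its stats as a dict."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_stats(sock)
```

Calling fetch_stats() periodically yields the kind of dict samples that rate and alert calculations can work with.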

This might suffice. For anything more advanced, however, our humble “poor man’s script” won’t cut it.

So let’s take a look at some tools and solutions the Memcached community has come up with.

memcache-top

memcache-top displays statistics about items usage, hit rates, connections, wait times, evicts, commands and read/writes in a top-like interface.

memcache-top v0.6       (default port: 11211, color: on, refresh: 3 seconds)

INSTANCE                USAGE   HIT %   CONN    TIME    EVICT/s GETS/s  SETS/s  READ/s  WRITE/s
10.50.11.5:11211        88.9%   69.7%   1661    0.9ms   0.3     47      9       13.9K   9.8K
10.50.11.5:11212        88.8%   69.9%   2121    0.7ms   1.3     168     10      17.6K   68.9K
10.50.11.5:11213        88.9%   69.4%   1527    0.7ms   1.7     48      16      14.4K   13.6K
10.50.12.5:11211        89.4%   81.9%   1406    1.6ms   1.0     26      11      7800    4059
10.50.12.5:11212        89.5%   69.5%   2066    1.8ms   0.7     149     8       8892    153.8K
10.50.12.5:11213        89.4%   69.4%   1430    1.4ms   2.0     25      12      6564    6386
10.50.15.5:11211        89.5%   71.9%   2359    0.8ms   1.3     46      11      13.4K   18.5K
10.50.15.5:11212        89.5%   69.3%   1298    0.8ms   1.0     24      5       6976    9140
10.50.15.5:11213        89.4%   85.0%   1412    0.9ms   2.3     30      15      13.6K   26.4K
10.50.9.90:11211        88.1%   68.3%   1471    0.7ms   3.7     39      14      22.5K   16.0K
10.50.9.90:11212        64.4%   91.2%   2321    0.7ms   0.0     191     11      28.4K   16.5K
10.50.9.90:11213        61.0%   58.7%   1380    0.7ms   0.0     32      12      9707    21.1K

AVERAGE:                84.7%   72.9%   1704    1.0ms   1.3     69      11      13.5K   30.3K

TOTAL:          19.9GB/ 23.4GB          20.0K   11.7ms  15.3    826     132     162.6K  363.6K

Press ctrl-c to quit.

Not enough? Need more details? Here’s the good news: the monitoring hackers at Etsy developed mctop. It provides a top-like way to monitor calls and bandwidth usage for the most used Memcached keys. The not-so-good news is that it doesn’t look like they maintain it anymore.

phpMemcachedAdmin

Looking for something more graphical? You probably know phpMyAdmin for MySQL and phpPgAdmin for PostgreSQL. Well, someone decided to build phpMemcachedAdmin for Memcached. What you get is a web interface featuring performance statistics and graphs for single servers and clusters alike.


Last but certainly not least, if you are looking for a hosted, robust monitoring solution that covers your entire infrastructure (including Memcached), you should sign up for a 2-week trial of Server Density (yes, that's us!).

Further Reading

Did this article pique your interest in Memcached? Nice, keep reading. We've found Memcached for dummies to be a great starting point. It provides a solid understanding of memory allocation, so we think it's a must-read. Scaling Memcached at Facebook is a good read for the more advanced users out there.

Summary

What about you? Do you have a checklist of best practices for monitoring Memcached? What databases do you have in place and how do you monitor them? Any books you can suggest?
