How to Monitor Redis

By David Mytton,
CEO & Founder of Server Density.

Published on the 9th June, 2016.

Redis is a key-value database, and one of the most popular NoSQL databases out there. Redis (REmote DIctionary Server) works in a similar fashion to memcached, albeit with a non-volatile dataset.

The dataset is stored entirely in memory (one of the reasons Redis is so fast) and can be periodically flushed to disk so that it survives restarts.

Redis also provides native support for manipulating and querying data structures such as lists, sets and hashes.

There are several well-known companies using Redis, including Twitter, GitHub, and Snapchat. While Redis is open source, there is good commercial support for it and some companies offer it as a fully managed service.

Typical use cases include:

  • Leaderboards/Counting: Redis is effective at incrementing scores or presenting the hall of fame in games. Here at Server Density we use it to set security limits on our API endpoints.
  • Queues: Redis is often used to build message/job queues, either with the native RPOPLPUSH command or with a language-specific library like RestMQ, PythonRQ, and RedisMQ.
  • Session cache: Redis has an LRU (Least Recently Used) key eviction policy.
  • Full page cache: PHP platforms such as Magento often use Redis in addition to an OpCode cache such as Zend OpCache.

We will now take a look at the most important metrics and alerts for monitoring Redis, using free tools, or Server Density.

Monitor Redis: metrics and alerts

Even in a simple service like Redis, there is no shortage of metrics you can monitor. The key to successful monitoring is to select the few we really care about, and care about enough to let them pester us with alerts and notifications.

Our rule of thumb here at Server Density is, “collect all metrics that help with troubleshooting, alert only on those that require an action.”

As with any other database, you need to monitor some broad conditions:

  1. Required processes are running as expected
  2. System resources usage is within limits
  3. Queries are executed successfully
  4. Service is performing properly
  5. Typical failure points

Let’s take a look at each category and flesh it out with some specifics.

1. Redis process running

These alerts will let us know if something basic is not in place, like a daemon not running or respawning all the time.

Metric | Comments | Suggested Alert
redis process | The right daemon binary is running. | When the redis-server process count != 1.
uptime | We want to make sure the service is not respawning all the time. | When uptime < 300s.
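
As a rough sketch of the uptime alert above, the check can be reduced to reading the uptime_in_seconds field from INFO output. The function name and the sample text are made up for illustration; the 300-second threshold is the one suggested in the table.

```python
def is_respawning(info_text: str, min_uptime_s: int = 300) -> bool:
    """Return True when the server has been up for less than min_uptime_s.

    info_text is raw INFO output. A missing uptime field is treated as an
    alert condition too, since we cannot show that the server is healthy.
    """
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("uptime_in_seconds:"):
            return int(line.split(":", 1)[1]) < min_uptime_s
    return True


# Hypothetical sample; real INFO output contains many more fields.
sample = "# Server\nredis_version:3.2.0\nuptime_in_seconds:42\n"
print(is_respawning(sample))
```

In practice you would feed this the output of redis-cli info (or your client library's equivalent) on every check interval.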

2. System Metrics

The metrics listed below are the “usual suspects” behind most issues and bottlenecks. They also correspond to the top system resources you should monitor on pretty much any in-memory DB server.

Metric | Comments | Suggested Alert
Load | An all-in-one performance metric. A high load will lead to performance degradation. | When load is > factor x (number of cores). Our suggested factor is 4.
CPU usage | High CPU usage is not a bad thing as long as you don’t reach the limit. | None
Memory usage | RAM usage depends on how many keys and values we keep in memory. Redis should fit in memory with plenty of room to spare for the OS. | None
Swap usage | Swap is for emergencies only. Don’t swap. A bit of swap is always in use, but if that grows, it’s an indicator of performance degradation. | When used swap is > 128MB.
Network bandwidth | Traffic is related to the number of connections and the size of those requests. Used for troubleshooting but not for alerting. | None
Disk usage | Make sure you always have free space for new data, logs, temporary files, snapshots or backups. | When disk is > 85% usage.

Hard disk I/O is the most common bottleneck in database servers. Thankfully, that is not the case for Redis, since all operations are performed in memory and only occasionally written asynchronously to permanent storage.
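
The load rule from the table (load > factor x number of cores) is simple enough to sketch in a few lines of Python. The function names are our own, and the local check assumes a Unix-like host, since os.getloadavg() is not available elsewhere.

```python
import os


def load_too_high(load_1m: float, cores: int, factor: float = 4.0) -> bool:
    """Alert when the load average exceeds factor x number of cores."""
    return load_1m > factor * cores


def local_load_too_high(factor: float = 4.0) -> bool:
    """Apply the same rule to the machine this script runs on (Unix only)."""
    load_1m, _load_5m, _load_15m = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return load_too_high(load_1m, cores, factor)


# A 4-core box with a 1-minute load of 17 crosses the 4 x 4 = 16 threshold.
print(load_too_high(17.0, cores=4))
```

Whether you compare against the 1, 5 or 15 minute average is a judgment call; the shorter windows react faster but are noisier.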

3. Monitoring Redis availability and queries

These metrics will inform you if Redis is working as expected.

Metric | Comments | Suggested Alert
connected_clients | Number of clients connected to Redis. Typically your application nodes rather than end users. | When connected_clients < the minimum number of application nodes/consumers in your stack.
keyspace | Total number of keys in your database. Useful alongside the hit rate when troubleshooting misses. | None
instantaneous_ops_per_sec | Number of commands processed per second. | None
hit rate (calculated) | keyspace_hits / (keyspace_hits + keyspace_misses) | None
rdb_last_save_time | Unix timestamp of the last save to disk, when using persistence. | When the time elapsed since rdb_last_save_time is > 3600 seconds (or your acceptable timeframe).
rdb_changes_since_last_save | Number of changes to the database since the last dump. Data that you would lose upon restart. | None
connected_slaves | Number of slaves connected to this master instance. | When connected_slaves != the number of slaves in your cluster.
master_last_io_seconds_ago | Seconds since the last interaction between slave and master. | When master_last_io_seconds_ago is > 30 seconds (or your acceptable timeframe).
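
Two of the entries above need a little arithmetic before they can be alerted on: the hit rate is derived from two counters, and rdb_last_save_time is a Unix timestamp, so what matters is how long ago it was. A minimal sketch, with function names of our own choosing and made-up sample numbers:

```python
import time
from typing import Optional


def hit_rate(keyspace_hits: int, keyspace_misses: int) -> Optional[float]:
    """keyspace_hits / (keyspace_hits + keyspace_misses), or None if no lookups yet."""
    total = keyspace_hits + keyspace_misses
    if total == 0:
        return None
    return keyspace_hits / total


def seconds_since_last_save(rdb_last_save_time: int,
                            now: Optional[float] = None) -> float:
    """Age of the last RDB snapshot, given its Unix timestamp."""
    if now is None:
        now = time.time()
    return now - rdb_last_save_time


# Illustrative values only.
print(hit_rate(900, 100))
print(seconds_since_last_save(1465462800, now=1465470000) > 3600)
```

Returning None for an idle server avoids a division by zero and keeps "no lookups yet" distinct from a 0% hit rate.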

4. Monitoring Redis performance

Metric | Comments | Suggested Alert
latency | Average time it takes Redis to respond to a query. | When latency is > 200ms (or your maximum acceptable).
used_memory | Memory used by the Redis server. If it exceeds physical memory, the system will start swapping, causing severe performance degradation. For cache scenarios you can set a limit with the maxmemory configuration setting (you don’t want to evict keys in database or queue scenarios!). | None
mem_fragmentation_ratio | Ratio of the memory the operating system has allocated to Redis (used_memory_rss) to the memory Redis itself reports using (used_memory). A high ratio means memory is fragmented, so Redis occupies more RAM than its data needs, which raises the risk of swapping and performance degradation. | When mem_fragmentation_ratio is > 1.5.
evicted_keys | Number of keys removed (evicted) due to reaching maxmemory. Too many evicted keys means that new requests need to wait for an empty space before being stored in memory. When that happens, latency will increase. | None, but if you rely on TTLs to expire keys and don’t expect evictions, you could alert when evicted_keys is > 0.
blocked_clients | Number of clients waiting on a blocking call (BLPOP, BRPOP, BRPOPLPUSH). | None
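
Redis reports mem_fragmentation_ratio directly, but it helps to see where the number comes from. The sketch below recomputes it from used_memory_rss and used_memory and applies the 1.5 threshold from the table. The function names and labels are our own, and the reading of a ratio below 1 as a sign of swapping is our understanding of the Redis documentation rather than something covered in the table.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Memory allocated by the OS divided by memory Redis reports using."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def classify_fragmentation(ratio: float, high: float = 1.5) -> str:
    """Map a ratio to a rough, illustrative health label."""
    if ratio > high:
        return "fragmented"
    if ratio < 1.0:
        # RSS smaller than used memory: part of the dataset is likely in swap.
        return "possibly swapped"
    return "ok"


# Illustrative byte counts only.
ratio = fragmentation_ratio(used_memory_rss=3_200_000, used_memory=2_000_000)
print(ratio, classify_fragmentation(ratio))
```

On very small datasets the ratio can look alarming without meaning much, so you may want to apply the threshold only above some minimum used_memory.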

5. Monitoring Redis errors

Metric | Comments | Suggested Alert
rejected_connections | Number of connections rejected due to hitting the maxclients limit (remember to keep an eye on the maximum/used OS file descriptors). | When rejected_connections > XX, depending on the number of clients you might have.
keyspace_misses | Number of failed key lookups. | Only when not using blocking calls (BRPOPLPUSH, BRPOP and BLPOP): when keyspace_misses > 0.
master_link_down_since_seconds | Time (in seconds) that the link between master and slave has been down. When a reconnect happens, the slave will send SYNC commands, which will impact master performance. | When master_link_down_since_seconds is > 60 seconds.
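
One practical wrinkle: rejected_connections and keyspace_misses (like evicted_keys) are cumulative counters that only reset on restart or CONFIG RESETSTAT, so a threshold usually makes more sense against the increase between two samples than against the raw value. master_link_down_since_seconds, by contrast, is a plain gauge. A minimal sketch of the delta calculation, with a hypothetical function name and made-up samples:

```python
from typing import Dict


def counter_deltas(previous: Dict[str, int],
                   current: Dict[str, int]) -> Dict[str, int]:
    """Increase of each counter between two samples.

    If a counter went backwards (server restart or stats reset), the
    current value is used as the increase since the reset.
    """
    deltas = {}
    for name, value in current.items():
        before = previous.get(name, 0)
        deltas[name] = value - before if value >= before else value
    return deltas


# Two hypothetical samples taken one check interval apart.
earlier = {"rejected_connections": 10, "keyspace_misses": 500}
later = {"rejected_connections": 12, "keyspace_misses": 3}
print(counter_deltas(earlier, later))
```

Here keyspace_misses dropped from 500 to 3, which the sketch treats as a reset rather than reporting a negative delta.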

Redis Monitoring Tools

There are quite a few options out there. These are the ones we know of. Please chime in if we’ve missed something obvious here:

redis-cli info command

redis-cli comes with the INFO command, providing the most important information and statistics about the Redis server.

As the output is pretty long, it has been divided into several sections:

  • server: General information about the Redis server
  • clients: Client connections information
  • memory: Memory usage information
  • persistence: Persistence (RDB and AOF) related information
  • stats: General statistics
  • replication: Master/slave replication information
  • cpu: CPU usage statistics
  • commandstats: Redis commands statistics
  • cluster: Redis Cluster information (if enabled)
  • keyspace: Database (key expiration) related statistics

An optional parameter can be used to select a specific section of information:

$ redis-cli

127.0.0.1:6379> INFO clients

# Clients
connected_clients:2
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

As you can see, every line will either contain a section name (starting with a # character) or a property. The meaning of each property field is described in the Redis documentation in detail. In this article we will look at the most important ones to monitor.
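
To make that structure concrete, here is a minimal Python sketch that turns raw INFO output into a dictionary keyed by section. The function name parse_info is our own, and the sample text is the INFO clients output shown above. Values are kept as strings, since not every property is numeric.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, Dict[str, str]]:
    """Group INFO properties under their lower-cased section names."""
    sections: Dict[str, Dict[str, str]] = {}
    current = "unknown"  # used only if properties appear before any header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header, e.g. "# Clients"
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition(":")
        if sep:
            sections.setdefault(current, {})[key] = value
    return sections


sample = """# Clients
connected_clients:2
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
"""
info = parse_info(sample)
print(info["clients"]["connected_clients"])
```

Most Redis client libraries already return INFO as a parsed structure, so a hand-rolled parser like this is mainly useful when you only have the redis-cli text output to work with.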

redis-cli monitor command

This one displays every command processed by the Redis server. It is used either for spotting bugs in an application, or for generic troubleshooting. It helps us understand what is happening to the database. It seriously affects performance, however, so it’s not something you want to run all the time.

While redis-cli info is a great interactive / realtime tool, you still need to set up some alerts so that you get notified when things go wrong. You may also want to record various metrics over time in order to identify trends. What follows is a list of available options for doing that.

redis-stat

redis-stat is a simple Redis monitoring tool written in Ruby. It tracks Redis performance either in the terminal, in a vmstat-like format, or as a web-based dashboard. It is based on the INFO command, which means it shouldn’t impact the performance of the Redis server. redis-stat shows CPU and memory usage, commands, cache hits and misses, expires and evictions, amongst other metrics.

redmon

Redmon is a simple Ruby dashboard based on the Sinatra framework. While its monitoring function is not as complete as that of redis-stat, Redmon comes with invaluable management features like server configuration and access to the CLI through the web UI.

RedisLive

Redis Live is a monitoring dashboard written in Python and Tornado. It comes with a number of useful widgets covering memory usage and command statistics. It also shows the most used Redis commands and keys. Redis Live uses a database backend to store metrics over time, SQLite being the default choice.

redis-faina

redis-faina is a query analyzer created by the folks at Instagram. It parses the output of the MONITOR command for counter/timing stats, and provides aggregate stats on the most commonly hit keys, the queries that took the most time, and the most common key prefixes. You can read more about redis-faina here.
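
To give a feel for the kind of aggregation involved, here is a minimal sketch (not redis-faina’s actual code) that counts commands and keys in captured MONITOR output. It assumes the usual MONITOR line layout of timestamp, [db client] and quoted arguments, and it naively treats the first argument after the command as the key, which does not hold for every command. All names are our own.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# timestamp [db client-address] "cmd" "arg1" ...
LINE_RE = re.compile(r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$')
# One double-quoted argument, allowing backslash escapes inside it.
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def summarize_monitor(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (command counts, key counts) for MONITOR output lines."""
    commands: Counter = Counter()
    keys: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. the initial "OK" line
        args = ARG_RE.findall(match.group("args"))
        if not args:
            continue
        commands[args[0].upper()] += 1
        if len(args) > 1:
            keys[args[1]] += 1  # assumption: first argument is the key
    return commands, keys


sample = '''OK
1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
1339518087.877697 [0 127.0.0.1:60866] "dbsize"
1339518090.420270 [0 127.0.0.1:60866] "set" "x" "6"
1339518096.506257 [0 127.0.0.1:60866] "get" "x"
'''
commands, keys = summarize_monitor(sample.splitlines())
print(commands.most_common(), keys.most_common(1))
```

Because MONITOR itself is expensive, it is safer to capture a short sample to a file and analyze it offline than to keep a parser like this attached to a production server.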

collectd

The popular agent for metrics collection, collectd also has a couple of plugins for Redis: the upstream Redis Plugin, in C, and redis-collectd-plugin in Python. Both support multiple Redis instances/servers, but the Python version supports a few more metrics, especially replication lag per slave. collectd is just the agent part that can be connected to different monitoring systems.

Percona Monitoring Plugins

We mentioned the Percona Monitoring templates before, when we talked about MySQL monitoring. These templates add default graphing and alerting configuration to existing on-premise monitoring solutions like Nagios, Cacti or Zabbix.

Nagios

Nagios, Icinga and their likes also support Redis monitoring. The Nagios community share their creations on Monitoring Plugins, previously Nagios Exchange. You will find multiple plugins there, as different people write their own in Python, Ruby or Node.

Server Density

If all that sounds too onerous and if you have other, more pressing, priorities then maybe you should leave server monitoring to the experts and carry on with your business.

This is where we shamelessly toot our own horn.

Server Density offers a user interface (we like to think it’s very intuitive) that supports tagging, elastic graphs and advanced infrastructure workflows. It plays well with your automation tools and offers mobile apps too.

So if you don’t have the time to set up and maintain your own on-premise monitoring and you are looking for a hosted, robust monitoring service that covers Redis (and the rest of your infrastructure), you should sign up for a 2-week trial of Server Density.

Who monitors Redis with Server Density

One of our customers, Tooplay, uses Redis for their ad serving and analytics measurements. They told us they almost never have to worry about response times because “Redis is blazing fast.”

They don’t use Redis persistence, and their smallest Redis instance holds 10GB of data in memory. Old data is evicted upon expiration after 1 day (via TTL settings). Their Redis cluster is monitored with Server Density.

Further reading

What about you? Do you have a checklist of best practices for monitoring Redis? What in-memory databases do you have in your stack and how do you monitor them? Any books you can suggest?

PS: You should check out Matthias Meyer’s piece on Redis use cases and code examples.
