How to Monitor Redis
CEO & Founder of Server Density.
Published on the 9th June, 2016.
Redis is a key-value database, and one of the most popular NoSQL databases out there. Redis (REmote DIctionary Server) works in a similar fashion to memcached, albeit with a non-volatile dataset.
The dataset is stored entirely in memory (one of the reasons Redis is so fast) and it is periodically flushed to disk so it remains persistent.
Redis also provides native support for manipulating and querying data structures such as lists, sets and hashes.
There are several well-known companies using Redis, including Twitter, GitHub, and Snapchat. While Redis is open source, there is good commercial support for it and some companies offer it as a fully managed service.
Typical use cases include:
- Leaderboards/Counting: Redis is effective at incrementing scores or presenting the hall of fame in games. Here at Server Density we use it to set security limits on our API endpoints.
- Queues: Redis is often used to build message/job queues, either with the native RPOPLPUSH command or with a language-specific library like RestMQ, PythonRQ, and RedisMQ.
- Session cache: Redis has an LRU (Least Recently Used) key eviction policy.
- Full page cache: PHP platforms such as Magento often use Redis in addition to an OpCode cache such as Zend OpCache.
We will now take a look at the most important metrics and alerts for monitoring Redis, using free tools, or Server Density.
Monitor Redis: metrics and alerts
Even for a simple service like a Redis server, there is no shortage of possible metrics you can monitor. The key to successful monitoring is to select the very few we care about, and care about enough to let them pester us with alerts and notifications.
Our rule of thumb here at Server Density is, “collect all metrics that help with troubleshooting, alert only on those that require an action.”
Same as with any other database, you need to monitor some broad conditions:
- Required processes are running as expected
- System resources usage is within limits
- Queries are executed successfully
- Service is performing properly
- Typical failure points
Let’s take a look at each category and flesh them out with some specifics.
1. Redis process running
These alerts will let us know if something basic is not in place, like a daemon not running or respawning all the time.
| Metric | Comments | Suggested alert |
|---|---|---|
| redis process | Right binary daemon process running. | When process /usr/sbin/redis count != 1. |
| uptime | We want to make sure the service is not respawning all the time. | When uptime < 300s. |
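The two process checks above can be sketched as a small function. This is only an illustration: `process_alerts` and its parameters are hypothetical names, the process count is assumed to come from your process table, and the uptime from the `uptime_in_seconds` field of `INFO`.

```python
def process_alerts(process_count: int, uptime_in_seconds: int,
                   min_uptime: int = 300) -> list[str]:
    """Return alert messages for the two process-level checks."""
    alerts = []
    # Exactly one Redis daemon is expected on this host.
    if process_count != 1:
        alerts.append(f"expected 1 redis process, found {process_count}")
    # A very young process suggests the service keeps respawning.
    if uptime_in_seconds < min_uptime:
        alerts.append(
            f"uptime {uptime_in_seconds}s is below {min_uptime}s, possible respawn loop"
        )
    return alerts


print(process_alerts(process_count=1, uptime_in_seconds=42))
```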
2. System Metrics
The metrics listed below are the “usual suspects” behind most issues and bottlenecks. They also correspond to the top system resources you should monitor on pretty much any in-memory DB server.
| Metric | Comments | Suggested alert |
|---|---|---|
| Load | An all-in-one performance metric. A high load will lead to performance degradation. | When load is > factor x (number of cores). Our suggested factor is 4. |
| CPU usage | High CPU usage is not a bad thing as long as you don’t reach the limit. | None |
| Memory usage | RAM usage depends on how many keys and values we keep in memory. Redis should fit in memory with plenty of room to spare for the OS. | None |
| Swap usage | Swap is for emergencies only. Don’t swap. A bit of swap is always in use, but if that grows, it’s an indicator of performance degradation. | When used swap is > 128MB. |
| Network bandwidth | Traffic is related to the number of connections and the size of those requests. Used for troubleshooting but not for alerting. | None |
| Disk usage | Make sure you always have free space for new data, logs, temporary files, snapshots or backups. | When disk is > 85% usage. |
Hard disk I/O is the most common bottleneck in database servers. Thankfully, that is not the case for Redis, since all operations are performed in memory and only occasionally written asynchronously to permanent storage.
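As a rough sketch of the system thresholds suggested above (load factor of 4 per core, 128MB of swap, 85% disk usage), the checks might look like this in Python. The function names are hypothetical, and swap usage is passed in as a number because the standard library has no portable way to read it.

```python
import os
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_core() -> float:
    """1-minute load average divided by the core count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def system_alerts(load_per_core_value: float, swap_used_mb: float,
                  disk_used_pct: float, load_factor: float = 4,
                  swap_limit_mb: float = 128,
                  disk_limit_pct: float = 85) -> list[str]:
    """Apply the suggested thresholds and return the alerts that fire."""
    alerts = []
    if load_per_core_value > load_factor:
        alerts.append("load is above factor x number of cores")
    if swap_used_mb > swap_limit_mb:
        alerts.append("used swap is above the limit")
    if disk_used_pct > disk_limit_pct:
        alerts.append("disk usage is above the limit")
    return alerts


# Example with made-up readings: only the swap check fires.
print(system_alerts(load_per_core_value=0.7, swap_used_mb=300, disk_used_pct=40))
```

On a Linux host the first and third arguments could be filled from `load_per_core()` and `disk_used_percent()`.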
3. Monitoring Redis availability and queries
These metrics will inform you if Redis is working as expected.
| Metric | Comments | Suggested alert |
|---|---|---|
| connected_clients | Number of clients connected to Redis. Typically your application nodes rather than final users. | When connected_clients < minimum number of application/consumers on your stack. |
| keyspace | Total number of keys in your database. Useful when compared to hit_rate in order to help troubleshoot any misses. | None |
| instantaneous_ops_per_sec | Number of commands processed per second. | None |
| hit rate (calculated) | keyspace_hits / (keyspace_hits + keyspace_misses) | None |
| rdb_last_save_time | Unix timestamp for last save to disk, when using persistence. | When the time elapsed since rdb_last_save_time is > 3600 seconds (or your acceptable timeframe) |
| rdb_changes_since_last_save | Number of changes to the database since last dump. Data that you would lose upon restart. | None |
| connected_slaves | Number of slaves connected to this master instance | When connected_slaves != the number of slaves in your cluster. |
| master_last_io_seconds_ago | Seconds since last interaction between slave and master | When master_last_io_seconds_ago is > 30 seconds (or your acceptable timeframe) |
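Two of these values are derived rather than read directly: the hit rate, and the age of the last save (since `rdb_last_save_time` is a timestamp, not a duration). A minimal sketch, assuming the `INFO` fields are already available as a dict of field name to value (the helper names here are ours, not part of Redis):

```python
import time


def hit_rate(info: dict):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or None if no lookups yet."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    lookups = hits + misses
    return hits / lookups if lookups else None


def seconds_since_last_save(info: dict, now=None) -> float:
    """Age of the last RDB save; `now` can be injected for testing."""
    now = time.time() if now is None else now
    return now - int(info["rdb_last_save_time"])


# Made-up sample values, as strings, the way they appear in INFO output.
sample = {"keyspace_hits": "90", "keyspace_misses": "10",
          "rdb_last_save_time": "1000"}
print(hit_rate(sample))
print(seconds_since_last_save(sample, now=4700) > 3600)
```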
4. Monitoring Redis performance
| Metric | Comments | Suggested alert |
|---|---|---|
| latency | Average time it takes Redis to respond to a query. | When latency is > 200ms (or your max acceptable). |
| used_memory | Memory used by the Redis server. If it exceeds physical memory, the system will start swapping, causing severe performance degradation. For cache scenarios you can configure a limit with the Redis maxmemory setting (you don’t want to evict keys in database or queue scenarios!) | None |
| mem_fragmentation_ratio | Ratio of the memory the operating system has allocated to Redis (used_memory_rss) to the memory Redis itself has requested (used_memory). A ratio well above 1 indicates fragmentation; a ratio below 1 means part of the dataset has been swapped out, which degrades performance. | When mem_fragmentation_ratio is > 1.5 |
| evicted_keys | Number of keys removed (evicted) due to reaching maxmemory. Too many evicted keys means that new requests need to wait for an empty space before being stored in memory. When that happens, latency will increase. | None, but when using TTL for expiring keys and you don’t expect evictions you could alert when evicted_keys is > 0. |
| blocked_clients | Number of clients waiting on a blocking call (BLPOP, BRPOP, BRPOPLPUSH). | None |
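Latency is not an `INFO` field, so it has to be measured from the client side. One simple approach, sketched below, is to time a cheap command several times and average. `send_command` is a placeholder for whatever callable issues the command in your client library (for instance a bound `ping` method); the 200ms default mirrors the suggested alert above.

```python
import time


def average_latency_ms(send_command, samples: int = 100) -> float:
    """Average round-trip time of `send_command` in milliseconds."""
    start = time.perf_counter()
    for _ in range(samples):
        send_command()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / samples


def latency_alert(latency_ms: float, limit_ms: float = 200) -> bool:
    """True when the measured latency exceeds the acceptable maximum."""
    return latency_ms > limit_ms


# Stand-in for a real client call: sleeps about 10 ms per "command".
measured = average_latency_ms(lambda: time.sleep(0.01), samples=5)
print(latency_alert(measured))
```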
5. Monitoring Redis errors
| Metric | Comments | Suggested alert |
|---|---|---|
| rejected_connections | Number of connections rejected due to hitting the maxclients limit (remember to control max/used OS file descriptors) | When rejected_connections > XX, depending on the number of clients you might have. |
| keyspace_misses | Number of failed lookups of keys | Only when not using blocking calls (BRPOPLPUSH, BRPOP and BLPOP), when keyspace_misses > 0. |
| master_link_down_since_seconds | Time (in seconds) that the link between master and slave was down. When a new reconnect happens, the slave will send SYNC commands which will impact master performance. | When master_link_down_since_seconds is > 60 seconds. |
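Keep in mind that `rejected_connections` and `keyspace_misses` are cumulative counters since the server started, so an alert is usually more meaningful on the increase between two samples than on the raw value. A small sketch of that idea (the function name is ours):

```python
def counter_increase(previous: int, current: int) -> int:
    """Increase of a cumulative counter between two samples.

    If the counter went backwards, the server was most likely restarted
    (or its stats were reset), so everything counted so far is new.
    """
    if current < previous:
        return current
    return current - previous


# Two consecutive samples of rejected_connections (made-up numbers).
print(counter_increase(previous=5, current=8))
```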
Redis Monitoring Tools
There are quite a few options out there. These are the ones we know of. Please chime in if we’ve missed something obvious here:
redis-cli info command
redis-cli comes with the INFO command, providing the most important information and statistics about the Redis server.
As the output is pretty long, it has been divided in several sections:
- server: General information about the Redis server
- clients: Client connections information
- memory: Memory usage information
- persistence: Persistence (RDB and AOF) related information
- stats: General statistics
- replication: Master/slave replication information
- cpu: CPU usage statistics
- commandstats: Redis commands statistics
- cluster: Redis Cluster information (if enabled)
- keyspace: Database (key expiration) related statistics
An optional parameter can be used to select a specific section of information:
    $ redis-cli
    127.0.0.1:6379> INFO clients
    # Clients
    connected_clients:2
    client_longest_output_list:0
    client_biggest_input_buf:0
    blocked_clients:0
As you can see, every line will either contain a section name (starting with a # character) or a property. The meaning of each property field is described in the Redis documentation in detail. In this article we will look at the most important ones to monitor.
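Because the format is that regular, turning the raw output into something a script can check takes only a few lines. The parser below is a minimal sketch (`parse_info` is our own helper, not a Redis API); values are kept as strings and the section names are lower-cased.

```python
def parse_info(text: str) -> dict:
    """Parse raw INFO output into {section: {property: value}}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header, e.g. "# Clients"
            current = line.lstrip("#").strip().lower()
            sections[current] = {}
            continue
        if current is None:
            # Property seen before any header; keep it in a catch-all section.
            current = "unsectioned"
            sections[current] = {}
        key, _, value = line.partition(":")
        sections[current][key] = value
    return sections


sample = """# Clients
connected_clients:2
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
"""
print(parse_info(sample)["clients"]["connected_clients"])
```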
redis-cli monitor command
This one displays every command processed by the Redis server. It is used either for spotting bugs in an application, or for generic troubleshooting. It helps us understand what is happening to the database. It seriously affects performance, however, so it’s not something you want to run all the time.
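For short troubleshooting sessions, a captured MONITOR stream can be summarised with a few lines of Python. This sketch assumes the line layout shown in the Redis documentation's MONITOR example (timestamp, then database and client address in brackets, then the quoted command and arguments); lines that do not match, such as the initial `OK`, are skipped. The sample lines are illustrative.

```python
import re
from collections import Counter

# timestamp [db client-address] "command" "arg" ...
MONITOR_LINE = re.compile(r'^\d+\.\d+ \[\d+ \S+\] "([^"]*)"')


def count_commands(lines) -> Counter:
    """Count how often each command name appears in MONITOR output."""
    counts = Counter()
    for line in lines:
        match = MONITOR_LINE.match(line.strip())
        if match:
            counts[match.group(1).upper()] += 1
    return counts


sample = [
    "OK",
    '1339518083.107412 [0 127.0.0.1:60866] "keys" "*"',
    '1339518087.877697 [0 127.0.0.1:60866] "dbsize"',
    '1339518090.420270 [0 127.0.0.1:60866] "set" "x" "6"',
    '1339518096.506257 [0 127.0.0.1:60866] "get" "x"',
    '1339518099.363765 [0 127.0.0.1:60866] "get" "y"',
]
print(count_commands(sample).most_common(1))
```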
While redis-cli info is a great interactive / realtime tool, you still need to set up some alerts so that you get notified when things go wrong. You may also want to record various metrics over time in order to identify trends. What follows is a list of available options for doing that.
redis-stat is a simple Redis monitoring tool written in Ruby. It tracks Redis performance in a terminal output in a vmstat like format or as a web based dashboard. It is based on the INFO command, which means it shouldn’t impact the performance of the Redis server. redis-stat shows CPU and memory usage, commands, cache hits and misses, expires and evictions amongst other metrics.
Redmon is a simple Ruby dashboard based on the Sinatra framework. While its monitoring function is not as complete as that of redis-stat, Redmon comes with invaluable management features like server configuration and access to the CLI interface through the web UI.
Redis Live is a monitoring dashboard written in Python and Tornado. It comes with a number of useful widgets covering memory and various commands, and it also shows the most used Redis commands and keys. Redis Live uses a database backend to store metrics over time, SQLite being the default choice.
redis-faina is a query analyzer created by the folks at Instagram. It parses the MONITOR command for counter/timing stats, and provides aggregate stats on the most commonly-hit keys, the queries that took up the most amount of time, and the most common key prefixes as well. You can read more about redis-faina here.
collectd, the popular agent for metrics collection, also has a couple of plugins for Redis: the upstream Redis Plugin, in C, and redis-collectd-plugin in Python. Both support multiple Redis instances/servers, but the Python version supports a few more metrics, especially replication lag per slave. collectd is just the agent part that can be connected to different monitoring systems.
We mentioned the Percona Monitoring templates before, when we talked about MySQL monitoring. These templates add default graphing and alerting configuration to existing on-premise monitoring solutions like Nagios, Cacti or Zabbix.
Nagios, Icinga and their likes also support Redis monitoring. The Nagios community share their creations on Monitoring Plugins, previously Nagios Exchange. You will find multiple plugins there, as different people write their own in Python, Ruby or Node.
If all that sounds too onerous and if you have other, more pressing, priorities then maybe you should leave server monitoring to the experts and carry on with your business.
This is where we shamelessly toot our own horn.
Server Density offers a user interface (we like to think it’s very intuitive) that supports tagging, elastic graphs and advanced infrastructure workflows. It plays well with your automation tools and offers mobile apps too.
So if you don’t have the time to set up and maintain your own on-premise monitoring and you are looking for a hosted, robust monitoring service that covers Redis (and the rest of your infrastructure), you should sign up for a 2-week trial of Server Density.
Who monitors Redis with Server Density
One of our customers, Tooplay, uses Redis for their ad serving and analytics measurements. They told us they almost never have to worry about response times because “Redis is blazing fast.”
They don’t use Redis persistence and their smallest Redis instance holds 10GB of data in memory. Old data is evicted upon expiration after 1 day (via TTL settings). Their Redis cluster is monitored with Server Density.
What about you? Do you have a checklist of best practices for monitoring Redis? What memory databases do you have in your stack and how do you monitor them? Any books you can suggest?
PS: You should check out Matthias Meyer’s piece on Redis use cases and code examples.