Monitoring & Slowlog - Redis

When Redis acts up in production, you need fast ways to figure out what’s wrong without making it worse. Redis ships with four main introspection tools: INFO, MONITOR, SLOWLOG, and LATENCY. Each has a sweet spot — and some have nasty footguns.

INFO: the dashboard

INFO dumps server stats by section. Don’t run it without a section filter in scripts — the full output is large.

INFO server        # version, uptime, mode, OS
INFO clients       # connected clients, blocked clients
INFO memory        # used_memory, peak, fragmentation
INFO stats         # ops/sec, total commands, keyspace hits/misses, evicted_keys
INFO replication   # role, master link, lag
INFO commandstats  # per-command call count + total and average time (usec)
INFO keyspace      # key count + expires per db

Key fields to alert on:

Field	Why it matters
`mem_fragmentation_ratio` (`used_memory_rss` / `used_memory`)	Memory fragmentation. Above 1.5 is concerning.
`evicted_keys`	Cumulative counter. If it grows on a non-cache workload, keys are being dropped under memory pressure.
`instantaneous_ops_per_sec`	Throughput baseline.
`keyspace_misses` / (`keyspace_hits` + `keyspace_misses`)	Cache miss rate.
`connected_clients`	A steady climb suggests a connection leak.
`master_link_status`	Replica health: `up` or `down`, reported on replicas.

# Useful one-liner
redis-cli INFO stats | grep -E "ops_per_sec|hits|misses|evicted"
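The same check can be scripted. Below is a minimal Python sketch, assuming you already have the text of `INFO stats` in hand (for example from the one-liner above). The function names `parse_info` and `hit_rate` and the sample values are illustrative, not part of any Redis client library.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse the 'key:value' lines of INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and "# Section" headers
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_rate(stats: Dict[str, str]) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Made-up sample in the shape of `INFO stats` output.
SAMPLE = """\
# Stats
instantaneous_ops_per_sec:1200
keyspace_hits:9000
keyspace_misses:1000
evicted_keys:0
"""

stats = parse_info(SAMPLE)
print(f"hit rate: {hit_rate(stats):.1%}")  # prints "hit rate: 90.0%"
```

Values stay as strings on purpose, since INFO mixes integers, floats, and text. Convert only the fields you alert on.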

MONITOR: powerful and dangerous

MONITOR streams every command Redis executes. Great for answering “what is this app actually sending?” Terrible for sustained use.

MONITOR
# 1716700000.123456 [0 127.0.0.1:54321] "GET" "user:42"
# 1716700000.124001 [0 127.0.0.1:54322] "SET" "session:abc" "..."

The catch: MONITOR is expensive. Redis has to format and send every command to every client running MONITOR. Throughput can drop by 50% or more while it’s active. Never leave it running, and never wire it into normal monitoring. It’s a “during incident only, briefly” tool.

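For production-grade observability, use INFO commandstats instead. It exposes cumulative per-command counters (calls, usec, usec_per_call), so a scraper can derive call rates and average latencies by diffing successive samples.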
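Because the commandstats counters are cumulative since startup (or the last CONFIG RESETSTAT), getting a rate means diffing two snapshots. Here is a rough Python sketch of that, assuming lines in the `cmdstat_<name>:calls=...,usec=...,usec_per_call=...` shape. The helper names and sample numbers are made up for illustration.

```python
def parse_commandstats(text):
    """Map command name -> {'calls': int, 'usec': int} from INFO commandstats text."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue
        name, _, rest = line[len("cmdstat_"):].partition(":")
        # Newer servers add extra fields; only calls and usec are used here.
        fields = dict(pair.split("=", 1) for pair in rest.split(",") if "=" in pair)
        stats[name] = {"calls": int(fields["calls"]), "usec": int(fields["usec"])}
    return stats


def rates(before, after, interval_s):
    """Return name -> (calls per second, avg usec per call) over the interval."""
    out = {}
    for name, now in after.items():
        prev = before.get(name, {"calls": 0, "usec": 0})
        calls = now["calls"] - prev["calls"]
        usec = now["usec"] - prev["usec"]
        if calls <= 0:  # no new calls, or counters were reset in between
            continue
        out[name] = (calls / interval_s, usec / calls)
    return out


# Two made-up snapshots taken 60 seconds apart.
T0 = "cmdstat_get:calls=1000,usec=5000,usec_per_call=5.00"
T1 = "cmdstat_get:calls=1600,usec=9800,usec_per_call=6.12"

result = rates(parse_commandstats(T0), parse_commandstats(T1), 60)
print(result)  # {'get': (10.0, 8.0)}
```

The per-interval average (8 usec here) is usually more telling than the lifetime `usec_per_call`, which smooths over recent regressions.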

SLOWLOG: catching slow commands

SLOWLOG records commands that exceed a configured execution time, without the throughput hit of MONITOR.

slowlog-log-slower-than 10000   # microseconds (10ms)
slowlog-max-len 128             # ring buffer size

SLOWLOG GET 10        # 10 most recent slow entries
SLOWLOG LEN
SLOWLOG RESET

Each entry has: id, timestamp, duration in microseconds, the command + args, client info. Set slowlog-log-slower-than to something like 10ms (10000) in production and review the log when latency spikes.

What ends up here in real life:

KEYS * on big keyspaces (please never)
HGETALL / SMEMBERS / LRANGE 0 -1 on big keys
Long Lua scripts
DEL on a multi-MB value (use UNLINK)
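When the log is full of entries, grouping them by command makes the culprit obvious. This is a small Python sketch, assuming each entry has the shape `[id, timestamp, duration_us, [args...], client_addr, client_name]` described above and that a client has already decoded it into lists and strings. The function name, key names, and sample values are hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slowlog entries by command name, sorted by total time (worst first)."""
    totals = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for entry in entries:
        duration_us, args = entry[2], entry[3]
        name = str(args[0]).upper() if args else "?"
        row = totals[name]
        row["count"] += 1
        row["total_us"] += duration_us
        row["max_us"] = max(row["max_us"], duration_us)
    return sorted(totals.items(), key=lambda kv: kv[1]["total_us"], reverse=True)


# Made-up entries in the shape of a decoded SLOWLOG GET reply.
SAMPLE_ENTRIES = [
    [14, 1716700000, 250000, ["KEYS", "*"], "127.0.0.1:54321", "app-1"],
    [13, 1716699990, 42000, ["HGETALL", "user:42:events"], "127.0.0.1:54322", "app-2"],
    [12, 1716699980, 38000, ["hgetall", "user:7:events"], "127.0.0.1:54322", "app-2"],
]

for name, row in summarize_slowlog(SAMPLE_ENTRIES):
    print(f"{name}: {row['count']} entries, max {row['max_us'] / 1000:.0f} ms")
```

Sorting by total time rather than count surfaces the single 250 ms KEYS call ahead of the two shorter HGETALL calls.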

LATENCY: pauses inside the server

SLOWLOG only times the execution of the command itself. It does not cover stalls elsewhere in the event loop. The latency monitor (LATENCY) records latency spikes as seen from inside Redis, including fork pauses, AOF fsync, and slow system calls.

Enable it:

latency-monitor-threshold 100   # ms, 0 disables

LATENCY LATEST       # most recent spike per event
LATENCY HISTORY fork # recent spike history for "fork"
LATENCY DOCTOR       # human-readable analysis with suggestions
LATENCY RESET

Common events you’ll see:

fork: the fork() call for an RDB snapshot or AOF rewrite. On huge datasets this can pause Redis for seconds.
aof-fsync-always: fsync on every write when appendfsync is set to always.
expire-cycle: active expiration scanning.
command: a slow command (correlate with SLOWLOG).
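To turn LATENCY LATEST into something you can alert on, a short filter is enough. The Python sketch below assumes each row starts with the event name, the timestamp of the latest spike, the latest latency in ms, and the all-time maximum in ms, already decoded by a client. The hint text paraphrases the list above, and the 100 ms cutoff and sample rows are illustrative.

```python
# Hints paraphrase the event list above; unknown events fall back to a generic pointer.
HINTS = {
    "fork": "fork for RDB snapshot or AOF rewrite",
    "aof-fsync-always": "fsync on every write (appendfsync always)",
    "expire-cycle": "active expiration scanning",
    "command": "slow command; correlate with SLOWLOG",
}


def worst_events(latest, min_max_ms=100):
    """Return (event, max_ms, hint) rows whose all-time max is >= min_max_ms, worst first."""
    rows = []
    for row in latest:
        event, max_ms = row[0], row[3]  # index instead of unpacking, in case of extra fields
        if max_ms >= min_max_ms:
            rows.append((event, max_ms, HINTS.get(event, "see LATENCY DOCTOR")))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Made-up rows in the shape of a decoded LATENCY LATEST reply.
SAMPLE_LATEST = [
    ["command", 1716700000, 120, 450],
    ["fork", 1716700100, 900, 1500],
    ["expire-cycle", 1716700200, 30, 60],
]

for event, max_ms, hint in worst_events(SAMPLE_LATEST):
    print(f"{event}: max {max_ms} ms ({hint})")
```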

Putting it together: a triage flow

Redis latency complaint: what to check

1. INFO: memory, clients, evictions, ops/sec, hit rate
2. SLOWLOG GET: find expensive commands
3. LATENCY DOCTOR: spot fork/fsync/expire pauses
4. redis-cli --bigkeys / --hotkeys: find data-shape culprits (--hotkeys needs an LFU maxmemory-policy)
5. MONITOR (briefly!): confirm app behavior, then stop

Other useful tools

CLIENT LIST              # all connected clients, idle time, last command
CLIENT KILL ID <id>      # kill a misbehaving client
DEBUG SLEEP 5            # simulate a slow command (dev only)
redis-cli --latency      # continuous latency sampling from outside
redis-cli --latency-history -i 1
redis-cli --stat         # ops/sec, memory, clients summary

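redis-cli --latency is useful for separating network problems from server problems. It measures round-trip PING time from the client side, so a high reading here with a clean SLOWLOG and LATENCY report points at the network or the client host rather than Redis itself.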
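If you want the same outside-in measurement from application code, it can be approximated with a plain socket. This is a rough Python sketch, not how redis-cli is implemented. It assumes a server reachable without AUTH or TLS and uses the inline `PING` form. The function names and defaults are illustrative.

```python
import socket
import time


def ping_latencies_ms(host="127.0.0.1", port=6379, count=5, timeout=1.0):
    """Send `count` PINGs over one connection and return round-trip times in ms."""
    samples = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(b"PING\r\n")  # inline command form
            reply = reader.readline()
            elapsed_ms = (time.perf_counter() - start) * 1000
            if not reply.startswith(b"+PONG"):
                # e.g. an auth error when the server requires a password
                raise RuntimeError(f"unexpected reply: {reply!r}")
            samples.append(elapsed_ms)
    return samples


def summarize(samples_ms):
    """Return (min, max, avg) in ms, the same three figures redis-cli --latency reports."""
    return min(samples_ms), max(samples_ms), sum(samples_ms) / len(samples_ms)


# Usage against a live server (left commented out so the sketch runs without one):
# print(summarize(ping_latencies_ms("127.0.0.1", 6379, count=100)))
```

Run it from the application host, not the Redis host, so the numbers include the network path your clients actually use.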

Production setup

Prometheus + redis_exporter scraping INFO is the de facto standard.
Alert on: evictions, master_link_status:down, fragmentation > 1.5, slowlog growth rate, hit ratio drops, connection count climbs.
Set latency-monitor-threshold and slowlog-log-slower-than in your base config.
Have runbooks for the common incidents: big-key DEL, fork pause during BGSAVE, replica lag.

Quick recap

INFO for stats. Always filter by section. Wire into Prometheus.
MONITOR for live debugging. Costly. Use briefly.
SLOWLOG for commands above N microseconds. Cheap, kept in a bounded in-memory log.
LATENCY for Redis-internal pauses (fork, fsync). Configure threshold.
redis-cli --latency for outside-in measurements.