Redis Caching vs a Local In-Memory Cache: When You Actually Need the Network Hop

TL;DR - After you index the slow query, the next reflex is a cache. The question that matters is not “Redis or not” but where the cache lives.

Local in-process cache (Caffeine, a plain map, Guava): nanoseconds to read, no network, no serialization. Right for read-mostly data that fits in heap, on a single instance or wherever each node holding a slightly different copy is fine.

Redis / distributed cache: one source of truth shared across every instance, survives a deploy, holds more than fits in heap, and lets one node invalidate what the others cached. You pay a network round trip and serialization on every read.

The trap: a local cache plus horizontal scaling equals N divergent caches. Users see different data depending on which instance answered. This is the bug that sends teams to Redis, usually after it has already shipped.

Caching is the second move, not the first

When a page is slow, caching feels like the obvious fix: store the answer, skip the work next time. It is the right move often enough that the habit sticks. It is also the move people reach for before they have read the query plan, and a cache wrapped around a query that should have been indexed just hides the problem behind a hit rate, then falls over the first time the cache is cold or the key space is wider than you guessed.

So fix the query first. I wrote a whole post on the indexes that actually help. Once the underlying work is as cheap as it is going to get and the read volume still hurts, caching earns its place.

And then the question almost everyone gets wrong on the first try is not “should I use Redis.” It is where the cache lives. That single decision sets your latency floor, your failure modes, and whether the cache is correct once you run more than one copy of your app.

The two kinds of cache

There are two places a cache can sit, and they behave nothing alike.

A local in-process cache lives inside your application’s own memory. In Java that is Caffeine or a ConcurrentHashMap; in Go a map behind a mutex or a library like Ristretto; in Node an LRU object. A read is a pointer lookup in the heap your process already owns. No socket, no serialization, no other machine involved.

A distributed cache lives in a separate process you talk to over the network, and for most teams that process is Redis. Every instance of your app connects to the same Redis, so they all see the same cached values. A read leaves your process, crosses the network, comes back, and gets deserialized into an object on the way in.

That difference is the whole post. Everything below follows from “in my heap” versus “over the wire.”

The latency truth

A read from a local in-process cache is a memory access. It is measured in nanoseconds, and it never leaves the CPU’s reach.

A read from Redis is a network round trip plus serialization on both ends. On the same host or a warm link in the same availability zone that round trip is a few tenths of a millisecond. Across availability zones it climbs past a millisecond. None of that is slow by the standards of the database query you are avoiding. But it is thousands of times slower than the local lookup, and on a hot path that runs tens of thousands of times a second the difference stops being academic.

If speed were the only axis, local would win every time and there would be no decision. Speed is not the only axis. Correctness across instances is the other, and that is where local loses.

What the benchmark actually shows

I did not want to argue this from intuition, so I measured it. One Spring PetClinic JAR, three cache backends switched by Spring profile and nothing else: NoOpCacheManager (no cache), CaffeineCacheManager (in-process, in-heap), and RedisCacheManager (out-of-process, over a Lettuce connection pool). Load was driven by k6, both locally (512MB, 1 CPU, localhost Postgres and Redis) and on AWS, where the app sat on one EC2 box, Redis in a container beside it, k6 on a separate box, and Postgres on a network-distant RDS instance. The AWS run is the one that matters, because that is the shape of production.

The first result is the one nobody expects: whether Redis helps at all depends on how far away your database is.

Figure: GET /vets warm cache-hit read p50. Locally, Redis at 0.86ms ties no cache at 0.81ms while Caffeine leads at 0.41ms. Against a network-distant RDS, the same query is 3.35ms with no cache, and Redis at 1.34ms becomes a clear win.

Take the smallest possible query, PetClinic’s 6-row GET /vets, measured as a warm cache hit. Locally, Redis at 0.86ms is no faster than hitting the database at 0.81ms: the hop to Redis costs about what the cheap query costs, so the cache buys nothing. Move the database a real network away and the same query takes 3.35ms, and now Redis at 1.34ms is a clear win. Nothing in the application changed. The distance to your database is what flips an external cache from pointless to worthwhile. Caffeine, an in-heap lookup with no network and no serialization, is fastest in both environments, and at peak it ran 1.5 to 2 times the requests per second of Redis.

When the cached work is genuinely expensive, the backend choice almost stops mattering, because anything beats recomputing. The stats endpoint runs a COUNT(DISTINCT ...) over a left join of pets and visits on a seeded dataset of 100k owners, 300k pets, and 2M visits: a small result that is expensive to compute.

Figure: stats endpoint throughput under AWS peak load (log scale). No cache serves about 1.4 requests per second, Caffeine 8,298 and Redis 5,469, roughly four to six thousand times more.

Under AWS peak load, no cache serves about 1.4 requests a second. Both caches serve thousands, Caffeine 8,298 and Redis 5,469, at a sub-5ms p50. That is roughly four to six thousand times the throughput. When the work you are avoiding is that expensive, having a cache is the decision. Which cache is a rounding error next to that.

The mirror image is the cheap-payload case, where Redis can be worse than no cache at all.

Figure: cache-warm hit read p50 by payload size (log scale). Redis rises from 1.74ms at 20KB to 80.8ms at 100KB and 206ms at 400KB, staying above both no cache and Caffeine until all three converge at 400KB.

In the local run, with a synthetic report cached and only the payload size varied, Redis loses to no cache at every size, and its cost grows with the payload: 1.74ms at 20KB, 80.8ms at 100KB, 206ms at 400KB. Deserializing and transferring the value costs more than rebuilding a cheap in-memory one. Caffeine still wins because it skips both. Only at 400KB do the three converge, when serializing the large JSON response becomes a floor they all pay.

The law that explains all of it

Every one of those results is the same equation:

cache payoff = the work you avoid minus the cost to reach the cache.

Cheap work plus a fast local database: Caffeine is marginal and Redis is net negative, because reaching the cache costs more than the work it saved.
Network-distant database: Redis flips to net positive even for a trivial query, because the work you avoid (a far-away round trip) finally outweighs the near round trip to Redis.
Expensive query: both caches win enormously and the backend is secondary. Pick Caffeine for raw latency on a single node, Redis when you need the cache shared across instances, surviving restarts, or larger than heap.

If you remember one thing, remember that the payoff is a subtraction, not a constant. A cache is not free, and on cheap work against a close database the access cost can be the larger term.
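
As a worked example of that subtraction, here is a minimal sketch that plugs in the GET /vets p50 figures quoted above. Treating the difference between the no-cache p50 and the cached p50 as the payoff is a simplification, and the class and method names are illustrative only.

```java
/** Worked arithmetic for the payoff rule, in milliseconds per read. */
final class CachePayoff {

    /**
     * Payoff of a cache for one read: time without the cache minus time with it.
     * Positive means the cache saves time; negative means it costs time.
     */
    static double payoffMs(double noCacheMs, double cachedMs) {
        return noCacheMs - cachedMs;
    }

    public static void main(String[] args) {
        // GET /vets warm-hit p50 figures quoted earlier in the post.
        System.out.printf("local, Redis:    %+.2f ms%n", payoffMs(0.81, 0.86));
        System.out.printf("local, Caffeine: %+.2f ms%n", payoffMs(0.81, 0.41));
        System.out.printf("AWS,   Redis:    %+.2f ms%n", payoffMs(3.35, 1.34));
    }
}
```

With those inputs the three cases come to roughly -0.05 ms, +0.40 ms, and +2.01 ms per read: the same Redis hop is a small loss against the local database and a clear gain against the distant one.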

The war story: one Redis connection that ate everything

The benchmark almost shipped with a wrong conclusion. The first heavy-payload Redis run, 400KB values on default client settings, did not just run slow, it collapsed: about 20 requests per second, p99 around 60 seconds, with malformed-HTTP responses and a timeout cascade. It looked like Redis simply could not handle large values.

Figure: 400KB-value throughput before and after enabling a connection pool. Default Lettuce serves 20 requests per second; the pooled client serves 2,071, about a hundredfold from one config change.

It could. The default Lettuce client multiplexes every command over a single connection, so one large value in flight head-of-line-blocks every other request behind it. The fix was configuration, not code: add commons-pool2 and turn on the Lettuce pool (spring.data.redis.lettuce.pool.*). Throughput went from 20 to 2,071 requests per second and the cliff disappeared.
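
A sketch of what that configuration might look like in application.properties on Spring Boot 3. The property prefix is the one named above; the pool sizes and wait time are placeholder assumptions, not the values used in the benchmark.

```properties
# Pooling needs org.apache.commons:commons-pool2 on the classpath.
spring.data.redis.lettuce.pool.enabled=true
# Illustrative sizes only; tune them to your own concurrency.
spring.data.redis.lettuce.pool.max-active=16
spring.data.redis.lettuce.pool.max-idle=8
spring.data.redis.lettuce.pool.min-idle=2
# How long a caller waits for a free connection before failing.
spring.data.redis.lettuce.pool.max-wait=200ms
```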

The lesson outlives the benchmark. The out-of-the-box Spring plus Lettuce Redis cache will quietly fall over the day your cached values get large, and the symptom (timeouts, malformed responses) looks nothing like the cause (one shared connection). A connection pool is not a tuning nicety here. It is mandatory the moment you cache anything bigger than a small object.

What the benchmark does not prove

Honesty about the limits, because the numbers are easy to overquote:

It is single-node. The strongest argument for Redis, one warm shared cache across N instances versus N divergent local caches that drift and stampede, is the reasoning in the section below. It is not what these numbers measured. Take it as the architecture argument it is, not as a benchmarked result.
The heavy-query no-cache latency is contention, not a query time. Its p50 climbs to tens of seconds (33 seconds at AWS peak) because a small 2-vCPU RDS instance is buckling under 20 to 50 concurrent aggregations. A single run of that aggregation is about 2.7 seconds, and the sample was small. The direction is unambiguous, but do not quote “the query takes 33 seconds.”
Redis used JDK serialization, Spring’s default. Switching to JSON would change the serialization cost somewhat, mostly in the payload-heavy cases.

When a local cache is the right call

Reach for in-process caching when all of these hold:

You run a single instance, or you genuinely do not care that each instance holds its own copy.
The data is read far more than it is written. Reference data, config, feature flags, a lookup table, the kind of thing that changes a few times a day.
It fits in heap with room to spare, and you have set a size bound and an eviction policy so it cannot grow without limit and trigger a garbage-collection pause or an out-of-memory kill.
A little staleness is acceptable, because a short TTL is how you keep it current.

For that shape, a local cache is faster than Redis, has no extra moving part to run or pay for, and cannot take your app down by being unreachable. Use Caffeine with a maximumSize and an expireAfterWrite, and move on. Pulling in Redis here is adding a network dependency to make something slower. I have seen teams stand up a Redis cluster to cache a 200-row country table that never changes. That is a server to patch and pay for in exchange for negative performance.
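
To make the two knobs concrete, here is a JDK-only stand-in for a bounded, expiring local cache. It is not Caffeine and not a replacement for it; it only sketches what a size bound with LRU eviction and an expire-after-write TTL do. The class name and the injectable clock are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Minimal bounded cache with LRU eviction and expire-after-write. */
final class BoundedTtlCache<K, V> {

    /** A cached value together with the time it stops being valid. */
    private static final class Expiring<V> {
        final V value;
        final long expiresAtMillis;

        Expiring(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;  // injectable so expiry can be tested
    private final Map<K, Expiring<V>> entries;

    BoundedTtlCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, Expiring<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Expiring<V>> eldest) {
                return size() > maxEntries;  // size bound: drop the LRU entry
            }
        };
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Expiring<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    synchronized V get(K key) {
        Expiring<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key);  // expired: force a reload from the source
            return null;
        }
        return e.value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Passing System::currentTimeMillis as the clock gives wall-clock expiry. A single lock around every call is fine for a sketch, but it is one of the reasons to prefer a purpose-built library on a hot path.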

The trap that sends everyone to Redis

Here is the failure that is almost a rite of passage.

You build with a local cache because it is simple and fast, and it works. Then traffic grows and you scale out horizontally, from one instance to three behind a load balancer. Nobody revisits the cache, because nobody thinks of the cache as stateful. It is “just a cache.”

Now you have three caches, one per instance, and they drift. A user updates their profile. The write hits the database and invalidates the cache on the instance that served the request. The other two instances never heard about it and keep serving the old profile from their own copies. The user refreshes, the load balancer sends them to a different instance, and their change has vanished. Refresh again, land on the first instance, and it is back. Same data, different answer, depending on which machine the load balancer picked.

This is maddening to debug because it is not reproducible on demand and it never shows up with one instance running, which is exactly how your staging environment is configured. I have watched a company I worked with chase this for days. The data was right in the database every time. The bug was that “the cache” was really three caches that disagreed, and the symptom only appeared under the load that justified the second and third instance in the first place.

The moment you have more than one instance and the cached data has to be consistent across them, a local cache is no longer a caching choice. It is a correctness bug waiting for traffic. That is the line where Redis stops being optional.
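
The drift is easy to reproduce in a toy model. The sketch below uses plain maps in place of a real database and a real cache library, and the key and values are made up; the point is only that a write on one instance invalidates that instance's copy and nobody else's.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: one shared database, one private cache per app instance. */
final class AppInstance {
    private final Map<String, String> database;                      // shared
    private final Map<String, String> localCache = new HashMap<>();  // per instance

    AppInstance(Map<String, String> database) {
        this.database = database;
    }

    String read(String key) {
        // Cache hit if present, otherwise load from the database and remember it.
        return localCache.computeIfAbsent(key, database::get);
    }

    void write(String key, String value) {
        database.put(key, value);
        localCache.remove(key);  // only THIS instance's copy is invalidated
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("profile:42", "old name");
        AppInstance a = new AppInstance(db);
        AppInstance b = new AppInstance(db);
        a.read("profile:42");
        b.read("profile:42");               // both instances are now warm
        a.write("profile:42", "new name");  // the update lands on instance a
        System.out.println(a.read("profile:42"));  // new name
        System.out.println(b.read("profile:42"));  // old name (stale copy)
    }
}
```

With a single instance the stale read cannot happen, which is why the bug stays invisible in a one-instance staging environment.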

When Redis (a distributed cache) is the right call

A shared cache like Redis is the answer when:

State must be consistent across instances. Sessions, rate-limit counters, anything where the answer cannot depend on which node you hit. One Redis, one truth.
The cache must survive a deploy or a restart. A local cache dies with the process, so every rolling deploy starts cold and stampedes the database. Redis keeps its contents across your app restarts, so new instances come up against an already warm cache.
The dataset is bigger than you want in each app’s heap. Ten gigabytes of cached objects on one Redis is fine. Ten gigabytes inside every JVM is a fleet of garbage-collection problems.
One instance needs to invalidate what others cached. A write on node 1 can delete the key in Redis and every node sees the deletion on its next read. With local caches that cross-instance invalidation needs its own messaging layer, which is most of the way to reinventing Redis.
You need more than a key-value cache. Redis also gives you atomic counters for rate limiting, sorted sets for leaderboards and queues, pub/sub, and distributed locks. Once you need one of those, it is already in your stack.

The price is real and worth naming. Redis is another service to run, secure, and monitor, a network hop on every read, serialization cost, and a new failure mode: what does your app do when Redis is slow or down? If the answer is “every request blocks on a dead Redis,” you have turned a cache into a single point of failure. Cache reads should fall back to the source on a Redis miss or timeout, not hang.
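
One way to express "fall back, do not hang" is a bounded wait on the cache lookup. The sketch below is JDK-only and uses suppliers as stand-ins for a Redis read and a database read; the names are hypothetical, and a real client would also set its own command timeout rather than rely on this wrapper alone.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class CacheFallback {

    /**
     * Waits at most timeoutMillis for the cache. A miss (null), an error, or a
     * timeout all fall through to the source instead of blocking the request.
     */
    static <V> V readThrough(Supplier<V> cacheLookup, Supplier<V> sourceLookup, long timeoutMillis) {
        V cached;
        try {
            cached = CompletableFuture.supplyAsync(cacheLookup)
                    .get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            cached = null;
        } catch (ExecutionException | TimeoutException e) {
            // Cache is failing or slow: treat it as a miss. The abandoned
            // lookup is not cancelled here and finishes in the background.
            cached = null;
        }
        return cached != null ? cached : sourceLookup.get();
    }
}
```

The design choice is that the cache can only ever make a read faster, never make it fail. The cost is that a Redis outage sends full traffic to the database, so the database still has to survive that.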

The hybrid almost nobody sets up but should consider

These two are not mutually exclusive, and the strongest setup at scale uses both. A small local cache in front of Redis, often called a near cache or an L1/L2 cache, serves the hottest keys from heap in nanoseconds and falls through to Redis for everything else. Most reads never touch the network, and Redis stays the shared source of truth.

The catch is invalidation, which is just the original trap wearing a hat. The local L1 layer can go stale the same way, so you keep its TTL short, seconds not minutes, and lean on Redis pub/sub to tell every node to drop a key when it changes. Worth it on a genuinely hot path. Overkill for everything else, and a real source of bugs if you reach for it before you need it.
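
The shape of a near cache can be sketched in a few lines. Here a shared map stands in for Redis, a function stands in for the database, and onRemoteInvalidation is a hypothetical hook that a pub/sub listener would call; the L1 TTL is left out to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Two-tier read path: per-instance L1 in heap, shared L2, then the source. */
final class NearCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>();  // private to this node
    private final Map<K, V> l2;                              // stand-in for Redis
    private final Function<K, V> source;                     // stand-in for the database

    NearCache(Map<K, V> l2, Function<K, V> source) {
        this.l2 = l2;
        this.source = source;
    }

    V get(K key) {
        V value = l1.get(key);
        if (value != null) {
            return value;                 // L1 hit: no network involved
        }
        value = l2.get(key);              // L2 hit costs one round trip
        if (value == null) {
            value = source.apply(key);    // full miss: do the real work
            if (value != null) {
                l2.put(key, value);
            }
        }
        if (value != null) {
            l1.put(key, value);
        }
        return value;
    }

    /** Called on the node that performed the write. */
    void invalidate(K key) {
        l2.remove(key);
        l1.remove(key);
    }

    /** Called on every other node when it is told the key changed. */
    void onRemoteInvalidation(K key) {
        l1.remove(key);
    }
}
```

Until onRemoteInvalidation runs on a node, that node keeps serving its old L1 value, which is exactly why the L1 TTL has to stay short.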

Invalidation is the hard part, wherever the cache lives

“There are only two hard things in computer science: cache invalidation and naming things.” The joke survives because the first one is true. Choosing where the cache lives is the easy decision. Keeping it correct is the work, and it bites in both designs.

A few rules that save real outages:

Always set a TTL. A bounded TTL is your backstop for every invalidation bug you forgot. Even data you invalidate explicitly should expire on its own, so a missed invalidation self-heals in minutes instead of serving wrong data until the next deploy.
Prefer deleting a key over updating it. On a write, delete the cached value and let the next read repopulate from the source. Trying to keep the cache in lockstep with the database on every write is a race you will lose under concurrency.
Plan for the stampede. When a hot key expires, every request that wanted it hits the database at once. A LIMIT 1 query that was cheap becomes a thundering herd. Stagger TTLs with a little jitter so a whole class of keys does not expire on the same second, and on the hottest keys let one request rebuild while the rest briefly serve the stale value.
Set an eviction policy. Redis defaults to the noeviction policy, so once it reaches its maxmemory limit it starts refusing writes. For a pure cache, allkeys-lru is usually what you want, so the least recently used keys are dropped under pressure rather than the whole instance falling over.
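
Two of the stampede defences can be sketched with the JDK alone: TTL jitter, and letting only one caller rebuild a key. Note the swap in the second one: this variant makes the other callers wait for the single rebuild instead of serving the stale value, which would need the old value kept around. All names here are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

final class StampedeGuards {
    /** Returns a TTL in [base, base + base * fraction) so keys do not expire together. */
    static long jitteredTtlMillis(long baseMillis, double fraction) {
        long spread = (long) (baseMillis * fraction);
        return baseMillis + (spread > 0 ? ThreadLocalRandom.current().nextLong(spread) : 0L);
    }
}

/** Concurrent callers asking for the same key share one run of the loader. */
final class SingleFlight<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Supplier<V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join();  // another caller is already rebuilding
        }
        try {
            V value = loader.get();  // only the first caller does the work
            mine.complete(value);
            return value;
        } catch (Throwable t) {
            mine.completeExceptionally(t);  // waiters fail too, they do not hang
            throw t;
        } finally {
            inFlight.remove(key, mine);     // the next expiry triggers a fresh load
        }
    }
}
```

A caller would typically wrap the database read in load and store the result with jitteredTtlMillis(baseTtl, 0.1) or similar, so a class of keys written together spreads its expiry over a window instead of a single second.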

How I actually decide

The checklist I run in my head:

Did I fix the query first? If not, the cache is a bandage. Read the plan, fix the index, then reconsider whether you even need the cache.
How many instances, now and in a year? If it is one forever, a local cache is the simpler, faster answer. If it is more than one and the data must agree across them, go straight to Redis and skip the trap.
Does it fit in heap and tolerate per-node staleness? Yes to both points you back to local. A no on either points you to Redis.
Do I need counters, locks, pub/sub, or cross-instance invalidation? Any yes, and Redis is already justified beyond caching.
Is the database close or far? The benchmark was blunt about this: a cache in front of a cheap query against a close database can cost more than it saves. The further away the database, the more even a trivial cached query earns the hop to Redis.
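
Read as code, the first four questions collapse into a short decision function. This is one possible encoding, not a rule from the benchmark: the parameter names are invented for the example, and the last question (how far away the database is) is left out because it decides whether a cache pays off at all rather than where it lives.

```java
/** One possible encoding of the placement checklist. */
final class CachePlacement {

    enum Choice { FIX_QUERY_FIRST, LOCAL, REDIS }

    static Choice decide(boolean queryAlreadyTuned,
                         int instances,
                         boolean mustAgreeAcrossInstances,
                         boolean fitsInHeap,
                         boolean toleratesPerNodeStaleness,
                         boolean needsRedisPrimitives) {
        if (!queryAlreadyTuned) {
            return Choice.FIX_QUERY_FIRST;  // a cache here is only a bandage
        }
        if (needsRedisPrimitives) {
            return Choice.REDIS;            // counters, locks, pub/sub, invalidation
        }
        if (instances > 1 && mustAgreeAcrossInstances) {
            return Choice.REDIS;            // skip the divergent-cache trap
        }
        if (!fitsInHeap || !toleratesPerNodeStaleness) {
            return Choice.REDIS;
        }
        return Choice.LOCAL;
    }
}
```

For example, a tuned query on a single instance with small, staleness-tolerant data comes out as LOCAL, and flipping any one of the Redis conditions moves it to REDIS.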

Most of the time the honest answer for a single-service app is “local cache with a short TTL, and reach for Redis the day you scale out or need shared state.” The teams that get burned are the ones who picked local for the speed, scaled out for the traffic, and never connected the two decisions.

Summary

A cache is the second move, after you have made the underlying work cheap. The decision that matters is where it lives. A local in-process cache is faster and simpler with nothing extra to run, and it is correct right up until you run a second instance, at which point it quietly serves divergent data. A distributed cache like Redis costs you a network hop, serialization, and another service to keep alive, and buys you one shared source of truth across every instance, survival across deploys, room beyond heap, and real cross-node invalidation. Pick local when you are one instance and a little staleness is fine. Pick Redis when correctness across instances is on the line. And whichever you pick, set a TTL, because the cache being wrong is worse than the cache being slow.

Caching is one fix among several, and the wrong one if you have not fixed the query first. For where it fits in the bigger picture, see database optimization: find the bottleneck, fix the cheapest thing first.
