Prevent Cache Stampedes with asyncio Events
Learn how a two-layer asyncio.Event and Redis lock strategy eliminates cache-miss stampedes, cutting thousands of redundant Redis calls at market open.

My miniature nano cow stampeding herd, heading straight for Redis at market open. Every. Single. Morning.
Wave 1 is done. The second I shipped it, I turned to the thing quietly living on my mental whiteboard: a cache-miss stampede in the MarketDataCache (MDC).
Quick disambiguation: I mentioned a stampeding herd in an earlier post — that was a case of a message backpressure mechanism being rendered useless. This is a different herd. Same name, different cattle.
The Scenario
The MDC is a Redis cache-aside layer between the platform’s analyzers and the Massive.com REST API. Check Redis first, call the API on a miss.
The incoming feed is per-second stock aggregates — OHLC, volume — for the entire market, peaking around 1,500 msg/s at close. At market open, the message rate jumps from roughly 30 msg/s to 800 msg/s within seconds. The cache isn’t just cold — it’s been reset, because opening prices invalidate the previous session’s data. Per-second aggregates only fire when a stock actually updates, so those 800 messages aren’t spread evenly across the market; they’re concentrated in the 100–200 high-volume stocks volatile enough to update every single second. Those are the tickers that hammer Redis hardest.
In the happy path, Massive.com responds in ~80ms. Fast enough that in most cases the cache is warm well before the next message arrives. The stampede is really a cold-start burst problem: multiple analyzers simultaneously requesting the same ticker, all within that 80ms window.
The ugly case is a timeout. The underlying RESTClient does retries with exponential backoff — a degraded API response doesn’t just cost 10 seconds, it can stack well past 30.
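To make that concrete — with illustrative numbers, not the client’s actual config — three 10-second attempts plus exponential backoff sleeps between them already clears 30 seconds:

# Illustrative retry math; the timeout and backoff values are assumptions.
timeout_s = 10
backoff_s = [1, 2]          # sleeps between attempts, doubling
attempts = 3

worst_case = attempts * timeout_s + sum(backoff_s)
print(worst_case)           # 33 seconds before the caller sees a failure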
Why a Redis Lock Alone Isn’t Enough
The obvious fix is a distributed lock — one coroutine grabs it, fetches, the rest wait. But look at what await lock.acquire() actually does inside redis.asyncio:
# Simplified from redis/asyncio/lock.py
while True:
    if await self.do_acquire():          # SET NX
        return True
    await asyncio.sleep(self.sleep)      # polls every 100ms
Every waiting coroutine independently hammers Redis with SET NX every 100ms. In the happy path at ~80ms, that’s roughly one poll per waiter — annoying but not painful. In the timeout case, that’s 100 polls per waiter per 10-second attempt, multiplied by retry attempts, multiplied by N waiters across 150 hot tickers. The event loop stays healthy — each asyncio.sleep yields — but Redis is absorbing O(N) poll traffic for absolutely nothing.
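Back-of-envelope, with an assumed waiter count (everything else comes straight from the scenario above):

# Illustrative only — waiters_per_ticker is an assumption, not a measurement.
polls_per_attempt = 10_000 // 100   # 10s attempt / 100ms poll interval
waiters_per_ticker = 20             # assumed coroutines stuck on one hot ticker
hot_tickers = 150
retry_attempts = 3

wasted_set_nx = polls_per_attempt * waiters_per_ticker * hot_tickers * retry_attempts
print(f"{wasted_set_nx:,}")         # 900,000 SET NX calls that fetch nothing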
The Fix: Two Layers
Layer 1: asyncio.Event collapses in-process contention to zero network traffic.
Layer 2: Redis lock handles cross-pod contention.
async def get_ticker_snapshot(self, ticker: str) -> TickerSnapshot:
    cache_key = f"{MarketDataCacheKeys.TICKER_SNAPSHOTS.value}:{ticker}"
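    # Reconstructed sketch of the rest of the method — _pending_snapshots
    # is an assumed name for the per-ticker Event registry described below.
    result = await self.read(cache_key=cache_key)
    if result:
        return TickerSnapshot.from_dict(result)  # warm cache, no contention

    pending = self._pending_snapshots.get(ticker)
    if pending is not None:
        # Layer 1: a leader is already fetching this ticker in-process.
        # This wait costs zero Redis traffic, however long the fetch takes.
        await pending.wait()
        result = await self.read(cache_key=cache_key)
        if result:
            return TickerSnapshot.from_dict(result)
        # Leader failed without writing — fall through and step up.

    event = asyncio.Event()
    self._pending_snapshots[ticker] = event
    try:
        # Layer 2: cross-pod contention handled by the Redis lock inside.
        return await self._fetch_snapshot_with_lock(ticker, cache_key)
    finally:
        # Load-bearing: wake every in-process waiter, success or failure.
        event.set()
        if self._pending_snapshots.get(ticker) is event:
            del self._pending_snapshots[ticker]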
Waiting coroutines do a true await event.wait() — zero network, zero polling, event-loop-native. When event.set() fires, the loop wakes them by scheduling their callbacks directly — no timer, no poll interval. Whether the API responds in 80ms or grinds through retries for 30+ seconds, in-process waiters generate exactly zero Redis traffic while they wait.
The Redis lock in _fetch_snapshot_with_lock handles what asyncio.Event can’t — multiple pods competing across process boundaries:
async def _fetch_snapshot_with_lock(self, ticker, cache_key):
    lock_key = f"{MarketDataCacheKeys.TICKER_SNAPSHOT_LOCK.value}:{ticker}"
    lock = self.redis_client.lock(
        lock_key,
        timeout=MarketDataCacheTTL.TICKER_SNAPSHOT_LOCK.value,  # 30s
    )
    try:
        await lock.acquire()
        result = await self.read(cache_key=cache_key)  # double-check
        if result:
            return TickerSnapshot.from_dict(result)
        start = time.monotonic()
        snapshot = self.rest_client.get_snapshot_ticker(
            market_type="stocks", ticker=ticker,
        )
        duration = time.monotonic() - start
        self.snapshot_api_duration.record(duration)  # OpenTelemetry histogram
        data = ticker_snapshot_to_dict(snapshot)
        await self.write(
            data=data,
            cache_key=cache_key,
            cache_ttl=MarketDataCacheTTL.TICKER_SNAPSHOTS.value,
        )
        return snapshot
    finally:
        if await lock.owned():  # release only if we still hold it
            await lock.release()
The double-check read after lock.acquire() handles the cross-pod version of the same problem: another pod may have already populated the cache while this one was waiting.
When the Leader Dies
The finally block is load-bearing. Three distinct failure modes:
Leader throws a non-retry exception (process still alive): event.set() fires immediately. In-process waiters wake up, find the cache empty, fall through, and one steps up as leader. Without the finally guarantee, they’d block until the Redis lock TTL expired.
Leader pod crashes entirely: The asyncio.Event dies with it — in-process waiters are already dead too. Cross-pod waiters are stuck on the Redis lock, and the 30-second TTL is their only backstop. It auto-expires, one pod grabs the lock, and we’re back in business.
Leader’s retries outlive the 30-second TTL: This is the interesting one. The lock auto-expires. A cross-pod waiter grabs it, checks the cache — miss, because the original leader hasn’t written yet — and fires another API call. The original leader eventually succeeds, tries to release a lock it no longer owns, and the if await lock.owned() guard quietly skips the release instead of raising. The duplicate API call already happened, though.
30 seconds isn’t outrageous given the retry behavior, but it’s also not right-sized. That’s the whole point of the OpenTelemetry histogram — once I have real p99 data including retry scenarios, I can set a TTL that covers the realistic worst case without leaving cross-pod waiters in limbo longer than necessary.
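For reference, the histogram recorded inside _fetch_snapshot_with_lock is plain OpenTelemetry. A minimal sketch — the meter and instrument names here are assumptions, not the project’s actual ones:

from opentelemetry import metrics

meter = metrics.get_meter("market_data_cache")

# Recorded around the Massive.com call, retries included, so the p99
# reflects the real worst case the lock TTL has to cover.
snapshot_api_duration = meter.create_histogram(
    name="mdc.snapshot_api.duration",
    unit="s",
    description="Massive.com snapshot fetch latency, including retries",
)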
Same Pattern, Three Methods
get_avg_volume and get_free_float use the identical two-layer pattern — their own _pending_* dicts, their own lock keys, their own histograms. Nothing exotic, just applied consistently.
The Scorecard
v0.2.27 ships with 425 passing tests — 57 of them in test_market_data_cache.py — 99% overall coverage, 100% on market_data_cache.py (254 statements, 48 branches). Flake8 clean. Full source on GitHub.