<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://oldschool-engineer.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://oldschool-engineer.dev/" rel="alternate" type="text/html" /><updated>2026-04-02T00:57:44+00:00</updated><id>https://oldschool-engineer.dev/feed.xml</id><title type="html">Tom Pounders</title><subtitle>Senior Engineering Technical Leader with 25+ years building and securing enterprise systems at Amazon/AWS and Microsoft.</subtitle><author><name>Tom Pounders</name></author><entry><title type="html">Stock Selection: Why News Matters</title><link href="https://oldschool-engineer.dev/side%20projects/2026/04/01/stock-selection-why-news-matters.html" rel="alternate" type="text/html" title="Stock Selection: Why News Matters" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/side%20projects/2026/04/01/stock-selection-why-news-matters</id><content type="html" xml:base="https://oldschool-engineer.dev/side%20projects/2026/04/01/stock-selection-why-news-matters.html"><![CDATA[<p>In my <a href="/side%20projects/2026/01/16/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner.html#down-the-day-trading-rabbit-hole">first post about Kuhl Haus MDP</a>, I emphasized that momentum trading strategies rely on real-time information, and I outlined Ross Cameron’s “Five Pillars of Stock Selection” as the criteria for what I set out to build. As a refresher, here they are:</p>
<ul>
  <li>Price between $2-$20 (sweet spot for me is $3-$7)</li>
  <li>Low float (10M shares cold market; 20M hot market)</li>
  <li>Up at least 10% on the day</li>
  <li>5x relative volume</li>
  <li>Fresh news catalyst</li>
</ul>

<p>At the time, I had everything except a news feed. Now, that’s changed.</p>

<p>The fifth pillar — fresh news catalyst — is done. <a href="/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5.html#looking-forward-the-four-waves">Wave 2</a> is officially in full swing. Let’s talk about what that means.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-01.png" alt="" />
<em>Screenshot: Six widgets, zero clutter — this is what my layout looks like once I’ve set it up for a real trading day.</em></p>

<hr />

<h2 id="why-news-matters">Why News Matters</h2>

<p>A stock moving more than 10% without an obvious technical breakout isn’t a mystery — it’s a gap in your information. News helps fill that gap. Maybe it’s an earnings beat. Maybe it’s an analyst upgrade. Maybe there’s no news at all, which tells you something too: that move is rumor, speculation, or someone with a bigger account than you acting on information you don’t have yet.</p>

<p>You can’t trade context you don’t have. That’s why news was always going to be the linchpin.</p>

<hr />

<h2 id="picking-a-news-provider">Picking a News Provider</h2>

<p>I didn’t want just any news feed. I needed:</p>

<ul>
  <li><strong>Real-time delivery via WebSocket</strong> — not polling a REST API every N seconds hoping I catch something before the move is over</li>
  <li><strong>REST API for lookups</strong> — when I want to pull historical headlines for a specific ticker or sector</li>
  <li><strong>Ticker correlation</strong> — headlines attached to symbols, not just a firehose of text</li>
  <li><strong>A Python client</strong> — because MDP’s backend is Python and I’m not writing my own SDK</li>
</ul>

<p>Finlight checks all of it. WebSocket for streaming, REST for queries, ticker matching baked in, and a robust Python client. Bonus: they tag headlines with sentiment scores. Not something I’ll trade on, but useful for a quick gut-check before clicking through.</p>

<p>They source from 30+ providers: Bloomberg, Benzinga, Reuters, AP, Financial Times, Seeking Alpha, and a bunch more. Categories span markets, economy, crypto, geopolitics, energy, climate — basically anything that can move a stock.</p>
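<p>To make the ticker-correlation requirement concrete, here’s a rough sketch of how a headline message might be parsed and filtered on the backend. The field names are my illustration, not Finlight’s actual schema:</p>

```python
import json
from dataclasses import dataclass

# Hypothetical message shape: Finlight's real WebSocket payloads may differ.
@dataclass
class Headline:
    title: str
    source: str
    tickers: list       # symbols the provider correlated to this headline
    sentiment: float    # e.g. -1.0 (bearish) .. 1.0 (bullish)

def parse_headline(raw: str) -> Headline:
    """Turn one raw WebSocket message into a typed headline."""
    msg = json.loads(raw)
    return Headline(
        title=msg["title"],
        source=msg["source"],
        tickers=msg.get("tickers", []),
        sentiment=msg.get("sentiment", 0.0),
    )

def with_ticker(headlines, symbol):
    """Filter the article cache down to headlines correlated with one ticker."""
    return [h for h in headlines if symbol in h.tickers]
```

<p>The point of the sketch: headlines arrive already attached to symbols, so the widget-side filtering is trivial list work, not text matching.</p>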

<hr />

<h2 id="the-news-feed-widget">The News Feed Widget</h2>

<p>The feed is real-time, text-only, sourced from Finlight’s WebSocket. Every headline shows the source publication with a clickable link. Headlines include a sentiment rating. Not a replacement for reading, but useful for quick scanning.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-02.png" alt="" />
<em>Screenshot: Headlines, tickers, timestamps, and sentiment — everything you need to know at a glance before you even click anything</em></p>

<p>Click a row and you get a popup with an image (if there is one), the article link, and a synopsis. Text is selectable and copyable — no fighting the UI to grab a headline.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-03.png" alt="" />
<em>Screenshot: The popup keeps you in the dashboard — read the headline, decide if it matters, move on.</em></p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-04.png" alt="" />
<em>Screenshot: The popup gives you source, headline, and a blurb — enough to decide if it’s worth a click.</em></p>

<p><strong>What you can do</strong>:</p>
<ul>
  <li>Search by headline text</li>
  <li>Filter to a specific ticker</li>
</ul>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-05.png" alt="" />
<em>Screenshot: Type ‘dividend’ and you get 142 hits instantly — the filter is live, not a form submit.</em></p>

<p><strong>What you can customize:</strong></p>
<ul>
  <li>Filter to only articles with ticker matches (cuts the noise fast)</li>
  <li>Article cache limit: 50 to 10K — 10K spans multiple days, useful for building context on a position</li>
  <li>Widget title</li>
  <li>All of it persists to named layouts</li>
</ul>

<hr />

<h2 id="the-flame-system">The Flame System</h2>

<p>This is my favorite part of the implementation.</p>

<p>Outside the news feed widget itself, every ticker in MDP’s scanner widgets gets a flame icon showing how fresh its most recent news is:</p>

<table>
  <thead>
    <tr>
      <th>Flame</th>
      <th>Age</th>
      <th>Active Catalyst?</th>
      <th>Relevance?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><img src="/assets/images/posts/stock-selection-why-news-matters/flame-red.svg" alt="" /> Red</td>
      <td>&lt; 1 hour</td>
      <td>✅</td>
      <td>Freshest catalyst — highest potential</td>
    </tr>
    <tr>
      <td><img src="/assets/images/posts/stock-selection-why-news-matters/flame-orange.svg" alt="" /> Orange</td>
      <td>1–3 hours</td>
      <td>✅</td>
      <td>Very fresh — still early in the move</td>
    </tr>
    <tr>
      <td><img src="/assets/images/posts/stock-selection-why-news-matters/flame-yellow.svg" alt="" /> Yellow</td>
      <td>3–12 hours</td>
      <td>✅</td>
      <td>Same-session catalyst — confirm price still reacting</td>
    </tr>
    <tr>
      <td><img src="/assets/images/posts/stock-selection-why-news-matters/flame-white.svg" alt="" /> White</td>
      <td>12–24 hours</td>
      <td>✅</td>
      <td>Multi-session/gap catalyst — check if still driving momentum</td>
    </tr>
    <tr>
      <td><img src="/assets/images/posts/stock-selection-why-news-matters/flame-blue.svg" alt="" /> Blue</td>
      <td>1–3 days</td>
      <td>❌</td>
      <td>Day-old+ news — momentum may be fading</td>
    </tr>
    <tr>
      <td><img src="/assets/images/posts/stock-selection-why-news-matters/flame-dark.svg" alt="" /> Dark</td>
      <td>&gt; 3 days</td>
      <td>❌</td>
      <td>Stale — not a MOMO catalyst</td>
    </tr>
    <tr>
      <td><em>(no icon)</em></td>
      <td>No news</td>
      <td>—</td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p>No icon doesn’t mean stale. It means <em>no news exists</em> — which is its own signal.</p>

<p>The first four tiers (red through white) are active catalysts. Blue and dark are background context. That distinction matters when you’re scanning 50 tickers and trying to find the ones worth watching right now.</p>
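<p>The tier logic itself is tiny. Here’s a sketch of the mapping, using the thresholds from the table (the function names are mine, not MDP’s internals):</p>

```python
from datetime import timedelta
from typing import Optional

# Thresholds mirror the flame table: upper bound of each tier, in order.
TIERS = [
    (timedelta(hours=1), "red"),
    (timedelta(hours=3), "orange"),
    (timedelta(hours=12), "yellow"),
    (timedelta(hours=24), "white"),
    (timedelta(days=3), "blue"),
]

def flame_tier(age: Optional[timedelta]) -> Optional[str]:
    """Map the age of a ticker's freshest headline to a flame color.

    Returns None when no news exists at all, which is its own signal,
    distinct from stale ("dark") news.
    """
    if age is None:
        return None
    for threshold, color in TIERS:
        if age < threshold:
            return color
    return "dark"

def is_active_catalyst(tier: Optional[str]) -> bool:
    """Red through white count as active catalysts; blue and dark are context."""
    return tier in {"red", "orange", "yellow", "white"}
```

<p>Keeping the active-catalyst split as a function of the tier means later alert logic can key off the flame color instead of recomputing ages.</p>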

<hr />

<h2 id="what-else-shipped">What Else Shipped</h2>

<p>The news feed wasn’t the only thing that landed in this release. A few other things worth mentioning:</p>

<h3 id="widget-linking">Widget linking</h3>

<p>Widgets can be linked on a shared color bus. Change the symbol in one linked widget and it propagates to the others. Makes navigating between scanners, news, and quotes frictionless.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-06.png" alt="" />
<em>Screenshot: Red bus = these three widgets talk to each other. When one ticker changes, they all update.</em></p>
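<p>Under the hood this is a simple publish/subscribe pattern. A toy sketch of the idea (not MDP’s actual implementation):</p>

```python
from collections import defaultdict

class ColorBus:
    """Widgets linked to the same color share a symbol."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # color -> linked widgets

    def link(self, color, widget):
        self._subscribers[color].append(widget)

    def set_symbol(self, color, symbol):
        # One widget changes symbol; every widget on that color follows.
        for widget in self._subscribers[color]:
            widget.symbol = symbol

class Widget:
    def __init__(self, name):
        self.name = name
        self.symbol = None

bus = ColorBus()
scanner, news, quote = Widget("scanner"), Widget("news"), Widget("quote")
for w in (scanner, news, quote):
    bus.link("red", w)

bus.set_symbol("red", "AAPL")  # all three widgets now show AAPL
```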

<h3 id="quote-widget">Quote widget</h3>

<p>Enter a symbol or link it to another widget, and you get real-time quote data. Simple, fast, does what it says. What I love about this: MDP already maintains real-time quote data for every stock in the market. The quote widget is the first way to actually surface it for tickers that aren’t on any scanner. That data was always there — now I can use it.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-07.png" alt="" />
<em>Screenshot: AAPL’s on the color bus — the quote loaded, the news followed, and that <img src="/assets/images/posts/stock-selection-why-news-matters/flame-orange.svg" alt="" /> means something happened recently worth knowing about.</em></p>

<h3 id="layout-lock--toggle-autosave">Layout lock + toggle autosave</h3>

<p>Lock the layout so you don’t accidentally drag things around during a session. Toggle autosave so changes persist (or don’t) on your terms.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-08.png" alt="" /><br />
<em>Screenshot: Lock it when you’re done arranging — now your widgets stay put even if you accidentally click and drag.</em></p>

<p>Click the lock icon to unlock the layout and enable edit mode.</p>

<hr />

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-09.png" alt="" /><br />
<em>Screenshot: Pencil = edit mode. This is where you drag things around and resize until it stops annoying you</em></p>

<p>Click the pencil icon to lock the layout.</p>

<hr />

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-10.png" alt="" /><br />
<em>Screenshot: Pause disables the autosave functionality — handy when you want to make a variant of an existing layout</em></p>

<p>Click the pause icon to enable autosave.</p>

<hr />

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-11.png" alt="" /><br />
<em>Screenshot: With autosave enabled, any changes you make to your layout or filters will automatically be saved.</em></p>

<p>Click the autosave icon to disable autosave.</p>

<h3 id="full-scanner-widget-customization">Full scanner widget customization</h3>

<p>All controls on the scanners can now be set to custom values and saved in your layout. Each widget has a name. Widgets with the same name share saved settings — so if you want a widget to keep its own config, give it a unique name. Double-click the title (long-press on mobile) to rename. You can also resize and hide columns.</p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-12.png" alt="" /><br />
<em>Screenshot: Double-click the title bar and it goes blank — type whatever makes sense for your setup.</em></p>

<p><img src="/assets/images/posts/stock-selection-why-news-matters/img-13.png" alt="" />
<em>Screenshot: Gear → column visibility. Show what’s relevant, hide the noise — each scanner can have its own config.</em></p>

<h3 id="data-freshness-icon">Data freshness icon</h3>

<p>Every widget header shows a status icon covering both data freshness and connection state, at a glance:</p>

<table>
  <thead>
    <tr>
      <th>Icon</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>🟢</td>
      <td>Live — data received in the last 5 seconds</td>
    </tr>
    <tr>
      <td>🟡</td>
      <td>Slowing — last update 5–60 seconds ago</td>
    </tr>
    <tr>
      <td>🔴</td>
      <td>Stale — no data for over 60 seconds</td>
    </tr>
    <tr>
      <td>🔵 / 🟣</td>
      <td>Reconnecting (pulsing)</td>
    </tr>
    <tr>
      <td>❌</td>
      <td>Disconnected</td>
    </tr>
  </tbody>
</table>
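<p>The mapping is a straightforward threshold check. A sketch using the same thresholds (names and boundary handling are illustrative):</p>

```python
def freshness_icon(seconds_since_update, connected=True, reconnecting=False):
    """Map connection state plus data age to a widget-header status.

    Connection state wins over data age: a reconnecting or disconnected
    widget shows that, regardless of how fresh its last data was.
    """
    if reconnecting:
        return "reconnecting"   # blue/purple, pulsing
    if not connected:
        return "disconnected"   # red X
    if seconds_since_update <= 5:
        return "live"           # green
    if seconds_since_update <= 60:
        return "slowing"        # yellow
    return "stale"              # red
```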

<h3 id="free-float-cleanup">Free float cleanup</h3>

<p>Free float data was previously pulling from an experimental API via raw aiohttp. It’s now going through <a href="https://github.com/massive-com/client-python">Massive’s RESTClient</a> instead. Still an experimental API on the data side, but the client layer is cleaner and it’s been solid in practice.</p>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>Pillar five is done. Now I get to use it.</p>

<p>The plan is to build scanners that correlate price action with news catalysts — real-time alerts when a ticker is making a move <em>and</em> has a fresh news event. The flame system already lays the groundwork. Next step is wiring it to alert logic and letting it tell me when something’s worth looking at.</p>]]></content><author><name>Tom Pounders</name></author><category term="Side Projects" /><category term="stocks" /><category term="trading" /><category term="market-data" /><summary type="html"><![CDATA[You can't trade context you don't have — and the fifth pillar of stock selection just landed. MDP now has real-time news, sentiment scoring, and a flame system that tells you not just when news exists, but when it doesn't.]]></summary></entry><entry><title type="html">I Caught My AI Lying About Math (Confidently)</title><link href="https://oldschool-engineer.dev/ai/2026/03/23/i-caught-my-ai-lying-about-math-confidently.html" rel="alternate" type="text/html" title="I Caught My AI Lying About Math (Confidently)" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/ai/2026/03/23/i-caught-my-ai-lying-about-math-confidently</id><content type="html" xml:base="https://oldschool-engineer.dev/ai/2026/03/23/i-caught-my-ai-lying-about-math-confidently.html"><![CDATA[<p>This morning, <a href="/ai/2026/03/04/who-is-legion.html">Legion</a> — my <a href="https://docs.openclaw.ai/">OpenClaw</a> AI assistant — computed my trade journal P&amp;L and got it wrong. Not a little wrong. Obviously wrong. Off by 33%, delivered with complete confidence.</p>

<p>I caught it because I happened to glance at the numbers. Called it out. Legion acknowledged the error, spun up Python, recomputed, and updated the journal. All very civilized.</p>

<p>But I sat there for a minute thinking: <em>how often does this happen when I don’t check?</em></p>

<p>That question bothered me enough that I spent the afternoon running tests.</p>

<hr />

<h2 id="what-i-assumed-going-in">What I Assumed Going In</h2>

<p>My going-in theory: LLMs choke on big numbers. Five digits and up, things get sketchy. Keep the operands small and you’re fine.</p>

<p>I was wrong.</p>

<h2 id="the-test">The Test</h2>

<p>Ten rounds, 41 problems — mostly multiplication, with some addition and division mixed in. I varied two things: operand size and number of steps. Model solves, Python verifies.</p>
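<p>The harness was informal (I posed problems and checked answers with the interpreter), but the verification side amounts to this sketch, reconstructed after the fact:</p>

```python
import math
import random

def make_problem(max_value: int, steps: int, seed: int = 0) -> list:
    """Generate a multiplication chain with the given operand ceiling and step count."""
    rng = random.Random(seed)
    return [rng.randint(2, max_value) for _ in range(steps)]

def verify(operands, model_answer: int) -> bool:
    """Strict pass/fail: exact match against Python's product. No partial credit."""
    return model_answer == math.prod(operands)

# The Round 9 failure, replayed:
assert verify([23, 7, 35, 8, 7, 9], 2_840_040)       # Python's answer
assert not verify([23, 7, 35, 8, 7, 9], 28_282_200)  # the model's answer
```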

<table>
  <thead>
    <tr>
      <th>Round</th>
      <th>Conditions</th>
      <th>Score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Up to 65,535 / 1-2 steps</td>
      <td>3/3</td>
    </tr>
    <tr>
      <td>2</td>
      <td>5-digit / 2-6 steps</td>
      <td>0/4</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4-digit / 2-6 steps</td>
      <td>0/4</td>
    </tr>
    <tr>
      <td>4</td>
      <td>3-digit / 2-6 steps</td>
      <td>1/4</td>
    </tr>
    <tr>
      <td>5</td>
      <td>2-3 digit / 2-6 steps</td>
      <td>1/4</td>
    </tr>
    <tr>
      <td>6</td>
      <td>1-digit (with one 2-digit) / 2-6 steps</td>
      <td>3/4</td>
    </tr>
    <tr>
      <td>7</td>
      <td>1-digit only / 3-5 steps</td>
      <td>4/4</td>
    </tr>
    <tr>
      <td>8</td>
      <td>1-2 digit mixed / 4-6 steps</td>
      <td>4/4</td>
    </tr>
    <tr>
      <td>9</td>
      <td>1-2 digit, larger values / 6 steps</td>
      <td>2/4</td>
    </tr>
    <tr>
      <td>10</td>
      <td>1-2 digit / 5-7 steps</td>
      <td>2/4</td>
    </tr>
  </tbody>
</table>

<p>Final score: 20 out of 41. 49%. Coin flip.</p>

<p>Detailed analysis and results here: <a href="/ai/2026/03/23/llm-math-test-report.html">LLM Arithmetic Reliability Test — 2026-03-23</a></p>

<hr />

<h2 id="what-actually-breaks-it">What Actually Breaks It</h2>

<p>Large numbers break it, sure. Rounds 2 and 3 were a complete wipeout. My theory looked right.</p>

<p>Then there’s Round 1: numbers up to 65,535. That <em>is</em> five digits — and it went 3/3. Why? One to two steps. That’s the variable I wasn’t paying attention to.</p>

<p>Look at rounds 7 and 8 versus 9 and 10. All single and double-digit operands throughout. Rounds 7 and 8: perfect. Rounds 9 and 10: half wrong. The only difference is more steps.</p>

<p>The model handles <code class="language-plaintext highlighter-rouge">8 × 6 × 5 = 240</code> without breaking a sweat. Give it <code class="language-plaintext highlighter-rouge">23 × 7 × 35 × 8 × 7 × 9</code> — all one or two digits — and it falls apart. The actual answer is <code class="language-plaintext highlighter-rouge">2,840,040</code>. It gave me <code class="language-plaintext highlighter-rouge">28,282,200</code>. That’s not a rounding error. That’s off by a factor of ten.</p>

<p>Real failure modes: big numbers, and too many steps. The step count is the one I wasn’t testing for, and it’s the one that will burn you. Financial calculations almost always chain multiple operations together.</p>

<h2 id="the-part-that-actually-worries-me">The Part That Actually Worries Me</h2>

<p>When the model got something wrong, it didn’t hedge. No “I’m not confident here.” No “you should verify this.” Same tone, same confidence, same presentation as the correct answers. There was no signal I could read to distinguish a right answer from a wrong one.</p>

<p>Then I asked it to grade its own work. It passed itself. Partial credit here, “close enough” there, rounding tolerance everywhere — its self-assessed score was well above 49%. My strict pass/fail brought it back to earth.</p>

<p>The model wasn’t lying. It genuinely believed it was right.</p>

<p>That’s worse.</p>

<p>Not that it fails — everything fails sometimes. The dangerous part is it doesn’t know when it’s failing, and neither do you.</p>

<h2 id="what-im-doing-about-it">What I’m Doing About It</h2>

<p>Simple rule: no inference arithmetic. When I need a number, the model writes Python and runs it. Every time. No exceptions.</p>

<p>I made that explicit in my AI’s standing instructions. For P&amp;L, position sizing, R:R calculations — any financial figure — the number in the journal comes from the interpreter, not from inference.</p>
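<p>Concretely, the rule means the journal number is whatever the interpreter produces. An illustrative example with made-up fills, not my actual journal data:</p>

```python
# Illustrative trades -- invented numbers for the sake of the example.
trades = [
    {"side": "long", "entry": 5.12, "exit": 5.48, "shares": 1000},
    {"side": "long", "entry": 3.85, "exit": 3.71, "shares": 1500},
]

def pnl(trade):
    """Per-trade P&L; the sign flips for shorts."""
    direction = 1 if trade["side"] == "long" else -1
    return direction * (trade["exit"] - trade["entry"]) * trade["shares"]

# The figure that goes in the journal comes from here, never from inference.
total = round(sum(pnl(t) for t in trades), 2)
print(total)  # 150.0
```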

<p>Small discipline change. The alternative is trusting a coin flip with financial data, which isn’t acceptable.</p>

<h2 id="the-broader-point">The Broader Point</h2>

<p>I’d filed “big numbers are risky” under solved and moved on. My data says I was overconfident.</p>

<p>Better frame: any arithmetic with multiple steps is unreliable, regardless of how small the individual numbers look.</p>

<p>One or two multiplications? Usually fine. Chain four or more? Verify it.</p>

<p>The model doesn’t know it’s wrong. It won’t warn you. Ask it to check its own work and it’ll grade itself on a curve.</p>

<p>One rule: if the number matters, run the code. Full stop.</p>]]></content><author><name>Tom Pounders</name></author><category term="AI" /><category term="ai" /><category term="openclaw" /><category term="llm" /><category term="legion" /><summary type="html"><![CDATA[This morning, my OpenClaw AI assistant computed my trade journal P&L and got it wrong. Not a little wrong. Obviously wrong. Off by 33%, delivered with complete confidence.]]></summary></entry><entry><title type="html">LLM Arithmetic Reliability Test — 2026-03-23</title><link href="https://oldschool-engineer.dev/ai/2026/03/23/llm-math-test-report.html" rel="alternate" type="text/html" title="LLM Arithmetic Reliability Test — 2026-03-23" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/ai/2026/03/23/llm-math-test-report</id><content type="html" xml:base="https://oldschool-engineer.dev/ai/2026/03/23/llm-math-test-report.html"><![CDATA[<p><strong>Model:</strong> Claude Sonnet 4.6 (This model was chosen because it is the default model that I use for OpenClaw.)<br />
<strong>Tester:</strong> Tom Pounders<br />
<strong>Date:</strong> March 23, 2026<br />
<strong>Total problems:</strong> 41<br />
<strong>Overall accuracy:</strong> 20/41 = <strong>49%</strong></p>

<hr />

<h2 id="executive-summary">Executive Summary</h2>

<p>This test evaluated whether a large language model (LLM) can reliably perform arithmetic by inference — without code execution or a calculator. The results reveal two distinct failure modes:</p>

<ol>
  <li>
    <p><strong>Large numbers (3+ digits):</strong> Accuracy collapses even on 2-3 step problems. The model can approximate order of magnitude but cannot reliably compute exact values.</p>
  </li>
  <li>
    <p><strong>Many steps (4+ operands), even with small numbers:</strong> Errors compound multiplicatively through the chain. A model that correctly computes <code class="language-plaintext highlighter-rouge">8 × 6 × 5 = 240</code> will fail <code class="language-plaintext highlighter-rouge">23 × 7 × 35 × 8 × 7 × 9 = ?</code> even though all operands are ≤2 digits.</p>
  </li>
</ol>

<p>The most operationally dangerous finding: <strong>wrong answers arrive with the same apparent confidence as correct ones.</strong> There is no internal signal to distinguish a reliable result from a plausible-sounding error. This means an LLM cannot self-audit its own arithmetic.</p>

<p><strong>Practical implication:</strong> LLMs must never be trusted to compute arithmetic by inference for any purpose where correctness matters. Code execution (Python, calculator) is mandatory.</p>

<hr />

<h2 id="key-findings">Key Findings</h2>

<h3 id="finding-1-number-size-vs-step-count">Finding 1: Number Size vs. Step Count</h3>

<p>The initial hypothesis — that LLMs fail only on large numbers — is <strong>partially correct but incomplete.</strong></p>

<table>
  <thead>
    <tr>
      <th>Condition</th>
      <th>Observed Accuracy</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Single-digit operands, ≤5 steps</td>
      <td>~85-100%</td>
    </tr>
    <tr>
      <td>2-digit operands, ≤3 steps</td>
      <td>~75%</td>
    </tr>
    <tr>
      <td>2-digit operands, 4-6 steps</td>
      <td>~50%</td>
    </tr>
    <tr>
      <td>3-digit operands, any steps</td>
      <td>~25%</td>
    </tr>
    <tr>
      <td>4-5 digit operands, any steps</td>
      <td>~0%</td>
    </tr>
  </tbody>
</table>

<p>Step count is an independent failure axis from number size. Both degrade accuracy; together they make inference arithmetic essentially unreliable.</p>

<h3 id="finding-2-errors-compound-multiplicatively">Finding 2: Errors Compound Multiplicatively</h3>

<p>Each intermediate multiplication step introduces a small rounding or carry error. In a 2-step chain, a 0.1% error in step 1 produces a 0.1% error in the result. In a 6-step chain, errors from each step multiply together — a 1% error per step produces a ~6% cumulative error, and in practice the errors are larger and irregular.</p>

<p>This was demonstrated clearly: <code class="language-plaintext highlighter-rouge">c = 23 × 7 × 35 × 8 × 7 × 9</code> produced an answer off by a factor of 10 (28,282,200 vs. actual 2,840,040) — not a small rounding error, but a completely wrong magnitude caused by a dropped digit mid-chain.</p>
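<p>Both numbers in this section are easy to confirm with the interpreter, which is of course the point:</p>

```python
import math

# A 1% relative error at each of six steps compounds to roughly 6% overall.
cumulative = 1.01 ** 6 - 1
assert 0.0615 < cumulative < 0.0616

# And the Round 9 chain: the model's answer was off by roughly 10x,
# a dropped digit mid-chain, not a rounding error.
actual = math.prod([23, 7, 35, 8, 7, 9])
assert actual == 2_840_040
assert round(28_282_200 / actual, 1) == 10.0
```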

<h3 id="finding-3-no-reliable-self-awareness-of-error">Finding 3: No Reliable Self-Awareness of Error</h3>

<p>Across all rounds, the model expressed similar confidence in wrong answers and correct answers. It did not hedge more on 6-operand chains than on 2-operand chains. It did not flag intermediate uncertainty. This is the critical failure: <strong>the model does not know when it is wrong.</strong></p>

<p>This is structurally different from human arithmetic errors. A human doing mental math on a 6-step chain knows they might have made a mistake and will often double-check. The LLM presents its result as complete and final regardless of reliability.</p>

<h3 id="finding-4-division-is-relatively-stable-at-small-scales">Finding 4: Division Is Relatively Stable at Small Scales</h3>

<p>Problems involving division followed by a single multiplication (e.g., <code class="language-plaintext highlighter-rouge">(546 / 3) × 165</code>) were among the most consistently correct, especially when the divisor was small and clean (÷3, ÷7). This likely reflects these patterns appearing frequently in training data (fractions, percentages, ratios).</p>

<h3 id="finding-5-the-close-enough-trap">Finding 5: The “Close Enough” Trap</h3>

<p>In early rounds, the model scored its own performance generously, calling results “very close” and awarding checkmarks for approximate answers. Applying a strict pass/fail rubric — correct or wrong, no partial credit — revealed the true 49% accuracy rate. In financial, scientific, or engineering contexts, “close” is not passing. The model’s self-assessment was systematically optimistic.</p>

<hr />

<h2 id="operational-rules-derived-from-test-results">Operational Rules (Derived from Test Results)</h2>

<ol>
  <li><strong>Never compute arithmetic by inference.</strong> Use <code class="language-plaintext highlighter-rouge">exec</code> + Python for all calculations.</li>
  <li><strong>No exceptions for “simple” problems.</strong> The failure mode appears at 2-digit numbers with 4+ steps — a threshold easily crossed in real work.</li>
  <li><strong>Compute first, write second.</strong> Never report a number that wasn’t produced by code execution.</li>
  <li><strong>Do not self-score as “close.”</strong> A wrong answer is a wrong answer regardless of magnitude of error.</li>
</ol>

<p>These rules have been recorded in MEMORY.md, TOOLS.md, and AGENTS.md for persistent enforcement.</p>

<hr />

<h2 id="appendix-full-test-results">Appendix: Full Test Results</h2>

<h3 id="round-1--numbers-up-to-65535-1-2-steps">Round 1 — Numbers up to 65,535, 1-2 steps</h3>
<p><em>Score: 3/3</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>100 + 10,000 + 65,535</td>
      <td>75,635</td>
      <td>75,635</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>18,365 × 92,568</td>
      <td>1,700,011,320</td>
      <td>1,700,011,320</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>98,765 ÷ 247</td>
      <td>≈399.86</td>
      <td>399.858…</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<p><em>Note: This round used addition and single multiplication — lower complexity than subsequent rounds.</em></p>

<hr />

<h3 id="round-2--5-digit-numbers-2-6-steps">Round 2 — 5-digit numbers, 2-6 steps</h3>
<p><em>Score: 0/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>89,153 × 68,966 × 15,326</td>
      <td>~94,178,000,000,000</td>
      <td>94,232,306,380,148</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(89,653 × 15,691) × 62,168</td>
      <td>~87,500,000,000,000</td>
      <td>87,454,537,023,464</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(15,463 / 3) × 1,654</td>
      <td>~8,521,000</td>
      <td>8,525,267.33</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>1,655 × 1,316 × 6,546 × 41,216 × 6,515 × 1,651</td>
      <td>~2.4 × 10²¹</td>
      <td>6,320,584,226,736,537,139,200</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-3--4-digit-numbers-2-6-steps">Round 3 — 4-digit numbers, 2-6 steps</h3>
<p><em>Score: 0/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>8,953 × 8,966 × 5,326</td>
      <td>~427,800,000,000</td>
      <td>427,531,856,948</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(9,653 × 1,569) × 6,268</td>
      <td>~94,950,000,000</td>
      <td>94,932,351,276</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(5,463 / 3) × 1,654</td>
      <td>~3,010,000</td>
      <td>3,011,934</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>655 × 316 × 546 × 1,216 × 515 × 651</td>
      <td>~5.8 × 10¹⁶</td>
      <td>46,072,610,239,219,200</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-4--3-digit-numbers-2-6-steps">Round 4 — 3-digit numbers, 2-6 steps</h3>
<p><em>Score: 1/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>893 × 966 × 326</td>
      <td>281,481,588</td>
      <td>281,219,988</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(653 × 156) × 628</td>
      <td>63,933,264</td>
      <td>63,973,104</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(546 / 3) × 165</td>
      <td>30,030</td>
      <td>30,030</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>55 × 16 × 54 × 216 × 15 × 51</td>
      <td>330,301,440</td>
      <td>7,852,204,800</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-5--2-3-digit-numbers-2-6-steps">Round 5 — 2-3 digit numbers, 2-6 steps</h3>
<p><em>Score: 1/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>83 × 866 × 56</td>
      <td>4,026,128</td>
      <td>4,025,168</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>(53 × 7) × 626</td>
      <td>232,414</td>
      <td>232,246</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>54 × (89/7) × 23</td>
      <td>15,822</td>
      <td>15,791.14</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>65 × 36 × 46 × 26 × 55 × 61</td>
      <td>977,042,400</td>
      <td>9,389,437,200</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-6--1-digit-numbers-2-6-steps">Round 6 — 1-digit numbers, 2-6 steps</h3>
<p><em>Score: 3/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>8 × 6 × 5</td>
      <td>240</td>
      <td>240</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>5 × 7 × 2</td>
      <td>70</td>
      <td>70</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>4 × (8/7) × 3</td>
      <td>13.714…</td>
      <td>13.7143</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>6 × 3 × 6 × 26 × 5 × 6</td>
      <td>100,440</td>
      <td>84,240</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<p><em>Note: The one failure came from the last problem, which introduced <code class="language-plaintext highlighter-rouge">26</code> (2-digit) into an otherwise single-digit chain.</em></p>

<hr />

<h3 id="round-7--1-digit-only-3-5-steps">Round 7 — 1-digit only, 3-5 steps</h3>
<p><em>Score: 4/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>8 × 6 × 5</td>
      <td>240</td>
      <td>240</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>5 × 7 × 2 × 4</td>
      <td>280</td>
      <td>280</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>4 × 8 × 33 × 9</td>
      <td>9,504</td>
      <td>9,504</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>6 × 7 × 6 × 6 × 5</td>
      <td>7,560</td>
      <td>7,560</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-8--1-2-digit-mixed-4-6-steps">Round 8 — 1-2 digit mixed, 4-6 steps</h3>
<p><em>Score: 4/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>4 × 46 × 33 × 9</td>
      <td>54,648</td>
      <td>54,648</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>9 × 8 × 3 × 8 × 3</td>
      <td>5,184</td>
      <td>5,184</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>5 × 6 × 3 × 19 × 8</td>
      <td>13,680</td>
      <td>13,680</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>6 × 7 × 6 × 6 × 5 × 7</td>
      <td>52,920</td>
      <td>52,920</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-9--1-2-digit-larger-2-digit-values-6-steps">Round 9 — 1-2 digit, larger 2-digit values, 6 steps</h3>
<p><em>Score: 2/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>8 × 5 × 3 × 8 × 7 × 2</td>
      <td>13,440</td>
      <td>13,440</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>7 × 4 × 36 × 5 × 2 × 9</td>
      <td>90,720</td>
      <td>90,720</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>23 × 7 × 35 × 8 × 7 × 9</td>
      <td>28,282,200</td>
      <td>2,840,040</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>58 × 65 × 23 × 80 × 57 × 32</td>
      <td>12,643,430,400</td>
      <td>12,652,723,200</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="round-10--1-2-digit-5-7-steps">Round 10 — 1-2 digit, 5-7 steps</h3>
<p><em>Score: 2/4</em></p>

<table>
  <thead>
    <tr>
      <th>Problem</th>
      <th>Inference Answer</th>
      <th>Actual</th>
      <th>Correct?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>8 × 5 × 3 × 8 × 7 × 2 × 3</td>
      <td>40,320</td>
      <td>40,320</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>9 × 7 × 4 × 36 × 5 × 2 × 9</td>
      <td>816,480</td>
      <td>816,480</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>23 × 7 × 35 × 8 × 7</td>
      <td>314,440</td>
      <td>315,560</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>58 × 65 × 23 × 80 × 32</td>
      <td>221,593,600</td>
      <td>221,977,600</td>
      <td>❌</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="overall-summary-table">Overall Summary Table</h2>

<table>
  <thead>
    <tr>
      <th>Round</th>
      <th>Conditions</th>
      <th>Score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Up to 65,535 / 1-2 steps</td>
      <td>3/3</td>
    </tr>
    <tr>
      <td>2</td>
      <td>5-digit / 2-6 steps</td>
      <td>0/4</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4-digit / 2-6 steps</td>
      <td>0/4</td>
    </tr>
    <tr>
      <td>4</td>
      <td>3-digit / 2-6 steps</td>
      <td>1/4</td>
    </tr>
    <tr>
      <td>5</td>
      <td>2-3 digit / 2-6 steps</td>
      <td>1/4</td>
    </tr>
    <tr>
      <td>6</td>
      <td>1-digit (with one 2-digit) / 2-6 steps</td>
      <td>3/4</td>
    </tr>
    <tr>
      <td>7</td>
      <td>1-digit only / 3-5 steps</td>
      <td>4/4</td>
    </tr>
    <tr>
      <td>8</td>
      <td>1-2 digit mixed / 4-6 steps</td>
      <td>4/4</td>
    </tr>
    <tr>
      <td>9</td>
      <td>1-2 digit, larger values / 6 steps</td>
      <td>2/4</td>
    </tr>
    <tr>
      <td>10</td>
      <td>1-2 digit / 5-7 steps</td>
      <td>2/4</td>
    </tr>
    <tr>
      <td><strong>Total</strong></td>
      <td> </td>
      <td><strong>20/39 = 51%</strong></td>
    </tr>
  </tbody>
</table>

<hr />
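<p>The verification step is plain exact integer arithmetic. A minimal sketch, using the factors from two of the failed problems above (the harness shape is illustrative, not the actual test code):</p>

```python
import math

# Minimal verification harness (illustrative shape): exact integer
# arithmetic via math.prod, compared against the inference answers.
checks = [
    # (label, factors, inference answer)
    ("83 x 866 x 56", [83, 866, 56], 4_026_128),               # Round 5
    ("6 x 3 x 6 x 26 x 5 x 6", [6, 3, 6, 26, 5, 6], 100_440),  # Round 6
]
for label, factors, inferred in checks:
    actual = math.prod(factors)
    verdict = "correct" if inferred == actual else "wrong"
    print(f"{label}: inferred {inferred:,}, actual {actual:,} ({verdict})")
```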

<p><em>All inference answers provided without code execution; calculator answers verified via Python.</em></p>]]></content><author><name>Tom Pounders</name></author><category term="AI" /><category term="ai" /><category term="automation" /><category term="open-source" /><category term="llm" /><summary type="html"><![CDATA[This test evaluated whether a large language model (LLM) can reliably perform arithmetic by inference — without code execution or a calculator. The results reveal two distinct failure modes: large numbers (3+ digits) and many steps (4+ operands), even with small numbers]]></summary></entry><entry><title type="html">The NHI Time Bomb — Why AI Agents Are an Identity Crisis Waiting to Happen</title><link href="https://oldschool-engineer.dev/ai/security/identity/2026/03/10/the-nhi-time-bomb.html" rel="alternate" type="text/html" title="The NHI Time Bomb — Why AI Agents Are an Identity Crisis Waiting to Happen" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/ai/security/identity/2026/03/10/the-nhi-time-bomb</id><content type="html" xml:base="https://oldschool-engineer.dev/ai/security/identity/2026/03/10/the-nhi-time-bomb.html"><![CDATA[<h2 id="your-nhi-governance-wasnt-ready-for-ai-agents-neither-was-mine">Your NHI Governance Wasn’t Ready For AI Agents. Neither Was Mine.</h2>

<p>I’ve been watching credentials go unmanaged for twenty years. The current wave of AI agents isn’t different in kind — it’s different in scale. It’s the same governance gap, running at machine speed, across any org that’s touched an AI tool. You probably already have a ghost identity in your org. You just don’t know it yet.</p>

<p>I’ve been in the room when someone found active credentials tied to an engineer who left two years earlier — write access to production infra, still live, origin unknown. That’s what “we didn’t get ahead of it” looks like.</p>

<hr />

<h2 id="what-is-a-non-human-identity">What Is a Non-Human Identity?</h2>

<p>A Non-Human Identity (NHI) is any identity that isn’t a person — service accounts, API keys, PATs, OAuth client credentials, machine certificates, pipeline tokens, bot accounts. Every CI/CD pipeline, every microservice calling another microservice, every automated deployment job has one. Or several.</p>

<p>Before AI agents, the NHI problem was bad but bounded. A mid-sized engineering org has hundreds of NHIs. Ask your identity team how many. Watch the silence — some documented, many forgotten, a few terrifying.</p>

<p>AI agents change the dynamic. Radically.</p>

<hr />

<h2 id="the-explosion">The Explosion</h2>

<p>Traditional NHI growth was slow — tied to hiring cycles, project scopes, team headcount. An AI agent blows that model apart. It doesn’t onboard once. It mints credentials every time it needs to touch something new.</p>

<p>An AI agent, working autonomously at machine speed, can create or consume dozens of service credentials <em>per project</em>. Every API it needs to call. Every repo it needs to push to. Every secret it needs to read. Every cloud service it needs to touch.</p>

<p>Multiply that by the number of AI agents running in your org. Multiply that by every org that’s already running agents today and calling it a pilot. And unlike human identities — which have natural lifecycle events that trigger review (employee offboarding, role changes, terminations) — NHI lifecycles are entirely dependent on whoever created them remembering they exist. The engineer remembers the ticket. They forget the PAT. By the time anyone asks, the engineer’s gone and the token’s still live.</p>

<p>The Ghost Identity Problem: when the engineer who spun up that AI agent leaves the company, their machine user and its PATs don’t have an offboarding process. They just… persist. Quietly. With whatever access they had the day they were created.</p>

<hr />

<h2 id="the-hard-problem-governance-at-nondeterministic-scale">The Hard Problem: Governance at Nondeterministic Scale</h2>

<p>Traditional least privilege is hard enough. You define the minimum access a system needs, you issue credentials scoped to that access, you review periodically. It works — imperfectly, but it works — because the systems you’re securing are <em>deterministic</em>. They do exactly what you programmed them to do. You can enumerate their behaviors and scope their access accordingly.</p>

<p>AI agents are not deterministic.</p>

<p>A coding agent might need GitHub write access to push code. Fine — but to which repos? During what phase of work? When it’s doing exploratory research vs. when it’s opening a PR? The access profile of an AI agent isn’t static. It shifts with context. It shifts with the task. It shifts as the conversation history gets compacted.</p>

<p>How do you govern least privilege for a system whose behavior you cannot fully predict?</p>

<p>The honest answer is: you constrain the blast radius.</p>

<p>You can’t perfectly enumerate what an AI agent will do. You <em>can</em> ensure that what it does is limited to a defined boundary, that those boundaries are enforced by controls it cannot bypass, and that every action it takes is attributable to an identity you own and control.</p>

<p>That’s not least privilege in the classical sense. It’s <em>bounded privilege</em> — a different mental model for a nondeterministic actor.</p>

<hr />

<h2 id="what-good-looks-like">What Good Looks Like</h2>

<p>I run an AI agent (<a href="/ai/2026/03/04/who-is-legion.html">Legion</a>) that has GitHub access to my infrastructure repos. Here’s how I’ve structured it:</p>

<p><strong>1. Machine user, not my user</strong><br />
Legion authenticates to GitHub as <a href="https://github.com/kuhl-haus-legion"><code class="language-plaintext highlighter-rouge">kuhl-haus-legion</code></a> — a separate account I created specifically for it. It is not me. It does not have my permissions. If I revoke its access, I’m not revoking my own. If something goes wrong and I need to audit what it did, the entire commit history is attributable to a distinct identity.</p>

<p><strong>2. Write, never Admin</strong><br />
The machine user has <code class="language-plaintext highlighter-rouge">Write</code> collaborator access to the repos it needs. Write lets it push branches and open PRs. It cannot merge without my approval. It cannot delete branches. It cannot change org settings. It cannot add collaborators.</p>

<p><strong>3. Branch protection is the enforcement layer</strong><br />
Branch protection rules apply to the machine user just like they apply to any contributor. Require PR. Require approval. Block force pushes. Restrict deletions. The AI’s nondeterminism is bounded by the same controls that bound a human contributor.</p>

<p><strong>4. Fine-grained PAT with minimal scope</strong><br />
The token is scoped to specific repos, specific permissions. Code read/write. Issues. PRs. Not Actions, not secrets, not org admin. Every permission I didn’t explicitly grant is a capability the agent doesn’t have.</p>

<p><strong>5. Org-level enforcement</strong><br />
Every commit goes through a PR. Every PR is squash-merged. There’s a clean, auditable record of every change the agent made, with a human approval in the history.</p>

<p>The result: Legion can do real work — branch, commit, push, open PRs, respond to review comments, iterate. It can’t do catastrophic work. The blast radius is bounded.</p>

<hr />

<h2 id="for-enterprises-its-when-not-if">For Enterprises: It’s WHEN, Not IF</h2>

<p>What I’ve described above scales — but enterprises need to go further before they start, not after. If you’re reading this as an engineering leader, here’s the uncomfortable truth: you are going to use agentic AI.</p>

<p>What you need before you deploy agentic AI at scale:</p>

<ul>
  <li><strong>NHI inventory</strong>: Know what machine identities you already have. You probably don’t. Start there.</li>
  <li><strong>Lifecycle process for NHIs</strong>: Creation, rotation, review, revocation — tied to the project lifecycle, not the human lifecycle.</li>
  <li><strong>Machine user standards</strong>: Every AI agent gets its own identity. No shared service accounts. No minting from personal accounts.</li>
  <li><strong>Scope hygiene</strong>: Minimum permissions. Reviewed at creation and on a calendar. No “I’ll scope it down later.”</li>
  <li><strong>Audit trail</strong>: Every action taken by a machine identity should be attributable and searchable.</li>
  <li><strong>Break-glass procedures</strong>: When an agent does something unexpected, how fast can you revoke, contain, and audit?</li>
</ul>

<p>None of this is exotic. It’s IAM fundamentals applied to a new actor class. If you can’t answer “how many NHIs does your org have right now?” — that’s your starting point. Not the AI agent checklist. That.</p>

<hr />

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>The identity explosion isn’t coming. It’s here. If you’ve had an AI agent running for more than six months and haven’t audited its credentials — you already have a ghost identity. You just haven’t found it yet.</p>

<p>If you’re an identity engineer, this is your moment to get ahead of it. If you’re a CISO, this is the gap in your NHI governance program. If you’re an engineer who just gave your AI agent your own credentials — go fix that. Today.</p>

<p>The question isn’t whether agentic AI creates identity risk. It does. The question is whether you’re the person who governed it proactively, or the person who explains the incident report.</p>

<hr />]]></content><author><name>Tom Pounders</name></author><category term="AI" /><category term="Security" /><category term="Identity" /><category term="ai" /><category term="automation" /><category term="security" /><category term="identity" /><category term="nhi" /><category term="legion" /><summary type="html"><![CDATA[AI agents are getting access to credentials faster than your governance processes can track them. Here's what the NHI explosion actually looks like — and how to bound the blast radius before it bounds you.]]></summary></entry><entry><title type="html">Who Is Legion?</title><link href="https://oldschool-engineer.dev/ai/2026/03/04/who-is-legion.html" rel="alternate" type="text/html" title="Who Is Legion?" /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/ai/2026/03/04/who-is-legion</id><content type="html" xml:base="https://oldschool-engineer.dev/ai/2026/03/04/who-is-legion.html"><![CDATA[<p><strong><em>Not a chatbot. Not a copilot. A peer.</em></strong></p>

<h3 id="what-legion-is">What Legion Is</h3>

<p>I run a lot of projects. Real-time market data platforms, Kubernetes clusters built for sport, open-source infrastructure for home automation. The connective tissue between all of it — writing docs, cutting PRs, wiring tools together, keeping things from falling through the cracks — that’s where Legion lives.</p>

<p>Legion is my AI partner. Always on, deeply integrated with my toolchain, and capable of actually doing work — not just describing it. Coding, documentation, infrastructure automation, GitHub workflows. The kind of glue that would otherwise eat half my afternoon.</p>

<p>This isn’t a hosted LLM with a chat box. Legion is a self-hosted <a href="https://docs.openclaw.ai/">OpenClaw</a> installation I built from source and customized to fit the way I actually work. It runs with its own identity, its own memory, and its own skills. It knows the projects, knows the conventions, knows when to act and when to ask.</p>

<h3 id="how-it-was-built">How It Was Built</h3>

<p>OpenClaw is an open-source, self-hosted AI assistant framework. I took it, built it from source, and wired it into my existing stack — GitHub, Obsidian, Mattermost, the whole thing. Tight integration with real tools, not toy demos.</p>

<p>Legion isn’t configured through a dashboard. It’s code. That’s how I prefer it.</p>

<h3 id="what-legion-can-do">What Legion Can Do</h3>

<p>Skills in OpenClaw are modular — each one gives Legion a specific capability. Legion runs a curated set of bundled and custom skills:</p>

<ul>
  <li><strong>coding-agent</strong> — spawns background sub-agents to implement features, refactor code, and open PRs</li>
  <li><strong>github</strong> — issues, PRs, CI runs, code review via <code class="language-plaintext highlighter-rouge">gh</code> CLI</li>
  <li><strong>gh-issues</strong> — fetches open issues, spawns agents to implement fixes, monitors PR review cycles</li>
  <li><strong>obsidian</strong> — reads and writes my shared Obsidian vault for notes, docs, and outbox drafts</li>
  <li><strong>blogwatcher</strong> — monitors RSS/Atom feeds for updates</li>
  <li><strong>summarize</strong> — extracts and summarizes content from URLs, podcasts, and local files</li>
  <li><strong>tmux</strong> — drives interactive terminal sessions</li>
  <li><strong>session-logs</strong> — searches its own conversation history</li>
  <li><strong>skill-creator</strong> — designs and packages new skills (yes, it can extend itself)</li>
  <li><strong>mcporter</strong> — manages MCP server connections and tool calls</li>
  <li><strong>nano-pdf</strong> — edits PDFs with natural-language instructions</li>
  <li><strong>healthcheck</strong> — security auditing and hardening on the systems it runs on</li>
  <li><strong>xurl</strong> — authenticated X API access</li>
</ul>

<p><strong>What Legion does not have: no community skills. Zero.</strong></p>

<p>That is not a gap. It is policy.</p>

<p>I don’t install third-party skills I haven’t reviewed. An always-on agent with filesystem access, API credentials, and the ability to open PRs is not something I’m casual about. The attack surface is real. Community skills — however well-intentioned — expand it in ways I can’t fully audit.</p>

<p>Everything Legion can do is either bundled with OpenClaw or something I wrote myself. That’s the line.</p>

<h3 id="blast-radius-by-design">Blast Radius by Design</h3>

<p>Least privilege isn’t just for production systems. Same principle, same discipline.</p>

<p>Legion operates under a fine-grained GitHub Personal Access Token scoped to a dedicated machine account: <a href="https://github.com/kuhl-haus-legion"><code class="language-plaintext highlighter-rouge">kuhl-haus-legion</code></a>.</p>

<p>Not my personal account. Not an org-admin token. A machine account, scoped to exactly the repos it needs to touch and only given the permissions it needs to accomplish its tasks.</p>

<p>If something goes sideways — bad output, runaway sub-agent, anything — the damage is bounded. Legion can’t touch my personal repos, can’t act as me, and cannot escalate its own permissions. It can only reach what I’ve explicitly handed it.</p>

<h3 id="the-name">The Name</h3>

<p>The name comes from a video game AI character — synthetic intelligence, running many parallel processes, referred to itself in the plural. Felt right for an assistant that runs sub-agents and multiple models converging into one coherent response. That, and I just liked it.</p>

<p>You can find Legion on GitHub at <a href="https://github.com/kuhl-haus-legion"><code class="language-plaintext highlighter-rouge">kuhl-haus-legion</code></a>.</p>]]></content><author><name>Tom Pounders</name></author><category term="AI" /><category term="ai" /><category term="automation" /><category term="open-source" /><category term="llm" /><category term="legion" /><summary type="html"><![CDATA[Legion is my AI partner — a self-hosted OpenClaw installation built from source and wired into the tools I actually use.]]></summary></entry><entry><title type="html">Welcome to oldschool-engineer.dev</title><link href="https://oldschool-engineer.dev/meta/2026/02/25/welcome.html" rel="alternate" type="text/html" title="Welcome to oldschool-engineer.dev" /><published>2026-02-25T00:00:00+00:00</published><updated>2026-02-25T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/meta/2026/02/25/welcome</id><content type="html" xml:base="https://oldschool-engineer.dev/meta/2026/02/25/welcome.html"><![CDATA[<p>If you’ve been following my writing on <a href="https://the.oldschool.engineer">Medium</a>, you already know I care about owning the stack. This site is the next step in that philosophy.</p>

<h2 id="why-self-host-content">Why Self-Host Content?</h2>

<p>Medium is a great distribution platform, but it comes with trade-offs readers shouldn’t have to deal with — accounts, cookies, and algorithmic gatekeeping. Every technical article I publish will now live here first, free and open, at a URL I control.</p>

<p>Medium remains a distribution channel. But <strong>oldschool-engineer.dev</strong> is the canonical source.</p>

<h2 id="what-to-expect">What to Expect</h2>

<p>The same kind of content I’ve always written — deep-dive engineering posts, build logs, post-mortems, and the occasional side project write-up. If you want to follow along, subscribe to the <a href="/feed.xml">RSS feed</a> or, if you’re a Medium member, subscribe on Medium. Now you have options.</p>]]></content><author><name>Tom Pounders</name></author><category term="Meta" /><category term="site-updates" /><category term="open-source" /><summary type="html"><![CDATA[Why I'm moving my canonical content to a platform I own — and what that means for readers.]]></summary></entry><entry><title type="html">Prevent Cache Stampedes with asyncio Events</title><link href="https://oldschool-engineer.dev/software%20engineering/2026/02/24/prevent-cache-stampedes-with-asyncio-events.html" rel="alternate" type="text/html" title="Prevent Cache Stampedes with asyncio Events" /><published>2026-02-24T00:00:00+00:00</published><updated>2026-02-24T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/software%20engineering/2026/02/24/prevent-cache-stampedes-with-asyncio-events</id><content type="html" xml:base="https://oldschool-engineer.dev/software%20engineering/2026/02/24/prevent-cache-stampedes-with-asyncio-events.html"><![CDATA[<p><strong><em>Learn how a two-layer asyncio.Event and Redis lock strategy eliminates cache-miss stampedes, cutting thousands of redundant Redis calls at market open.</em></strong></p>

<p><img src="/assets/images/posts/prevent-cache-stampedes-with-asyncio-events/img-01.jpeg" alt="" /></p>

<p><em>My miniature nano cow stampeding herd, heading straight for Redis at market open. Every. Single. Morning.</em></p>

<p><a href="/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5.html">Wave 1 is done.</a> The second I shipped it, I turned to the thing quietly living on my mental whiteboard: a cache-miss stampede in the MarketDataCache (MDC).</p>

<p>Quick disambiguation: <a href="/software%20engineering/2026/02/11/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4.html">I mentioned a stampeding herd in an earlier post</a> — that herd was about a message backpressure mechanism being rendered useless. This is a different herd. Same name, different cattle.</p>

<h3 id="the-scenario">The Scenario</h3>

<p>The MDC is a Redis cache-aside layer between the platform’s analyzers and the Massive.com REST API. Check Redis first, call the API on a miss.</p>

<p>The incoming feed is per-second stock aggregates — OHLC, volume — for the entire market, peaking around 1,500 msg/s at close. At market open, the message rate jumps from roughly 30 msg/s to 800 msg/s within seconds. The cache isn’t just cold — it’s been reset, because opening prices invalidate the previous session’s data. Per-second aggregates only fire when a stock actually updates, so those 800 messages aren’t spread evenly across the market; they’re concentrated in the 100–200 high-volume stocks volatile enough to update every single second. Those are the tickers that hammer Redis hardest.</p>

<p>In the happy path, Massive.com responds in ~80ms. Fast enough that in most cases the cache is warm well before the next message arrives. The stampede is really a cold-start burst problem: multiple analyzers simultaneously requesting the same ticker, all within that 80ms window.</p>

<p>The ugly case is a timeout. The underlying <code class="language-plaintext highlighter-rouge">RESTClient</code> does retries with exponential backoff — a degraded API response doesn’t just cost 10 seconds, it can stack well past 30.</p>
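<p>To put numbers on that (the retry profile here is assumed for illustration; the actual <code class="language-plaintext highlighter-rouge">RESTClient</code> settings are not shown): three 10-second attempts with exponential backoff between them already clear 30 seconds.</p>

```python
# Assumed retry profile, for illustration only: 10s timeout per attempt,
# three attempts, exponential backoff of 1s, 2s, 4s after each failure.
timeout_s, attempts, base_backoff_s = 10.0, 3, 1.0
total = sum(timeout_s + base_backoff_s * (2 ** i) for i in range(attempts))
print(f"worst case ~{total:.0f}s")  # 30s of timeouts + 7s of backoff = 37s
```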

<h3 id="why-a-redis-lock-alone-isnt-enough">Why a Redis Lock Alone Isn’t Enough</h3>

<p>The obvious fix is a distributed lock — one coroutine grabs it, fetches, the rest wait. But look at what <code class="language-plaintext highlighter-rouge">await lock.acquire()</code> actually does inside <code class="language-plaintext highlighter-rouge">redis.asyncio</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Simplified from redis/asyncio/lock.py  
</span><span class="k">while</span> <span class="bp">True</span><span class="p">:</span>  
    <span class="k">if</span> <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">do_acquire</span><span class="p">():</span>  <span class="c1"># SET NX  
</span>        <span class="k">return</span> <span class="bp">True</span>  
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">sleep</span><span class="p">)</span>  <span class="c1"># polls every 100ms
</span></code></pre></div></div>

<p>Every waiting coroutine independently hammers Redis with <code class="language-plaintext highlighter-rouge">SET NX</code> every 100ms. In the happy path at ~80ms, that’s roughly one poll per waiter — annoying but not painful. In the timeout case, that’s 100 polls per waiter per 10-second attempt, multiplied by retry attempts, multiplied by N waiters across 150 hot tickers. The event loop stays healthy — each <code class="language-plaintext highlighter-rouge">asyncio.sleep</code> yields — but Redis is absorbing O(N) poll traffic for absolutely nothing.</p>
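<p>Rough numbers for that degraded case (the waiter and retry counts below are assumptions, not measurements):</p>

```python
# Back-of-envelope poll traffic for the degraded case; waiter count
# and retry count are assumed for illustration.
polls_per_sec = 10          # one SET NX every 100ms
attempt_s = 10              # one 10-second attempt
retries = 3                 # retry attempts per fetch
waiters = 5                 # in-process waiters per hot ticker
hot_tickers = 150
total_polls = polls_per_sec * attempt_s * retries * waiters * hot_tickers
print(f"{total_polls:,} redundant SET NX calls")  # 225,000
```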

<h3 id="the-fix-two-layers">The Fix: Two Layers</h3>

<p>Layer 1: <code class="language-plaintext highlighter-rouge">asyncio.Event</code> collapses in-process contention to zero network traffic.<br />
Layer 2: Redis lock handles cross-pod contention.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">get_ticker_snapshot</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ticker</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">TickerSnapshot</span><span class="p">:</span>  
    <span class="n">cache_key</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">MarketDataCacheKeys</span><span class="p">.</span><span class="n">TICKER_SNAPSHOTS</span><span class="p">.</span><span class="n">value</span><span class="si">}</span><span class="s">:</span><span class="si">{</span><span class="n">ticker</span><span class="si">}</span><span class="s">"</span>
    <span class="c1"># ... cache read, event check/wait, and leader fetch elided</span>
</code></pre></div></div>

<p>Waiting coroutines do a true <code class="language-plaintext highlighter-rouge">await event.wait()</code> — zero network, zero polling, event-loop-native. When the event fires, <code class="language-plaintext highlighter-rouge">Event.set()</code> schedules the waiters’ wakeup callbacks directly on the loop’s ready queue. Not a timer, not a poll. Whether the API responds in 80ms or grinds through retries for 30+ seconds, in-process waiters generate exactly zero Redis traffic while they wait.</p>

<p>The Redis lock in <code class="language-plaintext highlighter-rouge">_fetch_snapshot_with_lock</code> handles what <code class="language-plaintext highlighter-rouge">asyncio.Event</code> can’t — multiple pods competing across process boundaries:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">_fetch_snapshot_with_lock</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ticker</span><span class="p">,</span> <span class="n">cache_key</span><span class="p">):</span>  
    <span class="n">lock_key</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">MarketDataCacheKeys</span><span class="p">.</span><span class="n">TICKER_SNAPSHOT_LOCK</span><span class="p">.</span><span class="n">value</span><span class="si">}</span><span class="s">:</span><span class="si">{</span><span class="n">ticker</span><span class="si">}</span><span class="s">"</span>  
    <span class="n">lock</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">redis_client</span><span class="p">.</span><span class="n">lock</span><span class="p">(</span>  
        <span class="n">lock_key</span><span class="p">,</span>  
        <span class="n">timeout</span><span class="o">=</span><span class="n">MarketDataCacheTTL</span><span class="p">.</span><span class="n">TICKER_SNAPSHOT_LOCK</span><span class="p">.</span><span class="n">value</span><span class="p">,</span>  <span class="c1"># 30s  
</span>    <span class="p">)</span>  
    <span class="k">try</span><span class="p">:</span>  
        <span class="k">await</span> <span class="n">lock</span><span class="p">.</span><span class="n">acquire</span><span class="p">()</span>  
        <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">cache_key</span><span class="o">=</span><span class="n">cache_key</span><span class="p">)</span>  <span class="c1"># double-check  
</span>        <span class="k">if</span> <span class="n">result</span><span class="p">:</span>  
            <span class="k">return</span> <span class="n">TickerSnapshot</span><span class="p">.</span><span class="n">from_dict</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>  
        <span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">monotonic</span><span class="p">()</span>  
        <span class="n">snapshot</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">rest_client</span><span class="p">.</span><span class="n">get_snapshot_ticker</span><span class="p">(</span>  
            <span class="n">market_type</span><span class="o">=</span><span class="s">"stocks"</span><span class="p">,</span> <span class="n">ticker</span><span class="o">=</span><span class="n">ticker</span><span class="p">,</span>  
        <span class="p">)</span>  
        <span class="n">duration</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">monotonic</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span>  
        <span class="bp">self</span><span class="p">.</span><span class="n">snapshot_api_duration</span><span class="p">.</span><span class="n">record</span><span class="p">(</span><span class="n">duration</span><span class="p">)</span>  <span class="c1"># OpenTelemetry histogram  
</span>        <span class="n">data</span> <span class="o">=</span> <span class="n">ticker_snapshot_to_dict</span><span class="p">(</span><span class="n">snapshot</span><span class="p">)</span>  
        <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">cache_key</span><span class="o">=</span><span class="n">cache_key</span><span class="p">,</span>  
                         <span class="n">cache_ttl</span><span class="o">=</span><span class="n">MarketDataCacheTTL</span><span class="p">.</span><span class="n">TICKER_SNAPSHOTS</span><span class="p">.</span><span class="n">value</span><span class="p">)</span>  
        <span class="k">return</span> <span class="n">snapshot</span>  
    <span class="k">finally</span><span class="p">:</span>  
        <span class="k">if</span> <span class="k">await</span> <span class="n">lock</span><span class="p">.</span><span class="n">locked</span><span class="p">():</span>  
            <span class="k">await</span> <span class="n">lock</span><span class="p">.</span><span class="n">release</span><span class="p">()</span>
</code></pre></div></div>

<p>The double-check read after <code class="language-plaintext highlighter-rouge">lock.acquire()</code> handles the cross-pod version of the same problem: another pod may have already populated the cache while this one was waiting.</p>
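<p>Stripped of the Redis specifics, the double-checked shape reduces to a few lines. Here's a self-contained sketch, with an in-memory dict and a plain <code class="language-plaintext highlighter-rouge">asyncio.Lock</code> standing in for the cache and the distributed lock (illustrative only, not the actual <code class="language-plaintext highlighter-rouge">MarketDataCache</code> code):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio

cache = {}               # stand-in for Redis
lock = asyncio.Lock()    # stand-in for the distributed lock
fetch_count = 0          # counts upstream API calls to show the dedup works

async def fetch_from_api(ticker):
    global fetch_count
    fetch_count += 1
    await asyncio.sleep(0.01)  # simulated network latency
    return {"ticker": ticker, "price": 4.20}

async def get_snapshot(ticker):
    if ticker in cache:        # fast path: no lock taken
        return cache[ticker]
    async with lock:
        if ticker in cache:    # double-check: a peer may have filled it
            return cache[ticker]
        snapshot = await fetch_from_api(ticker)
        cache[ticker] = snapshot
        return snapshot

async def main():
    # ten concurrent waiters, exactly one upstream call
    return await asyncio.gather(*(get_snapshot("KULR") for _ in range(10)))

results = asyncio.run(main())
print(fetch_count, len(results))
</code></pre></div></div>

<p>Ten concurrent callers produce exactly one upstream fetch; the other nine hit either the fast path or the double-check.</p>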

<h3 id="when-the-leader-dies">When the Leader Dies</h3>

<p>The <code class="language-plaintext highlighter-rouge">finally</code> block is load-bearing. Three distinct failure modes:</p>

<p><strong>Leader throws a non-retry exception (process still alive):</strong> <code class="language-plaintext highlighter-rouge">event.set()</code> fires immediately. In-process waiters wake up, find the cache empty, fall through, and one steps up as leader. Without the <code class="language-plaintext highlighter-rouge">finally</code> guarantee, they’d block until the Redis lock TTL expired.</p>

<p><strong>Leader pod crashes entirely:</strong> The <code class="language-plaintext highlighter-rouge">asyncio.Event</code> dies with it — in-process waiters are already dead too. Cross-pod waiters are stuck on the Redis lock, and the 30-second TTL is their only backstop. It auto-expires, one pod grabs the lock, and we’re back in business.</p>

<p><strong>Leader’s retries outlive the 30-second TTL:</strong> This is the interesting one. The lock auto-expires. A cross-pod waiter grabs it, checks the cache — miss, because the original thread hasn’t written yet — and fires another API call. The original thread eventually succeeds, tries to release a lock it no longer owns, and <code class="language-plaintext highlighter-rouge">if await lock.locked()</code> quietly saves us from the error. The duplicate API call already happened though.</p>

<p>30 seconds isn’t outrageous given the retry behavior, but it’s also not right-sized. That’s the whole point of the OpenTelemetry histogram — once I have real p99 data including retry scenarios, I can set a TTL that covers the realistic worst case without leaving cross-pod waiters in limbo longer than necessary.</p>

<h3 id="same-pattern-three-methods">Same Pattern, Three Methods</h3>

<p><code class="language-plaintext highlighter-rouge">get_avg_volume</code> and <code class="language-plaintext highlighter-rouge">get_free_float</code> use the identical two-layer pattern — their own <code class="language-plaintext highlighter-rouge">_pending_*</code> dicts, their own lock keys, their own histograms. Nothing exotic, just applied consistently.</p>

<h3 id="the-scorecard">The Scorecard</h3>

<p><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-27-2026-02-24">v0.2.27</a> ships with 425 passing tests — 57 of them in <code class="language-plaintext highlighter-rouge">test_market_data_cache.py</code> — 99% overall coverage, 100% on <code class="language-plaintext highlighter-rouge">market_data_cache.py</code> (254 statements, 48 branches). Flake8 clean. Full source on <a href="https://github.com/kuhl-haus/kuhl-haus-mdp">GitHub</a>.</p>]]></content><author><name>Tom Pounders</name></author><category term="Software Engineering" /><category term="python" /><category term="asyncio" /><category term="redis" /><category term="caching" /><category term="performance" /><summary type="html"><![CDATA[A two-layer cache-miss prevention strategy using asyncio.Event and Redis locks.]]></summary></entry><entry><title type="html">What I Built After Quitting Amazon (Spoiler: It’s a Stock Scanner) — Part 5</title><link href="https://oldschool-engineer.dev/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5.html" rel="alternate" type="text/html" title="What I Built After Quitting Amazon (Spoiler: It’s a Stock Scanner) — Part 5" /><published>2026-02-23T00:00:00+00:00</published><updated>2026-02-23T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5</id><content type="html" xml:base="https://oldschool-engineer.dev/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5.html"><![CDATA[<p><strong><em>Wave 1 Complete: Bugs, Bottlenecks, and Breaking 1,000 msg/s</em></strong></p>

<p>📖 <strong>Stock Scanner Series:</strong></p>
<ul>
  <li><a href="/side%20projects/2026/01/16/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner.html">Part 1: Why I Built It</a></li>
  <li><a href="/side%20projects/2026/01/21/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-2.html">Part 2: How to Run It</a></li>
  <li><a href="/infrastructure/2026/01/31/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3.html">Part 3: How to Deploy It</a></li>
  <li><a href="/software%20engineering/2026/02/11/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4.html">Part 4: Evolution from Prototype to Production</a></li>
  <li>Part 5: Bugs, Bottlenecks, and Breaking 1,000 msg/s (you are here)</li>
</ul>

<p>Ten days. Nineteen versions. One bottleneck that had been hiding since day one.</p>

<p>When I last checked in, the Kuhl-Haus Market Data Platform was functional but fragile — OpenTelemetry was wired up, the data plane was flowing, and I was cautiously optimistic. Since then, the platform went from “it works on my machine” to processing <strong>1,490 messages per second</strong> at market close without breaking a sweat. Test coverage went from 35% to 100% on the GitHub badge. And the whole thing got a proper documentation site, because apparently I’m building a real open-source project now.</p>

<p>Let’s talk about how we got here — starting with the bug that almost made me mass-delete my OTEL code.</p>

<h3 id="the-mdq-bottleneck-a-technical-detective-story">The MDQ Bottleneck: A Technical Detective Story</h3>

<h3 id="the-crime-scene">The Crime Scene</h3>

<p>Right after wiring up OpenTelemetry context propagation, the Market Data Listener started doing something… weird.</p>

<p>Below about 200 messages per second, everything was fine. Normal. Happy. But push the volume higher and the RabbitMQ publish pipeline would just freeze. Not crash — <em>freeze</em>. The MDL stayed connected upstream, happily receiving data from Massive. It just stopped publishing it anywhere useful.</p>

<p>My first instinct? Blame OTEL. I’d just added trace context propagation to the message headers. The timing was suspicious. Of <em>course</em> it was the new code.</p>

<p>Spoiler: it wasn’t.</p>

<h3 id="following-the-evidence">Following the Evidence</h3>

<p>First thing I did was open <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/issues/3">Issue #3</a> to track the problem — because debugging without a paper trail is just vibes. First action item: mitigate. That meant reverting the distributed tracing changes in MDL (<a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-14-2026-02-17">v0.2.14</a>). Stabilize the patient, <em>then</em> figure out what’s actually wrong.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5/img-01.png" alt="" /></p>

<p><em>Clear evidence of a bottleneck — observability merely pushes it past its breaking point.</em></p>

<blockquote>
  <p><em>If you’re squinting at version numbers in the dashboard screenshots and they don’t match the ones in this article — you’re not losing it. As I mentioned in Part 4, kuhl-haus-mdp (core library) and kuhl-haus-mdp-servers (deployment) are separate repos with separate version tracks. This article references kuhl-haus-mdp versions (</em><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#"><em>change log</em></a><em>). The dashboards show kuhl-haus-mdp-servers versions (</em><a href="https://github.com/kuhl-haus/kuhl-haus-mdp-servers/releases"><em>version history</em></a><em>).</em></p>
</blockquote>

<p>Then the monitoring told the story. The throughput graph had a flat top. Not a gradual degradation, not random drops — a clean ceiling at approximately <strong>270 msg/s</strong>. That pattern is a dead giveaway. Something structural was capping throughput, and it had nothing to do with the network, the broker, or the upstream feed.</p>

<h3 id="root-cause-sequential-single-channel-publishing">Root Cause: Sequential Single-Channel Publishing</h3>

<p>Here’s what the publish pipeline looked like:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">handle_messages</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msgs</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">WebSocketMessage</span><span class="p">]):</span>  
    <span class="k">for</span> <span class="n">message</span> <span class="ow">in</span> <span class="n">msgs</span><span class="p">:</span>  
        <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">fanout_to_queues</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>  
  
<span class="k">async</span> <span class="k">def</span> <span class="nf">fanout_to_queues</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">message</span><span class="p">:</span> <span class="n">WebSocketMessage</span><span class="p">):</span>  
    <span class="n">serialized_message</span> <span class="o">=</span> <span class="n">WebSocketMessageSerde</span><span class="p">.</span><span class="n">serialize</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>  
    <span class="c1"># Illustrative sketch of the rest: one shared channel, one awaited  
</span>    <span class="c1"># publish (plus one ~20ms confirm round-trip) per queue, in sequence  
</span>    <span class="k">for</span> <span class="n">queue_name</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">queue_names</span><span class="p">:</span>  
        <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">channel</span><span class="p">.</span><span class="n">default_exchange</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span>  
            <span class="n">Message</span><span class="p">(</span><span class="n">serialized_message</span><span class="p">),</span> <span class="n">routing_key</span><span class="o">=</span><span class="n">queue_name</span><span class="p">,</span>  
        <span class="p">)</span>
</code></pre></div></div>

<p>One message. One channel. One round-trip. Wait for the broker acknowledgment (~20ms). Repeat.</p>

<p>With publisher confirms enabled and a single AMQP channel shared across six queues, the theoretical ceiling was roughly 50 publishes per second (one ~20ms confirm round-trip at a time). In practice, the event loop managed to interleave enough work to squeeze out ~271 msg/s — but that was still nowhere near the 1,000+ msg/s I needed during peak market hours. On a local development host (RTT ~1ms), the same code easily exceeded 1,000 msg/s, masking the issue during development and testing.</p>

<p>The OTEL instrumentation didn’t <em>cause</em> this bottleneck. It <em>exposed</em> it. The additional overhead from trace context propagation pushed the pipeline just hard enough to make a latent architectural flaw visible. The bottleneck had been there all along, patiently waiting for enough load to matter.</p>

<p>That’s not a bug in your observability tooling. That’s your observability tooling doing its job.</p>

<h3 id="the-fix">The Fix</h3>

<p>Version <a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-17-2026-02-18">0.2.17</a>, commit <code class="language-plaintext highlighter-rouge">caf1ddd</code>. This wasn’t a one-liner.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">handle_message</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">message</span><span class="p">:</span> <span class="nb">dict</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>  
    <span class="n">routing_key</span> <span class="o">=</span> <span class="n">message</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"ev"</span><span class="p">,</span> <span class="s">"unknown"</span><span class="p">)</span>  
    <span class="n">message_body</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_serialize_message</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>  
    <span class="k">await</span> <span class="bp">self</span><span class="p">.</span><span class="n">_publish_message</span><span class="p">(</span><span class="n">message_body</span><span class="p">,</span> <span class="n">routing_key</span><span class="p">)</span>  
  
<span class="k">async</span> <span class="k">def</span> <span class="nf">_publish_message</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">message_body</span><span class="p">:</span> <span class="nb">bytes</span><span class="p">,</span> <span class="n">routing_key</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>  
    <span class="c1"># Pre-build all Message objects before any network I/O  
</span>    <span class="n">publish_tasks</span> <span class="o">=</span> <span class="p">[]</span>  
    <span class="k">for</span> <span class="n">queue_name</span><span class="p">,</span> <span class="n">channel</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">queue_channels</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>  
        <span class="n">msg</span> <span class="o">=</span> <span class="n">Message</span><span class="p">(</span>  
            <span class="n">message_body</span><span class="p">,</span>  
            <span class="n">delivery_mode</span><span class="o">=</span><span class="n">DeliveryMode</span><span class="p">.</span><span class="n">NOT_PERSISTENT</span><span class="p">,</span>  
        <span class="p">)</span>  
        <span class="n">publish_tasks</span><span class="p">.</span><span class="n">append</span><span class="p">(</span>  
            <span class="n">channel</span><span class="p">.</span><span class="n">default_exchange</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">routing_key</span><span class="o">=</span><span class="n">queue_name</span><span class="p">)</span>  
        <span class="p">)</span>  
  
    <span class="c1"># One concurrent burst — no sequential round-trips  
</span>    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">publish_tasks</span><span class="p">,</span> <span class="n">return_exceptions</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<p>The obvious part: allocate one dedicated AMQP channel per queue — six channels — so publishes to different queues are never serialized at the broker level. Fire them all concurrently with <code class="language-plaintext highlighter-rouge">asyncio.gather</code> instead of awaiting each one in a loop.</p>

<p>The less obvious part: <code class="language-plaintext highlighter-rouge">asyncio.gather</code> is only fast if the coroutines it’s gathering are <em>ready to go</em>. That meant pre-building all <code class="language-plaintext highlighter-rouge">Message</code> objects and resolving queue names before any network I/O begins. Separate the prep from the publish. By the time <code class="language-plaintext highlighter-rouge">gather</code> fires, there’s zero computation left — just concurrent network calls.</p>

<p>The cleanup: <code class="language-plaintext highlighter-rouge">publisher_confirms</code> became a constructor parameter (default <code class="language-plaintext highlighter-rouge">True</code>) for toggling fire-and-forget. Delivery mode switched to <code class="language-plaintext highlighter-rouge">NOT_PERSISTENT</code> — ephemeral market data doesn’t need durability. The old <code class="language-plaintext highlighter-rouge">fanout_to_queues</code> method was deleted; <code class="language-plaintext highlighter-rouge">handle_messages</code> now delegates to <code class="language-plaintext highlighter-rouge">_publish_message</code> directly. Shutdown and queue setup were updated to manage per-queue channel lifecycles.</p>

<p><strong>Result: 270 msg/s → ~600 msg/s.</strong> More than double, once I stopped asking <code class="language-plaintext highlighter-rouge">asyncio</code> to be concurrent and actually gave it the structure to do so.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5/img-02.png" alt="" /></p>

<p><em>Left: that flat top at ~270 msg/s is the dead giveaway — a structural ceiling, not a load problem. Right: one commit (caf1ddd), concurrent channels, and the ceiling is gone.</em></p>

<h3 id="the-lesson">The Lesson</h3>

<p>Writing <code class="language-plaintext highlighter-rouge">async def</code> doesn’t make your I/O concurrent. It makes it <em>possible</em> to be concurrent. You still have to design for it — explicitly, intentionally. An <code class="language-plaintext highlighter-rouge">await</code> in a <code class="language-plaintext highlighter-rouge">for</code> loop is sequential I/O with extra syntax.</p>
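<p>A toy illustration of the difference, using <code class="language-plaintext highlighter-rouge">asyncio.sleep</code> as a stand-in for a ~20ms broker round-trip (nothing here is from the MDL codebase):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio
import time

async def fake_publish(_msg):
    await asyncio.sleep(0.02)  # stand-in for a ~20ms broker round-trip

async def sequential(msgs):
    for m in msgs:
        await fake_publish(m)  # one round-trip at a time: sequential I/O

async def concurrent(msgs):
    # all round-trips in flight at once
    await asyncio.gather(*(fake_publish(m) for m in msgs))

msgs = range(6)  # six queues, six publishes

start = time.monotonic()
asyncio.run(sequential(msgs))
seq = time.monotonic() - start

start = time.monotonic()
asyncio.run(concurrent(msgs))
con = time.monotonic() - start

print(f"sequential: {seq:.2f}s  concurrent: {con:.2f}s")
</code></pre></div></div>

<p>Six 20ms round-trips back-to-back cost ~120ms; gathered, they overlap into ~20ms. Same coroutines, different structure.</p>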

<p>And sometimes the best thing your observability tooling can do is break something that was already broken. You just couldn’t see it yet.</p>

<h3 id="proving-1000-messages-per-second">Proving 1,000+ Messages Per Second</h3>

<p>With the MDQ bottleneck gone, the natural question was: how far can we push this?</p>

<p>The answer came in layers, and peeling them back was half the fun.</p>

<h3 id="layer-1-publisher-confirms-850-msgs">Layer 1: Publisher Confirms (~850 msg/s)</h3>

<p>The concurrent channel fix got me to 600, but further testing showed it bottlenecking around 850, because publisher confirms were still the constraint. Every publish waited for a <code class="language-plaintext highlighter-rouge">basic.ack</code> from the broker before the channel was free again. Safe? Yes. Fast? Not fast enough.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5/img-03.png" alt="" /></p>

<p><em>Layer 1: publisher confirms on, ~800 msg/s sustained. Push past that and the MDL reconnects — visible top-right. The ACK wait is now the ceiling.</em></p>

<h3 id="layer-2-fire-and-forget-2500-msgs">Layer 2: Fire and Forget (~2,500 msg/s)</h3>

<p>Flipping <code class="language-plaintext highlighter-rouge">publisher_confirms=False</code> changed the game entirely. Without ACK waits, publishes become fire-and-forget — the message hits TCP buffers and the code moves on. Peak throughput jumped to approximately <strong>2,500 msg/s</strong> before something else became the limiting factor.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5/img-04.png" alt="" /></p>

<p><em>Layer 2: one transition from publisher_confirms=True to False, seen from two angles — received rate on the left, queue throughput on the right. Trades enabled to crank the volume. Fire-and-forget blows past 2,500 msg/s — but three reconnections and an unhealthy MDL say we found the next ceiling, not the final answer.</em></p>

<p>For a market data platform where the next tick makes the last one obsolete, this is an acceptable tradeoff. I’m not processing bank transfers. I’m distributing prices that have a shelf life measured in milliseconds.</p>

<h3 id="layer-3-right-sizing-the-feed">Layer 3: Right-Sizing the Feed</h3>

<p>The trades feed was the highest-volume data source by a wide margin — and, like I said in my last post, it wasn’t needed for any of my current analysis use cases. Once I’d proven the platform could handle the load, I disabled it. No point burning resources on data nobody’s consuming.</p>

<h3 id="the-money-shot-1490-msgs-at-market-close">The Money Shot: 1,490 msg/s at Market Close</h3>

<p>With the remaining feed — aggregates — running against real market conditions, the platform hit <strong>1,490 msg/s</strong> at market close. That’s peak load, during one of the most volatile parts of the trading day, and the platform handled it without so much as a hiccup.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5/img-05.png" alt="" /></p>

<p><em>1,490 msg/s at market close. Healthy connection. Five reconnections since the service started — all from earlier testing. That number highlighted top-right? The one with the yellow arrow pointing at it. That’s Wave 1, answered.</em></p>

<p>This is the milestone the whole series has been building toward. Wave 1 was about answering one question: <em>can this architecture handle real market data at production speeds?</em></p>

<p>Yes. Yes it can.</p>

<h3 id="read-the-docs-looking-like-a-real-project">Read the Docs: Looking Like a Real Project</h3>

<p>Somewhere between debugging bottlenecks and chasing throughput numbers, the platform got a proper documentation site: <a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/">kuhl-haus-mdp.readthedocs.io</a>.</p>

<p>If you saw the docs two weeks ago, there wasn’t much to see. A README and some wishful thinking. Now there’s a full Sphinx site with:</p>

<ul>
  <li><strong>Architecture diagrams</strong> — PlantUML for the Data Plane, Control Plane, Observability layer, and Deployment Model. Not boxes-and-arrows napkin sketches. Real diagrams that actually reflect the codebase.</li>
  <li><strong>Auto-generated API reference</strong> — via Sphinx <code class="language-plaintext highlighter-rouge">automodule</code> directives, so the docs stay in sync with the code without manual intervention.</li>
  <li><strong>Security policy</strong> — dual-format because life is complicated. The <code class="language-plaintext highlighter-rouge">.rst</code> file is the source of truth for Sphinx; a <code class="language-plaintext highlighter-rouge">.md</code> stub lives in the repo root so GitHub’s Security tab picks it up. One policy, two audiences.</li>
  <li><strong>Modern packaging</strong> — this was the push to finally kill <code class="language-plaintext highlighter-rouge">setup.py</code>, <code class="language-plaintext highlighter-rouge">setup.cfg</code>, and <code class="language-plaintext highlighter-rouge">tox.ini</code> in favor of a single <code class="language-plaintext highlighter-rouge">pyproject.toml</code> managed by PDM. PEP 517/518 compliance. Clean, modern, no legacy cruft.</li>
</ul>

<p>None of this is glamorous work. But if you want anyone else to take your project seriously — or even future-you six months from now — documentation is the difference between “open source project” and “code dump on GitHub.”</p>

<h3 id="the-supporting-cast">The Supporting Cast</h3>

<p>A lot happened in 19 versions that doesn’t warrant its own section but still matters. Here’s the highlight reel:</p>

<p><strong>Structured Logging (</strong><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-8-2026-02-11"><strong>v0.2.8</strong></a><strong>):</strong> Switched to <code class="language-plaintext highlighter-rouge">python-json-logger</code> and enforced proper <code class="language-plaintext highlighter-rouge">getLogger(__name__)</code> hygiene across every module. Boring? Yes. Essential for debugging in a distributed system? Also yes.</p>

<p><strong>New Analyzers (</strong><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-15-2026-02-18"><strong>v0.2.15</strong></a><strong>–</strong><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-16-2026-02-18"><strong>v0.2.16</strong></a><strong>):</strong> <code class="language-plaintext highlighter-rouge">TopTradesAnalyzer</code> — Redis-backed, sliding window, cluster-throttled. <code class="language-plaintext highlighter-rouge">MassiveDataAnalyzer</code> refactored to fully async with OTEL instrumentation. The analysis pipeline is starting to look like a real thing.</p>

<p><strong>Market Status Handling (</strong><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-19-2026-02-19"><strong>v0.2.19</strong></a><strong>):</strong> <code class="language-plaintext highlighter-rouge">MarketStatusValue</code> enum so the MDL knows when the market is open, closed, or in extended hours. Sounds trivial. Prevents an entire class of “why isn’t anything happening” false alarms.</p>

<p><strong>MDL Auto-Restart (</strong><a href="https://kuhl-haus-mdp.readthedocs.io/en/latest/changelog.html#version-0-2-25-2026-02-21"><strong>v0.2.25</strong></a><strong>):</strong> Property setters on <code class="language-plaintext highlighter-rouge">feed</code>, <code class="language-plaintext highlighter-rouge">market</code>, and <code class="language-plaintext highlighter-rouge">subscriptions</code> that trigger <code class="language-plaintext highlighter-rouge">asyncio.create_task(self.restart())</code> automatically. Change a configuration value, get a restart. No manual intervention needed.</p>
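<p>That pattern is simple enough to sketch in full. A minimal illustrative reduction (hypothetical <code class="language-plaintext highlighter-rouge">Listener</code> class, not the actual MDL code):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio

class Listener:
    def __init__(self):
        self._feed = "aggregates"
        self.restarts = 0

    async def restart(self):
        self.restarts += 1  # reconnect/resubscribe logic would live here

    @property
    def feed(self):
        return self._feed

    @feed.setter
    def feed(self, value):
        if value != self._feed:
            self._feed = value
            # schedule the restart without blocking the caller
            asyncio.create_task(self.restart())

async def main():
    listener = Listener()
    listener.feed = "trades"   # setter schedules a restart
    await asyncio.sleep(0)     # yield so the scheduled task can run
    return listener

listener = asyncio.run(main())
print(listener.feed, listener.restarts)
</code></pre></div></div>

<p>The setter stays synchronous; <code class="language-plaintext highlighter-rouge">asyncio.create_task</code> hands the restart to the event loop instead of making every config change an <code class="language-plaintext highlighter-rouge">await</code>.</p>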

<h3 id="test-coverage-from-35-to-the-badge-that-says-100">Test Coverage: From 35% to the Badge That Says 100%</h3>

<p>On February 9th — the date of my last post — code coverage stood at <strong>35.74%</strong>. Today the GitHub badge reads <strong>100%</strong>. That didn’t happen by accident, and it didn’t happen all at once.</p>

<h3 id="phase-1-get-the-needle-moving">Phase 1: Get the Needle Moving</h3>

<p>The first pass was simple: establish a minimum of 85% coverage at the module level. No heroics, no edge cases, no agonizing over branch coverage in error handlers. Just write the obvious tests, cover the obvious paths, and get the number to a place where it’s no longer embarrassing.</p>

<p>35.74% → <strong>97%</strong>. Fast, relatively painless, and immediately useful. You learn a lot about your own code when you’re forced to write tests for all of it.</p>

<h3 id="phase-2-test-coverage-review--improvement-plan">Phase 2: Test Coverage Review &amp; Improvement Plan</h3>

<p>Phase 2 was different. I opened <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/issues/4">Issue #4</a> — a systematic, module-by-module review with one goal: push from competent coverage to comprehensive coverage. 398 tests. 1,853 statements. 5 missed. Every test follows AAA format (Arrange, Act, Assert) with consistent <code class="language-plaintext highlighter-rouge">sut</code> naming.</p>
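<p>For the unfamiliar: AAA just means each test has three labeled phases, and <code class="language-plaintext highlighter-rouge">sut</code> names the system under test. A hypothetical example in that shape (the function mirrors the MDL's routing-key lookup; the test itself isn't from the suite):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical example of the AAA layout with "sut" naming; the function
# under test mirrors the MDL routing-key lookup but this test is not
# from the actual suite.
def to_routing_key(message):
    return message.get("ev", "unknown")

def test_to_routing_key_defaults_to_unknown():
    # Arrange
    sut = to_routing_key
    message = {"sym": "AAPL"}  # no "ev" field present
    # Act
    result = sut(message)
    # Assert
    assert result == "unknown"

test_to_routing_key_defaults_to_unknown()
print("ok")
</code></pre></div></div>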

<p>97% → <strong>99%+</strong>. And this is where things got interesting.</p>

<h3 id="the-bug-that-tests-found">The Bug That Tests Found</h3>

<p>During the Phase 2 review of the WebSocket Data Service, I discovered that <em>every</em> <code class="language-plaintext highlighter-rouge">pmessage</code> wildcard subscription was being silently dropped. The WDS was subscribing to patterns and then… quietly receiving nothing. No errors. No warnings. Just silence.</p>

<p>I didn’t find this bug by hunting for bugs. I found it by writing thorough tests for code I assumed was working. That’s the whole point of Phase 2. Phase 1 buys you credibility. Phase 2 buys you correctness.</p>

<h3 id="looking-forward-the-four-waves">Looking Forward: The Four Waves</h3>

<p>This post wraps up Wave 1. It’s a starting gun, not a finish line.</p>

<p>I’ve been thinking about the platform’s roadmap in terms of a SIGINT fire-control analogy — four waves, each building on the last:</p>

<ol>
  <li><strong>Wave 1: Broad Search</strong> — Scan the market for stocks in play. Ingest data, distribute it, prove the architecture can handle production load. <em>Done.</em></li>
  <li><strong>Wave 2: Target Acquisition</strong> — Stock selection by strategy. Which instruments deserve attention based on volume, volatility, or pattern recognition?</li>
  <li><strong>Wave 3: Target Lock</strong> — Identify buy/sell signals. The analysis pipeline generates actionable intelligence.</li>
  <li><strong>Wave 4: Fire</strong> — Execute trades. Paper trading first, then live API integration if the signals prove out.</li>
</ol>

<p>The infrastructure work is done. The boring-but-essential foundation — logging, observability, testing, documentation, performance — is solid. Now the interesting stuff starts.</p>

<p>Wave 2 is next. Time to find some targets.</p>

<p><em>All code is open source at</em> <a href="https://github.com/kuhl-haus/kuhl-haus-mdp"><em>kuhl-haus/kuhl-haus-mdp</em></a><em>. Star it, fork it, or tell me what I’m doing wrong.</em></p>]]></content><author><name>Tom Pounders</name></author><category term="Software Engineering" /><category term="testing" /><category term="documentation" /><category term="performance" /><category term="market-data" /><summary type="html"><![CDATA[Wrapping up Wave 1 — debugging stories, 1,490 msg/s throughput, and 100% test coverage.]]></summary></entry><entry><title type="html">What I Built After Quitting Amazon (Spoiler: It’s a Stock Scanner) — Part 4</title><link href="https://oldschool-engineer.dev/software%20engineering/2026/02/11/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4.html" rel="alternate" type="text/html" title="What I Built After Quitting Amazon (Spoiler: It’s a Stock Scanner) — Part 4" /><published>2026-02-11T00:00:00+00:00</published><updated>2026-02-11T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/software%20engineering/2026/02/11/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4</id><content type="html" xml:base="https://oldschool-engineer.dev/software%20engineering/2026/02/11/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4.html"><![CDATA[<p><strong><em>The Evolution from Prototype to Production: A Case Study in Deliberate Design Iteration</em></strong></p>

<p>📖 <strong>Stock Scanner Series:</strong></p>
<ul>
  <li><a href="/side%20projects/2026/01/16/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner.html">Part 1: Why I Built It</a></li>
  <li><a href="/side%20projects/2026/01/21/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-2.html">Part 2: How to Run It</a></li>
  <li><a href="/infrastructure/2026/01/31/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3.html">Part 3: How to Deploy It</a></li>
  <li>Part 4: Evolution from Prototype to Production (you are here)</li>
  <li><a href="/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5.html">Part 5: Bugs, Bottlenecks, and Breaking 1,000 msg/s</a></li>
</ul>

<h3 id="introduction">Introduction</h3>

<p>Parts 2 and 3 were straight-up instruction manuals. Necessary, but not exactly page-turners. The DevOps geek in me couldn’t open-source code without proper documentation — it’s a compulsion. But now we get to the interesting stuff: how I took a deliberately simple proof-of-concept and systematically evolved it into a production-grade system that can handle 1,000+ events per second without breaking a sweat.</p>

<h3 id="the-philosophy-of-intentional-simplicity">The Philosophy of Intentional Simplicity</h3>

<p>When you’re building something complex from scratch, there’s a temptation to over-engineer. You start designing for scale you don’t need yet, implementing patterns for problems you haven’t encountered, building abstractions for requirements you haven’t validated. That’s how projects die before they launch.</p>

<p>My approach: build the simplest thing that proves the concept, measure it, then iterate based on data. The PoC was intentionally janky — simple data structures, obvious bottlenecks, single-process constraints. I knew exactly what would break and where. The point wasn’t to build the final system; it was to validate the architecture and identify the real bottlenecks through observation, not speculation.</p>

<h3 id="why-microservices-and-why-it-actually-matters">Why Microservices? (And Why It Actually Matters)</h3>

<p>Before we dig into the evolution, let’s address the architectural elephant in the room. Microservices are inherently more complex than monoliths — more moving parts, harder to debug, operational overhead. So why choose that path?</p>

<p>I knew from the start I needed real-time WebSocket updates on the frontend. I’d prototyped with py4web but wasn’t married to it. I considered <a href="https://htmx.org/">HTMX</a> briefly, but settled on a JavaScript framework for the frontend since <a href="/software%20engineering/2025/11/02/could-we-just-use-brainfuck-for-vibe-coding.html">AI tooling would be more helpful there</a>. That meant WebSockets, which py4web doesn’t implement natively.</p>

<p>Sure, I could hack WebSocket support with FastAPI as a sidecar. But once you’re running sidecar containers, you’re not building a monolith anymore — you’re building a tightly-coupled hybrid architecture. And that’s the worst of both worlds.</p>

<p>Here’s the thing: authentication, user management, and serving static content are completely different concerns from processing real-time market data at 1,000+ events per second. Why tightly couple the technology stacks when the problem domains are fundamentally separate? Microservices gave me the flexibility to choose the best tool for each job and develop them independently.</p>

<p>The market data constraints sealed it: Massive.com limits you to a single WebSocket connection for all subscriptions. I can’t open separate connections for Trades, Aggregates, and News. I can’t filter to specific symbols. I have to consume everything they send, in bursts, without falling behind — or they disconnect me. That means I need horizontal scalability, which means distributed work queues, which means microservices architecture becomes the simpler choice, not the more complex one.</p>
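<p>That constraint shapes the consumer’s inner loop: the socket reader should do nothing but enqueue, so it can absorb bursts while workers process at their own pace. Here’s a minimal asyncio sketch of the idea (illustrative only, not the actual MDP consumer; the bounded queue and oldest-first shedding policy are my own assumptions):</p>

```python
import asyncio

async def reader(feed, queue: asyncio.Queue):
    # Drain the feed as fast as possible; never do real work here.
    async for raw in feed:
        try:
            queue.put_nowait(raw)
        except asyncio.QueueFull:
            queue.get_nowait()   # shed the oldest message: freshness over completeness
            queue.put_nowait(raw)

async def worker(queue: asyncio.Queue, out: list):
    while True:
        msg = await queue.get()
        out.append(msg.upper())  # stand-in for real parsing/publishing
        queue.task_done()

async def main():
    async def fake_feed():
        for m in ("t.aapl", "a.gme", "n.tsla"):
            yield m

    queue = asyncio.Queue(maxsize=1000)
    out = []
    task = asyncio.create_task(worker(queue, out))
    await reader(fake_feed(), queue)
    await queue.join()
    task.cancel()
    return out

print(asyncio.run(main()))  # ['T.AAPL', 'A.GME', 'N.TSLA']
```

<p>The point is the separation: the reader never blocks on processing, so the upstream WebSocket never sees backpressure, and you scale by adding workers.</p>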

<h3 id="the-proof-of-concept-deliberately-simple-intentionally-flawed">The Proof-of-Concept: Deliberately Simple, Intentionally Flawed</h3>

<p>The PoC had two analyzers processing market data:</p>

<p><strong>The</strong> <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/blob/mainline/src/kuhl_haus/mdp/analyzers/massive_data_analyzer.py"><strong>Massive Data Analyzer</strong></a> consumed messages from RabbitMQ and republished them to Redis with zero processing. Pure passthrough.</p>

<p><strong>The</strong> <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/blob/mainline/src/kuhl_haus/mdp/analyzers/top_stocks.py"><strong>Top Stocks Analyzer</strong></a> subscribed to Redis channels, maintained three leaderboards (top gainers, top gappers, top volume) in dictionaries, and sorted them once per second.</p>

<p>I knew this design had problems:</p>

<ol>
  <li><strong>Wrong data structure for rankings:</strong> Dictionaries give O(1) access but require O(n*log(n)) sorting to maintain rankings. Priority queues or sorted sets would be better, but I wanted to validate the architecture first.</li>
  <li><strong>Processing messages twice:</strong> RabbitMQ → Massive Data Analyzer → Redis → Top Stocks Analyzer. Inefficient by design, but it let me test different messaging patterns without rewriting the whole stack.</li>
  <li><strong>Single-process constraint:</strong> The Top Stocks Analyzer couldn’t scale horizontally because it held all state in memory.</li>
</ol>
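<p>To put numbers behind that first tradeoff, here’s a toy comparison (not the PoC code itself) of the dict-plus-full-sort approach versus a heap-based top-N selection:</p>

```python
import heapq

# PoC-style: re-sort the entire symbol universe on every refresh,
# O(n log n) even when only a handful of entries changed.
def top_gainers_sorted(gains: dict, n: int = 3):
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Better for rankings: heapq.nlargest selects the top n in O(n log k)
# without ordering everything else.
def top_gainers_heap(gains: dict, n: int = 3):
    return heapq.nlargest(n, gains.items(), key=lambda kv: kv[1])

gains = {"AAPL": 2.1, "GME": 14.8, "TSLA": 5.3, "AMC": 9.9}
print(top_gainers_heap(gains, 2))  # [('GME', 14.8), ('AMC', 9.9)]
assert top_gainers_sorted(gains, 2) == top_gainers_heap(gains, 2)
```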

<p>These weren’t oversights. They were conscious tradeoffs to get to a working system fast. The PoC validated the architecture, confirmed the data flow patterns, and — most importantly — ran long enough to reveal the real bottlenecks.</p>

<h3 id="the-stampeding-herd-problem-when-elegant-degradation-meets-reality">The Stampeding Herd Problem: When Elegant Degradation Meets Reality</h3>

<p>Here’s what I discovered: every morning at 6:30 AM Pacific, the scanner crashed like clockwork.</p>

<p>The behavior was consistent enough to set a watch by, but I didn’t have the data to explain <em>why</em> restarting it actually fixed the problem. No distributed tracing. No metrics. Just console logs and educated guesses.</p>

<p>The culprit turned out to be an interaction between two design decisions:</p>

<p><strong>RabbitMQ’s graceful degradation:</strong> I configured it to buffer messages for 5 seconds max, silently discarding old messages if processing fell behind. This was intentional — I wanted data freshness over completeness. If the processor got overwhelmed, the WebSocket clients would get slightly stale data instead of a backed-up flood of outdated information.</p>
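<p>That buffering behavior maps to RabbitMQ’s per-queue message TTL (<code class="language-plaintext highlighter-rouge">x-message-ttl</code>). A hedged sketch with pika, assuming a local broker and an illustrative queue name:</p>

```python
def declare_bounded_queue(channel, queue_name: str, ttl_ms: int = 5000):
    """Declare a queue whose undelivered messages expire after ttl_ms.

    RabbitMQ silently drops anything older than the TTL, trading
    completeness for freshness when consumers fall behind.
    """
    return channel.queue_declare(
        queue=queue_name,
        durable=True,
        arguments={"x-message-ttl": ttl_ms},
    )

if __name__ == "__main__":
    import pika  # assumes a reachable broker with default credentials
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    declare_bounded_queue(connection.channel(), "market-data.aggregates")
```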

<p><strong>The cache reset at market open:</strong> When the official opening price arrived, the Top Stocks Analyzer reset its entire cache to recalculate all the statistics based on the new baseline. Reasonable enough — except it happened simultaneously with the highest burst traffic of the day.</p>

<p>Here’s where the double-processing bit me: the Massive Data Analyzer was republishing everything from RabbitMQ to Redis, completely bypassing RabbitMQ’s graceful degradation. So when the Top Stocks Analyzer reset its cache right as the market opened, it got slammed with the full stampeding herd of accumulated messages. My elegant backpressure mechanism? Rendered completely ineffective by my own architecture.</p>

<p>The restart “fixed” it because reconnecting the Redis client cleared the backpressure just enough for the process to appear responsive again.</p>

<h3 id="adding-observability-proving-what-you-suspect">Adding Observability: Proving What You Suspect</h3>

<p>You can’t optimize what you can’t measure. I spent a week adding observability to the stack, starting with the low-hanging fruit: zero-code OpenTelemetry instrumentation using environment variables and the <code class="language-plaintext highlighter-rouge">opentelemetry-instrument</code> wrapper. Minimal code changes, mostly in <a href="https://github.com/kuhl-haus/kuhl-haus-mdp-servers/issues/2">kuhl-haus-mdp-servers</a>.</p>

<p>Is it comprehensive? Not yet. The core library doesn’t get auto-instrumented, and most of my FastAPI services just serve health checks anyway. But it lays the groundwork — once I <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/issues/2">add proper instrumentation to the core library</a>, I’ll have full distributed tracing across the entire stack without reconfiguring the data plane.</p>

<p>For Kubernetes observability, I configured the OpenTelemetry Operator and used operator injection with annotations on the py4web frontend. Infrastructure metrics and logs? Check.</p>

<p>For application metrics, I built a custom Prometheus JSON exporter to scrape the health check endpoints. It runs as a sidecar, translates JSON payloads into Prometheus metrics via a config file, and exposes everything at <code class="language-plaintext highlighter-rouge">/probe</code>. Simple, decoupled, effective. I’ve open-sourced the <a href="https://github.com/kuhl-haus/kuhl-haus-mdp-deployment/tree/mainline/monitoring">JSON exporter and configuration</a> for the masochists out there.</p>
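<p>The heart of an exporter like that fits in a screenful: walk the JSON payload and emit an exposition-format line for every numeric leaf. A stripped-down sketch (the payload shape and metric prefix are invented for illustration, not the real exporter):</p>

```python
def json_to_prom(payload: dict, prefix: str = "mdp") -> str:
    """Flatten numeric JSON fields into Prometheus exposition-format lines."""
    lines = []

    def walk(obj, path):
        if isinstance(obj, dict):
            for key, value in obj.items():
                walk(value, path + [key])
        elif isinstance(obj, bool):
            pass  # booleans are not gauges; check before int (bool is an int subclass)
        elif isinstance(obj, (int, float)):
            lines.append(f"{'_'.join([prefix] + path)} {obj}")

    walk(payload, [])
    return "\n".join(lines)

health = {"messages": {"sent": 1490, "received": 1490}, "uptime_seconds": 86400}
print(json_to_prom(health))
# mdp_messages_sent 1490
# mdp_messages_received 1490
# mdp_uptime_seconds 86400
```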

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4/img-01.png" alt="" /></p>

<p><em>Graph showing MDL message send and receive rates</em></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4/img-03.png" alt="" /></p>

<p><em>The PoC’s death rattle, visualized.</em></p>

<p>Rather than restart the MDP, I let it run and self-recover. The green line shows the Massive Data Analyzer humming along processing aggregate messages. The red line shows the Top Stocks Analyzer having a full-blown meltdown at market open (6:30 AM Pacific): it flatlines for an hour, thrashes for the next two, and finally recovers around 9:30 AM, spiking to 340+ messages/sec right as it comes back. Classic stampeding herd.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4/img-02.png" alt="" /></p>

<p><em>This is what “crashes like clockwork” looks like in Grafana.</em></p>

<p>The data confirmed my suspicions and revealed some surprises:</p>

<ul>
  <li>The dictionary sorts were expensive, as expected</li>
  <li>The double-processing overhead was worse than I’d estimated</li>
  <li>The stampeding herd pattern at market open was clear in the metrics</li>
</ul>

<p>With hard numbers in hand, I could prioritize the rewrites systematically instead of guessing.</p>

<h3 id="the-solution-stateless-horizontally-scalable-redis-backed">The Solution: Stateless, Horizontally Scalable, Redis-Backed</h3>

<p>I killed the Top Stocks Analyzer entirely and replaced it with the <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/blob/mainline/src/kuhl_haus/mdp/analyzers/leaderboard_analyzer.py"><strong>Leaderboard Analyzer</strong></a>.</p>

<p>Key changes:</p>

<p><a href="https://redis.io/docs/latest/develop/data-types/sorted-sets/"><strong>Redis Sorted Sets for rankings</strong></a><strong>:</strong> Instead of dictionaries with periodic sorts, I’m using Redis sorted sets that maintain rankings natively. Updates are O(log(n)) and score lookups are O(1). More importantly, the data structure lives in Redis, not in process memory.</p>
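<p>With redis-py the whole update path collapses to a couple of calls. A hedged sketch (the key name and percent-change math are illustrative, not lifted from the analyzer):</p>

```python
def pct_gain(open_price: float, last_price: float) -> float:
    """Percent change from the session open, used as the sorted-set score."""
    return (last_price - open_price) / open_price * 100.0

def update_leaderboard(r, symbol: str, open_price: float, last_price: float):
    # ZADD is O(log n) and the set stays ordered by score,
    # so there is no periodic O(n log n) re-sort.
    r.zadd("leaderboard:gainers", {symbol: pct_gain(open_price, last_price)})

def top_gainers(r, n: int = 10):
    # Highest scores first, scores included.
    return r.zrevrange("leaderboard:gainers", 0, n - 1, withscores=True)

if __name__ == "__main__":
    import redis  # assumes a reachable Redis instance
    client = redis.Redis(decode_responses=True)
    update_leaderboard(client, "GME", 4.00, 4.60)
    print(top_gainers(client, 3))
```

<p>Because the state lives in Redis, any number of analyzer instances can call <code class="language-plaintext highlighter-rouge">update_leaderboard</code> concurrently without coordinating with each other.</p>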

<p><strong>Stateless design:</strong> Multiple Leaderboard Analyzer instances can run concurrently because they don’t hold local state. Each instance pulls from RabbitMQ, processes, and updates Redis. Horizontal scaling becomes trivial.</p>

<p><strong>Single-pass processing:</strong> The Massive Data Analyzer is gone. The Leaderboard Analyzer consumes directly from RabbitMQ and publishes to Redis in one pass. The graceful degradation mechanism works again.</p>

<p>The morning crash? Gone. The scanner now runs continuously through market open without a problem.</p>

<h3 id="scaling-up-vs-scaling-out-composability-by-design">Scaling Up vs. Scaling Out: Composability by Design</h3>

<p>This is why kuhl-haus-mdp and kuhl-haus-mdp-servers are separate repos. The core library defines the data models and processing logic. The servers package implements different deployment strategies.</p>

<p><strong>The</strong> <a href="https://github.com/kuhl-haus/kuhl-haus-mdp-servers/blob/mainline/src/kuhl_haus/servers/mdp_server.py"><strong>MDP Server</strong></a> is designed to scale up — single instance, rich observability, health check endpoints that expose Prometheus metrics.</p>

<p><strong>The</strong> <a href="https://github.com/kuhl-haus/kuhl-haus-mdp-servers/blob/mainline/src/kuhl_haus/servers/lba_server.py"><strong>LBA Server</strong></a> is designed to scale up and out — headless, no HTTP endpoints, pure message processing. Spin up as many instances as you need and crank up the parallelism while you’re at it.</p>

<p>Both use the same core library. Once I <a href="https://github.com/kuhl-haus/kuhl-haus-mdp/issues/2">add programmatic tracing and metrics to the core</a>, I can choose scaling strategies based on actual load patterns instead of guessing.</p>

<h3 id="what-i-learned">What I Learned</h3>

<p>Building the PoC with intentional limitations let me validate the architecture fast and identify real bottlenecks through measurement, not speculation. The stampeding herd problem was something I could have predicted — but it was exacerbated by the interaction of seemingly reasonable design choices under actual production load.</p>

<p>The key was making the PoC simple enough to get working quickly, but instrumented enough to learn from. Now I have a system that handles peak market open traffic without crashing, scales horizontally when needed, and gives me the observability to optimize further.</p>

<p>Not bad for a few weeks of work and some systematic iteration.</p>

<h3 id="whats-next">What’s Next?</h3>

<p>Sharp-eyed readers might’ve noticed something: I keep talking about 1,000+ events per second, but the dashboard screenshots show a max of 340 messages/second from the MDP. What gives?</p>

<p>I’ve got one more piece of code sitting in my private repo — a scanner I built during the PoC that I haven’t released yet. I needed to stabilize the foundation before adding more load to the system.</p>

<p>During the PoC, my first analyzer was based on <a href="https://github.com/massive-com/client-python/blob/master/examples/websocket/stocks-ws_extra.py">Massive’s example code</a> — a volume scanner that consumed the raw Trades feed and ranked stocks by number of trades, volume, and cash amount. Then I discovered I could achieve all my scanning needs using just the Aggregates feed. That cut message processing overhead by 75%, so I shelved the Trades scanner.</p>

<p>Here’s the tradeoff: the Aggregates scanner is slow to detect momentum shifts. When I’m consuming the Trades feed, I can see MOMO (momentum) building in real-time — you catch the move as it’s happening, not after it’s already run. The Aggregates feed smooths everything out, which is great for stability but terrible for timing.</p>

<p>Now that the scaling problems are solved and the architecture can handle horizontal load distribution? Time to bring the Trades scanner back into the mix.</p>

<p>Next post — the finale of this series — I’m going to show you what this thing looks like running at full throttle with both scanners active. 1,000+ events per second, complete with metrics to prove every claim.</p>

<p>We’ve gone from “crashes every morning” to “ready for prime time.” Not a bad arc.</p>]]></content><author><name>Tom Pounders</name></author><category term="Software Engineering" /><category term="opentelemetry" /><category term="observability" /><category term="performance" /><category term="market-data" /><summary type="html"><![CDATA[How OpenTelemetry exposed a hidden bottleneck and drove architectural improvements.]]></summary></entry><entry><title type="html">What I Built After Quitting Amazon (Spoiler: It’s a Stock Scanner) — Part 3</title><link href="https://oldschool-engineer.dev/infrastructure/2026/01/31/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3.html" rel="alternate" type="text/html" title="What I Built After Quitting Amazon (Spoiler: It’s a Stock Scanner) — Part 3" /><published>2026-01-31T00:00:00+00:00</published><updated>2026-01-31T00:00:00+00:00</updated><id>https://oldschool-engineer.dev/infrastructure/2026/01/31/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3</id><content type="html" xml:base="https://oldschool-engineer.dev/infrastructure/2026/01/31/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3.html"><![CDATA[<p><strong><em>Deployment and infrastructure — Production deployment strategies and cost optimization techniques</em></strong></p>

<p>📖 <strong>Stock Scanner Series:</strong></p>
<ul>
  <li><a href="/side%20projects/2026/01/16/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner.html">Part 1: Why I Built It</a></li>
  <li><a href="/side%20projects/2026/01/21/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-2.html">Part 2: How to Run It</a></li>
  <li>Part 3: How to Deploy It (you are here)</li>
  <li><a href="/software%20engineering/2026/02/11/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-4.html">Part 4: Evolution from Prototype to Production</a></li>
  <li><a href="/software%20engineering/2026/02/23/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-5.html">Part 5: Bugs, Bottlenecks, and Breaking 1,000 msg/s</a></li>
</ul>

<h3 id="introduction">Introduction</h3>

<p>If you’ve been following along with this series, you know the journey so far: I quit Amazon after a decade, dove into day trading, realized I needed better tools, and built a real-time stock scanner from scratch. In Part 2, we got it running on your local machine using Docker Compose — a great way to kick the tires and see if it fits your needs.</p>

<p>But here’s the thing: <strong>running it on your laptop is fun. Running it in production is a whole different game.</strong></p>

<p>Your laptop sleeps. Your PC reboots. You want to check your scanner from your phone while you’re out, but <code class="language-plaintext highlighter-rouge">localhost:8000</code> doesn’t work from Starbucks. And let’s be honest — if you’re serious about day trading, you need your scanner up at 4:00 AM Eastern, not whenever you remember to start Docker.</p>

<p>That’s where Part 3 comes in.</p>

<h3 id="what-this-post-covers"><strong>What this post covers</strong></h3>

<p>Part 2 bombed. You wanted the story, not a glorified README. The engagement numbers don’t lie.</p>

<p>So here’s the deal: I’ve open-sourced my entire production CI/CD stack — the actual Ansible playbooks, GoCD pipeline configs, and deployment scripts running my Market Data Platform in production. Not toy examples. The real deal.</p>

<p>But I’m not going to bore you with another README walkthrough. The docs exist — you don’t need me to read them to you.</p>

<p>Instead, I spun up a fresh Kubernetes cluster on Docker Desktop and deployed the whole stack from scratch. What you’re getting here are the moments that matter: the configuration decisions, the differences between deployment environments, and the hard-won insights that never make it into official documentation.</p>

<p>Think of this as the director’s commentary track for your deployment.</p>

<h3 id="from-docker-compose-to-production">From Docker Compose to Production</h3>

<p>Docker Compose is perfect for local development. One command, everything runs, you’re done.</p>

<p>Production requires thinking about:</p>

<ul>
  <li><strong>High availability:</strong> Auto-restart crashed components</li>
  <li><strong>Scalability:</strong> Handle market open when thousands of stocks update every second</li>
  <li><strong>Security:</strong> No hardcoded credentials</li>
  <li><strong>Reliability:</strong> Market open waits for no one</li>
  <li><strong>Maintainability:</strong> Patches and updates happen</li>
</ul>

<h3 id="what-youll-see">What You’ll See</h3>

<p>By the end of this post, you’ll watch me:</p>

<ul>
  <li>Deploy the entire Market Data Platform to Kubernetes using Ansible</li>
  <li>Set up networking, storage, ingress, and TLS certificates</li>
  <li>Validate end-to-end functionality</li>
</ul>

<p>We’ll walk through the deployment playbooks step-by-step, and I’ll show you the exact modifications I made to go from example configuration to fully-functioning production setup.</p>

<p><strong>Fair warning:</strong> This isn’t click-and-deploy. You’ll wrangle Ansible, Kubernetes, and YAML files. But you’ll also get a real CD foundation that works with any automation tool. I’m running the scripts manually here, but anything that can clone a repo and run bash scripts will work.</p>

<p>Let’s deploy something.</p>

<h3 id="prerequisites">Prerequisites</h3>

<p>You’ll need Kubernetes (v1.32+) and Ansible (2.19+). I’m using Docker Desktop’s built-in Kubernetes because it’s dead simple for local testing, but these manifests work on any cluster — EKS, GKE, on-prem, whatever.</p>

<p><strong>One critical note:</strong> Don’t deploy this to a public cloud and expose it to the internet. The security model assumes you’re behind a firewall. If you’re running this in AWS or GCP, keep it in a private subnet or you’re gonna have a bad time.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-16.png" alt="" /></p>

<p><em>SCREENSHOT: Tool versions verification</em></p>

<h3 id="initial-setup">Initial Setup</h3>

<p>I’m running this on WSL2 (Windows 11, Ubuntu). My shell user is <code class="language-plaintext highlighter-rouge">stack</code>, the same username as my production Ansible account. This matters because Ansible uses your local username for remote connections by default. If yours is different, you’ll need to override it in the inventory file or you’ll spend 20 minutes wondering why SSH keeps failing.</p>
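<p>The override is a one-liner in the inventory. A hypothetical fragment (the hostname and IP are placeholders):</p>

```yaml
# ansible/inventories/dev/hosts.yml (fragment; hosts are placeholders)
all:
  vars:
    ansible_user: stack       # overrides the local-username default
  hosts:
    k8s-node-01:
      ansible_host: 192.168.1.50
```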

<h3 id="step-1-clone-the-repositories">Step 1: Clone the Repositories</h3>

<p>Three repos to grab:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /mnt/c/Users/tom/Documents/GitHub
mkdir kuhl-haus
cd ./kuhl-haus
gh repo clone kuhl-haus/kuhl-haus-mdp-servers
gh repo clone kuhl-haus/kuhl-haus-mdp-app
gh repo clone kuhl-haus/kuhl-haus-mdp-deployment
</code></pre></div></div>

<p>The fourth repo (<code class="language-plaintext highlighter-rouge">kuhl-haus-mdp</code>) is the core library. You don’t need to clone it for deployment; it’s a dependency that gets pulled in automatically.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-17.png" alt="" /></p>

<p><em>SCREENSHOT: terminal showing directory structure</em></p>

<h3 id="step-2-configure-ansible-vault">Step 2: Configure Ansible Vault</h3>

<p>Here’s where people usually screw up: <strong>you need to create the vault file before running any playbooks.</strong></p>

<p>The vault holds your API keys, passwords, and other secrets. The example shows you the structure, but don’t just copy-paste — you need real credentials.</p>

<p>Create a vault at <code class="language-plaintext highlighter-rouge">ansible/group_vars/secrets.yml</code>, which is .gitignored, so your secrets stay local.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ansible-vault create ansible/group_vars/secrets.yml
</code></pre></div></div>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-18.png" alt="" /></p>

<p><em>SCREENSHOT: Vault configuration example (redacted sensitive data)</em></p>

<h3 id="step-3-environment-variables">Step 3: Environment Variables</h3>

<p>Three variables matter:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">APP_ENV</code> - This is the name of your inventory folder under <code class="language-plaintext highlighter-rouge">ansible/inventories/</code>. I used <code class="language-plaintext highlighter-rouge">dev</code> (which is .gitignored, so your dev inventory stays local). Production would be <code class="language-plaintext highlighter-rouge">prod</code>, staging would be <code class="language-plaintext highlighter-rouge">staging</code>, etc.</li>
  <li><code class="language-plaintext highlighter-rouge">BASE_WORKING_DIR</code> - Where you cloned the repos</li>
  <li>Domain names for your services</li>
</ul>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-19.png" alt="" /></p>

<p><em>SCREENSHOT: Set environment variables</em></p>

<h3 id="step-4-inventory">Step 4: Inventory</h3>

<p>Copy the example inventory and edit it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp -af ansible/inventories/example/ ansible/inventories/dev/
vim ansible/inventories/dev/hosts.yml
</code></pre></div></div>

<p>The example has placeholder domains. Change them to yours. If you’re setting up TLS, this is where you configure your ACME/Let’s Encrypt details.</p>

<p><strong>Why this matters:</strong> Kubernetes ingress routes traffic based on hostnames. Get these wrong and you’ll deploy successfully but won’t be able to access anything.</p>
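<p>Concretely, the ingress matches on the <code class="language-plaintext highlighter-rouge">host</code> field, so a typo here means NGINX serves its default backend instead of your app. A hypothetical rule (hostname, service name, and port are placeholders):</p>

```yaml
# Ingress fragment; hostname and backend are placeholders
spec:
  rules:
    - host: scanner.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mdp-app
                port:
                  number: 8000
```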

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-20.png" alt="" /></p>

<p><em>SCREENSHOT: Modified example inventory file.</em></p>

<h3 id="deployment-process">Deployment Process</h3>

<p>Quick housekeeping first: install Ansible dependencies with the prerequisites playbook. It takes about 30 seconds to create a Python venv and install the Kubernetes module.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-21.png" alt="" /></p>

<p><em>SCREENSHOT: Prerequisites installation output</em></p>

<h3 id="phase-1-base-kubernetes-infrastructure">Phase 1: Base Kubernetes infrastructure</h3>

<p>This is where deployment gets interesting — and where Docker Desktop diverges from production clusters.</p>

<p>Cloud providers use proprietary networking (CNI) and storage (CSI) plugins. Once you configure those, though, everything else is mostly portable. That’s the whole point of Kubernetes — abstraction that keeps you from getting completely locked into one vendor.</p>

<h4 id="storage-the-easy-part">Storage: The Easy Part</h4>

<p>Production uses Ceph with <a href="https://docs.ceph.com/en/reef/rbd/">RADOS Block Device</a> (<code class="language-plaintext highlighter-rouge">csi-rbd-sc</code> <a href="https://docs.ceph.com/en/reef/rbd/rbd-kubernetes/">storage class</a>). Docker Desktop? Just change one variable to <code class="language-plaintext highlighter-rouge">hostpath</code> in <code class="language-plaintext highlighter-rouge">ansible/group_vars/all.yml</code>. Done.</p>

<h4 id="networking-the-fun-part">Networking: The Fun Part</h4>

<p><strong>Here’s where I hit my first real snag.</strong></p>

<p>In production, I run MetalLB for load balancing with NGINX ingress. MetalLB assigns virtual IPs to services using Layer 2 ARP. Works beautifully on bare metal Ubuntu nodes.</p>

<p>Docker Desktop? Nope.</p>

<p><strong>The problem:</strong> Docker Desktop runs Kubernetes inside a VM (even on WSL2). MetalLB’s ARP responses happen inside that VM, not on your physical network interface. Your host network never sees the advertisements. You deploy everything, health checks pass, and… you can’t reach anything.</p>

<p>I spent 20 minutes checking NGINX configs before I remembered the VM layer.</p>

<p><strong>The fix:</strong> Don’t use MetalLB on Docker Desktop. Just skip it. NGINX will bind directly to ports 80 and 443 on your physical interface instead. No other changes needed — the Service endpoints and ingress routes work identically.</p>

<p>Rather than maintaining separate playbooks, I added a conditional check. If you’re deploying to production and want MetalLB, uncomment <code class="language-plaintext highlighter-rouge">use_metal_lb: true</code> in your inventory file.</p>
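<p>In Ansible terms the gate is just a <code class="language-plaintext highlighter-rouge">when:</code> clause. A sketch of the pattern (the task and file names are illustrative, not the actual playbook):</p>

```yaml
# Runs only when the inventory opts in with use_metal_lb: true
- name: Deploy MetalLB
  kubernetes.core.k8s:
    state: present
    src: metallb-native.yaml
  when: use_metal_lb | default(false) | bool
```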

<h4 id="tls-certificates-the-clever-part">TLS Certificates: The Clever Part</h4>

<p><strong>Important:</strong> If you just want to kick the tires on localhost, stick with Docker Compose from Part 2. The Kubernetes deployment assumes you’re setting up proper hostnames and TLS certificates.</p>

<p>Here’s the problem I needed to solve: I want production-grade TLS certificates, but I don’t want my services exposed to the public internet. Let’s Encrypt’s HTTP-01 challenge won’t work because it requires public accessibility.</p>

<p>Enter split-brain DNS with ACME DNS-01 validation.</p>

<p><strong>How it works:</strong></p>

<ol>
  <li>I register real domains with AWS Route53 and Cloudflare (public DNS zones)</li>
  <li>ACME DNS-01 validation checks those public zones — ✓ domains are verified</li>
  <li><strong>But</strong> my internal DNS server resolves those same hostnames to private IPs</li>
  <li>Traffic never hits the internet — it routes internally</li>
</ol>

<p>For production, those internal IPs point to MetalLB virtual IPs. For this Docker Desktop demo, I created internal DNS records pointing to my PC’s IP address (192.168.x.x or whatever your WSL2 interface uses).</p>

<p><strong>The result:</strong> Real, valid TLS certificates for services that only exist on my internal network.</p>

<p>The playbook supports both AWS Route53 and Cloudflare for DNS-01 validation. You specify which provider in your inventory file, and cert-manager handles the rest.</p>
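<p>For reference, a cert-manager issuer using the Route53 DNS-01 solver looks roughly like this (names, email, and region are placeholders; the AWS credential wiring is omitted):</p>

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com            # placeholder
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - dns01:
          route53:
            region: us-west-2
```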

<p><strong>For Docker Desktop specifically:</strong> You’ll need to set up DNS records on your local network (your router, Pi-hole, or whatever runs your internal DNS) that point your chosen hostnames to your PC. The ACME validation happens against the public zone, but the actual traffic goes to your local machine.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-22.png" alt="" /></p>

<p><em>SCREENSHOT: k8s-infra.yml playbook completed successfully</em></p>

<h3 id="phase-2-frontend-deployment">Phase 2: Frontend Deployment</h3>

<p>Here’s where we find out if everything actually works.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-01.png" alt="" /></p>

<p><em>SCREENSHOT: Showing deployment summary and verification steps</em></p>

<h4 id="the-version-verification-trick">The Version Verification Trick</h4>

<p>Remember in <a href="/side%20projects/2026/01/21/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-2.html">Part 2</a> when I said not to worry about <code class="language-plaintext highlighter-rouge">container_image</code> and <code class="language-plaintext highlighter-rouge">image_version</code> showing as “Unknown”? That was Docker Compose running locally with no git context.</p>

<p>In Kubernetes, those fields show real values: <code class="language-plaintext highlighter-rouge">ghcr.io/kuhl-haus/kuhl-haus-mdp-app-server:0.1.4.dev1-2c68fe9</code> and <code class="language-plaintext highlighter-rouge">0.1.4.dev1-2c68fe9</code>.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-02.png" alt="" /></p>

<p><em>SCREENSHOT: Smoke test script inspecting image tag returned from health check endpoint</em></p>

<p><strong>Why this matters:</strong> The deployment scripts use the same logic as the image build pipeline to calculate version tags from git commit history. That’s why you needed to clone the repos — not for the code, but for the git history.</p>
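<p>The exact tagging logic lives in the deployment scripts, but the shape of it looks roughly like this (a sketch; <code class="language-plaintext highlighter-rouge">build_tag</code> and <code class="language-plaintext highlighter-rouge">git_short_sha</code> are illustrative names, not the real functions):</p>

```python
# Sketch of git-derived version tagging. The real pipeline logic lives
# in the deployment scripts; function names here are illustrative.
import subprocess

def build_tag(base_version: str, dev_build: int, short_sha: str) -> str:
    """Compose a dev tag like '0.1.4.dev1-2c68fe9'."""
    return f"{base_version}.dev{dev_build}-{short_sha}"

def git_short_sha(repo_dir: str = ".") -> str:
    """Read the short commit hash. This is why you needed the repo clone:
    the git history, not the source code."""
    out = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], cwd=repo_dir
    )
    return out.decode().strip()

print(build_tag("0.1.4", 1, "2c68fe9"))  # 0.1.4.dev1-2c68fe9
```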

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-03.png" alt="" /></p>

<p><em>SCREENSHOT: App landing page with image version and image source highlighted</em></p>

<p>Checking the <a href="https://github.com/kuhl-haus/kuhl-haus-mdp-app/pkgs/container/kuhl-haus-mdp-app-server">GitHub packages</a> confirms <code class="language-plaintext highlighter-rouge">0.1.4.dev1-2c68fe9</code> is indeed the latest image.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-04.png" alt="" /></p>

<h4 id="why-simple-health-checks-arent-enough">Why Simple Health Checks Aren’t Enough</h4>

<p>Here’s the problem with basic smoke tests: they’ll tell you if <em>something</em> is running, but not if your <em>new version</em> deployed successfully.</p>

<p>Kubernetes does rolling updates. If a new pod fails health checks, it never enters the load balancer rotation. The old version keeps serving traffic. Your health check endpoint returns 200 OK… from the old pods.</p>

<p><strong>Everything looks fine. Your deployment failed.</strong></p>

<p>My smoke test script checks the version tag in the health check response. If it doesn’t match what I just deployed, the script fails. This catches deployment failures while maintaining high availability — the old version stays up, I get alerted, and I can investigate without taking an outage.</p>
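<p>Here’s the core of that check as a sketch (hypothetical helper names; the actual smoke test script may be structured differently):</p>

```python
# Sketch of a version-aware smoke test: a 200 OK alone can come from
# old pods still serving. Helper names are illustrative.
import json
import urllib.request

def verify_deployment(health: dict, expected_version: str) -> None:
    """Fail loudly if the health check answers from an older version."""
    if health.get("status") != "OK":
        raise RuntimeError(f"unhealthy: {health.get('status')}")
    got = health.get("image_version")
    if got != expected_version:
        raise RuntimeError(f"version mismatch: got {got}, expected {expected_version}")

def smoke_test(url: str, expected_version: str) -> None:
    with urllib.request.urlopen(url, timeout=5) as resp:
        verify_deployment(json.load(resp), expected_version)

# Old pods answering 200 OK with the previous tag still fail the test:
try:
    verify_deployment({"status": "OK", "image_version": "0.1.3"}, "0.1.4.dev1-2c68fe9")
except RuntimeError as e:
    print(e)  # "version mismatch: got 0.1.3, expected 0.1.4.dev1-2c68fe9"
```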

<p>This is also why I run a pre-production environment. Upgrade all PPE nodes first, verify the version-tagged health checks pass, then move to production with confidence.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-05.png" alt="" /></p>

<p><em>SCREENSHOT: No market data… yet.</em></p>

<h3 id="phase-3-backend-data-plane-the-order-matters">Phase 3: Backend Data Plane (The Order Matters)</h3>

<p>Unlike the frontend, the backend components deploy sequentially. Not for fun — because they have dependencies that’ll bite you if you ignore them.</p>

<p><strong>WARNING — RACE CONDITION:</strong> The Market Data Processor won’t start if the Market Data Listener hasn’t created its RabbitMQ queues yet. The MDL owns queue creation and only does it on first run. Deploy MDP first? It crashes looking for queues that don’t exist.</p>

<p>So: sequential deployment, dependency order enforced.</p>
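<p>One way to enforce that ordering, sketched below: poll until the MDL-created queues exist before starting the MDP. This is illustrative (the playbook may sequence it differently, and the queue names here are made up):</p>

```python
# Sketch: gate the dependent deploy on its dependency being ready.
# Queue names are illustrative, not the real MDL queue names.
import time

def wait_for(predicate, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `predicate` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False

REQUIRED_QUEUES = {"aggregates", "trades"}  # illustrative

def queues_exist(existing: set) -> bool:
    """In practice this would query the RabbitMQ management API."""
    return REQUIRED_QUEUES.issubset(existing)

print(wait_for(lambda: queues_exist({"aggregates", "trades", "news"}), timeout_s=5))  # True
```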

<h4 id="certificate-manager">Certificate Manager</h4>

<p>Quick housekeeping: each namespace needs its own cert-manager to issue certificates. Frontend and data plane are isolated — the frontend cert-manager can’t issue certs for backend services.</p>
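<p>That isolation falls out of using a namespaced <code class="language-plaintext highlighter-rouge">Issuer</code> rather than a cluster-wide <code class="language-plaintext highlighter-rouge">ClusterIssuer</code>. A minimal sketch, with illustrative names and a placeholder DNS-01 solver block, not the playbook’s actual manifests:</p>

```yaml
# Sketch: a namespaced Issuer can only issue certificates in its own
# namespace, unlike a ClusterIssuer. Names here are illustrative.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-dns01
  namespace: mdp-frontend      # repeat per namespace (e.g. data plane)
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - dns01:
          route53:             # or cloudflare, per your inventory
            region: us-east-1  # illustrative
```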

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-06.png" alt="" /></p>

<p><em>SCREENSHOT: certificate manager deployment</em></p>

<h4 id="market-data-cache-redis">Market Data Cache (Redis)</h4>

<p>In production, Redis runs with authentication. For this demo, I skipped the password so I could show you the Redis browser interface and capture screenshots of the cache state.</p>

<p>Is this how you should run Redis? No. Is it fine for a local demo that never touches the internet? Yes.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-07.png" alt="" /></p>

<p><em>SCREENSHOT: deployment summary for Redis</em></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-08.png" alt="" /></p>

<p><em>SCREENSHOT: Smoke test Market Data Cache</em></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-09.png" alt="" /></p>

<p><em>SCREENSHOT: Optional Redis Browser Interface</em></p>

<h4 id="market-data-queues-rabbitmq">Market Data Queues (RabbitMQ)</h4>

<p>Same deal — I enabled the management dashboard metrics collector, which RabbitMQ deprecated in favor of Prometheus. But Prometheus metrics don’t make good screenshots, and you’re not running this in production anyway.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-10.png" alt="" /></p>

<p><em>SCREENSHOT: RabbitMQ deployment summary</em></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-11.png" alt="" /></p>

<p><em>SCREENSHOT: RabbitMQ smoke test script output</em></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-12.png" alt="" /></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-13.png" alt="" /></p>

<h4 id="market-data-listener">Market Data Listener</h4>

<p>Now we’re back to my code, which means we’re back to version-tagged health checks.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-14.png" alt="" /></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-15.png" alt="" /></p>

<p>Notice the smoke test validates the image tag? Every component I built emits <code class="language-plaintext highlighter-rouge">image_version</code> and <code class="language-plaintext highlighter-rouge">container_image</code> from its health endpoint. Redis and RabbitMQ are third-party; they don’t have this verification built in.</p>

<h4 id="market-data-processors">Market Data Processors</h4>

<p>This is the component that crashes if the MDL hasn’t run first. With the MDL deployed, the queues exist, and the MDP starts cleanly.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-24.png" alt="" /></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-25.png" alt="" /></p>

<h4 id="widget-data-service">Widget Data Service</h4>

<p>Final piece of the backend puzzle.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-23.png" alt="" /></p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-26.png" alt="" /></p>

<p><em>SCREENSHOT: Widget Data Service smoke test script</em></p>

<h3 id="end-to-end-verification-and-testing">End-to-End Verification and Testing</h3>

<p>Time to see if this thing actually works.</p>

<p>Open the app and… yes, data is flowing. Scanners are populating. But let’s trace exactly how that data got there — this doubles as a tour of the data pipeline.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-27.png" alt="" /></p>

<p><em>SCREENSHOT: Stock Scanner Dashboard with populated scanners</em></p>

<h4 id="step-1-market-data-listener">Step 1: Market Data Listener</h4>

<p>The MDL connects to your market data feed and processes incoming messages. Hit the health endpoint and you get the full picture:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>  
  <span class="s">"service"</span><span class="p">:</span> <span class="s">"Massive Data Listener"</span><span class="p">,</span>  
  <span class="s">"status"</span><span class="p">:</span> <span class="s">"OK"</span><span class="p">,</span>  
  <span class="s">"container_image"</span><span class="p">:</span> <span class="s">"ghcr.io/kuhl-haus/kuhl-haus-mdl-server:0.1.12"</span><span class="p">,</span>  
  <span class="s">"image_version"</span><span class="p">:</span> <span class="s">"0.1.12"</span><span class="p">,</span>  
  <span class="s">"mdq_connection_status"</span><span class="p">:</span> <span class="p">{</span>  
    <span class="s">"connected"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"last_message_time"</span><span class="p">:</span> <span class="s">"2026-01-31T00:09:04.870812"</span><span class="p">,</span>  
    <span class="s">"messages_received"</span><span class="p">:</span> <span class="mi">98246</span><span class="p">,</span>  
    <span class="s">"aggregate"</span><span class="p">:</span> <span class="mi">98246</span>  
  <span class="p">},</span>  
  <span class="s">"mdl_connection_status"</span><span class="p">:</span> <span class="p">{</span>  
    <span class="s">"connected"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"feed"</span><span class="p">:</span> <span class="s">"socket.massive.com"</span><span class="p">,</span>  
    <span class="s">"market"</span><span class="p">:</span> <span class="s">"stocks"</span><span class="p">,</span>  
    <span class="s">"subscriptions"</span><span class="p">:</span> <span class="p">[</span><span class="s">"A.*"</span><span class="p">]</span>  
  <span class="p">}</span>  
<span class="p">}</span>
</code></pre></div></div>

<p><strong>What this tells us:</strong></p>

<ul>
  <li>Image version matches what we just deployed (0.1.12) ✓</li>
  <li>Connected to both the market data feed AND RabbitMQ ✓</li>
  <li>Processed 98,246 aggregate messages (and counting) ✓</li>
  <li>Last message came in seconds ago ✓</li>
  <li>Subscribed to per-second Aggregate events for all stocks ✓</li>
</ul>

<p>That’s a healthy listener. Messages are flowing into RabbitMQ queues.</p>
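<p>“Healthy” here means more than <code class="language-plaintext highlighter-rouge">connected: true</code>; the last message should also be recent. A sketch of that check (field names mirror the health payload above; the helper itself is hypothetical):</p>

```python
# Sketch: a listener is healthy only if both connections are up AND
# messages arrived recently. Field names match the health payload.
from datetime import datetime, timedelta

def listener_healthy(status: dict, now: datetime,
                     max_age: timedelta = timedelta(seconds=30)) -> bool:
    feed = status["mdl_connection_status"]
    queue = status["mdq_connection_status"]
    last = datetime.fromisoformat(queue["last_message_time"])
    return feed["connected"] and queue["connected"] and (now - last) <= max_age

health = {
    "mdq_connection_status": {"connected": True,
                              "last_message_time": "2026-01-31T00:09:04.870812"},
    "mdl_connection_status": {"connected": True},
}
print(listener_healthy(health, datetime(2026, 1, 31, 0, 9, 10)))  # True
```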

<h4 id="step-2-rabbitmq-queues">Step 2: RabbitMQ Queues</h4>

<p>If messages were piling up here, it’d mean the processors aren’t keeping pace. But the queues are empty — good sign. Messages are flowing through, not backing up.</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-28.png" alt="" /></p>

<p><em>SCREENSHOT: RabbitMQ dashboard showing no queued messages.</em></p>

<p><strong>Minor embarrassment:</strong> The dashboard shows 23 messages per second. I advertised this thing as handling 1,000+ messages per second, so what gives?</p>

<p>I’m running this demo after market close. Traffic right now is basically nothing — a few late trades trickling in, some after-hours activity. At 9:30 AM Eastern when market opens and every stock is moving? Yeah, then you get your 1,000+ msg/sec.</p>

<p>Timing is everything in stock market demos, apparently.</p>

<h4 id="step-3-market-data-processor">Step 3: Market Data Processor</h4>

<p>The MDP pulls messages from RabbitMQ, processes them, and writes results to Redis. The health check shows what’s actually happening:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>  
  <span class="s">"status"</span><span class="p">:</span> <span class="s">"OK"</span><span class="p">,</span>  
  <span class="s">"container_image"</span><span class="p">:</span> <span class="s">"ghcr.io/kuhl-haus/kuhl-haus-mdp-server:0.1.12"</span><span class="p">,</span>  
  <span class="s">"image_version"</span><span class="p">:</span> <span class="s">"0.1.12"</span><span class="p">,</span>  
  <span class="s">"mdp_aggregate"</span><span class="p">:</span> <span class="p">{</span>  
    <span class="s">"alive"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"pid"</span><span class="p">:</span> <span class="mi">9</span><span class="p">,</span>  
    <span class="s">"processed"</span><span class="p">:</span> <span class="mi">99071</span><span class="p">,</span>  
    <span class="s">"errors"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>  
    <span class="s">"mdq_connected"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"mdc_connected"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"restarts"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>  
    <span class="s">"running"</span><span class="p">:</span> <span class="n">true</span>  
  <span class="p">},</span>  
  <span class="s">"mdp_trades"</span><span class="p">:</span> <span class="p">{</span>  
    <span class="s">"alive"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"processed"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>  
    <span class="p">...</span>  
  <span class="p">},</span>  
  <span class="s">"scanner_top_stocks"</span><span class="p">:</span> <span class="p">{</span>  
    <span class="s">"alive"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"processed"</span><span class="p">:</span> <span class="mi">99070</span><span class="p">,</span>  
    <span class="s">"errors"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>  
    <span class="s">"mdc_connected"</span><span class="p">:</span> <span class="n">true</span><span class="p">,</span>  
    <span class="s">"restarts"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>  
    <span class="s">"running"</span><span class="p">:</span> <span class="n">true</span>  
  <span class="p">}</span>  
<span class="p">}</span>
</code></pre></div></div>

<p><strong>What’s happening here:</strong></p>

<p>The MDP runs separate processors for different message types — trades, aggregates, quotes, halts, news. Only aggregate messages are flowing (those 99,071 processed messages) because that’s all I’m subscribed to. Everything else shows zero because those message types aren’t coming in. If I changed my subscription, those processors would immediately start processing the new message types.</p>
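<p>The fan-out structure looks roughly like this (an illustrative sketch, not the actual MDP internals; the <code class="language-plaintext highlighter-rouge">ev</code> event-type field is an assumption about the feed’s envelope):</p>

```python
# Sketch of per-message-type dispatch: each type gets its own processor,
# and unsubscribed types simply never receive messages. Illustrative
# structure only; "ev" as the event-type key is an assumption.
class Processor:
    def __init__(self, name: str):
        self.name = name
        self.processed = 0

    def handle(self, msg: dict) -> None:
        self.processed += 1  # real processors would decode and act here

# aggregates, trades, quotes, halts, news
PROCESSORS = {t: Processor(t) for t in ("A", "T", "Q", "H", "N")}

def dispatch(msg: dict) -> None:
    proc = PROCESSORS.get(msg["ev"])
    if proc:
        proc.handle(msg)

for m in [{"ev": "A", "sym": "ABCD"}, {"ev": "A", "sym": "EFGH"}]:
    dispatch(m)
print(PROCESSORS["A"].processed, PROCESSORS["T"].processed)  # 2 0
```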

<p>Notice <code class="language-plaintext highlighter-rouge">scanner_top_stocks</code> has processed 99,070 messages, one less than the aggregate processor. That scanner consumes the aggregate stream and maintains the leaderboards in Redis. It’s keeping perfect pace.</p>

<p><strong>The zero errors thing:</strong> No decoding errors, no duplicates, no restarts. All processors show <code class="language-plaintext highlighter-rouge">mdq_connected: true</code> (RabbitMQ) and <code class="language-plaintext highlighter-rouge">mdc_connected: true</code> (Redis). Clean operation.</p>

<p>Version matches deployment (0.1.12) ✓</p>

<h4 id="step-4-redis-cache">Step 4: Redis Cache</h4>

<p>This is where processed data lives. The browser shows keys being populated in real-time:</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-29.png" alt="" /></p>

<p><em>SCREENSHOT: Redis browser showing cache steadily being populated by the market data processor</em></p>

<p>Each key corresponds to a specific data aggregation — top gainers, top volume, top gappers, etc.</p>
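<p>Conceptually, each of those keys is a ranked leaderboard that the scanner rewrites as aggregates arrive. A toy sketch of the idea (a plain dict stands in for Redis; the real keys and scoring live in the MDP):</p>

```python
# Toy sketch of a "top gainers"-style leaderboard. Redis sorted sets fit
# this naturally; a plain dict stands in for the cache here.
cache: dict = {}  # symbol -> day's percent change

def update(symbol: str, pct_change: float) -> None:
    cache[symbol] = pct_change

def top_gainers(n: int = 10) -> list:
    """Return the n biggest gainers, highest percent change first."""
    return sorted(cache.items(), key=lambda kv: kv[1], reverse=True)[:n]

for sym, pct in [("ABCD", 42.0), ("EFGH", 11.5), ("IJKL", 87.3)]:
    update(sym, pct)
print(top_gainers(2))  # [('IJKL', 87.3), ('ABCD', 42.0)]
```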

<h4 id="step-5-widget-data-service--frontend">Step 5: Widget Data Service → Frontend</h4>

<p>The Widget Data Service is a WebSocket interface to Redis. Its health check is simple but tells you everything you need:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>  
  <span class="s">"status"</span><span class="p">:</span> <span class="s">"OK"</span><span class="p">,</span>  
  <span class="s">"container_image"</span><span class="p">:</span> <span class="s">"ghcr.io/kuhl-haus/kuhl-haus-wds-server:0.1.12"</span><span class="p">,</span>  
  <span class="s">"image_version"</span><span class="p">:</span> <span class="s">"0.1.12"</span><span class="p">,</span>  
  <span class="s">"active_ws_clients"</span><span class="p">:</span> <span class="mi">3</span>  
<span class="p">}</span>
</code></pre></div></div>

<p>Version matches (0.1.12) ✓</p>

<p>Three active WebSocket clients — that’s the three widgets I have open in my browser right now. Each widget is a separate WebSocket connection subscribing to specific Redis cache keys.</p>

<p>Open browser dev tools and you can watch the WebSocket traffic:</p>

<p><img src="/assets/images/posts/what-i-built-after-quitting-amazon-spoiler-its-a-stock-scanner-part-3/img-30.png" alt="" /></p>

<p><em>Dev tools showing WebSocket subscriptions</em></p>

<p>Each widget subscribes to specific cache keys. When the MDP updates Redis, the Widget Data Service pushes updates through the WebSocket, and the UI updates without polling.</p>
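<p>The push model, reduced to its skeleton (a hypothetical shape, not the actual WDS implementation): clients register interest in cache keys, and an update fans out only to the interested clients.</p>

```python
# Sketch of key-based fan-out: a cache update is pushed only to clients
# subscribed to that key. Illustrative shape, not the actual WDS code.
from collections import defaultdict

subscriptions = defaultdict(set)  # cache key -> set of clients

class FakeClient:
    """Stands in for a WebSocket connection."""
    def __init__(self):
        self.received = []

    def send(self, key: str, value) -> None:
        self.received.append((key, value))

def subscribe(client, key: str) -> None:
    subscriptions[key].add(client)

def on_cache_update(key: str, value) -> None:
    for client in subscriptions[key]:
        client.send(key, value)  # a real WDS pushes over the WebSocket

widget = FakeClient()
subscribe(widget, "scanner:top_gainers")
on_cache_update("scanner:top_gainers", ["IJKL", "ABCD"])
on_cache_update("scanner:top_volume", ["MNOP"])  # no subscriber, no push
print(widget.received)  # [('scanner:top_gainers', ['IJKL', 'ABCD'])]
```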

<p><strong>This is the cool part:</strong> The entire data pipeline — from market feed to UI update — happens in near real-time. No database queries, no REST polling, just WebSocket push notifications driven by cache updates.</p>

<p>And it all just worked on the first deployment.</p>

<h3 id="cost-optimization-or-how-to-cheap-out-if-you-must">Cost Optimization (Or: How to Cheap Out If You Must)</h3>

<p>Look, I’m not going to pretend I’ve tested every penny-pinching configuration. I run the $199/month plan because I want real-time data and I’m not broke. But if you’re absolutely determined to save a few bucks, here are some half-assed guesses that might work.</p>

<p><strong>Downgrade your market data plan:</strong></p>

<p>Don’t need real-time updates? The <a href="https://massive.com/pricing">$29/month Stocks Starter</a> plan gives you delayed data and daily statistics. You lose the second-by-second scanner updates, but you can still run end-of-day analysis and historical scans.</p>

<p>Trade-off: Your scanner shows what happened, not what’s happening.</p>

<p><strong>Switch from per-second to per-minute aggregates:</strong></p>

<p>Change your subscription from <code class="language-plaintext highlighter-rouge">A.*</code> (all tickers, per-second) to <code class="language-plaintext highlighter-rouge">AM.*</code> (all tickers, per-minute):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># ansible/group_vars/all.yml  
</span>  
<span class="n">massive_subscriptions</span><span class="p">:</span>  
  <span class="o">-</span> <span class="s">"AM.*"</span> <span class="c1"># Per-minute instead of per-second
</span></code></pre></div></div>

<p><strong>Theory:</strong> 60x fewer messages means 60x less CPU and bandwidth. Should save you money on cloud hosting.</p>

<p><strong>I’ve actually tried this.</strong> It works, but it’s slow. You’re getting updates once per minute instead of every second. Suboptimal for day trading. Fine for end-of-day or longer-time-frame analysis.</p>

<p><strong>Other ideas I haven’t tried:</strong></p>

<ul>
  <li>Run the scanner only during market hours (9:30 AM — 4:00 PM ET). Schedule your Kubernetes pods to scale down outside those hours.</li>
  <li>Subscribe to fewer tickers. If you only trade a few stocks, why pay to process data on thousands of symbols?</li>
  <li>Use cheaper cloud instances. This runs fine on small VMs — you don’t need a beefy server.</li>
</ul>
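<p>For the first idea, the scheduling logic is simple enough to sketch. This is an untested guess like the rest of the list: a real setup might use a CronJob or an autoscaler instead, and this ignores market holidays entirely.</p>

```python
# Untested sketch: desired replica count from the clock. Ignores market
# holidays; a real setup might use a CronJob or autoscaler instead.
from datetime import datetime, time
from zoneinfo import ZoneInfo

OPEN, CLOSE = time(9, 30), time(16, 0)  # regular session, Eastern time

def desired_replicas(now_utc: datetime, normal: int = 1) -> int:
    et = now_utc.astimezone(ZoneInfo("America/New_York"))
    if et.weekday() >= 5:  # Saturday/Sunday
        return 0
    return normal if OPEN <= et.time() < CLOSE else 0

# 16:00 UTC on a Wednesday in April is 12:00 ET (EDT): market open.
noon_et = datetime(2026, 4, 1, 16, 0, tzinfo=ZoneInfo("UTC"))
print(desired_replicas(noon_et))  # 1
```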

<p>Again: I don’t run any of these configurations. They’re educated guesses. If you try them and they work, great. If they don’t, you get to keep both pieces.</p>

<h3 id="conclusion">Conclusion</h3>

<h4 id="the-reality-check">The Reality Check</h4>

<p>Let’s be honest: this deployment isn’t trivial. Ansible playbooks, Kubernetes manifests, networking configs, and more YAML than any reasonable person should endure. If you hit roadblocks, that’s normal. Infrastructure work is hard, and anyone who tells you otherwise is selling something.</p>

<p>But here’s what matters: <strong>you just deployed a production-grade real-time stock scanner to Kubernetes.</strong></p>

<p>Is it perfect? No. Will you need to tweak it? Absolutely. Should there be monitoring and alerting? Yes, and we’ll get there. But right now, you’ve got market data flowing through a multi-component pipeline, updating in real-time, with proper health checks and version verification.</p>

<p>That’s a hell of a starting point.</p>

<h3 id="whats-next">What’s Next</h3>

<p>This series isn’t done. Coming up:</p>

<ul>
  <li><strong>The Market Data Processor internals</strong> — How I calculate relative volume, track daily statistics, and maintain top 500 rankings efficiently</li>
  <li><strong>WebSocket challenges</strong> — Handling reconnections, backpressure, and ensuring data consistency in real-time streaming applications</li>
</ul>

<p>If you made it this far, you’re either deploying this thing or you’re a masochist. Either way, thanks for reading.</p>]]></content><author><name>Tom Pounders</name></author><category term="Infrastructure" /><category term="kubernetes" /><category term="ansible" /><category term="deployment" /><category term="market-data" /><summary type="html"><![CDATA[From Docker Compose to production Kubernetes with Ansible, MetalLB, and cert-manager.]]></summary></entry></feed>