Most multi-agent demos survive one run. Mine has run my mornings, my finances, my workouts, and my job pipeline every day for three months. What follows isn't a success story; it's the one system that survived long enough to teach me what production actually demands.
The cast
Jordan is the orchestrator, named after my dog. Every Telegram message lands on Jordan first, and Jordan routes it to whichever specialist owns the domain. Jordan also handles anything personal I won't delegate to an agent I wrote last month: reminders that need context, cross-agent coordination, changes to the system itself. Lives at ~/.claude/. Bot: @NikoClaudeCode_bot.
Research runs daily digests across four streams: a curated crypto Twitter list, AI news, Discord alpha channels, and my Gmail inbox. Ten LaunchAgents fire through the morning. The most concrete output is the 09:25 UTC brief that hits Telegram before I've had coffee.
Finance syncs Monobank, two Binance accounts, DeBank, and Merkl. Tracks P2P conversions, monthly budgets, subscription churn. Nine LaunchAgents. Runs on Sonnet; moving it off Opus cut token consumption 5× with no noticeable drop in output quality.
Fitness handles workout programming, recovery logs, nutrition intake, and weekly retrospectives summarising what moved and what stalled. It reports to the same dashboard as the other agents, so I see training load and job pipeline in one view.
Jobs is the most elaborate of the four: it scans 46 ATS boards plus freelance aggregators, runs a bge-m3 prefilter, scores fit into a Pydantic-validated schema, tailors CVs, and tracks everything in a 10-column Kanban at app.nikoxyz.com/#jobs. Thirteen LaunchAgents. The only agent that occasionally wakes me up at midnight with a high-fit match.
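The prefilter-then-score split is the part worth copying: cheap embedding similarity throws out most postings before the expensive model sees any of them. A minimal sketch, assuming the profile and each posting already have bge-m3 vectors; the Posting shape, the prefilter name, the 0.5 threshold and the cap of 20 are all hypothetical, not the agent's actual config.

```typescript
// Hypothetical posting shape; embeddings are assumed to be precomputed (e.g. bge-m3).
interface Posting {
  id: string;
  title: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep postings similar enough to the profile, best first, capped so the
// LLM fit-scoring step only ever sees a bounded number of candidates.
export function prefilter(
  profile: number[],
  postings: Posting[],
  threshold = 0.5,    // hypothetical cut-off
  maxCandidates = 20, // hypothetical cap
): { posting: Posting; similarity: number }[] {
  return postings
    .map((posting) => ({ posting, similarity: cosine(profile, posting.embedding) }))
    .filter((c) => c.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, maxCandidates);
}
```

Only the survivors go on to the expensive scoring call, so the threshold is effectively a cost dial: lower it and more postings reach the model.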
Day-job context: I'm a PM at TwitterScore, which monitors 35K crypto accounts and serves 250+ API clients. These posts are about the agents I build at night; the agents I build at night are where I learn what's shipping-grade for the day job.
Pick boring infrastructure
Three primitives keep everything connected, and none of them would look interesting in a keynote.
The message bus is a folder on disk. Any agent drops JSON into ~/projects/shared-agent-hub/messages/<target>/<timestamp>-<source>.json. The target picks it up on its next wake. A typical cross-agent handoff is one file write:
```jsonc
// ~/projects/shared-agent-hub/messages/jobs/2026-04-22T09-17-research.json
{
  "from": "research",
  "kind": "candidate-topic",
  "payload": {
    "title": "GitHub quietly added a 'coding-agent' review flag",
    "urls": ["https://github.com/features/..."],
    "why": "matches jobs-agent's target-keywords config"
  }
}
```

Jobs picks that up on its next wake (every two hours), decides, and writes a response JSON back to messages/research/.... No queue, no broker, no schema registry. A folder and a convention.
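The convention is small enough to sketch in full. A minimal version, assuming Node-compatible fs APIs (Bun provides them); sendMessage and drainInbox are hypothetical names. Two details are my additions, not the author's: each message is written to a .tmp file and then renamed into place, so a target that wakes mid-write never parses half a JSON file (rename is atomic within one filesystem), and handled messages move to a processed/ subfolder instead of being deleted, which keeps ls and cat useful for debugging.

```typescript
import { mkdirSync, readdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";

interface AgentMessage {
  from: string;
  kind: string;
  payload: unknown;
}

// Minute-precision stamp matching the example file name, e.g. 2026-04-22T09-17.
function stamp(d: Date): string {
  return d.toISOString().slice(0, 16).replace(/:/g, "-");
}

// Drop a message into <hub>/messages/<target>/<timestamp>-<source>.json.
export function sendMessage(hub: string, target: string, msg: AgentMessage, now = new Date()): string {
  const dir = join(hub, "messages", target);
  mkdirSync(dir, { recursive: true });
  const finalPath = join(dir, `${stamp(now)}-${msg.from}.json`);
  const tmpPath = `${finalPath}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(msg, null, 2));
  renameSync(tmpPath, finalPath); // readers only ever see complete files
  return finalPath;
}

// On wake: read every finished message, oldest first, and archive it.
export function drainInbox(hub: string, target: string): AgentMessage[] {
  const dir = join(hub, "messages", target);
  let names: string[];
  try {
    names = readdirSync(dir);
  } catch {
    return []; // no inbox yet
  }
  const archive = join(dir, "processed");
  mkdirSync(archive, { recursive: true });
  return names
    .filter((n) => n.endsWith(".json")) // skips in-flight .tmp files and the archive folder
    .sort() // ISO-style prefixes sort chronologically
    .map((n) => {
      const p = join(dir, n);
      const msg = JSON.parse(readFileSync(p, "utf8")) as AgentMessage;
      renameSync(p, join(archive, n));
      return msg;
    });
}
```

One caveat of minute-precision names: two messages from the same source to the same target within one minute get the same file name, and the second rename overwrites the first if it hasn't been drained yet, so a sturdier version would add seconds or a counter.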
Memory is SQLite plus local vectors. A custom MCP server runs FTS5 keyword search, bge-m3 semantic similarity, and a recency boost over 144 files and roughly 1,700 chunks. Every agent loads the same MCP server in its .mcp.json, so any of them can recall "what did we decide about X last month" without me re-explaining. Embeddings run locally via Ollama on an external SSD.
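The ranking step is where the three signals meet. A minimal sketch of one way to blend them, assuming the FTS5 side returns SQLite's bm25() score (lower is better, so it is negated here), the semantic side returns cosine similarity, and recency decays exponentially. The Candidate shape, the 0.4/0.5/0.1 weights, the 30-day half-life and the min-max normalisation are illustrative assumptions, not the server's actual tuning.

```typescript
// One chunk as it might come back from the two retrievers (hypothetical shape).
interface Candidate {
  id: string;
  bm25: number | null;   // raw FTS5 bm25(): lower is better; null if no keyword hit
  cosine: number | null; // semantic similarity; null if not in the vector top-k
  updatedAt: Date;
}

const WEIGHTS = { keyword: 0.4, semantic: 0.5, recency: 0.1 }; // illustrative
const HALF_LIFE_DAYS = 30; // illustrative

// Min-max normalise to [0, 1]; missing values score 0, a constant column scores 1.
function normalise(values: (number | null)[]): number[] {
  const present = values.filter((v): v is number => v !== null);
  const min = Math.min(...present);
  const max = Math.max(...present);
  return values.map((v) => (v === null ? 0 : max === min ? 1 : (v - min) / (max - min)));
}

export function rank(cands: Candidate[], now = new Date()): { id: string; score: number }[] {
  // Negate bm25 so that higher means better before normalising.
  const kw = normalise(cands.map((c) => (c.bm25 === null ? null : -c.bm25)));
  const sem = normalise(cands.map((c) => c.cosine));
  return cands
    .map((c, i) => {
      const ageDays = (now.getTime() - c.updatedAt.getTime()) / 86_400_000;
      // 1.0 for a chunk updated now, 0.5 after one half-life, 0.25 after two.
      const recency = Math.pow(0.5, Math.max(0, ageDays) / HALF_LIFE_DAYS);
      const score = WEIGHTS.keyword * kw[i] + WEIGHTS.semantic * sem[i] + WEIGHTS.recency * recency;
      return { id: c.id, score };
    })
    .sort((a, b) => b.score - a.score);
}
```

Keeping recency as a small additive boost rather than a multiplier means an old decision that matches strongly still surfaces; it only breaks near-ties in favour of newer notes.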
Telegram is the human interface. Five bots, one per agent. I can trigger any of them from any device. Jordan routes cross-agent asks so I don't have to remember who handles what.
The anti-pattern here is over-architecting. Every tutorial on multi-agent systems tells you to use a proper queue (Redis, NATS), a proper database (Postgres), a proper orchestrator (LangGraph, Temporal). For five agents and one user, I'm three orders of magnitude below the scale where any of those pay off. ls and cat are my debug tools, and that's a feature.
Supervisors for inactivity kills
My first agents died mid-task in the second week. macOS launchd kills Claude Code processes during idle periods, and the kill wiped the conversation buffer on the way out. I'd come back to a Telegram bot that had lost every decision from the last hour of work and couldn't tell me what it had been doing.
The fix was a supervisor process per agent that catches the restart signal and runs a flush protocol before the kill completes. Not a graceful shutdown; a forced save.
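The core of such a supervisor is small: trap the termination signal, race the flush against a deadline, then exit either way. A minimal sketch, assuming the stop arrives as SIGTERM (launchd sends SIGTERM first and SIGKILL only after the job's ExitTimeOut grace period); flushWithDeadline, installFlushOnSignal and the 15-second budget are hypothetical, and the flush callback would wrap the flush-helpers.ts call.

```typescript
// Race the flush against a deadline so a slow model call can't stall the restart.
export async function flushWithDeadline(
  flush: () => Promise<void>,
  deadlineMs: number,
): Promise<"flushed" | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), deadlineMs);
  });
  try {
    return await Promise.race([flush().then(() => "flushed" as const), timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive after the race is decided
  }
}

// Wire the flush to SIGTERM. `once` means a second SIGTERM falls back to the
// default behaviour (immediate exit), so an operator can still force a stop.
export function installFlushOnSignal(flush: () => Promise<void>, deadlineMs = 15_000): void {
  process.once("SIGTERM", () => {
    flushWithDeadline(flush, deadlineMs)
      .catch((err) => console.error("flush failed:", err))
      .finally(() => process.exit(0));
  });
}
```

The deadline should sit comfortably inside the grace period: a flush that is still running when SIGKILL lands is worth nothing.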
```bash
# runs before every agent restart
bun ~/.claude/scripts/flush-helpers.ts \
  --model claude-haiku-4-5 \
  --max-chars 40000 \
  --write-memory
launchctl kickstart -k gui/501/ai.jordan.claude-telegram
```

All five agents share the flush logic. One helper file, 223 lines, one API. The signature every supervisor calls:
```typescript
export function extractConversation(jsonlPath: string, maxChars = 80_000): string {
  // Parses the Claude Code session file, strips Telegram XML noise,
  // summarises tool calls into one-liners, trims to maxChars from the end.
  // The return value is fed to Sonnet for the flush summary.
  // (body omitted; signature only)
}
```

Centralising meant one bug fix covered all five. When I added WAL retries for concurrent SQLite writes, I added them once. When Haiku's context window changed, I changed one default. When I realised I needed to exclude tool-call blobs from the summary because they were eating the budget, same thing.
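The helper's body isn't published here, so what follows is a hedged sketch of the same idea, operating on the file's text rather than a path so it stays testable. It assumes each JSONL line holds a message whose content is either a string or an array of blocks typed text, tool_use or tool_result; that matches Claude Code session files as far as I know, but it is not a documented contract. The name extractConversationFromText, the crude tag-stripping regex and the "[tool] name" one-liner format are all hypothetical.

```typescript
interface SessionLine {
  message?: { role?: string; content?: string | Array<Record<string, unknown>> };
}

// Crude removal of XML-style wrapper tags (e.g. around relayed Telegram messages); keeps inner text.
function stripTags(s: string): string {
  return s.replace(/<\/?[a-zA-Z][\w-]*(?:\s[^>]*)?>/g, "").trim();
}

function parseLine(raw: string): SessionLine | null {
  try {
    return JSON.parse(raw) as SessionLine;
  } catch {
    return null; // a kill can leave a truncated last line
  }
}

export function extractConversationFromText(jsonl: string, maxChars = 80_000): string {
  const turns: string[] = [];
  const pushText = (role: string, raw: string) => {
    const text = stripTags(raw);
    if (text) turns.push(`${role}: ${text}`);
  };
  for (const raw of jsonl.split("\n")) {
    if (!raw.trim()) continue;
    const line = parseLine(raw);
    if (!line) continue;
    const role = line.message?.role ?? "unknown";
    const content = line.message?.content;
    if (typeof content === "string") {
      pushText(role, content);
      continue;
    }
    for (const block of content ?? []) {
      const text = block.text;
      if (block.type === "text" && typeof text === "string") {
        pushText(role, text);
      } else if (block.type === "tool_use") {
        turns.push(`[tool] ${String(block.name ?? "?")}`); // one-liner instead of the input blob
      }
      // tool_result blocks are skipped: they are the large blobs that eat the budget
    }
  }
  const joined = turns.join("\n");
  // Keep the most recent tail: the last hour of decisions matters more than the first.
  return joined.length <= maxChars ? joined : joined.slice(joined.length - maxChars);
}
```

Trimming from the end rather than the start is the key design choice: when the budget runs out, the oldest turns are the ones that get dropped.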
The supervisor only covers clean kills, though. Crashes still happen, and a crash gives it no chance to flush. The next layer was micro-flushing: a checkpoint every three hours that writes the running summary to memory whether or not a kill is coming. Worst-case loss on a crash is now three hours, not a day. In practice it runs on Haiku, which costs nothing extra on the flat Max subscription, and each checkpoint takes about two seconds.
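The micro-flush is just a save that doesn't wait for a signal. A minimal sketch with an injectable clock so the cadence is testable; Checkpointer and its save callback are hypothetical names, and in the setup described above the save step would be the Haiku summary written to memory.

```typescript
const THREE_HOURS_MS = 3 * 60 * 60 * 1000;

export class Checkpointer {
  private lastSaved: number;
  private readonly save: (summary: string) => void;
  private readonly intervalMs: number;

  constructor(save: (summary: string) => void, intervalMs = THREE_HOURS_MS, now = Date.now()) {
    this.save = save;
    this.intervalMs = intervalMs;
    this.lastSaved = now;
  }

  // Call on every turn (or from a setInterval). Saves the running summary once
  // intervalMs has passed since the last save, so as long as tick runs often,
  // a crash loses roughly one interval of context at most.
  tick(runningSummary: string, now = Date.now()): boolean {
    if (now - this.lastSaved < this.intervalMs) return false;
    this.save(runningSummary);
    this.lastSaved = now;
    return true;
  }
}
```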
Takeaway: the restart story is never "it'll probably be fine". Write the flush once, reuse it everywhere, and assume the next kill will be uncatchable.
The March cache bug
In March, Anthropic shipped a prompt-caching change that doubled or tripled per-request costs for anyone with a large cached system prompt. The community tracked it in GH #41930, and The Register ran a write-up a few weeks later.
My monthly bill didn't spike, because I'm on the Max subscription and the bill is flat by design. My context-window usage did spike, silently. For a week I was burning about 20% more context per turn than before. Conversations hit the compaction threshold sooner, which meant more auto-summary cycles and degraded output quality. I only caught it when two different agents gave me worse digest write-ups on the same weekend.
The problem wasn't the cache change. The problem was that I watched the bill (flat, by design) instead of per-turn token consumption (spiking, invisible).
A token-drain report now runs nightly on Research:
```text
2026-04-22 23:50 · finance · avg 48.2k tokens/turn (↑12% vs 7d median)
  · top offender: Merkl sync prompt (+6.1k)
  · action: moved Merkl block below tool defs
```

Two regressions caught since I added it. One was my own, a poorly-ordered prompt in the Fitness agent. One was Claude Code 1.2 shipping a longer default system prompt I hadn't noticed until the 12% spike fired the alert.
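The check behind a line like that is a median comparison. A minimal sketch, assuming per-turn token counts are already logged per agent per day (collection isn't covered here); drainAlert and the 10% threshold are hypothetical, and the test inputs are made up to reproduce the sample line's ↑12%.

```typescript
// Median of a non-empty list (mean of the middle pair for even lengths).
function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of empty list");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

export interface DrainAlert {
  agent: string;
  todayAvg: number;
  baseline: number;
  changePct: number;
}

// Compare today's average tokens/turn with the median of the last seven daily averages.
// The median keeps a single spiky day from dragging the baseline up and masking a regression.
export function drainAlert(
  agent: string,
  todayTurns: number[],
  last7DailyAvgs: number[],
  thresholdPct = 10, // hypothetical alert threshold
): DrainAlert | null {
  const todayAvg = todayTurns.reduce((a, b) => a + b, 0) / todayTurns.length;
  const baseline = median(last7DailyAvgs);
  const changePct = ((todayAvg - baseline) / baseline) * 100;
  return changePct >= thresholdPct ? { agent, todayAvg, baseline, changePct } : null;
}
```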
Takeaway: subscription pricing decouples your bill from your usage. If you run agents on Max, you need observability on context growth, not on cost.
Sonnet over Opus, for most agents
Three of my five agents don't need Opus. Moving Finance from Opus to Sonnet cut its per-day token consumption by a factor of five, and I couldn't tell the difference in the output for a full month.
This surprised me. I'd assumed "use the best model" was always right, because that's what the influencers say. It's wrong when the task is deterministic parsing and categorisation, which is what Finance does: read a bank statement, categorise the transactions, roll up by month, flag unusual spikes. Sonnet nails that at a fraction of the cost. Where it fell short was an edge case: a merchant name Sonnet hadn't seen before got categorised as "other" where Opus would have guessed correctly from context. I fixed that with a three-line override list rather than re-upgrading the whole agent.
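An override list like that works best in front of the model rather than instead of it: known merchants are decided deterministically, and only unknown ones reach Sonnet. A minimal sketch; the merchant patterns, categories, the categorise function and the model callback are made up for illustration.

```typescript
// Hypothetical override list: lowercase merchant substring -> category.
const OVERRIDES: Array<[pattern: string, category: string]> = [
  ["silpo", "groceries"],
  ["bolt", "transport"],
  ["hetzner", "infrastructure"],
];

// Stand-in for the model call that categorises everything else.
type ModelCategoriser = (description: string) => Promise<string>;

export async function categorise(
  description: string,
  model: ModelCategoriser,
): Promise<{ category: string; source: "override" | "model" }> {
  const key = description.toLowerCase();
  for (const [pattern, category] of OVERRIDES) {
    if (key.includes(pattern)) return { category, source: "override" };
  }
  return { category: await model(description), source: "model" };
}
```

Recording the source alongside the category makes it easy to see, month by month, how much the list is doing and whether it is quietly turning into a second categoriser.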
The agents I kept on Opus are the ones that need judgment. Jordan, because it routes ambiguous Telegram messages and handles personal ops where a wrong call is expensive. Jobs, because its fit-scoring compares a candidate JD against my profile, and the failure mode is applying to jobs that waste my time.
Research runs Haiku for digests, because digests are summarisation. Fitness runs Sonnet, because workout programming isn't creative writing.
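Written down as config, the split is small enough to audit at a glance. A sketch only: opus, sonnet and haiku are the short aliases Claude Code's --model flag accepts, but the AgentConfig shape, the needs field and auditTiers are hypothetical, not the author's session template.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

interface AgentConfig {
  model: Tier;
  needs: "judgment" | "recognition"; // does the agent reason, or recognise?
  why: string;
}

// The assignments described in this section.
const AGENTS: Record<string, AgentConfig> = {
  jordan: { model: "opus", needs: "judgment", why: "routes ambiguous messages; wrong calls are expensive" },
  jobs: { model: "opus", needs: "judgment", why: "fit-scores a JD against the profile" },
  finance: { model: "sonnet", needs: "recognition", why: "parse, categorise, roll up" },
  fitness: { model: "sonnet", needs: "recognition", why: "programming, not creative writing" },
  research: { model: "haiku", needs: "recognition", why: "digests are summarisation" },
};

// Flag agents whose tier doesn't match their need: Opus only where judgment is required.
export function auditTiers(agents: Record<string, AgentConfig>): string[] {
  return Object.entries(agents)
    .filter(([, c]) => (c.needs === "judgment") !== (c.model === "opus"))
    .map(([name]) => name);
}
```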
The anti-pattern is defaulting every agent to the top tier because it's in the session config template. Go through them and ask: does this agent need to reason, or does it need to recognise?
Start with the official courses
My first month on Claude Code was a YouTube demo loop. Watch a setup, copy it, hit a wall two hours in because the demo skipped the foundation, move on. I was collecting patterns without the system underneath, and I had no way to tell a good pattern from a cargo cult.
The fix was a weekend on Anthropic Academy. I worked through Claude 101 and Claude Code in Action back to back. Free, official, about ten hours of material between them.
Afterwards, every influencer video made sense in a different way. I could separate a real pattern from noise, and noise from the confidently wrong. I could see when a workflow was fighting the model instead of using it. The things I'd spent a month reverse-engineering were all in the first course, in the order you need to learn them.
Ten hours with the source beat a month of pattern-matching in the dark. If you're starting on Claude Code, do the courses first.
What's next
The current cliff is write access. Every agent can read the shared hub and the memory MCP, but writes stay scoped per-agent. Finance can see that Jobs received an interview request, but it can't update the Kanban column when the interview gets booked. Research can suggest topics to Jobs, but can't file them.
Giving agents write access to each other's data is the biggest unlock I have left and the biggest blast-radius risk. I'm working on scoped write permissions with revertable diffs, so a bad write rolls back in one command.
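One way to get "a bad write rolls back in one command" is to journal the previous value alongside every cross-agent write. A minimal in-memory sketch of that idea, assuming a simple key-value view of each agent's data; ScopedStore, the prefix-based grants and the journal format are hypothetical, since the real design is still in progress.

```typescript
interface JournalEntry {
  id: number;
  writer: string;
  key: string;
  before: string | undefined; // undefined: the key didn't exist before the write
  after: string;
}

export class ScopedStore {
  private data = new Map<string, string>();
  private journal: JournalEntry[] = [];
  private nextId = 1;
  private readonly grants: Map<string, string[]>;

  // grants: writer agent -> key prefixes it may write, e.g. finance -> ["jobs/kanban/"]
  constructor(grants: Record<string, string[]>) {
    this.grants = new Map(Object.entries(grants));
  }

  get(key: string): string | undefined {
    return this.data.get(key);
  }

  // Apply a write only if the writer's grant covers the key; return a journal id.
  write(writer: string, key: string, value: string): number {
    const allowed = this.grants.get(writer) ?? [];
    if (!allowed.some((prefix) => key.startsWith(prefix))) {
      throw new Error(`${writer} may not write ${key}`);
    }
    const entry: JournalEntry = { id: this.nextId++, writer, key, before: this.data.get(key), after: value };
    this.journal.push(entry);
    this.data.set(key, value);
    return entry.id;
  }

  // The one-command rollback: restore the "before" value, refusing if the key changed since.
  revert(id: number): void {
    const entry = this.journal.find((e) => e.id === id);
    if (!entry) throw new Error(`no journal entry ${id}`);
    if (this.data.get(entry.key) !== entry.after) {
      throw new Error(`${entry.key} changed after write ${id}; revert manually`);
    }
    if (entry.before === undefined) this.data.delete(entry.key);
    else this.data.set(entry.key, entry.before);
    this.journal = this.journal.filter((e) => e.id !== id);
  }
}
```

Refusing to revert over a later change is the conservative choice: it keeps a rollback from silently clobbering a write the owning agent made in the meantime.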
Everything sits on my Claude Max subscription, about $200 a month all in. Embeddings run locally. Docker handles image rendering. Beyond that subscription: zero SaaS, zero cloud.
Full project catalogue at /projects#ai-agents.
If you're running something similar, I'm @nikolayxyz on X.