From Inference Loops
to Long-Running Agents
Fundamentals, Workflow, and a Real Example
Shekhar UpadhayaShekhar Upadhaya
CTO · Skyhost · ex-Amazon
An LLM is a function. Text in, text out. Stateless. No memory between calls.
An agent is a while-true loop
that appends to an array.
# What an agent actually is
while True:
user_input = get_input()
response = llm.complete(user_input)
if response.wants_tool:
result = execute_tool(response.tool_call)
response = llm.complete(result)
print(response)The context window
is just an array.
- ◆Every API call sends the entire array.
- ◆Each turn appends.
- ◆The model is stateless.
The loop, in action.
# this iteration takes the tool branch
while True:
user_input = get_input()
response = llm.complete(user_input)
if response.wants_tool:
result = execute_tool(response.tool_call)
response = llm.complete(result)
print(response)The agent harness wraps the loop.
Everything that isn’t the LLM — what tools exist, what context loads, when to stop.
An agent is a while-true loop appending to an array. The harness controls what’s in it.
If the context window is just an array,
what goes in the array is everything.

The instruction ceiling is real.
Frontier thinking models reliably follow ~150–200 instructions. Beyond that, even rules at the top get ignored.
Smart zone, dumb zone.
The window isn’t uniform. The first ~40%is where the model thinks clearly. Past that, attention frays — tool choice gets sloppy, instructions get dropped, the goal drifts.
“The more context you use, the worse results you’ll get.”
The allocation problem.
Static fills eat your usable space before the conversation starts.
The context rot problem.
What’s in there is wrong. Errors propagate. Compaction is lossy.
Good context stays in the smart zone.
- Fresh session per task start clean, don't reuse a tired window
- Only what this task needs drop the MCPs and notes that aren't useful here
- Offload to disk save big stuff as files, keep short summaries in the window
- Send sub-agents for side quests let them explore, return one paragraph
- Leave room below the line finalizing work (tests, commits, lint) still has space
- Split big work across sessions when it won't fit one window, plan it, write the spec to disk, let multiple agents pick it up
Every session starts from zero. Context doesn’t engineer itself.
Allocation, rot, compaction, recovery. Someone has to handle them.
Your harness wraps theirs.
Anthropic ships the agent harness. You ship the layer around it.
Your harness is the layer you own.
Files, skills, loops, and rules the agent reads every session. That’s what makes long-running runs possible.
The setup.
CLAUDE.md.
- A map, not a brain dump. Points at docs, doesn't contain them.
- Lists standards, never-rules, skill names. One section each.
- Multi-level. Sub-CLAUDE.mds in each module — app, modules, components, server.
- Progressive disclosure. Root loads at session start. Sub-files load only when the agent enters that directory.
- Same context budget, more steering. Right rules show up at the right moment.
agent_docs/
- One concern per file. architecture, conventions, anti-patterns, workflow.
- Linked from CLAUDE.md, not loaded by it. “Before adding a feature, read adding-a-feature.md.”
- On-demand context. Agent reads docs only when relevant — nothing wasted up front.
- Specs split big work into phases. Dated, on disk, diffable. One phase per session.
- Surviveable. Specs outlive context windows. Reset and continue.
custom skills
Each skill wraps a recurring workflow into one verb. Loaded only when invoked. Stolen from Matt Pocock.
issue tracker (Beads)
- Tasks survive sessions. Not in the context window — on disk, in a graph.
- Dependency graph. Beads knows what's blocked, what's ready, what's done.
- bd ready — the next thing to work on. One command. Top-priority issue with no open blockers.
- Linked to specs. One spec breaks into many issues. Same vocabulary across plan and execution.
- Feeds the loop. Ralph asks Beads what's next, runs it, loops.
The Ralph loop.
while :; do
cat PROMPT.md | claude # Claude Code CLI
done- ▸The window is the budget. A plan becomes epics, epics become issues. Each session takes whatever fits inside the smart zone, could be one issue, could be five.
- ▸PROMPT.md is the instruction sheet. Tells the agent which spec to read, where to find the next task, and the rules of the run. Re-read every iteration. This is where the intelligence lives.
- ▸Reset every loop. Fresh window each pass. No compaction. State that matters lives on disk: Beads, specs, CLAUDE.md.
Where my hours actually go.
Most of my time is here, not in the run. Brainstorm, grill, spec, atomic issues.
Two ways to plan.
Write the spec after you understand it, not before.
Wants to produce an asset first. Writes the plan before alignment.
Wants shared understanding and alignment first. The asset comes after.
Grill before you plan.
·/grill-meThe agent asks now or assumes later. Assumptions become bugs.
The spec.
·/to-prdSynthesizes the grilling session into a PRD on disk. No new interview.
Decompose.
·/session-plannerSpec to issues, sized for the smart zone.
- ▸Sized for the smart zone. Each issue fits one fresh session.
- ▸Ralph script wired to this queue. Per-epic guard rails, ready to run.
The run.
The same session that planned the work now runs and watches it.

- ▸Launched from the session, not a terminal. Same session that did grilling, spec, decompose now runs
ralph.sh. - ▸Ralph runs headless. Each iteration spawns a fresh
claude -p, picks the next Beads issue, lints, commits, closes it, loops. Verbose output streams to a log. - ▸The session polls the log. Tails every minute, summarizes progress, flags failures. I’m on my phone.
Review.
Ralph closes the queue. I open the PR.

- ▸Open the PR. Ralph pushed and opened it. I read the diff.
- ▸Check the preview build. Web → Vercel preview. Mobile → Xcode Cloud build lands in TestFlight. Click through.
- ▸Loop back if needed. Anything off becomes a new issue. Ralph runs again. Otherwise merge.
When the task fits, the result lands at 95–99% of what I wanted.
Task selection is the work.
The harness compounds.
Each fix is permanent. The next session starts smarter than the last.
Be on the loop. Not in.
With ideas fromGeoffrey Huntley · Dex Horthy · Matt Pocock · Steve Yegge
Ryan Lopopolo · Lance Martin · Mario Zechner · Armin Ronacher
Thank you.
You’ve got the map.
The Deck is free, and it stays free. When you’re ready to build the way it describes, the Harness Starter Kithands you the founder’s actual harness — the custom skills, CLAUDE.md, the ralph-loop script and the agent_docs templates — the moment you grab it.
Follow along — new drops, plus the Screencast free when it ships.
The Deck is the map. The Harness Starter Kit is the toolkit — founders lock the Founding price.