An agent is a language model placed inside a loop with access to tools. Instead of producing one answer, it reasons, takes an action, observes the result, and repeats until done. This note covers the mechanics and when the extra machinery actually pays off.
The loop
```text
     ┌─────── repeat until done ────────┐
     ▼                                  │
┌──────────┐     ┌──────────┐     ┌──────────┐
│  THINK   │ ──→ │   ACT    │ ──→ │ OBSERVE  │
│ (reason) │     │  (call   │     │  (read   │
│          │     │  tool)   │     │  result) │
└──────────┘     └──────────┘     └──────────┘
```
Each turn, the model decides whether it has enough information to answer or whether it needs to call a tool. Tool results are appended to the conversation and fed back, so the model's context grows with everything it has learned.
How tool calls actually work
The model doesn't execute anything itself. You declare tools as structured schemas; the model emits a request to call one; your code runs it and returns the output.
```javascript
const tools = [{
  name: "get_weather",
  description: "Current weather for a city",
  input_schema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
}];
```
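On the runtime side, each schema needs a matching implementation. The following is a minimal Python sketch, assuming a simple in-process registry: `TOOL_REGISTRY`, `tool_schemas`, and `dispatch` are hypothetical names, and `get_weather` is a stub that returns canned text rather than calling a real weather service. The model only ever sees the schemas; the registry is how your code maps a requested tool name back to a function.

```python
from typing import Any, Callable


# Hypothetical stub: a real tool would call a weather API here.
def get_weather(city: str) -> str:
    return f"18°C, clear in {city}"


# Each entry pairs the schema the model sees with the function your code runs.
TOOL_REGISTRY: dict[str, tuple[dict[str, Any], Callable[..., Any]]] = {
    "get_weather": (
        {
            "name": "get_weather",
            "description": "Current weather for a city",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
        get_weather,
    ),
}


def tool_schemas() -> list[dict[str, Any]]:
    """The list passed to the model; it never sees the Python functions."""
    return [schema for schema, _ in TOOL_REGISTRY.values()]


def dispatch(name: str, args: dict[str, Any]) -> Any:
    """Run the named tool with the arguments the model supplied."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    _, fn = TOOL_REGISTRY[name]
    return fn(**args)


print(dispatch("get_weather", {"city": "Lisbon"}))  # 18°C, clear in Lisbon
```

Keeping schema and implementation in one entry makes it harder for the two to drift apart when a tool is renamed or removed.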
The exchange is a strict turn-taking protocol:
| Step | Who | Content |
|---|---|---|
| 1 | model | “I need weather for Lisbon” → `tool_use(get_weather, {city: "Lisbon"})` |
| 2 | your code | run the function, get “18°C, clear” |
| 3 | your code | send a `tool_result` back into the conversation |
| 4 | model | continue reasoning, or answer |
The model decides which tool to call and with what arguments; your runtime owns execution and the loop's stopping condition.
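Steps 2 and 3 of the table can be sketched as one function that turns a tool request into a result message. The block shapes here (`tool_use` with `id`/`name`/`input`, `tool_result` with `tool_use_id`/`content`/`is_error`) are modeled on Anthropic's Messages API, which the `input_schema` key above also suggests; other providers use different field names, so treat them as an assumption. `run_tool_use` is a hypothetical helper, and this sketch always sends results back as text.

```python
import json
from typing import Any, Callable


def run_tool_use(block: dict[str, Any],
                 tools: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Execute one tool_use block and wrap its output as a tool_result block."""
    try:
        output = tools[block["name"]](**block["input"])
        # This sketch returns text; structured outputs are serialized as JSON.
        content = output if isinstance(output, str) else json.dumps(output)
        return {"type": "tool_result", "tool_use_id": block["id"], "content": content}
    except Exception as exc:
        # Report failures (unknown tool, bad arguments, tool crash) to the model
        # instead of crashing the loop, so it can correct itself next turn.
        return {"type": "tool_result", "tool_use_id": block["id"],
                "content": f"error: {exc!r}", "is_error": True}


tools = {"get_weather": lambda city: "18°C, clear"}
request = {"type": "tool_use", "id": "call_1", "name": "get_weather",
           "input": {"city": "Lisbon"}}
print(run_tool_use(request, tools))
# {'type': 'tool_result', 'tool_use_id': 'call_1', 'content': '18°C, clear'}
```

The `tool_use_id` echo is what lets the model match each result to the request that produced it when several tools are called in one turn.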
A minimal loop
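```python
def run_agent(task):
    # `model`, `tools`, `dispatch`, and `tool_result` come from your runtime;
    # their exact shapes depend on the model provider.
    messages = [{"role": "user", "content": task}]
    while True:
        resp = model.run(messages, tools=tools)
        messages.append(resp.message)        # keep the model's turn in the history
        if not resp.tool_calls:              # no tool requested: final answer
            return resp.text
        for call in resp.tool_calls:
            result = dispatch(call.name, call.args)        # your code executes the tool
            messages.append(tool_result(call.id, result))  # feed the observation back
```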
Two things keep this safe in production: a max-iterations cap, so a confused model can't loop forever, and validation of tool arguments before execution (the model can and will hallucinate malformed inputs).
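Both safeguards fit into the minimal loop with a few extra lines. The sketch below is self-contained under stated assumptions: `ScriptedModel` is a hypothetical stand-in that replays canned responses instead of calling a real model, `validate_args` checks only a tiny subset of JSON Schema (required keys, unexpected keys, string types), and the message dicts are an illustrative shape rather than any provider's format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    args: dict


@dataclass
class Response:
    text: str = ""
    tool_calls: list = field(default_factory=list)


class ScriptedModel:
    """Hypothetical stand-in for a model client: replays canned responses in order."""
    def __init__(self, script: list):
        self.script = list(script)

    def run(self, messages: list) -> Response:
        return self.script.pop(0)


def validate_args(schema: dict, args: dict) -> Optional[str]:
    """Check a tiny subset of JSON Schema; return an error message or None."""
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            return f"missing required argument: {key}"
    for key, value in args.items():
        if key not in props:
            return f"unexpected argument: {key}"
        if props[key].get("type") == "string" and not isinstance(value, str):
            return f"argument {key} must be a string"
    return None


def run_agent(task: str, model: ScriptedModel, schemas: dict[str, dict],
              impls: dict[str, Callable[..., Any]], max_iterations: int = 10) -> str:
    messages: list[dict[str, Any]] = [{"role": "user", "content": task}]
    for _ in range(max_iterations):  # hard cap: a confused model cannot loop forever
        resp = model.run(messages)
        messages.append({"role": "assistant", "content": resp.text,
                         "tool_calls": resp.tool_calls})
        if not resp.tool_calls:  # no tool requested: final answer
            return resp.text
        for call in resp.tool_calls:
            if call.name not in schemas:
                error = f"unknown tool: {call.name}"
            else:
                error = validate_args(schemas[call.name], call.args)
            # Validate before executing; on failure, send the error back instead.
            result = f"error: {error}" if error else impls[call.name](**call.args)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    raise RuntimeError(f"no final answer after {max_iterations} iterations")


weather_schema = {"type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"]}
model = ScriptedModel([
    # First attempt uses a hallucinated argument name; validation rejects it.
    Response(tool_calls=[ToolCall("c1", "get_weather", {"town": "Lisbon"})]),
    Response(tool_calls=[ToolCall("c2", "get_weather", {"city": "Lisbon"})]),
    Response(text="It is 18°C and clear in Lisbon."),
])
answer = run_agent("Weather in Lisbon?", model,
                   schemas={"get_weather": weather_schema},
                   impls={"get_weather": lambda city: "18°C, clear"})
print(answer)  # It is 18°C and clear in Lisbon.
```

Returning the validation error as an observation, rather than raising, gives the model a chance to fix its own call. In production you would likely swap the hand-rolled check for a full JSON Schema validator such as the `jsonschema` package.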
When a loop beats a single prompt
A loop adds latency, cost, and failure surface. Use one only when the task genuinely needs it.
- Unknown number of steps. “Find the bug and fix it”: you can't know upfront how many files to read.
- The model needs ground truth it doesn't have: live data, code execution results, search results. A single prompt can only guess.
- Error recovery. A test fails; the agent reads the output and tries again. Single prompts can't observe their own mistakes.
Conversely, prefer a single prompt when the task is bounded and self-contained: summarize this text, classify this ticket, rewrite this paragraph. No external state means no reason to loop, and the loop only adds ways to go wrong.
Failure modes
- Looping without progress: same tool, same args, same result. Detect repetition and break.
- Context bloat: every observation accumulates, and long tasks overflow the context window. Summarize or prune old turns.
- Over-eager tool use: the model calls tools when it already knows the answer. Sharpen tool descriptions and system guidance.
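The first two failure modes can be guarded with a few small helpers. A sketch under simple assumptions: a call is fingerprinted by tool name, arguments, and result, and the loop counts as stuck when the last `repeats` fingerprints are identical; pruning keeps the original task plus the most recent messages and simply drops the middle rather than summarizing it. `call_key`, `is_stuck`, and `prune` are hypothetical helpers, not part of any library.

```python
import json
from typing import Any


def call_key(name: str, args: dict[str, Any], result: Any) -> str:
    """Fingerprint a tool call; sort_keys makes argument order irrelevant."""
    return json.dumps([name, args, result], sort_keys=True, default=str)


def is_stuck(history: list[str], repeats: int = 3) -> bool:
    """True when the last `repeats` calls had the same tool, args, and result."""
    return len(history) >= repeats and len(set(history[-repeats:])) == 1


def prune(messages: list[dict[str, Any]], keep_last: int = 6) -> list[dict[str, Any]]:
    """Keep the original task (first message) plus the `keep_last` most recent ones."""
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[len(messages) - keep_last:]


history: list[str] = []
for _ in range(3):
    history.append(call_key("read_file", {"path": "app.py"}, "def main(): ..."))
print(is_stuck(history))  # True
```

Dropping messages blindly can violate provider constraints (for example, a tool result whose matching tool request was pruned away), so a real implementation should cut on whole turns, or replace the dropped span with a model-written summary.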
Wrap up
- An agent is just an LLM in a think→act→observe loop with tools; your code owns execution and termination.
- Reach for a loop only when step count is unknown or the model needs real-world feedback.
- Always cap iterations, validate tool inputs, and guard against no-progress loops.