AI Agents (Autonomous Agents)
How it works
The agent receives a goal from the user and definitions of available tools (JSON/OpenAPI/MCP schemas) and system instructions (role, security policies). In each iteration of the loop: (1) the model analyzes the current context and decides on the next action - invoke a tool, ask a question, or terminate; (2) the host executes the chosen tool and returns the result; (3) the result is appended to the context as an observation; (4) the model decides whether to continue. The loop ends when the model determines the goal has been achieved, the max_steps limit is reached, or an error state requiring escalation to a human is detected. The agent can maintain short-term memory within the context window and long-term memory in an external storage (vector database, key-value store).
Problem solved
A single LLM invocation is unable to handle open-ended tasks, where the number of steps is not known in advance, requiring interaction with the environment, access to current data, code execution, or iterative result verification. The AI Agent solves this problem by embedding the model in a control loop with access to tools and memory, enabling autonomous end-to-end task execution.
Components
The agent's reasoning and decision engine. Generates plans, selects tools, interprets results, and decides termination. Typically an LLM post-trained with RLHF and tool-use, optionally a reasoning model (CoT with a dedicated reasoning token budget).
Definitions of callable functions with their schemas (JSON Schema, OpenAPI) and documentation. Anthropic calls this the Agent-Computer Interface (ACI) — care in designing it is critical for agent reliability. Often exposed through Model Context Protocol.
Official
Short-term memory (conversation history, tool results in context) and optional long-term memory (vector store, key-value store, episodic structures) across sessions. Determines coherence and personalization in long-running tasks.
Official
Mechanism executing iterations: fetch context → call model → parse decision → execute tool → update context → check termination. Manages limits (max_steps, time/token budgets) and detects infinite loops.
Official
Constant instruction defining the agent's identity, goal, scope of responsibility, safety rules, response format, and termination criteria. First line of defense against misbehavior and prompt injection.
Filters and validators operating before inference (input sanitization), during (tool call schema validation), and after (output control, PII redaction, blocking irreversible actions). Critical for production safety.
Official
Step logging, traces (LangSmith, Arize, Helicone), metrics (success rate, tool error rate, average steps, cost per task), and automated evaluations against test sets. Essential for production maintenance of an agent.
Official
Implementation
Unclear tool names, missing examples, ambiguous parameters — the same problems that affect junior developers affect the model. Anthropic reports spending more time optimizing tools than the agent prompt itself.
The agent may claim to have performed an action it didn't actually execute, or invoke tools with fabricated parameters — particularly dangerous in multi-step pipelines where errors propagate.
Without a hard max_steps and repetition detection, the agent can loop indefinitely, generating wrong steps based on previously wrong observations. Costs grow linearly with the number of steps.
Malicious instructions embedded in web pages, documents, or emails the agent reads can hijack its behavior by impersonating system instructions.
An agent with access to write-capable tools (delete, send_email, db_write, payment) can cause real damage based on flawed reasoning. Consequences may be irreversible.
Accumulated action history and tool results can exceed the model's context window, causing silent truncation of earlier steps and loss of relevant information.
Anthropic strongly recommends: don't build an agent when the task has a known, predefined structure. A workflow is cheaper, faster, more predictable, and easier to debug than an agent.
Evolution
Russell and Norvig formalize rational agents; Belief-Desire-Intention architectures emerge (Rao and Georgeff). The canon is defined: an agent perceives the environment and takes goal-oriented actions.
Yao et al. (2022) demonstrate that LLMs can interleave Chain-of-Thought with tool calls in a single loop. The practical definition of an LLM-based AI Agent.
Virally popular implementations show autonomous GPT-4 agents performing multi-step tasks. Despite limited reliability, they show the potential and popularize the term.
OpenAI (June 2023) introduces function calling in GPT-4; Anthropic and Google follow. First-class agent support at the level of commercial model APIs.
Anthropic publishes (December 2024) guidelines distinguishing agents from workflows and five composition patterns. The canonical definition of an agent: a system in which an LLM dynamically directs its own process.
Anthropic introduces Computer Use in Claude (October 2024) — the agent clicks, types, and moves the mouse like a human. OpenAI's Operator (2025) follows. Opens a class of GUI-driven agents independent of APIs.
Anthropic releases MCP as an open standard for connecting LLMs to external tool servers. Enables an ecosystem of tools portable across model providers.
Sierra (March 2026) announces the Agents-as-a-Service paradigm — customers buy outcomes delivered by an agent rather than SaaS access. Agents become the unit of product delivery, not just a technical library.
Technical details
Hyperparameters (configurable axes)
Scope of decisions the agent makes without human approval — from proposal-only mode to full autonomy with rollback.
List of callable functions available to the agent. Defines the action space and is the strongest predictor of agent behavior.
Hard limit on loop iterations before forced termination. Safeguards against cost overrun and infinite loops.
How context is managed across steps and sessions: context window only, summarization, vector store, episodic structures.
Maximum computational cost or token count per agent run. Critical for production deployments with outcome-based billing.
When the agent escalates to a human: never, on demand, after N failed steps, before irreversible action, based on uncertainty signals.
Execution paradigm
An agent is distinct from a workflow, where the path is predefined in the code and the LLM only executes specific steps, whereas in an agent, the LLM controls the entire process.
At every step, the model decides which tool to invoke, whether to ask a clarification question, or to terminate — based on current context and observations. The execution path is not predefined in code.
Parallelism
Parallelism is most often achieved inter-sessionally (multiple agents for different tasks) or in orchestrator-workers patterns (one orchestrator delegates to multiple worker agents simultaneously).
Hardware requirements
Base LLM inference dominates the agent's cost and latency; GPUs with tensor cores are the standard for all modern production-grade models.
Google deploys Gemini-based agents on TPUs; comparable throughput and cost to GPUs for most workloads.
The control loop, tool parsing, and orchestration layer itself is lightweight and runs on CPU; hardware requirements stem from the base model, not from the agent's construction.