Robots Atlas

ReAct (Reasoning + Acting)

Interleaving Chain-of-Thought reasoning with action execution (tool calls) in a single LLM-generated stream — allowing the model to plan steps, gather information from the external world, and revise its plan based on observations.

Category
Abstraction level
Operation level
  • Web-search agents (Perplexity, You.com, ChatGPT search)
  • Multi-hop question answering with knowledge bases (HotpotQA, FEVER)
  • Code assistants that execute scripts and read outputs (Cursor, GitHub Copilot Workspace)
  • Browser-using agents (Browser Use, Anthropic Computer Use)
  • Autonomous research agents (The AI Scientist, AutoGPT)
  • Assistants performing many operations across connected apps (LangChain Agent, OpenAI Assistants)
  • Robotic agents planning sequences of manipulations (PaLM-E, RT-2 with language grounding)
  • Workflow automation — agents invoking CRM, ERP, Slack APIs (Zapier, n8n agents)

1. Prompt construction: a system instruction describing the available tools (with name, description, and argument schema) + a few demonstrations in ReAct format (Thought/Action/Observation/Finish) + the user query.
2. Agent loop:
   a. The model generates a "Thought:" token + reasoning over everything seen so far (query + action/observation history).
   b. The model generates "Action:" + the tool name and arguments (typically JSON or a function call).
   c. Generation is stopped at a separator (e.g. "Observation:"). The orchestrator parses the action, executes the tool, and pastes the result as "Observation: <result>".
   d. The loop returns to (a) — the model sees the new observation and decides on the next step.
3. Stop condition: the model emits "Action: Finish[<answer>]" or exceeds a step limit (typically 5–15).
4. Validation: the orchestrator may parse the final answer, check format, and optionally force re-iteration.
5. Function-calling variant: in modern APIs (OpenAI, Anthropic), Action is a structured token (function call) rather than text — the API returns an object that does not require text parsing.
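The loop above can be sketched end-to-end in a few lines. Everything here is illustrative: `scripted_model` is a stand-in for a real LLM call (which would be invoked with a stop sequence at "Observation:"), and both tools are stubs.

```python
import re

# Toy tools standing in for real search/lookup backends (hypothetical).
TOOLS = {
    "search": lambda q: f"Stub article about {q}.",
    "lookup": lambda k: f"Stub sentence mentioning {k}.",
}

def scripted_model(history):
    """Stand-in for an LLM completion call returning Thought + Action text."""
    if "Observation:" not in history:
        return "Thought: I should search first.\nAction: search[Apple Remote]"
    return "Thought: I have enough information.\nAction: Finish[Front Row]"

def react_loop(query, model, max_steps=10):
    history = f"Question: {query}\n"
    for _ in range(max_steps):
        step = model(history)                       # 2a + 2b: Thought + Action
        history += step + "\n"
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match is None:                           # 4: validation
            history += "Observation: could not parse action.\n"
            continue
        name, arg = match.groups()
        if name == "Finish":                        # 3: stop condition
            return arg
        result = TOOLS[name](arg)                   # 2c: execute the tool
        history += f"Observation: {result}\n"       # 2d: feed result back
    return None                                     # step limit exceeded

answer = react_loop("What device does the Apple Remote control?", scripted_model)
```

The scripted model finishes on its second step, so `answer` is the text inside `Finish[...]`.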

Pure Chain-of-Thought hallucinates facts not in the model's weights — particularly on tasks requiring current knowledge or multi-hop reasoning. Pure tool-using LLMs without explicit reasoning select actions impulsively, without a plan, and cannot recover from unexpected observations. ReAct addresses both problems simultaneously: reasoning provides the plan and context, actions verify facts against the world, and observations update the plan.

01

Thought (reasoning step)

Planning and agent-loop control

A natural-language token preceding an action; used for planning, problem decomposition, evaluating observations, and deciding on the next step. Mechanically it is a Chain-of-Thought reasoning fragment generated within the agent loop.

02

Action (tool invocation)

Interaction with the external world

Modular

A token containing the tool name and its arguments. In the original ReAct, the format is textual: 'Action: search[Apple Remote]'. In modern implementations, the format is structured (function-call JSON). The special 'Finish[<answer>]' action terminates the loop.

  • Text-based ReAct format
  • JSON function call
  • MCP (Model Context Protocol)
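The two main encodings can be compared directly. The JSON shape below is a generic function-call object, not any specific provider's exact schema:

```python
import json
import re

# Textual format (original ReAct): the action is free text, parsed by regex.
textual = "Action: search[Apple Remote]"
name, arg = re.match(r"Action:\s*(\w+)\[(.*)\]", textual).groups()

# Structured format: the same action as a function-call object, roughly
# what modern APIs return instead of free text (field names illustrative).
structured = json.loads(
    '{"name": "search", "arguments": {"query": "Apple Remote"}}'
)

# Both encodings carry the same tool name and argument.
same_tool = (name == structured["name"])
same_arg = (arg == structured["arguments"]["query"])
```

The structured form needs no regex and cannot produce a bracket-mismatch parse error, which is why function calling displaced the textual format in production.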
03

Observation (tool result)

Updating the reasoning state with facts from the world

The result of executing the action, injected back into the model's context. It may be a Wikipedia page snippet, a SQL query result, an API JSON, file contents, etc. The format depends on the tool.

04

Loop orchestrator (agent executor)

Driving the Thought→Action→Observation loop

Modular

A component external to the LLM that: (1) parses the generated Action, (2) invokes the actual tool, (3) injects the Observation into the context, (4) detects Finish or the step limit. Implemented by frameworks: LangChain AgentExecutor, LlamaIndex ReActAgent, OpenAI Assistants Runner.

05

Available tool set

Defines what the agent can do in the world

Modular

A pre-defined action space: a list of tools with descriptions and argument schemas. In the original ReAct for HotpotQA: search[entity], lookup[keyword], finish[answer]. In modern agents: dozens/hundreds of tools via function calling.
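A tool set like this can be sketched as a registry pairing each callable with the JSON-Schema-style description that goes into the system prompt. Names and fields here are illustrative, not any specific framework's API:

```python
# Hypothetical tool registry for the original HotpotQA-style action space.
TOOL_REGISTRY = {
    "search": {
        "description": "Search Wikipedia for an entity; return the first paragraph.",
        "parameters": {"type": "object",
                       "properties": {"entity": {"type": "string"}},
                       "required": ["entity"]},
        "fn": lambda entity: f"Stub result for {entity}",
    },
    "lookup": {
        "description": "Find the next sentence containing a keyword on the page.",
        "parameters": {"type": "object",
                       "properties": {"keyword": {"type": "string"}},
                       "required": ["keyword"]},
        "fn": lambda keyword: f"Stub sentence with {keyword}",
    },
}

def render_tool_prompt(registry):
    """Build the tool section of the system prompt from the registry."""
    lines = [f"- {name}: {spec['description']}" for name, spec in registry.items()]
    return "Available tools:\n" + "\n".join(lines)

prompt = render_tool_prompt(TOOL_REGISTRY)
```

With function calling, the `parameters` schemas would be passed to the API directly instead of being rendered into prose.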

Time complexity

N = number of loop steps, L_ctx = context length (growing each step by Thought+Action+Observation), d = model dimension, T_tool = tool execution time. Each step requires one LLM forward pass over the growing context + one tool invocation.

Cumulative cost due to growing context: across N steps, the total number of tokens processed is O(N²). KV-cache reduces this to linear.
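The quadratic total can be checked with a short arithmetic sketch; the step and prompt sizes below are made-up round numbers:

```python
def total_prefill_tokens(n_steps, l_prompt, l_step):
    """Tokens processed across all prefills without KV-cache reuse.

    Step i re-reads the prompt plus the i earlier steps, so the total
    is an arithmetic series — quadratic in the number of steps.
    """
    return sum(l_prompt + i * l_step for i in range(n_steps))

# Example: 20-step trajectory, 500-token prompt, 300 tokens per step.
naive = total_prefill_tokens(20, 500, 300)   # every prefix re-processed
cached = 500 + 20 * 300                      # with KV-cache: each token once
```

Here the naive total is roughly ten times the cached one, and the gap widens linearly with trajectory length.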

Memory complexity

N = number of steps, L_step = average length of a single Thought/Action/Observation step, N_tools = number of tools in the prompt, L_tool_def = length of a tool definition.

The full trajectory must fit in the context window. For 30+-step agents, long-context models (≥128k tokens) or history-summarization mechanisms are required.

Bottleneck: Sequential forward passes over a growing context

Each loop step requires a full LLM prefill over the entire history so far + autoregressive generation of a new Thought and Action. Without KV-cache, the cost is quadratic in the number of steps.

Parallelism

Sequential

Multiple independent ReAct trajectories (e.g. for different queries or in Tree-of-Thoughts search) can be run in parallel as a batch. A single trajectory remains sequential.

Paradigm

Dense

Stage dependent

The LLM itself runs in dense mode (every forward pass activates all parameters), but the full ReAct application is conditional/stage-dependent: each loop stage has a different context state and a different generation goal.

Maximum loop steps

Critical
  • 5–10: Standard range for most QA and tool-use tasks.
  • 30–50: Complex research tasks (AutoGPT, The AI Scientist).

Limit on Thought→Action→Observation iterations to prevent infinite loops. When exceeded, the agent is forced to answer or terminates with an error.

Number of available tools

Standard
  • 3: Original ReAct (HotpotQA): search/lookup/finish.
  • 10–50: Typical production agent.
  • 1000+: Tool retrieval — dynamically surfacing relevant tools (Toolshed, Gorilla).

Size of the action space. More tools increase flexibility but strain the context window and make correct action selection harder.

Action format

Standard
  • text 'Action: tool[args]': Original format by Yao et al.
  • JSON function call: Standard OpenAI / Anthropic API.

How tool calls are encoded: textual (original ReAct) vs structured (JSON function call) vs MCP.

Number of ReAct demonstrations in the prompt

Standard
  • 6: Standard from the Yao et al. paper.
  • 0: Native function calling in modern instruction-tuned models.

Few-shot demonstrations of full Thought/Action/Observation/Finish trajectories. Critical for smaller models; for GPT-4/Claude, 0–2 are typically sufficient.

Common pitfalls

Infinite loops and inability to terminate
CRITICAL

Without a hard step limit, the agent can call tools indefinitely (e.g. repeat the same action expecting a different result). This is one of the most common ReAct failure modes.

Always set max_steps (typically 10–30). Add detection of repeated actions and force a Finish.
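A minimal guard combining both mitigations (step budget plus repeated-action detection) might look like this; the thresholds are illustrative:

```python
def should_force_finish(action_history, max_steps=10, repeat_window=3):
    """Terminate when the step budget is spent or the agent is stuck
    repeating the same action — a common ReAct failure mode."""
    if len(action_history) >= max_steps:
        return True
    tail = action_history[-repeat_window:]
    return len(tail) == repeat_window and len(set(tail)) == 1

# Three identical actions in a row: force a Finish.
stuck = should_force_finish(["search[x]", "search[x]", "search[x]"])
# Varied actions under budget: keep looping.
healthy = should_force_finish(["search[x]", "lookup[y]"])
```

When the guard fires, the orchestrator would typically inject an instruction such as "You must now answer with Finish[...]" rather than silently stopping.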

Hallucinated tool observations
HIGH

The model may generate part of an Observation as its own continuation instead of waiting for the real tool result. The result is decisions based on false "facts."

Use stop tokens (e.g. halt generation at 'Observation:'). With function calling, the issue largely does not arise because generation ends at the structured Action.
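Production APIs apply stop sequences server-side; a client-side equivalent of the same safeguard, useful when a backend does not support stops, can be sketched as:

```python
def truncate_at_stop(generated, stop=("Observation:",)):
    """Cut model output at the first stop marker so any hallucinated
    observation text the model kept generating is discarded."""
    cut = len(generated)
    for marker in stop:
        idx = generated.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].rstrip()

# The model ran past its action and invented a fake tool result:
raw = ("Thought: I should search.\n"
       "Action: search[Apple Remote]\n"
       "Observation: The Apple Remote controls the iPod.")  # hallucinated
clean = truncate_at_stop(raw)
```

Only the Thought and Action survive; the real Observation is then appended by the orchestrator after actually running the tool.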

Parse errors in textual action format
HIGH

In the original textual ReAct, the model can produce a malformed Action (missing brackets, unknown tool, invalid arguments). Each such error breaks the loop or requires a retry.

Prefer structured function calling when available. If using a textual format, add validation with a readable error returned as Observation so the model can self-correct.
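One way to turn parse failures into self-correction material, sketched here against the original three-tool setup:

```python
import re

KNOWN_TOOLS = {"search", "lookup", "finish"}

def parse_or_explain(action_text):
    """Return (tool, arg) on success, or (None, error_observation) so the
    model can self-correct on the next step instead of breaking the loop."""
    match = re.fullmatch(r"Action:\s*(\w+)\[(.*)\]", action_text.strip())
    if match is None:
        return None, ("Observation: malformed action. "
                      "Use the form Action: tool[argument].")
    tool, arg = match.groups()
    if tool.lower() not in KNOWN_TOOLS:
        return None, (f"Observation: unknown tool '{tool}'. "
                      f"Available: {', '.join(sorted(KNOWN_TOOLS))}.")
    return tool, arg

tool, arg = parse_or_explain("Action: search[Apple Remote]")
bad_tool, err = parse_or_explain("Action: fetch[example.com]")
```

The error text is injected as the next Observation, giving the model a concrete hint about what to emit instead.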

Context explosion in long trajectories
HIGH

Each Observation (especially search results, web pages) adds hundreds/thousands of tokens. After a few iterations the context can exceed the model's window or cause cost explosion.

Apply observation compression (summarization, salient-fragment extraction). Use sliding-window memory or external vector memory for older steps.
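A crude salience-plus-truncation filter illustrates the idea; real systems often use an LLM summarizer or embedding-based ranking instead:

```python
def compress_observation(text, max_chars=500, query_terms=()):
    """Keep sentences mentioning the query terms, then hard-truncate.
    A naive stand-in for summarization-based observation compression."""
    sentences = text.split(". ")
    if query_terms:
        salient = [s for s in sentences
                   if any(t.lower() in s.lower() for t in query_terms)]
        if salient:
            text = ". ".join(salient)
    return text[:max_chars]

# A long stub "web page" of which only the first sentence is relevant.
page = ("The Apple Remote is a remote control. It was introduced in 2005. "
        "Unrelated trivia follows. " + "filler " * 300)
obs = compress_observation(page, query_terms=["Apple Remote"])
```

The compressed observation stays within budget while preserving the fact the agent actually needs.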

Selecting an unsuitable tool (tool misuse)
MEDIUM

With a large tool set (≥30), the model often picks a suboptimal tool, confused by similar names or ambiguous descriptions.

Write precise, non-overlapping tool descriptions with usage examples. For very large sets, apply tool retrieval (e.g. Toolshed, Gorilla).

No recovery strategy after tool failures
MEDIUM

When a tool returns an error (timeout, 500, unexpected format), the agent may loop on repeating the same action or give up.

Return errors as a readable Observation with a suggested alternative. Implement retry with backoff and fallback to a different tool.
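A sketch of retry-with-backoff plus fallback to an alternative tool (delays shortened for illustration; tool names hypothetical):

```python
import time

def call_with_retry(primary, fallback, arg, retries=3, base_delay=0.01):
    """Retry the primary tool with exponential backoff; on exhaustion,
    fall back to the alternative tool instead of looping forever."""
    for attempt in range(retries):
        try:
            return primary(arg)
        except Exception:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return fallback(arg)

calls = {"n": 0}

def flaky_search(q):
    """Stub tool that always times out, to exercise the fallback path."""
    calls["n"] += 1
    raise TimeoutError("search backend timed out")

result = call_with_retry(flaky_search, lambda q: f"cache hit for {q}",
                         "Apple Remote")
```

Whatever the outcome, the orchestrator should still surface the final result (or a readable failure summary) to the model as an Observation.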

GENESIS · Source paper

ReAct: Synergizing Reasoning and Acting in Language Models
2022 · ICLR 2023 · Shunyu Yao, Jeffrey Zhao, Dian Yu et al.
2022

Chain-of-Thought Prompting (Wei et al.)

Wei et al. show that explicit reasoning steps in the prompt improve LLM ability to solve complex tasks. ReAct builds on CoT by adding actions and observations.

2022

WebGPT and Toolformer — tools without explicit reasoning

WebGPT (OpenAI 2022) and Toolformer (Schick et al. 2023) show that LLMs can invoke tools, but without interleaved reasoning. ReAct addresses the missing planning layer.

2022

ReAct paper published (Yao et al.)

breakthrough

Yao et al. (Princeton + Google Brain) introduce the Thought/Action/Observation pattern, demonstrating improvement over CoT and tool-only baselines on HotpotQA, FEVER, ALFWorld, and WebShop.

2023

LangChain implements ReAct as AgentExecutor

LangChain (launched October 2022) popularizes ReAct as the standard agent pattern. The pattern becomes the de-facto standard for agentic applications in 2023.

2023

Reflexion — episodic memory and self-critique (Shinn et al.)

Reflexion extends ReAct with an outer loop: after a failed episode, the agent generates a self-critique stored in memory that conditions the next attempt. Improves HumanEval from 80% (GPT-4 + ReAct) to 91%.

2023

OpenAI Function Calling — actions internalized in the API

breakthrough

OpenAI introduces function calling in June 2023, turning Action from text into a structured JSON object. Removes the need for text parsing of Action — Anthropic and Google follow suit.

2024

Model Context Protocol (Anthropic) — tool standardization

Anthropic publishes MCP in November 2024 — an open standard for LLM↔tool communication, generalizing the Action layer of ReAct across the entire provider ecosystem.

2024

Native reasoning models — reasoning internalized (OpenAI o1)

breakthrough

Reasoning models (o1, o3, DeepSeek-R1) generate extended internal reasoning trained via RL. ReAct evolves: the Thought layer is absorbed into the model, while the external orchestrator focuses on actions and memory.

GPU Tensor Cores
PRIMARY

ReAct is a pattern applied to a standard LLM that runs most efficiently on GPUs with tensor cores. Hardware requirements come solely from the base model.

Hardware agnostic
GOOD

The ReAct pattern itself has no specific hardware requirements — it can be realized by any LLM, locally (Ollama, vLLM) or via API.

BUILT ON

LLM

A Large Language Model (LLM) is a class of machine learning models based on the Transformer architecture, trained on large text datasets via autoregressive language modeling (next-token prediction). These models have billions of parameters and can generate coherent text, answer questions, write code, translate languages, and perform many other language-cognitive tasks without task-specific fine-tuning. The term covers models such as GPT, LLaMA, Gemini, Claude, and Mistral. Most modern LLMs are instruction-tuned (SFT + RLHF) after the pre-training phase.

GO TO CONCEPT
CoT

Chain-of-Thought (CoT) Reasoning is a prompting technique introduced by Wei et al. (2022) in which a large language model is induced to generate a series of intermediate natural-language reasoning steps as part of its output, prior to producing a final answer. The technique was shown to significantly improve LLM performance on arithmetic, commonsense, and symbolic reasoning benchmarks where standard few-shot prompting yields flat or poor results. In the original formulation (few-shot CoT), a small number of exemplar question-answer pairs are included in the prompt, where each answer consists of a chain of thought followed by the final answer. The model learns from these demonstrations to produce its own reasoning chains. A subsequent zero-shot variant (Kojima et al., 2022) showed that appending the phrase 'Let's think step by step' to a question is sufficient to elicit reasoning chains from large models without any exemplars. CoT is an emergent property: empirical results in the originating paper show that reasoning ability via CoT prompting appears only in models above a certain parameter threshold (approximately 100B parameters for the models tested in 2022), with smaller models not benefiting or performing worse. This relationship has been revisited by subsequent work as smaller models have been fine-tuned on CoT data. Key extensions include Self-Consistency CoT (Wang et al., 2022), which samples multiple reasoning paths and selects the most frequent final answer; Tree of Thoughts (Yao et al., 2023), which frames reasoning as search over a tree of intermediate thoughts; and native reasoning models such as OpenAI o1 (2024) and DeepSeek-R1 (2025), which internalize extended reasoning through reinforcement learning on process reward signals rather than relying on prompting.

GO TO CONCEPT
ICL

In-Context Learning (ICL) is the ability of large language models to perform a new task from a handful of examples (called demonstrations or shots) given directly in the prompt, without modifying model weights. The concept was formalized by Brown et al. (2020) in the GPT-3 paper "Language Models are Few-Shot Learners" as an emergent capability of models at ≥175B-parameter scale. In ICL, the prompt contains k (input, output) pairs demonstrating the task, followed by a new query input. Conditioned on these examples, the model produces output following the demonstration pattern. The number of examples k defines variants: zero-shot (k=0, natural-language task description only), one-shot (k=1), and few-shot (k=2–32, typically 4–8). Brown et al. showed that GPT-3 175B achieves competitive performance against fine-tuned models on many NLP tasks — using few-shot prompting alone. The underlying mechanism of ICL remains an active research topic. Main hypotheses: (1) ICL implements implicit gradient descent in attention activation space (Akyürek et al. 2022, von Oswald et al. 2023); (2) models perform pattern matching over distributions of patterns seen during pretraining (Xie et al. 2022 — Bayesian inference framework); (3) ICL relies on induction heads — attention structures forming during pretraining (Olsson et al. 2022, Anthropic). Empirically, demonstration quality, ordering, and even labels significantly affect performance (Min et al. 2022). ICL is the foundation of a broader family of prompt-engineering techniques: Chain-of-Thought (Wei et al. 2022) extends ICL with reasoning chains in demonstrations, instruction tuning (FLAN, T0) strengthens zero-shot ICL, and Retrieval-Augmented Generation dynamically selects demonstrations from a knowledge base. ICL became the dominant paradigm for using LLMs from 2022–2024, before being supplemented by instruction-tuned models requiring fewer or no examples.

GO TO CONCEPT

Connects

CoT

Chain-of-Thought (CoT) Reasoning is a prompting technique introduced by Wei et al. (2022) in which a large language model is induced to generate a series of intermediate natural-language reasoning steps as part of its output, prior to producing a final answer. The technique was shown to significantly improve LLM performance on arithmetic, commonsense, and symbolic reasoning benchmarks where standard few-shot prompting yields flat or poor results. In the original formulation (few-shot CoT), a small number of exemplar question-answer pairs are included in the prompt, where each answer consists of a chain of thought followed by the final answer. The model learns from these demonstrations to produce its own reasoning chains. A subsequent zero-shot variant (Kojima et al., 2022) showed that appending the phrase 'Let's think step by step' to a question is sufficient to elicit reasoning chains from large models without any exemplars. CoT is an emergent property: empirical results in the originating paper show that reasoning ability via CoT prompting appears only in models above a certain parameter threshold (approximately 100B parameters for the models tested in 2022), with smaller models not benefiting or performing worse. This relationship has been revisited by subsequent work as smaller models have been fine-tuned on CoT data. Key extensions include Self-Consistency CoT (Wang et al., 2022), which samples multiple reasoning paths and selects the most frequent final answer; Tree of Thoughts (Yao et al., 2023), which frames reasoning as search over a tree of intermediate thoughts; and native reasoning models such as OpenAI o1 (2024) and DeepSeek-R1 (2025), which internalize extended reasoning through reinforcement learning on process reward signals rather than relying on prompting.

GO TO CONCEPT
Tool-augmented LLM

Tool-augmented LLM is an architectural pattern in which a large language model is equipped with access to one or more external tools that it can invoke during inference by generating structured function-call or API-call outputs. The model learns when and how to call tools by producing special tokens or structured output (e.g., JSON function calls) that are intercepted by a host runtime, executed against the tool, and whose results are returned to the model as new context for continued generation. The canonical formalization appeared in the Toolformer paper (Schick et al., Meta AI, 2023), which demonstrated that LLMs can learn to self-supervise their own tool-use through API call annotation without requiring large labeled datasets. Toolformer showed that models trained this way can decide which tools to call, when, and with which arguments, and that tool use substantially improves performance on tasks requiring fresh information, arithmetic, multilingual lookup, and question answering. The pattern encompasses several mechanisms: (1) in-context tool specification, where tool interfaces are described in the system prompt or context (JSON Schema, OpenAPI, natural language); (2) function calling APIs, where the model produces structured output matched to a defined schema and the host dispatches the call; (3) ReAct-style interleaving, where the model alternates reasoning traces with tool-use observations; and (4) parallel tool calling, where the model emits multiple tool calls simultaneously to be executed concurrently. Key implementations include OpenAI function calling (GPT-4, June 2023), Anthropic tool use (Claude, 2023), Google Gemini function calling, and the Model Context Protocol (MCP, 2024) which standardizes tool server connectivity.

GO TO CONCEPT

Commonly used with

AI Agents (Autonomous Agents)

An AI Agent (autonomous agent) is a single, autonomous system based on an AI model — most often an LLM — that dynamically directs its own process and tool usage to accomplish a given goal. In Anthropic's definition (December 2024), an agent is a system in which an LLM independently controls its actions, in contrast to a workflow, where LLMs and tools are orchestrated through predefined code paths. An AI Agent is the concrete executable artifact of the Agentic AI paradigm — analogous to how a microservice is an instance of the microservice paradigm. A single agent has a clearly defined goal, access to a set of tools (web search, code execution, file operations, APIs, MCP), memory (in-context and optionally external), a control loop (perceive → reason → act → observe), and termination conditions (goal achievement, max_steps, escalation). The agent starts work from a command or interactive discussion with a human; once the task is clarified, it operates independently, optionally returning for further information or approval. During execution it obtains "ground truth" from the environment after each step (tool results, code execution) and may pause at checkpoints. In practice, an AI Agent is typically just an LLM using tools in a loop based on environmental feedback — the implementation is often simpler than a framework, but requires care in designing the agent-computer interface (ACI) and tool documentation. AI Agent should be distinguished from related concepts: Agentic AI is the paradigm (class of systems), an AI Agent is an instance (concrete actor); a Multi-Agent System is a collective of multiple cooperating agents; a Workflow is a predefined orchestration of LLMs without decisional autonomy.

GO TO CONCEPT
Agentic AI

Agentic AI denotes an architectural transition from single-turn, stateless generative models toward goal-directed systems capable of autonomous perception, planning, action, and adaptation through iterative control loops. An agentic system wraps a large language model in a runtime that gives the model access to tools (web search, code execution, APIs, file I/O), persistent memory, and feedback from prior steps. The model then decides dynamically which tools to call, in what order, and whether to loop or stop, rather than following a predefined code path. Two primary system types are commonly distinguished: (1) Workflows, in which LLMs and tools are orchestrated through predefined code paths, and (2) Agents, in which the LLM itself directs its process and tool usage dynamically. Both can be composed into multi-agent systems where specialized agents collaborate, with one acting as orchestrator and others as subagents. Key design patterns identified by Anthropic (2024) include prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer loops. Andrew Ng's 2024 taxonomy describes four foundational patterns: Reflection, Tool Use, Planning, and Multi-Agent Collaboration. Formal frameworks model agentic control loops as Partially Observable Markov Decision Processes (POMDPs). The control loop is: perceive state → reason/plan → select action → execute tool → observe result → update state → repeat. Agentic systems introduce risks not present in single-turn models, including hallucination in action, prompt injection through observed content, infinite loops, reward hacking, and tool misuse.

GO TO CONCEPT
Tool-augmented LLM

Tool-augmented LLM is an architectural pattern in which a large language model is equipped with access to one or more external tools that it can invoke during inference by generating structured function-call or API-call outputs. The model learns when and how to call tools by producing special tokens or structured output (e.g., JSON function calls) that are intercepted by a host runtime, executed against the tool, and whose results are returned to the model as new context for continued generation. The canonical formalization appeared in the Toolformer paper (Schick et al., Meta AI, 2023), which demonstrated that LLMs can learn to self-supervise their own tool-use through API call annotation without requiring large labeled datasets. Toolformer showed that models trained this way can decide which tools to call, when, and with which arguments, and that tool use substantially improves performance on tasks requiring fresh information, arithmetic, multilingual lookup, and question answering. The pattern encompasses several mechanisms: (1) in-context tool specification, where tool interfaces are described in the system prompt or context (JSON Schema, OpenAPI, natural language); (2) function calling APIs, where the model produces structured output matched to a defined schema and the host dispatches the call; (3) ReAct-style interleaving, where the model alternates reasoning traces with tool-use observations; and (4) parallel tool calling, where the model emits multiple tool calls simultaneously to be executed concurrently. Key implementations include OpenAI function calling (GPT-4, June 2023), Anthropic tool use (Claude, 2023), Google Gemini function calling, and the Model Context Protocol (MCP, 2024) which standardizes tool server connectivity.

GO TO CONCEPT
MAS

Multi-Agent Systems (MAS) are a paradigm in Distributed Artificial Intelligence in which multiple autonomous software entities — agents — interact within a shared environment to achieve individual or collective goals. Each agent perceives its environment through sensors or interfaces, reasons about its state, and acts through actuators or API calls. In the context of LLM-based MAS (emerging prominently from 2023 onward), agents are powered by large language models that provide the cognitive core (planning, reasoning, natural language communication), supplemented by memory modules, tool-use interfaces, and role-specific prompts. The system architecture defines how agents coordinate: coordination topologies include sequential pipelines, hierarchical orchestration (orchestrator-worker), parallel fan-out/fan-in, publish-subscribe messaging, and decentralized peer-to-peer communication. Core agent properties, as defined by Wooldridge and Jennings (1995), include autonomy, social ability, reactivity, and pro-activeness. In LLM-based systems, key components are: the agent (an LLM with a system prompt defining its role), a communication channel (natural language messages, structured function calls, or shared memory), an orchestrator or coordinator (managing task decomposition, routing, and state), tool-use interfaces (external APIs, code execution, web search), and a memory subsystem (short-term context, long-term vector storage). Prominent frameworks implementing LLM-based MAS include AutoGen (Microsoft, 2023), CAMEL (2023), MetaGPT (2023), CrewAI, and LangGraph.

GO TO CONCEPT