ROBOTS ATLAS
Agents

ReAct

2022 · Active · Published · Updated: 5 May 2026
Key innovation
Interleaving Chain-of-Thought reasoning with action execution (tool calls) in a single LLM-generated stream — allowing the model to plan steps, gather information from the external world, and revise its plan based on observations.
Category
Agents
Abstraction level
Pattern
Operation level
Inference · Orchestration · Agent runtime
Use cases
Web-search agents (Perplexity, You.com, ChatGPT search)
Multi-hop question answering with knowledge bases (HotpotQA, FEVER)
Code assistants that execute scripts and read outputs (Cursor, GitHub Copilot Workspace)
Browser-using agents (Browser Use, Anthropic Computer Use)
Autonomous research agents (The AI Scientist, AutoGPT)
Assistants performing many operations across connected apps (LangChain Agent, OpenAI Assistants)
Robotic agents planning sequences of manipulations (PaLM-E, RT-2 with language grounding)
Workflow automation: agents invoking CRM, ERP, Slack APIs (Zapier, n8n agents)

How it works

1. Prompt construction: a system instruction describing the available tools (name, description, argument schema), a few demonstrations in ReAct format (Thought/Action/Observation/Finish), and the user query.
2. Agent loop:
   a. The model generates "Thought:" followed by reasoning over everything seen so far (query plus action/observation history).
   b. The model generates "Action:" followed by the tool name and arguments (typically JSON or a function call).
   c. Generation is stopped at a separator (e.g. "Observation:"). The orchestrator parses the action, executes the tool, and pastes the result back as "Observation: <result>".
   d. The loop returns to (a): the model sees the new observation and decides on the next step.
3. Stop condition: the model emits "Action: Finish[<answer>]" or exceeds a step limit (typically 5–15).
4. Validation: the orchestrator may parse the final answer, check its format, and optionally force another iteration.
5. Function-calling variant: in modern APIs (OpenAI, Anthropic), the Action is a structured function call rather than text; the API returns an object that needs no text parsing.
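A minimal sketch of this loop in Python, using the textual action format from the original paper. The call_llm function, the stub tools, and the prompt layout are illustrative placeholders rather than any particular library's API.

```python
import re

# Hypothetical tool implementations; any callable that takes a string and returns text works.
def search(entity: str) -> str:
    return f"(stub) Wikipedia summary for {entity}"

def lookup(keyword: str) -> str:
    return f"(stub) next sentence containing {keyword}"

TOOLS = {"search": search, "lookup": lookup}

def call_llm(prompt: str, stop: list[str]) -> str:
    """Placeholder for a completion call (OpenAI, Anthropic, local vLLM, ...).
    Generation must halt at the stop sequence so the model never writes its own
    'Observation:' line (see the hallucinated-observations pitfall below)."""
    raise NotImplementedError

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]", re.DOTALL)

def react_loop(question: str, max_steps: int = 10) -> str:
    trajectory = f"Question: {question}\n"
    for _ in range(max_steps):
        # (a) + (b): the model produces a Thought and an Action, halting before Observation.
        completion = call_llm(trajectory, stop=["Observation:"])
        trajectory += completion
        match = ACTION_RE.search(completion)
        if match is None:
            # Feed the parse failure back so the model can self-correct.
            trajectory += "\nObservation: could not parse an Action. Use the form Action: tool[args].\n"
            continue
        tool_name, args = match.group(1).lower(), match.group(2)
        if tool_name == "finish":                      # stop condition
            return args
        # (c): the orchestrator executes the tool and pastes the result back.
        if tool_name not in TOOLS:
            result = f"unknown tool '{tool_name}'. Available: {', '.join(TOOLS)}."
        else:
            result = TOOLS[tool_name](args)
        trajectory += f"\nObservation: {result}\n"
    return "Step limit reached without a final answer."
```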

Problem solved

Pure Chain-of-Thought hallucinates facts that are not in the model's weights, particularly on tasks requiring current knowledge or multi-hop reasoning. Pure tool-using LLMs without explicit reasoning select actions impulsively, without a plan, and cannot recover from unexpected observations. ReAct addresses both problems at once: reasoning provides a plan and context, actions verify facts against the world, and observations update the plan.

Components

Thought (reasoning step) · Planning and agent-loop control

A natural-language passage preceding an action; used for planning, problem decomposition, evaluating observations, and deciding on the next step. Mechanically it is a Chain-of-Thought reasoning fragment generated within the agent loop.

OUT · Natural-language text ending with a separator token (e.g. newline + 'Action:').
Action (tool invocation) · Interaction with the external world

A text segment containing the tool name and its arguments. In the original ReAct, the format is textual: 'Action: search[Apple Remote]'. In modern implementations, the format is structured (function-call JSON). The special 'Finish[<answer>]' action terminates the loop.

Text-based ReAct format · 'Action: tool_name[args]', the original form from Yao et al.
JSON function call · Structured {name, arguments} object used in OpenAI/Anthropic APIs (sketched below).
MCP (Model Context Protocol) · Anthropic standard for integrating tools with agents; actions are invocations of an MCP server.

Official
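For comparison, a hedged sketch of the function-calling variant, using the shape of the OpenAI Chat Completions tool-calling API; the model name and tool schema are illustrative, and other providers expose an equivalent structure.

```python
import json
from openai import OpenAI  # illustrative: any tool-calling chat API has the same shape

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search Wikipedia and return the first paragraph of the best match.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What device does the Apple Remote control?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

# The Action arrives as a structured object, so no text parsing of 'Action: ...' is needed.
tool_call = response.choices[0].message.tool_calls[0]
name = tool_call.function.name                   # e.g. "search"
args = json.loads(tool_call.function.arguments)  # e.g. {"query": "Apple Remote"}

# The Observation is sent back as a role="tool" message tied to the call id.
observation = "(result of executing the search tool)"
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": observation})
```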

Observation (tool result) · Updating the reasoning state with facts from the world

The result of executing the action, injected back into the model's context. It may be a Wikipedia page snippet, a SQL query result, a JSON API response, file contents, etc. The format depends on the tool.

IN · Text (or JSON serialized as text), inserted into the context after the 'Observation:' token.
Loop orchestrator (agent executor) · Driving the Thought→Action→Observation loop

A component external to the LLM that: (1) parses the generated Action, (2) invokes the actual tool, (3) injects the Observation into the context, (4) detects Finish or the step limit. Implemented by frameworks: LangChain AgentExecutor, LlamaIndex ReActAgent, OpenAI Assistants Runner.

Official

Available tool set · Defines what the agent can do in the world

A pre-defined action space: a list of tools with descriptions and argument schemas. In the original ReAct for HotpotQA: search[entity], lookup[keyword], finish[answer]. In modern agents: dozens/hundreds of tools via function calling.

Official
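A small sketch of how such a tool set might be rendered into the system instruction for the textual format. The tool list mirrors the original HotpotQA setup; the exact wording of the rendered prompt is illustrative, not taken from the paper.

```python
TOOL_SPECS = [
    {"name": "search", "arg": "entity",
     "description": "Search Wikipedia for the entity and return the first sentences of its page."},
    {"name": "lookup", "arg": "keyword",
     "description": "Return the next sentence containing the keyword on the current page."},
    {"name": "finish", "arg": "answer",
     "description": "Return the final answer and end the episode."},
]

def render_system_prompt(tool_specs: list[dict]) -> str:
    """Render the name, argument and description of each tool into the system instruction."""
    lines = ["Solve the task by interleaving Thought, Action and Observation steps.",
             "Available actions:"]
    for spec in tool_specs:
        lines.append(f"- {spec['name']}[{spec['arg']}]: {spec['description']}")
    lines.append("Finish with finish[<answer>].")
    return "\n".join(lines)
```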

Implementation

Implementation pitfalls
Infinite loops and inability to terminate · Critical

Without a hard step limit, the agent can call tools indefinitely (e.g. repeat the same action expecting a different result). This is one of the most common ReAct failure modes.

Fix: Always set max_steps (typically 10–30). Add detection of repeated actions and force a Finish.
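A minimal sketch of both guardrails, assuming the loop keeps a list of actions already taken; the function name and threshold are illustrative.

```python
def should_force_finish(action: str, action_history: list[str], max_steps: int = 10) -> bool:
    """Force the agent toward Finish when the step budget is spent or it starts cycling."""
    if len(action_history) >= max_steps:
        return True
    # Verbatim repetition of the previous action is the most common cycling signature.
    return bool(action_history) and action == action_history[-1]

# In the loop, before executing the tool:
#   if should_force_finish(action, action_history, max_steps):
#       trajectory += "\nObservation: step budget exhausted, answer now with Finish[...].\n"
```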
Hallucinated tool observations · High

The model may generate part of an Observation as its own continuation instead of waiting for the real tool result. The result is decisions based on false "facts."

Fix: Use stop tokens (e.g. halt generation at 'Observation:'). In function calling the issue does not arise, because the Action is returned as a structured object rather than free text.
Parse errors in textual action format · High

In the original textual ReAct, the model can produce a malformed Action (missing brackets, unknown tool, invalid arguments). Each such error breaks the loop or requires a retry.

Fix: Prefer structured function calling when available. If using a textual format, add validation with a readable error returned as an Observation so the model can self-correct.
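One way to implement that self-correction path for the textual format, assuming the parsed tool name and raw argument string from the loop sketch above:

```python
def validate_action(tool_name: str, args: str, tools: dict) -> str | None:
    """Return a readable error to feed back as an Observation, or None if the action is valid."""
    if tool_name not in tools:
        return (f"Unknown tool '{tool_name}'. "
                f"Available tools: {', '.join(sorted(tools))}.")
    if not args.strip():
        return f"Tool '{tool_name}' was called without arguments. Use {tool_name}[<args>]."
    return None

# In the loop:
#   error = validate_action(name, args, TOOLS)
#   if error: trajectory += f"\nObservation: {error}\n"; otherwise execute the tool as usual.
```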
Context explosion in long trajectories · High

Each Observation (especially search results, web pages) adds hundreds/thousands of tokens. After a few iterations the context can exceed the model's window or cause cost explosion.

Fix: Apply observation compression (summarization, salient-fragment extraction). Use sliding-window memory or external vector memory for older steps.
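A simple compression sketch: keep short observations verbatim and truncate long ones (head plus tail) before they enter the trajectory; an extra summarization call could replace the truncation, as noted in the comment. The threshold is illustrative.

```python
def compress_observation(text: str, max_chars: int = 2000) -> str:
    """Shorten long tool outputs before they are injected into the context."""
    if len(text) <= max_chars:
        return text
    # Cheapest option: keep head and tail, which usually carry titles and conclusions.
    # A stronger option is an extra LLM call that summarizes `text` for the agent's next step.
    head, tail = text[: max_chars // 2], text[-(max_chars // 2):]
    return f"{head}\n...[{len(text) - max_chars} characters omitted]...\n{tail}"
```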
Selecting an unsuitable tool (tool misuse) · Medium

With a large tool set (≥30), the model often picks a suboptimal tool, confused by similar names or ambiguous descriptions.

Fix: Write precise, non-overlapping tool descriptions with usage examples. For very large sets, apply tool retrieval (e.g. Toolshed, Gorilla).
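A deliberately simplified sketch of tool retrieval: score each tool description against the query and expose only the top-k tools to the model. Production systems (Toolshed, Gorilla) score with embeddings; word overlap is used here only to keep the example self-contained.

```python
def retrieve_tools(query: str, tool_specs: list[dict], k: int = 8) -> list[dict]:
    """Return the k tools whose name/description share the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(spec: dict) -> int:
        doc_words = set(f"{spec['name']} {spec['description']}".lower().split())
        return len(query_words & doc_words)

    return sorted(tool_specs, key=overlap, reverse=True)[:k]
```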
No recovery strategy after tool failures · Medium

When a tool returns an error (timeout, 500, unexpected format), the agent may loop on repeating the same action or give up.

Fix: Return errors as a readable Observation with a suggested alternative. Implement retry with backoff and fallback to a different tool.
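A sketch of that recovery path: retries with exponential backoff, then a fallback tool, and finally a readable error returned as the Observation. The retry schedule and names are illustrative.

```python
import time

def run_with_recovery(tool, args: str, fallback=None, retries: int = 3) -> str:
    """Execute a tool call with retries and backoff; fall back to another tool on persistent
    failure, and as a last resort return the error text so the model can change strategy."""
    last_error = None
    for attempt in range(retries):
        try:
            return tool(args)
        except Exception as exc:        # timeouts, HTTP 5xx, malformed payloads, ...
            last_error = exc
            time.sleep(2 ** attempt)    # backoff: 1 s, 2 s, 4 s
    if fallback is not None:
        try:
            return fallback(args)
        except Exception as exc:
            last_error = exc
    return f"Tool failed after {retries} attempts ({last_error}). Try a different tool or query."
```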

Evolution

Original paper · 2022 · ICLR 2023 · Shunyu Yao
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
2022
Chain-of-Thought Prompting (Wei et al.)

Wei et al. show that explicit reasoning steps in the prompt improve LLM ability to solve complex tasks. ReAct builds on CoT by adding actions and observations.

2022
WebGPT and Toolformer — tools without explicit reasoning

WebGPT (OpenAI 2022) and Toolformer (Schick et al. 2023) show that LLMs can invoke tools, but without interleaved reasoning. ReAct addresses the missing planning layer.

2022
ReAct paper published (Yao et al.)
Inflection point

Yao et al. (Princeton + Google Brain) introduce the Thought/Action/Observation pattern, demonstrating improvement over CoT and tool-only baselines on HotpotQA, FEVER, ALFWorld, and WebShop.

2023
LangChain implements ReAct as AgentExecutor

LangChain (launched October 2022) popularizes ReAct as the standard agent pattern. The pattern becomes the de-facto standard for agentic applications in 2023.

2023
Reflexion — episodic memory and self-critique (Shinn et al.)

Reflexion extends ReAct with an outer loop: after a failed episode, the agent generates a self-critique stored in memory that conditions the next attempt. Improves HumanEval from 80% (GPT-4 + ReAct) to 91%.

2023
OpenAI Function Calling — actions internalized in the API
Inflection point

OpenAI introduces function calling in June 2023, turning Action from text into a structured JSON object. Removes the need for text parsing of Action — Anthropic and Google follow suit.

2024
Model Context Protocol (Anthropic) — tool standardization

Anthropic publishes MCP in November 2024 — an open standard for LLM↔tool communication, generalizing the Action layer of ReAct across the entire provider ecosystem.

2024
Native reasoning models — reasoning internalized (OpenAI o1)
Inflection point

Reasoning models (o1, o3, DeepSeek-R1) generate extended internal reasoning trained via RL. ReAct evolves: the Thought layer is absorbed into the model, while the external orchestrator focuses on actions and memory.

Technical details

Hyperparameters (configurable axes)

Maximum loop steps · Critical

Limit on Thought→Action→Observation iterations to prevent infinite loops. When exceeded, the agent is forced to answer or terminates with an error.

5–10 · Standard range for most QA and tool-use tasks.
30–50 · Complex research tasks (AutoGPT, The AI Scientist).
Number of available tools · High

Size of the action space. More tools increase flexibility but strain the context window and make correct action selection harder.

3 · Original ReAct (HotpotQA): search/lookup/finish.
10–50 · Typical production agent.
1000+ · Tool retrieval, dynamically surfacing relevant tools (Toolshed, Gorilla).
Action format · High

How tool calls are encoded: textual (original ReAct) vs structured (JSON function call) vs MCP.

Text 'Action: tool[args]' · Original format from Yao et al.
JSON function call · Standard in OpenAI / Anthropic APIs.
Number of ReAct demonstrations in the prompt · Medium

Few-shot demonstrations of full Thought/Action/Observation/Finish trajectories. Critical for smaller models; for GPT-4/Claude, 0–2 are typically sufficient.

6 · Standard from the Yao et al. paper.
0 · Native function calling in modern instruction-tuned models.

Computational complexity

Time complexity: O(N · (L_ctx · d + T_tool)). Space complexity: O(N · L_step + N_tools · L_tool_def), where N is the number of loop steps, L_ctx the current context length, d the model width, T_tool the tool latency, L_step the tokens added per step, N_tools the number of tools, and L_tool_def the length of a tool definition.

Compute bottleneck

Sequential forward passes over a growing context

Each loop step requires a full LLM prefill over the entire history so far + autoregressive generation of a new Thought and Action. Without KV-cache, the cost is quadratic in the number of steps.

Depends on
Number of loop steps N · Tool latency
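A back-of-envelope illustration of that growth: if every iteration appends roughly the same number of tokens, the total prefill work across N steps grows quadratically in N. The token counts below are assumptions chosen for illustration, not measurements.

```python
def total_prefill_tokens(n_steps: int, prompt_tokens: int, step_tokens: int) -> int:
    """Tokens prefilled at step i ~= prompt + i * step_tokens; sum over all steps (no KV-cache reuse)."""
    return sum(prompt_tokens + i * step_tokens for i in range(n_steps))

# Assumed sizes: 1,500-token prompt, ~800 tokens added per Thought/Action/Observation step.
print(total_prefill_tokens(5, 1500, 800))   # 15,500 tokens
print(total_prefill_tokens(20, 1500, 800))  # 182,000 tokens: ~12x the work for 4x the steps
```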

Execution paradigm

Primary mode
dense

The LLM itself runs in dense mode (every forward pass activates all parameters), but the full ReAct application is conditional/stage-dependent: each loop stage has a different context state and a different generation goal.

Activation pattern
stage_dependent
Routing mechanism

The orchestrator (agent executor) decides whether to re-invoke the model after injecting the observation. This is not routing inside the model, but an external control loop.

Parallelism

Parallelism level
sequential

Multiple independent ReAct trajectories (e.g. for different queries or in Tree-of-Thoughts search) can be run in parallel as a batch. A single trajectory remains sequential.

Scope
inference
Constraints
!Each action depends on the history of observations — the loop is inherently sequential.
!Tool execution introduces network/I/O latency that cannot be parallelized away with generation.

Hardware requirements

Primary

ReAct is a pattern applied to a standard LLM that runs most efficiently on GPUs with tensor cores. Hardware requirements come solely from the base model.

Good fit

The ReAct pattern itself has no specific hardware requirements — it can be realized by any LLM, locally (Ollama, vLLM) or via API.