ReAct
How it works
1. Prompt construction: a system instruction describing the available tools (name, description, argument schema), a few demonstrations in ReAct format (Thought/Action/Observation/Finish), and the user query.
2. Agent loop:
   a. The model generates "Thought:" plus reasoning over everything seen so far (query + action/observation history).
   b. The model generates "Action:" plus the tool name and arguments (typically JSON or a function call).
   c. Generation is stopped at a separator (e.g. "Observation:"). The orchestrator parses the action, executes the tool, and appends the result as "Observation: <result>".
   d. The loop returns to (a): the model sees the new observation and decides on the next step.
3. Stop condition: the model emits "Action: Finish[<answer>]" or exceeds a step limit (typically 5–15).
4. Validation: the orchestrator may parse the final answer, check its format, and optionally force another iteration.
5. Function-calling variant: in modern APIs (OpenAI, Anthropic), the Action is a structured function call rather than text; the API returns an object that needs no text parsing.
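The loop above can be sketched in a few dozen lines of Python. Everything here is illustrative: the model is replaced by a scripted stub, and the `react_loop` helper and tool names are assumptions, not any framework's API.

```python
import re

# Hypothetical tool implementations; a real agent would call search APIs etc.
TOOLS = {
    "search": lambda q: f"Stub result for '{q}'",
    "lookup": lambda k: f"Stub snippet containing '{k}'",
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react_loop(model, query, max_steps=8):
    """Run a textual ReAct loop: Thought -> Action -> Observation, until Finish."""
    history = f"Question: {query}\n"
    for _ in range(max_steps):
        # The model generates up to the "Observation:" separator (stop sequence).
        completion = model(history)
        history += completion + "\n"
        match = ACTION_RE.search(completion)
        if match is None:
            # Feed the parse failure back so the model can retry.
            history += "Observation: Could not parse an Action. Try again.\n"
            continue
        name, arg = match.group(1), match.group(2)
        if name.lower() == "finish":
            return arg  # Finish[<answer>] terminates the loop
        tool = TOOLS.get(name)
        obs = tool(arg) if tool else f"Unknown tool '{name}'"
        history += f"Observation: {obs}\n"
    return None  # step limit exceeded

# A scripted stand-in for the LLM, replaying a two-step trajectory.
_script = iter([
    "Thought: I should search first.\nAction: search[Apple Remote]",
    "Thought: I have enough information.\nAction: finish[Front Row]",
])
answer = react_loop(lambda hist: next(_script), "What does the Apple Remote control?")
```

The orchestrator logic (parse, dispatch, inject, detect Finish) lives entirely outside the model, exactly as in step 2c above.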
Problem solved
Pure Chain-of-Thought hallucinates facts that are not in the model's weights, particularly on tasks requiring current knowledge or multi-hop reasoning. Pure tool-using LLMs without explicit reasoning select actions impulsively, without a plan, and cannot recover from unexpected observations. ReAct addresses both problems simultaneously: reasoning provides the plan and context, actions verify facts against the world, and observations update the plan.
Components
Thought: a natural-language reasoning step preceding an action; used for planning, problem decomposition, evaluating observations, and deciding on the next step. Mechanically it is a Chain-of-Thought fragment generated inside the agent loop.
Action: a step specifying the tool name and its arguments. In the original ReAct the format is textual: 'Action: search[Apple Remote]'. In modern implementations it is structured (function-call JSON). The special 'Finish[<answer>]' action terminates the loop.
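As a small illustration, the same action in both encodings; the JSON shape below is an assumption in the spirit of function calling, not a specific provider's schema:

```python
import json
import re

# Textual form from the original ReAct paper.
textual = "Action: search[Apple Remote]"
name, arg = re.match(r"Action:\s*(\w+)\[(.*)\]", textual).groups()

# Equivalent structured (function-calling style) encoding of the same action.
structured = json.dumps({"name": name, "arguments": {"query": arg}})
```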
Observation: the result of executing the action, injected back into the model's context. It may be a Wikipedia page snippet, a SQL query result, an API JSON response, file contents, etc. The format depends on the tool.
Orchestrator: a component external to the LLM that (1) parses the generated Action, (2) invokes the actual tool, (3) injects the Observation into the context, and (4) detects Finish or the step limit. Implemented by frameworks such as LangChain AgentExecutor, LlamaIndex ReActAgent, and the OpenAI Assistants runner.
Tool set: a pre-defined action space, i.e. a list of tools with descriptions and argument schemas. In the original ReAct for HotpotQA: search[entity], lookup[keyword], finish[answer]. In modern agents: dozens or hundreds of tools via function calling.
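The original HotpotQA action space can be written down as function-calling-style declarations; the field names below mirror common provider formats but are an assumption here, not any one API's schema:

```python
# Illustrative tool declarations for the HotpotQA action space.
TOOL_SPECS = [
    {
        "name": "search",
        "description": "Search Wikipedia and return the top snippet.",
        "parameters": {
            "type": "object",
            "properties": {"entity": {"type": "string"}},
            "required": ["entity"],
        },
    },
    {
        "name": "lookup",
        "description": "Find a keyword in the last retrieved page.",
        "parameters": {
            "type": "object",
            "properties": {"keyword": {"type": "string"}},
            "required": ["keyword"],
        },
    },
    {
        "name": "finish",
        "description": "Return the final answer and stop the loop.",
        "parameters": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
]

# Index by name for fast lookup in the orchestrator.
TOOL_INDEX = {spec["name"]: spec for spec in TOOL_SPECS}
```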
Implementation
Infinite loops: without a hard step limit, the agent can call tools indefinitely (e.g. repeating the same action while expecting a different result). This is one of the most common ReAct failure modes.
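A cheap guard against this failure mode is to watch for verbatim-repeated actions; the helper below is a sketch, not a library API:

```python
def is_looping(action_history, window=3):
    """True if the last `window` actions are identical: the classic repeat loop."""
    if len(action_history) < window:
        return False
    return len(set(action_history[-window:])) == 1

stuck = ["search[Apple]", "search[Apple]", "search[Apple]"]
```

When `is_looping` fires, the orchestrator can inject a corrective Observation or force termination early, before the step limit is reached.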
Hallucinated observations: the model may generate part of an Observation as its own continuation instead of waiting for the real tool result. The consequence is decisions based on false "facts."
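The standard defense is a stop sequence on "Observation:"; if the serving stack does not support one, the orchestrator can truncate the completion itself, as in this sketch:

```python
def truncate_at_observation(completion, stop="Observation:"):
    """Discard anything generated past the Observation separator, so
    hallucinated tool output never enters the history."""
    idx = completion.find(stop)
    return completion if idx == -1 else completion[:idx].rstrip()

gen = "Thought: search it.\nAction: search[Apple Remote]\nObservation: fake text"
clean = truncate_at_observation(gen)
```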
Malformed actions: in the original textual ReAct, the model can produce a malformed Action (missing brackets, unknown tool, invalid arguments). Each such error breaks the loop or requires a retry.
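A common mitigation is to validate each parsed action against the tool registry and feed the error message back as the next Observation so the model can self-correct; `validate_action` below is an illustrative sketch:

```python
import re

def validate_action(raw, known_tools):
    """Return ((tool, arg), None) on success, or (None, error_message) to
    feed back to the model as the next Observation."""
    m = re.fullmatch(r"Action:\s*(\w+)\[(.*)\]", raw.strip())
    if m is None:
        return None, "Malformed Action: expected 'Action: tool[argument]'."
    name, arg = m.groups()
    if name not in known_tools:
        return None, f"Unknown tool '{name}'. Available: {sorted(known_tools)}."
    return (name, arg), None
```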
Context growth: each Observation (especially search results and web pages) adds hundreds or thousands of tokens. After a few iterations the context can exceed the model's window or cause a cost explosion.
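A blunt but widely used mitigation is to cap each Observation before injecting it; the character limit below is an arbitrary illustration:

```python
def clip_observation(text, max_chars=1500, marker="... [truncated]"):
    """Cap each Observation so multi-step histories stay inside the context window."""
    if len(text) <= max_chars:
        return text
    return text[: max_chars - len(marker)] + marker
```

More sophisticated variants summarize the observation with a second LLM call or drop the oldest Thought/Action/Observation triples instead.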
Tool selection: with a large tool set (≥30 tools), the model often picks a suboptimal tool, confused by similar names or ambiguous descriptions.
Error recovery: when a tool returns an error (timeout, HTTP 500, unexpected format), the agent may loop, repeating the same action, or give up.
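One mitigation is to retry transient failures a bounded number of times and otherwise surface the error text as an Observation, so the model can re-plan rather than repeat the call; a sketch:

```python
import time

def call_tool_with_retry(tool, arg, retries=2, base_delay=0.0):
    """Retry transient tool failures with exponential backoff; on exhaustion,
    return the error as text so it becomes an Observation the model can react to."""
    for attempt in range(retries + 1):
        try:
            return tool(arg)
        except Exception as exc:
            if attempt == retries:
                return f"Tool error after {attempt + 1} attempts: {exc}"
            time.sleep(base_delay * (2 ** attempt))  # backoff between attempts

# Simulated flaky tool: fails once, then succeeds.
flaky_calls = {"n": 0}
def flaky(arg):
    flaky_calls["n"] += 1
    if flaky_calls["n"] < 2:
        raise TimeoutError("simulated timeout")
    return "ok"
```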
Evolution
Chain-of-Thought (Wei et al., 2022): explicit reasoning steps in the prompt improve LLM ability to solve complex tasks. ReAct builds on CoT by adding actions and observations.
Tool-use precursors: WebGPT (OpenAI, 2022) and Toolformer (Schick et al., 2023) show that LLMs can invoke tools, but without interleaved reasoning. ReAct supplies the missing planning layer.
ReAct (Yao et al., 2022): the Princeton + Google Brain team introduces the Thought/Action/Observation pattern, demonstrating improvements over CoT and tool-only baselines on HotpotQA, FEVER, ALFWorld, and WebShop.
Framework adoption: LangChain (launched October 2022) popularizes ReAct as the standard agent pattern, which becomes the de-facto standard for agentic applications in 2023.
Reflexion (Shinn et al., 2023): extends ReAct with an outer loop: after a failed episode, the agent generates a self-critique stored in memory that conditions the next attempt. Improves HumanEval from 80% (GPT-4 + ReAct) to 91%.
Function calling (June 2023): OpenAI turns Action from free text into a structured JSON object, removing the need for text parsing; Anthropic and Google follow suit.
Model Context Protocol (November 2024): Anthropic publishes MCP, an open standard for LLM↔tool communication, generalizing the Action layer of ReAct across the entire provider ecosystem.
Reasoning models (2024–2025): o1, o3, and DeepSeek-R1 generate extended internal reasoning trained via RL. ReAct evolves: the Thought layer is absorbed into the model, while the external orchestrator focuses on actions and memory.
Technical details
Hyperparameters (configurable axes)
Max steps: limit on Thought→Action→Observation iterations to prevent infinite loops. When exceeded, the agent is forced to answer or terminates with an error.
Tool count: size of the action space. More tools increase flexibility but strain the context window and make correct action selection harder.
Action format: how tool calls are encoded: textual (original ReAct) vs structured (JSON function call) vs MCP.
Few-shot examples: demonstrations of full Thought/Action/Observation/Finish trajectories. Critical for smaller models; for GPT-4/Claude-class models, 0–2 are typically sufficient.
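The axes above can be gathered into a single configuration object; the field names are illustrative, not any framework's:

```python
from dataclasses import dataclass, field

@dataclass
class ReActConfig:
    """Configurable axes of a ReAct agent (illustrative sketch)."""
    max_steps: int = 10            # iteration limit, typically 5-15
    action_format: str = "json"    # "text" (original), "json" (function call), or "mcp"
    num_demonstrations: int = 2    # few-shot trajectories; 0-2 often suffice for frontier models
    tool_names: list = field(default_factory=lambda: ["search", "lookup", "finish"])

cfg = ReActConfig(max_steps=8)
```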
Computational complexity
Time complexity: O(N · (L_ctx · d + T_tool)), where N is the number of loop steps, L_ctx the accumulated context length, d the model's hidden dimension, and T_tool the tool-execution time. Space complexity: O(N · L_step + N_tools · L_tool_def), where L_step is the number of tokens added per step and L_tool_def the length of each tool definition in the prompt.
Compute bottleneck
Each loop step requires a full LLM prefill over the entire history so far + autoregressive generation of a new Thought and Action. Without KV-cache, the cost is quadratic in the number of steps.
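A back-of-the-envelope count makes the quadratic growth concrete; the token counts below are assumptions chosen for illustration:

```python
def total_prefill_tokens(n_steps, prompt_tokens, tokens_per_step):
    """Without KV-cache reuse, step i re-prefills the prompt plus all i prior
    steps, so total processed tokens grow quadratically in the number of steps."""
    return sum(prompt_tokens + i * tokens_per_step for i in range(n_steps))

# Example: 1k-token prompt, ~500 tokens added per Thought/Action/Observation step.
ten_steps = total_prefill_tokens(10, 1000, 500)     # 32,500 tokens processed
twenty_steps = total_prefill_tokens(20, 1000, 500)  # 115,000 tokens processed
```

Doubling the step count roughly 3.5×'s the prefill work in this example, which is why KV-caching across loop iterations matters so much for agent serving.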
Execution paradigm
The LLM itself runs in dense mode (every forward pass activates all parameters), but the full ReAct application is conditional/stage-dependent: each loop stage has a different context state and a different generation goal.
The orchestrator (agent executor) decides whether to re-invoke the model after injecting the observation. This is not routing inside the model, but an external control loop.
Parallelism
Multiple independent ReAct trajectories (e.g. for different queries or in Tree-of-Thoughts search) can be run in parallel as a batch. A single trajectory remains sequential.
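Batching independent trajectories is straightforward; in this sketch each trajectory is a stub standing in for a full sequential ReAct run:

```python
from concurrent.futures import ThreadPoolExecutor

def run_trajectory(query):
    """Stand-in for one full sequential ReAct trajectory (illustrative stub)."""
    return f"answer for {query}"

queries = ["q1", "q2", "q3"]

# Independent trajectories are embarrassingly parallel; each stays sequential inside.
with ThreadPoolExecutor(max_workers=3) as pool:
    answers = list(pool.map(run_trajectory, queries))
```

`pool.map` preserves input order, so results line up with their queries even when trajectories finish out of order.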
Hardware requirements
The ReAct pattern itself has no specific hardware requirements: it is an orchestration pattern over a standard LLM and can be realized with any model, locally (Ollama, vLLM) or via API. Requirements come solely from the base model, which runs most efficiently on GPUs with tensor cores.