Robots Atlas>ROBOTS ATLAS
Augmentation

Tool-augmented LLM

Key innovation
Extends large language models with the ability to invoke external tools — search engines, calculators, APIs, and code interpreters — by generating structured calls during text generation, enabling access to current knowledge and precise computation unavailable in model parameters.
Category
Augmentation
Abstraction level
Pattern
Operation level
ModelInferenceOrchestrationTooling

Components

Tool specificationInforms the model about available tools, their interfaces, and expected call parameters.

A formal definition of a tool interface passed to the model — typically in JSON Schema or OpenAPI format — describing the tool's name, its parameters, and input data types. Provided in the system prompt or a dedicated API field.

JSON Schema
Natural Language Description
Model Context Protocol (MCP)

Official

Tool call generationGenerates a structured tool call during model decoding.

The module responsible for generating a structured tool call by an LLM — typically in the form of special tokens, a JSON block, or a function_call object. The model decides when and with what arguments to invoke a tool, based on context.

Function calling (OpenAI/Anthropic)
ReAct — text-prompted reasoning invocation
Toolformer — API call tokens in text
Tool Executor / HostExecutes tool calls and returns results to the model to continue generation.

The runtime environment outside the model that intercepts tool calls generated by the LLM, executes them (by calling APIs, running code, or querying a database), and returns the results to the model as new context.

Direct API Call
Sandboxed Code Execution Environment
Serwer MCP

Official

Tool Result Injection into ContextIntegrates tool outputs into the model context for subsequent generation.

The mechanism for returning tool execution results to the model's context window, enabling the model to continue generation incorporating the retrieved data. Results may be injected as a tool_result block, a new message, or special tokens.

Official

Implementation

Implementation pitfalls
Tool call argument hallucinationHigh

The model may generate tool calls with fabricated or incorrect parameters — such as fictitious function names, wrong data types, or invalid date and identifier formats. This causes silent execution errors on the host side.

Fix:Validate all tool-call arguments against a schema before execution; use schemas with strict types and constraints; log and monitor failed calls.
Tool result prompt injectionCritical

Results returned by tools (web pages, documents, API responses) may contain malicious instructions that the model treats as system commands — a classic prompt injection attack via observed content.

Fix:Isolate tool outputs from system instructions; apply explicit delimiters and source metadata; require user confirmation before executing irreversible actions based on observed content.
Excessive or infinite tool-call loopsHigh

Without hard limits, a model can invoke tools in a loop — for example, repeatedly searching the web for information unavailable in any source — exhausting the token budget and generating unnecessary API costs.

Fix:Set a hard limit on tool calls per turn/session; implement repeated-call detection; require human-in-the-loop for critical or costly calls.
Tool output context overflowHigh

Results from external APIs or search engines can be very long — HTML pages, JSON responses with many fields — quickly filling the context window and causing earlier conversational context to be lost.

Fix:Apply extraction or summarization of tool outputs before injecting them into the context; constrain output size via API parameters (token limits, pagination); monitor the context token budget.
Unnecessary tool calls for known factsMedium

Models may invoke tools (e.g., a search engine) for information already present in their parametric knowledge, unnecessarily increasing latency and cost. This is especially relevant for models with a low confidence threshold for tool invocation.

Fix:Calibrate model confidence thresholds; explicitly instruct the model in system prompts when to invoke tools versus rely on parametric knowledge; apply reflection mechanisms before tool calls.

Evolution

Original paper · 2023 · NeurIPS 2023 · Timo Schick
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
2021
WebGPT — GPT-3 augmented with a web browser
Inflection point

Nakano et al. (OpenAI) augment GPT-3 with the ability to search the web via a text-based browser interface. This was the first demonstration that an LLM can use an external information source through reinforcement learning from human feedback.

2022
TALM — tool bootstrapping via self-annotation

Parisi et al. (Google) propose TALM (Tool Augmented Language Models), in which an LLM iteratively expands its set of tool calls by filtering out those that improve results — an early step toward self-supervised learning of tool use.

2022
ReAct — interleaved reasoning and tool use
Inflection point

Yao et al. (Princeton / Google) propose ReAct: an LLM alternately generates reasoning traces (Thought) and tool calls (Action), receiving observations (Observation) from the environment. The work establishes the interleaved reasoning + tool use pattern.

2023
Toolformer — LLM learns to use tools autonomously
Inflection point

Schick et al. (Meta AI) introduce Toolformer — a model trained via self-annotated API call insertions in text, without large manually labeled datasets. The model learns when and how to invoke external tools (calculator, search engine, translator, QA system) and how to integrate their outputs.

2023
OpenAI Function Calling — commercial standardization of tool invocation
Inflection point

OpenAI introduces function calling in GPT-4 and GPT-3.5 Turbo (June 2023) — a structured API enabling the model to generate function calls in JSON Schema format. This becomes the de facto industry standard for tool augmentation.

2023
Anthropic tool use and parallel tool calls

Anthropic introduces tool use in the Claude API with support for parallel tool calling, enabling the model to generate multiple tool calls that the host executes simultaneously.

2024
Model Context Protocol (MCP) — standardization of tool connectivity
Inflection point

Anthropic published the Model Context Protocol as an open standard connecting models to external tool servers — analogous to the Language Server Protocol for developer tooling. MCP standardizes both the tool description format and the communication protocol between LLMs and tool servers.

Technical details

Hyperparameters (configurable axes)

Available Tool SetCritical

A set of tools made available to the model, defining the space of possible calls. Tools may include search engines, calculators, code interpreters, databases, external APIs, and system utilities.

web_search + code_interpreter
calculator + calendar + email
custom_domain_api + vector_db_retrieval
Parallel tool callsHigh

Whether the model can generate multiple tool calls simultaneously in a single turn (parallel tool calling), which the host executes in parallel before returning the results.

trueSupported by the OpenAI and Anthropic APIs.
falseSequential ReAct-style calls.
Tool Specification FormatHigh

The format in which tools are described to the model affects the precision of generated calls and compatibility with the host.

JSON Schema (OpenAI/Anthropic format)
Model Context Protocol (MCP)
Opis w języku naturalnym
Maximum tool calls per turnMedium

Limit the number of tool calls within a single conversational turn; guard against infinite call loops.

1One call per turn — easier to debug.
5–20Typical limit for agents with multiple tools.

Compute bottleneck

Latency of external tool calls and context size

The primary bottlenecks are latency from external tool calls (API response times, code execution) and the linear growth of context length as tool results are injected — which increases the cost of each subsequent LLM inference step.

Depends on
Opóźnienie zewnętrznych API / narzędziAkumulacja długości kontekstu

Execution paradigm

Primary mode
conditional

The base LLM remains dense — all parameters are active at every step. The conditional nature applies to external tool invocation, not to the model's internal structure.

Activation pattern
input_dependent
Routing mechanism

The model decides during decoding whether to invoke a tool (and which one) or continue generating text, based on the context content and tool specifications. This decision is endogenous: it arises from the probability distribution over output tokens.

Parallelism

Parallelism level
conditionally_parallel

Parallelism is possible when the model generates multiple tool calls in a single turn with no dependencies between them. LLM execution is sequential; parallelism applies only to tool execution by the host.

Scope
inference
Constraints
!Sekwencyjna zależność narzędzi w łańcuchu wywołań
!Równoległe wywołania niezależnych narzędzi

Hardware requirements

Primary

Tool-augmented LLM is a runtime architectural pattern — hardware requirements are determined solely by the underlying LLM and the tools themselves, not by the tool augmentation mechanism.