Tool-augmented LLM

Extends large language models with the ability to invoke external tools — search engines, calculators, APIs, and code interpreters — by generating structured calls during text generation, enabling access to current knowledge and precise computation unavailable in model parameters.

Tool specification

Informs the model about available tools, their interfaces, and expected call parameters.

Modular

A formal definition of a tool interface passed to the model — typically in JSON Schema or OpenAPI format — describing the tool's name, its parameters, and input data types. Provided in the system prompt or a dedicated API field.

Tool call generation

Generates a structured tool call during model decoding.

The module responsible for generating a structured tool call by an LLM — typically in the form of special tokens, a JSON block, or a function_call object. The model decides when and with what arguments to invoke a tool, based on context.

Tool Executor / Host

Executes tool calls and returns results to the model to continue generation.

Modular

The runtime environment outside the model that intercepts tool calls generated by the LLM, executes them (by calling APIs, running code, or querying a database), and returns the results to the model as new context.

Tool Result Injection into Context

Integrates tool outputs into the model context for subsequent generation.

Modular

The mechanism for returning tool execution results to the model's context window, enabling the model to continue generation incorporating the retrieved data. Results may be injected as a tool_result block, a new message, or special tokens.

Wąskie gardło: Latency of external tool calls and context size

The primary bottlenecks are latency from external tool calls (API response times, code execution) and the linear growth of context length as tool results are injected — which increases the cost of each subsequent LLM inference step.

Parallelism

Conditionally parallel

Parallelism is possible when the model generates multiple tool calls in a single turn with no dependencies between them. LLM execution is sequential; parallelism applies only to tool execution by the host.

Paradigm

Conditional

Input dependent

The base LLM remains dense — all parameters are active at every step. The conditional nature applies to external tool invocation, not to the model's internal structure.

Available Tool Set

Critical

web_search + code_interpreter
calculator + calendar + email
custom_domain_api + vector_db_retrieval

A set of tools made available to the model, defining the space of possible calls. Tools may include search engines, calculators, code interpreters, databases, external APIs, and system utilities.

Parallel tool calls

Standard

trueSupported by the OpenAI and Anthropic APIs.
falseSequential ReAct-style calls.

Whether the model can generate multiple tool calls simultaneously in a single turn (parallel tool calling), which the host executes in parallel before returning the results.

Tool Specification Format

Standard

JSON Schema (OpenAI/Anthropic format)
Model Context Protocol (MCP)
Opis w języku naturalnym

The format in which tools are described to the model affects the precision of generated calls and compatibility with the host.

Maximum tool calls per turn

Standard

1One call per turn — easier to debug.
5–20Typical limit for agents with multiple tools.

Limit the number of tool calls within a single conversational turn; guard against infinite call loops.

Common pitfalls

Tool call argument hallucination

HIGH

The model may generate tool calls with fabricated or incorrect parameters — such as fictitious function names, wrong data types, or invalid date and identifier formats. This causes silent execution errors on the host side.

Validate all tool-call arguments against a schema before execution; use schemas with strict types and constraints; log and monitor failed calls.

Tool result prompt injection

CRITICAL

Results returned by tools (web pages, documents, API responses) may contain malicious instructions that the model treats as system commands — a classic prompt injection attack via observed content.

Isolate tool outputs from system instructions; apply explicit delimiters and source metadata; require user confirmation before executing irreversible actions based on observed content.

Excessive or infinite tool-call loops

HIGH

Without hard limits, a model can invoke tools in a loop — for example, repeatedly searching the web for information unavailable in any source — exhausting the token budget and generating unnecessary API costs.

Set a hard limit on tool calls per turn/session; implement repeated-call detection; require human-in-the-loop for critical or costly calls.

Tool output context overflow

HIGH

Results from external APIs or search engines can be very long — HTML pages, JSON responses with many fields — quickly filling the context window and causing earlier conversational context to be lost.

Apply extraction or summarization of tool outputs before injecting them into the context; constrain output size via API parameters (token limits, pagination); monitor the context token budget.

Unnecessary tool calls for known facts

MEDIUM

Models may invoke tools (e.g., a search engine) for information already present in their parametric knowledge, unnecessarily increasing latency and cost. This is especially relevant for models with a low confidence threshold for tool invocation.

Calibrate model confidence thresholds; explicitly instruct the model in system prompts when to invoke tools versus rely on parametric knowledge; apply reflection mechanisms before tool calls.

Reference implementations

Toolformer (community reproduction)

Python · lucidrains (community reproduction)

Anthropic tool use — official documentation and examplesofficial

Python, JavaScript · Anthropic

LangChain Tools

Python · LangChain AI

GENESIS · Source paper

Toolformer: Language Models Can Teach Themselves to Use Tools

2023NeurIPS 2023Timo Schick, Jane Dwivedi-Yu, Roberto Dessì et al.

2021

WebGPT — GPT-3 augmented with a web browser

breakthrough

Nakano et al. (OpenAI) augment GPT-3 with the ability to search the web via a text-based browser interface. This was the first demonstration that an LLM can use an external information source through reinforcement learning from human feedback.

WebGPT: Browser-assisted question-answering with human feedback

2022

TALM — tool bootstrapping via self-annotation

Parisi et al. (Google) propose TALM (Tool Augmented Language Models), in which an LLM iteratively expands its set of tool calls by filtering out those that improve results — an early step toward self-supervised learning of tool use.

TALM: Tool Augmented Language Models

2022

ReAct — interleaved reasoning and tool use

breakthrough

Yao et al. (Princeton / Google) propose ReAct: an LLM alternately generates reasoning traces (Thought) and tool calls (Action), receiving observations (Observation) from the environment. The work establishes the interleaved reasoning + tool use pattern.

ReAct: Synergizing Reasoning and Acting in Language Models

2023

Toolformer — LLM learns to use tools autonomously

breakthrough

Schick et al. (Meta AI) introduce Toolformer — a model trained via self-annotated API call insertions in text, without large manually labeled datasets. The model learns when and how to invoke external tools (calculator, search engine, translator, QA system) and how to integrate their outputs.

Toolformer: Language Models Can Teach Themselves to Use Tools

2023

OpenAI Function Calling — commercial standardization of tool invocation

breakthrough

OpenAI introduces function calling in GPT-4 and GPT-3.5 Turbo (June 2023) — a structured API enabling the model to generate function calls in JSON Schema format. This becomes the de facto industry standard for tool augmentation.

2023

Anthropic tool use and parallel tool calls

Anthropic introduces tool use in the Claude API with support for parallel tool calling, enabling the model to generate multiple tool calls that the host executes simultaneously.

2024

Model Context Protocol (MCP) — standardization of tool connectivity

breakthrough

Anthropic published the Model Context Protocol as an open standard connecting models to external tool servers — analogous to the Language Server Protocol for developer tooling. MCP standardizes both the tool description format and the communication protocol between LLMs and tool servers.

Hardware agnosticPRIMARY

Tool-augmented LLM is a runtime architectural pattern — hardware requirements are determined solely by the underlying LLM and the tools themselves, not by the tool augmentation mechanism.

GPU tensor cores are required by the underlying LLM; tool execution by the host typically runs on CPU or via external APIs. The function calling / tool use mechanism adds no hardware requirements beyond those of the model itself.