The application provides the model with a list of functions plus JSON schemas for their parameters. Through post-training (instruction tuning + RLHF on tool-use), the model decides whether to reply with text or emit a structured {name, arguments} call. The runtime parses the call, validates arguments against the schema, executes the function and injects the result back into context as a tool message. The model then reasons over the result or replies to the user. The loop can repeat (multi-step tool use) and — since November 2023 — the model can emit multiple calls at once (parallel tool calls).
A bare LLM is isolated from the world — no access to fresh data, databases, calculators, enterprise systems, or physical devices. Function Calling provides a reliable, structured bridge between the model's free-form reasoning and deterministic external code execution, eliminating brittle ad-hoc text parsing and format hallucinations.
The model may invent a nonexistent function or supply arguments that violate the schema. Mitigation: JSON schema validation before execution, strict / structured-output modes, error-feedback retries.
An agent can loop on the same function or invoke costly tools excessively. Mitigation: max-iteration limits, call deduplication, token and time budgets.
Function results (especially search and SQL) quickly fill the context window, raising cost and degrading quality. Mitigation: result summarisation, pagination, selective injection.
Content returned from a function (web page, email, SQL row) may contain instructions trying to hijack the agent. Mitigation: isolating tool messages, marking untrusted content, tool-permission policies.
Yao et al. showed an LLM can interleave "thought" and tool-invoking "action" steps in a single chain — the conceptual foundation for later function calling.
Meta AI publishes Toolformer (February 2023): the model self-supervises insertion of API calls in text and learns when to use them — the direct academic precursor to production function calling.
On June 13, 2023 OpenAI releases function calling in gpt-3.5-turbo-0613 and gpt-4-0613. For the first time a widely available commercial LLM emits structured JSON function calls as a first-class response mode.
November 2023: OpenAI DevDay introduces the tools/tool_choice fields (replacing functions/function_call) and parallel tool calls — the model can issue multiple independent calls in one response.
Function calling becomes an industry standard — Anthropic Claude ships Tool Use GA (May 2024), Google Gemini exposes Function Calling, frameworks (LangChain, LlamaIndex) unify the interfaces.
Anthropic publishes MCP (November 2024) — an open protocol standardising how tools, data and resources are exposed to models via function calling, independent of the LLM provider.
Guided by tool descriptions and conversation history, the model selects which function (if any) to call and what arguments to pass. Routing happens in the model's output-token space, with no external router.
A single tool-use decision is sequential, but since November 2023 (OpenAI parallel tool calls) the model can emit multiple independent calls in one turn that the runtime executes in parallel.
Function Calling is an application- and orchestration-level pattern — it runs anywhere a tool-use-capable language model runs, independent of hardware backend.