Shifts AI systems from stateless prompt-response generation to goal-driven autonomous loops in which an agent perceives its environment, plans multi-step actions, invokes external tools, reflects on outcomes, and iterates until the goal is reached.
Category
Abstraction level
Operation level
2024
Research agents
Automation of office and knowledge work
Assistants executing end-to-end tasks
Agent workflows and task orchestration
Handling processes that require planning and action
The agentic system receives a goal, then independently plans steps, selects tools, gathers data, executes actions, and evaluates intermediate results. In simpler variants, a single agent handles this using tool use; in more advanced configurations, multiple agents collaborate on subtasks within a shared workflow.
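The loop described above can be sketched in a few lines. This is a minimal single-agent sketch: `fake_model` and the calculator tool are illustrative stand-ins, not a real model API.

```python
# Minimal single-agent loop sketch. `fake_model` and TOOLS are toy
# stand-ins for an LLM call and a tool registry.

def fake_model(context: list[str]) -> dict:
    """Stand-in for an LLM call: decides the next action from context."""
    if any("42" in msg for msg in context):
        return {"action": "finish", "answer": "The result is 42."}
    return {"action": "call_tool", "tool": "calculator", "args": {"expr": "6*7"}}

# Toy tool registry; never eval untrusted input in real systems.
TOOLS = {"calculator": lambda expr: str(eval(expr))}

def run_agent(goal: str, max_steps: int = 10) -> str:
    context = [f"GOAL: {goal}"]                           # perception/input
    for _ in range(max_steps):                            # hard step limit
        decision = fake_model(context)                    # plan + decide
        if decision["action"] == "finish":
            return decision["answer"]                     # loop termination
        result = TOOLS[decision["tool"]](**decision["args"])  # act
        context.append(f"OBSERVATION: {result}")          # feed result back
    return "max_steps reached"

print(run_agent("What is 6 times 7?"))
```

The same skeleton underlies multi-agent variants; only the "model" and the tool set change.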
Traditional generative models handle single prompts well but struggle with extended tasks that require planning, working memory, tool use, and adaptation to changing context. Agentic AI addresses this by combining reasoning, planning, and action execution.
01
Perception / Input Layer
Receives and encodes environmental inputs into the model's context window.
Modular
Accepts observations from the environment (user messages, tool results, file contents, API responses) and formats them as context for the base model. This may include RAG retrieval to fetch relevant documents.
02
Planning Module
Goal decomposition into actions and execution plan generation
Modular
Decomposes a high-level goal into a sequence of subgoals or actions. The agent may generate an explicit plan or reason step by step using chain-of-thought.
03
Memory
State and history management across agent loop steps
Modular
Stores and retrieves information between steps within a session (short-term memory) and optionally across sessions (long-term memory).
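The two memory tiers can be sketched as a simple class; the names and the dict-backed long-term store are illustrative assumptions (a real system would use a vector database or key-value store).

```python
# Sketch of two-tier agent memory: a short-term in-context buffer plus a
# long-term store that survives across sessions (a plain dict here).

class AgentMemory:
    def __init__(self) -> None:
        self.short_term: list[str] = []      # cleared each session
        self.long_term: dict[str, str] = {}  # persists across sessions

    def remember(self, event: str) -> None:
        self.short_term.append(event)        # step-by-step loop history

    def persist(self, key: str, value: str) -> None:
        self.long_term[key] = value          # e.g. DB-backed in practice

    def new_session(self) -> None:
        self.short_term.clear()              # long-term memory survives

mem = AgentMemory()
mem.remember("step 1: searched the web")
mem.persist("user_timezone", "UTC+1")
mem.new_session()
print(len(mem.short_term), mem.long_term["user_timezone"])
```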
04
Tools / Actions Layer
Extends the model's action space with calls to external systems.
Modular
The agent is provided with callable external functions: web search, code execution, database queries, file operations, API calls, and browser control. Tool interfaces are defined through schemas such as JSON Schema, OpenAPI, and MCP.
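A tool interface in the JSON Schema style might look like the following; the exact envelope (field names, nesting) varies by provider, and `web_search` here is a hypothetical tool.

```python
# Illustrative tool definition in the JSON Schema style used by function
# calling APIs; field names vary between providers.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return top result snippets.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "max_results": {"type": "integer", "minimum": 1, "default": 5},
        },
        "required": ["query"],
    },
}

def has_required_args(tool_def: dict, args: dict) -> bool:
    """Minimal pre-dispatch check that required arguments are present."""
    required = tool_def["input_schema"].get("required", [])
    return all(key in args for key in required)

print(has_required_args(web_search_tool, {"query": "agentic AI"}))  # True
print(has_required_args(web_search_tool, {"max_results": 3}))       # False
```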
05
Reflection / Evaluation
Output quality control and decision to continue or terminate the loop.
Modular
Evaluates whether the current result meets the success criterion. Triggers a retry, replanning, or loop termination. Corresponds to the evaluator-optimizer pattern described by Anthropic.
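An evaluator step can be reduced to a three-way decision; the criteria below are placeholders for whatever success signal the task defines (test results, rubric scores, validator output).

```python
# Sketch of an evaluator-optimizer decision: accept, retry, or replan.
# The string-matching criterion is a stand-in for a real success check.

def evaluate(result: str, success_criterion: str) -> str:
    """Return 'accept', 'retry', or 'replan' for the current loop step."""
    if success_criterion in result:
        return "accept"              # criterion met: terminate the loop
    if not result:
        return "replan"              # nothing usable: back to planning
    return "retry"                   # partial output: retry this subgoal

print(evaluate("tests pass: 12/12", "tests pass"))  # accept
print(evaluate("", "tests pass"))                   # replan
```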
06
Orchestrator
Coordinates multi-agent collaboration and manages task flow.
Modular
In multi-agent systems, it directs sub-agents, assigns tasks, and aggregates results. The orchestrator can be an LLM or a statically coded deterministic controller.
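The orchestrator-worker pattern reduces to fan-out and aggregation. In this sketch the workers are plain functions; in a real system each would be an LLM sub-agent with its own role prompt.

```python
# Orchestrator-worker sketch: split a goal into subtasks, fan them out to
# worker "agents" (plain functions here), and aggregate the results.

def worker(subtask: str) -> str:
    return f"done: {subtask}"                  # stand-in for a sub-agent run

def orchestrate(goal: str, subtasks: list[str]) -> str:
    results = [worker(t) for t in subtasks]    # assign tasks to workers
    return f"{goal} -> " + "; ".join(results)  # aggregate

print(orchestrate("write report", ["gather data", "draft", "review"]))
```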
Time
…
N = number of agent loop steps; C_step = cost of a single LLM inference call (typically O(L²·d) for a Transformer with context length L). Tool call costs are added on top of this and vary independently of the model.
Agentic AI's time complexity is not an intrinsic property of the paradigm — it depends entirely on the underlying LLM and the number of reasoning–action iterations. Multi-step tasks multiply cost linearly by the number of steps, and the growing context window (accumulated history + tool outputs) increases per-step cost on each iteration. Multi-agent systems with fan-out can parallelize parts of the work, but the critical path remains sequential.
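The interaction between step count and growing context can be made concrete with a back-of-the-envelope cost model; the numbers below are arbitrary illustrations, not measurements.

```python
# Toy cost model: per-step cost is taken as O(L^2 * d) for a vanilla
# Transformer, and L grows each step as history and tool outputs accumulate.

def total_cost(n_steps: int, base_ctx: int, growth_per_step: int, d: int = 1) -> int:
    """Sum of L_i^2 * d over loop steps, with L_i growing per iteration."""
    cost = 0
    for i in range(n_steps):
        L = base_ctx + i * growth_per_step   # accumulated context at step i
        cost += L * L * d
    return cost

# Doubling the steps more than doubles total cost, because later steps
# run with longer contexts.
print(total_cost(5, 1000, 500))    # 22_500_000
print(total_cost(10, 1000, 500))   # 126_250_000
```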
Memory complexity
…
L_ctx = current LLM context window size (in tokens); S_mem = size of external memory store (e.g. vector database) if used.
The memory required by the agentic loop itself is modest (state structures, action history), but grows linearly with the model's context window length. Systems with persistent long-term memory add separate storage costs independent of a single step.
Bottleneck: LLM inference per action step
Each step of the agent loop requires at least one LLM inference call. Multi-step tasks with long context windows and multiple tool calls multiply latency and computational cost linearly.
Parallelism
Conditionally parallel
Parallelism is achievable when subtasks are independent (e.g., parallel web searches, concurrent subagent execution). Sequential loops are required when each step depends on the results of previous tool calls.
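Independent tool calls can be fanned out with ordinary concurrency primitives; here the searches are simulated with sleeps, so the timing only illustrates the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Independent tool calls (e.g. parallel web searches) run concurrently;
# a sleep stands in for network latency.

def fake_search(query: str) -> str:
    time.sleep(0.1)
    return f"results for {query}"

queries = ["agent loops", "tool use", "MCP"]
start = time.time()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(fake_search, queries))
elapsed = time.time() - start

print(results)
print(f"elapsed ~{elapsed:.2f}s (vs ~0.3s sequentially)")
```

When each call depends on the previous observation, this fan-out is impossible and the loop stays sequential.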
Paradigm
Conditional
Input dependent
The execution path is not predetermined — it is determined at runtime through the model's reasoning over accumulated context. Workflows with predefined paths represent a degenerate case.
Toolkit
Critical
web_search + code_execution
Typical of research agents.
file_read + file_write + bash
Typical of coding agents.
The set of external tools available to an agent (web search, code execution, file operations, APIs, browser control). It defines the space of possible actions.
Maximum Number of Steps
Standard
10
Conservative limit for short tasks.
50–200
Used in long-running coding and research agents.
A hard limit on the number of reasoning-action iterations before forced termination. Guards against infinite loops.
Memory Type
Standard
in_context_only
in_context + vector_store
Whether the agent relies solely on in-context memory or also on external persistent storage (vector database, key-value store).
Number of Agents (Single vs. Multi-Agent)
Standard
1
Single-agent loop.
2–10+
Multi-agent orchestrator-worker system.
Whether the system uses a single agent or a network of specialized agents coordinated by an orchestrator.
Human-in-the-Loop Checkpoints
Standard
none
Fully autonomous.
before_irreversible_actions
Recommended for safety-critical deployments.
Whether and at which steps the agent pauses to await human confirmation before taking irreversible actions.
Context Window Size
Standard
128k tokens
1M tokens
Required for very long-term tasks.
The maximum number of tokens processed by the underlying LLM in a single call. This limits the amount of accumulated history, tool outputs, and instructions that can fit within a single inference step.
Common pitfalls
Hallucinations in action
CRITICAL
Model may invoke tools with fabricated parameters or claim to have performed actions it never actually executed — leading to silent failures in multi-step pipelines.
Validate all tool calls against schemas before execution; use deterministic parsers; introduce explicit confirmation steps for irreversible actions.
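A minimal pre-execution check can be hand-rolled; production code would use a full JSON Schema validator, and the `SCHEMA` below is a made-up example for a file tool.

```python
# Validate a model-proposed tool call against a schema before execution.
# Hand-rolled minimal checks; a real system would use a JSON Schema library.

SCHEMA = {
    "required": ["path"],
    "properties": {"path": {"type": "string"}, "recursive": {"type": "boolean"}},
}
TYPE_MAP = {"string": str, "boolean": bool, "integer": int}

def validate_call(args: dict, schema: dict) -> list[str]:
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            errors.append(f"unknown argument: {key}")   # fabricated parameter
        elif not isinstance(value, TYPE_MAP[prop["type"]]):
            errors.append(f"wrong type for {key}")
    return errors

print(validate_call({"path": "/tmp/x", "recursive": True}, SCHEMA))  # []
print(validate_call({"paht": "/tmp/x"}, SCHEMA))  # typo'd key caught
```

Rejecting the call (rather than silently dropping bad arguments) surfaces the hallucination instead of letting it fail downstream.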
Infinite loops
HIGH
Without a hard step limit or an effective termination criterion, an agent can loop indefinitely, consuming computational resources and hitting API rate limits.
Set explicit max_steps limits; implement loop detection based on repeated action signatures; use an evaluator to enforce stopping conditions.
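Loop detection by action signature can be implemented by hashing each (tool, arguments) pair and counting repeats; the threshold of 3 is an arbitrary illustration.

```python
import hashlib
import json

# Loop detection sketch: hash each (tool, args) pair into a signature and
# abort when the same signature repeats too many times.

def action_signature(tool: str, args: dict) -> str:
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def detect_loop(actions: list[tuple[str, dict]], threshold: int = 3) -> bool:
    counts: dict[str, int] = {}
    for tool, args in actions:
        sig = action_signature(tool, args)
        counts[sig] = counts.get(sig, 0) + 1
        if counts[sig] >= threshold:
            return True          # agent is repeating itself: abort
    return False

history = [("web_search", {"q": "x"})] * 3
print(detect_loop(history))      # True
print(detect_loop(history[:2]))  # False
```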
Prompt injection via observed content
CRITICAL
Malicious instructions embedded in tool outputs (web pages, documents, emails) can hijack agent behavior by impersonating system-level instructions.
Isolate untrusted content from system instructions; require explicit user confirmation before acting on instructions found in observed content; apply content filtering.
Context window overflow
HIGH
Accumulated tool outputs and conversation history can exceed the model's context window, causing earlier steps to be silently truncated.
Implement context compaction/summarization; use external memory stores; monitor the token budget at each step.
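Token-budget monitoring can be sketched as follows; the 4-characters-per-token ratio is a crude heuristic (not a tokenizer), and a real system would replace dropped messages with an LLM-generated summary rather than a stub marker.

```python
# Context compaction sketch: approximate token counts and drop the oldest
# history once the budget is exceeded, leaving a stub summary marker.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)            # rough heuristic, not a tokenizer

def compact(history: list[str], budget: int) -> list[str]:
    compacted = list(history)
    while sum(approx_tokens(m) for m in compacted) > budget and len(compacted) > 1:
        compacted = compacted[1:]             # drop the oldest message
    if len(compacted) < len(history):
        compacted = ["[earlier steps compacted]"] + compacted
    return compacted

history = ["observation " + "x" * 28] * 10    # ~10 tokens per message
kept = compact(history, budget=50)
print(len(kept), kept[0])
```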
Tool misuse and irreversible side effects
CRITICAL
Agents with access to write-enabled tools (file deletion, email sending, database writes) can cause real-world harm when acting on faulty reasoning.
Use tool sets with minimal permission scope; require human confirmation for irreversible actions; prefer reversible operations where possible.
Creeping complexity — building agents where a workflow suffices
MEDIUM
Using agentic autonomy for deterministic, well-defined tasks introduces latency, unpredictability, and failure modes that a simple workflow would avoid.
Use predefined workflows by default; introduce agentic autonomy only when a task genuinely requires dynamic decision-making across multiple unpredictable steps.
1995
Russell and Norvig formalize rational agents as entities that perceive their environment and take goal-directed actions. BDI (Belief-Desire-Intention) agent architectures are established.
2022
ReAct: Reasoning + Acting with LLMs
breakthrough
Yao et al. (2022) propose ReAct — interleaving chain-of-thought reasoning traces with action execution in LLMs, demonstrating that language models can serve as a reasoning engine within tool-augmented agentic loops.
2023
API for tool calling and first commercial agentic systems
breakthrough
OpenAI introduced function calling in GPT-4 in June 2023. AutoGPT, BabyAGI, and LangChain agent abstractions gained widespread adoption. The term "Agentic AI" entered common industry usage.
2024
Four Agentic AI Design Patterns by Andrew Ng
Andrew Ng's series of blog posts identifies four fundamental design patterns — Reflection, Tool Use, Planning, and Multi-Agent Collaboration — widely cited as a practical taxonomy of agentic systems.
Anthropic "Building Effective Agents" — compositional patterns for production
breakthrough
Anthropic published practical guidelines distinguishing workflows (predefined paths) from agents (model-driven execution) and formalized five compositional patterns: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer.
Model Context Protocol (MCP) standardizes tool connectivity
Anthropic publishes MCP as an open standard for connecting LLMs to external tool servers, enabling interoperable agentic ecosystems across providers.
2025
Agentic AI in Robotics — Embodied Agent Loops
LLM-based planners drive robotic actions through perception-planning-action loops, extending agentic paradigms to physical systems and connecting Agentic AI with real-world motor execution.
Hardware agnostic
PRIMARY
Agentic AI is an architectural paradigm, not a specific computational kernel. Hardware requirements are entirely determined by the underlying LLM and tools, not by the agent loop itself.
GPU tensor cores or TPU are required by the underlying LLM for efficient inference; the agent orchestration layer (routing, tool calls, memory management) runs on CPU.
BUILT ON
LLM
A Large Language Model (LLM) is a class of machine learning models based on the Transformer architecture, trained on large text datasets via autoregressive language modeling (next-token prediction). These models have billions of parameters and can generate coherent text, answer questions, write code, translate languages, and perform many other language-cognitive tasks without task-specific fine-tuning. The term covers models such as GPT, LLaMA, Gemini, Claude, and Mistral. Most modern LLMs are instruction-tuned (SFT + RLHF) after the pre-training phase.
Chain-of-Thought (CoT) Reasoning is a prompting technique introduced by Wei et al. (2022) in which a large language model is induced to generate a series of intermediate natural-language reasoning steps as part of its output, prior to producing a final answer. The technique was shown to significantly improve LLM performance on arithmetic, commonsense, and symbolic reasoning benchmarks where standard few-shot prompting yields flat or poor results.
In the original formulation (few-shot CoT), a small number of exemplar question-answer pairs are included in the prompt, where each answer consists of a chain of thought followed by the final answer. The model learns from these demonstrations to produce its own reasoning chains. A subsequent zero-shot variant (Kojima et al., 2022) showed that appending the phrase 'Let's think step by step' to a question is sufficient to elicit reasoning chains from large models without any exemplars.
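The zero-shot variant is simple enough to show directly; this only constructs the prompt string in the style of Kojima et al. (2022), with no model call.

```python
# Zero-shot CoT prompt construction (Kojima et al., 2022): append a
# reasoning trigger phrase to the question before sending it to the model.

def zero_shot_cot(question: str) -> str:
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot("If a train travels 60 km in 30 minutes, what is its speed?")
print(prompt)
```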
CoT is an emergent property: empirical results in the originating paper show that reasoning ability via CoT prompting appears only in models above a certain parameter threshold (approximately 100B parameters for the models tested in 2022), with smaller models not benefiting or performing worse. This relationship has been revisited by subsequent work as smaller models have been fine-tuned on CoT data.
Key extensions include Self-Consistency CoT (Wang et al., 2022), which samples multiple reasoning paths and selects the most frequent final answer; Tree of Thoughts (Yao et al., 2023), which frames reasoning as search over a tree of intermediate thoughts; and native reasoning models such as OpenAI o1 (2024) and DeepSeek-R1 (2025), which internalize extended reasoning through reinforcement learning on process reward signals rather than relying on prompting.
Tool-augmented LLM is an architectural pattern in which a large language model is equipped with access to one or more external tools that it can invoke during inference by generating structured function-call or API-call outputs. The model learns when and how to call tools by producing special tokens or structured output (e.g., JSON function calls) that are intercepted by a host runtime, executed against the tool, and whose results are returned to the model as new context for continued generation.
The canonical formalization appeared in the Toolformer paper (Schick et al., Meta AI, 2023), which demonstrated that LLMs can learn to self-supervise their own tool-use through API call annotation without requiring large labeled datasets. Toolformer showed that models trained this way can decide which tools to call, when, and with which arguments, and that tool use substantially improves performance on tasks requiring fresh information, arithmetic, multilingual lookup, and question answering.
The pattern encompasses several mechanisms: (1) in-context tool specification, where tool interfaces are described in the system prompt or context (JSON Schema, OpenAPI, natural language); (2) function calling APIs, where the model produces structured output matched to a defined schema and the host dispatches the call; (3) ReAct-style interleaving, where the model alternates reasoning traces with tool-use observations; and (4) parallel tool calling, where the model emits multiple tool calls simultaneously to be executed concurrently.
Key implementations include OpenAI function calling (GPT-4, June 2023), Anthropic tool use (Claude, 2023), Google Gemini function calling, and the Model Context Protocol (MCP, 2024) which standardizes tool server connectivity.
Retrieval-Augmented Generation (RAG) was introduced by Lewis et al. (2020) as a general-purpose fine-tuning recipe combining pre-trained parametric memory (a seq2seq language model, specifically BART in the original paper) with non-parametric memory (a dense vector index of Wikipedia, accessed via Dense Passage Retrieval, DPR). In the original formulation, both the retriever and the generator are fine-tuned end-to-end: given an input query x, the retriever retrieves top-k documents z from the corpus, and the generator produces an output y conditioned on x and z. Two formulations were proposed: RAG-Sequence (the same retrieved documents condition the full output sequence) and RAG-Token (different documents may be used per generated token, marginalized during generation).
In widespread contemporary usage (post-2022, with the growth of LLM applications), 'RAG' has expanded to describe a broader class of retrieve-then-generate pipelines, typically with a frozen LLM, a vector store containing pre-computed dense embeddings of document chunks, and a retrieval step that fetches top-k relevant chunks based on embedding similarity to the query. The retrieved chunks are appended to the prompt as context before the LLM generates a response. This non-trainable pipeline variant is technically distinct from the original Lewis et al. formulation but is the dominant practical interpretation of RAG as of 2023–2025.
The canonical modern RAG pipeline consists of an offline indexing phase (document chunking, embedding computation, storage in a vector database) and an online query phase (query embedding, approximate nearest neighbor search, context-augmented generation). Key design decisions include: chunk size and overlap, embedding model choice, retrieval strategy (dense, sparse/BM25, or hybrid), number of retrieved documents k, and context integration method (prepend to prompt, cross-attention injection, or fusion-in-decoder).
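The online query phase reduces to similarity search plus prompt assembly. Below is a toy version with hand-made three-dimensional "embeddings"; real pipelines use a learned embedding model and an approximate nearest neighbor index.

```python
import math

# Toy dense retrieval: cosine similarity over hand-made embeddings, then
# prepend the retrieved chunk to the prompt (context-augmented generation).

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

index = {  # chunk -> pre-computed embedding (toy values)
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "The Transformer uses attention.": [0.1, 0.9, 0.1],
    "RAG retrieves then generates.":   [0.0, 0.2, 0.9],
}

def retrieve(query_emb: list[float], k: int = 1) -> list[str]:
    ranked = sorted(index.items(), key=lambda kv: cosine(query_emb, kv[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

query_emb = [0.05, 0.1, 0.95]      # pretend embedding of "what is RAG?"
context = retrieve(query_emb, k=1)
prompt = f"Context: {context[0]}\nQuestion: what is RAG?"
print(prompt)
```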
RAG addresses two fundamental limitations of parametric-only LLMs: the knowledge cutoff problem (inability to access post-training information) and hallucination (generation of factually incorrect content). However, RAG introduces its own failure modes, including retrieval of irrelevant or misleading context and the LLM's susceptibility to being distracted by retrieved content that contradicts its parametric knowledge.
Model Context Protocol (MCP) is an open protocol developed by Anthropic and released in November 2024. It addresses the M×N integration problem in AI systems: connecting M different LLM applications to N different external tools previously required M×N bespoke connectors. MCP defines a standardized client-host-server architecture where hosts (LLM applications) manage one or more clients, each maintaining a stateful session with a specific server. Servers expose capabilities as three primitives: Resources (structured data for context), Prompts (templated instructions), and Tools (executable functions). Clients expose two primitives: Roots (filesystem entry points) and Sampling (server-initiated LLM completions). Communication is based on JSON-RPC 2.0. Capability negotiation occurs at session initialization. The protocol is transport-agnostic and has been implemented in Python, TypeScript, C#, and Java SDKs. In December 2025, Anthropic donated MCP governance to the Agentic AI Foundation (AAIF) under the Linux Foundation.
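The wire format can be illustrated with a single tool invocation. The `jsonrpc`/`method`/`params` envelope follows JSON-RPC 2.0 and the `tools/call` method name follows the MCP specification; the `web_search` tool and its arguments are made up.

```python
import json

# Shape of an MCP tool invocation as a JSON-RPC 2.0 request.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",                     # MCP tool-invocation method
    "params": {
        "name": "web_search",                   # hypothetical tool name
        "arguments": {"query": "agentic AI"},
    },
}
wire = json.dumps(request)                      # serialized for transport
decoded = json.loads(wire)
print(decoded["method"], decoded["params"]["name"])
```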
Multi-Agent Systems (MAS) are a paradigm in Distributed Artificial Intelligence in which multiple autonomous software entities — agents — interact within a shared environment to achieve individual or collective goals. Each agent perceives its environment through sensors or interfaces, reasons about its state, and acts through actuators or API calls. In the context of LLM-based MAS (emerging prominently from 2023 onward), agents are powered by large language models that provide the cognitive core (planning, reasoning, natural language communication), supplemented by memory modules, tool-use interfaces, and role-specific prompts. The system architecture defines how agents coordinate: coordination topologies include sequential pipelines, hierarchical orchestration (orchestrator-worker), parallel fan-out/fan-in, publish-subscribe messaging, and decentralized peer-to-peer communication. Core agent properties, as defined by Wooldridge and Jennings (1995), include autonomy, social ability, reactivity, and pro-activeness. In LLM-based systems, key components are: the agent (an LLM with a system prompt defining its role), a communication channel (natural language messages, structured function calls, or shared memory), an orchestrator or coordinator (managing task decomposition, routing, and state), tool-use interfaces (external APIs, code execution, web search), and a memory subsystem (short-term context, long-term vector storage). Prominent frameworks implementing LLM-based MAS include AutoGen (Microsoft, 2023), CAMEL (2023), MetaGPT (2023), CrewAI, and LangGraph.
A reasoning model (also: large reasoning model, LRM, reasoning language model, RLM) is a type of large language model that has been specifically post-trained to solve complex multi-step problems by explicitly generating intermediate reasoning steps before committing to a final response. Unlike standard LLMs that generate a direct response in a single forward pass, reasoning models allocate additional computation at inference time — a property known as test-time compute scaling — by producing a long internal chain of thought (CoT). The reasoning trace typically includes steps such as problem decomposition, hypothesis generation, self-verification, reflection, and correction of errors.
The defining characteristics of reasoning models are: (1) post-training via large-scale reinforcement learning (RL) using reward signals based on final answer correctness (and sometimes intermediate step quality via process reward models); (2) the emergence of extended, often hidden, reasoning traces that precede the final answer; (3) a consistent empirical relationship between the length or computational budget allocated to the reasoning trace and final answer quality (test-time scaling law); (4) superior performance on verifiable tasks requiring multi-step logic, such as mathematics, competitive programming, and scientific reasoning.
The term 'reasoning model' was introduced as a product category by OpenAI in September 2024 with the release of the o1-preview model. OpenAI described o1 as trained via a large-scale RL algorithm teaching the model to use chain of thought productively. The approach does not rely on explicit tree search algorithms; instead, implicit search emerges via RL-trained CoT generation. In January 2025, DeepSeek published the first detailed open technical description of this class of models in the DeepSeek-R1 paper (arXiv:2501.12948), demonstrating that reasoning capabilities can be incentivized via pure RL without supervised fine-tuning, using Group Relative Policy Optimization (GRPO) as the RL framework.
Reasoning models typically employ the same base Transformer decoder architecture as standard LLMs, with the key difference residing entirely in the post-training pipeline: RL replaces or augments standard RLHF/SFT, and reward signals are grounded in verifiable outcomes. The resulting models generate substantially longer token sequences during inference (reasoning tokens), which are often hidden from end users but incur real compute costs. Performance consistently improves with both more training-time RL compute and more inference-time thinking budget.