Shifts the software delivery model from click-driven web applications to agent-based services, where the client defines a goal in natural language and the provider delivers an autonomous agent that executes the work end-to-end.
Category: Abstraction level / Operation level
Year: 2026
Customer experience: autonomously built and optimized customer service agents
Voice agents: building and maintaining voice agents across multiple languages
Knowledge work automation: delegating entire workflows rather than individual subtasks
Outcome-based billing: charging per result (resolved ticket, closed case) instead of per seat
Headless platform deployment: platforms refactored as programmatic infrastructure for agents
The client defines a goal or uploads source materials (procedures, call transcripts, recordings, documentation). The provider's agent (e.g., Ghostwriter) analyzes this data, identifies key behaviors and edge cases, generates a production-ready agent, and configures it across multiple channels (voice, chat, email) with built-in safeguards. A continuous improvement loop then analyzes real interactions, proposes enhancements, tests them in a sandbox, and prepares them for human approval. All interaction with the agent occurs in natural language; no UI clicks are required.
The traditional SaaS model requires a human to learn an application's interface and perform the work step by step through manual interaction. Throughput therefore scales linearly with the number of users and is bounded by their cognitive limits. AaaS addresses this by delegating execution to an AI agent, reducing human involvement to defining the goal and accepting the result.
01
Agent Scaffolding
Provides the agent with the execution scaffold required for production operation.
Layer that provides the agent with tools, memory, planning, a coherent action space, and the right task context. Without it, the agent has no stable foundation for production operation.
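A minimal sketch of what such a scaffolding layer bundles; the names (Tool, AgentScaffold) are illustrative assumptions, not any vendor's actual API:

```python
# Illustrative scaffolding sketch: tools, memory, and task context
# bundled into one execution foundation for the agent.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str           # shown to the LLM so it can choose actions
    run: Callable[..., str]    # executes the action against the platform

@dataclass
class AgentScaffold:
    tools: dict[str, Tool]                           # coherent action space
    memory: list[str] = field(default_factory=list)  # persistent context
    task_context: str = ""                           # goal + source materials

    def act(self, tool_name: str, args: str) -> str:
        """Dispatch one action and record the observation in memory."""
        observation = self.tools[tool_name].run(args)
        self.memory.append(f"{tool_name}({args!r}) -> {observation}")
        return observation
```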
02
Headless Infrastructure
Enables the agent to control the platform without a UI.
Refactoring of the SaaS platform so that all functions are accessible programmatically (API, SDK) without UI dependency. Lets the agent invoke the platform directly instead of emulating clicks.
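A sketch of what headless access looks like from the agent's side, assuming a generic HTTP API; the class name and endpoint path are invented for illustration, not a real SDK:

```python
# Hypothetical headless client: every platform capability is one
# programmatic call, so the agent never emulates clicks in a browser.
import json
from urllib import request

class HeadlessPlatform:
    def __init__(self, base_url: str, api_key: str):
        self.base_url, self.api_key = base_url, api_key

    def call(self, endpoint: str, payload: dict) -> dict:
        """Invoke one platform function directly over the API."""
        req = request.Request(
            f"{self.base_url}/{endpoint}",
            data=json.dumps(payload).encode(),
            headers={"Authorization": f"Bearer {self.api_key}",
                     "Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.load(resp)

# e.g. platform.call("tickets/resolve", {"ticket_id": "T-123"})
# replaces a multi-step click path through the UI.
```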
03
Validation Environment
Secure validation of changes prior to deployment.
Modular
Isolated space where the agent builds and tests changes before deploying to production. Critical for safe autonomy: the agent can experiment without risking damage to the running system.
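A toy illustration of the isolation property, with invented configuration keys and tests; the point is that the candidate change only ever touches a deep copy:

```python
# Sandbox sketch: apply a change to an isolated copy, run regression
# tests, and hand back a candidate only if everything passes.
import copy

def validate_in_sandbox(prod_config: dict, change: dict, tests: list):
    sandbox = copy.deepcopy(prod_config)   # isolation: prod is never mutated
    sandbox.update(change)
    if all(test(sandbox) for test in tests):
        return sandbox                     # candidate ready for review
    return None                            # rejected; prod unchanged

prod = {"greeting": "Hello!", "escalation_threshold": 0.8}
candidate = validate_in_sandbox(
    prod,
    {"escalation_threshold": 0.6},
    tests=[lambda cfg: 0.0 < cfg["escalation_threshold"] < 1.0],
)
assert prod["escalation_threshold"] == 0.8  # production untouched
```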
04
Agent Assembly Line Loop
Autonomous continuous improvement of production agents.
Modular
Cycle of analyzing real interactions, identifying improvement opportunities, validating them, and preparing them for review. Runs autonomously in the background and lets agents improve over time.
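The cycle reduces to a short loop; analyze, propose_fix, sandbox_passes, and queue_for_review are hypothetical stand-ins for whatever analysis and evaluation machinery a provider actually runs:

```python
# Assembly-line sketch: real interactions in, reviewed proposals out.
def assembly_line_cycle(interactions, analyze, propose_fix,
                        sandbox_passes, queue_for_review):
    for issue in analyze(interactions):    # mine real transcripts for gaps
        change = propose_fix(issue)        # draft an improvement
        if sandbox_passes(change):         # validate in isolation first
            queue_for_review(change)       # a human approves before deploy
```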
05
Human Review Checkpoint
Human oversight of irreversible changes.
Modular
Gate where changes prepared by the agent are approved by a human before deployment. Forms the foundation of trust and accountability in production AaaS deployments.
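A minimal sketch of the gate itself, assuming a queue of prepared changes and an approve callback standing in for the human decision:

```python
# Review-gate sketch: nothing the agent prepared reaches production
# without an explicit human decision; rejections are kept for audit.
def review_checkpoint(pending_changes: list[dict], approve, deploy):
    for change in pending_changes:
        if approve(change):                # human decision, e.g. a dashboard
            deploy(change)                 # only approved changes go live
        else:
            change["status"] = "rejected"  # retained for the audit trail
```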
Parallelism
Conditionally parallel
Parallelism occurs primarily across clients (different clients, different production agents) and within multi-agent subsystems (e.g., concurrent sandbox tests).
Paradigm
Conditional
Input-dependent
AaaS is a delivery paradigm, not a specific computational kernel. The execution mode is inherited from the underlying Agentic AI: conditional loops driven by an LLM over the platform's headless infrastructure.
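A compact sketch of that inherited execution mode, with llm_decide and platform as stand-ins rather than any specific vendor API:

```python
# Conditional agent loop: the LLM picks the next headless call or stops,
# under a hard step budget that guards against infinite loops.
def run_agent(goal: str, llm_decide, platform, max_steps: int = 20):
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        decision = llm_decide(history)     # e.g. {"tool": ..., "args": ...}
        if decision["tool"] == "finish":
            return decision["args"]        # the agent chose to stop
        observation = platform.call(decision["tool"], decision["args"])
        history.append(f"{decision} -> {observation}")
    raise TimeoutError("step budget exhausted")
```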
Agent Autonomy Scope
Critical
proposal_only: Agent prepares changes, a human approves them.
auto_with_rollback: Automated deployments with rollback capability.
Scope of decisions the agent can make without human approval, ranging from proposals requiring review to fully autonomous deployments.
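Read as a policy gate, the two values above translate into a few lines of deployment logic; further scopes would slot in as additional enum members:

```python
# Autonomy-scope sketch using the parameter values from this section.
from enum import Enum

class AutonomyScope(Enum):
    PROPOSAL_ONLY = "proposal_only"            # human approves every change
    AUTO_WITH_ROLLBACK = "auto_with_rollback"  # deploy, keep an undo path

def may_deploy(scope: AutonomyScope, human_approved: bool) -> bool:
    if scope is AutonomyScope.PROPOSAL_ONLY:
        return human_approved                  # gate on explicit approval
    return True                                # rollback is the safety net
```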
Billing Model
Standard
outcome_based
consumption_based
seat_based
Billing model: per seat (SaaS-like), per usage (per token / call), or per outcome (resolved case, closed ticket).
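A toy comparison of the three models on one month of usage; every rate and volume below is invented purely for illustration:

```python
# Billing sketch: the same month of activity priced three ways.
seats, resolved_cases, tokens_used = 10, 4_200, 38_000_000

invoices = {
    "seat_based":        seats * 95.00,               # SaaS-like, per user
    "consumption_based": tokens_used / 1_000 * 0.01,  # per token / call
    "outcome_based":     resolved_cases * 0.99,       # per resolved case
}
# Only the outcome-based line moves one-for-one with the business result
# the customer is actually buying.
```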
Channel Range
Standard
chat_only
voice + chat + email + 30+ languages
Number and type of channels the agent operates on: chat, voice, email, video, messengers.
Agent input modality
Standard
natural_language_prompt
multimodal (SOP + transcripts + audio + images)
Format in which the customer defines what they expect from the agent: text prompt, process documentation, transcript, audio recording, whiteboard photo.
Attempts to implement AaaS on a UI-first platform (clicks, forms) result in an agent that emulates a human user, which is unstable and slow. Without refactoring the platform into headless infrastructure, the paradigm does not work.
Refactor the platform to an API-first/headless architecture before building an orchestrating agent. Sierra's Ghostwriter was preceded by exactly such a platform rearchitecture.
Mismatched billing model
HIGH
Using a seat-based billing model for AaaS weakens the value proposition: the customer doesn't buy seats, they buy outcomes. Misaligned billing obscures ROI and slows adoption.
Prefer outcome-based (per resolved case) or consumption-based billing; measure and report business results, not agent activity.
Lack of autonomy and trust progression
HIGH
Full agent autonomy from day one is risky: without an initial supervised phase and incremental delegation of permissions, the customer won't build trust, and agent errors can undermine the whole contract.
Begin in proposal-only mode (agent prepares, human approves); expand autonomy based on measured quality and trust metrics.
Absence of continuous production agent evaluation
MEDIUM
AaaS relies on continuous improvement. Without automated evaluation of real interactions, the agent assembly line has no signal: the agent stops learning and drifts out of sync with the business as it changes.
Embed automated regression detection, trend exploration (analogous to Deep Research), and periodic sandbox A/B testing.
Using AaaS where SaaS would suffice
MEDIUM
For deterministic, well-defined tasks (forms, simple CRUD), AaaS adds LLM cost, latency, and unpredictability. Traditional SaaS is often faster, cheaper, and more predictable.
Adopt AaaS only when tasks are variable, unpredictable, and require reasoning over context; for stable workflows, retain SaaS.
1999
Software as a Service: the pattern AaaS reacts against
Salesforce introduces the SaaS model based on a centrally hosted, human-operated web application. It defines the 'customer buys access to an interface' paradigm against which AaaS will later position itself.
2022
ReAct: technical foundation of the LLM agent loop
Yao et al. (2022) show that an LLM can act as the reasoning engine in loops that interleave thoughts with tool actions. This is the technical substrate on which AaaS becomes possible.
2024
Anthropic: compositional patterns for production agents
Anthropic publishes guidelines distinguishing workflows from agents and formalizing five composition patterns. Practitioners gain a common vocabulary that will later ease AaaS commercialization.
2026
Sierra publishes 'Agents as a Service' manifesto and launches Ghostwriter
breakthrough
On March 25, 2026, Bret Taylor and Clay Bavor (Sierra co-founders) publish the Agents as a Service manifesto and introduce Ghostwriter, an agent that builds agents. They coin the phrase 'prompts, not clicks' and the 'agent assembly line' concept. This is the moment the term enters public industry discourse.
AaaS is a delivery model, not a computational kernel. Hardware requirements stem from the underlying LLMs and platform tools, not from the service paradigm itself.
In practice, AaaS providers run LLM inference on GPU tensor cores or TPUs; the orchestration layer and headless API run on CPU.
BUILT ON
LLM
A Large Language Model (LLM) is a class of machine learning models based on the Transformer architecture, trained on large text datasets via autoregressive language modeling (next-token prediction). These models have billions of parameters and can generate coherent text, answer questions, write code, translate languages, and perform many other language-cognitive tasks without task-specific fine-tuning. The term covers models such as GPT, LLaMA, Gemini, Claude, and Mistral. Most modern LLMs are instruction-tuned (SFT + RLHF) after the pre-training phase.
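As a compact reference, the autoregressive objective mentioned above factorizes sequence probability into next-token predictions (standard notation, not specific to any of the models listed):

$$p_\theta(x_1, \ldots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t})$$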
Agentic AI denotes an architectural transition from single-turn, stateless generative models toward goal-directed systems capable of autonomous perception, planning, action, and adaptation through iterative control loops. An agentic system wraps a large language model in a runtime that gives the model access to tools (web search, code execution, APIs, file I/O), persistent memory, and feedback from prior steps. The model then decides dynamically which tools to call, in what order, and whether to loop or stop, rather than following a predefined code path.
Two primary system types are commonly distinguished: (1) Workflows, in which LLMs and tools are orchestrated through predefined code paths, and (2) Agents, in which the LLM itself directs its process and tool usage dynamically. Both can be composed into multi-agent systems where specialized agents collaborate, with one acting as orchestrator and others as subagents. Key design patterns identified by Anthropic (2024) include prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer loops. Andrew Ng's 2024 taxonomy describes four foundational patterns: Reflection, Tool Use, Planning, and Multi-Agent Collaboration.
Formal frameworks model agentic control loops as Partially Observable Markov Decision Processes (POMDPs). The control loop is: perceive state → reason/plan → select action → execute tool → observe result → update state → repeat. Agentic systems introduce risks not present in single-turn models, including hallucination in action, prompt injection through observed content, infinite loops, reward hacking, and tool misuse.
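For reference, the belief update at the heart of that POMDP formalization, in textbook form (not tied to any particular agent framework): after taking action $a$ and observing $o$,

$$b'(s') = \eta \, O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s),$$

where $T$ is the transition model, $O$ the observation model, and $\eta$ a normalizing constant.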
Tool-augmented LLM is an architectural pattern in which a large language model is equipped with access to one or more external tools that it can invoke during inference by generating structured function-call or API-call outputs. The model learns when and how to call tools by producing special tokens or structured output (e.g., JSON function calls) that are intercepted by a host runtime, executed against the tool, and whose results are returned to the model as new context for continued generation.
The canonical formalization appeared in the Toolformer paper (Schick et al., Meta AI, 2023), which demonstrated that LLMs can learn to self-supervise their own tool-use through API call annotation without requiring large labeled datasets. Toolformer showed that models trained this way can decide which tools to call, when, and with which arguments, and that tool use substantially improves performance on tasks requiring fresh information, arithmetic, multilingual lookup, and question answering.
The pattern encompasses several mechanisms: (1) in-context tool specification, where tool interfaces are described in the system prompt or context (JSON Schema, OpenAPI, natural language); (2) function calling APIs, where the model produces structured output matched to a defined schema and the host dispatches the call; (3) ReAct-style interleaving, where the model alternates reasoning traces with tool-use observations; and (4) parallel tool calling, where the model emits multiple tool calls simultaneously to be executed concurrently.
Key implementations include OpenAI function calling (GPT-4, June 2023), Anthropic tool use (Claude, 2023), Google Gemini function calling, and the Model Context Protocol (MCP, 2024), which standardizes tool server connectivity.
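A generic host-side dispatch for this pattern; the JSON shape is a simplified assumption rather than any specific vendor's schema:

```python
# Function-calling sketch: intercept the model's structured output,
# execute the named tool, and return the result as new context.
import json

TOOLS = {"get_weather": lambda city: f"18C and clear in {city}"}

def dispatch(model_output: str) -> str:
    call = json.loads(model_output)        # {"name": ..., "arguments": ...}
    return TOOLS[call["name"]](**call["arguments"])

# A model emitting the call below would get the observation back as
# fresh context for continued generation.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```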
Multi-Agent Systems (MAS) are a paradigm in Distributed Artificial Intelligence in which multiple autonomous software entities (agents) interact within a shared environment to achieve individual or collective goals. Each agent perceives its environment through sensors or interfaces, reasons about its state, and acts through actuators or API calls. In the context of LLM-based MAS (emerging prominently from 2023 onward), agents are powered by large language models that provide the cognitive core (planning, reasoning, natural language communication), supplemented by memory modules, tool-use interfaces, and role-specific prompts.
The system architecture defines how agents coordinate: coordination topologies include sequential pipelines, hierarchical orchestration (orchestrator-worker), parallel fan-out/fan-in, publish-subscribe messaging, and decentralized peer-to-peer communication. Core agent properties, as defined by Wooldridge and Jennings (1995), include autonomy, social ability, reactivity, and pro-activeness.
In LLM-based systems, key components are: the agent (an LLM with a system prompt defining its role), a communication channel (natural language messages, structured function calls, or shared memory), an orchestrator or coordinator (managing task decomposition, routing, and state), tool-use interfaces (external APIs, code execution, web search), and a memory subsystem (short-term context, long-term vector storage). Prominent frameworks implementing LLM-based MAS include AutoGen (Microsoft, 2023), CAMEL (2023), MetaGPT (2023), CrewAI, and LangGraph.
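A minimal orchestrator-worker sketch; the roles, splitter, and merge step are illustrative assumptions, not any framework's API:

```python
# Fan-out / fan-in: an orchestrator decomposes the task, routes subtasks
# to role-specialized workers, and merges their results.
def orchestrate(task: str, split, workers: dict, merge):
    results = [workers[role](subtask) for role, subtask in split(task)]
    return merge(results)

summary = orchestrate(
    "launch report",
    split=lambda t: [("researcher", t), ("writer", t)],
    workers={"researcher": lambda t: f"facts({t})",
             "writer":     lambda t: f"draft({t})"},
    merge=lambda parts: " | ".join(parts),
)
```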
Model Context Protocol (MCP) is an open protocol developed by Anthropic and released in November 2024. It addresses the M×N integration problem in AI systems: connecting M different LLM applications to N different external tools previously required M×N bespoke connectors. MCP defines a standardized client-host-server architecture where hosts (LLM applications) manage one or more clients, each maintaining a stateful session with a specific server. Servers expose capabilities as three primitives: Resources (structured data for context), Prompts (templated instructions), and Tools (executable functions). Clients expose two primitives: Roots (filesystem entry points) and Sampling (server-initiated LLM completions). Communication is based on JSON-RPC 2.0. Capability negotiation occurs at session initialization. The protocol is transport-agnostic and has been implemented in Python, TypeScript, C#, and Java SDKs. In December 2025, Anthropic donated MCP governance to the Agentic AI Foundation (AAIF) under the Linux Foundation.
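A sketch of a tools/list plus tools/call exchange written as Python dicts; the method names follow the published protocol, while the search_docs tool and its arguments are hypothetical:

```python
# MCP-style JSON-RPC 2.0 messages (as dicts; serialize with json.dumps).
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "search_docs",              # hypothetical tool
               "arguments": {"query": "refund policy"}},
}
# The server replies to each request with a matching-id result object,
# e.g. {"jsonrpc": "2.0", "id": 2, "result": {"content": [...]}}.
```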
Retrieval-Augmented Generation (RAG) was introduced by Lewis et al. (2020) as a general-purpose fine-tuning recipe combining pre-trained parametric memory (a seq2seq language model, specifically BART in the original paper) with non-parametric memory (a dense vector index of Wikipedia, accessed via Dense Passage Retrieval, DPR). In the original formulation, both the retriever and the generator are fine-tuned end-to-end: given an input query x, the retriever retrieves top-k documents z from the corpus, and the generator produces an output y conditioned on x and z. Two formulations were proposed: RAG-Sequence (the same retrieved documents condition the full output sequence) and RAG-Token (different documents may be used per generated token, marginalized during generation).
In widespread contemporary usage (post-2022, with the growth of LLM applications), 'RAG' has expanded to describe a broader class of retrieve-then-generate pipelines, typically with a frozen LLM, a vector store containing pre-computed dense embeddings of document chunks, and a retrieval step that fetches top-k relevant chunks based on embedding similarity to the query. The retrieved chunks are appended to the prompt as context before the LLM generates a response. This non-trainable pipeline variant is technically distinct from the original Lewis et al. formulation but is the dominant practical interpretation of RAG as of 2023β2025.
The canonical modern RAG pipeline consists of an offline indexing phase (document chunking, embedding computation, storage in a vector database) and an online query phase (query embedding, approximate nearest neighbor search, context-augmented generation). Key design decisions include: chunk size and overlap, embedding model choice, retrieval strategy (dense, sparse/BM25, or hybrid), number of retrieved documents k, and context integration method (prepend to prompt, cross-attention injection, or fusion-in-decoder).
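A minimal end-to-end sketch of those two phases under stated assumptions: embed() is a toy letter-frequency stand-in for a real embedding model, and retrieval is a plain dot product over its unit vectors:

```python
import math

def embed(text: str) -> list[float]:       # toy stand-in embedding model
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Offline indexing phase: chunk documents, embed, store.
chunks = ["Refunds are issued within 14 days.",
          "Support is available in 30 languages."]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Online query phase: embed the query, retrieve top-k, augment the prompt.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item:
                    -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in ranked[:k]]

question = "How fast are refunds?"
prompt = f"Context: {retrieve(question)}\n\nQuestion: {question}"
```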
RAG addresses two fundamental limitations of parametric-only LLMs: the knowledge cutoff problem (inability to access post-training information) and hallucination (generation of factually incorrect content). However, RAG introduces its own failure modes, including retrieval of irrelevant or misleading context and the LLM's susceptibility to being distracted by retrieved content that contradicts its parametric knowledge.