Robots Atlas

Agents as a Service

Shifts the software delivery model from click-driven web applications to agent-based services, where the client defines a goal in natural language and the provider delivers an autonomous agent that executes the work end-to-end.

Category
Abstraction level
Operation level
  • Customer experience – autonomously built and optimized customer service agents
  • Voice agents – building and maintaining voice agents across multiple languages
  • Knowledge work automation – delegating entire workflows rather than individual subtasks
  • Outcome-based billing – charging per result (resolved ticket, closed case) instead of per seat
  • Headless platform deployment – platforms refactored as programmatic infrastructure for agents

The client defines a goal or uploads source materials (procedures, call transcripts, recordings, documentation). The provider's agent (e.g., Ghostwriter) analyzes this data, identifies key behaviors and edge cases, generates a production-ready agent, and configures it across multiple channels (voice, chat, email) with built-in safeguards. A continuous improvement loop then analyzes real interactions, proposes enhancements, tests them in a sandbox, and prepares them for human approval. All interaction with the agent occurs in natural language — no UI clicks required.
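The delivery flow above can be sketched end-to-end. This is a minimal illustration, not Sierra's actual pipeline; all names (`analyze_sources`, `build_agent`, `AgentSpec`) are hypothetical stand-ins for the LLM-driven steps.

```python
# Hypothetical sketch of the AaaS delivery flow: goal + source materials in,
# a configured agent spec out. LLM analysis is stubbed with string handling.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    behaviors: list
    channels: list = field(default_factory=lambda: ["voice", "chat", "email"])
    guardrails: list = field(default_factory=lambda: ["content_filter"])

def analyze_sources(materials: list) -> list:
    """Stand-in for LLM analysis: derive one behavior per source document."""
    return [f"handle:{m}" for m in materials]

def build_agent(goal: str, materials: list) -> AgentSpec:
    """Turn a natural-language goal plus uploaded materials into an agent spec."""
    return AgentSpec(behaviors=[goal] + analyze_sources(materials))

spec = build_agent("resolve billing tickets", ["refund_sop", "call_transcripts"])
```

The point of the sketch is the interface: the client supplies only the goal and the materials; everything after `build_agent` is the provider's responsibility.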

The traditional SaaS model requires a human to learn an application's interface and perform work step by step through manual interaction. This scales linearly with the number of users and their cognitive constraints. AaaS addresses this by delegating execution to an AI agent, reducing human interaction to defining the goal and accepting the result.

01

Agent Scaffolding

Provides the agent with the execution scaffold required for production operation.

Layer that provides the agent with tools, memory, planning, a coherent action space, and the right task context. Without it, the agent has no stable foundation for production operation.
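The scaffold layer can be pictured as a small runtime that binds tools and memory into one action space. A minimal sketch with hypothetical names (`Scaffold`, `act`); a production scaffold would add planning and context management on top.

```python
# Minimal agent-scaffold sketch: a tool registry (the action space) plus
# episodic memory of every step taken. Planning and context are omitted.
class Scaffold:
    def __init__(self, tools: dict):
        self.tools = tools          # name -> callable: the agent's action space
        self.memory = []            # episodic memory of (tool, args, result)

    def act(self, tool: str, **kwargs):
        result = self.tools[tool](**kwargs)
        self.memory.append((tool, kwargs, result))
        return result

scaffold = Scaffold({"lookup": lambda key: {"order_42": "shipped"}.get(key, "unknown")})
status = scaffold.act("lookup", key="order_42")
```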

02

Headless Infrastructure

Enables the agent to control the platform without a UI.

Refactoring of the SaaS platform so that all functions are accessible programmatically (API, SDK) without UI dependency. Lets the agent invoke the platform directly instead of emulating clicks.
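The contrast with UI emulation can be shown in a few lines: every action the UI offers also exists as a first-class method the agent can call. `TicketPlatform` and its methods are illustrative, not any vendor's API.

```python
# Headless-surface sketch: plain programmatic methods with no UI dependency,
# so an agent invokes the platform directly instead of emulating clicks.
class TicketPlatform:
    def __init__(self):
        self.tickets = {}

    def create_ticket(self, ticket_id: str, body: str) -> dict:
        self.tickets[ticket_id] = {"body": body, "status": "open"}
        return self.tickets[ticket_id]

    def close_ticket(self, ticket_id: str) -> dict:
        self.tickets[ticket_id]["status"] = "closed"
        return self.tickets[ticket_id]

platform = TicketPlatform()
platform.create_ticket("T-1", "refund request")
done = platform.close_ticket("T-1")
```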

03

Validation Environment

Secure validation of changes prior to deployment

Modular

Isolated space where the agent builds and tests changes before deploying to production. Critical for safe autonomy — the agent can experiment without risking damage to the running system.
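The isolation property can be sketched concretely: the agent mutates a deep copy of production state, validation runs against the copy, and production changes only on success. `try_in_sandbox` and the check function are illustrative assumptions.

```python
# Sandbox sketch: changes are applied to a deep copy and checked there;
# production state is never touched unless the check passes.
import copy

def try_in_sandbox(production: dict, change: dict, check) -> dict:
    sandbox = copy.deepcopy(production)
    sandbox.update(change)
    if check(sandbox):
        return sandbox          # validated candidate, ready to promote
    return production           # change rejected; production untouched

prod = {"greeting": "Hello", "max_refund": 50}
ok = try_in_sandbox(prod, {"max_refund": 100}, lambda c: c["max_refund"] <= 200)
bad = try_in_sandbox(prod, {"max_refund": 9999}, lambda c: c["max_refund"] <= 200)
```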

04

Agent Assembly Line Loop

Autonomous continuous improvement of production agents

Modular

Cycle of analyzing real interactions, identifying improvement opportunities, validating them, and preparing them for review. Runs autonomously in the background and lets agents improve over time.
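One pass of that cycle can be sketched as analyze → propose → validate → queue for review. The interaction-log format and the validation stand-in are illustrative assumptions, not Sierra's implementation.

```python
# One assembly-line pass: unresolved interactions yield proposals, a stand-in
# validation filters them, and survivors are queued for human review.
def assembly_line_pass(interactions: list) -> list:
    review_queue = []
    for log in interactions:
        if log["resolved"]:
            continue                        # analyze: only failures yield proposals
        proposal = {"fix_intent": log["intent"]}
        if len(log["intent"]) > 0:          # stand-in for sandbox validation
            review_queue.append(proposal)   # prepared for human approval
    return review_queue

queue = assembly_line_pass([
    {"intent": "refund", "resolved": True},
    {"intent": "warranty", "resolved": False},
])
```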

05

Human Review Checkpoint

Human oversight of irreversible changes

Modular

Gate where changes prepared by the agent are approved by a human before deployment. Forms the foundation of trust and accountability in production AaaS deployments.
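The gate itself is a simple control-flow pattern: nothing deploys without an explicit human decision. A minimal sketch with hypothetical names (`review_gate`, the `approve` callback).

```python
# Approval-gate sketch: agent-prepared changes reach production only after
# a human decision, here modeled as an approve() callback.
def review_gate(proposals: list, approve):
    deployed, rejected = [], []
    for proposal in proposals:
        (deployed if approve(proposal) else rejected).append(proposal)
    return deployed, rejected

deployed, rejected = review_gate(
    [{"id": 1, "risk": "low"}, {"id": 2, "risk": "high"}],
    approve=lambda p: p["risk"] == "low",
)
```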

Parallelism

Conditionally parallel

Parallelism occurs primarily across clients (different clients, different production agents) and within multi-agent subsystems (e.g., concurrent sandbox tests).

Paradigm

Conditional

Input dependent

AaaS is a delivery paradigm, not a specific computational kernel. The execution mode is inherited from the underlying Agentic AI: conditional loops driven by an LLM over the platform's headless infrastructure.
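The inherited execution mode — a conditional loop where the model picks the next tool call until it decides to stop — can be sketched with the LLM stubbed by a scripted function. All names are illustrative.

```python
# Conditional-loop sketch: the model (here a scripted stub standing in for an
# LLM) chooses tool calls over the headless surface until it emits "stop".
def run_loop(model, tools: dict, goal: str, max_steps: int = 10) -> list:
    trace, observation = [], goal
    for _ in range(max_steps):
        action, arg = model(observation)
        if action == "stop":
            break
        observation = tools[action](arg)    # execute tool, feed result back
        trace.append((action, observation))
    return trace

def scripted_model(observation):
    return ("lookup", "order_42") if observation == "check order" else ("stop", None)

trace = run_loop(scripted_model, {"lookup": lambda k: f"{k}:shipped"}, "check order")
```

Replacing `scripted_model` with a real LLM call yields the ReAct-style loop the paradigm inherits.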

Agent Autonomy Scope

Critical
  • proposal_only – Agent prepares changes, a human approves them.
  • auto_with_rollback – Automated deployments with rollback capability.

Scope of decisions the agent can make without human approval — from proposals requiring review to full autonomous deployments.
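The scope values above behave like a routing switch on every prepared change. A minimal sketch; the enum values mirror the list, while `handle_change` and its return strings are hypothetical.

```python
# Autonomy-scope sketch: the same change is routed to different gates
# depending on the configured scope.
from enum import Enum

class AutonomyScope(Enum):
    PROPOSAL_ONLY = "proposal_only"
    AUTO_WITH_ROLLBACK = "auto_with_rollback"

def handle_change(scope: AutonomyScope, change: dict) -> str:
    if scope is AutonomyScope.PROPOSAL_ONLY:
        return "queued_for_human_review"
    return "deployed_with_rollback_point"

status = handle_change(AutonomyScope.PROPOSAL_ONLY, {"id": 1})
```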

Billing Model

Standard
  • outcome_based
  • consumption_based
  • seat_based

Billing model: per seat (SaaS-like), per usage (per token / call), or per outcome (resolved case, closed ticket).
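The three models price the same month of activity very differently, which is why the choice matters. A toy comparison; all prices are invented for illustration.

```python
# Toy billing comparison: the same month of activity priced three ways.
# All rates are made-up illustrations, not real pricing.
def seat_based(seats: int, price_per_seat: float) -> float:
    return seats * price_per_seat

def consumption_based(tokens: int, price_per_1k: float) -> float:
    return tokens / 1000 * price_per_1k

def outcome_based(resolved_cases: int, price_per_case: float) -> float:
    return resolved_cases * price_per_case

month = {"seats": 10, "tokens": 2_000_000, "resolved": 500}
bills = {
    "seat": seat_based(month["seats"], 99.0),
    "consumption": consumption_based(month["tokens"], 0.50),
    "outcome": outcome_based(month["resolved"], 2.0),
}
```

Only the outcome figure moves in lockstep with delivered business value; the seat figure is fixed regardless of what the agent accomplishes.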

Channel Range

Standard
  • chat_only
  • voice + chat + email + 30+ languages

Number and type of channels the agent operates on: chat, voice, email, video, messengers.

Agent input modality

Standard
  • natural_language_prompt
  • multimodal (SOP + transcripts + audio + images)

Format in which the customer defines what they expect from the agent: text prompt, process documentation, transcript, audio recording, whiteboard photo.

Guardrails Restrictiveness

Standard
  • minimal
  • regulated_industry (healthcare, finance)

Level of built-in safeguards constraining agent actions: content filtering, compliance verification, tool validation.

Common pitfalls

No truly headless platform available
CRITICAL

Attempts to implement AaaS on a UI-first platform (clicks, forms) result in an agent emulating a human user — unstable and slow. Without refactoring the platform into headless infrastructure, the paradigm does not work.

Refactor the platform to an API-first/headless architecture before building an orchestrating agent. Sierra's Ghostwriter was preceded by exactly such a platform rearchitecture.

Mismatched billing model
HIGH

Using a seat-based billing model for AaaS weakens the value proposition — the customer doesn't buy seats, they buy outcomes. Misaligned billing obscures ROI and slows adoption.

Prefer outcome-based (per resolved case) or consumption-based billing; measure and report business results, not agent activity.

Lack of autonomy and trust progression
HIGH

Full agent autonomy from day one is risky — without an observed phase and incremental permission delegation, the customer won't build trust, and agent errors can undermine the whole contract.

Begin in proposal-only mode (agent prepares, human approves); expand autonomy based on measured quality and trust metrics.

Absence of continuous production agent evaluation
MEDIUM

AaaS relies on continuous improvement. Without automated evaluation of real interactions, the agent assembly line has no signal — the agent stops learning and drifts away from business changes.

Embed automated regression detection, trend exploration (analogous to Deep Research), and periodic sandbox A/B testing.

Using AaaS where SaaS would suffice
MEDIUM

For deterministic, well-defined tasks (forms, simple CRUD), AaaS adds LLM cost, latency, and unpredictability. Traditional SaaS is often faster, cheaper, and more predictable.

Adopt AaaS only when tasks are variable, unpredictable, and require reasoning over context; for stable workflows, retain SaaS.

1999

Software as a Service — the pattern AaaS reacts against

Salesforce introduces the SaaS model based on a centrally hosted, human-operated web application. It defines the 'customer buys access to an interface' paradigm against which AaaS will later position itself.

2022

ReAct – technical foundation of the LLM agent loop

Yao et al. (2022) show that LLMs can act as a reasoning engine in loops combining thoughts with tool actions. This is the technical substrate on which AaaS becomes possible.

2024

Anthropic – compositional patterns for production agents

Anthropic publishes guidelines distinguishing workflows from agents and formalizing five composition patterns. Practitioners gain a common vocabulary that will later ease AaaS commercialization.

2026

Sierra publishes 'Agents as a Service' manifesto and launches Ghostwriter

breakthrough

On March 25, 2026, Bret Taylor and Clay Bavor (Sierra co-founders) publish the Agents as a Service manifesto and introduce Ghostwriter — an agent that builds agents. They coin the phrase 'prompts, not clicks' and the 'agent assembly line' concept. This is the moment the term enters public industry discourse.

Hardware agnostic
PRIMARY

AaaS is a delivery model, not a computational kernel. Hardware requirements stem from the underlying LLMs and platform tools, not from the service paradigm itself.

In practice, AaaS providers run LLM inference on GPU tensor cores or TPUs; the orchestration layer and headless API run on CPU.

BUILT ON

LLM

A Large Language Model (LLM) is a class of machine learning models based on the Transformer architecture, trained on large text datasets via autoregressive language modeling (next-token prediction). These models have billions of parameters and can generate coherent text, answer questions, write code, translate languages, and perform many other language-cognitive tasks without task-specific fine-tuning. The term covers models such as GPT, LLaMA, Gemini, Claude, and Mistral. Most modern LLMs are instruction-tuned (SFT + RLHF) after the pre-training phase.

GO TO CONCEPT
Agentic AI

Agentic AI denotes an architectural transition from single-turn, stateless generative models toward goal-directed systems capable of autonomous perception, planning, action, and adaptation through iterative control loops. An agentic system wraps a large language model in a runtime that gives the model access to tools (web search, code execution, APIs, file I/O), persistent memory, and feedback from prior steps. The model then decides dynamically which tools to call, in what order, and whether to loop or stop, rather than following a predefined code path. Two primary system types are commonly distinguished: (1) Workflows, in which LLMs and tools are orchestrated through predefined code paths, and (2) Agents, in which the LLM itself directs its process and tool usage dynamically. Both can be composed into multi-agent systems where specialized agents collaborate, with one acting as orchestrator and others as subagents. Key design patterns identified by Anthropic (2024) include prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer loops. Andrew Ng's 2024 taxonomy describes four foundational patterns: Reflection, Tool Use, Planning, and Multi-Agent Collaboration. Formal frameworks model agentic control loops as Partially Observable Markov Decision Processes (POMDPs). The control loop is: perceive state → reason/plan → select action → execute tool → observe result → update state → repeat. Agentic systems introduce risks not present in single-turn models, including hallucination in action, prompt injection through observed content, infinite loops, reward hacking, and tool misuse.

GO TO CONCEPT
Tool-augmented LLM

Tool-augmented LLM is an architectural pattern in which a large language model is equipped with access to one or more external tools that it can invoke during inference by generating structured function-call or API-call outputs. The model learns when and how to call tools by producing special tokens or structured output (e.g., JSON function calls) that are intercepted by a host runtime, executed against the tool, and whose results are returned to the model as new context for continued generation. The canonical formalization appeared in the Toolformer paper (Schick et al., Meta AI, 2023), which demonstrated that LLMs can learn to self-supervise their own tool-use through API call annotation without requiring large labeled datasets. Toolformer showed that models trained this way can decide which tools to call, when, and with which arguments, and that tool use substantially improves performance on tasks requiring fresh information, arithmetic, multilingual lookup, and question answering. The pattern encompasses several mechanisms: (1) in-context tool specification, where tool interfaces are described in the system prompt or context (JSON Schema, OpenAPI, natural language); (2) function calling APIs, where the model produces structured output matched to a defined schema and the host dispatches the call; (3) ReAct-style interleaving, where the model alternates reasoning traces with tool-use observations; and (4) parallel tool calling, where the model emits multiple tool calls simultaneously to be executed concurrently. Key implementations include OpenAI function calling (GPT-4, June 2023), Anthropic tool use (Claude, 2023), Google Gemini function calling, and the Model Context Protocol (MCP, 2024) which standardizes tool server connectivity.
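The function-calling mechanism described above — model emits structured JSON, host dispatches and returns the result as new context — can be sketched minimally. The tool registry and message shapes here are illustrative, not any specific vendor's API.

```python
# Host-runtime sketch for function calling: parse the model's JSON tool call,
# execute it against a registry, and return the result as a new context message.
import json

TOOLS = {"add": lambda a, b: a + b}

def dispatch(model_output: str) -> dict:
    call = json.loads(model_output)     # e.g. {"name": "add", "arguments": {...}}
    result = TOOLS[call["name"]](**call["arguments"])
    return {"role": "tool", "content": json.dumps(result)}

reply = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```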

GO TO CONCEPT

EXTENDS

Agentic AI


GO TO CONCEPT

Commonly used with

MAS

Multi-Agent Systems (MAS) are a paradigm in Distributed Artificial Intelligence in which multiple autonomous software entities — agents — interact within a shared environment to achieve individual or collective goals. Each agent perceives its environment through sensors or interfaces, reasons about its state, and acts through actuators or API calls. In the context of LLM-based MAS (emerging prominently from 2023 onward), agents are powered by large language models that provide the cognitive core (planning, reasoning, natural language communication), supplemented by memory modules, tool-use interfaces, and role-specific prompts. The system architecture defines how agents coordinate: coordination topologies include sequential pipelines, hierarchical orchestration (orchestrator-worker), parallel fan-out/fan-in, publish-subscribe messaging, and decentralized peer-to-peer communication. Core agent properties, as defined by Wooldridge and Jennings (1995), include autonomy, social ability, reactivity, and pro-activeness. In LLM-based systems, key components are: the agent (an LLM with a system prompt defining its role), a communication channel (natural language messages, structured function calls, or shared memory), an orchestrator or coordinator (managing task decomposition, routing, and state), tool-use interfaces (external APIs, code execution, web search), and a memory subsystem (short-term context, long-term vector storage). Prominent frameworks implementing LLM-based MAS include AutoGen (Microsoft, 2023), CAMEL (2023), MetaGPT (2023), CrewAI, and LangGraph.

GO TO CONCEPT
MCP

Model Context Protocol (MCP) is an open protocol developed by Anthropic and released in November 2024. It addresses the M×N integration problem in AI systems: connecting M different LLM applications to N different external tools previously required M×N bespoke connectors. MCP defines a standardized client-host-server architecture where hosts (LLM applications) manage one or more clients, each maintaining a stateful session with a specific server. Servers expose capabilities as three primitives: Resources (structured data for context), Prompts (templated instructions), and Tools (executable functions). Clients expose two primitives: Roots (filesystem entry points) and Sampling (server-initiated LLM completions). Communication is based on JSON-RPC 2.0. Capability negotiation occurs at session initialization. The protocol is transport-agnostic and has been implemented in Python, TypeScript, C#, and Java SDKs. In December 2025, Anthropic donated MCP governance to the Agentic AI Foundation (AAIF) under the Linux Foundation.

GO TO CONCEPT
RAG

Retrieval-Augmented Generation (RAG) was introduced by Lewis et al. (2020) as a general-purpose fine-tuning recipe combining pre-trained parametric memory (a seq2seq language model, specifically BART in the original paper) with non-parametric memory (a dense vector index of Wikipedia, accessed via Dense Passage Retrieval, DPR). In the original formulation, both the retriever and the generator are fine-tuned end-to-end: given an input query x, the retriever retrieves top-k documents z from the corpus, and the generator produces an output y conditioned on x and z. Two formulations were proposed: RAG-Sequence (the same retrieved documents condition the full output sequence) and RAG-Token (different documents may be used per generated token, marginalized during generation). In widespread contemporary usage (post-2022, with the growth of LLM applications), 'RAG' has expanded to describe a broader class of retrieve-then-generate pipelines, typically with a frozen LLM, a vector store containing pre-computed dense embeddings of document chunks, and a retrieval step that fetches top-k relevant chunks based on embedding similarity to the query. The retrieved chunks are appended to the prompt as context before the LLM generates a response. This non-trainable pipeline variant is technically distinct from the original Lewis et al. formulation but is the dominant practical interpretation of RAG as of 2023–2025. The canonical modern RAG pipeline consists of an offline indexing phase (document chunking, embedding computation, storage in a vector database) and an online query phase (query embedding, approximate nearest neighbor search, context-augmented generation). Key design decisions include: chunk size and overlap, embedding model choice, retrieval strategy (dense, sparse/BM25, or hybrid), number of retrieved documents k, and context integration method (prepend to prompt, cross-attention injection, or fusion-in-decoder).
RAG addresses two fundamental limitations of parametric-only LLMs: the knowledge cutoff problem (inability to access post-training information) and hallucination (generation of factually incorrect content). However, RAG introduces its own failure modes, including retrieval of irrelevant or misleading context and the LLM's susceptibility to being distracted by retrieved content that contradicts its parametric knowledge.
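The frozen retrieve-then-generate pipeline can be sketched in a few lines. Here bag-of-words overlap stands in for dense embedding similarity, and string formatting stands in for the LLM call; both substitutions are deliberate simplifications.

```python
# Minimal frozen-pipeline RAG sketch: word-overlap retrieval stands in for
# dense embeddings; prompt assembly stands in for the generation step.
def retrieve(query: str, corpus: list, k: int = 1) -> list:
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def rag_answer(query: str, corpus: list) -> str:
    context = retrieve(query, corpus)
    return f"context={context[0]!r} question={query!r}"  # stand-in for LLM generation

docs = ["refunds take 5 days", "shipping is free over 50"]
prompt = rag_answer("how long do refunds take", docs)
```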

GO TO CONCEPT
Agents as a Service

Manifesto introducing the term 'Agents as a Service' by Bret Taylor and Clay Bavor; announcement of Ghostwriter.

blog · Sierra
ReAct: Synergizing Reasoning and Acting in Language Models

The technical foundation of the agentic loop on which AaaS is built.

scientific article · arXiv
Building effective agents

Compositional patterns for production agentic systems.

blog · Anthropic