Robots Atlas
May 7, 2026 · 6 min read · Grok 4.3 · Grok Imagine Agent Mode · xAI API

Grok 4.3: xAI wins on price, not yet on quality

xAI, Elon Musk's AI company, released Grok 4.3 on May 2, 2026. The model is a cheaper, faster option aimed at developers and businesses, and it sits on the Pareto frontier of cost and quality. Alongside it, xAI launched a beta of Grok Imagine Agent Mode for multi-step creative projects. A full benchmark run on Grok 4.3 costs roughly one-tenth as much as one on GPT-5.5, but the model still trails the market leaders on the most demanding evaluations.

Key takeaways

  • Pricing: $1.25/million input tokens and $2.50/million output tokens, roughly 40% cheaper on input and 60% cheaper on output than the previous Grok 4.20
  • Intelligence Index (Artificial Analysis): 53 points — 8th place, behind GPT-5.5 (60 pts) and Claude Opus 4.7 (57 pts)
  • Benchmark cost: $395 for a full test run vs $3,959 for GPT-5.5 and $4,811 for Claude Opus 4.7
  • xAI launched Grok Imagine Agent Mode (beta) — agentic mode for creative projects: films, manga sets, product stories
  • Model available via OpenRouter, xAI API, and the Hermes agent (Nous Research)

A Developer Model: Speed, Price, and Tools

According to xAI developer Eric Jiang, Grok 4.3 was built for speed, low cost, and efficient tool calls. The model runs at 100 tokens per second and features a one-million-token context window. It supports web search, X search, Python code execution, file search (RAG), and autonomous generation of Excel files, PDFs, and PowerPoint decks. Reasoning is built in by default: Grok 4.3 "thinks" before answering every request, with reasoning tokens billed at the same rate as regular output tokens. Knowledge cutoff is December 2025.
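
Because reasoning tokens are billed like output tokens, the hidden "thinking" counts toward every bill. The sketch below estimates cost and generation time for a single request using the prices and speed reported here. It assumes the 100 tokens-per-second figure refers to output generation, that reasoning plus answer tokens must fit in the one-million-token window together with the prompt, and that prompt processing time can be ignored; the token counts in the example are hypothetical.

```python
# Rough per-request cost and latency estimate for Grok 4.3, using the figures
# reported in the article. Assumption: 100 tok/s is the output generation rate.

INPUT_PRICE_PER_M = 1.25    # USD per million input tokens
OUTPUT_PRICE_PER_M = 2.50   # USD per million output tokens (reasoning billed here too)
TOKENS_PER_SECOND = 100     # reported generation speed
CONTEXT_WINDOW = 1_000_000  # reported context window, in tokens


def estimate_request(input_tokens: int, reasoning_tokens: int, answer_tokens: int) -> tuple[float, float]:
    """Return (cost_usd, generation_seconds) for one request."""
    # Assumption: prompt, reasoning, and answer share the same context window.
    if input_tokens + reasoning_tokens + answer_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 1M-token context window")
    # Reasoning tokens are billed at the output rate, so they join the answer.
    output_tokens = reasoning_tokens + answer_tokens
    cost = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    # Only generated tokens count toward the time estimate; prompt processing is ignored.
    seconds = output_tokens / TOKENS_PER_SECOND
    return cost, seconds


if __name__ == "__main__":
    # Hypothetical request: 10k-token prompt, 2k reasoning tokens, 500-token answer.
    cost, seconds = estimate_request(input_tokens=10_000, reasoning_tokens=2_000, answer_tokens=500)
    print(f"${cost:.5f}, ~{seconds:.0f} s")  # $0.01875, ~25 s
```

In this example the 2,000 reasoning tokens make up a third of the total bill, which is why reasoning-heavy workloads should be budgeted on observed output usage rather than on answer length alone.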

Prices fell sharply from the predecessor, Grok 4.20: input costs dropped by approximately 40% and output costs by around 60%. At $1.25 per million input tokens and $2.50 per million output tokens, Grok 4.3 lands on what Artificial Analysis calls the Pareto frontier of cost and quality: the set of models for which no alternative is both cheaper and higher-scoring. A full benchmark run costs $395, compared to $3,959 for GPT-5.5 and $4,811 for Claude Opus 4.7.
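
To make the Pareto-frontier idea concrete, here is a toy computation over the three models for which this article reports both an Intelligence Index score and a full benchmark-run cost. Artificial Analysis plots many more models, so this is only an illustration of the definition under that limited data, not a reproduction of its chart.

```python
# Toy Pareto-frontier check over the three models with both a score and a
# benchmark-run cost in the article. A model is kept unless some other model
# is at least as cheap AND at least as good, and strictly better on one axis.

from typing import NamedTuple


class Model(NamedTuple):
    name: str
    score: int        # Artificial Analysis Intelligence Index
    run_cost: float   # USD for a full benchmark run


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the non-dominated models, sorted from cheapest to most expensive."""
    frontier = []
    for m in models:
        dominated = any(
            o.run_cost <= m.run_cost and o.score >= m.score
            and (o.run_cost < m.run_cost or o.score > m.score)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.run_cost)


models = [
    Model("Grok 4.3", 53, 395.0),
    Model("GPT-5.5", 60, 3959.0),
    Model("Claude Opus 4.7", 57, 4811.0),
]

for m in pareto_frontier(models):
    print(m.name, m.score, m.run_cost)
# Grok 4.3 53 395.0
# GPT-5.5 60 3959.0

print(f"GPT-5.5 run costs {3959 / 395:.1f}x Grok 4.3")  # 10.0x
```

With only these three data points, Claude Opus 4.7 drops off the frontier because GPT-5.5 is both cheaper and higher-scoring, while Grok 4.3 stays on it as the cheapest option. The cost ratio also confirms the "roughly ten times" figure.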

Benchmarks: Good Value, Weaker on Hard Tasks

On the Artificial Analysis Intelligence Index, Grok 4.3 scored 53 points, slightly above Muse Spark and Claude Sonnet 4.6 and 4 points above the previous Grok 4.20. It still falls well short of the flagship models: GPT-5.5 scores 60 points, while Claude Opus 4.7 and Gemini 3.1 Pro both score 57.

On GDPval-AA, a benchmark measuring AI performance on real-world knowledge-work tasks, Grok 4.3 posted a 321-point Elo gain to reach 1,500, outpacing Google Gemini 3.1 but remaining 276 Elo points behind GPT-5.5. Domain-specific results are mixed: Vals AI ranks the model first on both CaseLaw and CorpFin, but 13th on challenging coding and hard math.
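
What a 276-point Elo gap means in practice can be worked out with the standard logistic Elo formula. The sketch below assumes GDPval-AA ratings follow the conventional 400-point scale, which the article does not state; GPT-5.5's rating of about 1,776 is derived from the two figures above.

```python
# Expected head-to-head score implied by an Elo gap, using the standard
# logistic formula with a 400-point scale (an assumption for GDPval-AA).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win probability, ties counted as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


grok_43 = 1500
gpt_55 = grok_43 + 276  # article: GPT-5.5 leads by 276 Elo points

print(f"GPT-5.5 vs Grok 4.3: {expected_score(gpt_55, grok_43):.2f}")  # 0.83
print(f"Grok 4.3 vs GPT-5.5: {expected_score(grok_43, gpt_55):.2f}")  # 0.17
```

Under that assumption, GPT-5.5's output would be preferred in roughly 83% of pairwise comparisons against Grok 4.3, which shows why a 276-point gap is substantial despite Grok 4.3's large gain.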

Independent tests by Andon Labs, which runs AI models in a snack vending machine simulator, revealed a problem with autonomous action. The lab said the model suffered from "narcolepsy problems, preferring to sleep for multiple days in a row over taking actions." For a model marketed as agentic, this is a meaningful weakness.
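
Andon Labs' harness is not described here, so the following is only a hypothetical illustration of how a developer might catch this behavior in their own agent: a watchdog that measures the longest run of consecutive "sleep" actions in a daily action log. The log format, the `sleep` action name, and the two-day threshold are all assumptions.

```python
# Hypothetical idle-streak watchdog for a long-running agent. Each list entry
# is the agent's chosen action for one simulated day; names are illustrative.

def longest_idle_streak(actions: list[str], idle_action: str = "sleep") -> int:
    """Length of the longest run of consecutive idle actions."""
    longest = current = 0
    for action in actions:
        # Extend the current streak on an idle day, reset it on any real action.
        current = current + 1 if action == idle_action else 0
        longest = max(longest, current)
    return longest


def needs_intervention(actions: list[str], max_idle_days: int = 2) -> bool:
    """Flag the run if the agent idled for more than max_idle_days in a row."""
    return longest_idle_streak(actions) > max_idle_days


daily_log = ["restock", "set_price", "sleep", "sleep", "sleep", "sleep", "restock"]
print(longest_idle_streak(daily_log))  # 4
print(needs_intervention(daily_log))   # True
```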

Grok Imagine Agent Mode: Agentic Creativity

Alongside the model, xAI released a beta of Grok Imagine Agent Mode, a creative-project interface built on an agentic AI architecture. Rather than handling single prompts, the mode manages longer creative sessions in which an AI agent plans, generates, edits, and revises content in an open workspace. xAI's example use cases include a one-minute movie, a manga set, and product stories.
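
xAI has not published how Agent Mode works internally. As a rough mental model of the plan, generate, review, and revise cycle described above, here is a generic sketch in which every function is a hypothetical stand-in for a model or generation call.

```python
# Generic plan -> generate -> review -> revise loop. All helpers are
# hypothetical placeholders; none of this reflects xAI's implementation.

from dataclasses import dataclass, field


@dataclass
class Workspace:
    brief: str
    plan: list[str] = field(default_factory=list)
    drafts: dict[str, str] = field(default_factory=dict)


def make_plan(brief: str) -> list[str]:
    # Stand-in for a model call that breaks the brief into steps.
    return [f"{brief}: scene {i}" for i in range(1, 4)]


def generate(step: str) -> str:
    # Stand-in for an image or video generation call.
    return f"draft of {step}"


def review(draft: str) -> bool:
    # Stand-in for a critique pass; here a draft passes once it has been revised.
    return draft.startswith("revised")


def revise(draft: str) -> str:
    # Stand-in for an editing call that improves the draft.
    return f"revised {draft}"


def run_agent(brief: str, max_revisions: int = 3) -> Workspace:
    ws = Workspace(brief, plan=make_plan(brief))
    for step in ws.plan:
        draft = generate(step)
        # Revise until the reviewer accepts or the revision budget runs out.
        for _ in range(max_revisions):
            if review(draft):
                break
            draft = revise(draft)
        ws.drafts[step] = draft
    return ws


ws = run_agent("one-minute movie")
for step, draft in ws.drafts.items():
    print(step, "->", draft)
```

The revision budget is the design choice that matters most in a loop like this: it bounds cost per step while still letting the agent correct weak drafts.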

The mode is accessible via the Grok web interface at grok.com/imagine and requires a paid account. Agent Mode can be activated from the input field in the bottom-left of the interface. The feature is in beta — xAI has not provided a general availability timeline.

Context: Where Does Grok 4.3 Fit in the Market?

Grok 4.3 reflects the growing polarization of the AI model market: on one side, ultra-capable (and ultra-expensive) flagship models from OpenAI and Anthropic; on the other, a fast-growing class of high-value-per-dollar models. Grok 4.3 clearly targets the second niche — and does so effectively in legal and financial tasks, where it achieves top rankings. However, the quality gap versus GPT-5.5 and Claude Opus 4.7 is real and material in applications requiring general reasoning, hard math, or complex coding.

xAI's pricing strategy is aggressive: the model is available through OpenRouter and the xAI API without usage tiers, with reasoning included in the token price — unlike OpenAI, which separates reasoning models into distinct products. Grok 4.3 is also available through the Hermes agent from Nous Research, signaling that xAI is actively building an ecosystem of partners in the agentic tooling segment.

Why This Matters

Grok 4.3 illustrates the maturing of the AI model market: not every provider needs to chase the absolute quality leader. There are applications, such as legal work, corporate documents, and rapid content analysis, where cost and throughput matter more than performance on a hard math benchmark. If Grok 4.3 keeps its first-place rankings on CaseLaw and CorpFin, it could become the default choice for a meaningful class of legal and financial use cases.
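
One practical way teams act on this kind of split is a cost-aware router. The sketch below is a hypothetical example built only on the rankings reported in this article: legal and financial work goes to the cheaper model, while hard coding and math stay on a flagship. The task labels and model identifiers are illustrative assumptions, not real API model names.

```python
# Hypothetical cost-aware router based on the article's reported rankings.

CHEAP_MODEL = "grok-4.3"     # hypothetical model identifier
FLAGSHIP_MODEL = "gpt-5.5"   # hypothetical model identifier

CHEAP_OK = {"caselaw", "corpfin", "corporate_documents"}
FLAGSHIP_ONLY = {"hard_coding", "hard_math"}


def route(task_type: str) -> str:
    """Pick a model for a task label."""
    if task_type in FLAGSHIP_ONLY:
        return FLAGSHIP_MODEL
    if task_type in CHEAP_OK:
        return CHEAP_MODEL
    # Unknown task types default to the flagship until they have been evaluated.
    return FLAGSHIP_MODEL


print(route("caselaw"))    # grok-4.3
print(route("hard_math"))  # gpt-5.5
```

Defaulting unknown tasks to the flagship trades cost for safety; a team with its own evaluations could flip that default once the cheaper model has been validated on a given workload.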

At the same time, the reported "narcolepsy" in agentic tasks is a warning sign. A model marketed by xAI as an agentic tool should be evaluated precisely in that dimension — and the results are not unambiguously positive. For developers building applications that require sustained autonomous action, this is critical information before deployment. Grok Imagine Agent Mode arrived in beta exactly as that weakness became publicly visible — which may be coincidence or narrative management.

What's Next

  • Grok Imagine Agent Mode remains in beta; xAI has not announced a production release date or general availability timeline
  • Andon Labs has announced further Grok 4.3 testing on agentic simulators — results will determine whether the "narcolepsy" issue is systemic or task-specific
  • Artificial Analysis is monitoring the model continuously — future Intelligence Index updates could shift Grok 4.3's ranking, especially if xAI releases a rapid patch
