Starting June 1, 2026, GitHub Copilot is retiring its "Premium Requests" subscription model in favor of per-token billing. The change mirrors what Anthropic and OpenAI have already introduced for enterprise API clients, but extends that logic directly to individual developers and small teams for the first time.
Key Takeaways
- From June 1, 2026: Copilot charges based on tokens consumed by the LLM, not on a per-query count
- New currency: "AI Credits" — 1 credit = $0.01 USD; Copilot Pro subscribers ($10/month) receive 1,000 credits
- Code completions and Next Edit suggestions remain free of charge
- Uber has already spent its entire 2026 AI budget in the first months of the year — 11% of code changes are now AI-generated
- The move aligns Copilot's pricing architecture with industry-standard API billing
The End of Flat-Rate AI Assistance
Until now, GitHub Copilot users received a monthly quota of "Premium Requests." Each query — whether a simple syntax question or a multi-hour refactoring task across a large repository — consumed one unit from that quota. The model was easy to understand, but economically inconsistent: Microsoft was effectively subsidizing computationally expensive tasks at a loss.
The new model mirrors how the OpenAI API and the Anthropic Claude API work for enterprise clients. Usage is measured in tokens: prompt input, model output, and cached context. The longer the code file submitted or the more detailed the task description, the more tokens consumed. The more powerful the model selected, the higher the per-token cost.
A token roughly corresponds to three-quarters of an English word, so ten thousand words of code equate to approximately 13,000 tokens. A developer asking Copilot to analyze an entire repository will consume significantly more credits than one asking isolated syntax questions.
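As a sanity check on that arithmetic, here is a minimal sketch in Python. The three-quarters ratio is the article's rule of thumb, not an exact tokenizer; real token counts vary by model, and code typically tokenizes differently from prose.

```python
# Rough word-to-token conversion using the article's rule of thumb:
# one token is about three-quarters of an English word. Real tokenizers
# vary by model, and code tokenizes differently from prose.

WORDS_PER_TOKEN = 0.75

def words_to_tokens(word_count: int) -> int:
    """Rough token estimate: tokens = words / 0.75."""
    return round(word_count / WORDS_PER_TOKEN)

print(words_to_tokens(10_000))  # -> 13333, i.e. roughly 13,000 tokens
```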
Credit Tiers and Subscription Logic
GitHub is maintaining its current subscription price points, but shifting the billing unit. A Copilot Pro subscriber ($10/month) receives 1,000 AI Credits — equivalent to $10 at the rate of one credit per cent. The number of tokens a credit purchases varies across four dimensions: the LLM model selected, the input-to-output ratio, the cache size, and the feature type requested.
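GitHub has not published a per-model price list, so the sketch below only illustrates how a per-request charge could compose across those four dimensions. Every rate, discount, and multiplier in it is a hypothetical placeholder; the cached-input discount in particular is an assumption borrowed from how Anthropic and OpenAI discount cache reads in their own APIs.

```python
# Hypothetical composition of a per-request credit charge across the four
# dimensions the article names: model, input-to-output ratio, cache size,
# and feature type. None of these numbers are GitHub's actual rates.

from dataclasses import dataclass

# Invented credit prices per 1,000 tokens (1 credit = $0.01).
MODEL_RATES = {
    "economy":  {"input": 0.05, "output": 0.20},
    "frontier": {"input": 0.30, "output": 1.50},
}
CACHED_INPUT_DISCOUNT = 0.10  # assumption: cache reads billed at 10% of the input rate
FEATURE_MULTIPLIER = {"chat": 1.0, "agent": 1.0, "completion": 0.0}  # completions stay free

@dataclass
class Request:
    model: str
    feature: str
    input_tokens: int
    cached_tokens: int
    output_tokens: int

def credits_for(req: Request) -> float:
    """Estimate AI Credits for one request under the invented rates above."""
    rates = MODEL_RATES[req.model]
    fresh_input = req.input_tokens - req.cached_tokens
    cost = (
        fresh_input / 1000 * rates["input"]
        + req.cached_tokens / 1000 * rates["input"] * CACHED_INPUT_DISCOUNT
        + req.output_tokens / 1000 * rates["output"]
    )
    return cost * FEATURE_MULTIPLIER[req.feature]

# A long-context agent turn vs. a short chat question on the same model:
agent_turn = Request("frontier", "agent", input_tokens=120_000, cached_tokens=90_000, output_tokens=4_000)
chat_question = Request("frontier", "chat", input_tokens=1_500, cached_tokens=0, output_tokens=400)
print(f"agent turn:    {credits_for(agent_turn):.1f} credits")     # 17.7
print(f"chat question: {credits_for(chat_question):.2f} credits")  # 1.05
```

Under these invented numbers, a single long-context agent turn costs roughly seventeen times a short chat question: exactly the asymmetry the flat-rate model was hiding.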
The practical consequence is that developers primarily using auto-completions and short-form queries are unlikely to exceed their monthly allocation. Those running autonomous coding agents against large codebases will exhaust their quota more quickly than under the previous model, and will need to purchase additional credits.
The Enterprise Signal: The Uber Case
The most instructive illustration of the cost dynamic is Uber. According to The Information, Uber's CTO disclosed that the company has already consumed its entire 2026 AI budget in the first months of the year, with 11% of all code changes now written by AI. The company's primary AI coding tool is built on Anthropic's Claude.
This is not an isolated case. As multi-agent systems become standard in development teams, token consumption scales nonlinearly. An agent does not just answer questions — it plans, verifies, and iterates, generating multiple model calls at each step. Under per-token billing, every intermediate step is reflected in the invoice.
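To see why the scaling is nonlinear, consider the deliberately simplified agent loop sketched below. The plan/act/verify structure and all the figures are invented for illustration; no specific product works exactly this way. The key mechanism is that each step appends its output to the context of every later call, so total tokens grow quadratically with the number of steps.

```python
# Illustrative only: how an agent loop multiplies token consumption.
# Each step makes three model calls (plan, act, verify), and every
# output is appended back into the prompt for all subsequent calls.

def agent_task_tokens(steps: int, base_context: int, output_per_call: int) -> int:
    """Total tokens (input + output) for one agent task under this toy model."""
    total = 0
    context = base_context
    for _ in range(steps):
        for _call in ("plan", "act", "verify"):
            total += context + output_per_call  # this call's input + output
            context += output_per_call          # output feeds every later prompt
    return total

# One direct question vs. a 10-step agent run over the same starting context:
print(f"single call:   {8_000 + 1_000:,} tokens")
print(f"10-step agent: {agent_task_tokens(10, 8_000, 1_000):,} tokens")  # 705,000
```

Under these assumptions, the ten-step run consumes roughly 78 times the tokens of a single direct question, and per-token billing puts every one of those intermediate calls on the invoice.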
Why Microsoft Could No Longer Absorb the Cost
Unlike Anthropic and OpenAI, which have operated on per-token API billing from inception, Microsoft was able to subsidize Copilot's usage costs with revenues from Azure and its broader software business. The rapid growth of autonomous coding agents — running unsupervised against large repositories — pushed compute costs beyond what flat-rate pricing could absorb.
GitHub has acknowledged that until the pricing change, users were effectively consuming three to eight times the token value their subscription fee would cover. The platform was running with an intentional operating subsidy — a model that works during market-acquisition phases, but becomes unsustainable at scale.
The Unintended Consequence: Higher Barrier for Exploration
The pricing change introduces a meaningful psychological friction for new users. Previously, a developer onboarding to Copilot could freely experiment with different models, ask complex questions, and build intuition about the tool's capabilities at no additional cost. From June onwards, every substantive query carries a visible credit cost.
This erodes part of Copilot's appeal as a low-friction learning environment. By contrast, Cursor takes a hybrid approach: a defined number of advanced queries are included in the subscription, with per-token billing applying only to usage beyond that threshold.
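For comparison, here is a minimal sketch of that hybrid shape. The allowance, overage rate, and subscription fee are placeholder values and do not reflect Cursor's actual tiers.

```python
# Shape of a hybrid plan: a fixed allowance of advanced queries included
# in the subscription, per-token billing for everything beyond it.
# INCLUDED_QUERIES and OVERAGE_RATE are invented, not Cursor's numbers.

INCLUDED_QUERIES = 500
OVERAGE_RATE = 0.04  # dollars per 1,000 tokens past the allowance

def monthly_bill(base_fee: float, query_tokens: list[int]) -> float:
    """query_tokens: token counts per advanced query, in order of arrival."""
    overage = sum(query_tokens[INCLUDED_QUERIES:])
    return base_fee + overage / 1000 * OVERAGE_RATE

# 600 queries of ~5,000 tokens each: only the last 100 are billed per token.
print(f"${monthly_bill(20.0, [5_000] * 600):.2f}")  # $40.00
```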
Why This Matters for the AI Market
GitHub Copilot introduced AI assistance to a generation of developers who had never interacted with the OpenAI API or the Anthropic Claude API directly. The shift to per-token billing is the first time this user group will encounter AI economics at the transaction level: the same mechanics that enterprise buyers have navigated for years.
For organizations deploying agentic AI systems for autonomous code review and generation, the change forces a revision of cost assumptions. The cost of running an agent is no longer the cost of a subscription tier; it is a function of iteration count, context length, and model selection. For CTOs managing AI coding pipelines at scale, precise token cost tracking becomes a first-order budget concern.
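A first step toward that tracking can be as simple as a per-pipeline ledger. The sketch below uses invented dollar rates and hypothetical pipeline names purely to show the shape of the bookkeeping.

```python
# Minimal per-pipeline spend ledger for per-token billing.
# Rates and pipeline names are hypothetical.

from collections import defaultdict

RATES_PER_1K_TOKENS = {"economy": 0.0005, "frontier": 0.01}  # dollars, invented

class TokenLedger:
    def __init__(self) -> None:
        self.spend: dict[str, float] = defaultdict(float)  # pipeline -> dollars

    def record(self, pipeline: str, model: str, tokens: int) -> None:
        """Attribute one model call's token cost to a pipeline."""
        self.spend[pipeline] += tokens / 1000 * RATES_PER_1K_TOKENS[model]

    def report(self) -> None:
        """Print pipelines in descending order of spend."""
        for pipeline, dollars in sorted(self.spend.items(), key=lambda kv: -kv[1]):
            print(f"{pipeline:<20} ${dollars:,.2f}")

ledger = TokenLedger()
ledger.record("code-review-agent", "frontier", 4_200_000)  # hypothetical usage
ledger.record("pr-summaries", "economy", 9_000_000)
ledger.report()
```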
What's Next?
- The change takes effect June 1, 2026 — users have under a month to adjust workflows and usage patterns
- Codex — OpenAI's new background coding agent — operates exclusively on per-token billing, establishing this as the emerging industry standard for AI developer tools
- Further SaaS AI tools are expected to follow with comparable billing architecture changes in the coming months