June 24, 2026

What drives unexpected AI token cost for businesses?

Rafael Hajjar

Economist

In this article

What is an AI token, and why does it matter for finance?
The agentic multiplier
Model tier migration
Multi-model proliferation
Elastic billing with no built-in ceiling
Cache adoption gaps
Billing opacity in agentic workflows
How Ramp tracks all six drivers in one place


Unexpected AI token cost increases are driven by six structural patterns: the agentic multiplier, model tier migration, multi-model proliferation, month-over-month spend volatility, cache adoption gaps, and billing opacity in agentic workflows. Each one is nearly invisible to standard finance processes. Together, they explain most of the gap between what companies budget for AI and what they actually pay.

If you’ve received an AI invoice that surprised you, one of these six patterns is almost certainly the cause.

Driver	Root cause	Finance signal
Agentic multiplier	Agents run dozens of API calls per task	Cost spikes after deploying an automated workflow
Model tier migration	Teams upgrade models without a procurement event	Cost increase with no matching volume increase
Multi-model proliferation	Every department runs different tools, each billed separately	More AI vendors on invoices than anyone tracks
Elastic billing	Usage scales with events absent from budget models	Month-to-month swings of 30–50%+
Cache adoption gaps	Some companies pay list rate; others pay a fraction	Same provider, dramatically different effective rates
Billing opacity	Agents determine token consumption, not humans	No meaningful way to set cost ceilings upfront

What is an AI token, and why does it matter for finance?

An AI token is the base unit of computation that AI providers use to measure and bill usage. Roughly speaking, one token represents about three to four characters of text, and a typical word is one to two tokens. When an employee uses an AI tool, the provider counts every token in the request and the response, then multiplies by the model’s rate.
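The arithmetic above can be sketched in a few lines. This is a minimal illustration, not any provider's billing logic: the function names are hypothetical, the four-characters-per-token figure is only the rule of thumb from the paragraph above, and the rates in the usage note are placeholders rather than real prices. It also assumes input and output tokens are priced separately, which is common but worth confirming on your own invoice.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (rule of thumb, not a tokenizer)."""
    return round(len(text) / chars_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost of one request, with rates quoted in dollars per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000
```

With placeholder rates of $1.00 per million input tokens and $4.00 per million output tokens, `request_cost(1_500, 500, 1.00, 4.00)` comes to $0.0035. The number is tiny per request, which is why the drivers below matter: they change how many requests run and at which rate.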

This billing model is fundamentally different from traditional SaaS. With software subscriptions, you pay a flat fee regardless of how much you use the tool. With token-based AI, every interaction carries a variable cost that can swing dramatically based on which model is used, how often it runs, and whether the underlying infrastructure is optimized.

That variability is what creates surprises. And unlike cloud compute costs, which have decades of FinOps tooling and institutional knowledge behind them, AI token billing is new enough that most finance teams are still building their mental model for it.

The agentic multiplier

The biggest source of AI cost surprises isn’t high usage. It’s agentic usage.

When a human uses an AI tool, they submit a prompt and receive a response. The token cost is bounded by the length of that exchange. When an AI agent executes the same task autonomously, it may plan, research, draft, verify, and retry, generating dozens or hundreds of API calls for a single task. Token consumption can multiply by a factor that doesn’t appear anywhere in the vendor’s documentation or your initial cost estimate.
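One way to see why the multiplier grows so quickly is a toy model of an agent loop. The sketch below assumes each step re-sends the original prompt plus everything the agent has produced so far, which is a common pattern but not how every agent framework works. The function names and token counts are hypothetical.

```python
def single_exchange_tokens(prompt_tokens: int, response_tokens: int) -> int:
    """Tokens billed for one human prompt and one response."""
    return prompt_tokens + response_tokens


def agent_task_tokens(prompt_tokens: int, step_output_tokens: int, steps: int) -> int:
    """Tokens billed across an agent run where every step re-sends the growing context."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context + step_output_tokens  # input re-sent plus new output
        context += step_output_tokens          # this output joins the next step's input
    return total
```

Under these assumptions, a 1,000-token prompt with a 500-token response costs 1,500 tokens as a single exchange, while a 20-step agent run producing 500 tokens per step consumes 125,000 tokens, roughly 83 times as much for what finance would see as "one task."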

Companies often approve an AI tool based on a demo where a human is guiding the interaction. Once that same tool runs agentic workflows in production, such as automated code review, document analysis, or customer research pipelines, the billing picture changes entirely. You approved the task; the agent determined the bill.

As of April 2026, automated AI sessions account for nearly half of all AI sessions Ramp tracks across its customer base. The divergence between what a human session costs and what an automated session costs is one of the least-understood variables in AI budgeting.

What to watch for: If your team recently deployed an agent for a workflow that previously involved humans, such as email triage, document review, or code generation, check whether the per-task token consumption matches what you were quoted at the demo stage.

Model tier migration

The second driver is quieter but often larger in dollar impact: teams migrate to more capable AI models without a formal decision that finance ever sees.

AI providers offer models at dramatically different price points. Budget models cost a fraction of a cent per thousand tokens. Premium reasoning models can cost 10–20x more. A team that starts on a budget model and migrates upmarket is effectively changing its cost structure without sending anyone a procurement notice.

This happens in predictable ways. A tool updates its default model. An engineer switches to a more capable model to improve output quality. A vendor recommends an upgrade as part of onboarding a new feature. None of these generate a purchase order; all of them appear on the next invoice.

The scale of this shift in Ramp’s dataset is significant. In May 2026, among the highest AI adopters, the median spend share on premium models was 14% of OpenAI + Anthropic token/API spend. That share is growing as teams default to higher-capability models for more demanding tasks, and the shift has happened without most finance teams setting a formal policy on which model tiers their teams are authorized to use.

What to watch for: If your AI costs increased without a matching increase in token volume, model tier migration is the most likely explanation. Check your provider invoices for line items broken down by model name.
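The check described above amounts to comparing the blended rate across two periods while holding volume constant. A minimal sketch follows, assuming you can export token counts by model from your provider. The tier names and per-million-token rates are hypothetical, chosen only to reflect a 20x gap.

```python
def blended_rate(tokens_by_model: dict[str, int], rate_per_m: dict[str, float]) -> float:
    """Effective dollars per million tokens across a mix of models."""
    total_tokens = sum(tokens_by_model.values())
    weighted = sum(tokens * rate_per_m[model] for model, tokens in tokens_by_model.items())
    return weighted / total_tokens


# Hypothetical rates in dollars per million tokens.
RATES = {"budget": 0.50, "premium": 10.00}

# Same total volume in both months; only the mix changes.
month_1 = {"budget": 100_000_000, "premium": 0}
month_2 = {"budget": 80_000_000, "premium": 20_000_000}
```

Here `blended_rate(month_1, RATES)` is $0.50 per million tokens and `blended_rate(month_2, RATES)` is $2.40. Token volume is flat at 100 million, yet the bill rises from $50 to $240 because a fifth of the traffic moved to the premium tier.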

Multi-model proliferation

The number of AI vendors a company is actually using is almost always higher than what finance tracks. And each additional vendor means an additional model, an additional contract, and an additional line on the invoice.

Every team runs different tools. Every tool runs a different model. Engineering uses Cursor and Claude. Marketing uses ChatGPT and Perplexity. Customer success uses Intercom AI. Legal uses Harvey. Each tool bills separately, under separate contracts, against separate budget lines. Finance sees the aggregate only when the credit card statement or accounts payable queue assembles them.

As of April 2026, companies in Ramp’s dataset use a median of nine AI models, with an average of 16.5 models per company. Companies using 26 or more distinct models average $31,217 per month in AI spend.

The challenge isn’t any single tool. It’s the sum. Every department head believes their team’s AI tools cost a few hundred dollars a month. No one has added them up.

What to watch for: Ask each department head which AI tools their team uses and which models those tools default to. The list will almost certainly be longer than what your vendor contracts show, and longer than what’s appearing in your AP queue.
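Adding the tools up is simple once the list exists. The sketch below uses the tools named earlier in this section, but the dollar amounts and the record layout are invented for illustration and do not come from Ramp's data.

```python
from collections import defaultdict

# Hypothetical records: amounts are invented for illustration only.
AI_SPEND = [
    {"department": "Engineering", "tool": "Cursor", "monthly_usd": 400.0},
    {"department": "Engineering", "tool": "Claude", "monthly_usd": 350.0},
    {"department": "Marketing", "tool": "ChatGPT", "monthly_usd": 250.0},
    {"department": "Marketing", "tool": "Perplexity", "monthly_usd": 200.0},
    {"department": "Customer success", "tool": "Intercom AI", "monthly_usd": 300.0},
    {"department": "Legal", "tool": "Harvey", "monthly_usd": 500.0},
]


def total_by(records: list[dict], key: str) -> dict[str, float]:
    """Sum monthly spend grouped by the given field (for example, department)."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record["monthly_usd"]
    return dict(totals)


by_department = total_by(AI_SPEND, "department")
company_total = sum(by_department.values())
```

In this made-up example no single line exceeds $500 a month, but `company_total` is $2,000, which is the figure nobody sees until the lines are combined.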

Elastic billing with no built-in ceiling

Traditional SaaS costs are predictable. AI token costs are not.

Software subscriptions don’t change month to month. You pay for seats; the bill is stable. AI token costs are elastic; they scale with usage, and usage is driven by events that don’t appear in any budget model: a new feature launch that runs thousands of agent tasks, a model update that changes consumption rates, a new team onboarding onto a shared platform.

As of April–June 2026, month-over-month AI spend swings a median of 58% in absolute terms across the businesses Ramp tracks, and 61% of businesses exceed 40% MoM volatility. Compare that to traditional SaaS subscriptions, where month-over-month cost variance is typically near zero.
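A finance team can compute the same kind of volatility figure for its own spend. The sketch below shows one straightforward definition, the absolute percent change between consecutive months. Ramp's exact methodology is not described here, so treat this as an assumption, and the spend series in the usage note is invented.

```python
from statistics import median


def mom_abs_changes(monthly_spend: list[float]) -> list[float]:
    """Absolute month-over-month percent change for each consecutive pair of months."""
    return [
        abs(curr - prev) / prev * 100
        for prev, curr in zip(monthly_spend, monthly_spend[1:])
        if prev > 0  # skip months with no prior spend to avoid dividing by zero
    ]


def median_volatility(monthly_spend: list[float]) -> float:
    """Median absolute month-over-month swing, in percent."""
    return median(mom_abs_changes(monthly_spend))
```

For a hypothetical series of $1,000, $1,600, $800, and $1,200, the swings are 60%, 50%, and 50%, so `median_volatility` returns 50. A flat SaaS subscription run through the same function returns 0.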

Among the businesses with the highest AI adoption, token consumption has grown 1,001% year-over-year. Across all businesses on Ramp, AI spend has grown 497% year-over-year. Total spend is still rising even as per-token prices fall, because consumption is growing faster than prices are declining.

The practical problem for finance: a budget set in Q1 is stale by Q2. AI spending doesn’t follow the annual planning cycle.

What to watch for: Set a monthly AI spend review cadence, not a quarterly one. Month-to-month swings of 30–50% are common enough that a quarterly check creates meaningful visibility gaps.

Cache adoption gaps

One of the most impactful drivers of AI cost differences between companies is also one of the least discussed: prompt caching.

Most major AI providers offer caching, a mechanism where repeated context (a long system prompt, a document re-sent with every API call, a shared instruction set) is stored and reused rather than reprocessed at full cost. Companies that implement caching effectively pay dramatically less per token for the same workloads. Companies that don’t implement it pay the list rate every time.

As of June 2026, 60% of businesses connected to Ramp’s AI Token Spend Management have cache hit rates of 80% or higher, while 13% have cache hit rates below 20%. The effective token cost difference between these two groups is substantial: businesses serving most requests from cache pay a fraction of the rate paid by those hitting the API cold on every call.

This doesn’t surface on provider invoices in a way that’s easy to interpret. The per-token rate looks the same. The effective cost isn’t. The difference is buried in whether your platform team or your vendors have implemented caching, a technical decision that finance typically has no visibility into and no mechanism to require.
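The gap between list rate and effective cost can be sketched with a simple blend. The model below makes several assumptions: the cache hit rate is treated as the share of input tokens billed at the cached rate, the 90% discount on cached tokens is a placeholder rather than any provider's published figure, and any surcharge for writing to the cache is ignored. The function name is hypothetical.

```python
def effective_input_rate(list_rate_per_m: float, cache_hit_rate: float,
                         cached_discount: float) -> float:
    """Blended input rate when a share of tokens is billed at a discounted cached rate."""
    cached_rate = list_rate_per_m * (1 - cached_discount)
    return cache_hit_rate * cached_rate + (1 - cache_hit_rate) * list_rate_per_m
```

With a placeholder list rate of $3.00 per million tokens and a 90% discount on cached tokens, an 80% hit rate yields an effective rate of $0.84, while a 20% hit rate yields $2.46. Both companies see the same list rate on the price sheet, but one pays almost three times as much per token.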

What to watch for: Ask your engineering team whether the AI tools your company uses have caching enabled. For Anthropic and OpenAI specifically, prompt caching is a configurable feature that significantly reduces costs on repetitive workloads.

Billing opacity in agentic workflows

The final driver is structural, and it’s accelerating as agentic AI becomes a larger share of company spending.

In traditional compute, cost follows resource allocation: you provision infrastructure, you pay for that infrastructure. In agentic AI, you approve a task, but the agent determines the bill. The number of tokens consumed depends on the agent’s reasoning path: how many sub-steps it takes, how many times it retries a failed tool call, how many intermediate documents it reads before producing a final output.

This isn’t a flaw in the provider’s pricing model. It’s inherent to how large language models work. But it creates a cost category with no analogue in traditional software procurement, and no established finance process for managing it.

The implication: the usual controls don’t apply. You can’t cap spending by approving a purchase order for a fixed number of tokens. You can’t audit post-hoc by checking whether the output matches the request. You need visibility into the agent’s behavior before you can set meaningful cost expectations.

What to watch for: Before approving budget for any agentic workflow, ask the team that owns the tool to provide a monthly cost estimate per automated task and set a spending ceiling before the workflow goes live. Without a cost-per-task estimate upfront, there is no meaningful way to budget for agentic AI spend.
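One way an owning team might enforce such a ceiling is a per-task budget that the agent loop charges after every model call. This is a sketch of the idea, not a feature of any provider or of Ramp. The class, exception, and rate are hypothetical, and because usage is only known after a call returns, a real guard can stop a run but cannot prevent the last call's cost.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a task's cumulative cost would pass its ceiling."""


class TaskBudget:
    def __init__(self, max_usd: float, rate_per_m: float):
        self.max_usd = max_usd        # spending ceiling for one task
        self.rate_per_m = rate_per_m  # assumed dollars per million tokens
        self.spent_usd = 0.0

    def charge(self, tokens: int) -> None:
        """Record a step's token usage, or raise if it would breach the ceiling."""
        cost = tokens * self.rate_per_m / 1_000_000
        if self.spent_usd + cost > self.max_usd:
            raise TokenBudgetExceeded(
                f"step of ${cost:.2f} would exceed ceiling of ${self.max_usd:.2f}"
            )
        self.spent_usd += cost


def monthly_estimate(cost_per_task_usd: float, tasks_per_month: int) -> float:
    """Projected monthly spend from a per-task estimate and expected volume."""
    return cost_per_task_usd * tasks_per_month
```

With a $0.50 ceiling and a placeholder rate of $5.00 per million tokens, an agent using 30,000 tokens per step completes three steps ($0.45) and is stopped on the fourth. The per-task ceiling then feeds the budget conversation: `monthly_estimate(0.50, 2_000)` caps the workflow at $1,000 a month.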

How Ramp tracks all six drivers in one place

Most companies encounter these drivers the same way: an invoice arrives that no one can explain. Ramp’s AI Token Spend Management is built to surface them before the bill hits.

The product connects your AI spend across providers (Anthropic, OpenAI, and others) and surfaces costs by provider, model, and team. It detects anomalies automatically, so you can catch a model migration or an agentic workflow spike before it compounds across a full billing period. And because Ramp connects through billing APIs rather than AI tools directly, it only accesses cost and usage metadata—not prompt content.

The six drivers above describe what finance teams are managing. Ramp is how they manage it.


Rafael Hajjar, Economist

Computer science and robotics student at the University of Pennsylvania with a background in quantitative research at Point72.


FAQs

Why do AI token costs spike unexpectedly?

Unexpected AI token cost spikes are almost always caused by one of six structural patterns: the agentic multiplier (autonomous workflows consuming far more tokens than manual ones), model tier migration (teams switching to higher-cost models without a procurement event), multi-model proliferation (costs accumulating across more tools than finance tracks), month-over-month spend volatility (elastic usage that outpaces planning cycles), cache adoption gaps (a substantial cost difference between companies that use prompt caching and those that don't), and billing opacity in agentic workflows (where the agent, not the human, determines how many tokens get consumed).

What's the difference between a list price and an effective token rate?

AI providers publish list prices per million tokens, but the effective rate, meaning what businesses actually pay, is typically far lower: prompt caching, batch discounts, model version differences, and contract terms all reduce it. Ramp’s observed aggregate average across all models and companies in its dataset is $0.72 per million tokens.

How should finance teams budget for AI token costs?

AI token costs require a different approach than traditional SaaS budgets. Because costs are elastic and driven by usage events, static annual allocations typically underestimate actual spend by Q2. A more effective model: set a monthly AI spend envelope by team, review actual versus budget monthly (not quarterly), and establish model-tier policies before deploying tools at scale. For benchmarking context, as of April 2026, among the highest AI adopters on Ramp, the median monthly AI token spend is $2,246, while the average (skewed by high-consumption outliers) is $140,842.

What is the agentic multiplier?

The agentic multiplier describes the difference in token consumption between a human-initiated AI task and the same task run autonomously by an AI agent. Because agents can plan, research, retry, and self-correct, they typically consume significantly more tokens per task than a direct human prompt, sometimes by an order of magnitude for complex workflows. Finance teams who budgeted for human-use patterns and then deployed agentic workflows often see the multiplier show up as an unexplained cost increase within one or two billing cycles.

Which AI models cost the most to run?

Premium reasoning models, including Claude Opus 4, GPT-4o, and o1-class models, cost significantly more per token than budget-tier models like Claude Haiku or GPT-4o mini. Cost differences of 10–20x for similar tasks are common. Ramp’s data shows that in May 2026, among the highest AI adopters, the median business directed 14% of its OpenAI and Anthropic token spend toward premium models, reflecting how broadly teams have migrated upmarket without formal finance oversight.

