The pricing structures that govern how AI tools are sold, used, and charged for are among the most opaque in modern technology. And if you are making decisions about AI adoption, or trying to account for AI spend inside a business, understanding these numbers is where you need to start.
Start with the thing that underpins almost all of it: the token.
Most people using AI tools have no idea what a token is. They type a question, they get an answer, it feels like a conversation. But underneath that conversation, something very specific is happening. The AI is not reading your words the way you wrote them. It is breaking them down into units, tokens, that it can process. A token is roughly four characters of text in English, which works out to approximately three quarters of a word. The sentence you just read, for example, would contain around thirty tokens.
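The four-characters-per-token figure is only a rule of thumb (real tokenizers vary by model and language), but it is good enough for back-of-envelope estimates. A minimal sketch of that heuristic:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.
    Real tokenizers give exact counts per model; this is only a heuristic."""
    return max(1, round(len(text) / 4))

sentence = ("A token is roughly four characters of text in English, "
            "which works out to approximately three quarters of a word.")
print(estimate_tokens(sentence))  # in the region of thirty tokens
```

For anything billing-sensitive you would use the provider's own tokenizer rather than this approximation.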
Tokens are the unit of measurement on which almost all AI pricing is built. You are not paying per question. You are not paying per session. You are paying, in one form or another, for tokens. Input tokens, which are what you send to the model. Output tokens, which are what it sends back. In many cases, the tokens that make up the model’s own instructions, the system prompt that shapes its behaviour before you’ve typed a word, are being counted too. Every word you write, every word it generates, every invisible instruction running in the background: all of it costs.
On the face of it, token pricing sounds manageable. Major providers publish their rates. Claude, GPT-4o, Gemini: they all have pricing pages. A dollar per million tokens sounds almost trivially cheap. And for a single query, it is. Ask a question, get an answer, you’ve spent a fraction of a cent. The problem is not the unit cost. The problem is that almost nothing about how AI is actually used in a business context makes it easy to understand, predict, or control the total cost.
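The per-query arithmetic is simple enough to sketch. The rates below are hypothetical placeholders, not any provider's actual prices; output tokens typically cost more than input tokens, which the example reflects:

```python
# Hypothetical placeholder rates, dollars per million tokens.
INPUT_RATE_PER_M = 1.00
OUTPUT_RATE_PER_M = 4.00

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the assumed per-million-token rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# A typical one-off question: ~200 tokens in, ~500 tokens out.
print(f"${query_cost(200, 500):.4f}")  # → $0.0022
```

A fraction of a cent, exactly as the pricing pages suggest. The difficulty, as the rest of this piece argues, is everything that sits between this unit cost and a monthly bill.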
Here’s why. When you use an AI tool, especially one embedded in a product rather than accessed directly through an API, you are almost never shown how many tokens you are consuming. There is no meter running in the corner of the screen. There is no notification when you have used half your allocation. There is no receipt. The consumer interfaces that most people use, the chat windows, the productivity integrations, the creative tools, tuck it all away. You interact with something that feels like a conversation, and the economics of that conversation are invisible.
This creates a problem that compounds quickly at enterprise level. Imagine a business with fifty people all using an AI tool on a company subscription. Some are using it for quick queries. Others are using it for extended document drafting, analysis, research, translation, and code generation. The token consumption across those fifty users is wildly variable. But the billing is flat. The finance director has no way of knowing whether the business is getting exceptional value or paying for capacity it barely touches. More significantly, as usage grows and the business considers scaling, there is no consumption data to model from.
The situation is different, but not straightforwardly better, when businesses move to API access, which is the route that allows direct integration of AI capabilities into products, workflows, and internal tools. Here the pricing is per token and therefore theoretically transparent. In practice, understanding what any given workflow will cost requires knowing how many tokens each step consumes, which depends on the length of the inputs, the length of the outputs, the size of the system prompt, whether the model is being asked to reason through a problem before answering, and a range of other variables that are not always easy to predict in advance.
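Those variables can at least be made explicit. A minimal sketch of a workflow cost estimate, assuming placeholder rates and an assumed system prompt size (every name and number here is illustrative, not a real provider's figure):

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One model call in a workflow. Token counts are estimates you supply."""
    input_tokens: int    # user or document content sent in
    output_tokens: int   # expected length of the reply

# Hypothetical placeholder rates, dollars per million tokens.
INPUT_RATE = 1.00
OUTPUT_RATE = 4.00
SYSTEM_PROMPT_TOKENS = 800  # hidden instructions sent with every call (assumed size)

def workflow_cost(steps: list[Step]) -> float:
    """Estimated dollar cost of running every step once, counting the
    system prompt as billable input on each call."""
    total = 0.0
    for step in steps:
        billed_input = SYSTEM_PROMPT_TOKENS + step.input_tokens
        total += (billed_input * INPUT_RATE
                  + step.output_tokens * OUTPUT_RATE) / 1_000_000
    return total

# Example: summarise a long document, then translate the summary.
steps = [Step(input_tokens=4000, output_tokens=600),
         Step(input_tokens=600, output_tokens=700)]
print(f"${workflow_cost(steps):.4f}")
```

Even a toy model like this surfaces the point: the system prompt is billed on every call, so a workflow of many small steps can cost more than one large one.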
Context windows add another layer. The context window is the amount of information an AI model can hold in its working memory at once. Early models had small context windows. You could send a short conversation, get a reply, and that was that. Modern models have dramatically larger context windows, some now running to a million tokens or more. That sounds like good news, and in many ways it is. You can feed a model an entire document, a full conversation history, a long brief, and it can work with all of it at once. But every token in that context window costs money. If you are maintaining a long conversation with an AI, you are not just paying for the latest exchange. Depending on how the application is built, you may be resending the entire conversation history with every message, paying to re-process everything that came before. The bill for a long, complex interaction is not the sum of individual queries. It can be considerably more.
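The compounding effect of resent history is easy to demonstrate. A sketch, assuming the application resends the full conversation on every turn (a common but not universal design) and using the same placeholder rates as before:

```python
def conversation_cost(turns, rate_in=1.00, rate_out=4.00):
    """Cumulative dollar cost of a chat where the FULL history is re-sent
    as input on every turn. `turns` is a list of
    (user_tokens, reply_tokens) pairs; rates are dollars per million tokens."""
    history = 0
    total = 0.0
    for user_tokens, reply_tokens in turns:
        billed_input = history + user_tokens    # everything so far goes back in
        total += (billed_input * rate_in
                  + reply_tokens * rate_out) / 1_000_000
        history = billed_input + reply_tokens   # the reply joins the history too
    return total

# Ten identical exchanges: 100 tokens in, 300 tokens out each time.
turns = [(100, 300)] * 10
with_history = conversation_cost(turns)
independent = 10 * conversation_cost([(100, 300)])
print(f"${with_history:.4f} vs ${independent:.4f} for ten separate queries")
```

With these numbers the long conversation costs more than double ten independent queries, and the gap widens with every additional turn. Prompt caching, where offered, softens this, but the principle stands: history is not free.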
Then there are the costs that are not in the pricing page at all. Fine-tuning a model on your own data, which is what you need to do if you want AI that genuinely understands your business, your tone, your clients, and your standards, carries its own costs. Storing and retrieving that data has costs. Running AI at scale requires infrastructure that has costs. And the time your people spend prompting, reviewing, correcting, and integrating AI outputs is a cost too, even if it doesn’t appear on an invoice.
Why is it like this? Part of the answer is that the industry is young and pricing models are still evolving. But part of it is commercial: nothing creates friction like a cost counter running in real time. You spend more when you don’t see the meter. Every subscription business knows this.
But the deeper answer is that pricing transparency in AI would expose something that the providers would rather leave ambiguous: the relationship between what you pay and what you get is not linear, not predictable, and not always in your favour. The same task can cost very different amounts depending on how it is approached, which model is used, how the prompt is structured, and how the output is generated. A business that understood its token consumption in detail could make informed decisions about all of those things. Providers have limited incentive to make that easy.
There are also competitive reasons behind all of this. If it were straightforward to compare the true cost of running a workflow on one model versus another, switching would be easier and loyalty would be harder to maintain. The friction of not knowing creates a kind of lock-in that is more powerful than any contract.
For enterprises trying to build a genuine picture of AI value, this matters enormously. You cannot calculate return on investment without understanding cost. You cannot budget accurately without consumption data. You cannot make informed decisions about scaling without knowing what scaling costs.
What would transparency actually look like? At minimum, it would mean real-time usage dashboards that show token consumption per user, per team, and per task type. It would mean cost estimates before you run a process, and cost actuals after. It would mean clear documentation of what is being counted, including system prompts, reasoning chains, and cached versus live computation. It would mean pricing that scales in ways that are genuinely predictable, so that a business can model its costs at ten users and have reasonable confidence in what the cost at a hundred users will look like.
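The per-user, per-task accounting such a dashboard would rest on is not complicated. A minimal sketch of the underlying ledger — a hypothetical design, not any provider's actual reporting API:

```python
from collections import defaultdict

class UsageLedger:
    """Minimal per-user, per-task token accounting: the kind of record a
    usage dashboard would be built on. Illustrative design only."""

    def __init__(self):
        # Keyed by (user, task_type); values are running token totals.
        self._totals = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, user, task_type, input_tokens, output_tokens):
        """Log one request's token counts against a user and task category."""
        entry = self._totals[(user, task_type)]
        entry["input"] += input_tokens
        entry["output"] += output_tokens

    def report(self):
        """Token totals keyed by (user, task_type)."""
        return dict(self._totals)

ledger = UsageLedger()
ledger.record("ana", "drafting", input_tokens=1200, output_tokens=3000)
ledger.record("ana", "drafting", input_tokens=800, output_tokens=1000)
print(ledger.report())
```

The hard part is not the data structure; it is that the token counts have to come from somewhere, which is precisely the reporting most tools do not expose.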
Some of this is beginning to emerge. Certain API providers offer usage dashboards. Some enterprise contracts include consumption reporting. A small number of tools are starting to surface cost information in their interfaces. But it is nowhere near standard, and the gap between what is available and what a serious business needs to make informed decisions remains significant.
In the meantime, there are things you can do. If you are on a flat subscription, run a structured audit of how your team is actually using the tool and what categories of work it is being applied to. That won’t give you token counts, but it will give you a value map. If you are using or considering API access, build token estimation into your workflow design from the start, not as an afterthought.
AI is not a utility in the way that electricity is a utility, where you know the unit cost, you can see the meter, and the bill reflects what you used. It is more like hiring a contractor whose day rate is low but whose expenses are unpredictable, whose working methods affect how long the job takes, and who sends a single monthly invoice with minimal itemisation. You can work with that. But you need to go in with your eyes open.
The businesses that will get the best long-term value from AI are not necessarily the ones spending the most. They are the ones who understand what they are spending, why, and what they are getting for it. That requires asking tough questions, and pushing harder than most providers would prefer, for both the numbers and the answers.