
THE HYPE: AI Cost Management Is the New Line on Your P&L

By Mark Evans MBA

Most of the conversation about what AI costs is being held in the wrong place. It fixes on the wrong things. Which model is cleverest, what the licence runs to, whether the subscription pays for itself. All of it treats AI as something you buy. It isn’t. It’s something you meter, and the meter runs every time the thing does its job, in fractions of a penny, thousands of times a day, on an invoice almost nobody in the building is reading. The discipline that closes that blind spot now has a name: AI cost management.


Here is the claim, stated plainly. If you run a business with twenty people or two hundred and AI is doing real work somewhere inside it, there is a cost building in your accounts right now with no line, no owner and no forecast. The question that decides whether any of it makes you money has already moved, from which model is the cleverest to what it costs you to deliver one unit of work. Pounds per outcome. That’s the number this piece is about, and most boards can’t produce it yet.


A few months ago I made the governance case for treating AI tokens as capital, in a piece called The Hidden Cost of Context. It landed, and the reply I heard most from directors was the question I’d set them: what’s our token budget per customer? Good question. It’s also only the control question, the one you ask to stop a cost bolting. The frontier has already moved past it, and most finance functions haven’t looked up.


This is about where it has moved to. Token cost has stopped being something you govern in a risk register and review after the fact. It’s becoming a line of your cost of goods sold, and the discipline that matters now isn’t control. It’s forecasting. That’s a harder problem than it sounds, for a reason that runs flat against common sense.


From overhead to cost of goods sold

For a while, AI sat to one side of the business. A clever tool your people dipped into when it suited them. A cost that behaved like an overhead: bury it in a software line, govern it loosely and, at worst, waste a little. That era is closing fast. Once a model is inside the work itself, drafting the quote, triaging the claim and answering the customer at two in the morning, its cost stops being an overhead at all. It becomes a cost of production: part of what each unit of your output costs to make, sitting in gross margin next to materials and labour.


That changes who should care about the number. An overhead is something finance keeps half an eye on. A cost of goods sold is something the whole business prices around. It belongs in your unit economics, your pricing, and the margin you quote to win the job. A firm still filing its AI spend under “software subscriptions” while a model handles a third of its customer contact is mis-stating its own cost base, and it’ll keep doing it right up to the morning a competitor who actually knows its per-unit cost undercuts it and walks off with the work.


One for the accountants. How this spend is treated for tax depends on where it sits: software licence, consumption cost, or possibly part of an R&D claim. Worth a conversation with yours before the year-end, not after.


The paradox that breaks the forecast

Here’s where it gets genuinely difficult, because it runs against the grain of everything that feels obvious. The cost of AI is collapsing. That collapse is exactly why your bill is going to climb.

Stanford’s 2025 AI Index, the most authoritative annual stocktake of the field, found that the price of running a model with GPT-3.5-level capability fell more than 280-fold in just under two years, from twenty dollars per million tokens at the end of 2022 to seven cents by late 2024 (Stanford HAI, 2025). Epoch AI, whose data underpins that index, reckons the cost of a given level of performance is dropping somewhere between nine and nine hundred times a year, depending on the task (Stanford HAI, 2025). On the face of it, a cost falling that fast should look after itself.
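A quick sanity check on those figures, as a minimal Python sketch. The two prices are the ones quoted above; the 23-month gap between them is an assumption (roughly November 2022 to October 2024), so adjust it if your source dates differ. The point is to turn a headline “280-fold” into a yearly rate you can set beside Epoch AI’s nine-to-nine-hundred range.

```python
# Prices quoted above, in US dollars per million tokens, for GPT-3.5-level capability.
PRICE_START = 20.00  # end of 2022
PRICE_END = 0.07     # late 2024

# Assumption: about 23 months between the two price points.
MONTHS_BETWEEN = 23


def annualised_decline(start: float, end: float, months: float) -> float:
    """Convert a total price decline into an equivalent constant yearly factor."""
    total_factor = start / end
    return total_factor ** (12 / months)


total = PRICE_START / PRICE_END
per_year = annualised_decline(PRICE_START, PRICE_END, MONTHS_BETWEEN)
print(f"Total decline: {total:.0f}-fold")                    # about 286-fold
print(f"Equivalent yearly decline: about {per_year:.0f}x")  # about 19x a year
```

On that assumption the decline works out at roughly nineteen-fold a year, comfortably inside Epoch AI’s range. On price alone, then, the bill looks set to shrink.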


It won’t. A nineteenth-century economist explains why. In 1865 William Stanley Jevons noticed that when steam engines got more efficient and coal went further, Britain didn’t burn less coal. It burned far more, because cheaper power opened up uses that had never been worth it before (Jevons, 1865). Efficiency drove total consumption up, not down. Watching the price of AI fall off a cliff in January 2025, Microsoft’s Satya Nadella reached for exactly this idea, predicting that as AI got cheaper and more capable its use wouldn’t shrink, it would explode (Nadella, 2025).

The billing data is proving him right. The new reasoning models think in longer chains and burn many times the tokens per answer. Agents loop, retry and call each other, multiplying consumption at every step. Menlo Ventures, which tracks enterprise AI spending, put enterprise spend on generative AI at around thirty-seven billion dollars in 2025, roughly triple the year before (Menlo Ventures, 2025). Unit price in freefall, total bill soaring, at the same time.


For anyone building a forecast, that’s treacherous ground. The instinct is to assume that because each call keeps getting cheaper, the line flattens or falls. The opposite happens, because volume rises faster than price drops. “It’s getting cheaper” is the most dangerous assumption you can write into next year’s numbers.
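The arithmetic behind that warning is one line: the bill is volume times unit price, so each year it moves by the ratio of volume growth to price decline. A minimal Python sketch, with hypothetical growth and decline figures chosen only to show the shape:

```python
def project_bill(current_bill: float, price_decline: float,
                 volume_growth: float, years: int = 1) -> float:
    """Project a metered bill forward, assuming constant yearly rates.

    price_decline: how many times cheaper each unit gets per year (10 = tenfold cheaper).
    volume_growth: how many times more units you consume per year.
    """
    bill = current_bill
    for _ in range(years):
        # Bill = volume x unit price, so each year it scales by growth / decline.
        bill *= volume_growth / price_decline
    return bill


# Hypothetical: £1,000 a month today, each token tenfold cheaper a year,
# usage thirtyfold higher as reasoning models and agents take on more work.
print(project_bill(1_000, price_decline=10, volume_growth=30))           # 3000.0
print(project_bill(1_000, price_decline=10, volume_growth=30, years=2))  # 9000.0
```

In that example the unit price falls a hundredfold over two years and the bill still rises ninefold. The line only falls when volume grows more slowly than the price declines.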


FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION

A falling price per use does not mean a falling bill. If your AI usage is growing faster than the per-token price is dropping, and it almost certainly is, your total cost rises even while every individual call gets cheaper. Forecast the volume, not just the rate.


A cost with no owner, and no forecast

There’s a well-documented example of what happens when a big, growing cost is left in a silo with nobody’s name against it and no honest forecast. It belongs to Uber.

For years Uber poured money into a self-driving division, the Advanced Technologies Group. On paper the logic held. Drivers were the company’s single biggest expense, so a car that drove itself would one day wipe it out. The future was worth paying for. The present was the problem. By the unit’s final stretch it was burning something like half a billion dollars a year, according to people close to its finances who spoke to Bloomberg, and in 2020 alone Uber’s own filings showed the division and its sister projects losing more than three hundred million dollars (Fortune, 2020; TechCrunch, 2020). It ended in a way that ought to be taught in every finance class. In December 2020 Uber didn’t sell the division for a fortune. It paid to be rid of it, handing the unit to a startup called Aurora and tipping in four hundred million dollars of its own money for a minority stake (TechCrunch, 2020). The thing it had nurtured for years as the key to its future had become the thing its own investors most wanted gone.


In the last piece I described a UK firm whose AI project drifted from a £400k business case to £1.2m, and whose post-mortem turned up the damning line: nobody owned the context architecture. It had fallen between the CIO and the CDO. Uber is that same failure with three more zeros on it. Scale doesn’t change the mechanism. A cost nobody specifically owns, and nobody is projecting forward, drifts. And in a consumption cost, drift compounds, quietly, until it’s the line that decides your year.


AI cost forecasting is the new discipline

Last time I recommended token budgets and monthly variance reports. I stand by that, as far as it goes. Variance reporting is the rear-view mirror, though. It tells you what already leaked. The discipline the next eighteen months actually demand looks forward, and it’s genuinely new, because finance is being asked to model a cost unlike anything it’s handled before: a unit price falling by an order of magnitude a year, a volume rising faster than that, and a total that tracks how useful, and how used, your AI has become.


What it looks like in practice is less exotic than it sounds. The whole enterprise AI conversation has shifted from raw capability to pounds per outcome, and the sooner your board makes the same move the better. Stop reporting AI as a single monthly total. Start expressing it per unit of work: the cost to handle an enquiry, draft a quote, process a claim, resolve a ticket.


Take a fifty-person services firm running three AI workflows, by way of illustration. A chatbot for customer enquiries. An assistant that drafts quotes. An automation that matches invoices. The chatbot costs fractions of a penny per enquiry on a light model. The quote assistant, on something more capable, runs to a few pence per quote. The invoice automation, a reasoning model grinding through each document in several steps, might cost twenty to forty pence per invoice. Add it up and the firm spends maybe four or five hundred pounds a month, a number easily lost inside a cloud bill. Express it per outcome, and the picture sharpens. The enquiry costs next to nothing. The quote costs pennies. The invoice line is the one to watch, because it scales straight off your sales volume and runs on your most expensive model. That’s the forecast you need, and it’s nothing a business that already knows its margins can’t build.
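Here is that illustration as a minimal Python sketch of a per-outcome cost model. The monthly volumes and unit costs are hypothetical, picked only to sit inside the ranges above (a fraction of a penny per enquiry, a few pence per quote, twenty to forty pence per invoice); swap in your own metered figures.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    monthly_volume: int   # units of work per month (hypothetical)
    cost_per_unit: float  # pounds per unit of work (hypothetical)

    @property
    def monthly_cost(self) -> float:
        return self.monthly_volume * self.cost_per_unit


workflows = [
    Workflow("enquiry (chatbot, light model)", 20_000, 0.003),
    Workflow("quote (assistant, capable model)", 2_000, 0.04),
    Workflow("invoice (automation, reasoning model)", 1_000, 0.30),
]

total = sum(w.monthly_cost for w in workflows)
for w in workflows:
    share = w.monthly_cost / total
    print(f"{w.name}: £{w.cost_per_unit:.3f} per unit, "
          f"£{w.monthly_cost:,.0f} a month ({share:.0%} of spend)")
print(f"Total: £{total:,.0f} a month")
```

On these assumptions the total is about £440 a month, and the invoice line is roughly two-thirds of it despite having the smallest volume. A per-outcome view makes that obvious; a single monthly total hides it.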


Project that line as volume times a falling unit price, and run it under more than one scenario, because the gap between “usage doubles” and “usage grows tenfold” is the gap that decides whether your margins hold. Then price your work against it, and set a ceiling per workflow so a runaway agent shows up on a dashboard long before it shows up in the accounts.
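A minimal Python sketch of that projection for the invoice line, under the two usage scenarios. Every number is an assumption for illustration: the starting point carries over the hypothetical 1,000 invoices at 30p, the unit price is assumed to halve over a year (deliberately cautious, since a workflow may move onto a pricier model as it matures), and the £500 ceiling is arbitrary.

```python
def forecast_monthly_cost(volume_now: float, unit_cost_now: float,
                          annual_volume_growth: float, annual_price_decline: float,
                          months: int = 12) -> list[float]:
    """Monthly cost = volume x unit price, each compounding at a constant monthly rate.

    annual_volume_growth: 2.0 means usage doubles over a year.
    annual_price_decline: 2.0 means the unit price halves over a year.
    """
    volume_step = annual_volume_growth ** (1 / 12)
    price_step = annual_price_decline ** (1 / 12)
    return [
        (volume_now * volume_step ** m) * (unit_cost_now / price_step ** m)
        for m in range(1, months + 1)
    ]


CEILING = 500.0  # hypothetical monthly ceiling for this workflow, in pounds

scenarios = {"usage doubles": 2.0, "usage grows tenfold": 10.0}
results = {}
for label, growth in scenarios.items():
    costs = forecast_monthly_cost(1_000, 0.30, growth, annual_price_decline=2.0)
    # First month (counting from 1) in which the forecast breaches the ceiling, if any.
    first_breach = next((m for m, c in enumerate(costs, start=1) if c > CEILING), None)
    results[label] = (costs[-1], first_breach)
    print(f"{label}: month 12 about £{costs[-1]:,.0f}, first month over ceiling: {first_breach}")
```

Same starting line, same falling price: in one scenario the cost holds flat at around £300 a month; in the other it passes the ceiling in month four and ends the year near £1,500.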


Two objections always come back. The first: the cost is falling so fast there’s no point forecasting it. That’s the Jevons trap dressed up as a strategy, and it’s exactly how you end up explaining a surprise to your board. The second: this is finance’s job, not the board’s, cost control rather than strategy. It’s not. The moment token cost lands in your cost of goods sold, it starts driving your pricing, and pricing is the most strategic lever you own. A rival who can forecast their per-unit AI cost can push it into thin-margin work you can’t profitably touch, and hollow out your advantage from below. That’s a competitive question. It belongs in the boardroom, not buried in a finance pack.


AI token budgets that survive contact with reality

A token budget by department is easy to set up and easy to kid yourself with. The metering is the simple bit. Most providers now hand you project-level keys and spend dashboards, and a thin gateway in front of them can tag every call to a cost centre and feed monthly variance to the CFO, exactly as I argued last time. The hard part is what happens when a number meets a human being with work to do.
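The metering half really is simple. A minimal Python sketch of the tagging a thin gateway might do, assuming input and output tokens are priced separately (check your provider’s price list). The model names, prices per million tokens and budgets are hypothetical placeholders, not any provider’s actual rates.

```python
from collections import defaultdict


class CostLedger:
    """Tags each model call to a cost centre and reports variance against budget."""

    def __init__(self, prices: dict[str, tuple[float, float]]):
        # model name -> (pounds per million input tokens, pounds per million output tokens)
        self.prices = prices
        self.spend: dict[str, float] = defaultdict(float)

    def record_call(self, cost_centre: str, model: str,
                    input_tokens: int, output_tokens: int) -> float:
        in_price, out_price = self.prices[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.spend[cost_centre] += cost
        return cost

    def variance(self, budgets: dict[str, float]) -> dict[str, float]:
        # Positive means over budget; a cost centre with spend but no budget shows in full.
        centres = set(budgets) | set(self.spend)
        return {c: self.spend.get(c, 0.0) - budgets.get(c, 0.0) for c in sorted(centres)}


ledger = CostLedger({"light-model": (0.10, 0.40), "reasoning-model": (2.00, 8.00)})
ledger.record_call("customer-service", "light-model", 2_000_000, 500_000)
ledger.record_call("finance-ops", "reasoning-model", 10_000_000, 5_000_000)
print(ledger.variance({"customer-service": 1.00, "finance-ops": 50.00}))
```

Run monthly, that dictionary is the variance feed to the CFO; the harder judgement is what to do when a line goes positive.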


Hard caps that stop someone mid-task don’t control cost. They move it somewhere you can’t see. A developer who hits a ceiling at four o’clock on a deadline doesn’t down tools and file a variance report. They reach for a personal account, a free tier or a different tool nobody is metering, and you’ve swapped a visible line on a dashboard for an invisible data-governance problem. For people, caps should work like alerts and escalation points, not circuit breakers. Show them their own consumption, flag the outliers, and have the conversation. Save the hard ceilings for the automated workflows that can bolt on their own.
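That rule is easy to encode. A minimal Python sketch, assuming a hypothetical gateway that knows whether a call comes from a person or an automated workflow; the 80 per cent alert threshold is an arbitrary illustration, not a recommendation.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    ALERT = "alert"  # tell the person (and their manager); let the work carry on
    BLOCK = "block"  # stop the call; reserved for automated workflows


def budget_action(spend: float, ceiling: float, is_automated: bool,
                  alert_at: float = 0.8) -> Action:
    """Soft limits for people, hard limits for automation."""
    if is_automated and spend >= ceiling:
        return Action.BLOCK
    if spend >= alert_at * ceiling:
        return Action.ALERT
    return Action.OK


print(budget_action(240, 200, is_automated=False))  # Action.ALERT
print(budget_action(500, 500, is_automated=True))   # Action.BLOCK
```

A developer 20 per cent over their line gets a conversation, not a locked screen; the automation at its ceiling stops until someone looks.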


That points to the rule I’d put at the centre of any token budget: cap the product, not the people. Take a developer on a mid-five-figure salary who spends a couple of hundred pounds of tokens a month and gets back a fifth of their week. That’s one of the most profitable trades in the business, and capping it hard to claw back the two hundred quid is penny-wise and pound-foolish in its purest form. The place that genuinely needs a hard ceiling is the high-volume, customer-facing automation inside your cost of goods sold, where consumption climbs in lockstep with sales and one badly behaved agent can chew through a margin while nobody’s looking. Budget tight where the cost-to-value ratio is poor. Stay out of the way where it’s obviously good. Most firms do the exact reverse: petty limits on their best people and the expensive automation left to run unwatched.

One caveat, to keep it honest. That logic holds while a knowledge worker’s token cost stays small against their salary, and heavy use of agents or deep-research tools can break it quietly. Re-test it every year. Don’t set it once and forget it.
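The annual re-test can be a few lines. A minimal Python sketch using the developer example above, with two labelled assumptions: “mid-five-figure” is taken as £50,000, and time saved is valued at salary alone, which understates it because it ignores employer on-costs. The second case is a hypothetical heavy agent user, to show how the logic breaks.

```python
def token_spend_check(annual_salary: float, monthly_token_spend: float,
                      share_of_week_saved: float) -> dict[str, float]:
    """Compare a person's token spend with their salary and the time it saves."""
    annual_tokens = monthly_token_spend * 12
    value_of_time_saved = annual_salary * share_of_week_saved
    return {
        "tokens_as_share_of_salary": annual_tokens / annual_salary,
        "value_to_cost_ratio": value_of_time_saved / annual_tokens,
    }


typical = token_spend_check(50_000, 200, share_of_week_saved=0.2)
heavy = token_spend_check(50_000, 2_000, share_of_week_saved=0.2)
for label, result in (("typical", typical), ("heavy agent use", heavy)):
    print(f"{label}: tokens {result['tokens_as_share_of_salary']:.1%} of salary, "
          f"value returned {result['value_to_cost_ratio']:.1f}x cost")
```

Roughly £4 back for every £1 of tokens in the first case; well under £1 in the second, which is the point at which the light touch needs revisiting.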


Buy, rent, or build: the cost shape behind the model

Every leader I speak to wants to know which model to back, and nearly all of them frame it as a question of price per token. Wrong frame. The options worth comparing have completely different cost shapes and completely different risk profiles, and the decision is about matching those to the job, not chasing the headline rate.

A hosted frontier API (the cloud service you rent from a provider like OpenAI, Anthropic or Google) is pure variable cost. You’re live tomorrow, you carry no infrastructure and you pay only for what you use. The trade is that the meter runs forever, and someone else’s pricing committee sets your unit cost.


Switch to a cheap hosted API like DeepSeek’s and you slash that per-token rate. This is where the most expensive mistake in the whole field hides in plain sight. The saving is printed on the invoice. The risk isn’t. DeepSeek’s own privacy policy says user data is stored on servers in China, and within months of its January 2025 launch a long line of governments moved against it. The US Navy and NASA blocked it. New York, Texas and Virginia banned it from official devices. A bipartisan bill went before Congress, and a congressional committee branded it a profound threat to national security (U.S. House of Representatives, 2025; BankInfoSecurity, 2025). Australia, Italy, Ireland, Taiwan, South Korea and the Czech Republic followed, with the Czech cyber agency warning that Chinese law can compel a company to hand its data to the state (Foundation for Defense of Democracies, 2025). For a UK business the exposure isn’t state secrets. It’s ordinary, and it’s serious: your contracts, your financials, your customer records, your intellectual property crossing a border into a jurisdiction you don’t control. Weigh that invisible risk against the visible saving honestly, because in most regulated or data-sensitive settings the risk dwarfs it.


There’s a distinction here that settles most of the argument, and almost nobody draws it cleanly. The risk attaches to the hosted Chinese service, where your data travels to China. It doesn’t attach to the open model weights themselves. DeepSeek’s weights are open. Run them on your own infrastructure or a UK-region host you trust, and the data-to-China problem largely vanishes while the cost and capability gains stay. That’s not a free pass. There are still open questions about what behaviours are baked into any model’s weights, so you sandbox and test rather than assume. It does, though, turn a blunt “Chinese model, too risky” into the far more useful question of where the thing actually runs.


Which brings you to the third shape. A local or self-hosted open model (Llama, Mistral or DeepSeek’s own weights) flips the economics on its head: high fixed costs and near-zero marginal cost. The figures from the last piece still apply: an eight-GPU node at roughly £10,000 to £12,000 a month, whether it serves one query or a million. Below a certain volume that’s far dearer per use than just renting an API. Above your break-even it’s dramatically cheaper, with no vendor lock-in and your data kept at home. The catch is utilisation. Idle silicon is the most expensive AI you can buy, so this only pays if you keep those GPUs genuinely busy.
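The break-even is simple division. A minimal Python sketch, assuming the midpoint of that range (£11,000 a month) as the node’s whole fixed cost and a hypothetical blended API price of £5 per million tokens; real deployments add engineering time, and real API prices vary widely by model.

```python
def break_even_tokens(fixed_monthly_cost: float, api_price_per_million: float) -> float:
    """Monthly token volume at which a fixed-cost node matches a pay-per-token API."""
    return fixed_monthly_cost / api_price_per_million * 1_000_000


def self_hosted_price_per_million(fixed_monthly_cost: float, tokens_served: float) -> float:
    """Effective price per million tokens: the fixed bill spread over what you actually serve."""
    return fixed_monthly_cost / tokens_served * 1_000_000


NODE_COST = 11_000.0  # assumption: midpoint of £10,000 to £12,000 a month
API_PRICE = 5.0       # hypothetical blended API price, pounds per million tokens

print(f"Break-even: {break_even_tokens(NODE_COST, API_PRICE) / 1e9:.1f} billion tokens a month")
for served in (0.5e9, 2.2e9, 5e9):
    price = self_hosted_price_per_million(NODE_COST, served)
    print(f"{served / 1e9:.1f}bn tokens served: £{price:.2f} per million (API: £{API_PRICE:.2f})")
```

At under a quarter of break-even the node costs over four times the API rate per token; at a little over double, less than half. That is idle silicon, in numbers.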


The capability objection, that the cheaper or open models can’t match the best, is fading faster than most boards realise. Stanford’s index found the gap between the leading open and closed models narrowing from eight per cent to under two per cent on some benchmarks in a single year (Stanford HAI, 2025), and DeepSeek’s arrival was the moment the industry accepted that frontier-level reasoning no longer needs frontier-level spend. Paying top dollar for the most expensive model, by default, isn’t the safe choice it used to look.


So it’s a portfolio decision, not a single pick. Rent a variable API for spiky, low-volume, experimental work. Build or self-host for high-volume, steady-state, or sensitive workloads. Route the cheap and the confidential to a model you control; send only the genuinely hard reasoning to a frontier provider. Match the cost shape and the risk to the use case, and the question of which model is “best” mostly answers itself.
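A portfolio like that usually ends up as a routing rule in front of your models. A minimal Python sketch; the three flags and the destination names are hypothetical, and putting confidentiality first, so sensitive data never leaves infrastructure you control even when the task is hard, is one reasonable reading of the rule above rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class Task:
    sensitive_data: bool      # contracts, financials, customer records, IP
    hard_reasoning: bool      # genuinely needs frontier-level capability
    steady_high_volume: bool  # predictable bulk work that keeps GPUs busy


def route(task: Task) -> str:
    """Pick a destination by cost shape and risk (hypothetical destination names)."""
    if task.sensitive_data:
        return "self-hosted"   # the data stays on infrastructure you control
    if task.hard_reasoning:
        return "frontier-api"  # pay the premium only where it earns its keep
    if task.steady_high_volume:
        return "self-hosted"   # bulk work is what spreads the fixed cost
    return "rented-api"        # spiky, low-volume or experimental work


print(route(Task(sensitive_data=False, hard_reasoning=True, steady_high_volume=False)))
```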


FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION

The cheapest token can carry the most expensive risk. Before anyone adopts a model on price, ask two questions: where does our data physically go, and who can be compelled to hand it over? A saving on the invoice is no saving at all if it parks your contracts and customer records in a jurisdiction you don’t control.


What AI cost management means for a Scottish SME

The last piece spoke mostly to the boards of listed companies and the governance machinery around them. Most of my work is at the other end of the market: owner-managed firms across Scotland that know their numbers cold, can tell you the margin on every job and the running cost of every van on the road, and have no intention whatsoever of standing up a Context Governance Committee. They don’t need one. For them the whole discipline shrinks to a single number they can own. If you’re wondering whether any of this applies to you yet, the threshold is lower than you’d think. If AI is doing real work anywhere in your business, even a single chatbot or one automated report, it already does.

What does it cost you to serve one customer now there’s a model somewhere in the loop, and where does that number go as you lean on AI harder? Answer that, forecast it, and you’ve got everything the FTSE 350 governance apparatus is straining to produce, without the apparatus. Fail to answer it, and that’s the line missing from your accounts, the one the meter fills in whether you’re watching or not.


The trajectory from the last piece holds, only sharper. AI keeps getting cheaper per use and more woven into everything you do, which means you’ll use far more of it, which means a cost that grows as you succeed instead of shrinking as you economise. The firms that come through this won’t be the ones with the biggest AI budgets or the boldest adoption. They’ll be the ones who can tell you, without flinching, what a unit of their work costs to produce now that a machine helps produce it, and what it’ll cost when they’re running ten times the volume. Draw that line on your P&L before the meter draws it for you. Everything else is hype.


FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION

Three questions for your next meeting. What does it cost us to serve one customer now there’s AI in the loop? Who owns and forecasts that line? And what does it become if our usage grows tenfold while the price per use keeps falling? If the room can’t answer, you aren’t forecasting AI. You’re hoping.


Frequently asked questions

What is AI cost management?

AI cost management means treating the cost of running AI, the tokens a model consumes every time it works, as a forecastable line in your accounts rather than a vague software expense. It belongs in your cost of goods sold, and it drives your pricing. Most businesses don’t track it yet.


How do you forecast AI token costs?

Express the cost per unit of work (per enquiry, per quote, per claim), then project it as volume times a falling unit price under more than one usage scenario. The trap is assuming a falling price per token means a falling bill. Volume usually rises faster than price drops.


Is it safe to use DeepSeek or other Chinese AI models in business?

The hosted DeepSeek service stores user data on servers in China, which has led a long list of governments to restrict it, so it’s a poor fit for sensitive or regulated work. The open weights, run on your own infrastructure, are a different question. The risk is about where your data goes, not the model itself.


Should a Scottish SME build or buy its AI?

Match the cost shape to the job. Rent a hosted API for spiky, low-volume work. Self-host an open model for high-volume or sensitive workloads where you can keep the hardware busy. Most firms end up with a mix.


How can a 360 Strategy AI consultant help with AI cost management?

We help Scottish SMEs put AI cost on the P&L: a per-outcome cost model, a forecast you can price against, and a build-or-buy decision matched to your data and your volume. Talk to an AI consultant in Scotland.


Where to start

If you can’t yet say what a unit of your work costs to produce now there’s AI in the loop, that’s where to start. 360 Strategy provides AI consulting in Scotland, helping businesses put AI cost on the P&L and forecast it. That’s the work.


Mark Evans MBA is founder of 360 Strategy, a marketing strategy and AI consultancy based in Scotland.


References and sources available on request.
