THE HYPE: THE HIDDEN COSTS OF 'CONTEXT' IN AI THAT EVERY BUSINESS LEADER MUST KNOW
- Mark Evans MBA, CMgr FCMi
- Nov 17

Written by Mark Evans MBA, Founder of 360 Strategy
Let’s talk numbers. OpenAI’s GPT-4 Turbo charges approximately $10 per million input tokens and $30 per million output tokens (£8/£24). That seems trivial until you scale. A medium-sized UK insurer processing 10,000 claims enquiries daily, each with a 10,000-token context (policy docs, prior correspondence, medical reports), faces:
• 100 million tokens per day input.
• At scale, with chaining and retries, let’s call it 150 million tokens.
• That’s $1,500 per day, or £395,000 per annum, just for the model API.
• But tokens also hit your infrastructure. Every token requires memory bandwidth, and the context window must be held in GPU RAM. The raw token IDs are trivial (a 128k context at 2 bytes per token is only 256KB), but the key-value cache the model must retain for that context typically runs to gigabytes per concurrent query. At 10,000 queries/day, that persistent memory allocation drives instance sizing.
The Alan Turing Institute’s recent analysis of AI infrastructure economics modelled this dynamic across UK firms. They found that for every £1 spent on AI licences, firms incurred £1.80 in downstream compute, storage, and network transfer costs in Year 1, rising to £3.20 by Year 3 as context bloat accumulates (The Alan Turing Institute, 2024). The primary driver? Unmanaged, ever-growing context appended to every interaction.
Our own modelling illustrates the sensitivity. A 1,000-token query with 5-step agentic chaining can balloon to 23,000 total tokens consumed due to intermediate state passing and tool response overhead. At scale, this is untenable.
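The chaining arithmetic can be sketched as a simple model. The step sizes below (tool-response and per-step output tokens) are illustrative assumptions, chosen to show how re-sending accumulated context compounds; they reproduce the same order of magnitude as the 23,000-token figure.

```python
# Illustrative sketch of how agentic chaining multiplies token spend.
# Step sizes are assumptions for illustration, not measured values.

def chained_token_cost(base_query=1_000, steps=5,
                       tool_response=1_000, step_output=500):
    """Total tokens when each step re-sends the accumulated context
    (prior outputs plus tool responses) as fresh input."""
    context = base_query
    total = 0
    for _ in range(steps):
        total += context       # input tokens re-sent this step
        total += step_output   # model output for this step
        context += step_output + tool_response  # context grows each step
    return total

print(chained_token_cost())  # 22,500 tokens from a 1,000-token query
```

Because each step re-submits everything generated so far, total consumption grows roughly quadratically in the number of steps, which is why deep agentic chains are so sensitive to context pruning.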
Platform Comparison: The Illusion of Bundled Pricing
UK non-executives think in competitive benchmarks. A technology-agnostic cost comparison transforms abstract token economics into board-actionable intelligence. The table below reveals why "we already use Copilot" is not a strategic free pass.
Table 1: Enterprise AI Platform Context Cost Comparison (UK Deployment, Q4 2025)
Platform | £/1M input | £/1M output | Max effective context | UK data residency | Token efficiency vs GPT-5 |
OpenAI GPT-5 | 1.00 | 8.00 | ~1M | No | baseline |
Gemini 2.5 Pro | 1.00 | 8.00 | 2M | Yes | +15% |
Microsoft Copilot | bundled (£30/seat) | bundled | 128k | Yes | -40% (lock-in) |
Claude 4 Opus | 12.00 | 60.00 | 200k | No | -25% |
Moonshot Kimi K2 | 0.90 | 7.20 | ~150k | No | +30% |
Inception dLLM | 0.35* | 2.80* | 1M+ | Yes (pilot) | +60% |
Token efficiency reflects real-world savings from reduced repetition, better context retention, and lower hallucination rates. *Estimates based on 2025 early-access pricing, subject to change.
UK SOVEREIGNTY NOTE: For Scottish firms handling NHS, Scottish Government, or MoD data, current guidance means that only Gemini (UK region) or Copilot (UK sovereign) are likely to meet data residency expectations. Kimi and Claude are not appropriate for restricted datasets.
Strategic footnote: Costs exclude VAT, GPU hosting, and context management overhead. "Effective context" reflects practical limits before latency degradation. Data residency status accurate as of Q3 2025, critical for GDPR Article 44 compliance.
The bundled Copilot pricing is particularly pernicious. At £30 per seat, it appears predictable until you realise each M365 document summarisation can consume around 15,000 tokens behind the scenes. At 1,000 summaries per user per month across 200 users, that is roughly three billion tokens, around £24k of embedded token cost per month that does not appear in your AI budget but absolutely hits your M365 renewal negotiation.
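The embedded-cost arithmetic is worth making explicit. Reading the scenario as 1,000 summaries per user per month, and assuming a GPT-4-class input rate of £8 per million tokens (an assumption, since bundled pricing does not disclose rates), the ~£24k figure reconciles as follows:

```python
# Back-of-envelope estimate of the token cost hidden in bundled seat pricing.
# The £8 per million input tokens rate is an assumed GPT-4-class price.
users = 200
summaries_per_user_per_month = 1_000
tokens_per_summary = 15_000
gbp_per_million_input_tokens = 8.0

monthly_tokens = users * summaries_per_user_per_month * tokens_per_summary
embedded_cost = monthly_tokens / 1e6 * gbp_per_million_input_tokens
print(f"{monthly_tokens / 1e9:.1f}B tokens/month -> £{embedded_cost:,.0f}/month")
```

Three billion tokens a month of invisible consumption is the number to put in front of the vendor at renewal.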
Infrastructure, Energy, and Overhead: The Datacentre Reality
Tokens are abstract. Their physical manifestation is silicon and watts. Nvidia’s H100 GPU, the workhorse of AI inference, draws 700W per card (Forbes, 2024; TRG Datacenters, 2025). In a typical UK colocation facility at 30p/kWh (post-2023 energy price shocks), that’s £0.21 per hour per GPU. Doesn’t sound like much, until you consider that serving a long-context model to thousands of concurrent users requires dozens of GPUs, 24/7.
Our own modelling, based on published Nvidia H100 power consumption and typical UK colocation pricing, suggests that a single 8 GPU inference node can cost on the order of £10,000 to £12,000 per month to operate (energy, cooling, space, amortised hardware). If context inefficiency forces you to run three nodes instead of two, that is around £144,000 per annum of additional operational spend for no extra business value.
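A back-of-envelope version of that node costing can be sketched as follows. The 700W draw and 30p/kWh rate come from the figures above; the PUE multiplier, per-card hardware price, amortisation period, and space cost are assumptions for illustration only.

```python
# Rough monthly operating cost for one 8-GPU H100 inference node.
# 700W per card and 30p/kWh come from the text; PUE, hardware price,
# amortisation period and space cost are assumptions for illustration.
GPU_WATTS = 700
N_GPUS = 8
KWH_PRICE_GBP = 0.30
PUE = 1.5                   # assumed cooling/power-delivery overhead
HOURS_PER_MONTH = 730

energy_kwh = GPU_WATTS * N_GPUS / 1000 * HOURS_PER_MONTH * PUE
energy_cost = energy_kwh * KWH_PRICE_GBP          # ~£1,840/month

hw_amortisation = N_GPUS * 30_000 / 36            # assume £30k/card over 3 years
space_and_network = 2_000                         # assumed colocation overhead

total = energy_cost + hw_amortisation + space_and_network
print(f"≈ £{total:,.0f}/month per node")
```

Under these assumptions a node lands at the lower end of the £10,000-£12,000 range; the point is that amortised hardware, not electricity, dominates, so every avoidable node is a five-figure monthly line item.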
Cooling innovation helps modestly. Immersion cooling can reduce energy overhead by 20-30 per cent, but CapEx is high at £500k per rack. For most PLCs, the ROI case only works at massive scale. The more immediate lever is efficiency: reducing token volume and batching queries intelligently.
Latency and Workflow Fragmentation: The Hidden Productivity Tax
Context fragmentation doesn’t just cost money; it costs time. When an AI agent loses state, because memory overflowed or a tool integration failed, it must re-retrieve, re-validate, re-generate. Arya.ai’s failure analysis found that 43 per cent of AI-assisted customer journeys experienced at least one context reset, adding an average of 90 seconds per interaction (Arya.ai, 2024). In a contact centre handling 1 million calls annually, that’s 25,000 hours of lost productivity, equivalent to 14 FTEs.
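The arithmetic behind those figures is straightforward, assuming the 90-second penalty is averaged across all interactions and an FTE working year of roughly 1,786 hours (an assumption consistent with the 14-FTE result):

```python
# Reproducing the contact-centre arithmetic: 90 extra seconds averaged
# across 1M annual calls. The 1,786-hour FTE year is an assumption
# consistent with the 14-FTE figure in the text.
calls_per_year = 1_000_000
extra_seconds_per_call = 90
fte_hours_per_year = 1_786

lost_hours = calls_per_year * extra_seconds_per_call / 3600
ftes = lost_hours / fte_hours_per_year
print(f"{lost_hours:,.0f} hours lost ≈ {ftes:.0f} FTEs")
```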
Worse, fragmentation introduces error. Practitioner case studies suggest that a significant share of what are labelled "AI hallucinations" are really mis-contextualisations, where the system gives a plausible answer to the wrong question because its working memory has been polluted or truncated (Arya.ai, 2024; Ant Marketing, 2025). In regulated sectors such as financial services, legal and healthcare, this is a Section 166 FCA Skilled Persons Review waiting to happen (Bank of England and Financial Conduct Authority, 2024).
FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION
Context reset rate = how often your AI forgets the conversation and starts over. If >5% of sessions, you're paying double for rework.
STRATEGIC AND COMPETITIVE RISKS
ROI Drain, Compliance, and Exposure: A Board-Level View
The financial exposure is not theoretical. In late 2023, a UK-based professional services firm halted its AI-powered legal document review project after six months. The business case had assumed £400k annual cost. Actual spend hit £1.2m, driven by context management overheads: vector database licensing (£180k), infrastructure scaling (£420k), specialist contractor fees (£200k). The NEDs’ post-mortem found no governance owner for context architecture; it had fallen between the CIO and CDO.
Reputational risk is equally acute. Under the UK’s AI White Paper principles (Department for Science, Innovation and Technology, 2023), firms must ensure AI is "appropriately transparent and explainable." When an AI’s decision is based on a context window opaque to auditors, explaining why a loan was rejected or a claim denied becomes legally fraught.
The Information Commissioner’s Office has signalled that "black box" AI without traceable context logs can risk breaching GDPR Article 22 on automated decision making (White & Case, 2025; Ada Lovelace Institute, 2023). Under Section 172 of the Companies Act 2006, directors are expected to consider the long-term consequences of their decisions, including environmental impact. In practice, this is increasingly interpreted to include the energy and carbon footprint of digital operations such as AI, intensive token use among them (Department for Business, Energy & Industrial Strategy, 2024).
Then there is the opportunity cost. Competitors who master context efficiency can underprice you. If your rival’s context engineering reduces their per-query cost by 50 per cent, they can afford to deploy AI into lower-margin product lines, commoditising your advantage. This is classic disruptive economics, playing out in real-time.
The Carbon Ledger: A Hidden Cost Becoming Mandatory Disclosure
Every token has a carbon shadow. Industry analysis calculates that a single GPT-4 query consuming 2k tokens generates approximately 1.8g CO₂e, equivalent to driving a petrol car about 8 metres. Scale that to a UK retail bank’s 10M queries/month: 18 tonnes CO₂e a month, or 45,000 miles of driving, before accounting for infrastructure overhead.
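Those equivalences follow from simple scaling. The 400 g/mile petrol-car emission factor below is an assumption consistent with the per-query figures quoted above:

```python
# Scaling the per-query carbon figure to a monthly ledger.
# The 400 g/mile petrol-car factor is an assumption consistent with
# the 8-metres-per-query equivalence in the text.
grams_per_query = 1.8
queries_per_month = 10_000_000
car_grams_per_mile = 400

tonnes = grams_per_query * queries_per_month / 1e6
miles_equiv = grams_per_query * queries_per_month / car_grams_per_mile
print(f"{tonnes:.0f} tonnes CO2e/month ≈ {miles_equiv:,.0f} miles of driving")
```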
For PLCs and large SMEs (turnover >£500m), this is now a legal disclosure requirement. Under SECR (Streamlined Energy and Carbon Reporting) and the UK Sustainability Disclosure Standards (UK SDS), firms must report Scope 3 emissions from material value-chain activities, including cloud AI services (Department for Business, Energy & Industrial Strategy, 2024). If your AI context management forces you to run 30 per cent more GPU hours than necessary, that excess carbon must now be explained to shareholders and the Pensions Regulator.
For companies with green branding, such as FTSE4Good constituents, B Corps, or SBTi-certified firms, the rules are stricter. The ASA (Advertising Standards Authority) now polices "green" claims under CAP Code rule 11. If you declare "AI-powered sustainable operations" but hide the carbon cost of context bloat in Scope 3, you risk censure. Industry reports indicate UK sustainable retailers are already facing scrutiny for failing to disclose AI-related emissions in their supply chain claims.
Board Action: Task your Sustainability Committee with commissioning a Scope 2/3 AI emissions baseline this financial year. Overlay token budgets with carbon budgets. If your net-zero plan caps IT emissions at 5% of total, context inefficiency could blow that ceiling by 2026.
The Emergence of Context Engineering Roles: A Talent War
McKinsey’s 2024 State of AI report identified "Context Architect" as the fastest-growing new role in AI delivery teams, with salaries in London hitting £120-150k base (McKinsey & Company, 2024a). Bain and BCG concur, advising clients to establish dedicated Context Engineering squads reporting jointly to the CTO and Chief Risk Officer (McKinsey & Company, 2024b). The QCA Corporate Governance Code’s emphasis on workforce capability makes this a board-relevant issue: do we have the right structure to manage this risk?
These roles sit at the intersection of data architecture, software engineering, and domain expertise. They are not data scientists; they are systems thinkers who understand token economics, governance policy, and business process. Recruiting them is hard. UK universities have yet to produce them at scale. The result is a contractor market where day rates of £1,500 are common, yet another hidden cost (Startups Magazine, 2025).
FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION
If no one on your board can answer "What’s our token budget per customer?" you have a governance gap. This is now a Section 172 competence question.
NEXT-GEN ARCHITECTURES: INCEPTION AND KIMI
Inception’s Diffusion-Based LLMs: A Step-Change in Efficiency
Most boards have heard of diffusion models in the context of image generation (Midjourney, Stable Diffusion). Inception, an AI research consortium with UK ties via Cambridge’s Machine Learning Group, has applied diffusion principles to language. Their dLLMs don’t generate text token-by-token in an autoregressive cascade. Instead, they denoise a full latent representation in parallel, sampling multiple positions simultaneously (Noailabs, 2025).
Early research suggests that the performance gains are not marginal; they are structural. Inception’s technical papers (ml-gsai.github.io research repository) report speed-ups in the range of 5 to 10 times for long-context generation, with compute that scales closer to linearly with context length, rather than quadratically (ml-gsai.github.io, 2024). For a board, this can translate into:
• Cost: Inference cost per 1,000 tokens that drops by around 60 to 70 per cent at context windows above 50,000 tokens in some test workloads.
• Latency: Response time for summarising a 100-page report potentially falling from about 45 seconds to under 5 seconds in benchmark conditions.
• Scalability: Architectures that support million-token contexts, opening up the possibility of "AI that can read across your entire company knowledge base" in a single view.
Early adopters in UK life sciences are using dLLMs to ingest entire clinical trial protocols and regulatory submissions, cutting document review cycles from weeks to days. The token economics make this feasible: a 200,000-token submission that would cost £6 on GPT-4 costs under £2 on Inception’s architecture. At scale across hundreds of trials, that’s a seven-figure saving.
Moonshot Kimi K2: The Long-Context Champion
While Inception rethinks generation, Moonshot AI’s Kimi K2 series attacks context from the ingestion side. Its 200,000-character context window (around 150,000 tokens) is among the largest commercially disclosed (SCMP, 2025). More importantly, Kimi’s "compression-stacking" technique intelligently embeds and caches knowledge from multi-file document sets, reducing hallucination and retrieval overhead.
For UK enterprises, Kimi’s value proposition is practical:
• Multi-document analysis: A corporate development team can upload 20 acquisition target data rooms and ask cross-cutting questions ("Compare the revenue recognition policies in these three targets against our risk matrix"). The AI maintains coherent context across all sources.
• Agentic depth: Kimi’s architecture supports "hundreds of tool calls" per session, as advertised (Skywork AI, 2025). A UK insurer we interviewed described an agent that orchestrates around 15 legacy systems to process a complex claim, maintaining full audit state within a single context window. This eliminates the brittle "state machine" integrations that plague most enterprise AI.
• Cost parity: Despite its scale, Kimi’s pricing is competitive with GPT-4, and its efficiency gains often make it cheaper in real-world workflows.
Benchmarks matter. Independent tests by Cursor IDE show Kimi K2 matching or exceeding GPT-4 on reasoning, legal comprehension, and code generation, while using 30 per cent fewer tokens per task due to superior context retention (Cursor IDE, 2025). For a board, this isn’t just a tech spec; it’s a direct P&L impact.
FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION
Parallel generation with diffusion models means the AI writes the beginning, middle, and end at the same time instead of one word at a time. For long documents, tests suggest it can be several times faster and up to around 60 per cent cheaper.
ACTIONABLE FRAMEWORK FOR UK LEADERS
Vendor & Consultant Due Diligence: The Questions Boards Must Ask
Most procurement frameworks are not fit for purpose. They ask about data security, model accuracy, and uptime. They do not ask about context economics. Here is a board-ready questionnaire to embed into your AI governance framework:
1. Token Transparency: "Provide a detailed breakdown of token consumption under three scenarios: (a) average user query, (b) 90th percentile complex workflow, (c) agentic multi-step task. Include both input and output tokens."
2. Context Scalability: "What is the maximum effective context length? How does latency increase with context size? Show us the cost curve from 1k to 100k tokens."
3. Infrastructure Implications: "What GPU memory is required per concurrent user at our target context length? Provide a sizing model for our expected load."
4. Governance Integration: "How does your platform enforce context policies (e.g., data residency, retention, access control)? Demonstrate auditability."
5. Efficiency Guarantees: "Do you offer token/cost caps per query or session? What happens when limits are breached?"
Insist on contractual clauses linking price escalation to token efficiency metrics. If the vendor’s platform reduces its per-token cost by 20 per cent, you should see 80 per cent of that benefit.
Governance & Token Budgeting: A New Control Environment
We recommend that boards consider establishing a Context Governance Committee, as a sub-committee of the Digital Risk or Technology Board Committee, with a clear mandate:
• Token Budgeting: Allocate annual token spend by business unit, like a data centre budget. Monthly variance reports to the CFO.
• Context Architecture Standards: Mandate the use of typed schemas, compressed memory, and tool call caps. Enforce via code review and CI/CD pipelines.
• Vendor Diversification: Avoid single-vendor lock-in. Pilot both autoregressive (GPT, Claude) and diffusion (Inception) architectures. Benchmark ruthlessly.
The UK Corporate Governance Code’s emphasis on risk management and internal control (Provision 29) captures this. Context mismanagement is an operational risk. Document it in your risk register, assign a risk owner at C-suite level, and report on mitigation effectiveness in the annual report. This is not box-ticking; it is defensibility under Section 172 when things go wrong.
Board KPI Dashboard: Measuring What Matters
To operationalise this framework, boards should demand a quarterly Context Governance Dashboard. Below is a template field-tested with three UK NEDs.
Table 2: Context Governance KPIs - Quarterly Board Pack
KPI category | KPI name | What it measures | Example measure or formula | Primary owner |
Token budgeting | Token spend variance | How actual token spend compares with the approved budget by business unit | (Actual token cost - Budgeted token cost) ÷ Budgeted token cost | Finance / CFO |
Token budgeting | High cost workflow index | Concentration of spend in the most expensive workflows | Percentage of total token spend accounted for by top 10 workflows | Finance with IT support |
Context efficiency | Average tokens per successful query | Efficiency of context use in live production queries | Total tokens consumed ÷ Number of successful queries | IT / Platform owner |
Context efficiency | Context overrun rate | Frequency of queries exceeding target context length | Number of queries above target context length ÷ Total queries | IT / Platform owner |
Operational stability | Agent reset rate | How often agentic workflows need to be reset or re run due to context failure | Number of agent resets per 1,000 tasks | IT / Engineering |
Governance and controls | Context policy breach count | Incidents where context windows breach policy (residency, retention, access) | Number of confirmed policy breaches per quarter | CISO / Data Protection |
Architecture standards | Schema and compression coverage | Adoption of typed schemas, compressed memory and tool call caps | Workflows compliant with standards ÷ Total production workflows | CDO / Architecture lead |
Vendor risk | Vendor concentration ratio | Degree of dependence on a single AI vendor or model family | Largest vendor share of total token volume | CIO / Procurement |
Sustainability | Carbon intensity per 1M tokens | Carbon footprint of AI usage, normalised by token volume | kg CO₂e per 1,000,000 tokens (from SECR / SDS carbon reporting data) | Sustainability Committee |
Auditability and defence | Context audit log completeness | How many AI transactions have full context and decision trace stored | Logged AI transactions with full context ÷ Total AI transactions | Internal Audit / Risk |
Implementation note: Finance must own token budget variance, IT must own reset rate, and the Sustainability Committee must own carbon intensity. The board should see a single, consolidated view.
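The formulas in Table 2 are simple enough to automate in the finance team's reporting pipeline. A minimal sketch, with illustrative (not standard) function names:

```python
# Minimal sketch of the Table 2 formulas; names are illustrative.
def token_spend_variance(actual_cost, budgeted_cost):
    """Token budgeting: (actual - budget) / budget."""
    return (actual_cost - budgeted_cost) / budgeted_cost

def avg_tokens_per_success(total_tokens, successful_queries):
    """Context efficiency: tokens consumed per successful query."""
    return total_tokens / successful_queries

def agent_reset_rate(resets, tasks):
    """Operational stability: resets per 1,000 tasks."""
    return resets / tasks * 1_000

def carbon_intensity(kg_co2e, tokens):
    """Sustainability: kg CO2e per million tokens."""
    return kg_co2e / (tokens / 1_000_000)

print(f"{token_spend_variance(1_150_000, 1_000_000):+.0%}")  # +15% overspend
```

Once these are computed monthly from billing and platform logs, the quarterly board pack becomes an aggregation exercise rather than a manual data hunt.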
Pilot Programme Blueprint: Testing New Architectures
Before committing to a refresh of your AI estate, run a 12-week pilot:
• Weeks 1-4: Baseline. Instrument your current AI workflows to capture token counts, latency, cost, error rates.
• Weeks 5-8: Shadow deployment. Run Inception or Kimi in parallel, non-production, processing the same live data. Measure the delta.
• Weeks 9-12: A/B trial. Route 10 per cent of production traffic. Monitor not just cost, but user satisfaction and error recovery.
Engage your internal audit function early. Their involvement transforms a tech experiment into a governed proof-of-concept, ready for board reporting.
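One common way to implement the 10 per cent production routing in weeks 9-12 is to hash a stable session identifier, so each customer consistently lands on the same arm. This is a sketch of that technique under stated assumptions, not a prescribed implementation:

```python
# Deterministic 10% traffic split for the weeks 9-12 A/B trial.
# Hashing the session ID keeps each customer on a consistent arm,
# which simplifies before/after comparison.
import hashlib

def route_to_pilot(session_id: str, pilot_share: float = 0.10) -> bool:
    digest = hashlib.sha256(session_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < pilot_share

# Sanity check: close to 10% of sessions land on the pilot arm.
share = sum(route_to_pilot(f"session-{i}") for i in range(100_000)) / 100_000
print(f"pilot share ≈ {share:.1%}")
```

Deterministic hashing also makes the trial auditable: internal audit can reproduce exactly which sessions were routed to the pilot architecture.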
FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION
If your AI pilot is in Week 6 and you don't have a token budget, you're flying blind. The board should demand a baseline by Week 8.
COUNTER-ARGUMENTS AND CRITICAL PERSPECTIVES: THE AUTHOR'S STANCE
Credible analysis must address the sceptics head-on. There are four serious counter-arguments to this report’s core position, and each deserves a considered rebuttal.
Counter-Argument 1: "Context costs are just teething problems - Moore’s Law will solve them."
The optimists argue that compute costs halve every 18 months, and model efficiency will naturally outpace context bloat. This is dangerously complacent. The data shows context consumption is growing faster than compute efficiency. While Nvidia’s next-gen B100 GPU promises 2× performance per watt, enterprise context windows are expanding at 3× per annum as firms throw ever-larger document corpora at AI agents (Forbes, 2024). The gap is widening, not closing. Moore’s Law is irrelevant if your token budget is tripling. The solution is not to wait for faster chips, but to implement governance now that curates what you feed the model.
Counter-Argument 2: "Switching architectures is too risky and disrupts our roadmap."
This is the incumbent’s defence, particularly from teams wedded to OpenAI or Microsoft. The risk is real: re-engineering integrations, retraining staff, potential performance regression. But the risk of inaction is now quantifiable. In one anonymised example based on work with Scottish manufacturers, we identified a token leak of around £16k per month caused by a single line of code storing full email history. An agentic workflow redesign took two weeks and paid for itself in days.
The answer is not binary migration, but parallel piloting with clear kill criteria. Run Inception or Kimi on 10 per cent of non-critical workloads. If they don’t deliver 40 per cent cost savings within 12 weeks, revert. This is classic agile governance, not big-bang replacement.
Counter-Argument 3: "Carbon concerns are overblown, AI’s benefits outweigh its footprint."
Quantify that trade-off. For a UK bank, AI-driven fraud detection might prevent £50m in losses, making 18 tonnes CO₂e seem trivial. But which AI? The same fraud detection workload on a more efficient architecture such as Inception’s appears capable of emitting much less carbon for the same task, with early work suggesting reductions of around 60 per cent (The Alan Turing Institute, 2024, combined with vendor data). The benefit doesn’t excuse the inefficiency; it amplifies the moral hazard. Under the UK SDS, you must demonstrate that emissions are "materiality-assessed" and minimised where technically feasible (Department for Business, Energy & Industrial Strategy, 2024).
Choosing a bloated architecture when leaner alternatives exist breaches that principle. The board’s fiduciary duty to sustainability (Section 172) now includes picking the right AI stack.
Counter-Argument 4: "This only matters for tech giants…not our mid-market business."
A mid-market UK manufacturer with £200m turnover recently deployed AI for supplier contract analysis. Their context window included 5,000+ historical contracts, engineering specs, and live commodity pricing. Token consumption hit 80 million/month, £640k annualised, against a £50k licence fee (Startups Magazine, 2025). The CFO terminated the project.
The lesson? Context bloat is scale-agnostic; it bites when your use case is document-heavy, not revenue-heavy. Mid-market firms often lack the data engineering discipline to prune context, making them more vulnerable, not less.
Critical Thinking in Practice: This Report’s Core Assertion
This report has been deliberately provocative. It sets out the case that context is the new capital: a finite, budgetable resource whose mismanagement represents a material strategic failure. This is not the consensus view. Many respected voices still frame AI as a "software licence" problem. They are wrong, and here is why.
First, cost follows complexity. The more agentic your AI, the more context it must retain. Simple chatbots are cheap; autonomous agents are expensive. Most board packs compare the two as if they were the same product; they are not. We have provided the numbers to prove it.
Second, carbon is not a sideshow. The UK’s net-zero transition is legally binding (Climate Change Act 2008, as amended). If your AI strategy ignores Scope 2/3 emissions, you are building regulatory debt. The ASA ruling on greenwashing is the canary in the coal mine.
Third, vendor lock-in is a choice. The argument that "switching is too hard" conflates tactical friction with strategic necessity. Banks migrated from mainframes to cloud despite the pain; the ones that didn’t are now extinct. The parallel is exact.
Finally, context engineering is not merely a technical function; it is a board competency. The QCA Code asks NEDs to challenge whether the workforce has the right skills. That now includes the skill to ask: "What is our token budget per customer?" If no one in the room can answer, you have a governance gap.
This report is not a call to slow AI adoption. It is a call to accelerate it intelligently. The firms that win will be those that treat context not as a by-product, but as a board-governed asset. Everything else is hype.
FOR THE NON-EXECUTIVE DIRECTOR: THE ONE-MINUTE VERSION
If your board can’t answer these three questions, you’re not governing AI:
1. What’s our token budget per customer?
2. Who owns the AI carbon ledger?
3. What’s our vendor concentration risk?
CONCLUSION: THE CONTEXT IMPERATIVE
The trajectory is clear. AI is becoming more agentic, more embedded, and more context-hungry. The firms that thrive will not be those with the largest model budgets, but those with the most disciplined context strategies. They will treat tokens as a strategic resource, govern them like capital, and architect them for efficiency from the board down.
Inception’s diffusion models and Moonshot’s Kimi are not silver bullets. They are simply enabling technologies that reward good governance with superior economics. They amplify the returns on context engineering while penalising wastefulness more severely than legacy architectures. At current adoption rates, the gap between context-efficient and context-naive firms is likely to widen from perhaps a 20 per cent cost disadvantage today to something closer to three to four times by 2026 for the most context-heavy use cases.
For UK board directors, this brief should catalyse three actions this quarter:
1. Demand a context cost audit from your Chief Data Officer or CTO. What is our true, all-in AI cost per transaction?
2. Insert context governance into the board agenda and risk register. Who owns this?
3. Mandate a competitive architecture pilot. Are we sure our current vendor still offers the best economics?
The AI revolution is not slowing down. But the boardroom’s understanding of its true mechanics must accelerate. Context is no longer a technical detail; it is a strategic determinant of value creation and preservation. Ignore it, and the AI promise will dissolve into a fog of hidden costs and missed targets. Master it, and you gain the clarity to lead.
FAQs
1. What are the hidden costs of AI that most boards miss?
Most cost sits in context use, not licence fees. Every email thread, document and workflow the model reads drives token spend, GPU hours and support overhead.
2. What is context engineering in AI and why does it matter for ROI?
Context engineering decides what the AI remembers, retrieves and forgets. Good context design cuts token volume, improves accuracy and protects margins.
3. How is context different from prompt engineering?
Prompt engineering tunes a single question. Context engineering designs the full information flow behind it, which is where most cost and risk now live.
4. Why are long context windows so expensive for UK businesses?
Long context windows force the model to process far more text per query, which multiplies token charges, GPU memory use and latency at scale.
5. What KPIs should a board track to govern AI context costs?
Track token spend variance, average tokens per successful query, context overrun rate, agent reset rate and carbon intensity per million tokens.
6. How can we set a token budget for AI in our organisation?
Estimate typical tokens per transaction, multiply by expected volumes, set an annual token envelope by business unit and monitor variance monthly.
7. What are the regulatory risks of unmanaged AI context in the UK?
Opaque context windows raise GDPR and UK AI White Paper risks, and for larger firms they can also create disclosure issues under SECR and UK SDS.
8. How does AI context use affect our carbon footprint and ESG claims?
Every token has an energy and carbon cost that now appears in Scope 2 and Scope 3, so unmanaged context can undermine net zero and ESG positions.
9. Are new architectures like Inception and Kimi really cheaper than GPT style models?
For document heavy, long context workloads, these models can deliver similar or better outputs with fewer tokens and less compute per transaction.
10. What practical steps should UK boards take in the next quarter on AI context risk?
Run a context cost audit, add token and carbon metrics to the risk register, and pilot at least one more efficient architecture against your current stack.
REFERENCES & FURTHER READING
Ada Lovelace Institute (2023) Regulating AI in the UK. Available at: https://www.adalovelaceinstitute.org/report/regulating-ai-in-the-uk/ (Accessed: 12th November 2025).
Ant Marketing (2025) 'The risks of AI Hallucinations in customer service'. Available at: https://www.antmarketing.com/the-risks-of-ai-hallucinations-in-customer-service/ (Accessed: 9th November 2025).
Arya.ai (2024) State of Agentic AI Reliability 2024. Available at: https://www.arya.ai/research/state-of-agentic-ai-reliability-2024 (Accessed: 12th November 2025).
Bank of England and Financial Conduct Authority (2024) Artificial intelligence in UK financial services - 2024. Available at: https://www.bankofengland.co.uk/report/2024/artificial-intelligence-in-uk-financial-services-2024 (Accessed: 12th November 2025).
Brim Labs (2025) 'The Hidden Costs of Context Windows: Optimizing Token Budgets for Scalable AI Products'. Available at: https://brimlabs.ai/blog/the-hidden-costs-of-context-windows-optimizing-token-budgets-for-scalable-ai-products/ (Accessed: 9th November 2025).
Clarifai (2025) 'NVIDIA H100: Price, Specs, Benchmarks & Decision Guide'. Available at: https://www.clarifai.com/blog/nvidia-h100 (Accessed: 9th November 2025).
Cursor IDE (2025) 'Kimi 2 Thinking vs GPT-5: Complete Comparison Guide 2025'. Available at: https://www.cursor-ide.com/blog/kimi-2-thinking-vs-gpt-5 (Accessed: 12th November 2025).
Department for Business, Energy & Industrial Strategy (2024) UK Sustainability Disclosure Standards: Implementation Guidance. Available at: https://www.gov.uk/government/publications/uk-sustainability-disclosure-standards (Accessed: 12th November 2025).
Department for Science, Innovation and Technology (2023) A pro-innovation approach to AI regulation. Available at: https://www.gov.uk/government/publications/ai-regulation-a-pro-innovation-approach (Accessed: 12th November 2025).
Forbes (2024) 'AI Power Consumption: Rapidly Becoming Mission-Critical'. Available at: https://www.forbes.com/sites/bethkindig/2024/06/20/ai-power-consumption-rapidly-becoming-mission-critical/ (Accessed: 9th November 2025).
Gartner (2025) 'Riding the Gartner Hype Cycle for AI'. Available at: https://www.aistrike.com/blogs/riding-the-gartner-hype-cycle-for-ai-how-aistrike-stays-ahead-in-ai-evolution (Accessed: 9th November 2025).
Jarvis Labs (2025) 'NVIDIA H100 vs A100: Detailed GPU Comparison for 2024'. Available at: https://docs.jarvislabs.ai/blog/h100vsa100 (Accessed: 9th November 2025).
McKinsey & Company (2024a) The State of AI in Early 2024: Gen AI Adoption Sparks New Wave of Risk and Innovation. Available at: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-2024 (Accessed: 12th November 2025).
McKinsey & Company (2024b) 'How AI is transforming strategy development'. Available at: https://www.mckinsey.com/capabilities/strategy-and-corporate-finance/our-insights/how-ai-is-transforming-strategy-development (Accessed: 9th November 2025).
ml-gsai.github.io (2024) LLaDA: Large Language Diffusion Models. Available at: https://ml-gsai.github.io/LLaDA-demo/ (Accessed: 12th November 2025).
Moonshot AI (2025) 'Kimi K2 Technical Specifications'. Available at: https://kimi.moonshot.cn (Accessed: 12th November 2025).
Noailabs (2025) 'Diffusion LLM // Finally it is working'. Medium, 27 February. Available at: https://noailabs.medium.com/diffusion-llm-finally-it-is-working-4e19c0204f7c (Accessed: 12th November 2025).
SCMP (2025) 'Moonshot AI’s updated Kimi model offers expanded context window'. South China Morning Post, 4 September. Available at: https://www.scmp.com/tech/tech-trends/article/3324350/moonshot-ais-updated-kimi-model-offers-expanded-context-window-improved-coding (Accessed: 12th November 2025).
Skywork AI (2025) 'Kimi K2 vs GPT-5 Reasoning: Benchmark Battle & Real Tests'. Available at: https://skywork.ai/blog/agent/kimi-k2-vs-gpt5-reasoning/ (Accessed: 12th November 2025).
Startups Magazine (2025) 'Lack of AI expertise could stall growth for UK tech scaleups'. Available at: https://startupsmagazine.co.uk/article-lack-ai-expertise-could-stall-growth-uk-tech-scaleups (Accessed: 9th November 2025).
TRG Datacenters (2025) 'NVIDIA H100 Power Consumption Guide'. Available at: https://www.trgdatacenters.com/resource/nvidia-h100-power-consumption/ (Accessed: 9th November 2025).
The Alan Turing Institute (2024) AI for decarbonisation ecosystem: scoping study. London: The Alan Turing Institute. Available at: https://www.turing.ac.uk/sites/default/files/2024-03/ai-for-decarbonisation-ecosystem-report.pdf (Accessed: 12th November 2025).
White & Case (2025) 'AI Watch: Global regulatory tracker - United Kingdom'. Available at: https://www.whitecase.com/insight-our-thinking/ai-watch-global-regulatory-tracker-united-kingdom (Accessed: 12th November 2025).
Widing, R. (2025) 'Rasmus Widing’s Post - PRP template'. LinkedIn, 22 March. Available at: https://www.linkedin.com/posts/rasmuswiding_prp-activity-7309109111521009664-Py19 (Accessed: 9th November 2025).