AI Strategy · March 2026

The Dual Curve: Why AI Costs Are Falling Faster Than You Think

Moore's Law is predictable. Algorithmic efficiency is relentless. Together they are making AI cheaper faster than anyone expected. A case study in compounding cost curves.


Michael Murray

Managing Partner, Abeba Co


The AI industry has a forecasting problem, and it is working in your favor.

Most executives evaluating AI adoption are watching one cost curve: hardware. Moore's Law tells us that computing power roughly doubles every two years for the same price. That is the curve everyone knows. It is predictable, well-documented, and already priced into most strategic plans.

But there is a second curve that almost no one outside the research community is tracking, and it is moving faster.

Wright's Law Meets Machine Learning

In 1936, Theodore Wright observed that the cost of manufacturing airplanes decreased by a consistent percentage every time cumulative production doubled. The insight was simple but powerful: experience compounds. The more you build, the better you get at building, and the savings are not linear. They are exponential.
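Wright's Law can be stated compactly: unit cost falls as a power of cumulative production. A minimal sketch of that arithmetic, using an illustrative 20% learning rate (the numbers are examples, not figures from this article):

```python
import math

def wrights_law_cost(first_unit_cost: float, cumulative_units: float,
                     learning_rate: float) -> float:
    """Unit cost after `cumulative_units` have been produced, where each
    doubling of cumulative production cuts cost by `learning_rate`
    (e.g. 0.20 for a classic 20% learning curve)."""
    b = math.log2(1.0 - learning_rate)  # progress exponent (negative)
    return first_unit_cost * cumulative_units ** b

# Four doublings (16x cumulative volume) at a 20% learning rate leave
# cost at 0.8^4 of the first unit, i.e. roughly 41% of the original.
cost_at_16x = wrights_law_cost(100.0, 16, 0.20)  # ≈ 40.96
```

The savings compound: each doubling takes the same percentage off, so the absolute cost curve bends downward exponentially in cumulative volume.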

Wright's Law has since been validated across industries, from solar panels to lithium-ion batteries to semiconductor manufacturing. Something analogous is happening in AI, though the mechanism is different. Instead of factory-floor learning curves, the global AI research community is producing compounding algorithmic efficiency gains through directed research and open collaboration.

Every month, researchers publish new techniques that make AI systems faster, cheaper, and more efficient on the exact same hardware. These gains are not as predictable as a manufacturing learning curve. They arrive in bursts, driven by breakthroughs rather than production volume. But the cumulative effect is strikingly similar: costs falling faster than hardware improvements alone can explain.

A Case Study: TurboQuant

Compression: 16-bit to 3-bit. Key-value cache compression drops memory requirements by more than 5x.

Performance: up to 8x. The same NVIDIA H100 hardware can serve the same workloads materially faster.

Tradeoff: zero. Google reports no accuracy loss, no retraining, and no hardware change.

This week, Google Research published TurboQuant, a compression algorithm that demonstrates exactly how powerful the algorithmic curve has become.

The technical details matter less than the result: TurboQuant compresses a critical component of large language models, the key-value cache (essentially the system's working memory), down to 3 bits per value. The standard is 16 bits. That is a reduction of more than 5x, and in practice Google reports 6 to 10x effective compression.
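TurboQuant's own algorithm is not yet public, so the sketch below is only a toy round-to-nearest quantizer, included to illustrate the storage arithmetic behind "3 bits per value"; the function names and the example array are illustrative assumptions, not Google's method:

```python
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Toy per-tensor symmetric 3-bit quantization (round-to-nearest).
    NOT TurboQuant's algorithm; it only shows what storing a 16-bit
    value in 3 bits means mechanically."""
    peak = float(np.abs(x).max())
    scale = peak / 3.0 if peak > 0 else 1.0
    # Map the largest magnitude to code 3; signed 3-bit codes span [-4, 3].
    codes = np.clip(np.rint(x / scale), -4, 3).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate values from 3-bit codes."""
    return codes.astype(np.float32) * scale

# A stand-in slice of a KV cache: 16-bit values become 3-bit codes,
# a 16/3 ≈ 5.3x reduction in raw bits per value.
kv = np.random.randn(4, 64).astype(np.float16)
codes, scale = quantize_3bit(kv.astype(np.float32))
recon = dequantize(codes, scale)
```

The toy version visibly loses precision at 8 levels per value; the point of results like TurboQuant is achieving this bit budget while reporting no accuracy loss.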

The remarkable part is what it does not sacrifice. Zero accuracy loss. No retraining required. No new hardware. The same NVIDIA H100 GPUs that companies are already running achieve up to 8x faster performance on the same workloads, purely from a better algorithm.

A practical caveat: TurboQuant's open-source implementation is not yet publicly available, and production deployment requires real integration engineering, latency profiling under live traffic, and ecosystem maturity. Community experiments in MLX report around 5x compression with 99.5% quality retention, with open-source code widely expected around Q2 2026. The benchmark results are real. The path from benchmark to production is not instant. That distinction matters for planning purposes.

TurboQuant also applies to vector search, the technology that powers semantic search and retrieval systems across the AI stack. Same story: dramatically smaller indices, faster queries, superior accuracy, and no dataset-specific tuning required.

Think of it as Wright's Law applied to research output rather than factory floors. The cumulative investment of the global AI research community is producing compounding efficiency gains that have nothing to do with the next chip generation. The trajectory is less predictable than a manufacturing learning curve, but the direction is unmistakable.

The Dual Curve Effect

When you stack the algorithmic curve on top of the hardware curve, the combined rate of cost decline is steeper than either one alone.
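The stacking is simple arithmetic: two independent decline rates multiply. A sketch assuming a Moore's-Law-style halving every two years and a hypothetical 30% annual algorithmic decline (the 30% is a placeholder assumption, not a measured figure from this article):

```python
def combined_cost(years: float, hw_halving_years: float = 2.0,
                  algo_annual_decline: float = 0.30) -> float:
    """Fraction of today's cost remaining after `years`, stacking a
    hardware curve (halving every `hw_halving_years`) with an assumed
    algorithmic efficiency decline of `algo_annual_decline` per year."""
    hw = 0.5 ** (years / hw_halving_years)          # hardware curve alone
    algo = (1.0 - algo_annual_decline) ** years     # algorithmic curve alone
    return hw * algo

# After two years: hardware alone leaves 50% of cost; the stacked
# curves leave 0.5 * 0.7^2 = 24.5%.
remaining = combined_cost(2.0)  # 0.245
```

Under these assumptions, the combined curve roughly doubles the decline rate of either curve alone, which is why planning on Moore's Law by itself misprices the opportunity.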

What Changed in the Last 18 Months
Inference costs: roughly 10x lower across major API providers.

Context windows: expanded from 8K to 128K+ while unit costs kept falling.

Quantization: moved from 8-bit to 4-bit to 3-bit with better fidelity preservation.

Competitive pressure: open-source models closed the gap and accelerated pricing pressure further.

Each of these developments was driven primarily by algorithmic innovation, not hardware upgrades. The GPUs did not get 10x cheaper. The math got 10x smarter.

What This Does Not Mean

The dual curve is real, but it is not a magic wand. Three things worth naming clearly:

Cheaper inference does not mean cheaper AI programs.

Per-token API costs are only one component of deploying AI in an organization. Data preparation, workflow redesign, change management, security review, and ongoing governance are not on either curve. The infrastructure is getting cheaper. The organizational work is not.

Efficiency gains are not evenly distributed.

Some workloads will benefit dramatically from techniques like TurboQuant. Others will see modest improvement. The dual curve is a macro trend, not a guarantee for every use case.

The dual curve makes early investments depreciate faster.

An organization that locked into a specific inference stack 18 months ago is now paying significantly more than it needs to. The real advantage goes to organizations that build operational agility (the ability to rapidly adopt new techniques, swap model versions, and re-optimize serving stacks), not just to organizations that move first.

There is also a dynamic that economists call Jevons' Paradox: when something becomes more efficient, total consumption often increases rather than decreases. History suggests that cheaper inference leads to more inference, not necessarily lower total spend. That is not a reason to avoid AI. It is a reason to plan for what you will do with the capacity that cheaper costs unlock. The dual curve gives you a strategic choice: do more for the same budget, or do the same for less. That framing matters for the P&L conversation.

What This Means for Strategic Planning

If you are an executive evaluating AI investment, the dual curve has three practical implications.

1. Your cost assumptions are probably too conservative

If you modeled AI costs declining at the rate of Moore's Law alone, your projections will be overstating the cost of AI by the time you reach year two of implementation. The algorithmic curve is compressing timelines.

2. Waiting without learning is not a cost strategy

The dual curve rewards operational agility, not just early adoption. Organizations building internal competency now will be positioned to absorb compounding efficiency gains as they arrive. Organizations that have not built that muscle will fumble the opportunity even when the economics are overwhelmingly favorable. The risk is not moving too early. It is arriving unprepared.

3. Your advisors should be tracking both curves

Any advisor who talks about AI costs as if they are static, or who only references hardware roadmaps, is giving you an incomplete picture. The algorithmic curve is where the action is, and it moves faster than most planning cycles can keep up with.

How to Track This Yourself

You do not need to hire an AI consultant to monitor the dual curve. Three practices will keep you calibrated:

1. Benchmark quarterly. Compare your per-unit AI costs against the major API providers (OpenAI, Anthropic, Google) every 90 days. Track the delta. If you are not seeing 30-50% annual declines, you are leaving money on the table.

2. Architect for switchability. Build systems with swappable model backends. The organizations that capture the dual curve are the ones that can adopt a better model or technique within days, not months.

3. Watch the research, not just the products. Papers like TurboQuant appear on arXiv months before they show up in commercial APIs. A quarterly scan of the top ML conferences (NeurIPS, ICML, ICLR) gives you a 6-12 month preview of where costs are headed.
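The quarterly benchmark above reduces to one line of arithmetic: compound the 90-day cost ratio over four quarters and compare the result to the 30-50% band. A minimal sketch (the function name and example figures are illustrative):

```python
def annualized_decline(cost_now: float, cost_90_days_ago: float) -> float:
    """Annualize a quarterly per-unit cost change by compounding four
    quarters. Returns the implied annual decline (0.34 means 34%)."""
    quarterly_ratio = cost_now / cost_90_days_ago
    return 1.0 - quarterly_ratio ** 4

# A 10% quarterly drop compounds to 1 - 0.9^4 ≈ 34% annualized,
# inside the 30-50% band the article treats as the going rate.
decline = annualized_decline(0.90, 1.00)
```

If the number falls well under 30%, that is the signal to revisit your serving stack or provider pricing rather than to assume costs have plateaued.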

The Compounding Advantage

The most important insight from TurboQuant is not the specific technique. It is what it represents: one of dozens of algorithmic breakthroughs published every quarter, each one making AI systems more efficient without requiring a single dollar of new hardware investment.

Moore's Law is predictable. Algorithmic efficiency progress is relentless. Together, they create a compounding cost advantage for organizations that build the operational muscle to absorb these gains as they arrive.

The question is not whether AI will become affordable enough for your business. For many workloads, it already is. The question is whether your organization is building the competency to capture the next wave of efficiency gains, or whether you will still be standing up your first implementation while competitors are on their third.

The dual curve does not wait.


Michael Murray

Michael Murray is the Managing Partner of Abeba Co, an AI accelerator that helps organizations build and operate intelligent systems. For more on how algorithmic efficiency gains are reshaping the economics of AI adoption, visit abeba.co.


Ready to Model the Real Economics of AI?

The winners will not just buy AI. They will plan around the compounding cost curve, then build operating leverage before the rest of the market catches up.