Why power is the new bottleneck — Circuit Copilot Glossary

For decades, the constraint on computing was silicon: how many transistors could you fit on a die. That's no longer the hard limit. The hard limit now is electricity — both how much you can deliver to a data center, and how much heat you can remove from it.

The numbers, briefly

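A single NVIDIA H100 GPU (SXM) draws up to about 700 W under load, and a B200 is rated at roughly 1,000 to 1,200 W depending on the system. A training cluster with 100,000 H100-class GPUs is therefore roughly a 100 MW facility once servers, networking, and cooling are counted, comparable to the electrical demand of a small city.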
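
A back-of-envelope version of the 100 MW figure, as a minimal Python sketch. The 50% non-GPU server overhead (CPUs, memory, NICs, switches) and the facility PUE of 1.2 are assumptions chosen for illustration, not measured values for any real cluster, and `cluster_power_mw` is a hypothetical helper.

```python
# Rough facility-power estimate for a GPU training cluster.
# The overhead factors are illustrative assumptions, not vendor figures.

def cluster_power_mw(num_gpus: int,
                     gpu_watts: float,
                     server_overhead: float = 0.5,
                     pue: float = 1.2) -> float:
    """Estimate total facility power in megawatts.

    num_gpus        -- number of accelerators in the cluster
    gpu_watts       -- per-GPU power under load (about 700 W for an H100)
    server_overhead -- assumed extra IT power per GPU watt for CPUs, memory,
                       NICs and switches (0.5 means +50%)
    pue             -- assumed Power Usage Effectiveness of the facility
    """
    gpu_power_w = num_gpus * gpu_watts
    it_power_w = gpu_power_w * (1.0 + server_overhead)
    facility_power_w = it_power_w * pue
    return facility_power_w / 1e6  # watts -> megawatts


if __name__ == "__main__":
    gpus_only = cluster_power_mw(100_000, 700, server_overhead=0.0, pue=1.0)
    estimate = cluster_power_mw(100_000, 700)
    print(f"GPUs alone:             {gpus_only:.0f} MW")
    print(f"With assumed overheads: {estimate:.0f} MW")
```

With these assumptions the GPUs alone account for 70 MW and the whole facility lands around 126 MW, the same order of magnitude as the ~100 MW figure. The total scales directly with both assumed factors, so a leaner server design or a better PUE moves it down proportionally.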

When OpenAI, Microsoft, or Meta talk about training the next frontier model, the conversation is no longer "can we build a chip fast enough?" It's "can we get a 1 GW power connection to a site within 18 months?" The answer is increasingly: not really, not unless you build your own generation.

Where the power actually goes

Inside a modern GPU, transistors switch on and off billions of times per second, and each switch dissipates a tiny amount of energy. Multiply that across the roughly 80 billion transistors in an H100, driving about 16,000 CUDA cores at close to 2 GHz, and you get hundreds of watts of heat per chip, all flowing through less than 10 square centimeters of silicon.
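
The first-order textbook model for CMOS switching power makes this concrete. Here α is the activity factor (the fraction of capacitance switched per cycle), C the switched capacitance, V the supply voltage, and f the clock frequency; leakage (static) power is ignored. The heat-flux line divides the ~700 W H100 figure by the GH100 die area of roughly 814 mm² (8.14 cm²). Because that board power also covers HBM and voltage-regulator losses, the result overstates the flux through the die and is best read as an upper bound.

```latex
P_{\text{dyn}} = \alpha \, C \, V^{2} f
\qquad
q = \frac{P}{A} \approx \frac{700\ \text{W}}{8.14\ \text{cm}^{2}} \approx 86\ \text{W/cm}^{2}
```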

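Until about 2005, processors got faster mainly by raising the clock frequency. Higher clocks also required higher supply voltage, and switching power grows with the square of voltage, so power rose much faster than speed. Then power density hit a wall: at around 100 W/cm², air cooling could no longer reliably keep a chip within safe temperatures. The industry pivoted to multi-core designs (more cores at lower frequency) and parallelism, which is exactly what makes GPUs so effective for AI.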
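
A minimal Python sketch of why the pivot paid off. It assumes the first-order switching-power model (power proportional to V²·f) and the simplification that supply voltage can be scaled down in step with frequency, so per-core power goes roughly as f³. Both `relative_power` and `relative_throughput` are hypothetical helpers, and the throughput figure assumes the workload parallelizes perfectly across cores.

```python
# First-order comparison: one fast core vs. several slower cores.
# Assumes dynamic power ~ C * V^2 * f and that supply voltage can be
# lowered in proportion to frequency (V ~ f), so per-core power ~ f^3.
# Real chips deviate (voltage floors, leakage, imperfect parallel scaling).

def relative_power(freq_scale: float, cores: int = 1) -> float:
    """Power relative to a single core at freq_scale = 1.0."""
    voltage_scale = freq_scale  # simplifying assumption: V tracks f
    return cores * voltage_scale ** 2 * freq_scale


def relative_throughput(freq_scale: float, cores: int = 1) -> float:
    """Throughput relative to one core at freq_scale = 1.0,
    assuming the work splits perfectly across cores."""
    return cores * freq_scale


if __name__ == "__main__":
    for cores, f in [(1, 1.0), (1, 1.2), (2, 0.8)]:
        print(f"{cores} core(s) @ {f:.1f}x clock: "
              f"throughput {relative_throughput(f, cores):.2f}x, "
              f"power {relative_power(f, cores):.2f}x")
```

Under these assumptions, pushing one core 20% faster costs about 73% more power, while two cores at 80% clock deliver 1.6× the throughput for roughly the same power (about 1.02×). Voltage floors and leakage shrink the advantage on real silicon, but the direction is the same.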

The data center constraint

A modern hyperscale data center is fundamentally a power plant with computers attached. Roughly half the cost goes to power and cooling infrastructure, not the IT equipment. PUE (Power Usage Effectiveness) captures the power side of this: it is total facility power divided by the power delivered to IT equipment, so a PUE of 1.0 would mean every watt goes to compute and none to cooling or power conversion. Best-in-class hyperscalers hit ~1.1; older facilities are at 1.5 or higher.
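
PUE as a calculation, in a short Python sketch: total facility power divided by IT equipment power. The second helper inverts the ratio to show how much IT load a fixed grid connection can feed, which is why PUE matters when the connection itself is the bottleneck. The 100 MW connection is a hypothetical example, and both function names are illustrative.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw


def it_power_available(grid_connection_mw: float, pue_value: float) -> float:
    """IT power (MW) that a fixed grid connection can support at a given PUE."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return grid_connection_mw / pue_value


if __name__ == "__main__":
    print(f"PUE of a 110 kW site with 100 kW of IT load: {pue(110.0, 100.0):.2f}")
    connection_mw = 100.0  # hypothetical grid connection
    for p in (1.1, 1.5):
        print(f"PUE {p}: {it_power_available(connection_mw, p):.1f} MW reaches IT")
```

At PUE 1.1 a 100 MW connection feeds about 90.9 MW of IT equipment; at 1.5, only about 66.7 MW.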

Liquid cooling has become standard for AI training racks because air cooling can't keep up with 50–100 kW per rack. Cold-plate (direct-to-chip) cooling is the current generation; immersion cooling is the likely next step for even higher densities.
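
The reason air runs out of headroom follows from the heat-transport relation Q = ṁ·c_p·ΔT: the coolant mass flow needed is the heat load divided by the fluid's specific heat and its allowed temperature rise. A minimal Python sketch, assuming a 100 kW rack, a 10 K rise for water, a 15 K rise for air, and approximate room-temperature fluid properties; `volumetric_flow_m3s` is a hypothetical helper.

```python
# Coolant flow needed to carry away a rack's heat, from Q = m_dot * c_p * dT.
# Fluid properties are approximate values near room temperature; the 100 kW
# rack and the temperature rises are illustrative assumptions.

WATER_CP = 4186.0      # J/(kg*K), specific heat of liquid water
WATER_DENSITY = 997.0  # kg/m^3
AIR_CP = 1005.0        # J/(kg*K), specific heat of air at constant pressure
AIR_DENSITY = 1.2      # kg/m^3, roughly sea-level air at ~20 C


def volumetric_flow_m3s(heat_w: float, cp: float, density: float,
                        delta_t_k: float) -> float:
    """Volume flow (m^3/s) that absorbs heat_w watts with a delta_t_k rise."""
    mass_flow_kg_s = heat_w / (cp * delta_t_k)
    return mass_flow_kg_s / density


if __name__ == "__main__":
    rack_heat_w = 100_000  # assumed 100 kW rack
    water = volumetric_flow_m3s(rack_heat_w, WATER_CP, WATER_DENSITY, delta_t_k=10)
    air = volumetric_flow_m3s(rack_heat_w, AIR_CP, AIR_DENSITY, delta_t_k=15)
    print(f"Water, 10 K rise: {water * 1000 * 60:.0f} L/min")
    print(f"Air, 15 K rise:   {air:.1f} m^3/s")
```

Under those assumptions, water needs roughly 144 L/min, while air needs about 5.5 m³ of flow every second through a single rack. Water's far higher heat capacity per unit volume is what lets cold plates handle densities that fans cannot.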

Why hyperscalers are buying nuclear power

In 2024–25, multiple US hyperscalers signed deals to restart, build, or buy power directly from nuclear plants as AI compute demand grew: Microsoft's agreement with Constellation to restart Three Mile Island Unit 1, Amazon's purchase of a data center campus next to Talen Energy's Susquehanna plant and its investment in X-energy's small modular reactors, and Google's contract with Kairos Power for small modular reactors. All are responses to the same problem: the grid in the places they want to build can't deliver the power they need on the timeline they need it.

Wind and solar, without large-scale storage, can't deliver baseload at the scale and reliability AI training needs (you don't pause a $500M training run because it's cloudy). Natural gas works but conflicts with the carbon targets these companies have committed to. Nuclear is the main remaining option for carbon-free baseload, which is why a workload nobody thought about a decade ago is now reshaping the energy grid.
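
The baseload argument can be put in numbers with capacity factor, the ratio of a generator's average output to its nameplate rating. A minimal Python sketch; the capacity factors below are rough illustrative assumptions (in the ballpark of US fleet averages), not figures for any specific site, and the helper name is hypothetical.

```python
# Nameplate capacity needed to match a constant load *on average*,
# given each source's capacity factor (average output / nameplate).
# Capacity factors are rough illustrative assumptions, not site data.

ASSUMED_CAPACITY_FACTOR = {
    "nuclear": 0.90,
    "wind": 0.35,
    "solar": 0.25,
}


def nameplate_needed_mw(load_mw: float, capacity_factor: float) -> float:
    """Nameplate MW whose average output equals load_mw."""
    if not 0.0 < capacity_factor <= 1.0:
        raise ValueError("capacity factor must be in (0, 1]")
    return load_mw / capacity_factor


if __name__ == "__main__":
    load_mw = 100.0  # a ~100 MW training cluster running around the clock
    for source, cf in ASSUMED_CAPACITY_FACTOR.items():
        print(f"{source:>7}: {nameplate_needed_mw(load_mw, cf):.0f} MW nameplate")
```

Matching a 100 MW load on average takes about 111 MW of nuclear nameplate but roughly 286 MW of wind or 400 MW of solar. Averages are the optimistic case: a training run needs that power every hour, so wind and solar would also need storage to cover the hours when they produce little or nothing.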

The implication for ML engineers

Energy efficiency of your model is no longer a secondary concern. A 2× more energy-efficient inference engine delivers 2× the throughput from the same data center power budget, and more data center capacity is hard to come by. This is part of why quantization (INT8, FP8, INT4), distillation, and architectural efficiency (MoE, MQA, sparse attention) have become first-class concerns, not afterthoughts.
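
A minimal Python sketch of the 2× claim: when IT power, not hardware count, is the binding constraint, sustained throughput is the power budget divided by energy per token, so halving joules per token doubles throughput. The 50 MW budget and the 0.4 J/token baseline are hypothetical placeholders, not measurements of any model or inference engine.

```python
# Throughput of an inference fleet capped by a fixed power budget.
# The budget and energy-per-token figures are hypothetical placeholders.

def tokens_per_second(it_power_w: float, joules_per_token: float) -> float:
    """Sustained tokens/s when power, not hardware count, is the limit."""
    if joules_per_token <= 0:
        raise ValueError("energy per token must be positive")
    return it_power_w / joules_per_token


if __name__ == "__main__":
    budget_w = 50e6               # hypothetical 50 MW of IT power
    baseline_j = 0.4              # hypothetical J/token before optimization
    optimized_j = baseline_j / 2  # e.g. a 2x gain from quantization
    base = tokens_per_second(budget_w, baseline_j)
    opt = tokens_per_second(budget_w, optimized_j)
    print(f"baseline:  {base:,.0f} tokens/s")
    print(f"optimized: {opt:,.0f} tokens/s ({opt / base:.1f}x)")
```

The same arithmetic applies to any efficiency gain, whether it comes from quantization, distillation, or a cheaper attention variant.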