What's actually inside a GPU?
A GPU is not a "faster CPU." It's thousands of small, dumb workers all doing the same operation at once on different data. That's why it accelerates matrix math — and that's the whole reason it accelerates AI.
Look at an NVIDIA H100 (the SXM version) and you'll find 132 enabled Streaming Multiprocessors (SMs), each containing 128 FP32 CUDA cores plus specialized Tensor Cores for matrix-multiply-and-accumulate. That's 16,896 simple arithmetic units, call it 17,000, all running simultaneously.
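The core count is plain multiplication. A quick check in Python, using only the two figures quoted above (the variable names are just illustrative):

```python
# Figures quoted in the text for the H100.
sm_count = 132       # Streaming Multiprocessors
cores_per_sm = 128   # CUDA cores inside each SM

total_cores = sm_count * cores_per_sm
print(total_cores)  # 16896
```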
The key insight: a CPU optimizes for latency (finish one task fast). A GPU optimizes for throughput (finish thousands of tasks at once, even if each one is slower). Training a neural network is mostly multiplying large matrices, and that is the same kind of massively parallel arithmetic GPUs were built for back when graphics, not AI, was the point.
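To see why matrix multiplication suits this design, here is a minimal sketch in plain Python. The function names (output_cell, matmul) are made up for illustration, and the code runs sequentially on a CPU; it only shows how the work splits into identical, independent tasks, not how a real GPU kernel is written.

```python
def output_cell(a, b, row, col):
    """One worker's job: the dot product for a single output element."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    rows, cols = len(a), len(b[0])
    # Every (row, col) pair is an independent task: same operation, different data.
    # This loop runs them one after another; a GPU would spread them across its cores.
    return [[output_cell(a, b, r, c) for c in range(cols)] for r in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

No output cell depends on any other, so none of them has to wait. A 4096 × 4096 product has over 16 million such cells, far more than there are cores to run them.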
The other half of a modern GPU is memory bandwidth. On the H100, HBM3 stacks deliver ~3 TB/s of bandwidth to those cores. Starve them of data and the compute sits idle. This is why GPU spec sheets quote two headline numbers: compute throughput (FLOP/s) and memory bandwidth (bytes/s).
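A rough way to see how the two numbers interact is a roofline-style estimate: a workload can run no faster than the compute peak, and no faster than bandwidth times the arithmetic it does per byte moved. The sketch below uses the ~3 TB/s figure from the text. The 60 TFLOP/s peak and the 100 FLOPs-per-byte figure for matmul are assumptions chosen for illustration, not numbers from this article.

```python
def attainable_flops(peak_flops, bandwidth_bytes_per_s, flops_per_byte):
    """Roofline-style bound: the lower of the compute limit and the memory limit."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


BANDWIDTH = 3e12       # ~3 TB/s, the HBM3 figure from the text
ASSUMED_PEAK = 60e12   # hypothetical 60 TFLOP/s peak; an assumption, not from the text

# Adding two FP32 vectors: 1 FLOP per 12 bytes moved (two 4-byte loads, one 4-byte store).
vector_add = attainable_flops(ASSUMED_PEAK, BANDWIDTH, 1 / 12)

# A large matrix multiply reuses each loaded value many times.
# 100 FLOPs per byte is an illustrative value, not a measurement.
big_matmul = attainable_flops(ASSUMED_PEAK, BANDWIDTH, 100)

print(f"vector add: {vector_add / 1e12:.2f} TFLOP/s")
print(f"matmul:     {big_matmul / 1e12:.2f} TFLOP/s")
```

Under these assumptions the vector add is capped at about 0.25 TFLOP/s, well under 1% of the assumed peak, because it is waiting on memory. The matrix multiply hits the compute limit instead.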