Pipelining: The Processor Technique That Keeps Every Stage Busy
Pipelining splits instruction execution into stages that operate simultaneously on different instructions, allowing one instruction to complete per clock cycle at peak throughput.
Pipelining is a performance technique that splits a multi-step process into stages executed simultaneously on different data, like an assembly line where each station works on a different item.

## Classic CPU Pipeline

A traditional 5-stage pipeline: **Fetch** → **Decode** → **Execute** → **Memory** → **Writeback**. While one instruction is being executed, the next is being decoded, and a third is being fetched. At peak throughput, one instruction completes per clock cycle, even though each instruction individually takes 5 cycles.

## Hazards

Pipelining isn't free; three types of hazards reduce efficiency:

- **Data hazards**: an instruction needs a result that hasn't been computed yet (solved by forwarding/bypassing or by stalling)
- **Control hazards**: after a branch, the pipeline may have fetched the wrong instructions (mitigated by branch prediction)
- **Structural hazards**: two stages need the same hardware resource in the same cycle

## Depth Tradeoffs

Deeper pipelines (more stages) allow higher clock speeds but also increase the penalty for mispredicted branches, since every wrongly fetched instruction must be flushed. Intel's Pentium 4 (2000) pushed past 20 pipeline stages to reach high clock speeds but suffered heavy branch-misprediction penalties.

## Beyond CPUs

The same principle applies in FPGAs (pipelining data-processing paths for throughput), in GPU shader execution, and even in software: instruction-level parallelism and network packet processing both rely on it.
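The throughput claim in the Classic CPU Pipeline section can be checked with a back-of-envelope timing model. This is a minimal sketch, not a cycle-accurate simulator: it assumes an ideal pipeline with no hazards or stalls, where S stages overlap so N instructions finish in N + S − 1 cycles instead of N × S.

```python
# Ideal 5-stage pipeline timing model (illustrative, no real ISA assumed).
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles to retire n instructions with no hazards: fill the pipeline
    once (n_stages - 1 cycles), then retire one instruction per cycle."""
    return n_instructions + n_stages - 1

def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles if each instruction must finish all stages before the next starts."""
    return n_instructions * n_stages

print(pipelined_cycles(100))    # 104 cycles: ~1 instruction per cycle at steady state
print(unpipelined_cycles(100))  # 500 cycles: 5 cycles per instruction
```

For large N the pipelined time approaches one instruction per cycle, which is exactly the "peak throughput" figure quoted above.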
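The data-hazard bullet above can be made concrete with a small stall-count model. This sketch assumes the common textbook conventions for the classic 5-stage pipeline: the register file writes in the first half of Writeback and reads in the second half of Decode, and forwarding can route an ALU result from the end of Execute straight into the next instruction's Execute stage. The function and its stage numbering are illustrative, not a real core's behavior.

```python
# Bubble cycles inserted for a read-after-write (RAW) dependency between a
# producer instruction and the immediately following consumer.
def raw_stalls(forwarding: bool, producer_is_load: bool = False) -> int:
    if not forwarding:
        # Value usable only via the register file: the consumer's Decode
        # (cycle 3) must slip to the producer's Writeback (cycle 5) -> 2 bubbles.
        return 2
    if producer_is_load:
        # Load data appears at the end of Memory (cycle 4), the same cycle
        # the consumer wants it in Execute -> even forwarding needs 1 bubble.
        return 1
    # ALU result forwarded Execute -> Execute arrives just in time: 0 bubbles.
    return 0

print(raw_stalls(forwarding=False))                        # 2
print(raw_stalls(forwarding=True))                         # 0
print(raw_stalls(forwarding=True, producer_is_load=True))  # 1
```

This shows why forwarding is the preferred fix: it removes the stall entirely for ALU results, leaving only the unavoidable load-use bubble.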
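The depth tradeoff can be sketched with a simple cost model. The numbers below (branch frequency, misprediction rate) are illustrative assumptions, not measured data; the point is only that the flush penalty scales with how many stages sit before the branch resolves, so a deep pipeline pays more per misprediction.

```python
# Effective cycles per instruction (CPI) with branch mispredictions:
#   CPI = 1 + branch_fraction * mispredict_rate * flush_penalty
# where flush_penalty is the number of wrongly fetched instructions discarded.
def effective_cpi(resolve_stage: int, branch_fraction: float = 0.2,
                  mispredict_rate: float = 0.05) -> float:
    penalty = resolve_stage - 1  # stages of wrong-path work flushed
    return 1.0 + branch_fraction * mispredict_rate * penalty

print(round(effective_cpi(resolve_stage=5), 3))   # shallow 5-stage pipeline
print(round(effective_cpi(resolve_stage=20), 3))  # deep Pentium-4-style pipeline
```

With these assumed rates, quadrupling the depth turns a ~4% CPI overhead into ~19%, which is the tradeoff the Pentium 4 ran into.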