Why Computers Aren't More Modular: The Interconnect Bottleneck
Hardware modularity is limited by interconnect speeds (communication between chips is often 10x to 100x slower than communication within a chip), by the difficulty of parallelizing software, and by Amdahl's Law. Apple's integrated SoC approach reduces the interconnect bottleneck by keeping the CPU, GPU, and memory in one package.
The intuition that you should be able to combine several older GPUs or CPUs to match a newer one is appealing, but it runs into fundamental architectural limits.

Why simple hardware aggregation doesn't work:

**The interconnect bottleneck**: Communication between separate chips is far slower than communication within a single chip. Data moving between two GPUs over PCIe travels on the order of 10x to 100x slower than data moving between a GPU and its own memory, depending on the hardware, and the gap to on-chip caches is larger still. For workloads that exchange data frequently, the processors spend more time waiting for transfers than computing.

**Software complexity**: Programs must be explicitly written to split work across multiple processors. This "parallelization" is straightforward for some workloads (rendering independent frames, cryptocurrency mining) but difficult or impossible for others (inherently sequential logic, tightly coupled simulations such as real-time game physics, much of general-purpose computing).

**Diminishing returns**: Even when most of a workload is parallel, Amdahl's Law states that the speedup from adding processors is limited by the portion of work that cannot be parallelized. If just 5% of a task must run sequentially, 100 processors give a speedup of about 17x, and no number of processors can push it past 20x.

What does work:

- GPU clusters for cryptocurrency mining (embarrassingly parallel) and for AI training (highly parallel, though it depends on fast links between GPUs to keep them synchronized)
- Render farms for video and 3D (each frame is independent)
- Distributed computing projects like Folding@Home (independent work units)

Apple's M-series chips take a different approach: instead of connecting separate components over external buses, they integrate the CPU, GPU, and Neural Engine on one chip and place the memory in the same package as a single shared pool. This removes most of the copying between CPU and GPU memory and greatly reduces the interconnect bottleneck, although it does not eliminate it entirely.
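
To make the Amdahl's Law limit mentioned above concrete, here is a minimal Python sketch of the standard formula, speedup = 1 / (s + (1 - s) / N), where s is the fraction of work that must run sequentially and N is the number of processors. The function names `amdahl_speedup` and `amdahl_limit` are chosen here for illustration, and the model assumes a fixed-size workload with no communication overhead, so real systems will generally do worse.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law for a fixed-size workload.

    serial_fraction: share of the work that cannot be parallelized (0.0 to 1.0).
    processors: number of processors working on the parallel part (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time regardless of processor count;
    # only the parallel part is divided among the processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    s = 0.05  # the 5% sequential share used in the text
    for n in (1, 2, 10, 100, 1000):
        print(f"{n:>5} processors: {amdahl_speedup(s, n):6.2f}x")
    print(f"limit as processors grow: {amdahl_limit(s):.0f}x")
```

With a 5% sequential share, 100 processors yield roughly a 16.8x speedup, and going from 100 to 1000 processors adds little because the result is already close to the 20x ceiling.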