CPU//cache
Small, fast SRAM sitting between the core and main RAM
Organized in levels: L1 (per-core, ~4 cycles), L2 (per-core or shared, ~12 cycles), L3 (shared across cores, ~40 cycles). These latencies are rough figures and vary by microarchitecture.
Exploits temporal and spatial locality — most programs re-access the same data and nearby addresses repeatedly.
Cache misses can stall the pipeline while the core waits for data, and miss latency dominates run time on memory-bound workloads.
Compare with cache (disk-level) and cache (CDN/proxy-level) — same principle, different latency scale.
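
The effect of spatial locality on miss counts can be illustrated with a toy simulation. The sketch below is not a model of any real CPU: it assumes a direct-mapped cache with 64-byte lines and 512 lines (32 KiB), and a 256x256 matrix of 4-byte elements stored row by row. The function name count_misses and all of these sizes are hypothetical, chosen only to make the contrast visible.

```python
def count_misses(addresses, line_size=64, num_lines=512):
    """Simulate a direct-mapped cache and return the number of misses.

    line_size and num_lines are illustrative assumptions (32 KiB total),
    not the parameters of any particular processor.
    """
    tags = [None] * num_lines        # one stored tag per cache slot; None = empty
    misses = 0
    for addr in addresses:
        line = addr // line_size     # memory line containing this byte
        index = line % num_lines     # cache slot that line maps to
        tag = line // num_lines      # distinguishes lines that share a slot
        if tags[index] != tag:
            misses += 1
            tags[index] = tag        # fill the slot on a miss
    return misses


ROWS = COLS = 256   # hypothetical matrix shape
ELEM = 4            # bytes per element (assumed)

# Byte addresses touched when walking a row-major matrix in each order.
row_major = [(i * COLS + j) * ELEM for i in range(ROWS) for j in range(COLS)]
col_major = [(i * COLS + j) * ELEM for j in range(COLS) for i in range(ROWS)]

row_misses = count_misses(row_major)
col_misses = count_misses(col_major)

print(f"row-major misses:    {row_misses} of {len(row_major)}")
print(f"column-major misses: {col_misses} of {len(col_major)}")
```

Under these assumptions the row-major walk misses once per 64-byte line (4096 of 65536 accesses), because the 15 elements that follow each miss are already in the fetched line. The column-major walk jumps 1024 bytes per access, so rows 32 apart collide in the same slot and evict each other before any neighbor is reused, and every access misses. Real caches are usually set-associative and have prefetchers, so the gap on actual hardware is smaller than in this toy model.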