ML//NPU

2026-02-05

NPU (neural processing unit): a dedicated accelerator for ML inference workloads, optimized for matrix multiply-accumulate (MAC) at low precision (INT8, FP16).
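To make "low-precision multiply-accumulate" concrete, here is a minimal Python sketch of an INT8 matrix multiply. It assumes symmetric per-tensor quantization with hand-picked scales (real toolchains differ: per-channel scales, zero points, saturation rules), and it accumulates int8 × int8 products in a wider integer the way a MAC array does. It covers INT8 only, not FP16, and the helper names are hypothetical.

```python
# Minimal sketch, assuming symmetric per-tensor INT8 quantization.
# Helper names are hypothetical; real NPUs and toolchains vary.

def quantize_int8(values, scale):
    """Map floats to int8 codes: q = clamp(round(x / scale), -127, 127)."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def int8_matmul(a_q, b_q, rows, inner, cols):
    """Multiply row-major int8 matrices A (rows x inner) and B (inner x cols).

    Each output element is a sum of int8*int8 products kept in a wider
    accumulator, which is what an NPU MAC array does in hardware.
    """
    out = [0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            acc = 0  # Python ints are unbounded; hardware would use int32
            for k in range(inner):
                acc += a_q[i * inner + k] * b_q[k * cols + j]
            out[i * cols + j] = acc
    return out

def dequantize(acc_values, scale_a, scale_b):
    """Recover approximate float results: y ~= acc * scale_a * scale_b."""
    return [v * scale_a * scale_b for v in acc_values]
```

For example, quantizing A = [0.5, -1.0, 0.25, 2.0] with scale 0.25 and the 2x2 identity with scale 0.5, multiplying the integer codes, and dequantizing recovers A exactly. With arbitrary inputs the result is only approximate, which is the accuracy trade-off low precision buys throughput with.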

Found inside modern SoCs, e.g., the Apple Neural Engine in Apple silicon, the Hexagon processor in Qualcomm Snapdragon, and the TPU in Google Tensor.

Offloads inference from the CPU and GPU, typically achieving higher throughput per watt than either on the same workload.

Usually targeted through frameworks such as Core ML, NNAPI (deprecated as of Android 15), or ONNX Runtime rather than programmed directly.
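As a sketch of the framework route, assuming ONNX Runtime: the NPU is reached by listing an execution provider when creating a session. The helper below is hypothetical; it picks whichever NPU-capable providers the installed onnxruntime build reports, in an assumed order of preference, and always keeps the CPU provider as a fallback. The provider name strings are ONNX Runtime's; whether a given provider actually uses the NPU depends on the device and build.

```python
# Minimal sketch, assuming ONNX Runtime. choose_providers is a hypothetical
# helper; the preference order below is an arbitrary illustrative choice.

PREFERRED_PROVIDERS = [
    "CoreMLExecutionProvider",   # Apple Core ML (may use the Neural Engine)
    "QNNExecutionProvider",      # Qualcomm QNN (may use Hexagon)
    "NnapiExecutionProvider",    # Android NNAPI
]

def choose_providers(available):
    """Return a provider list for InferenceSession: NPU-capable first, CPU last."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    chosen.append("CPUExecutionProvider")  # fallback for unsupported operators
    return chosen
```

Usage, with `"model.onnx"` as a placeholder path: `import onnxruntime as ort` then `ort.InferenceSession("model.onnx", providers=choose_providers(ort.get_available_providers()))`. ONNX Runtime assigns each graph node to the first listed provider that supports it, so operators the NPU provider cannot handle still run on the CPU.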

In always-on designs, the NPU monitors sensor input while the MCU sleeps, and wakes the MCU only when it detects a pattern (e.g., a wake word).
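The wake-on-detect flow can be simulated in a few lines. This is a hypothetical sketch: `detector`, `threshold`, and `wake_mcu` are illustrative stand-ins, where on real hardware the detector would be a model running on the always-on NPU and the wake-up would be an interrupt rather than a function call.

```python
# Hypothetical simulation of wake-on-detect. On real hardware the detector
# runs on the always-on NPU and wake_mcu is an interrupt, not a call.

from typing import Callable, Iterable, List

def run_always_on(frames: Iterable[List[float]],
                  detector: Callable[[List[float]], float],
                  threshold: float,
                  wake_mcu: Callable[[int], None]) -> int:
    """Score every frame; call wake_mcu(frame_index) only when score >= threshold.

    Returns the number of wake-ups, i.e. how often the MCU left sleep.
    """
    wakeups = 0
    for index, frame in enumerate(frames):
        score = detector(frame)      # evaluated continuously (NPU side)
        if score >= threshold:       # otherwise the MCU stays asleep
            wake_mcu(index)
            wakeups += 1
    return wakeups
```

For instance, with `max` as a toy detector and a threshold of 0.5, the frames `[[0.0, 0.1], [0.9, 0.8], [0.2, 0.0]]` wake the MCU once, at frame index 1. The power saving comes from the MCU sleeping through every frame that does not cross the threshold.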

Used for continuous signals such as sound, where simple always-on sensor filtering (e.g., a level threshold) can't distinguish the target pattern from other activity.
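A toy comparison shows why a plain filter can fall short. This hypothetical sketch contrasts an energy gate, which fires on any loud frame, with a pattern check: a normalized correlation against a fixed template, standing in (as an assumption) for the trained model an NPU would run.

```python
# Hypothetical illustration: an energy gate vs. a shape-sensitive detector.
# The template correlation is a stand-in for a trained NPU model.

import math
from typing import List

def energy_gate(frame: List[float], threshold: float) -> bool:
    """Simple always-on filter: trigger when mean power exceeds threshold."""
    return sum(x * x for x in frame) / len(frame) > threshold

def pattern_match(frame: List[float], template: List[float], threshold: float) -> bool:
    """Trigger only when the frame's shape correlates with the template."""
    dot = sum(a * b for a, b in zip(frame, template))
    norm = math.sqrt(sum(a * a for a in frame)) * math.sqrt(sum(b * b for b in template))
    return norm > 0 and dot / norm > threshold  # silent frames never match
```

With template `[1, -1, 1, -1]`, both `[0.8, -0.8, 0.8, -0.8]` and an equally loud constant frame `[1, 1, 1, 1]` pass an energy gate at 0.5, but only the first passes the pattern check at 0.9.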