ML//Training//distributed training
- Training across multiple GPUs/nodes.
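Data parallelism: replicate the same model on every GPU and split each batch across them, averaging gradients after the backward pass. Model parallelism: split the model itself across GPUs, for example by sharding the weights within a layer (tensor parallelism). Pipeline parallelism: split the model by layer into sequential stages, one stage per GPU, and feed micro-batches through them.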
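A minimal sketch of the data-parallel case, in plain Python with no GPUs involved. The one-parameter model `y = w * x`, the helper names `grad_mse` and `data_parallel_grad`, and the toy data are all assumptions made for illustration. It shows that averaging per-shard gradients over equal-sized shards gives the same result as the full-batch gradient.

```python
# Toy simulation of data parallelism: each "worker" holds the same weight w
# and sees one shard of the batch. Hypothetical helpers, not a library API.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error w.r.t. w for the model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_grad(w, xs, ys, num_workers):
    """Split the batch into equal shards, compute local gradients, average them."""
    shard = len(xs) // num_workers
    assert shard * num_workers == len(xs), "batch must divide evenly across workers"
    local_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    # Stand-in for an all-reduce: every worker ends up with the mean gradient.
    return sum(local_grads) / num_workers

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                               # single-device gradient
sharded = data_parallel_grad(w, xs, ys, num_workers=2)   # two simulated workers
print(full, sharded)
```

The equivalence relies on the shards being the same size. With uneven shards the local gradients would need to be weighted by shard size before averaging.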
Communication overhead is often the bottleneck, so the interconnects (NVLink within a node, InfiniBand between nodes) can matter as much as the GPUs themselves.
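To get a rough sense of why the interconnect matters, here is a back-of-the-envelope estimate of gradient synchronization time. It assumes a ring all-reduce, in which each GPU sends about 2(N-1)/N times the gradient size, and it ignores latency and any overlap with compute. The function name `ring_allreduce_seconds` and the two bandwidth figures are illustrative assumptions, not measurements of any particular link.

```python
# Bandwidth-only estimate of one gradient all-reduce. Hypothetical helper.

def ring_allreduce_seconds(param_count, bytes_per_param, num_gpus, bandwidth_bytes_per_s):
    """Time for one ring all-reduce, ignoring latency and compute overlap."""
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    payload = param_count * bytes_per_param
    # Reduce-scatter plus all-gather: each GPU sends 2 * (N - 1) / N of the payload.
    sent_per_gpu = 2 * (num_gpus - 1) / num_gpus * payload
    return sent_per_gpu / bandwidth_bytes_per_s

# Assumed setup: 1e9 parameters, 2 bytes each, 8 GPUs.
# The two bandwidths are made-up round numbers one order of magnitude apart.
fast_link = ring_allreduce_seconds(1_000_000_000, 2, 8, 100e9)
slow_link = ring_allreduce_seconds(1_000_000_000, 2, 8, 10e9)
print(f"fast link: {fast_link:.3f} s, slow link: {slow_link:.3f} s")
```

Under these assumptions the synchronization time scales inversely with link bandwidth, so a tenfold slower link makes every training step wait ten times longer on communication unless it can be overlapped with compute.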