PCIe, SLI, NVLink

PCIe versions (raw signaling rate per lane; see the conversion sketch after this list):

  • PCIe 3.0: 8 GT/s per lane
  • PCIe 4.0: 16 GT/s per lane
  • PCIe 5.0: 32 GT/s per lane
  • PCIe 6.0: 64 GT/s per lane
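
GT/s is a raw signaling rate, not usable throughput: PCIe 3.0-5.0 use 128b/130b encoding, so usable bytes per lane are roughly rate x 128/130 / 8. A minimal Python sketch of the conversion (PCIe 6.0 is left out because its PAM4/FLIT encoding changes the math):

    GT_PER_LANE = {"3.0": 8, "4.0": 16, "5.0": 32}

    def lane_bandwidth_gbs(gt_per_s: float) -> float:
        """Usable GB/s per lane for the 128b/130b PCIe generations."""
        bits_per_s = gt_per_s * 1e9 * 128 / 130  # strip encoding overhead
        return bits_per_s / 8 / 1e9              # bits -> gigabytes

    for gen, rate in GT_PER_LANE.items():
        per_lane = lane_bandwidth_gbs(rate)
        print(f"PCIe {gen}: {per_lane:.3f} GB/s per lane, "
              f"{16 * per_lane:.1f} GB/s at x16")

This reproduces the ~985 MB/s, ~1.97 GB/s, and ~3.94 GB/s per-lane figures quoted later in this note.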

SLI (Scalable Link Interface):

  • Older NVIDIA technology for linking multiple consumer GPUs, now deprecated
  • Limited to 2-4 GPUs
  • Much lower bandwidth than NVLink (single-digit GB/s at best over the bridge)

NVLink:

  • High-bandwidth NVIDIA interconnect for direct GPU-to-GPU communication
  • Much faster than PCIe and SLI (roughly 600 GB/s aggregate per A100, 900 GB/s per H100)
  • Scales to 8 fully connected GPUs per node via NVSwitch (link count depends on GPU model); a quick transfer-time comparison follows this list
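
To make "much faster" concrete, a back-of-the-envelope comparison. The 25 GB/s and 250 GB/s effective rates below are assumptions (roughly PCIe 4.0 x16 in practice, and one direction of A100 NVLink), not measurements:

    # Time to move a 10 GB model shard one way between two GPUs.
    shard_gb = 10.0
    for name, bw_gbs in [("PCIe 4.0 x16", 25.0), ("NVLink (A100)", 250.0)]:
        print(f"{name}: {shard_gb / bw_gbs * 1e3:.0f} ms to move {shard_gb:.0f} GB")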

Impact on LLM training vs. inference:

Training:

  • Gradient synchronization each step moves the full gradient volume between GPUs, so inter-GPU bandwidth directly bounds step time (estimate after this list)
  • NVLink is preferable for multi-GPU setups
  • Higher PCIe versions also speed up CPU-to-GPU transfers (data loading, checkpointing)
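
A rough cost model shows why: under a standard ring all-reduce, each GPU moves about 2*(N-1)/N of the gradient bytes every step. The bandwidth figures are the same illustrative assumptions as above:

    def allreduce_seconds(n_params: float, n_gpus: int, bw_gbs: float,
                          bytes_per_param: int = 2) -> float:
        """Ring all-reduce: each GPU moves ~2*(N-1)/N of the gradient bytes."""
        payload = n_params * bytes_per_param            # gradient size in bytes
        traffic = 2 * (n_gpus - 1) / n_gpus * payload   # ring all-reduce volume
        return traffic / (bw_gbs * 1e9)

    P = 7e9  # a 7B-parameter model, fp16 gradients
    for name, bw in [("PCIe 4.0 x16", 25.0), ("NVLink (A100)", 250.0)]:
        print(f"{name}: {allreduce_seconds(P, 8, bw) * 1e3:.0f} ms per step")

For an 8-GPU node this works out to roughly a second per step over PCIe versus tens of milliseconds over NVLink, which is why the interconnect, not compute, often sets the pace of multi-GPU training.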

Inference:

  • Generally less demanding on inter-GPU communication than training
  • PCIe is often sufficient, especially for single-GPU setups
  • NVLink still helps multi-GPU inference (e.g. tensor-parallel serving); a bandwidth probe follows this list
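
A quick way to see which link two GPUs actually use is to measure a device-to-device copy. A minimal sketch, assuming PyTorch and at least two CUDA GPUs:

    import time
    import torch

    assert torch.cuda.device_count() >= 2, "needs at least two GPUs"
    x = torch.empty(1024**3, dtype=torch.uint8, device="cuda:0")  # 1 GiB
    _ = x.to("cuda:1")                  # warm-up copy
    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")
    t0 = time.perf_counter()
    y = x.to("cuda:1")
    torch.cuda.synchronize("cuda:1")
    dt = time.perf_counter() - t0
    print(f"GPU0 -> GPU1: {1 / dt:.1f} GiB/s")
    # `nvidia-smi topo -m` shows whether the path is NVLink (NV#) or PCIe.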

PCIe lane budgets and realistic slot speeds:

  • A physical x16 slot is not always wired for 16 electrical lanes; the actual speed depends on how lanes are split between the CPU and the chipset
  • Most consumer CPUs provide 16-24 PCIe lanes
  • High-end desktop (HEDT) and server CPUs offer 40-128 lanes
  • The chipset adds another 4-24 lanes, all shared over its single uplink to the CPU
  • Per-lane throughput: PCIe 3.0 ~985 MB/s, PCIe 4.0 ~1.97 GB/s, PCIe 5.0 ~3.94 GB/s
  • A slot running at x16 has 16x the bandwidth of a single lane (see the sketch below)
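
A small sketch tying lane counts to the per-lane figures above; the slot configurations are hypothetical examples:

    LANE_GBS = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}  # GB/s per lane

    def slot_gbs(gen: str, lanes: int) -> float:
        return LANE_GBS[gen] * lanes

    # A physical x16 slot may be wired x8 or x4 once CPU lanes run out.
    for gen, lanes in [("4.0", 16), ("4.0", 8), ("5.0", 4)]:
        print(f"PCIe {gen} x{lanes}: {slot_gbs(gen, lanes):.1f} GB/s")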

  
