Distributing AI across cards


  1. BIOS Settings: Check your motherboard's BIOS to ensure all PCIe slots are enabled and running at their maximum supported link speed and lane width.
  2. OS Support: Use an operating system that supports multiple GPUs, such as a recent version of Linux (Ubuntu, for example).
  3. Install Drivers: Install the latest drivers for your GPUs. For NVIDIA GPUs, download and install the latest drivers from the NVIDIA website.
  4. CUDA Toolkit: Install the CUDA toolkit compatible with your GPU drivers. Follow the installation instructions on the NVIDIA CUDA Toolkit website.
  5. cuDNN: Install the cuDNN library compatible with your CUDA version. Download it from the NVIDIA cuDNN page and follow the installation instructions.
  6. Frameworks: Install the machine learning frameworks that support multi-GPU setups. For LLMs, popular frameworks include TensorFlow and PyTorch. A quick check that both frameworks can see every card is sketched after this list.
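
Once the stack is installed, it is worth confirming that the frameworks actually see every GPU before starting a run. A minimal sketch, assuming torch and tensorflow are installed as in step 6 (the script itself is illustrative, not from the original notes):

# quick sanity check that both frameworks see every GPU
import torch
import tensorflow as tf

print("PyTorch sees", torch.cuda.device_count(), "GPU(s)")
for i in range(torch.cuda.device_count()):
    print("  ", i, torch.cuda.get_device_name(i))

gpus = tf.config.list_physical_devices("GPU")
print("TensorFlow sees", len(gpus), "GPU(s)")
for gpu in gpus:
    print("  ", gpu.name)

If either count is lower than expected, recheck the driver, CUDA, and cuDNN versions from steps 3 to 5.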

Use nvidia-smi to monitor GPU usage and ensure all GPUs are being utilized.

nvidia-smi
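
If you prefer to check usage from inside a training script rather than watching nvidia-smi, a small sketch using PyTorch's CUDA memory counters can serve the same purpose (this snippet is an illustration, not part of the original notes):

# report per-GPU memory use from inside a PyTorch script
import torch

for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**2
    reserved = torch.cuda.memory_reserved(i) / 1024**2
    print(f"GPU {i} ({torch.cuda.get_device_name(i)}): "
          f"{allocated:.0f} MiB allocated, {reserved:.0f} MiB reserved")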

Some example commands for setting up a Python environment and installing the frameworks:

python3 -m venv myenv

source myenv/bin/activate

pip install tensorflow

pip install torch

TensorFlow: tf.distribute.MirroredStrategy replicates the model across all visible GPUs and splits each batch between them.
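
A minimal MirroredStrategy sketch, assuming the GPUs from the steps above are visible to TensorFlow (the model and dataset here are placeholders, not from the original notes):

import tensorflow as tf

# MirroredStrategy places one replica of the model on every visible GPU
# and averages gradients across replicas after each step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # anything that creates variables (model, optimizer) goes inside the scope
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# model.fit(train_dataset, epochs=3)  # each batch is split across the GPUs

PyTorch covers the same ground with torch.nn.DataParallel for simple single-process use and torch.nn.parallel.DistributedDataParallel for larger jobs.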
