Distributing AI across cards
- BIOS Settings: Check your motherboard BIOS settings to ensure all PCIe slots are enabled and set to their maximum bandwidth.
- OS Support: Use an operating system that supports multiple GPUs, such as a recent Linux distribution (Ubuntu, for example).
- Install Drivers: Install the latest drivers for your GPUs. For NVIDIA GPUs, download and install the latest drivers from the NVIDIA website (see the sketch after this list).
- CUDA Toolkit: Install the CUDA toolkit compatible with your GPU drivers. Follow the installation instructions on the NVIDIA CUDA Toolkit website.
- cuDNN: Install the cuDNN library compatible with your CUDA version. Download it from the NVIDIA cuDNN page and follow the installation instructions.
- Frameworks: Install the machine learning frameworks that support multi-GPU setups. For LLMs, popular frameworks include TensorFlow and PyTorch.
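On Ubuntu, the driver step can look roughly like the following. This is only an illustrative sketch; package names and driver versions vary by release, and the CUDA toolkit and cuDNN still come from NVIDIA's own installers as described above.
# Install the recommended NVIDIA driver for the detected GPU(s), then reboot
sudo ubuntu-drivers autoinstall
sudo reboot
# After the CUDA toolkit is installed, confirm the compiler is on the PATH
nvcc --version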
Use nvidia-smi to monitor GPU usage and ensure all GPUs are being utilized.
nvidia-smi
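For a continuously refreshed per-card summary, nvidia-smi can also print selected query fields at an interval. The fields below are standard nvidia-smi query fields; the 5-second interval is just an example.
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv -l 5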
Set up a Python environment and install the frameworks:
python3 -m venv myenv
source myenv/bin/activate
pip install tensorflow
pip install torch
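A quick way to confirm that both frameworks actually see every card (assuming the installs above succeeded; these are the standard device queries from PyTorch and TensorFlow):
import torch
import tensorflow as tf

# PyTorch: count and name every CUDA device it can see
print("PyTorch sees", torch.cuda.device_count(), "GPU(s)")
for i in range(torch.cuda.device_count()):
    print(" ", torch.cuda.get_device_name(i))

# TensorFlow: list the physical GPU devices it detects
print("TensorFlow sees", tf.config.list_physical_devices('GPU'))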
TensorFlow: tf.distribute.MirroredStrategy
import tensorflow as tf

# Strategy for multi-GPU inference
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Load the model inside the strategy scope so its variables are mirrored across GPUs
    model = tf.keras.models.load_model('path_to_your_model')
    # Use the model for inference
    predictions = model.predict(your_input_data)
PyTorch: torch.nn.DataParallel
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load your model and tokenizer
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move the model to the GPU and wrap it with DataParallel so each batch is split across all visible GPUs
model = model.to('cuda')
model = torch.nn.DataParallel(model)
model.eval()

# Inference function
def infer(texts):
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
    inputs = {k: v.to('cuda') for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs

# Example usage
texts = ["This is a sample text", "Another sample text"]
outputs = infer(texts)
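torch.nn.DataParallel replicates the full model on every GPU and splits each batch across them, so it only helps when the model fits on a single card. If one model is too large for a single GPU, an alternative is to shard its layers across the cards at load time. The snippet below is a minimal sketch, assuming the Hugging Face transformers and accelerate packages are installed; the checkpoint name is a placeholder, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name; replace with the LLM you actually want to run
model_name = "your-llm-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets accelerate place different layers on different visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "This is a sample text"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
With this layout each GPU holds only a slice of the weights, at the cost of transferring activations between cards as generation moves through the layers.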