# ROCm & AMD GPU Computing Skill

## Hardware

- GPU: AMD Radeon RX 6800 XT (Navi 21)
- VRAM: 16GB
- Architecture: gfx1030
- Kernel: 6.8.0-55-generic (with amdgpu-dkms)
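
You can confirm the architecture listed above by pulling the gfx target out of `rocminfo` output. This is a minimal sketch. `gfx_target` is a hypothetical helper name, and it assumes the GPU agent's `Name:` line (e.g. `gfx1030`) is the first gfx identifier in the output.

```bash
# Hypothetical helper: print the first gfx target (e.g. gfx1030) found in
# rocminfo output read from stdin. Assumes the GPU agent's "Name:" line is the
# first line containing a gfx identifier.
gfx_target() {
  grep -m1 -oE 'gfx[0-9a-f]+'
}
```

Usage: `rocminfo | gfx_target` should print `gfx1030` on this machine.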

## Installed ROCm Components

- ROCm 6.3.2
- rocminfo
- rocm-smi
- hip-runtime-amd
- miopen-hip
- rocblas
- rocfft
- rocrand
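
One quick way to check that the installed stack really is 6.3.2 is to read the version file written by ROCm's `rocm-core` package. This sketch assumes the file lives at `.info/version` under `/opt/rocm` (or `$ROCM_PATH`) and holds a string such as `6.3.2-<build>`. `rocm_version` is a hypothetical helper.

```bash
# Hypothetical helper: print the installed ROCm version (e.g. 6.3.2) without the
# build suffix. Assumes ROOT/.info/version exists (ROOT defaults to $ROCM_PATH,
# then /opt/rocm); returns 1 if the file cannot be read.
rocm_version() {
  local root="${1:-${ROCM_PATH:-/opt/rocm}}" v
  v="$(<"$root/.info/version")" || return 1
  echo "${v%%-*}"  # drop everything from the first '-' (the build number)
}
```

Usage: `[ "$(rocm_version)" = 6.3.2 ] && echo "ROCm 6.3.2 OK"`.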

## Essential Commands

### GPU Monitoring

```bash
rocm-smi                     # GPU status summary
rocm-smi --showproductname   # GPU product name and vendor
rocm-smi --showpids          # Processes currently using the GPU
rocminfo                     # Detailed ROCm agent info
```
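
To watch temperature and power during a long job, you can wrap the monitoring commands above in a sampling loop. This is a minimal sketch. `gpu_log` is a hypothetical helper, and the 5-second interval and 12 samples are arbitrary defaults. `ROCM_SMI` is an assumed override hook for dry runs, not a ROCm variable.

```bash
# Hypothetical helper: print a timestamped temperature/power snapshot COUNT
# times, sleeping INTERVAL seconds between samples.
# ROCM_SMI is an assumed override hook (not a ROCm variable); defaults to rocm-smi.
gpu_log() {
  local interval="${1:-5}" count="${2:-12}" i
  for ((i = 1; i <= count; i++)); do
    printf '=== %s ===\n' "$(date '+%F %T')"
    "${ROCM_SMI:-rocm-smi}" --showtemp --showpower
    if ((i < count)); then
      sleep "$interval"  # no sleep after the final sample
    fi
  done
}
```

Usage: `gpu_log 10 60 | tee gpu.log` samples every 10 seconds for about 10 minutes.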
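
### Environment Variables

```bash
# Add to ~/.bashrc for a permanent setup
export PATH=/opt/rocm/bin:$PATH
# ${VAR:+:$VAR} avoids a trailing ':' (an empty entry means the current directory)
# when LD_LIBRARY_PATH was previously unset or empty
export LD_LIBRARY_PATH=/opt/rocm/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}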
```
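
Nested interactive shells inherit the exported variables and then re-read `~/.bashrc`, so the exports above can pile up duplicate entries. A hedged alternative is to prepend each directory only if it is missing. `path_prepend` is a hypothetical helper, not part of ROCm.

```bash
# Hypothetical helper: prepend DIR to the colon-separated variable VAR unless
# DIR is already one of its entries, then export VAR.
path_prepend() {
  local var="$1" dir="$2"
  local current="${!var:-}"  # indirect expansion: value of the variable named by $var
  case ":$current:" in
    *":$dir:"*) ;;  # already present: leave the variable unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

path_prepend PATH /opt/rocm/bin
path_prepend LD_LIBRARY_PATH /opt/rocm/lib
```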
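
### PyTorch with ROCm

```bash
# Install PyTorch for ROCm. The wheels bundle their own ROCm runtime libraries,
# so the wheel's ROCm version (6.0) does not have to match the system ROCm (6.3.2).
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# Verify GPU access (ROCm builds expose HIP devices through the torch.cuda API)
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"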
```
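
The one-line check above only confirms that a device is visible. A slightly fuller smoke test also prints the HIP version PyTorch was built against and runs a small matrix multiply on the GPU. In this sketch, `torch_rocm_check` and the `PYTHON` override are hypothetical names, and the 1024 x 1024 size is arbitrary.

```bash
# Hypothetical helper: GPU smoke test for a ROCm build of PyTorch.
# PYTHON is an assumed override for the interpreter (defaults to python3).
torch_rocm_check() {
  "${PYTHON:-python3}" - <<'EOF'
import torch

# ROCm builds of PyTorch expose HIP devices through the torch.cuda API
assert torch.cuda.is_available(), "PyTorch cannot see a ROCm GPU"
print("torch", torch.__version__, "| HIP", torch.version.hip)
print("device 0:", torch.cuda.get_device_name(0))

# Run a small matmul on the GPU and wait for it to finish
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.synchronize()
print("matmul OK, result shape:", tuple(y.shape))
EOF
}
```

Usage: run `torch_rocm_check` in the environment where the ROCm wheel is installed. It exits with an `AssertionError` if no GPU is visible.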

### Docker with ROCm

```bash
# /dev/kfd is the ROCm compute interface; /dev/dri holds the GPU render nodes
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest
```
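
For day-to-day use, you can wrap the `docker run` line above so the current directory is mounted into the container. In this sketch, `rocm_docker`, the `/workspace` mount point and the `DOCKER` override are hypothetical choices. `--rm`, `-v` and `-w` are standard `docker run` flags.

```bash
# Hypothetical helper: start a ROCm container with the GPU devices exposed and
# the current directory mounted at /workspace.
# Usage: rocm_docker [IMAGE] [COMMAND...]
# DOCKER is an assumed override for the docker binary (defaults to docker).
rocm_docker() {
  local image="${1:-rocm/pytorch:latest}"
  "${DOCKER:-docker}" run -it --rm \
    --device=/dev/kfd --device=/dev/dri --group-add video \
    -v "$PWD":/workspace -w /workspace \
    "$image" "${@:2}"
}
```

With no arguments, it starts `rocm/pytorch:latest` with the image's default command. Any arguments after the image name are passed through as the container command.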
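
## Performance Tuning

- `HSA_OVERRIDE_GFX_VERSION=10.3.0` makes ROCm treat other RDNA2 GPUs (e.g. gfx1031, gfx1032) as gfx1030 for compatibility. The RX 6800 XT already reports gfx1030, so setting it here is harmless but normally unnecessary.
- GPU temperature: Check with `rocm-smi --showtemp` (normally below 85°C)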
- Power limit: 264W (default)
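
The override only matters for RDNA2 chips that ROCm does not target directly. The sketch below encodes that decision. `hsa_override_for` is a hypothetical helper. Treating every gfx1031 to gfx1036 target as gfx1030-compatible is an assumption based on common community practice, not an AMD guarantee.

```bash
# Hypothetical helper: print the HSA_OVERRIDE_GFX_VERSION value to use for a
# gfx target, or nothing if no override is needed.
# Assumption: other RDNA2 targets (gfx1031-gfx1036) run with gfx1030 kernels.
hsa_override_for() {
  case "$1" in
    gfx1030) ;;                    # native target (RX 6800 XT): no override needed
    gfx103[1-6]) echo "10.3.0" ;;  # other RDNA2 targets: present them as gfx1030
    *) ;;                          # anything else: leave the default alone
  esac
}
```

Usage: `v=$(hsa_override_for gfx1031); [ -n "$v" ] && export HSA_OVERRIDE_GFX_VERSION="$v"`.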

## Common Issues

- "No ROCm-capable GPU": Check with `uname -r` that the running kernel is 6.8.0-55-generic (the kernel with amdgpu-dkms installed), not 6.17
- Missing libraries: Ensure /opt/rocm/lib is in LD_LIBRARY_PATH
- Permission denied: User must be in 'render' and 'video' groups
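
The three checks above can be bundled into one diagnostic. This is a minimal sketch. `missing_groups` and `check_rocm_env` are hypothetical helpers, and the expected kernel string is taken from the Hardware section. The suggested fix uses the standard `usermod -aG` command; group changes take effect after logging out and back in.

```bash
# Hypothetical helper: print which of the render/video groups are missing from a
# space-separated group list (defaults to the current user's groups, via id -nG).
missing_groups() {
  local groups=" ${1:-$(id -nG)} " g
  local missing=()
  for g in render video; do
    [[ "$groups" == *" $g "* ]] || missing+=("$g")
  done
  echo "${missing[*]:-}"
}

# Hypothetical helper: warn about the common issues listed above.
check_rocm_env() {
  local expected_kernel="6.8.0-55-generic" kernel groups_needed
  kernel="$(uname -r)"
  if [[ "$kernel" != "$expected_kernel" ]]; then
    echo "warning: running kernel $kernel, expected $expected_kernel"
  fi
  if [[ ":${LD_LIBRARY_PATH:-}:" != *":/opt/rocm/lib:"* ]]; then
    echo "warning: /opt/rocm/lib is not in LD_LIBRARY_PATH"
  fi
  groups_needed="$(missing_groups)"
  if [[ -n "$groups_needed" ]]; then
    echo "warning: not in group(s): $groups_needed"
    echo "  fix: sudo usermod -aG render,video \"\$USER\", then log out and back in"
  fi
}
```

Usage: `check_rocm_env` prints nothing when all three checks pass.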