CBCFacil v8.0 - Refactored with AMD GPU support

2026-01-09 13:05:46 -03:00
parent cb17136f21
commit b017504c52
54 changed files with 7251 additions and 3670 deletions
--- a/amd/ROCM_SETUP.md
+++ b/amd/ROCM_SETUP.md
@@ -0,0 +1,187 @@
+# 🚀 ROCm Setup for AMD GPUs
+
+## ✅ Current System State
+
+**GPU**: AMD Radeon RX 6800 XT
+**ROCm**: 6.0
+**PyTorch**: 2.5.0+rocm6.0
+
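+## 📋 Essential Commands
+
+### Check the GPU
+```bash
+# Basic GPU information (some GPUs are listed as "Display controller" instead of VGA)
+lspci | grep -iE 'vga|display'
+
+# Current GPU status as reported by ROCm (temperature, clocks, VRAM use)
+rocm-smi
+
+# Detailed system information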
+rocminfo
+```
+
+### Critical Environment Variables
+```bash
+# Treat the GPU as gfx1030; required for RDNA2 cards that report e.g. gfx1031 (the RX 6800 XT is natively gfx1030)
+export HSA_OVERRIDE_GFX_VERSION=10.3.0
+
+# Persist it in ~/.bashrc (or ~/.zshrc)
+echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc
+source ~/.bashrc
+```
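
Because a missing or mistyped override is easy to overlook, a small check can be run before starting a job. The following is a minimal sketch using only the Python standard library; `check_gfx_override` is a hypothetical helper name (not part of ROCm or PyTorch), and the expected value `10.3.0` is taken from the export above.

```python
import os
import re


def check_gfx_override(env=None, expected="10.3.0"):
    """Return (ok, message) describing the HSA_OVERRIDE_GFX_VERSION setting.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get("HSA_OVERRIDE_GFX_VERSION")
    if value is None:
        return False, "HSA_OVERRIDE_GFX_VERSION is not set"
    # The variable is expected to look like major.minor.patch, e.g. 10.3.0
    if not re.fullmatch(r"\d+\.\d+\.\d+", value):
        return False, f"unexpected format: {value!r}"
    if value != expected:
        return False, f"set to {value}, expected {expected}"
    return True, f"set to {value}"


if __name__ == "__main__":
    ok, message = check_gfx_override()
    print(("OK: " if ok else "WARNING: ") + message)
```

The variable has to be set before the ROCm runtime is initialized, so run the check in the same shell that launches the Python job.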
+
+### Verify PyTorch with ROCm
+```bash
+# Basic PyTorch check
+python3 -c "
+import torch
+print(f'PyTorch: {torch.__version__}')
+print(f'ROCm available: {torch.cuda.is_available()}')
+print(f'Devices: {torch.cuda.device_count()}')
+if torch.cuda.is_available():
+    print(f'GPU: {torch.cuda.get_device_name(0)}')
+    print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GiB')
+"
+
+# Quick benchmark
+python3 -c "
+import torch, time
+a, b = (torch.randn(4096, 4096, device='cuda') for _ in range(2))
+torch.matmul(a, b); torch.cuda.synchronize()  # warm-up run, then wait for the GPU to go idle
+start = time.perf_counter()
+c = torch.matmul(a, b)
+torch.cuda.synchronize()
+print(f'GPU time: {time.perf_counter() - start:.4f}s')
+"
+```
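
A single timed call can include one-time startup cost and is noisy, so it may help to warm up first and average over several runs. The sketch below is framework-agnostic and uses only the standard library; `time_gpu_op` and `matmul_tflops` are hypothetical helper names, and the caller supplies the operation and a blocking synchronization function (for PyTorch that would be `torch.cuda.synchronize`).

```python
import time


def time_gpu_op(op, sync, warmup=3, repeats=10):
    """Return the average seconds per call of op().

    sync() must block until all queued GPU work has finished;
    it is called once after warm-up and once after the timed runs.
    """
    for _ in range(warmup):
        op()
    sync()  # make sure warm-up work is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        op()
    sync()  # wait for the timed work to actually complete
    return (time.perf_counter() - start) / repeats


def matmul_tflops(n, seconds):
    """Throughput of an n x n by n x n matmul, counted as 2 * n**3 operations."""
    return 2 * n**3 / seconds / 1e12


# Intended use with PyTorch (not executed here, requires a GPU):
#   a = torch.randn(4096, 4096, device="cuda")
#   b = torch.randn(4096, 4096, device="cuda")
#   t = time_gpu_op(lambda: torch.matmul(a, b), torch.cuda.synchronize)
#   print(f"{t:.4f}s per matmul, {matmul_tflops(4096, t):.2f} TFLOP/s")
```

Passing the synchronization function in as an argument keeps the helper independent of any one framework.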
+
+## 🧪 Stress Test Script
+
+### Run the Stress Test (2 minutes)
+```bash
+python3 /home/ren/gpu/rocm_stress_test.py
+```
+
+## 🔧 Troubleshooting
+
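+### If ROCm does not detect the GPU:
+```bash
+# Check that the amdgpu kernel module is loaded (on current kernels it includes the KFD compute driver)
+lsmod | grep amdgpu
+ls -l /dev/kfd /dev/dri/renderD*
+
+# Load the module if it is missing, and confirm your user is in the render and video groups
+sudo modprobe amdgpu
+groups
+
+# Check the kernel log
+sudo dmesg | grep -i amdgpu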
+```
+
+### If PyTorch does not find ROCm:
+```bash
+# Reinstall PyTorch with ROCm support
+pip uninstall torch torchvision torchaudio
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
+```
+
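+### If you hit out-of-memory errors:
+```bash
+# empty_cache() only frees the cache of the process that calls it, so call torch.cuda.empty_cache()
+# inside the running program; a fresh `python3 -c` process has nothing to free
+
+# Check VRAM usage
+rocm-smi --showmeminfo vram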
+```
+
+## 📊 Continuous Monitoring
+
+### Terminal 1 - Real-time monitor
+```bash
+watch -n 1 rocm-smi
+```
+
+### Terminal 2 - Detailed information
+```bash
+rocm-smi --showtemp --showmeminfo all
+```
+
+## 💡 Usage Examples
+
+### Load a model on the GPU
+```python
+import torch
+from transformers import AutoModel
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+print(f"Using device: {device}")
+
+model = AutoModel.from_pretrained("bert-base-uncased")
+model = model.to(device)
+
+# Tensors moved with .to(device) are processed on the GPU, alongside the model
+inputs = torch.tensor([1, 2, 3]).to(device)
+```
+
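+### Training on the GPU
+```python
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader, TensorDataset
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = nn.Linear(20, 3).to(device)  # stand-in for your own model
+criterion = nn.CrossEntropyLoss()
+optimizer = torch.optim.Adam(model.parameters())
+# Synthetic data (256 samples, 20 features, 3 classes); replace with your own dataset
+dataloader = DataLoader(TensorDataset(torch.randn(256, 20), torch.randint(0, 3, (256,))), batch_size=32)
+
+for epoch in range(2):
+    for inputs, labels in dataloader:
+        inputs, labels = inputs.to(device), labels.to(device)
+        optimizer.zero_grad()
+        outputs = model(inputs)
+        loss = criterion(outputs, labels)
+        loss.backward()
+        optimizer.step()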
+
+## 🎯 Optimizations
+
+### For better performance:
+```python
+# Mixed precision (may be faster on RDNA2); model, criterion, optimizer, inputs, targets come from your training loop
+import torch
+scaler = torch.amp.GradScaler("cuda")  # create once, outside the training loop
+optimizer.zero_grad()
+with torch.amp.autocast("cuda"):
+    output = model(inputs)
+    loss = criterion(output, targets)
+
+scaler.scale(loss).backward()
+scaler.step(optimizer)
+scaler.update()
+```
+
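+## 📈 Useful Commands
+
+```bash
+# Show the installed ROCm version (path assumes the default /opt/rocm install)
+cat /opt/rocm/.info/version
+
+# Check the HSA runtime and list its agents
+rocminfo
+
+# Show the HIP version PyTorch was built against (None on non-ROCm builds)
+python3 -c "import torch; print(torch.version.hip)"
+
+# MPS is Apple's Metal backend, so this prints False on AMD hardware
+python3 -c "import torch; print(torch.backends.mps.is_available())"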
+```
+
+## ⚡ Performance Tips
+
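+1. **Always move data to the GPU**: call `.to(device)` on every batch so it sits on the same device as the model
+2. **Use large batch sizes**: take advantage of the 16 GB of VRAM
+3. **Mixed precision**: can speed up training by roughly 1.5-2x, depending on the model
+4. **DataLoader with `num_workers`**: loads data in parallel worker processes
+5. **`torch.cuda.synchronize()`**: call it before reading the clock to get accurate benchmarks, since GPU work is queued asynchronously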
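
To make tips 2 and 3 concrete, the memory taken by one input batch can be estimated from its shape and element size. This is a rough sketch using only the standard library; `batch_bytes` is a hypothetical helper, the image-sized example shape is an assumption, and the estimate covers the input tensor only (activations, gradients and optimizer state usually take far more).

```python
# Bytes per element for common floating-point dtypes
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def batch_bytes(batch_size, sample_shape, dtype="float32"):
    """Bytes needed to store one batch of samples with the given shape."""
    elements = batch_size
    for dim in sample_shape:
        elements *= dim
    return elements * BYTES_PER_ELEMENT[dtype]


if __name__ == "__main__":
    shape = (3, 224, 224)  # assumed example: one RGB image
    for dtype in ("float32", "float16"):
        size = batch_bytes(256, shape, dtype)
        print(f"batch of 256 in {dtype}: {size / 1024**2:.0f} MiB")
```

Half-precision inputs take half the space of float32 ones, which is part of why mixed precision leaves room for larger batches.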