Sistema completo de detección de highlights con VLM y análisis de gameplay

- Implementación de detector híbrido (Whisper + Chat + Audio + VLM) - Sistema de detección de gameplay real vs hablando - Scene detection con FFmpeg - Soporte para RTX 3050 y RX 6800 XT - Guía completa en 6800xt.md para próxima IA - Scripts de filtrado visual y análisis de contexto - Pipeline automatizado de generación de videos
2026-02-19 17:38:14 +00:00
parent c1c66a7d9a
commit 00180d0b1c
45 changed files with 10636 additions and 260 deletions
--- a/monitoring_report.md
+++ b/monitoring_report.md
@@ -0,0 +1,109 @@
+# GPU/CPU Monitoring Report - Twitch Highlight Detector
+
+## System Information
+- **GPU**: NVIDIA GeForce RTX 3050 (8192 MiB)
+- **Driver**: 580.126.09
+- **Device**: cuda (CUDA requested and available)
+
+## Execution Summary
+- **Total Runtime**: ~10.5 seconds
+- **Process Completed**: Successfully
+- **Highlights Found**: 1 (4819s - 4833s, duration: 14s)
+
+## GPU Utilization Analysis
+
+### Peak GPU Usage
+- **Single Peak**: 100% GPU SM utilization (1 second only)
+- **Location**: During RMS calculation phase
+- **Memory Usage**: 0-4 MiB (negligible)
+
+### Average GPU Utilization
+- **Overall Average**: 3.23%
+- **During Processing**: ~4% (excluding idle periods)
+- **Memory Utilization**: ~1% (4 MiB / 8192 MiB)
+
+### Timeline Breakdown
+1. **Chat Analysis**: < 0.1s (CPU bound)
+2. **FFmpeg Audio Extraction**: 8.5s (CPU bound - FFmpeg threads)
+3. **Audio Decode**: 9.1s (CPU bound - soundfile library)
+4. **CPU->GPU Transfer**: 1.08s (PCIe transfer)
+5. **GPU Processing**:
+   - Window creation: 0.00s (GPU)
+   - RMS calculation: 0.12s (GPU - **100% spike**)
+   - Peak detection: 0.00s (GPU)
+
+## CPU vs GPU Usage Breakdown
+
+### CPU-Bound Operations (90%+ of runtime)
+1. **FFmpeg audio extraction** (8.5s)
+   - Process: ffmpeg
+   - Type: Video/audio decoding
+   - GPU usage: 0%
+   
+2. **Soundfile audio decoding** (9.1s overlap)
+   - Process: Python soundfile
+   - Type: WAV decoding
+   - GPU usage: 0%
+
+3. **Chat JSON parsing** (< 0.5s)
+   - Process: Python json module
+   - Type: File I/O + parsing
+   - GPU usage: 0%
+
+### GPU-Bound Operations (< 1% of runtime)
+1. **Audio tensor operations** (0.12s total)
+   - Process: PyTorch CUDA kernels
+   - Type: RMS calculation, window creation
+   - GPU usage: 100% (brief spike)
+   - Memory: Minimal tensor storage
+
+2. **GPU Memory allocation**
+   - Audio tensor: ~1.2 GB (308M samples × 4 bytes)
+   - Chat tensor: < 1 MB
+   - Calculation buffers: < 100 MB
+
+## Conclusion
+
+### **FAIL: GPU not utilized**
+
+**Reason**: Despite the code successfully using PyTorch CUDA for tensor operations, GPU utilization is minimal because:
+
+1. **Bottleneck is CPU-bound operations**:
+   - FFmpeg audio extraction (8.5s) - 0% GPU
+   - Soundfile WAV decoding (9.1s) - 0% GPU
+   - These operations cannot use GPU without CUDA-accelerated libraries
+
+2. **GPU processing is trivial**:
+   - Only 0.12s of actual CUDA kernel execution
+   - Operations are too simple to saturate GPU
+   - Memory bandwidth underutilized
+
+3. **Architecture mismatch**:
+   - Audio processing on GPU is efficient for large batches
+   - Single-file processing doesn't provide enough parallelism
+   - RTX 3050 designed for larger workloads
+
+## Recommendations
+
+### To actually utilize GPU:
+1. **Use GPU-accelerated audio decoding**:
+   - Replace FFmpeg with NVIDIA NVDEC
+   - Use torchaudio with CUDA backend
+   - Implement custom CUDA audio kernels
+
+2. **Batch processing**:
+   - Process multiple videos simultaneously
+   - Accumulate audio batches for GPU
+   - Increase tensor operation complexity
+
+3. **Alternative: Accept CPU-bound nature**:
+   - Current implementation is already optimal for single file
+   - GPU overhead may exceed benefits for small workloads
+   - Consider multi-threaded CPU processing instead
+
+## Metrics Summary
+- **GPU utilization**: 3.23% average (FAIL - below 10% threshold)
+- **CPU usage**: High during FFmpeg/soundfile phases
+- **Memory usage**: 4 MiB GPU / 347 MB system
+- **Process efficiency**: 1 highlight / 10.5 seconds
+