twitch-highlight-detector/monitoring_report.md
renato97 00180d0b1c Complete highlight detection system with VLM and gameplay analysis
- Implementation of a hybrid detector (Whisper + Chat + Audio + VLM)
- Detection of real gameplay vs. talking segments
- Scene detection with FFmpeg
- Support for RTX 3050 and RX 6800 XT
- Complete guide in 6800xt.md for the next AI
- Visual filtering and context analysis scripts
- Automated video generation pipeline
2026-02-19 17:38:14 +00:00

# GPU/CPU Monitoring Report - Twitch Highlight Detector
## System Information
- **GPU**: NVIDIA GeForce RTX 3050 (8192 MiB)
- **Driver**: 580.126.09
- **Device**: cuda (CUDA requested and available)
## Execution Summary
- **Total Runtime**: ~10.5 seconds
- **Process Completed**: Successfully
- **Highlights Found**: 1 (4819s - 4833s, duration: 14s)
## GPU Utilization Analysis
### Peak GPU Usage
- **Single Peak**: 100% GPU SM utilization (1 second only)
- **Location**: During RMS calculation phase
- **Memory Usage**: 0-4 MiB (negligible)
### Average GPU Utilization
- **Overall Average**: 3.23%
- **During Processing**: ~4% (excluding idle periods)
- **Memory Utilization**: ~0.05% (4 MiB / 8192 MiB)
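
The report does not say how the utilization samples were collected. As a minimal sketch, assuming they came from polling `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1`, the overall and active-only averages could be computed as follows. The function names and the sample log are hypothetical and are not taken from the project.

```python
import csv
import io
from statistics import mean


def parse_samples(csv_text):
    """Parse 'utilization.gpu, memory.used' rows (no header, no units)."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        samples.append((float(row[0]), float(row[1])))
    return samples


def summarize(samples, idle_threshold=0.0):
    """Average utilization overall and over non-idle samples only."""
    util = [u for u, _ in samples]
    active = [u for u in util if u > idle_threshold]
    return {
        "overall_avg": mean(util),
        "active_avg": mean(active) if active else 0.0,
        "peak": max(util),
        "peak_mem_mib": max(m for _, m in samples),
    }


# Hypothetical five-second log, one sample per second (not the report's data).
log = "0, 0\n0, 4\n100, 4\n2, 4\n0, 4\n"
summary = summarize(parse_samples(log))
print(summary)
```

With one-second sampling, a kernel that runs for a fraction of a second can still register as a full 100% sample, so short spikes weigh heavily in averages over short runs.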
### Timeline Breakdown
1. **Chat Analysis**: < 0.1s (CPU bound)
2. **FFmpeg Audio Extraction**: 8.5s (CPU bound - FFmpeg threads)
3. **Audio Decode**: 9.1s (CPU bound - soundfile library)
4. **CPU->GPU Transfer**: 1.08s (PCIe transfer)
5. **GPU Processing**:
   - Window creation: 0.00s (GPU)
   - RMS calculation: 0.12s (GPU - **100% spike**)
   - Peak detection: 0.00s (GPU)
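
The three GPU steps (window creation, RMS calculation, peak detection) can be illustrated with a small pure-Python sketch. This is not the project's code: the report says the real pipeline runs these steps as PyTorch CUDA kernels, and the peak rule used here (mean plus `k` standard deviations) is an assumption chosen for illustration.

```python
import math


def window_rms(samples, window, hop):
    """RMS energy of each `window`-sample frame, stepping by `hop` samples."""
    rms = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = samples[start:start + window]
        rms.append(math.sqrt(sum(x * x for x in chunk) / window))
    return rms


def detect_peaks(rms, k=2.0):
    """Indices of frames whose RMS exceeds mean + k * std (assumed rule)."""
    n = len(rms)
    mu = sum(rms) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in rms) / n)
    return [i for i, r in enumerate(rms) if r > mu + k * sigma]


# Quiet signal with one loud burst at samples 8 and 9.
signal = [0.1] * 8 + [1.0] * 2 + [0.1] * 10
frames = window_rms(signal, window=2, hop=2)
print(detect_peaks(frames))  # the burst falls in frame 4
```

Each frame is independent of the others, which is why this step maps well onto a GPU and finishes in a fraction of a second even for long recordings.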
## CPU vs GPU Usage Breakdown
### CPU-Bound Operations (90%+ of runtime)
1. **FFmpeg audio extraction** (8.5s)
   - Process: ffmpeg
   - Type: Video/audio decoding
   - GPU usage: 0%
2. **Soundfile audio decoding** (9.1s, overlapping with extraction)
   - Process: Python soundfile
   - Type: WAV decoding
   - GPU usage: 0%
3. **Chat JSON parsing** (< 0.5s)
   - Process: Python json module
   - Type: File I/O + parsing
   - GPU usage: 0%
### GPU-Bound Operations (~1% of runtime)
1. **Audio tensor operations** (0.12s total)
   - Process: PyTorch CUDA kernels
   - Type: RMS calculation, window creation
   - GPU usage: 100% (brief spike)
   - Memory: Minimal tensor storage
2. **GPU memory allocation**
   - Audio tensor: ~1.2 GB (308M samples × 4 bytes)
   - Chat tensor: < 1 MB
   - Calculation buffers: < 100 MB
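
The audio tensor size follows directly from the sample count. The sketch below reproduces that arithmetic and, assuming the whole tensor is what moved during the 1.08s CPU->GPU transfer reported in the timeline, estimates the effective transfer rate. That pairing is an assumption, since the report does not state what was transferred.

```python
SAMPLES = 308_000_000     # "308M samples" from the report
BYTES_PER_SAMPLE = 4      # 4 bytes per sample, e.g. float32
TRANSFER_SECONDS = 1.08   # CPU->GPU transfer time from the timeline

tensor_bytes = SAMPLES * BYTES_PER_SAMPLE
tensor_gb = tensor_bytes / 1e9        # decimal gigabytes
tensor_mib = tensor_bytes / 2**20     # binary mebibytes, as nvidia-smi reports

# Assumption: the full tensor was copied during the measured transfer.
effective_gb_per_s = tensor_gb / TRANSFER_SECONDS

print(f"{tensor_gb:.3f} GB, {tensor_mib:.0f} MiB, {effective_gb_per_s:.2f} GB/s")
```

About 1175 MiB is far larger than the 0-4 MiB memory reading reported under Peak GPU Usage; the report does not reconcile the two figures.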
## Conclusion
### **FAIL: GPU not utilized**
**Reason**: Despite the code successfully using PyTorch CUDA for tensor operations, GPU utilization is minimal because:
1. **The bottleneck is CPU-bound work**:
   - FFmpeg audio extraction (8.5s) - 0% GPU
   - Soundfile WAV decoding (9.1s) - 0% GPU
   - These operations cannot use the GPU without CUDA-accelerated libraries
2. **GPU processing is trivial**:
   - Only 0.12s of actual CUDA kernel execution
   - The operations are too simple to saturate the GPU
   - Memory bandwidth is underutilized
3. **Architecture mismatch**:
   - Audio processing on the GPU is efficient for large batches
   - Single-file processing doesn't provide enough parallelism
   - The RTX 3050 only reaches high utilization on much larger workloads
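
Amdahl's law puts a number on this conclusion. The sketch below takes the 0.12s of CUDA kernel time and the ~10.5s total runtime from the report, and computes the best possible end-to-end speedup from making the GPU portion faster. The function name is illustrative.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of runtime gets `factor`x faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


TOTAL_SECONDS = 10.5        # total runtime from the report
GPU_KERNEL_SECONDS = 0.12   # CUDA kernel execution time from the report

fraction = GPU_KERNEL_SECONDS / TOTAL_SECONDS
# Upper bound: the GPU portion becomes infinitely fast.
limit = 1.0 / (1.0 - fraction)

print(f"GPU fraction of runtime: {fraction:.2%}")
print(f"10x faster kernels:      {amdahl_speedup(fraction, 10):.4f}x overall")
print(f"Theoretical ceiling:     {limit:.4f}x overall")
```

Under these numbers the ceiling is a little over 1%, so any meaningful gain has to come from the CPU-bound extraction and decoding stages.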
## Recommendations
### To actually utilize GPU:
1. **Use GPU-accelerated decoding**:
   - Offload video decoding to NVIDIA NVDEC (NVDEC handles video only, so audio decoding stays on the CPU)
   - Use torchaudio with a CUDA backend
   - Implement custom CUDA audio kernels
2. **Batch processing**:
   - Process multiple videos simultaneously
   - Accumulate audio batches for the GPU
   - Increase tensor operation complexity
3. **Alternative: Accept the CPU-bound nature**:
   - The current implementation is already optimal for a single file
   - GPU overhead may exceed the benefits for small workloads
   - Consider multi-threaded CPU processing instead
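
A minimal sketch of the multi-threaded CPU option, assuming several videos are queued and that audio is extracted to mono WAV with the `ffmpeg` CLI. The helper names, the 16 kHz sample rate, and the worker count are illustrative assumptions, not values from the project. Threads are sufficient here because each worker only waits on an external `ffmpeg` process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_extract_cmd(video, wav, sample_rate=16000):
    """ffmpeg command: drop video (-vn), downmix to mono (-ac 1), resample (-ar)."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn",
            "-ac", "1", "-ar", str(sample_rate), str(wav)]


def extract_audio(video, out_dir, run=subprocess.run):
    """Extract one video's audio track; `run` is injectable for testing."""
    wav = Path(out_dir) / (Path(video).stem + ".wav")
    run(build_extract_cmd(video, wav), check=True, capture_output=True)
    return wav


def extract_many(videos, out_dir, workers=4, run=subprocess.run):
    """Run several extractions concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: extract_audio(v, out_dir, run), videos))


# Example call (requires ffmpeg on PATH and an existing output directory):
# wavs = extract_many(["vod1.mp4", "vod2.mp4"], "audio_out", workers=2)
```

FFmpeg is itself multi-threaded, so the worker count is best kept well below the number of CPU cores to avoid oversubscription.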
## Metrics Summary
- **GPU utilization**: 3.23% average (FAIL - below 10% threshold)
- **CPU usage**: High during FFmpeg/soundfile phases
- **Memory usage**: 4 MiB GPU / 347 MB system
- **Process efficiency**: 1 highlight / 10.5 seconds