Complete highlight detection system with VLM and gameplay analysis

- Hybrid detector implementation (Whisper + Chat + Audio + VLM)
- Detection of actual gameplay vs. talking segments
- Scene detection with FFmpeg
- Support for RTX 3050 and RX 6800 XT
- Complete guide in 6800xt.md for the next AI
- Visual filtering and context analysis scripts
- Automated video generation pipeline

`monitoring_report.md` (new file, 109 lines)

# GPU/CPU Monitoring Report - Twitch Highlight Detector

## System Information

- **GPU**: NVIDIA GeForce RTX 3050 (8192 MiB)
- **Driver**: 580.126.09
- **Device**: cuda (CUDA requested and available)

## Execution Summary

- **Total Runtime**: ~10.5 seconds
- **Process Completed**: Successfully
- **Highlights Found**: 1 (4819s - 4833s, duration: 14s)
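
The 14-second highlight above is a single contiguous span. The sketch below shows one way to merge per-window detections into such spans. It assumes fixed 1-second windows indexed from the start of the VOD. The window length and the `merge_windows` helper are hypothetical, because the report does not state how the detector groups windows.

```python
def merge_windows(indices: list[int], window_s: float = 1.0) -> list[tuple[float, float]]:
    """Merge flagged window indices into contiguous (start_s, end_s) spans.

    Hypothetical helper: assumes window i covers [i * window_s, (i + 1) * window_s).
    """
    segments: list[tuple[float, float]] = []
    run_start = prev = None
    for i in sorted(set(indices)):
        if run_start is None:
            run_start = prev = i            # first flagged window opens a run
        elif i == prev + 1:
            prev = i                        # adjacent window extends the run
        else:
            segments.append((run_start * window_s, (prev + 1) * window_s))
            run_start = prev = i            # gap: close the run, start a new one
    if run_start is not None:
        segments.append((run_start * window_s, (prev + 1) * window_s))
    return segments


# 14 consecutive flagged 1-second windows starting at 4819 s
print(merge_windows(list(range(4819, 4833))))  # [(4819.0, 4833.0)]
```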

## GPU Utilization Analysis

### Peak GPU Usage

- **Single Peak**: 100% GPU SM utilization (for 1 second only)
- **Location**: During the RMS calculation phase
- **Memory Usage**: 0-4 MiB (negligible)

### Average GPU Utilization

- **Overall Average**: 3.23%
- **During Processing**: ~4% (excluding idle periods)
- **Memory Utilization**: ~1% reported; the 4 MiB in use is ~0.05% of the 8192 MiB capacity
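
The report does not say how the averages were sampled. The sketch below assumes that `nvidia-smi` was polled once per second with `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1` and that the output was saved as text. Under that assumption, it computes three things: an overall average, an average limited to the processing window, and the 10% pass/fail check used in the metrics summary. The function names, the log format, and the choice of sample indices for the processing window are all hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (utilization.gpu in %, memory.used in MiB)


def parse_samples(log_text: str) -> List[Sample]:
    """Parse nvidia-smi CSV lines such as '3, 4' (noheader, nounits) into tuples."""
    samples = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        util, mem = (float(field) for field in line.split(","))
        samples.append((util, mem))
    return samples


def mean_util(samples: List[Sample], start: int = 0, end: Optional[int] = None) -> float:
    """Average GPU utilization over samples[start:end], e.g. only the processing window."""
    window = samples[start:end]
    if not window:
        raise ValueError("empty sample window")
    return sum(util for util, _ in window) / len(window)


def verdict(avg_util: float, threshold: float = 10.0) -> str:
    """Pass/fail rule mirroring the report's 10% threshold."""
    return "PASS" if avg_util >= threshold else "FAIL"
```

Passing `start`/`end` lets the same samples yield both the whole-run average and the "during processing" average, provided you know which samples fall inside the detector's run.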

### Timeline Breakdown

1. **Chat Analysis**: < 0.1s (CPU-bound)
2. **FFmpeg Audio Extraction**: 8.5s (CPU-bound, FFmpeg threads)
3. **Audio Decode**: 9.1s, overlapping with the FFmpeg extraction (CPU-bound, soundfile library)
4. **CPU->GPU Transfer**: 1.08s (PCIe transfer)
5. **GPU Processing**:
   - Window creation: 0.00s (GPU)
   - RMS calculation: 0.12s (GPU, **100% spike**)
   - Peak detection: 0.00s (GPU)
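
The three GPU steps above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the detector's code. The 1-second non-overlapping windows, the `mean + k * std` threshold, and the `rms_peaks` name are hypothetical. The function falls back to the CPU when CUDA is unavailable.

```python
import torch


def rms_peaks(audio: torch.Tensor, sample_rate: int, k: float = 2.0):
    """Return (per-window RMS, indices of windows whose RMS exceeds mean + k * std).

    Hypothetical parameters: 1-second non-overlapping windows, mean + k*std threshold.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = audio.to(device, dtype=torch.float32)        # CPU->GPU transfer
    win = sample_rate                                 # 1-second window (assumption)
    frames = x.unfold(0, win, win)                    # window creation: [n_windows, win] view;
                                                      # a trailing partial window is dropped
    rms = frames.pow(2).mean(dim=1).sqrt()            # RMS calculation per window
    threshold = rms.mean() + k * rms.std()
    peaks = torch.nonzero(rms > threshold).flatten()  # peak detection
    return rms.cpu(), peaks.cpu()


sr = 16_000
audio = torch.zeros(60 * sr)       # one silent minute (synthetic)
audio[30 * sr : 31 * sr] = 0.5     # one loud second at t = 30 s
rms, peaks = rms_peaks(audio, sr)
print(peaks.tolist())              # [30]
```

When you time these steps on a GPU, call `torch.cuda.synchronize()` before reading the clock. CUDA kernels launch asynchronously, so without a sync, near-zero figures such as the 0.00s entries above may measure launch time rather than execution time.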

## CPU vs GPU Usage Breakdown

### CPU-Bound Operations (90%+ of runtime)

1. **FFmpeg audio extraction** (8.5s)
   - Process: ffmpeg
   - Type: Video/audio decoding
   - GPU usage: 0%

2. **Soundfile audio decoding** (9.1s, overlapping with the FFmpeg extraction)
   - Process: Python soundfile
   - Type: WAV decoding
   - GPU usage: 0%

3. **Chat JSON parsing** (< 0.5s)
   - Process: Python json module
   - Type: File I/O + parsing
   - GPU usage: 0%
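
Items 1 and 2 above suggest that FFmpeg writes a WAV file and soundfile then reads it back. The sketch below shows a swapped-in alternative that skips the intermediate file: FFmpeg writes raw 32-bit float PCM to stdout, and NumPy wraps the bytes. This is not what the detector does. The 16 kHz mono format and the helper names are assumptions. The approach removes the second decode pass, but all of the work still runs on the CPU.

```python
import subprocess

import numpy as np


def ffmpeg_pcm_cmd(video_path: str, sample_rate: int = 16_000) -> list[str]:
    """Build an FFmpeg command that writes mono float32 little-endian PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", video_path,
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono (assumption)
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption)
        "-f", "f32le",            # raw 32-bit float PCM, no container
        "pipe:1",                 # write to stdout
    ]


def pcm_to_array(raw: bytes) -> np.ndarray:
    """Interpret raw f32le bytes as a float32 array (astype copies, so it is writable)."""
    return np.frombuffer(raw, dtype="<f4").astype(np.float32)


def extract_audio(video_path: str, sample_rate: int = 16_000) -> np.ndarray:
    """Run FFmpeg and return the decoded samples; raises CalledProcessError on failure."""
    result = subprocess.run(ffmpeg_pcm_cmd(video_path, sample_rate),
                            capture_output=True, check=True)
    return pcm_to_array(result.stdout)
```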

### GPU-Bound Operations (~1% of runtime)

1. **Audio tensor operations** (0.12s total)
   - Process: PyTorch CUDA kernels
   - Type: RMS calculation, window creation
   - GPU usage: 100% (brief spike)
   - Memory: Minimal tensor storage

2. **GPU Memory allocation**
   - Audio tensor: ~1.2 GB (308M samples × 4 bytes)
   - Chat tensor: < 1 MB
   - Calculation buffers: < 100 MB
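
The audio-tensor estimate is simply samples × bytes per sample. The sketch below does that arithmetic and converts the result to MiB so it can be compared with the card's 8192 MiB. The 4 bytes per sample (float32) comes from the report, and `tensor_bytes` is a hypothetical helper. For comparison, the monitoring section above records only 0-4 MiB of GPU memory in use.

```python
MIB = 1024 ** 2


def tensor_bytes(n_samples: int, bytes_per_sample: int = 4) -> int:
    """Size of a 1-D tensor in bytes (float32 = 4 bytes per element)."""
    return n_samples * bytes_per_sample


audio_bytes = tensor_bytes(308_000_000)   # 308M float32 samples
audio_mib = audio_bytes / MIB
print(f"{audio_bytes / 1e9:.3f} GB = {audio_mib:.0f} MiB "
      f"({100 * audio_mib / 8192:.1f}% of 8192 MiB)")
# 1.232 GB = 1175 MiB (14.3% of 8192 MiB)
```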

## Conclusion

### **FAIL: GPU not utilized**

**Reason**: Although the code does run its tensor operations through PyTorch CUDA, GPU utilization stays minimal because:

1. **The bottleneck is CPU-bound work**:
   - FFmpeg audio extraction (8.5s): 0% GPU
   - Soundfile WAV decoding (9.1s): 0% GPU
   - These operations cannot use the GPU without CUDA-accelerated libraries

2. **GPU processing is trivial**:
   - Only 0.12s of actual CUDA kernel execution
   - The operations are too simple to saturate the GPU
   - Memory bandwidth is underutilized

3. **Architecture mismatch**:
   - GPU audio processing is efficient for large batches
   - Single-file processing doesn't provide enough parallelism
   - The RTX 3050 needs larger, more parallel workloads to approach full utilization
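
A back-of-the-envelope check supports point 2. Reading the ~1.2 GB audio tensor once at device memory bandwidth gives a rough lower bound on the RMS kernel time. Dividing the tensor size by the 1.08s CPU->GPU transfer gives the effective transfer rate. The 224 GB/s figure is an assumption: it is the commonly listed memory bandwidth of the 8 GB desktop RTX 3050, so substitute your card's specification.

```python
def seconds_at(bytes_moved: float, bytes_per_second: float) -> float:
    """Time to move `bytes_moved` at a sustained rate (ignores launch and latency overheads)."""
    return bytes_moved / bytes_per_second


AUDIO_BYTES = 308_000_000 * 4      # float32 audio tensor size from the report
VRAM_BW = 224e9                    # bytes/s, assumed RTX 3050 8 GB memory bandwidth

kernel_floor = seconds_at(AUDIO_BYTES, VRAM_BW)   # one full read of the tensor
pcie_rate = AUDIO_BYTES / 1.08                     # transfer time reported above

print(f"RMS read floor: {kernel_floor * 1e3:.1f} ms; "
      f"effective CPU->GPU rate: {pcie_rate / 1e9:.2f} GB/s")
# RMS read floor: 5.5 ms; effective CPU->GPU rate: 1.14 GB/s
```

The reported 0.12s RMS step is about 20 times this floor, consistent with the observation that memory bandwidth is underutilized.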

## Recommendations

### To actually utilize the GPU:

1. **Use GPU-accelerated audio decoding**:
   - Use NVIDIA NVDEC (for example through FFmpeg's CUDA hardware acceleration); note that NVDEC decodes video only, so it does not offload audio decoding
   - Use torchaudio, running its transforms on CUDA tensors
   - Implement custom CUDA audio kernels

2. **Batch processing**:
   - Process multiple videos simultaneously
   - Accumulate audio batches for the GPU
   - Increase tensor operation complexity

3. **Alternative: Accept the CPU-bound nature**:
   - The current implementation is likely already adequate for single files
   - GPU overhead may exceed the benefits for small workloads
   - Consider multi-threaded CPU processing instead
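
The sketch below processes several videos at once on the CPU, which covers both the "process multiple videos simultaneously" and the "multi-threaded CPU processing" suggestions above. A thread pool runs one FFmpeg extraction per file. The `extract_wav` helper, its 16 kHz mono output, and `max_workers=4` are assumptions for illustration. FFmpeg already runs its own threads per job, so the useful worker count may be lower than the number of cores.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def extract_wav(video_path: str) -> str:
    """Hypothetical per-file step: write a mono 16 kHz WAV next to the video."""
    wav_path = os.path.splitext(video_path)[0] + ".wav"
    subprocess.run(
        ["ffmpeg", "-nostdin", "-loglevel", "error", "-y",
         "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )
    return wav_path


def run_parallel(paths: Sequence[str],
                 worker: Callable[[str], str] = extract_wav,
                 max_workers: int = 4) -> List[str]:
    """Apply `worker` to several files concurrently; results keep the input order.

    Threads suffice because each worker mostly waits on an external FFmpeg process.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

For example, `run_parallel(["vod1.mp4", "vod2.mp4"])` would return the two WAV paths in input order. The file names are placeholders.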

## Metrics Summary

- **GPU utilization**: 3.23% average (FAIL: below the 10% threshold)
- **CPU usage**: High during the FFmpeg/soundfile phases
- **Memory usage**: 4 MiB GPU / 347 MB system
- **Process efficiency**: 1 highlight per 10.5 seconds