renato97 00180d0b1c Complete highlight detection system with VLM and gameplay analysis

- Hybrid detector implementation (Whisper + Chat + Audio + VLM)
- Detection of actual gameplay vs. talking segments
- Scene detection with FFmpeg
- Support for RTX 3050 and RX 6800 XT
- Complete guide in 6800xt.md for the next AI
- Visual filtering and context-analysis scripts
- Automated video-generation pipeline

2026-02-19 17:38:14 +00:00


GPU/CPU Monitoring Report - Twitch Highlight Detector

System Information

GPU: NVIDIA GeForce RTX 3050 (8192 MiB)
Driver: 580.126.09
Device: cuda (CUDA requested and available)

Execution Summary

Total Runtime: ~10.5 seconds
Process Completed: Successfully
Highlights Found: 1 (4819s - 4833s, duration: 14s)
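Per-stage wall-clock timings like the ones in this report can be collected with a small context manager. One caveat matters for the GPU stages: CUDA kernels launch asynchronously, so unless the clock is read after a synchronize, a kernel's time is billed to whichever later call waits for the GPU. The sketch below is illustrative; the `stage` helper and its `sync` parameter are hypothetical names, not part of the detector's code.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings, sync=None):
    """Record wall-clock seconds for one pipeline stage into timings[name].

    `sync` is an optional zero-argument callable run before each clock read.
    For CUDA stages, pass torch.cuda.synchronize so previously queued work
    is excluded at the start and this stage's kernels finish before the end.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        timings[name] = time.perf_counter() - start
```

For example, wrapping the RMS step as `with stage("rms", timings, sync=torch.cuda.synchronize): ...` attributes the kernel time to that stage rather than to the next blocking call.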

GPU Utilization Analysis

Peak GPU Usage

Single Peak: 100% GPU SM utilization (1 second only)
Location: During RMS calculation phase
Memory Usage: 0-4 MiB (negligible)

Average GPU Utilization

Overall Average: 3.23%
During Processing: ~4% (excluding idle periods)
Memory Utilization: ~0.05% (4 MiB / 8192 MiB)
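Averages like these can be derived from periodic nvidia-smi samples. The sketch below assumes samples were logged once per second with `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits -l 1`; the `summarize` helper, treating 0% samples as idle, and the 10% pass threshold (taken from the Metrics Summary) are illustrative assumptions, not the report's actual tooling. Note that memory.used / memory.total is a capacity fraction, whereas nvidia-smi's separate utilization.memory field reports the share of the sample period during which device memory was being read or written.

```python
from dataclasses import dataclass


@dataclass
class GpuSummary:
    overall_avg: float   # mean SM utilization over all samples (%)
    active_avg: float    # mean over samples with utilization > 0 (%)
    peak: float          # highest single utilization sample (%)
    peak_mem_mib: float  # highest memory.used sample (MiB)
    mem_fraction: float  # peak_mem_mib / memory.total, as a percentage
    verdict: str         # "PASS" or "FAIL" against the threshold


def summarize(csv_lines, threshold=10.0):
    """Summarize nvidia-smi CSV rows of 'util, mem_used, mem_total'.

    Assumes --format=csv,noheader,nounits, so each row is three numbers.
    """
    utils, mems, total = [], [], 0.0
    for line in csv_lines:
        if not line.strip():
            continue
        util, used, total_mib = (float(v) for v in line.split(","))
        utils.append(util)
        mems.append(used)
        total = total_mib
    if not utils:
        raise ValueError("no samples")
    overall = sum(utils) / len(utils)
    active = [u for u in utils if u > 0]  # assumption: 0% samples are idle
    active_avg = sum(active) / len(active) if active else 0.0
    peak_mem = max(mems)
    return GpuSummary(
        overall_avg=overall,
        active_avg=active_avg,
        peak=max(utils),
        peak_mem_mib=peak_mem,
        mem_fraction=100.0 * peak_mem / total,
        verdict="PASS" if overall >= threshold else "FAIL",
    )
```

With the report's memory figures (4 MiB of 8192 MiB), `mem_fraction` comes out at about 0.05%.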

Timeline Breakdown

Chat Analysis: < 0.1s (CPU bound)
FFmpeg Audio Extraction: 8.5s (CPU bound - FFmpeg threads)
Audio Decode: 9.1s (CPU bound - soundfile library; overlaps with FFmpeg extraction)
CPU->GPU Transfer: 1.08s (PCIe transfer)
GPU Processing:
- Window creation: 0.00s (GPU)
- RMS calculation: 0.12s (GPU - 100% spike)
- Peak detection: 0.00s (GPU)
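The three GPU steps above (window creation, RMS calculation, peak detection) can be sketched as a plain-Python CPU reference. The window and hop lengths, the mean-plus-k-standard-deviations threshold, and the function names are assumptions for illustration; the report does not state the detector's parameters. In PyTorch the same windowing is typically expressed as `x.unfold(0, win, hop)` followed by `.pow(2).mean(dim=1).sqrt()` on a CUDA tensor.

```python
import math


def frame_rms(samples, win, hop):
    """Window creation + RMS: one RMS value per window of `win` samples,
    advancing by `hop` samples (a trailing partial window is dropped)."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    out = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        out.append(math.sqrt(sum(s * s for s in frame) / win))
    return out


def detect_peaks(rms, k=2.0):
    """Peak detection: indices of windows whose RMS exceeds mean + k * std."""
    if not rms:
        return []
    mean = sum(rms) / len(rms)
    std = math.sqrt(sum((r - mean) ** 2 for r in rms) / len(rms))
    threshold = mean + k * std
    return [i for i, r in enumerate(rms) if r > threshold]


def window_to_seconds(index, hop, sample_rate):
    """Start time of window `index`, in seconds."""
    return index * hop / sample_rate
```

A threshold relative to the signal's own statistics keeps the sketch independent of absolute loudness; a real detector may well use a different rule.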

CPU vs GPU Usage Breakdown

CPU-Bound Operations (90%+ of runtime)

FFmpeg audio extraction (8.5s)
- Process: ffmpeg
- Type: Video/audio decoding
- GPU usage: 0%
Soundfile audio decoding (9.1s, overlapping with FFmpeg extraction)
- Process: Python soundfile
- Type: WAV decoding
- GPU usage: 0%
Chat JSON parsing (< 0.5s)
- Process: Python json module
- Type: File I/O + parsing
- GPU usage: 0%

GPU-Bound Operations (~1% of runtime)

Audio tensor operations (0.12s total)
- Process: PyTorch CUDA kernels
- Type: RMS calculation, window creation
- GPU usage: 100% (brief spike)
- Memory: Minimal tensor storage
GPU Memory allocation (computed from tensor sizes, not measured; sampled usage peaked at 4 MiB)
- Audio tensor: ~1.2 GB (308M samples × 4 bytes)
- Chat tensor: < 1 MB
- Calculation buffers: < 100 MB
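The ~1.2 GB audio-tensor figure is arithmetic on the sample count. The sketch below reproduces it, assuming 4-byte (float32) samples, and derives the effective rate of the 1.08s CPU->GPU transfer under the additional assumption that the transfer moved the whole tensor. The helper names are hypothetical. To measure instead of estimate, PyTorch's `torch.cuda.max_memory_allocated()` reports the peak bytes handed out by its caching allocator.

```python
def tensor_bytes(num_elements, bytes_per_element=4):
    """Size of a dense tensor; 4 bytes per element corresponds to float32."""
    return num_elements * bytes_per_element


def to_units(num_bytes):
    """Return the same size in decimal GB and binary GiB."""
    return num_bytes / 1e9, num_bytes / 2**30


def throughput_gb_per_s(num_bytes, seconds):
    """Effective transfer rate in decimal GB/s."""
    return num_bytes / 1e9 / seconds


audio = tensor_bytes(308_000_000)        # 308M samples, from the report
gb, gib = to_units(audio)
rate = throughput_gb_per_s(audio, 1.08)  # assumes the 1.08s copy moved it all
print(f"{audio} bytes = {gb:.2f} GB = {gib:.2f} GiB; ~{rate:.2f} GB/s")
# prints: 1232000000 bytes = 1.23 GB = 1.15 GiB; ~1.14 GB/s
```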

Conclusion

FAIL: GPU underutilized

Reason: Despite the code successfully using PyTorch CUDA for tensor operations, GPU utilization is minimal because:

The bottleneck is CPU-bound work:
- FFmpeg audio extraction (8.5s) - 0% GPU
- Soundfile WAV decoding (9.1s) - 0% GPU
- These steps cannot use the GPU without GPU-accelerated decoding libraries
GPU processing is trivial:
- Only 0.12s of actual CUDA kernel execution
- The operations are too simple to saturate the GPU
- Memory bandwidth is underutilized
Architecture mismatch:
- GPU audio processing pays off on large batches
- A single file supplies too little GPU work (0.12s of kernels against a 1.08s transfer) to keep the GPU busy
- The RTX 3050 is sized for heavier workloads than one RMS pass over one file
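Amdahl's law makes the "GPU processing is trivial" point quantitative: if a fraction p of the runtime is sped up by a factor s, the whole run speeds up by 1 / ((1 - p) + p / s). The sketch below assumes the 0.12s of kernel time is serial with the rest of the ~10.5s run; the decode stages overlap, so this is an approximation.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is sped up by factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


gpu_fraction = 0.12 / 10.5  # kernel time / total runtime, from the report
for label, s in (("2x", 2), ("10x", 10), ("infinitely", float("inf"))):
    total = amdahl_speedup(gpu_fraction, s)
    print(f"GPU stage {label} faster -> whole run {total:.4f}x faster")
```

Even an infinitely fast GPU stage shortens the run by just over 1%, which is why the CPU-side decode dominates every recommendation.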

Recommendations

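To actually utilize the GPU:

Use GPU-accelerated decoding where it applies:
- NVIDIA NVDEC (e.g. via FFmpeg's -hwaccel cuda) accelerates video decoding only; audio extraction still decodes on the CPU, so NVDEC helps video-based stages such as scene detection rather than this one
- torchaudio transforms can run on CUDA tensors, but torchaudio's audio decoding also runs on the CPU
- Implement custom CUDA audio kernels for the post-decode analysis
Batch processing:
- Process multiple videos simultaneously
- Accumulate audio batches for the GPU
- Increase tensor operation complexity
Alternative: accept the CPU-bound nature:
- The GPU stage (0.12s of ~10.5s) is not the bottleneck for a single file, so further GPU optimization would barely shorten the run
- GPU overhead, such as the 1.08s CPU->GPU transfer, may exceed the benefit for small workloads
- Consider multi-threaded CPU processing instead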
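The batch and multi-threaded CPU recommendations can be combined by running one FFmpeg extraction per video concurrently. A minimal sketch, assuming mono 16 kHz WAV output (`-vn -ac 1 -ar 16000`) is acceptable to the detector; the report does not state its extraction settings, and `extract_cmd`, `run_extract` and `extract_all` are hypothetical helpers. Threads suffice here because each worker mostly waits on an `ffmpeg` subprocess, so the GIL is not the limiting factor.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_cmd(video, out_dir):
    """Build an ffmpeg command that drops video and writes mono 16 kHz WAV.
    The -ac/-ar values are assumptions, not the detector's real settings."""
    out = Path(out_dir) / (Path(video).stem + ".wav")
    cmd = ["ffmpeg", "-y", "-i", str(video), "-vn",
           "-ac", "1", "-ar", "16000", str(out)]
    return cmd, out


def run_extract(video, out_dir):
    """Run one extraction; raises CalledProcessError if ffmpeg fails."""
    cmd, out = extract_cmd(video, out_dir)
    subprocess.run(cmd, check=True, capture_output=True)
    return out


def extract_all(videos, out_dir, worker=run_extract, max_workers=4):
    """Run worker(video, out_dir) for each video concurrently.
    Results come back in input order (Executor.map preserves order)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda v: worker(v, out_dir), videos))
```

Usage: `extract_all(["vod1.mp4", "vod2.mp4"], "audio")`. Keeping `max_workers` near the number of physical cores is a reasonable starting point, since each FFmpeg process may itself use several threads.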

Metrics Summary

GPU utilization: 3.23% average (FAIL - below 10% threshold)
CPU usage: High during FFmpeg/soundfile phases
Memory usage: 4 MiB GPU / 347 MB system
Process efficiency: 1 highlight / 10.5 seconds
