twitch-highlight-detector/monitoring_report.md
renato97 00180d0b1c Complete highlight detection system with VLM and gameplay analysis
- Hybrid detector implementation (Whisper + Chat + Audio + VLM)
- Detection of actual gameplay vs. talking segments
- Scene detection with FFmpeg
- Support for RTX 3050 and RX 6800 XT
- Complete guide in 6800xt.md for the next AI
- Visual filtering and context analysis scripts
- Automated video generation pipeline
2026-02-19 17:38:14 +00:00


GPU/CPU Monitoring Report - Twitch Highlight Detector

System Information

  • GPU: NVIDIA GeForce RTX 3050 (8192 MiB)
  • Driver: 580.126.09
  • Device: cuda (CUDA requested and available)

Execution Summary

  • Total Runtime: ~10.5 seconds
  • Process Completed: Successfully
  • Highlights Found: 1 (4819s - 4833s, duration: 14s)

GPU Utilization Analysis

Peak GPU Usage

  • Single Peak: 100% GPU SM utilization (1 second only)
  • Location: During RMS calculation phase
  • Memory Usage: 0-4 MiB (negligible)

Average GPU Utilization

  • Overall Average: 3.23%
  • During Processing: ~4% (excluding idle periods)
  • Memory Utilization: ~0.05% (4 MiB / 8192 MiB)

Timeline Breakdown

  1. Chat Analysis: < 0.1s (CPU bound)
  2. FFmpeg Audio Extraction: 8.5s (CPU bound - FFmpeg threads)
  3. Audio Decode: 9.1s (CPU bound - soundfile library; overlaps with the FFmpeg extraction, so the two are not additive)
  4. CPU->GPU Transfer: 1.08s (PCIe transfer)
  5. GPU Processing:
    • Window creation: 0.00s (GPU)
    • RMS calculation: 0.12s (GPU - 100% spike)
    • Peak detection: 0.00s (GPU)

CPU vs GPU Usage Breakdown

CPU-Bound Operations (90%+ of runtime)

  1. FFmpeg audio extraction (8.5s)

    • Process: ffmpeg
    • Type: Video/audio decoding
    • GPU usage: 0%
  2. Soundfile audio decoding (9.1s, overlapping with FFmpeg extraction)

    • Process: Python soundfile
    • Type: WAV decoding
    • GPU usage: 0%
  3. Chat JSON parsing (< 0.5s)

    • Process: Python json module
    • Type: File I/O + parsing
    • GPU usage: 0%

GPU-Bound Operations (about 1% of runtime)

  1. Audio tensor operations (0.12s total)

    • Process: PyTorch CUDA kernels
    • Type: RMS calculation, window creation
    • GPU usage: 100% (brief spike)
    • Memory: Minimal tensor storage
  2. GPU Memory allocation

    • Audio tensor: ~1.2 GB (308M samples × 4 bytes)
    • Chat tensor: < 1 MB
    • Calculation buffers: < 100 MB
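
The audio tensor estimate can be checked with a few lines of arithmetic. This sketch assumes the 4 bytes per sample correspond to float32 and that the card exposes the full 8192 MiB listed under System Information. The result does not match the 0-4 MiB observed by monitoring; the report does not explain the difference.

```python
SAMPLES = 308_000_000        # "308M samples" from the report
BYTES_PER_SAMPLE = 4         # assumed float32
VRAM_BYTES = 8192 * 2**20    # 8192 MiB from System Information

total_bytes = SAMPLES * BYTES_PER_SAMPLE
size_gb = total_bytes / 1e9             # decimal gigabytes
size_gib = total_bytes / 2**30          # binary gibibytes
vram_fraction = total_bytes / VRAM_BYTES

print(f"{size_gb:.3f} GB = {size_gib:.3f} GiB = {vram_fraction:.1%} of VRAM")
```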

Conclusion

FAIL: GPU not utilized

Reason: The code does run its tensor operations through PyTorch CUDA, but overall GPU utilization stays minimal because:

  1. Bottleneck is CPU-bound operations:

    • FFmpeg audio extraction (8.5s) - 0% GPU
    • Soundfile WAV decoding (9.1s) - 0% GPU
    • These operations cannot use GPU without CUDA-accelerated libraries
  2. GPU processing is trivial:

    • Only 0.12s of actual CUDA kernel execution
    • Operations are too simple to saturate GPU
    • Memory bandwidth underutilized
  3. Architecture mismatch:

    • Audio processing on GPU is efficient for large batches
    • Single-file processing doesn't provide enough parallelism
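    • The CPU->GPU transfer (1.08s) takes about 9x longer than the kernels it feeds (0.12s), so the RTX 3050 only pays off on much larger workloads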
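
One way to quantify this conclusion is Amdahl's law, which bounds the overall speedup by the fraction of runtime that is actually accelerated. Applying it here is a framing added for illustration; the two timings come from the report and everything else is a sketch.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of runtime gets `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

TOTAL_RUNTIME_S = 10.5   # total runtime from the report
GPU_KERNEL_S = 0.12      # RMS calculation time from the report

gpu_fraction = GPU_KERNEL_S / TOTAL_RUNTIME_S
# Best case: the GPU kernels become infinitely fast.
best_case = amdahl_speedup(gpu_fraction, factor=float("inf"))

print(f"GPU share of runtime: {gpu_fraction:.2%}")
print(f"Upper bound on speedup from faster kernels: {best_case:.3f}x")
```

Even in that best case the run would finish only about 1% sooner, so any meaningful gain has to come from the decode phases.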

Recommendations

To actually utilize the GPU:

  1. Use GPU-accelerated audio decoding:

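    • Note that NVIDIA NVDEC accelerates video decoding only, so it will not speed up audio extraction on its own
    • Run downstream transforms with torchaudio on CUDA tensors (the decode step itself still runs on the CPU)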
    • Implement custom CUDA audio kernels
  2. Batch processing:

    • Process multiple videos simultaneously
    • Accumulate audio batches for GPU
    • Increase tensor operation complexity
  3. Alternative: Accept CPU-bound nature:

    • Current implementation is already reasonable for a single file
    • GPU overhead may exceed benefits for small workloads
    • Consider multi-threaded CPU processing instead
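
The batch and multi-threaded CPU options could look roughly like the sketch below, which runs one FFmpeg extraction per video on a thread pool. The function names, the 16 kHz mono output, and the worker count are hypothetical choices, not taken from the project. Threads are enough here because the heavy work happens in the `ffmpeg` subprocesses, not in Python.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_ffmpeg_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono audio from a video file."""
    return [
        "ffmpeg", "-y",            # overwrite the output without prompting
        "-i", video_path,
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample to the target rate
        wav_path,
    ]

def extract_all(jobs, max_workers=4, runner=subprocess.run):
    """Run one extraction per (video_path, wav_path) pair; return the WAV paths in order."""
    def run_one(job):
        video_path, wav_path = job
        runner(build_ffmpeg_cmd(video_path, wav_path), check=True)
        return wav_path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, jobs))
```

A call such as `extract_all([("a.mp4", "a.wav"), ("b.mp4", "b.wav")])` would then decode both files concurrently. The `runner` parameter exists only so the function can be exercised without FFmpeg installed.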

Metrics Summary

  • GPU utilization: 3.23% average (FAIL - below 10% threshold)
  • CPU usage: High during FFmpeg/soundfile phases
  • Memory usage: 4 MiB GPU / 347 MB system
  • Process efficiency: 1 highlight / 10.5 seconds
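
The pass/fail check against the 10% threshold can be reproduced from sampled monitoring data. The report does not say how the samples were collected; this sketch assumes one CSV row per second in the form produced by `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1`. The sample rows are hypothetical and are not the report's raw data.

```python
import csv
import io

FAIL_THRESHOLD_PCT = 10.0  # pass/fail threshold stated in the report

def summarize(csv_text):
    """Summarize rows of `utilization.gpu, memory.used` (percent, MiB)."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    utils = [float(r[0]) for r in rows]
    mems = [float(r[1]) for r in rows]
    avg = sum(utils) / len(utils)
    return {
        "avg_util_pct": avg,
        "peak_util_pct": max(utils),
        "peak_mem_mib": max(mems),
        "status": "PASS" if avg >= FAIL_THRESHOLD_PCT else "FAIL",
    }

# Hypothetical samples: mostly idle, one brief 100% spike.
SAMPLE = "\n".join(["0, 0"] * 10 + ["100, 4"] + ["4, 4"] * 9)
result = summarize(SAMPLE)
```

A single one-second spike barely moves the average, which is how a run can hit 100% utilization and still fail the threshold.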