diff --git a/.gitignore b/.gitignore index 97e96d5..5931e2f 100644 --- a/.gitignore +++ b/.gitignore @@ -1,164 +1,41 @@ -# Byte-compiled / optimized / DLL files +# Python __pycache__/ *.py[cod] *$py.class - -# C extensions *.so - -# Distribution / packaging .Python -build/ -develop-eggs/ -dist/ -downloads/ -eggs/ -.eggs/ -lib/ -lib64/ -parts/ -sdist/ -var/ -wheels/ -share/python-wheels/ -*.egg-info/ -.installed.cfg -*.egg -MANIFEST - -# PyInstaller -# Usually these files are written by a python script from a template -# before PyInstaller builds the exe, so as to inject date/other infos into it. -*.manifest -*.spec - -# Installer logs -pip-log.txt -pip-delete-this-directory.txt - -# Unit test / coverage reports -htmlcov/ -.tox/ -.nox/ -.coverage -.coverage.* -.cache -nosetests.xml -coverage.xml -*.cover -*.py,cover -.hypothesis/ -.pytest_cache/ -cover/ - -# Translations -*.mo -*.pot - -# Django stuff: -*.log -local_settings.py -db.sqlite3 -db.sqlite3-journal - -# Flask stuff: -instance/ -.webassets-cache - -# Scrapy stuff: -.scrapy - -# Sphinx documentation -docs/_build/ - -# PyBuilder -.pybuilder/ -target/ - -# Jupyter Notebook -.ipynb_checkpoints - -# IPython -profile_default/ -ipython_config.py - -# pyenv -# For a library or package, you might want to ignore these files since the code is -# intended to run in multiple environments; otherwise, check them in: -# .python-version - -# pipenv -# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. -# However, in case of collaboration, if having platform-specific dependencies or dependencies -# having no cross-platform support, pipenv may install dependencies that don't work, or not -# install all needed dependencies. -#Pipfile.lock - -# poetry -# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. -# This is especially recommended for binary packages to ensure reproducibility, and is more -# commonly ignored for libraries. 
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control -#poetry.lock - -# pdm -# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. -#pdm.lock -# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it -# in version control. -# https://pdm.fming.dev/#use-with-ide -.pdm.toml - -# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm -__pypackages__/ - -# Celery stuff -celerybeat-schedule -celerybeat.pid - -# SageMath parsed files -*.sage.py - -# Environments -.env -.venv env/ venv/ -ENV/ -env.bak/ -venv.bak/ +env/ +.venv/ -# Spyder project settings -.spyderproject -.spyproject - -# Rope project settings -.ropeproject - -# mkdocs documentation -/site - -# mypy -.mypy_cache/ -.dmypy.json -dmypy.json - -# Pyre type checker -.pyre/ - -# pytype static type analyzer -.pytype/ - -# Cython debug symbols -cython_debug/ - -# PyCharm -# JetBrains specific template is maintained in a separate JetBrains.gitignore that can -# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore -# and can be added to the global gitignore or merged into this file. For a more nuclear -# option (not recommended) you can uncomment the following to ignore the entire idea folder. +# IDEs +.vscode/ .idea/ +*.swp +*.swo -recorded +# Videos (do not commit to git) +*.mp4 +*.mkv +*.avi +*.mov -.DS_Store \ No newline at end of file +# Chat (can be large) +*.json +*.txt + +# Highlights +*highlights*.json +*_final.mp4 + +# Temp +temp_* +frames_temp/ +*.wav + +# OS +.DS_Store +Thumbs.db + +# Env +.env diff --git a/README.md b/README.md index f1c1d3a..4097565 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,213 @@ -## Known issues: -- Configure logger with config file -- Support multiple streamer -- Post process with ffmpeg -- Avoid using streamer name.
Need to use id instead \ No newline at end of file +# 🎬 Twitch Highlight Detector + +Automated pipeline for detecting and generating highlights from Twitch and Kick streams. + +## ✨ Features + +- **Automatic download** of VODs and chat +- **2-of-3 detection**: saturated chat + audio (screams) + bright colors +- **Draft mode**: processes at 360p for a quick pass +- **HD mode**: processes at 1080p for maximum quality +- **GPU support**: ready for NVIDIA (CUDA) and AMD (ROCm) +- **Simple CLI**: a single command for the whole pipeline + +## 🚀 Quick Start + +```bash +# Draft mode (360p) - quick pass +./pipeline.sh --draft + +# HD mode (1080p) - high quality +./pipeline.sh --hd +``` + +### Example + +```bash +# Download and process in draft mode +./pipeline.sh 2701190361 elxokas --draft + +# If you like the result, process in HD +./pipeline.sh 2701190361 elxokas_hd --hd +``` + +## 📋 Requirements + +### System +```bash +# Arch Linux +sudo pacman -S ffmpeg streamlink git + +# Ubuntu/Debian +sudo apt install ffmpeg streamlink git + +# macOS +brew install ffmpeg streamlink git +``` + +### Python +```bash +pip install moviepy opencv-python scipy numpy python-dotenv torch +``` + +### .NET (for TwitchDownloaderCLI) +```bash +# Download the binary from releases, or build it from source +# https://github.com/lay295/TwitchDownloader/releases +``` + +## 📖 Documentation + +| File | Description | |---------|-------------| | [README.md](README.md) | This file | | [contexto.md](contexto.md) | Project history and context | | [TODO.md](TODO.md) | Pending task list | | [HIGHLIGHT.md](HIGHLIGHT.md) | Pipeline usage guide | + +## 🔧 Installation + +### 1. Clone the repo +```bash +git clone https://tu-gitea/twitch-highlight-detector.git +cd twitch-highlight-detector +``` + +### 2. Configure credentials +```bash +cp .env.example .env +# Edit .env with your Twitch credentials +``` + +### 3. Install dependencies +```bash +pip install -r requirements.txt +``` + +### 4. 
Install TwitchDownloaderCLI +```bash +# Download from releases +curl -L -o TwitchDownloaderCLI https://github.com/lay295/TwitchDownloader/releases/latest/download/TwitchDownloaderCLI +chmod +x TwitchDownloaderCLI +sudo mv TwitchDownloaderCLI /usr/local/bin/ +``` + +## 🎯 How It Works + +### Pipeline (2 of 3) + +The system flags a highlight when at least 2 of these 3 conditions hold: + +1. **Saturated chat**: many messages in a short time +2. **Intense audio**: volume spikes (screams, epic moments) +3. **Bright colors**: visual effects, scene changes + +### Flow + +``` +1. streamlink → downloads the video (VOD) +2. TwitchDownloaderCLI → downloads the chat +3. detector_gpu.py → analyzes chat + audio + color +4. generate_video.py → builds the summary video +``` + +## 📁 Structure + +``` +├── .env # Credentials (do not commit) +├── .gitignore +├── requirements.txt # Python dependencies +├── main.py # Entry point +├── pipeline.sh # Full pipeline +├── detector_gpu.py # Detector (chat + audio + color) +├── generate_video.py # Video generator +├── bajar # Stream download script +├── README.md # This file +├── contexto.md # Project context +├── TODO.md # Pending tasks +└── HIGHLIGHT.md # Detailed guide +``` + +## ⚙️ Configuration + +### Detector Parameters + +Edit `detector_gpu.py` to tune: + +```python +--threshold # Sensitivity (default: 1.5) +--min-duration # Minimum highlight duration (default: 10s) +--device # GPU: auto/cuda/cpu +``` + +### Video Parameters + +Edit `generate_video.py`: + +```python +--padding # Extra seconds before/after (default: 5) +``` + +## 🖥️ GPU + +### NVIDIA (CUDA) +```bash +pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121 +``` + +### AMD (ROCm) +```bash +pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.1 +``` + +**Note**: the current processing is CPU-bound; GPU acceleration is future work.
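The 2-of-3 rule described in the pipeline section can be sketched as a simple vote over per-second boolean signals. This is an illustrative sketch only, not the repository's actual code (the current detector implements just the chat signal), and the signal names here are hypothetical:

```python
def two_of_three(chat, audio, color):
    """Return the seconds where at least 2 of the 3 per-second signals fire."""
    # zip aligns the three per-second signals; sum(votes) counts True values.
    return [t for t, votes in enumerate(zip(chat, audio, color)) if sum(votes) >= 2]

# Seconds 0-3: chat and audio agree at t=1, all three fire at t=2.
chat = [False, True, True, False]
audio = [False, True, True, False]
color = [False, False, True, False]
print(two_of_three(chat, audio, color))  # [1, 2]
```

A single noisy signal (for example one loud sound effect with a calm chat) then never triggers a highlight on its own, which is the point of the voting scheme.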
+ +## 🔨 Development + +### Tests +```bash +# Test the detector against an existing video +python3 detector_gpu.py --video video.mp4 --chat chat.json --output highlights.json +``` + +### Manual Pipeline +```bash +# 1. Download the video +streamlink "https://www.twitch.tv/videos/ID" best -o video.mp4 + +# 2. Download the chat +TwitchDownloaderCLI chatdownload --id ID -o chat.json + +# 3. Detect highlights +python3 detector_gpu.py --video video.mp4 --chat chat.json --output highlights.json + +# 4. Generate the video +python3 generate_video.py --video video.mp4 --highlights highlights.json --output final.mp4 +``` + +## 📊 Results + +With a 5.3-hour stream (19GB): +- Chat: ~13,000 messages +- Peaks detected: ~139 +- Useful highlights (>5s): 4-10 +- Final video: ~1-5 minutes + +## 🤝 Contributing + +1. Fork the repo +2. Create a branch (`git checkout -b feature/`) +3. Commit your changes (`git commit -m 'Add feature'`) +4. Push the branch (`git push origin feature/`) +5. Open a Pull Request + +## 📝 License + +MIT License - see LICENSE for details. + +## 🙏 Credits + +- [TwitchDownloader](https://github.com/lay295/TwitchDownloader) - Chat downloading +- [streamlink](https://streamlink.github.io/) - Video downloading +- [MoviePy](https://zulko.github.io/moviepy/) - Video processing +- [PyTorch](https://pytorch.org/) - GPU support diff --git a/analyser/__init__.py b/analyser/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/analyser/analyser.py b/analyser/analyser.py deleted file mode 100644 index e69de29..0000000 diff --git a/bajar b/bajar new file mode 100755 index 0000000..69f0a82 --- /dev/null +++ b/bajar @@ -0,0 +1,45 @@ +#!/bin/bash + +# Install dependencies if they are missing +install_deps() { + echo "Checking dependencies..." + + if ! command -v streamlink &> /dev/null; then + echo "Installing streamlink..." + sudo pacman -S streamlink --noconfirm + fi + + if ! command -v ffmpeg &> /dev/null; then + echo "Installing ffmpeg..." 
+ sudo pacman -S ffmpeg --noconfirm + fi + + echo "Dependencies ready!" +} + +# Download a Twitch video +download() { + if [ -z "$1" ]; then + echo "Usage: bajar <url>" + echo "Example: bajar https://www.twitch.tv/videos/2699641307" + return 1 + fi + + install_deps + + URL="$1" + OUTPUT_FILE="./$(date +%Y%m%d_%H%M%S)_twitch.mp4" + + echo "Downloading: $URL" + echo "Saving to: $OUTPUT_FILE" + + streamlink "$URL" best -o "$OUTPUT_FILE" + + if [ $? -eq 0 ]; then + echo "Download complete! File: $OUTPUT_FILE" + else + echo "Download failed" + fi +} + +download "$@" diff --git a/clipper/__init__.py b/clipper/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/clipper/analyser.py b/clipper/analyser.py deleted file mode 100644 index 1fe12f2..0000000 --- a/clipper/analyser.py +++ /dev/null @@ -1,107 +0,0 @@ -import scipy -import numpy as np -import logging -import matplotlib.pyplot as plt - -from datetime import datetime - -logger = logging.getLogger(__name__) - - -class ChatAnalyser: - def __init__(self, ignore_commands=True, ignored_users=None): - if ignored_users is None: - ignored_users = ["moobot", "nightbot"] - - self.ignored_users = ignored_users - self.ignore_commands = ignore_commands - - def run(self, chat_file, peaks_output_file, peaks_output_chart, start_time): - dates = self._read_message_dates(chat_file) - messages_per_minute = self._group_dates(dates) - peaks = self._find_peeks(messages_per_minute, peaks_output_file, peaks_output_chart) - return peaks - - def _read_message_dates(self, chat_file): - dates = [] - - with open(chat_file, "r") as stream: - while True: - - line = stream.readline() - if not line: - break - - message_data = line.split("<~|~>") - if len(message_data) != 3: - # Wrong line format - continue - - if message_data[1].lower() in self.ignored_users: - continue - - if self.ignore_commands and message_data[2].startswith("!"): - continue - - date = message_data[0] - try: - dates.append(self._parse_date(date)) -
except BaseException as e: - logger.error(e) - - return dates - - def _parse_date(self, date_str): - return datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S.%f") - - def _group_dates(self, dates): - groups = {} - for d in dates: - key = datetime.strftime(d, "%Y-%m-%d %H:%M") - if key in groups.keys(): - groups[key] = groups[key] + 1 - else: - groups[key] = 0 - - groups.values() - return groups - - def _find_peeks(self, messages_per_minute, peaks_output_file, peaks_output_chart): - y_coordinates = list(messages_per_minute.values()) - x_coordinates = list(messages_per_minute.keys()) - peak_indices = scipy.signal.find_peaks_cwt(np.array(y_coordinates), 0.5) - - fig, ax = plt.subplots() - ax.plot(range(0, len(y_coordinates), 1), y_coordinates) - plt.xlabel("Video Minutes") - plt.ylabel("Message count") - plt.title("Stream chat reaction") - plt.savefig(peaks_output_chart) - - start_time = None - if len(x_coordinates) > 0: - start_time = datetime.strptime(x_coordinates[0], "%Y-%m-%d %H:%M") - - max_value = max(y_coordinates) - trash_hold_value = max_value * 0.7 - filtered_values = [x_coordinates[index] for index in peak_indices if y_coordinates[index] > trash_hold_value] - with open(peaks_output_file, "w") as stream: - for peak in filtered_values: - if start_time: - peak_time = datetime.strptime(peak, "%Y-%m-%d %H:%M") - diff = peak_time - start_time - minutes = divmod(diff.total_seconds() / 60, 60) - stream.writelines(f"{peak} -> {minutes}\n") - else: - stream.writelines(f"{peak}\n") - - return peak_indices - - -if __name__ == "__main__": - anal = ChatAnalyser() - chat_file = "/Users/vetalll/Projects/Python/TwitchClipper/recorded/vovapain/17-08-2022_08-33-23/chat.txt" - out_file = "/Users/vetalll/Projects/Python/TwitchClipper/recorded/vovapain/17-08-2022_08-33-23/chat_peaks.txt" - out_hraph = "/Users/vetalll/Projects/Python/TwitchClipper/recorded/vovapain/17-08-2022_08-33-23/chat_chart.png" - - anal.run(chat_file, out_file, out_hraph, datetime(2022, 8, 15, 20, 38, 49)) 
diff --git a/clipper/api.py b/clipper/api.py deleted file mode 100644 index 6d3933a..0000000 --- a/clipper/api.py +++ /dev/null @@ -1,95 +0,0 @@ -import enum -import logging -import socket -import time - -from twitchAPI import Twitch, AuthScope - -logger = logging.getLogger(__name__) - -TW_CHAT_SERVER = 'irc.chat.twitch.tv' -TW_CHAT_PORT = 6667 - - -class TwitchStreamStatus(enum.Enum): - ONLINE = 0 - OFFLINE = 1 - NOT_FOUND = 2 - UNAUTHORIZED = 3 - ERROR = 4 - - -class TwitchApi: - _cached_token = None - - def __init__(self, client_id, client_secret): - self.client_id = client_id - self.client_secret = client_secret - self.twitch = Twitch(self.client_id, self.client_secret, target_app_auth_scope=[AuthScope.CHAT_READ]) - self.twitch.authenticate_app([AuthScope.CHAT_READ]) - - def get_user_status(self, streamer): - try: - streams = self.twitch.get_streams(user_login=streamer) - if streams is None or len(streams["data"]) < 1: - return TwitchStreamStatus.OFFLINE - else: - return TwitchStreamStatus.ONLINE - except: - return TwitchStreamStatus.ERROR - - def start_chat(self, streamer_name, on_message): - logger.info("Connecting to %s:%s", TW_CHAT_SERVER, TW_CHAT_PORT) - connection = ChatConnection(streamer_name, self, on_message) - - self.twitch.get_app_token() - connection.run() - - def get_user_chat_channel(self, streamer_name): - streams = self.twitch.get_streams(user_login=streamer_name) - if streams is None or len(streams["data"]) < 1: - return None - return streams["data"][0]["user_login"] - - -class ChatConnection: - logger = logging.getLogger(__name__) - - connection = None - - def __init__(self, streamer_name, api, on_message): - self.on_message = on_message - self.api = api - self.streamer_name = streamer_name - - def run(self): - # Need to verify channel name.. 
case sensitive - channel = self.api.get_user_chat_channel(self.streamer_name) - if not channel: - logger.error("Cannot find streamer channel, Offline?") - return - - self.connect_to_chat(f"#{channel}") - - def connect_to_chat(self, channel): - self.connection = socket.socket() - self.connection.connect((TW_CHAT_SERVER, TW_CHAT_PORT)) - # public data to join hat - self.connection.send(f"PASS couldBeRandomString\r\n".encode("utf-8")) - self.connection.send(f"NICK justinfan113\r\n".encode("utf-8")) - self.connection.send(f"JOIN {channel}\r\n".encode("utf-8")) - - logger.info("Connected to %s", channel) - - try: - while True: - msg = self.connection.recv(2048).decode('utf-8') - if "PING :tmi.twitch.tv" in msg: - self.connection.send(bytes("PONG :tmi.twitch.tv\r\n", "UTF-8")) - logger.info("RECEIVED Ping from server. Answered") - continue - if self.on_message: - self.on_message(msg) - except BaseException as e: - logger.error(e) - logger.error("Error happened during reading chat") diff --git a/clipper/chat.py b/clipper/chat.py deleted file mode 100644 index 73ac84d..0000000 --- a/clipper/chat.py +++ /dev/null @@ -1,53 +0,0 @@ -import logging -from datetime import datetime -import multiprocessing - -logger = logging.getLogger(__name__) - -CHAT_DIVIDER = "<~|~>" - - -class TwitchChatRecorder: - chat_process = None - - def __init__(self, api, debug=False): - self.debug = debug - self.api = api - - def run(self, streamer_name, output_file): - self.chat_process = multiprocessing.Process(target=self._record_chat, args=(streamer_name, output_file)) - self.chat_process.start() - - def stop(self): - try: - if self.chat_process: - self.chat_process.terminate() - - self.chat_process = None - logger.info("Chat stopped") - except BaseException as e: - logger.error("Unable to stop chat") - logger.error(e) - - def is_running(self): - return self.chat_process is not None and self.chat_process.is_alive() - - def _record_chat(self, streamer_name, output_file): - with open(output_file, 
"w") as stream: - def on_message(twitch_msg): - user, msg = self.parse_msg(twitch_msg) - if msg: - msg_line = f"{str(datetime.now())}{CHAT_DIVIDER}{user}{CHAT_DIVIDER}{msg}" - stream.write(msg_line) - stream.flush() - - if self.debug: - logger.info("Chat: %s", msg_line) - - self.api.start_chat(streamer_name, on_message) - - def parse_msg(self, msg): - try: - return msg[1:].split('!')[0], msg.split(":", 2)[2] - except BaseException as e: - return None, None diff --git a/clipper/clipper.py b/clipper/clipper.py deleted file mode 100644 index b9a00cb..0000000 --- a/clipper/clipper.py +++ /dev/null @@ -1,86 +0,0 @@ -import logging -import os -import subprocess -import sys -from datetime import datetime -from datetime import timedelta - -logger = logging.getLogger(__name__) - - -class Clipper: - def run(self, video_file, chat_peaks_file, output_folder): - try: - self._run(video_file, chat_peaks_file, output_folder) - except BaseException as e: - logger.error(e) - - def _run(self, source_video_file, chat_peaks_file, output_folder): - if not os.path.isdir(output_folder): - os.mkdir(output_folder) - - with open(chat_peaks_file, "r") as stream: - lines = stream.readlines() - - if not lines: - logger.error("No peaks found") - return - - counter = 1 - for line in lines: - # l = "2022-08-17 10:15 -> (1.0, 42.0)" - time_part = line.split("->")[1].strip() # (1.0, 42.0) - time = time_part.replace("(", "").replace(")", "").split(",") - video_time = datetime(2000, 1, 1, int(float(time[0])), int(float(time[1])), 0, 0) - start_time = video_time - timedelta(minutes=1) - end_time = video_time + timedelta(minutes=1) - - ffmpeg_start_time = start_time.strftime("%H:%M:00") - ffmpeg_end_time = end_time.strftime("%H:%M:00") - ffmpeg_output_file = os.path.join(output_folder, f"clip_{counter}.mp4") - logger.info("ffmpeg start time %s", ffmpeg_start_time) - logger.info("ffmpeg end time %s", ffmpeg_end_time) - logger.info("ffmpeg output file %s", ffmpeg_output_file) - 
self._cut_clip(source_video_file, ffmpeg_start_time, ffmpeg_end_time, ffmpeg_output_file) - counter = counter + 1 - - def _cut_clip(self, source_video_file, start_time, end_time, output_name): - # ffmpeg -ss 00:01:00 -to 00:02:00 -i input.mp4 -c copy output.mp4 - try: - subprocess.call([ - "ffmpeg", - "-i", - source_video_file, - "-ss", - start_time, - "-to", - end_time, - "-c", - "copy", - "-err_detect", - "ignore_err", - output_name - ]) - - except BaseException as e: - logger.error("Unable to run streamlink") - logger.error(e) - - -if __name__ == "__main__": - logging.basicConfig(stream=sys.stdout, level=logging.INFO) - args = sys.argv - if len(args) != 4: - logger.error("Wrong arguments passed") - logger.error("Usage clipper.py video_file chat_peaks_file output_folder") - exit(1) - - video = args[1] - peaks = args[2] - result = args[3] - # "/Users/vetalll/Projects/Python/TwitchClipper/recorded/" - # video = "/Users/vetalll/Projects/Python/TwitchClipper/recorded/icebergdoto/17-08-2022_14-29-53/video.mp4" - # peaks = "/Users/vetalll/Projects/Python/TwitchClipper/recorded/icebergdoto/17-08-2022_14-29-53/chat_peaks.txt" - # result = "/Users/vetalll/Projects/Python/TwitchClipper/recorded/icebergdoto/17-08-2022_14-29-53/clips" - clipper = Clipper() - clipper.run(video, peaks, result) diff --git a/clipper/recorder.py b/clipper/recorder.py deleted file mode 100644 index d935634..0000000 --- a/clipper/recorder.py +++ /dev/null @@ -1,88 +0,0 @@ -import logging -import os -import time -import sys -from datetime import datetime - -from clipper.analyser import ChatAnalyser -from clipper.api import TwitchApi, TwitchStreamStatus -from clipper.chat import TwitchChatRecorder -from clipper.clipper import Clipper -from clipper.video import TwitchVideoRecorder - -logger = logging.getLogger(__name__) - - -class RecorderConfig: - def __init__(self, tw_client, tw_secret, tw_streamer, tw_quality, output_folder): - self.output_folder = output_folder - self.tw_quality = tw_quality - 
self.tw_streamer = tw_streamer - self.tw_secret = tw_secret - self.tw_client = tw_client - - -class Recorder: - def __init__(self, config): - self.config = config - self.api = TwitchApi(config.tw_client, config.tw_secret) - self.streamer_folder = os.path.join(self.config.output_folder, self.config.tw_streamer) - self.video_recorder = TwitchVideoRecorder() - self.chat_recorder = TwitchChatRecorder(self.api, debug=True) - self.chat_analyser = ChatAnalyser() - self.clipper = Clipper() - - def run(self): - logger.info("Start recording streamer %s", self.config.tw_streamer) - - while True: - status = self.api.get_user_status(self.config.tw_streamer) - if status == TwitchStreamStatus.ONLINE: - logger.info("Streamer %s is online. Start recording", self.config.tw_streamer) - - start_time = datetime.now() - record_folder_name = start_time.strftime("%d-%m-%Y_%H-%M-%S") - record_folder = os.path.join(self.streamer_folder, record_folder_name) - os.makedirs(record_folder) - - output_video_file = os.path.join(record_folder, "video.mp4") - output_chat_file = os.path.join(record_folder, "chat.txt") - - self.chat_recorder.run(self.config.tw_streamer, output_chat_file) - self.video_recorder.run(self.config.tw_streamer, output_video_file, quality="160p") - self._loop_recording() - self._post_process_video(record_folder, output_chat_file, output_video_file, start_time) - - elif status == TwitchStreamStatus.OFFLINE: - logger.info("Streamer %s is offline. Waiting for 300 sec", self.config.tw_streamer) - time.sleep(300) - - if status == TwitchStreamStatus.ERROR: - logger.critical("Error occurred %s. 
Exit", self.config.tw_streamer) - sys.exit(1) - - elif status == TwitchStreamStatus.NOT_FOUND: - logger.critical(f"Streamer %s not found, invalid streamer_name or typo", self.config.tw_streamer) - sys.exit(1) - - def _loop_recording(self): - while True: - if self.video_recorder.is_running() or self.chat_recorder.is_running(): - if not (self.video_recorder.is_running() and self.chat_recorder.is_running()): - self.video_recorder.stop() - self.chat_recorder.stop() - break - logger.info("Recording in progress. Wait 1m") - time.sleep(60) - continue - break - - def _post_process_video(self, record_folder, output_chat_file, output_video_file, start_time): - output_chat_peaks_file = os.path.join(record_folder, "chat_peaks.txt") - output_chat_chart_file = os.path.join(record_folder, "chat_chart.png") - - logger.info("Start looking for peaks in file %s", output_chat_file) - peaks = self.chat_analyser.run(output_chat_file, output_chat_peaks_file, output_chat_chart_file, start_time) - logger.info("Found peaks: %s for file %s", len(peaks), output_chat_file) - - self.clipper.run(output_video_file, output_chat_peaks_file, record_folder) diff --git a/clipper/video.py b/clipper/video.py deleted file mode 100644 index 522e113..0000000 --- a/clipper/video.py +++ /dev/null @@ -1,41 +0,0 @@ -import logging -import subprocess - -logger = logging.getLogger(__name__) - - -class TwitchVideoRecorder: - refresh_timeout = 15 - streamlink_process = None - - def run(self, streamer_name, output_file, quality="360p"): - self._record_stream(streamer_name, output_file, quality) - - def stop(self): - try: - if self.streamlink_process: - self.streamlink_process.terminate() - - self.streamlink_process = None - logger.info("Video stopped") - except BaseException as e: - logger.error("Unable to stop video") - logger.error(e) - - def is_running(self) -> bool: - return self.streamlink_process is not None and self.streamlink_process.poll() is None - - def _record_stream(self, streamer_name, output_file, 
quality): - try: - self.streamlink_process = subprocess.Popen([ - "streamlink", - "--twitch-disable-ads", - "twitch.tv/" + streamer_name, - quality, - "-o", - output_file - ]) - - except BaseException as e: - logger.error("Unable to run streamlink") - logger.error(e) diff --git a/contexto.md b/contexto.md new file mode 100644 index 0000000..fed8b6e --- /dev/null +++ b/contexto.md @@ -0,0 +1,221 @@ +# Project Context + +## Executive Summary + +Automated pipeline for detecting and generating highlights from Twitch streams. The goal is to: +1. Download a full Twitch VOD (several hours long) +2. Analyze the chat and the video to detect standout moments +3. Generate a summary video with the best moments + +The original design planned to use 3 metrics to detect highlights (2 of 3 must hold): +- Saturated chat (many messages in a short time) +- Audio peaks (streamer screaming) +- Bright on-screen colors (visual effects) + +**Current status:** only chat is implemented. Audio and color are pending. + +--- + +## History and Development + +### Beginning +The project started from the `clipper/` folder, which held old code for downloading live Twitch streams. The goal shifted to processing full VODs (multi-hour streams) and automatically detecting the best moments. + +### First Iteration (Old Code) +There was code in `clipper/` and `analyser/` that: +- Downloaded live streams +- Used `twitchAPI` for authentication +- Had dependency-version issues (Python 3.14 incompatibilities) + +### Cleanup and New Pipeline +The old code was removed and a new structure was created: +``` +downloaders/ - Modules for downloading video/chat +detector/ - Highlight-detection logic +generator/ - Summary-video creation +``` + +### Problems Encountered + +#### 1. 
Chat Downloader - Multiple Attempts +Several repositories were tried for downloading VOD chat: + +- **chat-downloader (xenova)**: did not work with VODs (KeyError 'data') +- **tcd (PetterKraabol)**: same problem, the Twitch API returns 404 +- **TwitchDownloader (lay295)**: this one worked. It is a C#/.NET project with a CLI. + +**Solution:** build TwitchDownloaderCLI from source with the .NET 10 SDK. + +#### 2. Python Dependencies +Version problems: +- `requests` and `urllib3` conflicted when installing `tcd` +- Streamlink stopped working +- **Solution:** reinstall the correct versions of requests/urllib3 + +#### 3. Test Video +- VOD: `https://www.twitch.tv/videos/2701190361` (elxokas) +- Duration: ~5.3 hours (19GB) +- Chat: 12,942 messages +- The chat was still available (Twitch had not deleted it) + +#### 4. Highlight Detection +Detector problems: +- The chat timestamp format was not recognized +- **Solution:** use `content_offset_seconds` from the JSON directly + +The current detector only uses chat saturation. It finds ~139 peaks, but most last 1-2 seconds (not useful). With a >5s duration filter only 4 highlights remain. + +#### 5. 
Video Generation +- Uses moviepy +- Works correctly +- Produces a ~39MB video (~1 minute) + +--- + +## Tech Stack + +### Download Tools +| Tool | Use | Status | |-------------|-----|--------| | streamlink | Video streaming | ✅ Works | | TwitchDownloaderCLI | VOD chat | ✅ Built and working | + +### Processing (Python) +| Package | Use | GPU Support | |---------|-----|-------------| | opencv-python-headless | Video/color analysis | CPU (no ROCm) | | librosa | Audio analysis | CPU | | scipy/numpy | Numeric processing | CPU | | moviepy | Video generation | CPU | + +### GPU +- **ROCm 7.1** installed and working +- **PyTorch 2.10.0** installed with ROCm support +- GPU detected: AMD Radeon Graphics (6800XT) +- **Pending:** make OpenCV/librosa use the GPU + +--- + +## Hardware + +- **Main GPU:** AMD Radeon 6800XT (16GB VRAM) with ROCm 7.1 +- **Alternate GPU:** NVIDIA RTX 3050 (8GB VRAM) - not configured +- **CPU:** AMD Ryzen (12 cores) +- **RAM:** 32GB +- **Storage:** NVMe SSD + +--- + +## Credentials + +- **Twitch Client ID:** `xk9gnw0wszfcwn3qq47a76wxvlz8oq` +- **Twitch Client Secret:** `51v7mkkd86u9urwadue8410hheu754` + +--- + +## Current Pipeline (Manual) + +```bash +# 1. Download the video +bajar "https://www.twitch.tv/videos/2701190361" + +# 2. Download the chat (after building TwitchDownloaderCLI) +TwitchDownloaderCLI chatdownload --id 2701190361 -o chat.json + +# 3. Convert the chat to text +python3 -c " +import json +with open('chat.json') as f: + data = json.load(f) +with open('chat.txt', 'w') as f: + for c in data['comments']: + f.write(f\"[{c['created_at']}] {c['commenter']['name']}: {c['message']['body']}\n\") +" + +# 4. Detect highlights +python3 detector.py + +# 5. 
Generate the video +python3 generate_video.py +``` + +--- + +## Results Obtained + +| Metric | Value | |---------|-------| | Original video | 19GB (5.3 hours) | | Chat messages | 12,942 | | Peaks detected | 139 | | Useful highlights (>5s) | 4 | | Final video | 39MB (~1 minute) | + +### Highlights Found +1. ~4666s - ~4682s (16s) +2. ~4800s - ~4813s (13s) +3. ~8862s - ~8867s (5s) +4. ~11846s - ~11856s (10s) + +--- + +## Pending (TODO) + +### High Priority +1. **2-of-3 system**: implement audio and color analysis +2. **GPU**: make OpenCV/librosa use the 6800XT +3. **Better detection**: keywords, sentiment, ranking +4. **Kick**: chat support (no public API) + +### Medium Priority +5. Parallelization +6. Web interface (Streamlit) +7. Improved CLI + +### Low Priority +8. STT (speech recognition) +9. Detect when the streamer shows something on screen +10. Multiple streamers + +--- + +## Project Files + +``` +Twitch-Highlight-Detector/ +├── .env # Twitch credentials +├── .git/ # Git repo +├── .gitignore +├── requirements.txt # Python dependencies +├── bajar # Script: download streams +├── pipeline.sh # Automated pipeline +├── detector.py # Highlight detection (chat) +├── generate_video.py # Summary-video generation +├── highlight.md # Docs: pipeline usage +├── contexto.md # This file +├── todo.md # Pending task list +│ +├── chat.json # Downloaded chat (TwitchDownloader) +├── chat.txt # Chat as plain text +├── highlights.json # Highlight timestamps +├── highlights.mp4 # Final video +└── 20260218_193846_twitch.mp4 # Original test video +``` + +--- + +## Important Notes + +1. **Twitch deletes VOD chat** after some time (no exact window is documented) +2. **The current threshold** is very sensitive - it detects many 1-2 second false positives +3. **The test video** is from elxokas, a Spanish League of Legends streamer +4. 
**PyTorch with ROCm** is installed but not yet used anywhere in the code
+
+---
+
+## Relevant Links
+
+- TwitchDownloader: https://github.com/lay295/TwitchDownloader
+- streamlink: https://streamlink.github.io/
+- PyTorch ROCm: https://pytorch.org/
+- ROCm: https://rocm.docs.amd.com/
diff --git a/detector.py b/detector.py
new file mode 100644
index 0000000..55481ec
--- /dev/null
+++ b/detector.py
@@ -0,0 +1,95 @@
+import sys
+import re
+import json
+import logging
+import numpy as np
+from datetime import datetime
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+def detect_highlights(chat_file, min_duration=10, threshold=2.0):
+    """Detect highlights from bursts of chat activity"""
+
+    logger.info("Analyzing chat peaks...")
+
+    # Read messages
+    messages = []
+    with open(chat_file, 'r', encoding='utf-8') as f:
+        for line in f:
+            match = re.match(r'\[(\d{4}-\d{2}-\d{2}T[\d:.]+Z?)\]', line)
+            if match:
+                timestamp_str = match.group(1).replace('Z', '+00:00')
+                try:
+                    timestamp = datetime.fromisoformat(timestamp_str)
+                    messages.append((timestamp, line))
+                except ValueError:
+                    pass
+
+    if not messages:
+        logger.error("No messages found")
+        return []
+
+    start_time = messages[0][0]
+    end_time = messages[-1][0]
+    duration = (end_time - start_time).total_seconds()
+
+    logger.info(f"Chat: {len(messages)} messages, duration: {duration:.1f}s")
+
+    # Bucket messages per second
+    time_buckets = {}
+    for timestamp, _ in messages:
+        second = int((timestamp - start_time).total_seconds())
+        time_buckets[second] = time_buckets.get(second, 0) + 1
+
+    # Compute statistics
+    counts = list(time_buckets.values())
+    mean_count = np.mean(counts)
+    std_count = np.std(counts)
+
+    logger.info(f"Stats: mean={mean_count:.1f}, std={std_count:.1f}")
+
+    # Detect peaks (z-score above threshold)
+    peak_seconds = []
+    for second, count in time_buckets.items():
+        if std_count > 0:
+            z_score = (count - mean_count) / std_count
+            if z_score > threshold:
+                peak_seconds.append(second)
+
+    
logger.info(f"Peaks found: {len(peak_seconds)}")
+
+    # Merge consecutive peak seconds into intervals
+    if not peak_seconds:
+        return []
+
+    intervals = []
+    start = peak_seconds[0]
+    prev = peak_seconds[0]
+
+    for second in peak_seconds[1:]:
+        if second - prev > 1:
+            if prev - start >= min_duration:
+                intervals.append((start, prev))
+            start = second
+        prev = second
+
+    if prev - start >= min_duration:
+        intervals.append((start, prev))
+
+    return intervals
+
+
+if __name__ == "__main__":
+    chat_file = "chat.txt"
+
+    highlights = detect_highlights(chat_file)
+
+    print(f"\nHighlights found: {len(highlights)}")
+    for i, (start, end) in enumerate(highlights):
+        print(f"  {i+1}. {start}s - {end}s (duration: {end-start}s)")
+
+    # Save as JSON
+    with open("highlights.json", "w") as f:
+        json.dump(highlights, f)
+    print("\nSaved to highlights.json")
diff --git a/detector_gpu.py b/detector_gpu.py
new file mode 100644
index 0000000..3501750
--- /dev/null
+++ b/detector_gpu.py
@@ -0,0 +1,283 @@
+import sys
+import json
+import logging
+import subprocess
+import torch
+import numpy as np
+from pathlib import Path
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+def get_device():
+    """Return the compute device (GPU if available, else CPU)"""
+    if torch.cuda.is_available():
+        return torch.device("cuda")
+    return torch.device("cpu")
+
+def extract_audio_gpu(video_file, output_wav="audio.wav"):
+    """Extract audio with ffmpeg (16 kHz mono PCM)"""
+    logger.info(f"Extracting audio from {video_file}...")
+    subprocess.run([
+        "ffmpeg", "-i", video_file,
+        "-vn", "-acodec", "pcm_s16le",
+        "-ar", "16000", "-ac", "1", output_wav, "-y"
+    ], capture_output=True)
+    return output_wav
+
+def detect_audio_peaks_gpu(audio_file, threshold=1.5, window_seconds=5, device="cpu"):
+    """
+    Detect audio energy peaks. Note: despite the name, this currently
+    runs on NumPy/CPU; `device` is reserved for a future PyTorch path.
+    """
+    logger.info("Analyzing audio peaks...")
+
+    # Load audio with scipy
+    import scipy.io.wavfile as wavfile
+    sr, waveform = 
wavfile.read(audio_file)
+
+    # Convert to float in [-1, 1]
+    waveform = waveform.astype(np.float32) / 32768.0
+
+    # Compute RMS energy per window with numpy
+    frame_length = sr * window_seconds
+    hop_length = sr  # 1 second between windows
+
+    energies = []
+    for i in range(0, len(waveform) - frame_length, hop_length):
+        chunk = waveform[i:i + frame_length]
+        energy = np.sqrt(np.mean(chunk ** 2))
+        energies.append(energy)
+
+    energies = np.array(energies)
+
+    # Detect peaks
+    mean_e = np.mean(energies)
+    std_e = np.std(energies)
+
+    logger.info(f"Audio stats: mean={mean_e:.4f}, std={std_e:.4f}")
+
+    audio_scores = {}
+    for i, energy in enumerate(energies):
+        if std_e > 0:
+            z_score = (energy - mean_e) / std_e
+            if z_score > threshold:
+                audio_scores[i] = z_score
+
+    logger.info(f"Audio peaks detected: {len(audio_scores)}")
+    return audio_scores
+
+def detect_video_peaks_fast(video_file, threshold=1.5, window_seconds=5):
+    """
+    Detect color/brightness changes (fast version: samples one frame
+    every window_seconds instead of processing every frame)
+    """
+    logger.info("Analyzing color peaks...")
+
+    # Downscale to 360p first; this is much faster than analyzing
+    # the full-resolution video
+    video_360 = video_file.replace('.mp4', '_temp_360.mp4')
+
+    logger.info("Converting to 360p for analysis...")
+    subprocess.run([
+        "ffmpeg", "-i", video_file,
+        "-vf", "scale=-2:360",
+        "-c:v", "libx264", "-preset", "fast",
+        "-crf", "28",
+        "-c:a", "copy",
+        video_360, "-y"
+    ], capture_output=True)
+
+    # Extract one frame every N seconds
+    frames_dir = Path("frames_temp")
+    frames_dir.mkdir(exist_ok=True)
+
+    subprocess.run([
+        "ffmpeg", "-i", video_360,
+        "-vf", f"fps=1/{window_seconds}",
+        
f"{frames_dir}/frame_%04d.png", "-y" + ], capture_output=True) + + # Procesar frames con PIL + from PIL import Image + import cv2 + + frame_files = sorted(frames_dir.glob("frame_*.png")) + + if not frame_files: + logger.warning("No se pudieron extraer frames") + return {} + + logger.info(f"Procesando {len(frame_files)} frames...") + + brightness_scores = [] + for frame_file in frame_files: + img = cv2.imread(str(frame_file)) + hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) + + # Brillo = Value en HSV + brightness = hsv[:,:,2].mean() + # Saturación + saturation = hsv[:,:,1].mean() + + # Score combinado + score = (brightness / 255) + (saturation / 255) * 0.5 + brightness_scores.append(score) + + brightness_scores = np.array(brightness_scores) + + # Detectar picos + mean_b = np.mean(brightness_scores) + std_b = np.std(brightness_scores) + + logger.info(f"Brillo stats: media={mean_b:.3f}, std={std_b:.3f}") + + color_scores = {} + for i, score in enumerate(brightness_scores): + if std_b > 0: + z_score = (score - mean_b) / std_b + if z_score > threshold: + color_scores[i * window_seconds] = z_score + + # Limpiar + subprocess.run(["rm", "-rf", str(frames_dir)]) + subprocess.run(["rm", "-f", video_360], capture_output=True) + + logger.info(f"Picos de color detectados: {len(color_scores)}") + return color_scores + +def main(): + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--video", required=True, help="Video file") + parser.add_argument("--chat", required=True, help="Chat JSON file") + parser.add_argument("--output", default="highlights.json", help="Output JSON") + parser.add_argument("--threshold", type=float, default=1.5, help="Threshold for peaks") + parser.add_argument("--min-duration", type=int, default=10, help="Min highlight duration") + parser.add_argument("--device", default="auto", help="Device: auto, cuda, cpu") + args = parser.parse_args() + + # Determinar device + if args.device == "auto": + device = get_device() + else: + device = 
torch.device(args.device)
+
+    logger.info(f"Using device: {device}")
+
+    # Load chat
+    logger.info("Loading chat...")
+    with open(args.chat, 'r') as f:
+        chat_data = json.load(f)
+
+    # Bucket chat messages per second of the VOD
+    chat_times = {}
+    for comment in chat_data['comments']:
+        second = int(comment['content_offset_seconds'])
+        chat_times[second] = chat_times.get(second, 0) + 1
+
+    # Detect chat peaks
+    chat_values = list(chat_times.values())
+    mean_c = np.mean(chat_values)
+    std_c = np.std(chat_values)
+
+    logger.info(f"Chat stats: mean={mean_c:.1f}, std={std_c:.1f}")
+
+    chat_scores = {}
+    for second, count in chat_times.items():
+        if std_c > 0:
+            z_score = (count - mean_c) / std_c
+            if z_score > args.threshold:
+                chat_scores[second] = z_score
+
+    logger.info(f"Chat peaks: {len(chat_scores)}")
+
+    # Extract and analyze audio
+    audio_file = "temp_audio.wav"
+    extract_audio_gpu(args.video, audio_file)
+    audio_scores = detect_audio_peaks_gpu(audio_file, args.threshold, device=str(device))
+
+    # Remove temporary audio
+    Path(audio_file).unlink(missing_ok=True)
+
+    # Analyze video
+    video_scores = detect_video_peaks_fast(args.video, args.threshold)
+
+    # Combine scores (2-of-3 vote)
+    logger.info("Combining scores (2 of 3)...")
+
+    # Get total duration
+    result = subprocess.run(
+        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
+         "-of", "default=noprint_wrappers=1:nokey=1", args.video],
+        capture_output=True, text=True
+    )
+    duration = int(float(result.stdout.strip())) if result.stdout.strip() else 3600
+
+    # Normalize scores
+    max_audio = max(audio_scores.values()) if audio_scores else 1
+    max_video = max(video_scores.values()) if video_scores else 1
+    max_chat_norm = max(chat_scores.values()) if chat_scores else 1
+
+    # Score each second: a highlight needs at least 2 of the 3 signals
+    highlights = []
+    for second in range(duration):
+        points = 0
+
+        # Chat
+        chat_point = chat_scores.get(second, 0) / max_chat_norm if max_chat_norm 
> 0 else 0
+        if chat_point > 0.5:
+            points += 1
+
+        # Audio
+        audio_point = audio_scores.get(second, 0) / max_audio if max_audio > 0 else 0
+        if audio_point > 0.5:
+            points += 1
+
+        # Color
+        video_point = video_scores.get(second, 0) / max_video if max_video > 0 else 0
+        if video_point > 0.5:
+            points += 1
+
+        if points >= 2:
+            highlights.append(second)
+
+    # Merge consecutive seconds into intervals
+    intervals = []
+    if highlights:
+        start = highlights[0]
+        prev = highlights[0]
+
+        for second in highlights[1:]:
+            if second - prev > 1:
+                if prev - start >= args.min_duration:
+                    intervals.append((start, prev))
+                start = second
+            prev = second
+
+        if prev - start >= args.min_duration:
+            intervals.append((start, prev))
+
+    logger.info(f"Highlights found: {len(intervals)}")
+
+    # Save
+    with open(args.output, 'w') as f:
+        json.dump(intervals, f)
+
+    logger.info(f"Saved to {args.output}")
+
+    # Print summary
+    print(f"\nHighlights ({len(intervals)} total):")
+    for i, (s, e) in enumerate(intervals[:10]):
+        print(f"  {i+1}. 
{s}s - {e}s (duration: {e-s}s)")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/generate_video.py b/generate_video.py
new file mode 100644
index 0000000..c3a0b9f
--- /dev/null
+++ b/generate_video.py
@@ -0,0 +1,63 @@
+import json
+import argparse
+from moviepy.editor import VideoFileClip, concatenate_videoclips
+import logging
+
+logging.basicConfig(level=logging.INFO)
+
+def create_summary(video_file, highlights_file, output_file, padding=5):
+    """Create a summary video from the detected highlights"""
+
+    # Load highlights
+    with open(highlights_file, 'r') as f:
+        highlights = json.load(f)
+
+    if not highlights:
+        print("No highlights to process")
+        return
+
+    # Keep only highlights with a minimum duration
+    highlights = [(s, e) for s, e in highlights if e - s >= 5]
+
+    print(f"Creating video with {len(highlights)} highlights...")
+
+    clip = VideoFileClip(video_file)
+    duration = clip.duration
+
+    highlight_clips = []
+    for start, end in highlights:
+        start_pad = max(0, start - padding)
+        end_pad = min(duration, end + padding)
+
+        highlight_clip = clip.subclip(start_pad, end_pad)
+        highlight_clips.append(highlight_clip)
+        print(f"  Clip: {start_pad:.1f}s - {end_pad:.1f}s (duration: {end_pad-start_pad:.1f}s)")
+
+    if not highlight_clips:
+        print("Could not create any clips")
+        return
+
+    print(f"Exporting video ({len(highlight_clips)} clips, {sum(c.duration for c in highlight_clips):.1f}s total)...")
+
+    final_clip = concatenate_videoclips(highlight_clips, method="compose")
+
+    final_clip.write_videofile(
+        output_file,
+        codec='libx264',
+        audio_codec='aac',
+        fps=24,
+        verbose=False,
+        logger=None
+    )
+
+    print(f"Done! 
Video saved to: {output_file}")
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--video", required=True, help="Video file")
+    parser.add_argument("--highlights", required=True, help="Highlights JSON")
+    parser.add_argument("--output", required=True, help="Output video")
+    parser.add_argument("--padding", type=int, default=5, help="Padding seconds")
+    args = parser.parse_args()
+
+    create_summary(args.video, args.highlights, args.output, args.padding)
diff --git a/highlight.md b/highlight.md
new file mode 100644
index 0000000..db84029
--- /dev/null
+++ b/highlight.md
@@ -0,0 +1,106 @@
+# Highlight Detector Pipeline
+
+Complete pipeline to detect and generate highlights from Twitch/Kick streams.
+
+## Requirements
+
+```bash
+# Install system dependencies
+sudo pacman -S ffmpeg streamlink dotnet-sdk --noconfirm
+
+# Install Python dependencies
+pip install --break-system-packages moviepy opencv-python-headless scipy numpy python-dotenv
+
+# Install TwitchDownloaderCLI (already included in /usr/local/bin)
+```
+
+## Usage
+
+### 1. Download the Stream
+
+```bash
+# Use streamlink (includes video + audio)
+bajar "https://www.twitch.tv/videos/2701190361"
+
+# Or manually with streamlink
+streamlink "https://www.twitch.tv/videos/2701190361" best -o video.mp4
+```
+
+### 2. Download the Chat
+
+```bash
+# Use TwitchDownloaderCLI
+TwitchDownloaderCLI chatdownload --id 2701190361 -o chat.json
+```
+
+### 3. Detect Highlights
+
+```bash
+# Convert chat to text and detect highlights
+python3 detector.py
+
+# This generates:
+# - chat.txt (chat as plain text)
+# - highlights.json (highlight timestamps)
+```
+
+### 4. 
Generate the Summary Video
+
+```bash
+python3 generate_video.py
+
+# This generates:
+# - highlights.mp4 (video with the best moments)
+```
+
+## Automated (Single Command)
+
+```bash
+# Download + Chat + Detect + Generate
+./pipeline.sh
+```
+
+## Tunable Parameters
+
+In `detector.py`:
+- `min_duration`: minimum highlight duration (default: 10s)
+- `threshold`: detection threshold (default: 2.0 standard deviations)
+
+In `generate_video.py`:
+- `padding`: extra seconds before/after each highlight (default: 5s)
+
+## GPU vs CPU
+
+**The current pipeline is 100% CPU.**
+
+Current components:
+- **MoviePy**: CPU (could use the GPU through ffmpeg)
+- **Video analysis**: CPU with OpenCV
+- **Audio**: CPU with librosa
+
+To move work onto the GPU:
+- Use `PyTorch`/`TensorFlow` for detection
+- Use the GPU for rendering via ffmpeg
+
+## File Structure
+
+```
+Twitch-Highlight-Detector/
+├── .env                 # Credentials
+├── main.py              # Entry point
+├── requirements.txt
+├── bajar                # Script to download streams
+├── detector.py          # Highlight detection
+├── generate_video.py    # Video generation
+├── pipeline.sh          # Automated pipeline
+├── chat.json            # Downloaded chat
+├── chat.txt             # Chat as plain text
+├── highlights.json      # Highlight timestamps
+└── highlights.mp4       # Final video
+```
+
+## Notes
+
+- Chat for old VODs may be unavailable (Twitch deletes it)
+- A low threshold detects more highlights (but also more noise)
+- Very short detections are often not real highlights
diff --git a/main.py b/main.py
deleted file mode 100644
index 82b78af..0000000
--- a/main.py
+++ /dev/null
@@ -1,37 +0,0 @@
-import argparse
-import os
-import sys
-import logging
-
-from clipper import recorder
-
-
-def parse_arguments():
-    parser = argparse.ArgumentParser(description='Twitch highlighter')
-    parser.add_argument('--client', "-c", help='Twitch client id', required=True, dest="tw_client")
-    parser.add_argument('--secret', "-s", 
help='Twitch secret id', required=True, dest="tw_secret")
-    parser.add_argument('--streamer', "-t", help='Twitch streamer username', required=True, dest="tw_streamer")
-    parser.add_argument('--quality', "-q", help='Video downloading quality', dest="tw_quality", default="360p")
-    parser.add_argument('--output_path', "-o", help='Video download folder', dest="output_path",
-                        default=os.path.join(os.getcwd(), "recorded"))
-
-    return parser.parse_args()
-
-
-def on_video_recorded(streamer, filename):
-    pass
-
-
-def on_chat_recorded(streamer, filename):
-    pass
-
-
-if __name__ == "__main__":
-    # TODO configure logging
-    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
-    args = parse_arguments()
-
-    config = recorder.RecorderConfig(args.tw_client, args.tw_secret, args.tw_streamer, args.tw_quality,
-                                     args.output_path)
-    recorder = recorder.Recorder(config)
-    recorder.run()
diff --git a/pipeline.sh b/pipeline.sh
new file mode 100755
index 0000000..52b27d6
--- /dev/null
+++ b/pipeline.sh
@@ -0,0 +1,109 @@
+#!/bin/bash
+
+# Highlight Detector Pipeline with Draft Mode
+# Usage: ./pipeline.sh <video_id> [output_name] [--draft | --hd]
+
+set -e
+
+# Parse arguments
+DRAFT_MODE=false
+VIDEO_ID=""
+OUTPUT_NAME="highlights"
+
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --draft)
+            DRAFT_MODE=true
+            shift
+            ;;
+        --hd)
+            DRAFT_MODE=false
+            shift
+            ;;
+        *)
+            if [[ -z "$VIDEO_ID" ]]; then
+                VIDEO_ID="$1"
+            else
+                OUTPUT_NAME="$1"
+            fi
+            shift
+            ;;
+    esac
+done
+
+if [ -z "$VIDEO_ID" ]; then
+    echo "Usage: $0 <video_id> [output_name] [--draft | --hd]"
+    echo ""
+    echo "Modes:"
+    echo "  --draft   Quick test mode (360p, less processing)"
+    echo "  --hd      High-quality mode (1080p, default)"
+    echo ""
+    echo "Examples:"
+    echo "  $0 2701190361 elxokas --draft   # Quick test"
+    echo "  $0 2701190361 elxokas --hd      # High quality"
+    exit 1
+fi
+
+echo "============================================"
+echo "  HIGHLIGHT DETECTOR PIPELINE"
+echo "============================================"
+echo "Video ID: 
$VIDEO_ID" +echo "Output: $OUTPUT_NAME" +echo "Modo: $([ "$DRAFT_MODE" = true ] && echo "DRAFT (360p)" || echo "HD (1080p)")" +echo "" + +# Determinar calidad +if [ "$DRAFT_MODE" = true ]; then + QUALITY="360p" + VIDEO_FILE="${OUTPUT_NAME}_draft.mp4" +else + QUALITY="best" + VIDEO_FILE="${OUTPUT_NAME}.mp4" +fi + +# 1. Descargar video +echo "[1/5] Descargando video ($QUALITY)..." +if [ ! -f "$VIDEO_FILE" ]; then + streamlink "https://www.twitch.tv/videos/${VIDEO_ID}" "$QUALITY" -o "$VIDEO_FILE" +else + echo "Video ya existe: $VIDEO_FILE" +fi + +# 2. Descargar chat +echo "[2/5] Descargando chat..." +if [ ! -f "${OUTPUT_NAME}_chat.json" ]; then + TwitchDownloaderCLI chatdownload --id "$VIDEO_ID" -o "${OUTPUT_NAME}_chat.json" +else + echo "Chat ya existe" +fi + +# 3. Detectar highlights (usando GPU si está disponible) +echo "[3/5] Detectando highlights..." +python3 detector_gpu.py \ + --video "$VIDEO_FILE" \ + --chat "${OUTPUT_NAME}_chat.json" \ + --output "${OUTPUT_NAME}_highlights.json" \ + --threshold 1.5 \ + --min-duration 10 + +# 4. Generar video +echo "[4/5] Generando video..." +python3 generate_video.py \ + --video "$VIDEO_FILE" \ + --highlights "${OUTPUT_NAME}_highlights.json" \ + --output "${OUTPUT_NAME}_final.mp4" + +# 5. Limpiar +echo "[5/5] Limpiando archivos temporales..." 
+if [ "$DRAFT_MODE" = true ]; then
+    rm -f "${OUTPUT_NAME}_draft.mp4"
+fi
+
+echo ""
+echo "============================================"
+echo "  DONE"
+echo "============================================"
+echo "Final video: ${OUTPUT_NAME}_final.mp4"
+echo ""
+echo "To reprocess in HD later:"
+echo "  $0 $VIDEO_ID ${OUTPUT_NAME}_hd --hd"
diff --git a/requirements.txt b/requirements.txt
index d4a16d9..cfd3f1f 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,17 @@
-requests==2.28.1
-streamlink==4.2.0
-twitchAPI==2.5.7
-irc==20.1.0
-scipy==1.9.0
-matplotlib==3.5.2
-numpy==1.23.0
\ No newline at end of file
+# Core
+requests
+python-dotenv
+
+# Video processing
+moviepy
+opencv-python-headless
+
+# Audio processing
+scipy
+numpy
+librosa
+
+# Chat download
+chat-downloader
+
+# Chat analysis
diff --git a/todo.md b/todo.md
new file mode 100644
index 0000000..53014c7
--- /dev/null
+++ b/todo.md
@@ -0,0 +1,229 @@
+# TODO - Pending Improvements
+
+## Current Status
+
+### Working ✅
+- Video download (streamlink)
+- Chat download (TwitchDownloaderCLI)
+- Detection from chat activity spikes
+- Video generation (moviepy)
+- PyTorch with ROCm installed
+
+### Pending ❌
+- Audio analysis
+- Color analysis
+- GPU use during processing
+
+---
+
+## PRIORITY 1: 2-of-3 System
+
+### [ ] Audio - Volume Peaks
+Implement detection of shouts/volume spikes.
+
+**Current method (CPU):**
+- Extract audio with ffmpeg
+- Use librosa for RMS
+- Detect peaks with scipy
+
+**GPU method (to implement):**
+```python
+import torch
+import torchaudio
+
+# Use the GPU for spectral analysis
+waveform, sr = torchaudio.load(audio_file)
+spectrogram = torchaudio.transforms.Spectrogram()(waveform)
+```
+
+**Tasks:**
+- [ ] Extract audio from the video with ffmpeg
+- [ ] Compute RMS/energy per window
+- [ ] Detect peaks (threshold = mean + 1.5*std)
+- [ ] Return peak timestamps
+
+### [ ] Color - Bright Moments
+Detect color/brightness changes in the video.
+
+**GPU method:**
+```python
+import cv2
+# OpenCV with OpenCL
+cv2.ocl.setUseOpenCL(True)
+```
+
+**Tasks:**
+- [ ] Process frames with OpenCV on the GPU
+- [ ] Compute HSV saturation and brightness
+- [ ] Detect moments with significant changes
+- [ ] Return timestamps
+
+### [ ] Combine 2 of 3
+Scoring system:
+```
+highlight = (chat_score >= 2) + (audio_score >= 1.5) + (color_score >= 0.5)
+if highlight >= 2: it's a highlight
+```
+
+---
+
+## PRIORITY 2: GPU - Optimize for the 6800XT
+
+### [ ] PyTorch with ROCm
+✅ Already installed:
+```
+PyTorch: 2.10.0+rocm7.1
+ROCm available: True
+Device: AMD Radeon Graphics
+```
+
+### [ ] OpenCV with OpenCL
+```bash
+# Check OpenCL support
+python -c "import cv2; print(cv2.ocl.haveOpenCL())"
+```
+
+**If OpenCL is missing:**
+- [ ] Install opencv-python (not headless)
+- [ ] Install the AMD OpenCL runtime
+
+### [ ] Replace CPU libraries with GPU ones
+
+| Component | CPU | GPU |
+|------------|-----|-----|
+| Audio | librosa | torchaudio (ROCm) |
+| Video frames | cv2 | cv2 + OpenCL |
+| Processing | scipy | torch |
+| Concatenation | moviepy | torch + ffmpeg |
+
+### [ ] MoviePy with GPU
+MoviePy currently renders on the CPU. Options:
+1. Use ffmpeg directly with GPU flags
+2. 
Build a custom pipeline with torch
+
+```bash
+# ffmpeg with GPU encoding (h264_amf; on Linux, AMD cards typically use h264_vaapi instead)
+ffmpeg -hwaccel auto -i input.mp4 -c:v h264_amf output.mp4
+```
+
+---
+
+## PRIORITY 3: Improve Detection
+
+### [ ] Chat Keywords
+Detect moments with keywords such as:
+- "LOL", "POG", "KEK", "RIP", "WTF"
+- Popular emotes
+- ALL CAPS (shouting in chat)
+
+### [ ] Sentiment Analysis
+- [ ] Use a sentiment model (torch)
+- [ ] Detect intensely positive/negative moments
+
+### [ ] Highlight Ranking
+- [ ] Sort by intensity (combined scores)
+- [ ] Keep only the top N highlights
+- [ ] Duration-aware scoring
+
+---
+
+## PRIORITY 4: Kick
+
+### [ ] Video Download
+✅ Already works with streamlink:
+```bash
+streamlink https://kick.com/streamer best -o video.mp4
+```
+
+### [ ] Chat
+❌ Kick has NO public API for chat.
+
+**Options:**
+1. Scrape the chat from the web page
+2. Use third-party tools
+3. Skip chat and rely on audio/color only
+
+---
+
+## PRIORITY 5: Optimizations
+
+### [ ] Parallelization
+- [ ] Process video chunks in parallel
+- [ ] ThreadPool for I/O
+
+### [ ] Cache
+- [ ] Save intermediate results
+- [ ] Reuse the analysis if chat.txt already exists
+
+### [ ] Chunking
+- [ ] Process the video in segments
+- [ ] Avoid loading everything into memory
+
+---
+
+## PRIORITY 6: UX/UI
+
+### [ ] Improved CLI
+```bash
+python main.py --video-id 2701190361 --platform twitch \
+    --min-duration 10 --threshold 2.0 \
+    --output highlights.mp4 \
+    --use-gpu --gpu-device 0
+```
+
+### [ ] Web Interface
+- [ ] Streamlit app
+- [ ] Upload video/chat
+- [ ] View a highlight timeline
+- [ ] Preview clips
+
+### [ ] Progress Bars
+- [ ] tqdm for downloads
+- [ ] Progress for processing
+
+---
+
+## INSTALL RECIPES
+
+### GPU ROCm
+```bash
+# PyTorch with ROCm
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.1
+
+# Verify
+python -c "import torch; print(torch.cuda.is_available())"
+```
+
+### NVIDIA CUDA 
(alternative)
+```bash
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+```
+
+### OpenCV with OpenCL
+```bash
+# Check
+python -c "import cv2; print(cv2.ocl.haveOpenCL())"
+
+# If False, install with GPU support
+pip uninstall opencv-python-headless
+pip install opencv-python
+```
+
+---
+
+## EXPECTED PERFORMANCE
+
+| Config | Processing FPS | Time for a 5h Video |
+|--------|----------------|------------------|
+| CPU (12 cores) | ~5-10 FPS | ~1-2 hours |
+| GPU NVIDIA 3050 | ~30-50 FPS | ~10-20 min |
+| GPU AMD 6800XT | ~30-40 FPS | ~15-25 min |
+
+---
+
+## NOTES
+
+1. **ROCm 7.1** working with PyTorch
+2. **6800XT** detected as "AMD Radeon Graphics"
+3. **MoviePy** still renders on the CPU for final rendering
+4. For best performance, consider rendering directly with GPU-accelerated ffmpeg
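Both detector.py and detector_gpu.py repeat the same two steps: flag seconds whose activity sits more than a threshold of standard deviations above the mean, then merge consecutive flagged seconds into intervals. A minimal standalone sketch of that shared logic, using only the standard library (function and parameter names are illustrative, not existing repo code):

```python
from statistics import mean, pstdev

def zscore_peaks(counts, threshold=2.0):
    """Return indices whose value is more than `threshold` population
    standard deviations above the mean (the same rule the detectors
    apply to per-second chat, audio, and color scores)."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]

def merge_peaks(seconds, min_duration=10, max_gap=1):
    """Merge consecutive peak seconds into (start, end) intervals,
    dropping intervals shorter than min_duration seconds."""
    if not seconds:
        return []
    intervals = []
    start = prev = seconds[0]
    for s in seconds[1:]:
        if s - prev > max_gap:
            # A gap closes the current interval; compare prev - start,
            # not s - start, so the gap does not inflate the length
            if prev - start >= min_duration:
                intervals.append((start, prev))
            start = s
        prev = s
    if prev - start >= min_duration:
        intervals.append((start, prev))
    return intervals
```

Factoring this into one module would remove the duplication between the two detectors and make the length check consistent: when a gap closes an interval, the duration of the interval being closed is `prev - start`, so comparing the next peak's position against `start` would overstate it by the size of the gap.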