feat: initial commit - manga-image-translator setup with MiniMax LLM for Spanish translation

2026-05-28 20:51:35 -03:00
commit 9231d96305
18 changed files with 437 additions and 0 deletions
--- a/.commandcode/taste/taste.md
+++ b/.commandcode/taste/taste.md
@@ -0,0 +1,4 @@
+# Taste (Continuously Learned by [CommandCode][cmd])
+
+[cmd]: https://commandcode.ai/
+
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,35 @@
+# Python
+__pycache__/
+*.pyc
+*.pyo
+venv/
+.venv/
+
+# IDE
+.vscode/
+.idea/
+
+# OS
+Thumbs.db
+Desktop.ini
+.DS_Store
+
+# manga-image-translator internals (submodule content)
+manga-image-translator/.git/
+manga-image-translator/models/
+manga-image-translator/venv/
+manga-image-translator/result/
+manga-image-translator/__pycache__/
+manga-image-translator/**/*.pyc
+
+# Sensitive
+.env
+.env.*
+manga-image-translator/.env
+
+# Large translated galleries (>10MB)
+example-translated/nhentai_652854/
+example-translated/nhentai_652854_test/
+
+# Temporary/debug
+*.log
--- a/.gitmodules
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "manga-image-translator"]
+	path = manga-image-translator
+	url = https://github.com/zyddnys/manga-image-translator.git
--- a/LLM_READY.md
+++ b/LLM_READY.md
@@ -0,0 +1,312 @@
+# LLM_READY.md - manga-image-translator Context Guide
+
+> **Purpose:** Complete context document for any AI working with this project.
+> **Last updated:** 2026-05-28
+> **Working directory:** `C:\Users\Administrator\Documents\fansub2`
+
+---
+
+## 1. Project Overview
+
+**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:
+1. **Detects** text regions in images (bounding boxes)
+2. **OCRs** the text (reads what it says)
+3. **Translates** the text to a target language
+4. **Inpaints** (erases) the original text
+5. **Renders** the translated text in the same position
+
+### Repository
+- **Location:** `manga-image-translator/` (git submodule, see `.gitmodules`)
+- **Language:** Python (experimental version)
+- **GPU:** Not available — CPU + RAM only (VPS deployment planned)
+- **Python venv:** `manga-image-translator/venv/`
+
+---
+
+## 2. Translation Pipeline
+
+```
+Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
+```
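The stage order in the diagram above can be sketched as a chain of functions. This is only an illustration of the data flow: `Page`, `make_stage`, and `run_pipeline` are hypothetical names, and the placeholder stages just record that they ran; they are not the tool's real classes or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical container for the state passed between stages."""
    name: str
    history: List[str] = field(default_factory=list)


def make_stage(label: str) -> Callable[[Page], Page]:
    """Return a placeholder stage that only records that it ran."""
    def stage(page: Page) -> Page:
        page.history.append(label)
        return page
    return stage


# Stage order copied from the pipeline diagram.
STAGES = [make_stage(label) for label in (
    "detection", "ocr", "translation",
    "mask_refinement", "inpainting", "rendering",
)]


def run_pipeline(page: Page) -> Page:
    """Apply every stage in order; each stage consumes the previous output."""
    for stage in STAGES:
        page = stage(page)
    return page


if __name__ == "__main__":
    print(run_pipeline(Page("japanese.jpg")).history)
```

Because each stage depends on the previous one, an OCR error propagates into translation and rendering for the same page.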
+
+### Detection
+- **Model:** DBNet_resnet34 (default detector)
+- **Default resolution:** 2048px (configurable via `detection_size`)
+- **Impact on speed:** Reducing to 1024 is estimated to make detection ~2-4x faster
+
+### OCR
+- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
+- **Default:** `48px` (ConvNext backbone, more accurate)
+- **32px:** ResNet backbone, ~20-30% faster but less accurate
+- **Recommendation:** `48px` for reliability; `32px` for speed
+
+### Translation (the critical part)
+- **LLM translator:** `chatgpt` (OpenAI-compatible API)
+- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate)
+- **Target language:** `ESP` (Spanish)
+
+### Inpainting
+- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
+- **Default:** `lama_large` — uses FFT (FourierUnit), expensive on CPU
+- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
+- **`none`:** Skip inpainting entirely
+
+### Rendering
+- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
+- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`
+
+---
+
+## 3. Working Configuration (PROVEN)
+
+This configuration was tested on 275 pages and completed successfully in ~1.5 hours.
+
+### Config file: `translate_config.json`
+```json
+{
+  "translator": {
+    "translator": "chatgpt",
+    "target_lang": "ESP"
+  },
+  "render": {
+    "renderer": "manga2eng",
+    "font_size_offset": -10,
+    "font_size_minimum": 8,
+    "no_hyphenation": true,
+    "alignment": "center"
+  }
+}
+```
+
+### Environment file: `manga-image-translator/.env`
+```
+OPENAI_API_KEY=<your-api-key>
+OPENAI_API_BASE=https://api.minimax.io/v1
+OPENAI_MODEL=MiniMax-M2.7
+CUSTOM_OPENAI_API_KEY=<your-api-key>
+CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
+CUSTOM_OPENAI_MODEL=MiniMax-M2.7
+```
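Before starting a long batch, it may help to confirm that the `.env` file defines every variable listed above. The following is a minimal sketch using only the standard library. `parse_env` and `missing_keys` are hypothetical helpers, not part of manga-image-translator, and the parser handles only plain `KEY=VALUE` lines.

```python
REQUIRED_KEYS = (
    "OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL",
    "CUSTOM_OPENAI_API_KEY", "CUSTOM_OPENAI_API_BASE", "CUSTOM_OPENAI_MODEL",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    values = parse_env(text)
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<")
    ]


if __name__ == "__main__":
    sample = "OPENAI_API_KEY=<your-api-key>\nOPENAI_MODEL=MiniMax-M2.7\n"
    print("Missing or placeholder:", missing_keys(sample))
```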
+
+### Proven command
+```powershell
+$env:PYTHONIOENCODING="utf-8"
+$env:PYTHONUTF8="1"
+& "manga-image-translator\venv\Scripts\python.exe" `
+  -m manga_translator local `
+  -i "example\nhentai_652854" `
+  -o "example-translated\nhentai_652854" `
+  --config-file "translate_config.json" `
+  --ignore-errors `
+  --overwrite
+```
+
+### Performance (275 pages)
+| Metric | Value |
+|--------|-------|
+| Pages processed | 275/275 |
+| Time | ~1.5 hours |
+| API calls | ~700 |
+| Translator | MiniMax-M2.7 via OpenAI-compatible API |
+| Language | Japanese/Chinese → Spanish (ESP) |
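As a rough sanity check on the table above, the per-page averages follow from simple arithmetic. The inputs are the approximate figures reported in the table, so the results are averages, not measurements.

```python
# Approximate figures taken from the performance table.
pages = 275
hours = 1.5
api_calls = 700

seconds_per_page = hours * 3600 / pages   # 5400 s / 275 pages
calls_per_page = api_calls / pages

print(f"{seconds_per_page:.1f} s per page")      # 19.6 s per page
print(f"{calls_per_page:.1f} API calls per page")  # 2.5 API calls per page
```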
+
+---
+
+## 4. CLI Flags Reference
+
+### General
+| Flag | Description |
+|------|-------------|
+| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
+| `--overwrite` | Overwrite existing translated files |
+| `--skip-no-text` | Don't save images with no detected text |
+| `-v` | Verbose output (saves intermediate images to `result/`) |
+
+### Batch processing
+| Flag | Description |
+|------|-------------|
+| `--batch-size N` | Process N images per batch (default: 1) |
+| `--batch-concurrent` | Use concurrent mode for batch translation |
+
+### GPU (not available on VPS)
+| Flag | Description |
+|------|-------------|
+| `--use-gpu` | Use CUDA/MPS for all models |
+| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |
+
+### Config file options (in `translate_config.json`)
+| Key | Values | Default | Recommended |
+|-----|--------|---------|-------------|
+| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
+| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
+| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
+| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
+| `render.font_size_offset` | integer | 0 | -10 |
+| `render.font_size_minimum` | integer | -1 | 8 |
+| `render.no_hyphenation` | boolean | false | true |
+| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
+| `detector.detection_size` | integer | 2048 | 1024 (faster) |
+| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
+| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
+| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |
+
+---
+
+## 5. Valid Language Codes (target_lang)
+
+From `manga_translator/translators/common.py`:
+
+| Code | Language |
+|------|----------|
+| `CHS` | Chinese (Simplified) |
+| `CHT` | Chinese (Traditional) |
+| `ENG` | English |
+| `JPN` | Japanese |
+| `KOR` | Korean |
+| `ESP` | Spanish |
+| `FRA` | French |
+| `DEU` | German |
+| `ITA` | Italian |
+| `PTB` | Portuguese (Brazil) |
+| `RUS` | Russian |
+| `ARA` | Arabic |
+| `THA` | Thai |
+| `VIN` | Vietnamese |
+| ... | (25+ languages total) |
+
+---
+
+## 6. Optimization Options (Tested Results)
+
+### OPTION A: Config only (no code changes), partially tested
+| Flag | Default | Tested | Impact |
+|------|---------|--------|--------|
+| `--detection-size 1024` | 2048 | Not tested in isolation | ~2-4x faster detection (estimate) |
+| `--inpainting-size 1024` | 2048 | Not tested in isolation | ~2-4x faster inpainting (estimate) |
+| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
+| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
+| `--batch-size 10-30` | 1 | Tested (30) | **FAILED**: error 2013 with MiniMax |
+| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
+| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |
+
+**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**
+- **1.9 hours** (SLOWER than 1.5h without flags)
+- **266/275 pages** (9 failed)
+- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
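One way to keep some batching while bounding prompt size is to group texts by a character budget instead of a fixed count. The sketch below is an assumption-laden illustration: `chunk_by_budget` and `max_chars` are hypothetical, the tool does not expose such an option, and the actual MiniMax limit behind error 2013 is not known here.

```python
from typing import Iterable, List


def chunk_by_budget(texts: Iterable[str], max_chars: int) -> List[List[str]]:
    """Group texts so each chunk stays within max_chars in total.

    A single text longer than max_chars still gets a chunk of its own,
    so nothing is dropped; such a chunk may still be rejected by the API.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        size = len(text)
        # Start a new chunk when adding this text would exceed the budget.
        if current and used + size > max_chars:
            chunks.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(chunk_by_budget(["aaaa", "bb", "cccccc", "d"], max_chars=6))
```

Character count is only a proxy for token count, so any real budget would need a safety margin, especially for CJK text.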
+
+### OPTION B: Code changes (not implemented yet)
+| Change | Expected Impact | Complexity |
+|--------|----------------|------------|
+| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
+| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
+| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |
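The first row of the table contrasts a sequential loop with `asyncio.gather`. The sketch below shows that difference in isolation. `fake_translate` is a hypothetical stand-in for one API call, and the real signature of `_concurrent_translate_contexts` is not shown in this guide, so this is not a patch for `chatgpt.py`.

```python
import asyncio
import time


async def fake_translate(text: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one network-bound API call."""
    await asyncio.sleep(delay)
    return text.upper()


async def translate_sequential(texts):
    """Await each call in turn: total time is roughly the sum of delays."""
    return [await fake_translate(t) for t in texts]


async def translate_concurrent(texts, limit: int = 4):
    """Run calls with asyncio.gather, capped by a semaphore.

    gather returns results in input order, so page order is preserved.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text):
        async with semaphore:
            return await fake_translate(text)

    return await asyncio.gather(*(bounded(t) for t in texts))


if __name__ == "__main__":
    texts = [f"line {i}" for i in range(8)]
    for fn in (translate_sequential, translate_concurrent):
        start = time.perf_counter()
        asyncio.run(fn(texts))
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

The semaphore is there because unbounded concurrency could trigger rate limits on the API side; the right `limit` for MiniMax is not known here.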
+
+---
+
+## 7. Known Issues
+
+### MiniMax API Errors
+- **Error 400:** `bad_request_error (2013)` — prompt too long or contains problematic content
+- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
+- **Workaround:** `--ignore-errors` skips failed pages
+
+### Post-Translation Check Failures
+- The tool checks if translated text is actually in the target language
+- Sometimes valid Spanish translations fail the check (false negatives)
+- This causes unnecessary retries and can revert translations to original text
+- **Workaround:** Already handled by `--ignore-errors`
+
+### Vertical Bubble Problem
+- Japanese manga uses vertical speech bubbles (narrow, tall)
+- Spanish text is horizontal and longer than Japanese
+- Text overflows or doesn't fit in narrow vertical bubbles
+- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
+- **Known limitation:** Some vertical bubbles will always overflow
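The overflow problem can be illustrated with a rough fit estimate. This is a back-of-the-envelope sketch, not the renderer's algorithm: `fits` and `largest_font` are hypothetical, and the glyph-width ratio (0.6) and line-height ratio (1.2) are assumed averages. It also wraps by character count, ignoring word boundaries.

```python
import math


def fits(text_len: int, box_w: float, box_h: float, font_size: int,
         glyph_ratio: float = 0.6, line_ratio: float = 1.2) -> bool:
    """Estimate whether text_len characters fit in a box at font_size."""
    chars_per_line = int(box_w // (font_size * glyph_ratio))
    if chars_per_line < 1:
        return False
    lines = math.ceil(text_len / chars_per_line)
    return lines * font_size * line_ratio <= box_h


def largest_font(text_len: int, box_w: float, box_h: float,
                 start: int = 40, minimum: int = 8):
    """Return the largest size from start down to minimum that fits, else None."""
    for size in range(start, minimum - 1, -1):
        if fits(text_len, box_w, box_h, size):
            return size
    return None


if __name__ == "__main__":
    # A narrow, tall bubble (40x300 px) holding 30 characters.
    print(largest_font(30, 40, 300))
    # A very small bubble: nothing fits even at the minimum size of 8.
    print(largest_font(60, 20, 30))
```

When `largest_font` returns `None`, no size at or above the minimum fits, which corresponds to the bubbles that overflow regardless of `font_size_offset`.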
+
+### OCR Accuracy
+- OCR sometimes misreads characters (especially damaged/low-quality scans)
+- OCR errors propagate to translation (garbage in → garbage out)
+- `48px` model is more accurate than `32px`
+
+---
+
+## 8. File Structure
+
+```
+fansub2/
+├── example/                          # Source images
+│   ├── nhentai_652854/              # 275-page gallery (Chinese manga)
+│   ├── japanese.jpg                 # Single Japanese page test
+│   ├── english.jpg                  # Single English page test
+│   ├── chinese_sfw.webp             # Single Chinese page test
+│   ├── coreano.jpg                  # Single Korean page test
+│   └── burbujascombinadas.webp      # Single English page test
+├── example-translated/              # Output translated images
+│   ├── nhentai_652854/             # 275 pages (1.5h, batch-size 1)
+│   ├── nhentai_652854_test/        # 275 pages (1.9h, batch-size 30 - SLOWER)
+│   ├── japanese.jpg                # Latest: font_size_offset -10
+│   └── ... (other translated files)
+├── translate_config.json            # Active translation config
+├── OPTIMIZACIONES.md               # Optimization notes
+└── manga-image-translator/          # The tool
+    ├── .env                         # API keys (DO NOT COMMIT)
+    ├── venv/                        # Python virtual environment
+    ├── manga_translator/            # Source code
+    │   ├── translators/             # Translation backends
+    │   │   ├── chatgpt.py          # OpenAI-compatible (MiniMax)
+    │   │   ├── common.py           # Language codes, base classes
+    │   │   ├── nllb.py             # Facebook NLLB-200 (offline)
+    │   │   ├── m2m100.py           # Facebook M2M-100 (offline)
+    │   │   └── keys.py             # API key env vars
+    │   ├── rendering/              # Text rendering
+    │   │   ├── __init__.py         # Main render pipeline
+    │   │   └── text_render.py      # Font/text rendering
+    │   ├── detection/              # Text detection (DBNet)
+    │   ├── ocr/                    # OCR models
+    │   ├── inpainting/             # Text erasure models
+    │   ├── manga_translator.py     # Main orchestrator
+    │   ├── config.py               # Config schema
+    │   └── mode/local.py           # Local batch mode
+    ├── result/                      # Debug output (with -v flag)
+    └── README.md                    # Official documentation
+```
+
+---
+
+## 9. Running the Tool
+
+### Single image
+```powershell
+$env:PYTHONIOENCODING="utf-8"
+$env:PYTHONUTF8="1"
+& "manga-image-translator\venv\Scripts\python.exe" `
+  -m manga_translator local `
+  -i "example\japanese.jpg" `
+  -o "example-translated" `
+  --config-file "translate_config.json" `
+  --ignore-errors --overwrite
+```
+
+### Full gallery
+```powershell
+$env:PYTHONIOENCODING="utf-8"
+$env:PYTHONUTF8="1"
+& "manga-image-translator\venv\Scripts\python.exe" `
+  -m manga_translator local `
+  -i "example\nhentai_652854" `
+  -o "example-translated\nhentai_652854" `
+  --config-file "translate_config.json" `
+  --ignore-errors --overwrite
+```
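The same invocation can be driven from Python, for example on a Linux VPS where the PowerShell syntax does not apply. This is a sketch assuming the flags shown in the commands above. `build_command`, `utf8_env`, and `run_translation` are hypothetical helpers, and the interpreter path is passed in by the caller.

```python
import os
import subprocess


def build_command(python_exe, input_path, output_path,
                  config_file="translate_config.json"):
    """Build the argument list used in the PowerShell examples."""
    return [
        str(python_exe), "-m", "manga_translator", "local",
        "-i", str(input_path),
        "-o", str(output_path),
        "--config-file", str(config_file),
        "--ignore-errors", "--overwrite",
    ]


def utf8_env(base=None):
    """Copy the environment and force UTF-8 for Python I/O."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run_translation(python_exe, input_path, output_path):
    """Run the tool and return its exit code (no exception on failure)."""
    command = build_command(python_exe, input_path, output_path)
    return subprocess.run(command, env=utf8_env(), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call run_translation() to execute it.
    print(build_command("python", "example/japanese.jpg", "example-translated"))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.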
+
+---
+
+## 10. Key Learnings
+
+1. **LLM translators beat offline models** — MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
+2. **batch-size > 1 is risky with LLMs** — Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
+3. **UTF-8 is mandatory on Windows** — Must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1` or CJK characters crash the console
+4. **Vertical bubbles are a fundamental limitation** — Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
+5. **`--ignore-errors` is essential** — Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
+6. **AOT inpainter is faster on CPU** — But `lama_large` produces better quality; trade-off depends on use case
+7. **`manga2eng` renderer is better than `default`** — Handles text sizing and positioning more intelligently
--- a/OPTIMIZACIONES.md
+++ b/OPTIMIZACIONES.md
@@ -0,0 +1,59 @@
+# manga-image-translator Optimizations
+
+## OPTION A: Config only (no code changes)
+
+### Flags to speed up CPU runs
+
+| Flag | Default | Recommended | Impact | Risk |
+|------|---------|-------------|--------|------|
+| `--detection-size` | 2048 | **1024** | ~2-4x faster detection | May miss very small bubbles |
+| `--inpainting-size` | 2048 | **1024** | ~2-4x faster inpainting | Lower erasure quality |
+| `--inpainter` | lama_large | **default (AOT)** | ~3-5x faster inpainting | AOT erases slightly worse |
+| `--ocr` | 48px | **32px** | ~20-30% faster OCR | May fail on small text |
+| `--batch-size` | 1 | **20-50** | Cuts API calls ~20x | Very long texts can fail (error 2013) |
+| `--batch-concurrent` | off | **on** | Overlaps network and CPU work | Partial improvement |
+| `--skip-no-text` | off | **on** | Saves I/O on pages without text | Only saves the write, not the processing |
+
+### Optimal command
+```bash
+python -m manga_translator local \
+  -i "input" -o "output" \
+  --config-file translate_config.json \
+  --detection-size 1024 \
+  --inpainting-size 1024 \
+  --inpainter default \
+  --ocr 32px \
+  --batch-size 30 \
+  --batch-concurrent \
+  --skip-no-text \
+  --ignore-errors --overwrite
+```
+
+### Estimate: 1.5h → ~20-30 min
+
+---
+
+## OPTION B: Code changes (requires modifying the repo)
+
+### B1. Fix `_concurrent_translate_contexts` (chatgpt.py)
+- Use `asyncio.gather` instead of a sequential loop
+- Impact: ~30-40% faster translation phase
+- Risk: Low
+
+### B2. True pipeline with ProcessPoolExecutor
+- Process pool for parallel detection + OCR
+- Bypasses Python's GIL
+- Impact: ~50-60% faster (biggest gain)
+- Risk: Requires refactoring the models' global state
+
+### B3. Increase `_MAX_TOKENS` (chatgpt.py)
+- Default: 4096 → 8192 if MiniMax supports it
+- Impact: Minor, with a high batch-size
+
+### Estimate: 1.5h → ~15-25 min
+
+---
+
+## OPTION C: Hybrid (recommended)
+- Option A config + B1 (minor code change)
+- Estimate: ~15-25 min, ~50-80 API calls
--- a/example-translated/burbujascombinadas.webp
+++ b/example-translated/burbujascombinadas.webp
--- a/example-translated/chinese_sfw.webp
+++ b/example-translated/chinese_sfw.webp
--- a/example-translated/coreano.jpg
+++ b/example-translated/coreano.jpg
--- a/example-translated/english.jpg
+++ b/example-translated/english.jpg
--- a/example-translated/japanese.jpg
+++ b/example-translated/japanese.jpg
--- a/example-translated/japanese_m2m100.jpg
+++ b/example-translated/japanese_m2m100.jpg
--- a/example/burbujascombinadas.webp
+++ b/example/burbujascombinadas.webp
--- a/example/chinese_sfw.webp
+++ b/example/chinese_sfw.webp
--- a/example/coreano.jpg
+++ b/example/coreano.jpg
--- a/example/english.jpg
+++ b/example/english.jpg
--- a/example/japanese.jpg
+++ b/example/japanese.jpg
--- a/1
+++ b/1
--- a/translate_config.json
+++ b/translate_config.json
@@ -0,0 +1,23 @@
+{
+  "translator": {
+    "translator": "chatgpt",
+    "target_lang": "ESP"
+  },
+  "render": {
+    "renderer": "manga2eng",
+    "font_size_offset": -10,
+    "font_size_minimum": 8,
+    "no_hyphenation": true,
+    "alignment": "center"
+  },
+  "detector": {
+    "detection_size": 1024
+  },
+  "inpainter": {
+    "inpainter": "default",
+    "inpainting_size": 1024
+  },
+  "ocr": {
+    "ocr": "32px"
+  }
+}