commit 9231d9630579429eb7a22c92812a11b62b9c865e
Author: renato97
Date:   Thu May 28 20:51:35 2026 -0300

    feat: initial commit - manga-image-translator setup with MiniMax LLM for Spanish translation

diff --git a/.commandcode/taste/taste.md b/.commandcode/taste/taste.md
new file mode 100644
index 0000000..f562cac
--- /dev/null
+++ b/.commandcode/taste/taste.md
@@ -0,0 +1,4 @@
+# Taste (Continuously Learned by [CommandCode][cmd])
+
+[cmd]: https://commandcode.ai/
+
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..8936364
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,35 @@
+# Python
+__pycache__/
+*.pyc
+*.pyo
+venv/
+.venv/
+
+# IDE
+.vscode/
+.idea/
+
+# OS
+Thumbs.db
+Desktop.ini
+.DS_Store
+
+# manga-image-translator internals (submodule content)
+manga-image-translator/.git/
+manga-image-translator/models/
+manga-image-translator/venv/
+manga-image-translator/result/
+manga-image-translator/__pycache__/
+manga-image-translator/**/*.pyc
+
+# Sensitive
+.env
+.env.*
+manga-image-translator/.env
+
+# Large translated galleries (>10MB)
+example-translated/nhentai_652854/
+example-translated/nhentai_652854_test/
+
+# Temporary/debug
+*.log
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..18b52d7
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "manga-image-translator"]
+	path = manga-image-translator
+	url = https://github.com/zyddnys/manga-image-translator.git
diff --git a/LLM_READY.md b/LLM_READY.md
new file mode 100644
index 0000000..48bb66f
--- /dev/null
+++ b/LLM_READY.md
@@ -0,0 +1,312 @@
+# LLM_READY.md - manga-image-translator Context Guide
+
+> **Purpose:** Complete context document for any AI working with this project.
+> **Last updated:** 2026-05-28
+> **Working directory:** `C:\Users\Administrator\Documents\fansub2`
+
+---
+
+## 1. Project Overview
+
+**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:
+1. **Detects** text regions in images (bounding boxes)
+2. **OCRs** the text (reads what it says)
+3. **Translates** the text to a target language
+4. **Inpaints** (erases) the original text
+5. **Renders** the translated text in the same position
+
+### Repository
+- **Location:** `manga-image-translator/` (git submodule, see `.gitmodules`)
+- **Language:** Python (experimental version)
+- **GPU:** Not available; CPU + RAM only (VPS deployment planned)
+- **Python venv:** `manga-image-translator/venv/`
+
+---
+
+## 2. Translation Pipeline
+
+```
+Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
+```
+
+### Detection
+- **Model:** DBNet_resnet34 (default detector)
+- **Default resolution:** 2048px (configurable via `detection_size`)
+- **Impact on speed:** Reducing it to 1024 makes detection ~2-4x faster
+
+### OCR
+- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
+- **Default:** `48px` (ConvNext backbone, more accurate)
+- **32px:** ResNet backbone, ~20-30% faster but less accurate
+- **Impact:** 48px recommended for reliability; 32px for speed
+
+### Translation (the critical part)
+- **LLM translator:** `chatgpt` (OpenAI-compatible API)
+- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate than the LLM translator)
+- **Target language:** `ESP` (Spanish)
+
+### Inpainting
+- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
+- **Default:** `lama_large`, which uses FFT layers (FourierUnit) and is expensive on CPU
+- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
+- **`none`:** Skip inpainting entirely
+
+### Rendering
+- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
+- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`
+
+---
+
+## 3. Working Configuration (PROVEN)
+
+This configuration was tested on 275 pages and completed successfully in ~1.5 hours.
+
+### Config file: `translate_config.json`
+```json
+{
+  "translator": {
+    "translator": "chatgpt",
+    "target_lang": "ESP"
+  },
+  "render": {
+    "renderer": "manga2eng",
+    "font_size_offset": -10,
+    "font_size_minimum": 8,
+    "no_hyphenation": true,
+    "alignment": "center"
+  }
+}
+```
+
+### Environment file: `manga-image-translator/.env`
+```
+OPENAI_API_KEY=
+OPENAI_API_BASE=https://api.minimax.io/v1
+OPENAI_MODEL=MiniMax-M2.7
+CUSTOM_OPENAI_API_KEY=
+CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
+CUSTOM_OPENAI_MODEL=MiniMax-M2.7
+```
+
+### Proven command
+```powershell
+$env:PYTHONIOENCODING="utf-8"
+$env:PYTHONUTF8="1"
+& "manga-image-translator\venv\Scripts\python.exe" `
+  -m manga_translator local `
+  -i "example\nhentai_652854" `
+  -o "example-translated\nhentai_652854" `
+  --config-file "translate_config.json" `
+  --ignore-errors `
+  --overwrite
+```
+
+### Performance (275 pages)
+| Metric | Value |
+|--------|-------|
+| Pages processed | 275/275 |
+| Time | ~1.5 hours |
+| API calls | ~700 |
+| Translator | MiniMax-M2.7 via OpenAI-compatible API |
+| Language | Japanese/Chinese → Spanish (ESP) |
+
+---
+
+## 4. CLI Flags Reference
+
+### General
+| Flag | Description |
+|------|-------------|
+| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
+| `--overwrite` | Overwrite existing translated files |
+| `--skip-no-text` | Don't save images with no detected text |
+| `-v` | Verbose output (saves intermediate images to `result/`) |
+
+### Batch processing
+| Flag | Description |
+|------|-------------|
+| `--batch-size N` | Process N images per batch (default: 1) |
+| `--batch-concurrent` | Use concurrent mode for batch translation |
+
+### GPU (not available on VPS)
+| Flag | Description |
+|------|-------------|
+| `--use-gpu` | Use CUDA/MPS for all models |
+| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |
+
+### Config file options (in `translate_config.json`)
+| Key | Values | Default | Recommended |
+|-----|--------|---------|-------------|
+| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
+| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
+| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
+| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
+| `render.font_size_offset` | integer | 0 | -10 |
+| `render.font_size_minimum` | integer | -1 | 8 |
+| `render.no_hyphenation` | boolean | false | true |
+| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
+| `detector.detection_size` | integer | 2048 | 1024 (faster) |
+| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
+| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
+| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |
+
+---
+
+## 5. Valid Language Codes (target_lang)
+
+From `manga_translator/translators/common.py`:
+
+| Code | Language |
+|------|----------|
+| `CHS` | Chinese (Simplified) |
+| `CHT` | Chinese (Traditional) |
+| `ENG` | English |
+| `JPN` | Japanese |
+| `KOR` | Korean |
+| `ESP` | Spanish |
+| `FRA` | French |
+| `DEU` | German |
+| `ITA` | Italian |
+| `PTB` | Portuguese (Brazil) |
+| `RUS` | Russian |
+| `ARA` | Arabic |
+| `THA` | Thai |
+| `VIN` | Vietnamese |
+| ... | (25+ languages total) |
+
+---
+
+## 6. Optimization Options (Tested Results)
+
+### OPTION A: Config only (no code changes), TESTED
+| Flag | Default | Tested | Impact |
+|------|---------|--------|--------|
+| `--detection-size 1024` | 2048 | Only in the combined run below | ~2-4x faster detection |
+| `--inpainting-size 1024` | 2048 | Only in the combined run below | ~2-4x faster inpainting |
+| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
+| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
+| `--batch-size 10-30` | 1 | Tested (30) | **FAILED**: error 2013 with MiniMax |
+| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
+| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |
+
+**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**
+- **1.9 hours** (SLOWER than 1.5h without flags)
+- **266/275 pages** (9 failed)
+- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
+
+### OPTION B: Code changes (not implemented yet)
+| Change | Expected Impact | Complexity |
+|--------|----------------|------------|
+| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
+| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
+| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |
+
+---
+
+## 7. Known Issues
+
+### MiniMax API Errors
+- **Error 400:** `bad_request_error (2013)`: prompt too long or contains problematic content
+- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
+- **Workaround:** `--ignore-errors` skips failed pages
+
+### Post-Translation Check Failures
+- The tool checks whether the translated text is actually in the target language
+- Sometimes valid Spanish translations fail the check (false negatives)
+- This causes unnecessary retries and can revert translations to the original text
+- **Workaround:** Already handled by `--ignore-errors`
+
+### Vertical Bubble Problem
+- Japanese manga uses vertical speech bubbles (narrow, tall)
+- Spanish text is horizontal and longer than Japanese
+- Text overflows or doesn't fit in narrow vertical bubbles
+- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
+- **Known limitation:** Some vertical bubbles will always overflow
+
+### OCR Accuracy
+- OCR sometimes misreads characters (especially in damaged/low-quality scans)
+- OCR errors propagate to translation (garbage in → garbage out)
+- `48px` model is more accurate than `32px`
+
+---
+
+## 8. File Structure
+
+```
+fansub2/
+├── example/                        # Source images
+│   ├── nhentai_652854/             # 275-page gallery (Chinese manga)
+│   ├── japanese.jpg                # Single Japanese page test
+│   ├── english.jpg                 # Single English page test
+│   ├── chinese_sfw.webp            # Single Chinese page test
+│   ├── coreano.jpg                 # Single Korean page test
+│   └── burbujascombinadas.webp     # Single English page test
+├── example-translated/             # Output translated images
+│   ├── nhentai_652854/             # 275 pages (1.5h, batch-size 1)
+│   ├── nhentai_652854_test/        # 275 pages (1.9h, batch-size 30 - SLOWER)
+│   ├── japanese.jpg                # Latest: font_size_offset -10
+│   └── ...                         (other translated files)
+├── translate_config.json           # Active translation config
+├── OPTIMIZACIONES.md               # Optimization notes
+└── manga-image-translator/         # The tool
+    ├── .env                        # API keys (DO NOT COMMIT)
+    ├── venv/                       # Python virtual environment
+    ├── manga_translator/           # Source code
+    │   ├── translators/            # Translation backends
+    │   │   ├── chatgpt.py          # OpenAI-compatible (MiniMax)
+    │   │   ├── common.py           # Language codes, base classes
+    │   │   ├── nllb.py             # Facebook NLLB-200 (offline)
+    │   │   ├── m2m100.py           # Facebook M2M-100 (offline)
+    │   │   └── keys.py             # API key env vars
+    │   ├── rendering/              # Text rendering
+    │   │   ├── __init__.py         # Main render pipeline
+    │   │   └── text_render.py      # Font/text rendering
+    │   ├── detection/              # Text detection (DBNet)
+    │   ├── ocr/                    # OCR models
+    │   ├── inpainting/             # Text erasure models
+    │   ├── manga_translator.py     # Main orchestrator
+    │   ├── config.py               # Config schema
+    │   └── mode/local.py           # Local batch mode
+    ├── result/                     # Debug output (with -v flag)
+    └── README.md                   # Official documentation
+```
+
+---
+
+## 9. Running the Tool
+
+### Single image
+```powershell
+$env:PYTHONIOENCODING="utf-8"
+$env:PYTHONUTF8="1"
+& "manga-image-translator\venv\Scripts\python.exe" `
+  -m manga_translator local `
+  -i "example\japanese.jpg" `
+  -o "example-translated" `
+  --config-file "translate_config.json" `
+  --ignore-errors --overwrite
+```
+
+### Full gallery
+```powershell
+$env:PYTHONIOENCODING="utf-8"
+$env:PYTHONUTF8="1"
+& "manga-image-translator\venv\Scripts\python.exe" `
+  -m manga_translator local `
+  -i "example\nhentai_652854" `
+  -o "example-translated\nhentai_652854" `
+  --config-file "translate_config.json" `
+  --ignore-errors --overwrite
+```
+
+---
+
+## 10. Key Learnings
+
+1. **LLM translators beat offline models**: MiniMax produces much more natural, context-aware translations for manga than NLLB/M2M100
+2. **batch-size > 1 is risky with LLMs**: large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
+3. **UTF-8 is mandatory on Windows**: you must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1`, otherwise printing CJK characters to the console crashes the run with encoding errors
+4. **Vertical bubbles are a fundamental limitation**: Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a rendering issue, not a translation issue
+5. **`--ignore-errors` is essential**: some pages will always fail (long text, API limits, OCR errors), and skipping them is better than crashing
+6. **AOT inpainter is faster on CPU**: but `lama_large` produces better quality; the trade-off depends on the use case
+7. **`manga2eng` renderer is better than `default`**: it handles text sizing and positioning more intelligently
diff --git a/OPTIMIZACIONES.md b/OPTIMIZACIONES.md
new file mode 100644
index 0000000..23dc962
--- /dev/null
+++ b/OPTIMIZACIONES.md
@@ -0,0 +1,59 @@
+# manga-image-translator optimizations
+
+## OPTION A: Config only (no code changes)
+
+### Flags to speed up CPU runs
+
+| Flag | Default | Recommended | Impact | Risk |
+|------|---------|-------------|--------|------|
+| `--detection-size` | 2048 | **1024** | ~2-4x faster detection | May miss very small bubbles |
+| `--inpainting-size` | 2048 | **1024** | ~2-4x faster inpainting | Lower erasure quality |
+| `--inpainter` | lama_large | **default (AOT)** | ~3-5x faster inpainting | AOT erases slightly worse |
+| `--ocr` | 48px | **32px** | ~20-30% faster OCR | May fail on small text |
+| `--batch-size` | 1 | **20-50** | Cuts API calls ~20x | Very long texts may fail (error 2013) |
+| `--batch-concurrent` | off | **on** | Overlaps network + CPU | Partial improvement |
+| `--skip-no-text` | off | **on** | Saves I/O on pages without text | Only saves writes, not processing |
+
+### Optimal command
+```bash
+python -m manga_translator local \
+  -i "input" -o "output" \
+  --config-file translate_config.json \
+  --detection-size 1024 \
+  --inpainting-size 1024 \
+  --inpainter default \
+  --ocr 32px \
+  --batch-size 30 \
+  --batch-concurrent \
+  --skip-no-text \
+  --ignore-errors --overwrite
+```
+
+### Estimate: 1.5h → ~20-30 min
+
+---
+
+## OPTION B: Code changes (requires modifying the repo)
+
+### B1. Fix `_concurrent_translate_contexts` (chatgpt.py)
+- Use `asyncio.gather` instead of a sequential loop
+- Impact: ~30-40% faster translation phase
+- Risk: Low
+
+### B2. True pipelining with ProcessPoolExecutor
+- multiprocessing.Pool for parallel detection + OCR
+- Bypasses Python's GIL
+- Impact: ~50-60% faster (biggest gain)
+- Risk: Requires refactoring the models' global state
+
+### B3. Increase `_MAX_TOKENS` (chatgpt.py)
+- Default: 4096 → 8192 if MiniMax supports it
+- Impact: Minor; relevant with a high batch-size
+
+### Estimate: 1.5h → ~15-25 min
+
+---
+
+## OPTION C: Hybrid (recommended)
+- Option A config + B1 (minor code change)
+- Estimate: ~15-25 min, ~50-80 API calls
diff --git a/example-translated/burbujascombinadas.webp b/example-translated/burbujascombinadas.webp
new file mode 100644
index 0000000..332e0f6
Binary files /dev/null and b/example-translated/burbujascombinadas.webp differ
diff --git a/example-translated/chinese_sfw.webp b/example-translated/chinese_sfw.webp
new file mode 100644
index 0000000..121b18d
Binary files /dev/null and b/example-translated/chinese_sfw.webp differ
diff --git a/example-translated/coreano.jpg b/example-translated/coreano.jpg
new file mode 100644
index 0000000..3785ddb
Binary files /dev/null and b/example-translated/coreano.jpg differ
diff --git a/example-translated/english.jpg b/example-translated/english.jpg
new file mode 100644
index 0000000..b012242
Binary files /dev/null and b/example-translated/english.jpg differ
diff --git a/example-translated/japanese.jpg b/example-translated/japanese.jpg
new file mode 100644
index 0000000..a05a294
Binary files /dev/null and b/example-translated/japanese.jpg differ
diff --git a/example-translated/japanese_m2m100.jpg b/example-translated/japanese_m2m100.jpg
new file mode 100644
index 0000000..0ad77e1
Binary files /dev/null and b/example-translated/japanese_m2m100.jpg differ
diff --git a/example/burbujascombinadas.webp b/example/burbujascombinadas.webp
new file mode 100644
index 0000000..dad71f5
Binary files /dev/null and b/example/burbujascombinadas.webp differ
diff --git a/example/chinese_sfw.webp b/example/chinese_sfw.webp
new file mode 100644
index 0000000..c2210da
Binary files /dev/null and b/example/chinese_sfw.webp differ
diff --git a/example/coreano.jpg b/example/coreano.jpg
new file mode 100644
index 0000000..8c34622
Binary files /dev/null and b/example/coreano.jpg differ
diff --git a/example/english.jpg b/example/english.jpg
new file mode 100644
index 0000000..b69e033
Binary files /dev/null and b/example/english.jpg differ
diff --git a/example/japanese.jpg b/example/japanese.jpg
new file mode 100644
index 0000000..52b8a6b
Binary files /dev/null and b/example/japanese.jpg differ
diff --git a/manga-image-translator b/manga-image-translator
new file mode 160000
index 0000000..d5a3eee
--- /dev/null
+++ b/manga-image-translator
@@ -0,0 +1 @@
+Subproject commit d5a3eee4a7b7b7754b71baa2ee82309dfff468bc
diff --git a/translate_config.json b/translate_config.json
new file mode 100644
index 0000000..eed0e74
--- /dev/null
+++ b/translate_config.json
@@ -0,0 +1,23 @@
+{
+  "translator": {
+    "translator": "chatgpt",
+    "target_lang": "ESP"
+  },
+  "render": {
+    "renderer": "manga2eng",
+    "font_size_offset": -10,
+    "font_size_minimum": 8,
+    "no_hyphenation": true,
+    "alignment": "center"
+  },
+  "detector": {
+    "detection_size": 1024
+  },
+  "inpainter": {
+    "inpainter": "default",
+    "inpainting_size": 1024
+  },
+  "ocr": {
+    "ocr": "32px"
+  }
+}
\ No newline at end of file
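Option B1 in both LLM_READY.md and OPTIMIZACIONES.md proposes changing `_concurrent_translate_contexts` in `chatgpt.py` so that it uses `asyncio.gather` instead of a sequential loop. This commit does not include the body of that method, so what follows is only a standalone sketch of the pattern, built on several assumptions. `translate_chunk`, `translate_sequential`, `translate_concurrent` and `max_in_flight` are hypothetical names, not the tool's API. The sleep stands in for one MiniMax request. The concurrency cap of 4 is an assumed value, not a documented MiniMax limit.

```python
import asyncio
import time


# Hypothetical stand-in for one OpenAI-compatible (MiniMax) request:
# it sleeps to simulate network latency and tags the text instead of translating it.
async def translate_chunk(text: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)
    return f"[ESP] {text}"


async def translate_sequential(chunks: list[str]) -> list[str]:
    # Pattern B1 wants to replace: requests are awaited one after another.
    results = []
    for chunk in chunks:
        results.append(await translate_chunk(chunk))
    return results


async def translate_concurrent(chunks: list[str], max_in_flight: int = 4) -> list[str]:
    # Proposed pattern: start all requests and await them together with asyncio.gather.
    # The semaphore caps simultaneous API calls (assumed limit; tune it for MiniMax).
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(chunk: str) -> str:
        async with semaphore:
            return await translate_chunk(chunk)

    # gather returns results in argument order, so each translation
    # stays aligned with the text region it came from.
    return await asyncio.gather(*(bounded(c) for c in chunks))


async def main() -> tuple[list[str], list[str], float, float]:
    chunks = [f"page text {i}" for i in range(8)]

    start = time.perf_counter()
    seq = await translate_sequential(chunks)
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    conc = await translate_concurrent(chunks)
    conc_time = time.perf_counter() - start

    return seq, conc, seq_time, conc_time


if __name__ == "__main__":
    seq, conc, seq_time, conc_time = asyncio.run(main())
    print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

With 8 chunks at a simulated 0.1 s each, the sequential version takes roughly 0.8 s and the bounded concurrent version roughly 0.2 s. Real savings depend on MiniMax latency and on how many contexts a run produces, so this toy timing does not confirm B1's ~30-40% estimate. Unlike a larger `--batch-size`, concurrency leaves each prompt the same size, so it should not by itself cause the "prompt too long" error 2013. Rate limits under concurrent load remain untested.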