# LLM_READY.md - manga-image-translator Context Guide

> **Purpose:** Complete context document for any AI working with this project.
> **Last updated:** 2026-05-28
> **Working directory:** `C:\Users\Administrator\Documents\fansub2`

---

## 1. Project Overview

**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:

1. **Detects** text regions in images (bounding boxes)
2. **OCRs** the text (reads what it says)
3. **Translates** the text to a target language
4. **Inpaints** (erases) the original text
5. **Renders** the translated text in the same position

### Repository

- **Location:** `manga-image-translator/` (git child repo)
- **Language:** Python (experimental version)
- **GPU:** Not available; CPU + RAM only (VPS deployment planned)
- **Python venv:** `manga-image-translator/venv/`

---

## 2. Translation Pipeline

```
Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
```

### Detection

- **Model:** DBNet_resnet34 (default detector)
- **Default resolution:** 2048px (configurable via `detection_size`)
- **Impact on speed:** Reducing to 1024 is ~2-4x faster

### OCR

- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
- **Default:** `48px` (ConvNext backbone, more accurate)
- **32px:** ResNet backbone, ~20-30% faster but less accurate
- **Impact:** 48px recommended for reliability; 32px for speed

### Translation (the critical part)

- **LLM translator:** `chatgpt` (OpenAI-compatible API)
- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate)
- **Target language:** `ESP` (Spanish)

### Inpainting

- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
- **Default:** `lama_large`, which uses FFT (FourierUnit) and is expensive on CPU
- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
- **`none`:** Skip inpainting entirely

### Rendering

- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`

---

## 3. Working Configuration (PROVEN)

This configuration was tested on 275 pages and completed successfully in ~1.5 hours.

### Config file: `translate_config.json`

```json
{
  "translator": {
    "translator": "chatgpt",
    "target_lang": "ESP"
  },
  "render": {
    "renderer": "manga2eng",
    "font_size_offset": -10,
    "font_size_minimum": 8,
    "no_hyphenation": true,
    "alignment": "center"
  }
}
```

### Environment file: `manga-image-translator/.env`

```
OPENAI_API_KEY=
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
```

### Proven command

```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors `
  --overwrite
```

### Performance (275 pages)

| Metric | Value |
|--------|-------|
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |

---

## 4. CLI Flags Reference

### General

| Flag | Description |
|------|-------------|
| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| `--overwrite` | Overwrite existing translated files |
| `--skip-no-text` | Don't save images with no detected text |
| `-v` | Verbose output (saves intermediate images to `result/`) |

### Batch processing

| Flag | Description |
|------|-------------|
| `--batch-size N` | Process N images per batch (default: 1) |
| `--batch-concurrent` | Use concurrent mode for batch translation |

### GPU (not available on VPS)

| Flag | Description |
|------|-------------|
| `--use-gpu` | Use CUDA/MPS for all models |
| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |

### Config file options (in `translate_config.json`)

| Key | Values | Default | Recommended |
|-----|--------|---------|-------------|
| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
| `render.font_size_offset` | integer | 0 | -10 |
| `render.font_size_minimum` | integer | -1 | 8 |
| `render.no_hyphenation` | boolean | false | true |
| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
| `detector.detection_size` | integer | 2048 | 1024 (faster) |
| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |

---

## 5. Valid Language Codes (target_lang)

From `manga_translator/translators/common.py`:

| Code | Language |
|------|----------|
| `CHS` | Chinese (Simplified) |
| `CHT` | Chinese (Traditional) |
| `ENG` | English |
| `JPN` | Japanese |
| `KOR` | Korean |
| `ESP` | Spanish |
| `FRA` | French |
| `DEU` | German |
| `ITA` | Italian |
| `PTB` | Portuguese (Brazil) |
| `RUS` | Russian |
| `ARA` | Arabic |
| `THA` | Thai |
| `VIN` | Vietnamese |
| ... | (25+ languages total) |

---

## 6. Optimization Options (Tested Results)

### OPTION A (TESTED): Config only, no code changes

| Flag | Default | Tested | Impact |
|------|---------|--------|--------|
| `--detection-size 1024` | 2048 | Not tested yet | ~2-4x faster detection |
| `--inpainting-size 1024` | 2048 | Not tested yet | ~2-4x faster inpainting |
| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
| `--batch-size 10-30` | 1 | Tested (30) | **FAILED**: error 2013 with MiniMax |
| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |

**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**

- **1.9 hours** (SLOWER than 1.5h without flags)
- **266/275 pages** (9 failed)
- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)

### OPTION B (not implemented yet): Code changes

| Change | Expected Impact | Complexity |
|--------|----------------|------------|
| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |

---

## 7. Known Issues

### MiniMax API Errors

- **Error 400:** `bad_request_error (2013)`, meaning the prompt is too long or contains problematic content
- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
- **Workaround:** `--ignore-errors` skips failed pages

### Post-Translation Check Failures

- The tool checks whether the translated text is actually in the target language
- Sometimes valid Spanish translations fail the check (false negatives)
- This causes unnecessary retries and can revert translations to the original text
- **Workaround:** Already handled by `--ignore-errors`

### Vertical Bubble Problem

- Japanese manga uses vertical speech bubbles (narrow, tall)
- Spanish text is horizontal and longer than Japanese
- Text overflows or doesn't fit in narrow vertical bubbles
- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
- **Known limitation:** Some vertical bubbles will always overflow

### OCR Accuracy

- OCR sometimes misreads characters (especially in damaged/low-quality scans)
- OCR errors propagate to translation (garbage in → garbage out)
- The `48px` model is more accurate than `32px`

---

## 8. File Structure

```
fansub2/
├── example/                      # Source images
│   ├── nhentai_652854/           # 275-page gallery (Chinese manga)
│   ├── japanese.jpg              # Single Japanese page test
│   ├── english.jpg               # Single English page test
│   ├── chinese_sfw.webp          # Single Chinese page test
│   ├── coreano.jpg               # Single Korean page test
│   └── burbujascombinadas.webp   # Single English page test
├── example-translated/           # Output translated images
│   ├── nhentai_652854/           # 275 pages (1.5h, batch-size 1)
│   ├── nhentai_652854_test/      # 275 pages (1.9h, batch-size 30 - SLOWER)
│   ├── japanese.jpg              # Latest: font_size_offset -10
│   └── ... (other translated files)
├── translate_config.json         # Active translation config
├── OPTIMIZACIONES.md             # Optimization notes
└── manga-image-translator/       # The tool
    ├── .env                      # API keys (DO NOT COMMIT)
    ├── venv/                     # Python virtual environment
    ├── manga_translator/         # Source code
    │   ├── translators/          # Translation backends
    │   │   ├── chatgpt.py        # OpenAI-compatible (MiniMax)
    │   │   ├── common.py         # Language codes, base classes
    │   │   ├── nllb.py           # Facebook NLLB-200 (offline)
    │   │   ├── m2m100.py         # Facebook M2M-100 (offline)
    │   │   └── keys.py           # API key env vars
    │   ├── rendering/            # Text rendering
    │   │   ├── __init__.py       # Main render pipeline
    │   │   └── text_render.py    # Font/text rendering
    │   ├── detection/            # Text detection (DBNet)
    │   ├── ocr/                  # OCR models
    │   ├── inpainting/           # Text erasure models
    │   ├── manga_translator.py   # Main orchestrator
    │   ├── config.py             # Config schema
    │   └── mode/local.py         # Local batch mode
    ├── result/                   # Debug output (with -v flag)
    └── README.md                 # Official documentation
```

---

## 9. Running the Tool

### Single image

```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\japanese.jpg" `
  -o "example-translated" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite
```

### Full gallery

```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite
```

---

## 10. Key Learnings

1. **LLM translators beat offline models:** MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
2. **batch-size > 1 is risky with LLMs:** Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
3. **UTF-8 is mandatory on Windows:** Must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1` or CJK characters crash the console
4. **Vertical bubbles are a fundamental limitation:** Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
5. **`--ignore-errors` is essential:** Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
6. **AOT inpainter is faster on CPU:** But `lama_large` produces better quality; the trade-off depends on the use case
7. **`manga2eng` renderer is better than `default`:** Handles text sizing and positioning more intelligently
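
The proven command from section 3 and the UTF-8 requirement from key learning 3 can be wrapped in a small Python launcher, which may be handy once the job moves off PowerShell to the planned VPS. This is a minimal sketch, not part of the project: the helper names (`render_config`, `utf8_env`, `build_command`, `run_translation`) are hypothetical, and only the flags, environment variables, and config values are taken from this guide.

```python
import json
import os
import subprocess
from pathlib import Path

# Values copied from the proven translate_config.json in section 3.
PROVEN_CONFIG = {
    "translator": {"translator": "chatgpt", "target_lang": "ESP"},
    "render": {
        "renderer": "manga2eng",
        "font_size_offset": -10,
        "font_size_minimum": 8,
        "no_hyphenation": True,
        "alignment": "center",
    },
}


def render_config():
    """Return the proven settings serialized as a JSON string."""
    return json.dumps(PROVEN_CONFIG, indent=2)


def utf8_env(base=None):
    """Copy an environment mapping and force UTF-8 I/O (key learning 3)."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def build_command(python_exe, input_path, output_path, config_path):
    """Build the argument list for the proven `local` mode command."""
    return [
        str(python_exe),
        "-m", "manga_translator", "local",
        "-i", str(input_path),
        "-o", str(output_path),
        "--config-file", str(config_path),
        "--ignore-errors",  # skip failed pages instead of crashing
        "--overwrite",
    ]


def run_translation(python_exe, input_path, output_path, config_path):
    """Write the config file, launch the translator, return its exit code."""
    Path(config_path).write_text(render_config(), encoding="utf-8")
    cmd = build_command(python_exe, input_path, output_path, config_path)
    return subprocess.run(cmd, env=utf8_env(), check=False).returncode
```

Calling `run_translation(r"manga-image-translator\venv\Scripts\python.exe", r"example\japanese.jpg", "example-translated", "translate_config.json")` from the working directory would mirror the single-image command in section 9. The sketch deliberately omits `--batch-size`, so the tool stays on its default of 1, which key learning 2 identifies as the safest setting with MiniMax.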