LLM_READY.md - manga-image-translator Context Guide

Purpose: Complete context document for any AI working with this project.
Last updated: 2026-05-28
Working directory: C:\Users\Administrator\Documents\fansub2


1. Project Overview

manga-image-translator is a tool that automatically translates text in manga/comic images. It:

  1. Detects text regions in images (bounding boxes)
  2. OCRs the text (reads what it says)
  3. Translates the text to a target language
  4. Inpaints (erases) the original text
  5. Renders the translated text in the same position

Repository

  • Location: manga-image-translator/ (git child repo)
  • Language: Python (experimental version)
  • GPU: Not available — CPU + RAM only (VPS deployment planned)
  • Python venv: manga-image-translator/venv/

2. Translation Pipeline

Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
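
The stage order above can be pictured as a chain of functions applied to one page. This is a minimal sketch with placeholder stage callables; `run_pipeline`, `passthrough`, and the context dictionary are illustrative names and are not the project's actual API.

```python
from typing import Any, Callable, Dict, List

# Stage names follow the pipeline line above; the callables are placeholders.
STAGE_ORDER: List[str] = [
    "detection",
    "ocr",
    "translation",
    "mask_refinement",
    "inpainting",
    "rendering",
]

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(page: Dict[str, Any], stages: Dict[str, Stage]) -> Dict[str, Any]:
    """Apply every stage in STAGE_ORDER to a page context and return the result."""
    ctx = dict(page)  # shallow copy so the caller's dict is not modified
    for name in STAGE_ORDER:
        ctx = stages[name](ctx)
        ctx.setdefault("trace", []).append(name)  # record which stages ran
    return ctx


def passthrough(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical no-op standing in for a real model call."""
    return ctx


demo = run_pipeline({"image": "page_001.png"}, {n: passthrough for n in STAGE_ORDER})
print(demo["trace"])
```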

Detection

  • Model: DBNet_resnet34 (default detector)
  • Default resolution: 2048px (configurable via detection_size)
  • Impact on speed: Reducing to 1024 is estimated to make detection ~2-4x faster (listed as not tested yet in section 6)

OCR

  • Available models: 32px, 48px (default), 48px_ctc, mocr
  • Default: 48px (ConvNext backbone, more accurate)
  • 32px: ResNet backbone, ~20-30% faster but less accurate
  • Impact: 48px recommended for reliability; 32px for speed

Translation (the critical part)

  • LLM translator: chatgpt (OpenAI-compatible API)
  • Offline translators: nllb, m2m100, mbart50, qwen2 (all slower and less accurate)
  • Target language: ESP (Spanish)

Inpainting

  • Available: lama_large, lama_mpe, default (AOT), sd, none
  • Default setting: lama_large, which uses FFT (FourierUnit) and is expensive on CPU
  • default (AOT): the inpainter named "default" (not the default setting); lightweight convolutions, ~3-5x faster on CPU
  • none: Skip inpainting entirely

Rendering

  • Available: default, manga2eng (recommended), manga2eng_pillow, none
  • Key settings: font_size_offset, font_size_minimum, no_hyphenation, alignment

3. Working Configuration (PROVEN)

This configuration was tested on 275 pages and completed successfully in ~1.5 hours.

Config file: translate_config.json

{
  "translator": {
    "translator": "chatgpt",
    "target_lang": "ESP"
  },
  "render": {
    "renderer": "manga2eng",
    "font_size_offset": -10,
    "font_size_minimum": 8,
    "no_hyphenation": true,
    "alignment": "center"
  }
}
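
If the config needs to be regenerated from a script, the same JSON can be produced in Python. This is a sketch that mirrors the values shown above; `render_config_json` and `write_config` are illustrative helper names, not part of the tool.

```python
import json
from pathlib import Path

# Values copied from translate_config.json as shown above.
CONFIG = {
    "translator": {"translator": "chatgpt", "target_lang": "ESP"},
    "render": {
        "renderer": "manga2eng",
        "font_size_offset": -10,
        "font_size_minimum": 8,
        "no_hyphenation": True,
        "alignment": "center",
    },
}


def render_config_json(config: dict = CONFIG) -> str:
    """Return the config as a JSON string with 2-space indentation."""
    return json.dumps(config, indent=2) + "\n"


def write_config(path: str, config: dict = CONFIG) -> Path:
    """Write the config to disk as UTF-8 and return the path."""
    target = Path(path)
    target.write_text(render_config_json(config), encoding="utf-8")
    return target
```

Calling `write_config("translate_config.json")` from the working directory would overwrite the active config, so it is worth keeping a backup first.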

Environment file: manga-image-translator/.env

OPENAI_API_KEY=<your-api-key>
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=<your-api-key>
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
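
Before a long batch run it can help to confirm that the .env file has real values rather than placeholders. This is a minimal sketch of a checker; the list of required keys simply mirrors the file above (whether the tool reads all six is not confirmed here), and `parse_env` and `missing_keys` are illustrative names.

```python
from typing import Dict, Iterable, List

# Keys copied from the .env listing above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_MODEL",
    "CUSTOM_OPENAI_API_KEY",
    "CUSTOM_OPENAI_API_BASE",
    "CUSTOM_OPENAI_MODEL",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return keys that are absent, empty, or still a <placeholder>."""
    return [k for k in required if not env.get(k) or env[k].startswith("<")]
```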

Proven command

$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors `
  --overwrite
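
For scripted runs, the same invocation can be assembled in Python. This is a sketch that assumes it is launched from the fansub2 working directory; `build_command`, `build_env`, and `run_translation` are illustrative helper names, not part of the tool.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Interpreter path as used in the PowerShell command above.
PYTHON_EXE = r"manga-image-translator\venv\Scripts\python.exe"


def build_command(input_path: str, output_path: str,
                  config_file: str = "translate_config.json") -> List[str]:
    """Build the argument list equivalent to the proven command."""
    return [
        PYTHON_EXE, "-m", "manga_translator", "local",
        "-i", input_path,
        "-o", output_path,
        "--config-file", config_file,
        "--ignore-errors",
        "--overwrite",
    ]


def build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and force UTF-8 I/O, as the PowerShell lines do."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run_translation(input_path: str, output_path: str) -> int:
    """Launch the translator and return its exit code."""
    completed = subprocess.run(build_command(input_path, output_path), env=build_env())
    return completed.returncode
```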

Performance (275 pages)

| Metric | Value |
| --- | --- |
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |
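
These totals translate into per-page figures that are easier to compare across runs. The arithmetic below uses only the numbers from the table, which are themselves approximate, so the results are rough averages.

```python
# Figures from the 275-page run reported above (time and call count are approximate).
pages = 275
hours = 1.5
api_calls = 700

seconds_per_page = hours * 3600 / pages  # average wall-clock time per page
calls_per_page = api_calls / pages       # average translation requests per page

print(f"{seconds_per_page:.1f} s/page, {calls_per_page:.2f} API calls/page")
```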

4. CLI Flags Reference

General

| Flag | Description |
| --- | --- |
| --ignore-errors | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| --overwrite | Overwrite existing translated files |
| --skip-no-text | Don't save images with no detected text |
| -v | Verbose output (saves intermediate images to result/) |

Batch processing

| Flag | Description |
| --- | --- |
| --batch-size N | Process N images per batch (default: 1) |
| --batch-concurrent | Use concurrent mode for batch translation |

GPU (not available on VPS)

| Flag | Description |
| --- | --- |
| --use-gpu | Use CUDA/MPS for all models |
| --use-gpu-limited | Use GPU for detection/OCR but CPU for offline translators |

Config file options (in translate_config.json)

| Key | Values | Default | Recommended |
| --- | --- | --- | --- |
| translator.translator | chatgpt, nllb, m2m100, sugoi, etc. | sugoi | chatgpt |
| translator.target_lang | ESP, ENG, JPN, etc. | ENG | ESP |
| translator.translator_chain | e.g. "nllb:ENG;nllb:ESP" | null | null |
| render.renderer | default, manga2eng, manga2eng_pillow, none | default | manga2eng |
| render.font_size_offset | integer | 0 | -10 |
| render.font_size_minimum | integer | -1 | 8 |
| render.no_hyphenation | boolean | false | true |
| render.alignment | auto, left, center, right | auto | center |
| detector.detection_size | integer | 2048 | 1024 (faster) |
| inpainter.inpainter | default, lama_large, lama_mpe, sd, none | lama_large | lama_large |
| inpainter.inpainting_size | integer | 2048 | 1024 (faster) |
| ocr.ocr | 32px, 48px, 48px_ctc, mocr | 48px | 48px |
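
The dotted keys in this table map onto nested sections of the JSON config. The sketch below applies the two "faster" size values to a copy of a config; `apply_overrides` and `SPEED_OVERRIDES` are illustrative names, and the section layout is assumed from the table rather than checked against the tool's schema.

```python
import copy
from typing import Any, Dict

# The two size overrides marked "1024 (faster)" in the table above.
SPEED_OVERRIDES: Dict[str, Any] = {
    "detector.detection_size": 1024,
    "inpainter.inpainting_size": 1024,
}


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of config with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted, value in overrides.items():
        *parents, leaf = dotted.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections on the way down
        node[leaf] = value
    return result


base = {"translator": {"translator": "chatgpt", "target_lang": "ESP"}}
fast = apply_overrides(base, SPEED_OVERRIDES)
print(fast)
```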

5. Valid Language Codes (target_lang)

From manga_translator/translators/common.py:

| Code | Language |
| --- | --- |
| CHS | Chinese (Simplified) |
| CHT | Chinese (Traditional) |
| ENG | English |
| JPN | Japanese |
| KOR | Korean |
| ESP | Spanish |
| FRA | French |
| DEU | German |
| ITA | Italian |
| PTB | Portuguese (Brazil) |
| RUS | Russian |
| ARA | Arabic |
| THA | Thai |
| VIN | Vietnamese |

... (25+ languages total)
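
A quick lookup can catch typos in target_lang before a run starts. This is a sketch covering only the codes listed above; since the tool supports more than these, a miss means "not in this excerpt", not "invalid". `describe_lang` is an illustrative name.

```python
from typing import Optional

# Only the codes listed in the table above; the tool defines more.
KNOWN_CODES = {
    "CHS": "Chinese (Simplified)",
    "CHT": "Chinese (Traditional)",
    "ENG": "English",
    "JPN": "Japanese",
    "KOR": "Korean",
    "ESP": "Spanish",
    "FRA": "French",
    "DEU": "German",
    "ITA": "Italian",
    "PTB": "Portuguese (Brazil)",
    "RUS": "Russian",
    "ARA": "Arabic",
    "THA": "Thai",
    "VIN": "Vietnamese",
}


def describe_lang(code: str) -> Optional[str]:
    """Return the language name for a listed code, or None if it is not listed."""
    return KNOWN_CODES.get(code.strip().upper())
```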

6. Optimization Options (Tested Results)

OPTION A: Config only (no code changes), TESTED

| Flag | Default | Tested | Impact |
| --- | --- | --- | --- |
| --detection-size 1024 | 2048 | Not tested yet | ~2-4x faster detection |
| --inpainting-size 1024 | 2048 | Not tested yet | ~2-4x faster inpainting |
| --inpainter default (AOT) | lama_large | Tested | ~3-5x faster inpainting |
| --ocr 32px | 48px | Tested | ~20-30% faster OCR |
| --batch-size 10-30 | 1 | Tested (30) | FAILED: error 2013 with MiniMax |
| --batch-concurrent | off | Tested | Added overhead, no benefit |
| --skip-no-text | off | Tested | Saves I/O, minor benefit |

Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):

  • 1.9 hours (SLOWER than 1.5h without flags)
  • 266/275 pages (9 failed)
  • Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
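
Putting the two runs side by side makes the regression concrete. The calculation below uses only the reported hours and page counts; the run labels are illustrative.

```python
# Reported results for the two 275-page runs.
baseline = {"hours": 1.5, "done": 275, "total": 275}  # batch-size 1, default settings
tuned = {"hours": 1.9, "done": 266, "total": 275}     # speed flags plus batch-size 30


def success_rate(run: dict) -> float:
    """Fraction of pages that completed."""
    return run["done"] / run["total"]


slowdown = tuned["hours"] / baseline["hours"]  # relative wall-clock time
failed_pages = tuned["total"] - tuned["done"]

print(f"tuned run: {success_rate(tuned):.1%} success, "
      f"{slowdown:.2f}x the baseline time, {failed_pages} pages failed")
```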

OPTION B: Code changes (not implemented yet)

| Change | Expected Impact | Complexity |
| --- | --- | --- |
| Fix _concurrent_translate_contexts to use asyncio.gather | ~30-40% faster | Low |
| Add ProcessPoolExecutor for detection/OCR | ~50-60% faster | High |
| Increase _MAX_TOKENS from 4096 to 8192 | Minor | Low |
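
The first row refers to the general asyncio.gather pattern for running several translation requests at once. The sketch below shows that pattern only. It is not the project's `_concurrent_translate_contexts` code, whose signature is not shown here; `translate_all`, `fake_translate`, and the concurrency limit are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def translate_all(texts: List[str],
                        translate_one: Callable[[str], Awaitable[str]],
                        max_concurrency: int = 4) -> List[str]:
    """Translate texts concurrently, capping the number of in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> str:
        async with semaphore:  # limit parallel calls to stay under API rate limits
            return await translate_one(text)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(guarded(t) for t in texts)))


async def fake_translate(text: str) -> str:
    """Stand-in for a real API call: waits briefly, then uppercases."""
    await asyncio.sleep(0.01)
    return text.upper()


results = asyncio.run(translate_all(["hola", "mundo"], fake_translate))
print(results)
```

The semaphore is a design choice for this sketch: unbounded concurrency against a hosted API tends to trade one failure mode (slow runs) for another (rate-limit errors).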

7. Known Issues

MiniMax API Errors

  • Error 400: bad_request_error (2013) — prompt too long or contains problematic content
  • Frequency: Occurs with long Chinese text blocks, especially in batch mode
  • Workaround: --ignore-errors skips failed pages
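
One way to reduce "prompt too long" failures is to group text blocks under a size budget before sending them. This is a sketch of that idea only, not the tool's batching logic. The character budget is a hypothetical stand-in for the real token limit, which is not stated here, and it would not help with errors caused by content rather than length.

```python
from typing import List


def chunk_by_budget(texts: List[str], max_chars: int) -> List[List[str]]:
    """Group texts into batches whose combined length stays within max_chars.

    A single text longer than max_chars is placed in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        size = len(text)
        if current and used + size > max_chars:
            batches.append(current)  # close the batch before it overflows
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        batches.append(current)
    return batches
```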

Post-Translation Check Failures

  • The tool checks if translated text is actually in the target language
  • Sometimes valid Spanish translations fail the check (false negatives)
  • This causes unnecessary retries and can revert translations to original text
  • Workaround: Already handled by --ignore-errors

Vertical Bubble Problem

  • Japanese manga uses vertical speech bubbles (narrow, tall)
  • Spanish text is horizontal and longer than Japanese
  • Text overflows or doesn't fit in narrow vertical bubbles
  • Mitigation: font_size_offset: -10 reduces font size to fit better
  • Known limitation: Some vertical bubbles will always overflow
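
To see how font_size_offset and font_size_minimum interact, a simple model is to add the offset to the detected size and clamp at the minimum. This is an assumption about the renderer's behavior, not something confirmed against its source, and `effective_font_size` is a hypothetical helper.

```python
def effective_font_size(detected_size: int, offset: int = -10, minimum: int = 8) -> int:
    """Assumed model: shift the detected size by the offset, never below the minimum."""
    return max(detected_size + offset, minimum)


# With the config values from section 3 (offset -10, minimum 8):
for size in (30, 18, 12):
    print(size, "->", effective_font_size(size))
```

Under this model, text detected at 18 px or smaller already sits at the 8 px floor, so a larger negative offset would not shrink it any further.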

OCR Accuracy

  • OCR sometimes misreads characters (especially damaged/low-quality scans)
  • OCR errors propagate to translation (garbage in → garbage out)
  • 48px model is more accurate than 32px

8. File Structure

fansub2/
├── example/                          # Source images
│   ├── nhentai_652854/              # 275-page gallery (Chinese manga)
│   ├── japanese.jpg                 # Single Japanese page test
│   ├── english.jpg                  # Single English page test
│   ├── chinese_sfw.webp             # Single Chinese page test
│   ├── coreano.jpg                  # Single Korean page test
│   └── burbujascombinadas.webp      # Single English page test
├── example-translated/              # Output translated images
│   ├── nhentai_652854/             # 275 pages (1.5h, batch-size 1)
│   ├── nhentai_652854_test/        # 275 pages (1.9h, batch-size 30 - SLOWER)
│   ├── japanese.jpg                # Latest: font_size_offset -10
│   └── ... (other translated files)
├── translate_config.json            # Active translation config
├── OPTIMIZACIONES.md               # Optimization notes
└── manga-image-translator/          # The tool
    ├── .env                         # API keys (DO NOT COMMIT)
    ├── venv/                        # Python virtual environment
    ├── manga_translator/            # Source code
    │   ├── translators/             # Translation backends
    │   │   ├── chatgpt.py          # OpenAI-compatible (MiniMax)
    │   │   ├── common.py           # Language codes, base classes
    │   │   ├── nllb.py             # Facebook NLLB-200 (offline)
    │   │   ├── m2m100.py           # Facebook M2M-100 (offline)
    │   │   └── keys.py             # API key env vars
    │   ├── rendering/              # Text rendering
    │   │   ├── __init__.py         # Main render pipeline
    │   │   └── text_render.py      # Font/text rendering
    │   ├── detection/              # Text detection (DBNet)
    │   ├── ocr/                    # OCR models
    │   ├── inpainting/             # Text erasure models
    │   ├── manga_translator.py     # Main orchestrator
    │   ├── config.py               # Config schema
    │   └── mode/local.py           # Local batch mode
    ├── result/                      # Debug output (with -v flag)
    └── README.md                    # Official documentation

9. Running the Tool

Single image

$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\japanese.jpg" `
  -o "example-translated" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite

Full gallery (folder)

$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite

10. Key Learnings

  1. LLM translators beat offline models — MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
  2. batch-size > 1 is risky with LLMs — Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
  3. UTF-8 is mandatory on Windows — Must set PYTHONIOENCODING=utf-8 and PYTHONUTF8=1 or CJK characters crash the console
  4. Vertical bubbles are a fundamental limitation — Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
  5. --ignore-errors is essential — Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
  6. AOT inpainter is faster on CPU — But lama_large produces better quality; trade-off depends on use case
  7. manga2eng renderer is better than default — Handles text sizing and positioning more intelligently
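
Because --ignore-errors skips failed pages silently, it is useful to list which pages are missing after a run. This is a minimal sketch that assumes output files keep the same file stem as their source (the extension may differ); `missing_pages` and `list_missing` are illustrative names.

```python
from pathlib import Path
from typing import Iterable, List


def missing_pages(source_names: Iterable[str], output_names: Iterable[str]) -> List[str]:
    """Return source file names with no output file sharing the same stem."""
    done = {Path(name).stem for name in output_names}
    return sorted(name for name in source_names if Path(name).stem not in done)


def list_missing(input_dir: str, output_dir: str) -> List[str]:
    """Compare two folders on disk and report pages that were not translated."""
    src = [p.name for p in Path(input_dir).iterdir() if p.is_file()]
    out = [p.name for p in Path(output_dir).iterdir() if p.is_file()]
    return missing_pages(src, out)
```

The returned names could then be copied to a separate folder and re-run with batch-size 1.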