renato97 9231d96305 feat: initial commit - manga-image-translator setup with MiniMax LLM for Spanish translation

2026-05-28 20:51:35 -03:00

LLM_READY.md - manga-image-translator Context Guide

Purpose: Complete context document for any AI working with this project.
Last updated: 2026-05-28
Working directory: C:\Users\Administrator\Documents\fansub2

1. Project Overview

manga-image-translator is a tool that automatically translates text in manga/comic images. It:

Detects text regions in images (bounding boxes)
OCRs the text (reads what it says)
Translates the text to a target language
Inpaints (erases) the original text
Renders the translated text in the same position

Repository

Location: manga-image-translator/ (git child repo)
Language: Python (experimental version)
GPU: Not available — CPU + RAM only (VPS deployment planned)
Python venv: manga-image-translator/venv/

2. Translation Pipeline

Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output

Detection

Model: DBNet_resnet34 (default detector)
Default resolution: 2048px (configurable via detection_size)
Impact on speed: reducing to 1024 is estimated to make detection ~2-4x faster (see section 6)

OCR

Available models: 32px, 48px (default), 48px_ctc, mocr
Default: 48px (ConvNext backbone, more accurate)
32px: ResNet backbone, ~20-30% faster but less accurate
Impact: 48px recommended for reliability; 32px for speed

Translation (the critical part)

LLM translator: chatgpt (OpenAI-compatible API)
Offline translators: nllb, m2m100, mbart50, qwen2 (all slower and less accurate)
Target language: ESP (Spanish)

Inpainting

Available: lama_large, lama_mpe, default (AOT), sd, none
Tool default: lama_large, which uses FFT (FourierUnit) layers and is expensive on CPU
default (AOT): despite its name, not the tool's default inpainter; lightweight convolutions, ~3-5x faster on CPU
none: Skip inpainting entirely

Rendering

Available: default, manga2eng (recommended), manga2eng_pillow, none
Key settings: font_size_offset, font_size_minimum, no_hyphenation, alignment

3. Working Configuration (PROVEN)

This configuration was tested on 275 pages and completed successfully in ~1.5 hours.

Config file: `translate_config.json`

{
  "translator": {
    "translator": "chatgpt",
    "target_lang": "ESP"
  },
  "render": {
    "renderer": "manga2eng",
    "font_size_offset": -10,
    "font_size_minimum": 8,
    "no_hyphenation": true,
    "alignment": "center"
  }
}
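
A minimal sketch for generating this file from Python, so the JSON stays syntactically valid when values are edited. The helper name `write_config` is hypothetical; the keys and values are the ones shown above.

```python
import json
from pathlib import Path

# Values copied from the working configuration above.
WORKING_CONFIG = {
    "translator": {"translator": "chatgpt", "target_lang": "ESP"},
    "render": {
        "renderer": "manga2eng",
        "font_size_offset": -10,
        "font_size_minimum": 8,
        "no_hyphenation": True,
        "alignment": "center",
    },
}


def write_config(path, config=WORKING_CONFIG):
    """Write the config as UTF-8 JSON and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Preview the JSON without touching the disk.
print(json.dumps(WORKING_CONFIG, indent=2))
```

Calling `write_config("translate_config.json")` from the working directory would recreate the file used by the proven command.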

Environment file: `manga-image-translator/.env`

OPENAI_API_KEY=<your-api-key>
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=<your-api-key>
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
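
Before a long batch it can help to confirm the `.env` file has no leftover placeholders. The sketch below parses simple `KEY=VALUE` lines with the standard library only. Which of the two key sets the tool actually reads is not stated here, so `REQUIRED_KEYS` is an assumption, and `parse_env` / `missing_keys` are hypothetical helper names.

```python
# Assumption: these three keys are the ones that must be filled in.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL")


def parse_env(text):
    """Parse simple KEY=VALUE lines; skips blanks, # comments, and lines without '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a <placeholder>."""
    return [k for k in required if not values.get(k) or values[k].startswith("<")]


sample = (
    "OPENAI_API_KEY=<your-api-key>\n"
    "OPENAI_API_BASE=https://api.minimax.io/v1\n"
    "OPENAI_MODEL=MiniMax-M2.7\n"
)
print(missing_keys(parse_env(sample)))  # ['OPENAI_API_KEY']
```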

Proven command

$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors `
  --overwrite
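
The same invocation can be wrapped in Python for scripting. This is a sketch that assumes it is run from the working directory (`fansub2`); `build_command`, `build_env`, and `run` are hypothetical helper names, while the flags and paths are the ones from the command above.

```python
import os
import subprocess

# Path to the venv interpreter, relative to the working directory.
PYTHON_EXE = r"manga-image-translator\venv\Scripts\python.exe"


def build_command(input_path, output_path, config_file="translate_config.json"):
    """Return the argument list equivalent to the proven PowerShell command."""
    return [
        PYTHON_EXE, "-m", "manga_translator", "local",
        "-i", input_path,
        "-o", output_path,
        "--config-file", config_file,
        "--ignore-errors",
        "--overwrite",
    ]


def build_env(base=None):
    """Copy the environment and force UTF-8 I/O, as the PowerShell lines do."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run(input_path, output_path):
    """Run the translator and return its exit code (does not raise on failure)."""
    completed = subprocess.run(
        build_command(input_path, output_path), env=build_env(), check=False
    )
    return completed.returncode


print(build_command(r"example\nhentai_652854", r"example-translated\nhentai_652854"))
```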

Performance (275 pages)

Metric	Value
Pages processed	275/275
Time	~1.5 hours
API calls	~700
Translator	MiniMax-M2.7 via OpenAI-compatible API
Language	Japanese/Chinese → Spanish (ESP)
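
For planning other galleries, the figures above can be turned into per-page rates. The inputs are the approximate values from the table, so the results are estimates.

```python
pages = 275
hours = 1.5        # approximate wall-clock time from the table
api_calls = 700    # approximate

seconds_per_page = hours * 3600 / pages
calls_per_page = api_calls / pages

print(f"{seconds_per_page:.1f} s/page")      # 19.6 s/page
print(f"{calls_per_page:.2f} calls/page")    # 2.55 calls/page
```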

4. CLI Flags Reference

General

Flag	Description
`--ignore-errors`	Skip failed images instead of crashing (ESSENTIAL for batch jobs)
`--overwrite`	Overwrite existing translated files
`--skip-no-text`	Don't save images with no detected text
`-v`	Verbose output (saves intermediate images to `result/`)

Batch processing

Flag	Description
`--batch-size N`	Process N images per batch (default: 1)
`--batch-concurrent`	Use concurrent mode for batch translation

GPU (not available on VPS)

Flag	Description
`--use-gpu`	Use CUDA/MPS for all models
`--use-gpu-limited`	Use GPU for detection/OCR but CPU for offline translators

Config file options (in `translate_config.json`)

Key	Values	Default	Recommended
`translator.translator`	`chatgpt`, `nllb`, `m2m100`, `sugoi`, etc.	`sugoi`	`chatgpt`
`translator.target_lang`	`ESP`, `ENG`, `JPN`, etc.	`ENG`	`ESP`
`translator.translator_chain`	e.g. `"nllb:ENG;nllb:ESP"`	null	null
`render.renderer`	`default`, `manga2eng`, `manga2eng_pillow`, `none`	`default`	`manga2eng`
`render.font_size_offset`	integer	0	-10
`render.font_size_minimum`	integer	-1	8
`render.no_hyphenation`	boolean	false	true
`render.alignment`	`auto`, `left`, `center`, `right`	`auto`	`center`
`detector.detection_size`	integer	2048	1024 (faster)
`inpainter.inpainter`	`default`, `lama_large`, `lama_mpe`, `sd`, `none`	`lama_large`	`lama_large`
`inpainter.inpainting_size`	integer	2048	1024 (faster)
`ocr.ocr`	`32px`, `48px`, `48px_ctc`, `mocr`	`48px`	`48px`
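
The dotted keys in the table map to nested sections in `translate_config.json`, in the same way that `translator.translator` corresponds to the `translator` object in the working configuration. A small sketch of that mapping, using values from the Recommended column (`set_dotted` is a hypothetical helper):

```python
import json


def set_dotted(config, dotted_key, value):
    """Set 'section.option' inside a nested dict, creating the section if needed."""
    section, _, option = dotted_key.partition(".")
    config.setdefault(section, {})[option] = value
    return config


# Subset of the Recommended column above.
RECOMMENDED = {
    "translator.translator": "chatgpt",
    "translator.target_lang": "ESP",
    "render.renderer": "manga2eng",
    "detector.detection_size": 1024,
    "inpainter.inpainting_size": 1024,
    "ocr.ocr": "48px",
}

config = {}
for key, value in RECOMMENDED.items():
    set_dotted(config, key, value)

print(json.dumps(config, indent=2))
```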

5. Valid Language Codes (target_lang)

From manga_translator/translators/common.py:

Code	Language
`CHS`	Chinese (Simplified)
`CHT`	Chinese (Traditional)
`ENG`	English
`JPN`	Japanese
`KOR`	Korean
`ESP`	Spanish
`FRA`	French
`DEU`	German
`ITA`	Italian
`PTB`	Portuguese (Brazil)
`RUS`	Russian
`ARA`	Arabic
`THA`	Thai
`VIN`	Vietnamese
...	(25+ languages total)
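
A quick pre-flight check against the codes listed above. The set is only the partial list from this table, not the full contents of `common.py`, so a code missing here may still be valid in the tool; `is_listed_code` is a hypothetical helper.

```python
# Partial list: only the codes shown in the table above.
LISTED_CODES = {
    "CHS", "CHT", "ENG", "JPN", "KOR", "ESP", "FRA",
    "DEU", "ITA", "PTB", "RUS", "ARA", "THA", "VIN",
}


def is_listed_code(code):
    """Return True if the code (case-insensitive) appears in the partial list."""
    return code.strip().upper() in LISTED_CODES


print(is_listed_code("esp"))  # True
print(is_listed_code("SPA"))  # False: the table lists Spanish as ESP
```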

6. Optimization Options (Tested Results)

Option A: Config only, no code changes (TESTED)

Flag	Default	Tested	Impact
`--detection-size 1024`	2048	Not tested yet	~2-4x faster detection
`--inpainting-size 1024`	2048	Not tested yet	~2-4x faster inpainting
`--inpainter default` (AOT)	lama_large	Tested	~3-5x faster inpainting
`--ocr 32px`	48px	Tested	~20-30% faster OCR
`--batch-size 10-30`	1	Tested (30)	FAILED — error 2013 with MiniMax
`--batch-concurrent`	off	Tested	Added overhead, no benefit
`--skip-no-text`	off	Tested	Saves I/O, minor benefit

Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):

1.9 hours (SLOWER than 1.5h without flags)
266/275 pages (9 failed)
Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
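
The comparison with the baseline run, worked out from the numbers above:

```python
baseline_hours, tuned_hours = 1.5, 1.9
pages_total, pages_ok = 275, 266

slowdown_pct = (tuned_hours / baseline_hours - 1) * 100
success_pct = pages_ok / pages_total * 100

print(f"{slowdown_pct:.0f}% slower")            # 27% slower
print(f"{success_pct:.1f}% pages completed")    # 96.7% pages completed
```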

Option B: Code changes (not implemented yet)

Change	Expected Impact	Complexity
Fix `_concurrent_translate_contexts` to use `asyncio.gather`	~30-40% faster	Low
Add `ProcessPoolExecutor` for detection/OCR	~50-60% faster	High
Increase `_MAX_TOKENS` from 4096 to 8192	Minor	Low
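
To illustrate the first row, the sketch below contrasts awaiting calls one by one with `asyncio.gather`. It is not the tool's actual `_concurrent_translate_contexts` code, which is not reproduced in this document; `translate_one` is a stand-in that sleeps instead of calling the API, and the `limit` semaphore is an added assumption to avoid flooding the provider.

```python
import asyncio
import time


async def translate_one(text, delay=0.05):
    """Stand-in for one API call (hypothetical); sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"[ES] {text}"


async def translate_sequential(texts):
    """Await each call before starting the next one."""
    return [await translate_one(t) for t in texts]


async def translate_concurrent(texts, limit=4):
    """Start all calls together, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(text):
        async with semaphore:
            return await translate_one(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(t) for t in texts))


def timed(coro_fn, texts):
    """Run a coroutine function and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(texts))
    return list(results), time.perf_counter() - start


texts = ["a", "b", "c", "d"]
seq, t_seq = timed(translate_sequential, texts)
con, t_con = timed(translate_concurrent, texts)
print(seq == con, f"sequential={t_seq:.2f}s concurrent={t_con:.2f}s")
```

The gain in a real run depends on how much of the per-page time is spent waiting on the API rather than on CPU-bound detection, OCR, and inpainting.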

7. Known Issues

MiniMax API Errors

Error 400: bad_request_error (2013) — prompt too long or contains problematic content
Frequency: Occurs with long Chinese text blocks, especially in batch mode
Workaround: --ignore-errors skips failed pages
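
One way to picture why smaller batches avoid error 2013: if the text sent per request is capped, no single prompt grows unbounded. The sketch below groups text blocks under a character budget. It is an illustration only; the tool batches by image count (`--batch-size`), MiniMax's real limit is not known here, and `chunk_by_chars` and `max_chars` are hypothetical.

```python
def chunk_by_chars(texts, max_chars):
    """Group texts in order so each group's total length stays within max_chars.

    A single text longer than max_chars still gets a group of its own.
    """
    chunks, current, size = [], [], 0
    for text in texts:
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        chunks.append(current)
    return chunks


print(chunk_by_chars(["aaaa", "bb", "ccc", "d"], max_chars=5))
# [['aaaa'], ['bb', 'ccc'], ['d']]
```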

Post-Translation Check Failures

The tool checks if translated text is actually in the target language
Sometimes valid Spanish translations fail the check (false negatives)
This causes unnecessary retries and can revert translations to original text
Workaround: Already handled by --ignore-errors

Vertical Bubble Problem

Japanese manga uses vertical speech bubbles (narrow, tall)
Spanish text is horizontal and longer than Japanese
Text overflows or doesn't fit in narrow vertical bubbles
Mitigation: font_size_offset: -10 reduces font size to fit better
Known limitation: Some vertical bubbles will always overflow
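
A rough model of how the two render settings interact, assuming the offset is added to the detected size and the result is clamped at the minimum. The renderer's exact formula is not given in this document, so treat `effective_font_size` as a hypothetical approximation.

```python
def effective_font_size(detected_size, offset=-10, minimum=8):
    """Assumed model: apply the offset, then never go below the minimum."""
    return max(detected_size + offset, minimum)


print(effective_font_size(30))  # 20
print(effective_font_size(14))  # 8 (clamped by font_size_minimum)
```

Under this model, small source text hits the minimum quickly, which is consistent with some narrow bubbles still overflowing.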

OCR Accuracy

OCR sometimes misreads characters (especially damaged/low-quality scans)
OCR errors propagate to translation (garbage in → garbage out)
48px model is more accurate than 32px

8. File Structure

fansub2/
├── example/                          # Source images
│   ├── nhentai_652854/              # 275-page gallery (Chinese manga)
│   ├── japanese.jpg                 # Single Japanese page test
│   ├── english.jpg                  # Single English page test
│   ├── chinese_sfw.webp             # Single Chinese page test
│   ├── coreano.jpg                  # Single Korean page test
│   └── burbujascombinadas.webp      # Single English page test
├── example-translated/              # Output translated images
│   ├── nhentai_652854/             # 275 pages (1.5h, batch-size 1)
│   ├── nhentai_652854_test/        # 275 pages (1.9h, batch-size 30 - SLOWER)
│   ├── japanese.jpg                # Latest: font_size_offset -10
│   └── ... (other translated files)
├── translate_config.json            # Active translation config
├── OPTIMIZACIONES.md               # Optimization notes
└── manga-image-translator/          # The tool
    ├── .env                         # API keys (DO NOT COMMIT)
    ├── venv/                        # Python virtual environment
    ├── manga_translator/            # Source code
    │   ├── translators/             # Translation backends
    │   │   ├── chatgpt.py          # OpenAI-compatible (MiniMax)
    │   │   ├── common.py           # Language codes, base classes
    │   │   ├── nllb.py             # Facebook NLLB-200 (offline)
    │   │   ├── m2m100.py           # Facebook M2M-100 (offline)
    │   │   └── keys.py             # API key env vars
    │   ├── rendering/              # Text rendering
    │   │   ├── __init__.py         # Main render pipeline
    │   │   └── text_render.py      # Font/text rendering
    │   ├── detection/              # Text detection (DBNet)
    │   ├── ocr/                    # OCR models
    │   ├── inpainting/             # Text erasure models
    │   ├── manga_translator.py     # Main orchestrator
    │   ├── config.py               # Config schema
    │   └── mode/local.py           # Local batch mode
    ├── result/                      # Debug output (with -v flag)
    └── README.md                    # Official documentation

9. Running the Tool

Single image

$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\japanese.jpg" `
  -o "example-translated" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite

Full gallery

$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite
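
Because `--ignore-errors` skips failed pages silently, it is useful to list which inputs have no output afterwards. The sketch compares file stems, assuming outputs keep the input file name (possibly with a different extension); `missing_pages` is a hypothetical helper. If `--skip-no-text` is used, pages without text will also show up as missing.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def image_stems(folder):
    """Return the file names (without extension) of images directly in folder."""
    return {
        p.stem
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def missing_pages(input_dir, output_dir):
    """Return sorted stems present in input_dir but absent from output_dir."""
    return sorted(image_stems(input_dir) - image_stems(output_dir))
```

For the gallery above, `missing_pages(r"example\nhentai_652854", r"example-translated\nhentai_652854")` would return the pages to retry.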

10. Key Learnings

LLM translators beat offline models — MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
batch-size > 1 is risky with LLMs — Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
UTF-8 is mandatory on Windows — Must set PYTHONIOENCODING=utf-8 and PYTHONUTF8=1 or CJK characters crash the console
Vertical bubbles are a fundamental limitation — Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
--ignore-errors is essential — Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
AOT inpainter is faster on CPU — But lama_large produces better quality; trade-off depends on use case
manga2eng renderer is better than default — Handles text sizing and positioning more intelligently
