LLM_READY.md - manga-image-translator Context Guide
Purpose: Complete context document for any AI working with this project.
Last updated: 2026-05-28
Working directory: C:\Users\Administrator\Documents\fansub2
1. Project Overview
manga-image-translator is a tool that automatically translates text in manga/comic images. It:
- Detects text regions in images (bounding boxes)
- OCRs the text (reads what it says)
- Translates the text to a target language
- Inpaints (erases) the original text
- Renders the translated text in the same position
Repository
- Location: `manga-image-translator/` (git child repo)
- Language: Python (experimental version)
- GPU: Not available — CPU + RAM only (VPS deployment planned)
- Python venv: `manga-image-translator/venv/`
2. Translation Pipeline
Detection
- Model: DBNet_resnet34 (default detector)
- Default resolution: 2048px (configurable via `detection_size`)
- Impact on speed: Reducing to 1024 is ~2-4x faster
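The ~2-4x figure is consistent with a simple pixel-count estimate. The sketch below is a back-of-the-envelope check, assuming `detection_size` bounds the image side length and that detector cost grows roughly with the number of pixels. Both are assumptions, not measurements from this project, and `pixel_ratio` is a hypothetical helper.

```python
def pixel_ratio(old_size: int, new_size: int) -> float:
    """Ratio of pixel counts for square inputs with the given side lengths."""
    return (old_size * old_size) / (new_size * new_size)


# Halving the side length quarters the pixel count, so 4x is the upper bound
# on the speedup under the assumption above; fixed overheads pull it lower.
ratio = pixel_ratio(2048, 1024)
print(ratio)  # 4.0
```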
OCR
- Available models: `32px`, `48px` (default), `48px_ctc`, `mocr`
- Default: `48px` (ConvNext backbone, more accurate)
- `32px`: ResNet backbone, ~20-30% faster but less accurate
- Impact: `48px` recommended for reliability; `32px` for speed
Translation (the critical part)
- LLM translator: `chatgpt` (OpenAI-compatible API)
- Offline translators: `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate)
- Target language: `ESP` (Spanish)
Inpainting
- Available: `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
- Default: `lama_large`, which uses FFT (FourierUnit) and is expensive on CPU
- AOT (`default`): Lightweight convolutions, ~3-5x faster on CPU
- `none`: Skip inpainting entirely
Rendering
- Available: `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
- Key settings: `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`
3. Working Configuration (PROVEN)
This configuration was tested on 275 pages and completed successfully in ~1.5 hours.
Config file: translate_config.json
Environment file: manga-image-translator/.env
Proven command
Performance (275 pages)
| Metric | Value |
| --- | --- |
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |
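For capacity planning, the run above can be reduced to per-page figures. This is plain arithmetic on the reported numbers; since both the time and the API call count are approximate, the results are approximate too.

```python
pages = 275
hours = 1.5
api_calls = 700  # approximate, as reported for the run

seconds_per_page = hours * 3600 / pages
calls_per_page = api_calls / pages

print(f"{seconds_per_page:.1f} s/page")    # 19.6 s/page
print(f"{calls_per_page:.2f} calls/page")  # 2.55 calls/page
```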
4. CLI Flags Reference
General
| Flag | Description |
| --- | --- |
| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| `--overwrite` | Overwrite existing translated files |
| `--skip-no-text` | Don't save images with no detected text |
| `-v` | Verbose output (saves intermediate images to `result/`) |
Batch processing
| Flag | Description |
| --- | --- |
| `--batch-size N` | Process N images per batch (default: 1) |
| `--batch-concurrent` | Use concurrent mode for batch translation |
GPU (not available on VPS)
| Flag | Description |
| --- | --- |
| `--use-gpu` | Use CUDA/MPS for all models |
| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |
Config file options (in translate_config.json)
| Key | Values | Default | Recommended |
| --- | --- | --- | --- |
| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | `null` | `null` |
| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
| `render.font_size_offset` | integer | 0 | -10 |
| `render.font_size_minimum` | integer | -1 | 8 |
| `render.no_hyphenation` | boolean | `false` | `true` |
| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
| `detector.detection_size` | integer | 2048 | 1024 (faster) |
| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |
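The recommended column can be turned into a config file body mechanically. The sketch below assumes each dotted key (`section.key`) maps to a nested JSON object, which is an assumption about the schema and should be checked against the project's own config documentation. `RECOMMENDED` and `nest` are hypothetical names used only for this illustration, and `translator_chain` is omitted because its recommended value is `null`.

```python
import json

# Recommended values copied from the config options table.
RECOMMENDED = {
    "translator.translator": "chatgpt",
    "translator.target_lang": "ESP",
    "render.renderer": "manga2eng",
    "render.font_size_offset": -10,
    "render.font_size_minimum": 8,
    "render.no_hyphenation": True,
    "render.alignment": "center",
    "detector.detection_size": 1024,
    "inpainter.inpainter": "lama_large",
    "inpainter.inpainting_size": 1024,
    "ocr.ocr": "48px",
}


def nest(flat: dict) -> dict:
    """Expand 'section.key' entries into {'section': {'key': value}}."""
    nested: dict = {}
    for dotted, value in flat.items():
        section, key = dotted.split(".", 1)
        nested.setdefault(section, {})[key] = value
    return nested


config = nest(RECOMMENDED)
config_text = json.dumps(config, indent=2)
print(config_text)
```

Writing `config_text` to `translate_config.json` would produce a candidate config; compare it with the working file before relying on it.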
5. Valid Language Codes (target_lang)
From manga_translator/translators/common.py:
| Code | Language |
| --- | --- |
| `CHS` | Chinese (Simplified) |
| `CHT` | Chinese (Traditional) |
| `ENG` | English |
| `JPN` | Japanese |
| `KOR` | Korean |
| `ESP` | Spanish |
| `FRA` | French |
| `DEU` | German |
| `ITA` | Italian |
| `PTB` | Portuguese (Brazil) |
| `RUS` | Russian |
| `ARA` | Arabic |
| `THA` | Thai |
| `VIN` | Vietnamese |
| ... | (25+ languages total) |
6. Optimization Options (Tested Results)
OPTION A: Config only (no code changes), TESTED
| Flag | Default | Tested | Impact |
| --- | --- | --- | --- |
| `--detection-size 1024` | 2048 | Not tested yet | ~2-4x faster detection |
| `--inpainting-size 1024` | 2048 | Not tested yet | ~2-4x faster inpainting |
| `--inpainter default` (AOT) | `lama_large` | Tested | ~3-5x faster inpainting |
| `--ocr 32px` | `48px` | Tested | ~20-30% faster OCR |
| `--batch-size` 10-30 | 1 | Tested (30) | FAILED: error 2013 with MiniMax |
| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |
Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):
- 1.9 hours (SLOWER than 1.5h without flags)
- 266/275 pages (9 failed)
- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
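To put the regression in numbers, the two runs can be compared directly. This is arithmetic on the figures reported above only; the dictionary names are illustrative.

```python
baseline = {"hours": 1.5, "ok": 275, "total": 275}
with_flags = {"hours": 1.9, "ok": 266, "total": 275}

# Relative wall-clock increase of the flagged run over the baseline.
slowdown_pct = (with_flags["hours"] / baseline["hours"] - 1) * 100
# Share of pages that failed in the flagged run.
failure_pct = (with_flags["total"] - with_flags["ok"]) / with_flags["total"] * 100

print(f"{slowdown_pct:.0f}% slower")       # 27% slower
print(f"{failure_pct:.1f}% pages failed")  # 3.3% pages failed
```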
OPTION B: Code changes (not implemented yet)
| Change | Expected Impact | Complexity |
| --- | --- | --- |
| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |
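The first change in the table relies on a standard `asyncio` pattern. The sketch below is not the project's `_concurrent_translate_contexts`; it only illustrates `asyncio.gather` with a concurrency cap. `translate_one`, `translate_all`, and `max_in_flight` are hypothetical names, and the API call is replaced by a short sleep.

```python
import asyncio


async def translate_one(text: str) -> str:
    """Hypothetical stand-in for one API request; sleeps instead of calling out."""
    await asyncio.sleep(0.01)
    return f"[ES] {text}"


async def translate_all(texts: list[str], max_in_flight: int = 4) -> list[str]:
    """Run requests concurrently, capped by a semaphore; results keep input order."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(text: str) -> str:
        async with semaphore:
            return await translate_one(text)

    return await asyncio.gather(*(guarded(t) for t in texts))


results = asyncio.run(translate_all(["page 1", "page 2", "page 3"]))
print(results)  # ['[ES] page 1', '[ES] page 2', '[ES] page 3']
```

The semaphore matters with a rate-limited API: `gather` alone would start every request at once, which may trade one class of API error for another.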
7. Known Issues
MiniMax API Errors
- Error 400: `bad_request_error` (2013), raised when the prompt is too long or contains problematic content
- Frequency: Occurs with long Chinese text blocks, especially in batch mode
- Workaround: `--ignore-errors` skips failed pages
Post-Translation Check Failures
- The tool checks if translated text is actually in the target language
- Sometimes valid Spanish translations fail the check (false negatives)
- This causes unnecessary retries and can revert translations to original text
- Workaround: Already handled by `--ignore-errors`
Vertical Bubble Problem
- Japanese manga uses vertical speech bubbles (narrow, tall)
- Spanish text is horizontal and longer than Japanese
- Text overflows or doesn't fit in narrow vertical bubbles
- Mitigation: `font_size_offset: -10` reduces font size to fit better
- Known limitation: Some vertical bubbles will always overflow
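How the offset and the minimum interact can be pictured with a small sketch. It assumes the offset is added to the detected size and the result is clamped at `font_size_minimum`; that combination rule is an assumption made for illustration, not confirmed from the renderer source, and `effective_font_size` is a hypothetical helper.

```python
def effective_font_size(detected: int, offset: int = -10, minimum: int = 8) -> int:
    """Apply the offset, then clamp at the minimum (assumed combination rule)."""
    return max(detected + offset, minimum)


print(effective_font_size(32))  # 22
print(effective_font_size(14))  # 8
```

Under this assumption, small source text stops shrinking at the minimum, which is one reason narrow bubbles can still overflow.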
OCR Accuracy
- OCR sometimes misreads characters (especially damaged/low-quality scans)
- OCR errors propagate to translation (garbage in → garbage out)
- `48px` model is more accurate than `32px`
8. File Structure
9. Running the Tool
Single image
Full gallery
10. Key Learnings
- LLM translators beat offline models: MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
- batch-size > 1 is risky with LLMs: large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
- UTF-8 is mandatory on Windows: set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1`, or CJK characters crash the console
- Vertical bubbles are a fundamental limitation: Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
- `--ignore-errors` is essential: some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
- AOT inpainter is faster on CPU, but `lama_large` produces better quality; the trade-off depends on use case
- `manga2eng` renderer is better than `default`: it handles text sizing and positioning more intelligently
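The UTF-8 requirement can also be applied from a launcher script instead of the shell. The sketch below sets the two variables for a child Python process. `utf8_env` is a hypothetical helper, and the child here only prints two CJK characters as a demonstration; the actual translator command is not shown.

```python
import os
import subprocess
import sys


def utf8_env() -> dict:
    """Copy the current environment and force UTF-8 for Python child processes."""
    env = os.environ.copy()
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


# Demonstration: a child interpreter prints CJK text and we read it back as UTF-8.
child = subprocess.run(
    [sys.executable, "-c", "print('\\u7ffb\\u8a33')"],
    env=utf8_env(),
    capture_output=True,
    check=True,
)
print(child.stdout.decode("utf-8").strip())
```

Passing the same `env` to the process that runs the translator would apply the settings without changing the system-wide environment.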