# LLM_READY.md - manga-image-translator Context Guide

> **Purpose:** Complete context document for any AI working with this project.
> **Last updated:** 2026-05-28
> **Working directory:** `C:\Users\Administrator\Documents\fansub2`

---

## 1. Project Overview

**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:
1. **Detects** text regions in images (bounding boxes)
2. **OCRs** the text (reads what it says)
3. **Translates** the text to a target language
4. **Inpaints** (erases) the original text
5. **Renders** the translated text in the same position

### Repository
- **Location:** `manga-image-translator/` (git child repo)
- **Language:** Python (experimental version)
- **GPU:** Not available; CPU + RAM only (VPS deployment planned)
- **Python venv:** `manga-image-translator/venv/`

---

## 2. Translation Pipeline

```
Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
```

### Detection
- **Model:** DBNet_resnet34 (default detector)
- **Default resolution:** 2048px (configurable via `detection_size`)
- **Impact on speed:** Reducing to 1024 is ~2-4x faster

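
The "~2-4x" figure is about what pixel count alone would suggest: halving the long side quarters the number of pixels the detector has to process. A minimal sketch of that arithmetic, assuming cost grows at most linearly with pixel count (the helper name `pixel_ratio` is hypothetical, and real speedups are lower because of fixed per-image overhead):

```python
def pixel_ratio(old_size: int, new_size: int) -> float:
    """Upper bound on the speedup if cost were proportional to pixel count."""
    return (old_size / new_size) ** 2


# Going from the default 2048 to 1024 means 4x fewer pixels to process.
print(pixel_ratio(2048, 1024))  # 4.0
```
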
### OCR
- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
- **Default:** `48px` (ConvNext backbone, more accurate)
- **32px:** ResNet backbone, ~20-30% faster but less accurate
- **Impact:** 48px recommended for reliability; 32px for speed

### Translation (the critical part)
- **LLM translator:** `chatgpt` (OpenAI-compatible API)
- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate)
- **Target language:** `ESP` (Spanish)

### Inpainting
- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
- **Default:** `lama_large`, which uses FFT (FourierUnit) and is expensive on CPU
- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
- **`none`:** Skip inpainting entirely

### Rendering
- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`

---

## 3. Working Configuration (PROVEN)

This configuration was tested on 275 pages and completed successfully in ~1.5 hours.

### Config file: `translate_config.json`
```json
{
  "translator": {
    "translator": "chatgpt",
    "target_lang": "ESP"
  },
  "render": {
    "renderer": "manga2eng",
    "font_size_offset": -10,
    "font_size_minimum": 8,
    "no_hyphenation": true,
    "alignment": "center"
  }
}
```

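
If the config is produced by a script rather than edited by hand, the same structure can be built as a plain dictionary and serialized. This is only a sketch: `build_config` is a hypothetical helper, and the keys and values are copied from the JSON above.

```python
import json


def build_config(target_lang: str = "ESP") -> dict:
    """Return the proven settings as a dict, with the target language swappable."""
    return {
        "translator": {"translator": "chatgpt", "target_lang": target_lang},
        "render": {
            "renderer": "manga2eng",
            "font_size_offset": -10,
            "font_size_minimum": 8,
            "no_hyphenation": True,
            "alignment": "center",
        },
    }


# Serialize with the same 2-space layout used in translate_config.json.
config_text = json.dumps(build_config(), indent=2)
print(config_text)
```

Writing `config_text` to `translate_config.json` with `encoding="utf-8"` keeps the file compatible with the command shown in this section.
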
### Environment file: `manga-image-translator/.env`
```
OPENAI_API_KEY=<your-api-key>
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=<your-api-key>
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
```

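
Before a long batch it can help to confirm that the `.env` file has real values and not leftover placeholders. The sketch below parses simple `KEY=VALUE` lines and reports what is missing. The helper names are hypothetical, and checking only the `OPENAI_*` family is an assumption, since this guide does not say which of the two variable families the tool actually reads.

```python
# Assumption: the OPENAI_* variables are the ones that matter.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Keys that are absent, empty, or still a <placeholder>."""
    return [k for k in required if not values.get(k) or values[k].startswith("<")]


sample = (
    "OPENAI_API_KEY=<your-api-key>\n"
    "OPENAI_API_BASE=https://api.minimax.io/v1\n"
    "OPENAI_MODEL=MiniMax-M2.7\n"
)
print(missing_keys(parse_env(sample)))  # ['OPENAI_API_KEY']
```
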
### Proven command
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
    -m manga_translator local `
    -i "example\nhentai_652854" `
    -o "example-translated\nhentai_652854" `
    --config-file "translate_config.json" `
    --ignore-errors `
    --overwrite
```

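
The same invocation can be driven from Python, for example when queueing several galleries. This is a sketch only: `build_command`, `utf8_env`, and `run` are hypothetical helpers, and the arguments are exactly those of the PowerShell command above.

```python
import os
import subprocess

PYTHON = r"manga-image-translator\venv\Scripts\python.exe"


def build_command(src: str, dest: str, config: str = "translate_config.json") -> list:
    """Argument list equivalent to the PowerShell command above."""
    return [
        PYTHON, "-m", "manga_translator", "local",
        "-i", src,
        "-o", dest,
        "--config-file", config,
        "--ignore-errors",
        "--overwrite",
    ]


def utf8_env() -> dict:
    """Copy of the current environment with the two UTF-8 variables set."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run(src: str, dest: str) -> int:
    """Run one translation job and return the process exit code."""
    return subprocess.run(build_command(src, dest), env=utf8_env()).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.
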
### Performance (275 pages)
| Metric | Value |
|--------|-------|
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |

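
For capacity planning, the table translates into per-page figures. The numbers below come straight from the table, so they are as approximate as the "~" values they are derived from:

```python
pages = 275
hours = 1.5
api_calls = 700

seconds_per_page = hours * 3600 / pages
calls_per_page = api_calls / pages

print(f"{seconds_per_page:.1f} s/page, {calls_per_page:.1f} calls/page")
# 19.6 s/page, 2.5 calls/page
```
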
---

## 4. CLI Flags Reference

### General
| Flag | Description |
|------|-------------|
| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| `--overwrite` | Overwrite existing translated files |
| `--skip-no-text` | Don't save images with no detected text |
| `-v` | Verbose output (saves intermediate images to `result/`) |

### Batch processing
| Flag | Description |
|------|-------------|
| `--batch-size N` | Process N images per batch (default: 1) |
| `--batch-concurrent` | Use concurrent mode for batch translation |

### GPU (not available on VPS)
| Flag | Description |
|------|-------------|
| `--use-gpu` | Use CUDA/MPS for all models |
| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |

### Config file options (in `translate_config.json`)
| Key | Values | Default | Recommended |
|-----|--------|---------|-------------|
| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
| `render.font_size_offset` | integer | 0 | -10 |
| `render.font_size_minimum` | integer | -1 | 8 |
| `render.no_hyphenation` | boolean | false | true |
| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
| `detector.detection_size` | integer | 2048 | 1024 (faster) |
| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |

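
The dotted keys in the table are shorthand for nested JSON objects: `detector.detection_size` means a `detection_size` entry inside a `detector` object. A small sketch of that mapping (`nest` is a hypothetical helper, not part of the tool):

```python
def nest(dotted: dict) -> dict:
    """Expand {"a.b": 1} style keys into nested dicts like {"a": {"b": 1}}."""
    result = {}
    for path, value in dotted.items():
        node = result
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


print(nest({
    "detector.detection_size": 1024,
    "inpainter.inpainter": "lama_large",
    "inpainter.inpainting_size": 1024,
}))
# {'detector': {'detection_size': 1024},
#  'inpainter': {'inpainter': 'lama_large', 'inpainting_size': 1024}}
```
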
---

## 5. Valid Language Codes (target_lang)

From `manga_translator/translators/common.py`:

| Code | Language |
|------|----------|
| `CHS` | Chinese (Simplified) |
| `CHT` | Chinese (Traditional) |
| `ENG` | English |
| `JPN` | Japanese |
| `KOR` | Korean |
| `ESP` | Spanish |
| `FRA` | French |
| `DEU` | German |
| `ITA` | Italian |
| `PTB` | Portuguese (Brazil) |
| `RUS` | Russian |
| `ARA` | Arabic |
| `THA` | Thai |
| `VIN` | Vietnamese |
| ... | (25+ languages total) |

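
A quick lookup against the codes listed above can catch typos such as `SPA` for `ESP` before a run starts. This sketch covers only the subset shown in the table, so a `None` result means "not in this excerpt", not "invalid"; the authoritative list is the one in `common.py`. The names `LISTED_CODES` and `describe` are hypothetical.

```python
# Subset copied from the table above; not the full list of supported codes.
LISTED_CODES = {
    "CHS": "Chinese (Simplified)",
    "CHT": "Chinese (Traditional)",
    "ENG": "English",
    "JPN": "Japanese",
    "KOR": "Korean",
    "ESP": "Spanish",
    "FRA": "French",
    "DEU": "German",
    "ITA": "Italian",
    "PTB": "Portuguese (Brazil)",
    "RUS": "Russian",
    "ARA": "Arabic",
    "THA": "Thai",
    "VIN": "Vietnamese",
}


def describe(code: str):
    """Return the language name for a listed code, or None if it is not listed."""
    return LISTED_CODES.get(code.strip().upper())


print(describe("esp"))  # Spanish
print(describe("SPA"))  # None
```
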
---

## 6. Optimization Options (Tested Results)

### Option A: Config only (no code changes), TESTED
| Flag | Default | Tested | Impact |
|------|---------|--------|--------|
| `--detection-size 1024` | 2048 | Not tested yet | ~2-4x faster detection |
| `--inpainting-size 1024` | 2048 | Not tested yet | ~2-4x faster inpainting |
| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
| `--batch-size 10-30` | 1 | Tested (30) | **FAILED**: error 2013 with MiniMax |
| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |

**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**
- **1.9 hours** (SLOWER than 1.5h without flags)
- **266/275 pages** (9 failed)
- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)

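
Putting the two runs side by side makes the regression concrete. The figures are the ones reported in this guide; the dictionary layout and the `summarize` helper are just one hypothetical way to organize them.

```python
runs = {
    "baseline (batch-size 1)": {"hours": 1.5, "ok": 275, "total": 275},
    "speed flags + batch-size 30": {"hours": 1.9, "ok": 266, "total": 275},
}


def summarize(run: dict) -> tuple:
    """Return (success rate in %, seconds per successfully translated page)."""
    success_rate = run["ok"] / run["total"] * 100
    seconds_per_ok_page = run["hours"] * 3600 / run["ok"]
    return success_rate, seconds_per_ok_page


for name, run in runs.items():
    rate, secs = summarize(run)
    print(f"{name}: {rate:.1f}% ok, {secs:.1f} s per translated page")

# Wall-clock ratio of the flagged run to the baseline.
slowdown = runs["speed flags + batch-size 30"]["hours"] / runs["baseline (batch-size 1)"]["hours"]
print(f"{slowdown:.2f}x the baseline wall-clock time")
```

By these numbers the flagged run took about 27% longer and still lost 9 pages, so the per-stage speedups were more than cancelled by the batch failures.
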
### Option B: Code changes (not implemented yet)
| Change | Expected Impact | Complexity |
|--------|----------------|------------|
| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |

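
The first row refers to the general `asyncio.gather` pattern: start all translation requests at once and await them together. The sketch below shows that pattern in isolation. It is not the project's `_concurrent_translate_contexts`; `translate_one` is a hypothetical stand-in for a single API call, and the semaphore limit is an assumed safeguard against rate limits.

```python
import asyncio


async def translate_one(text: str) -> str:
    """Hypothetical stand-in for one API request (sleeps, then upper-cases)."""
    await asyncio.sleep(0.01)
    return text.upper()


async def translate_all(texts, max_in_flight: int = 4):
    """Run requests concurrently, at most max_in_flight at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(text):
        async with semaphore:
            return await translate_one(text)

    # gather keeps results in input order; return_exceptions=True means one
    # failed request is returned as an exception object and the rest continue.
    return await asyncio.gather(*(guarded(t) for t in texts), return_exceptions=True)


results = asyncio.run(translate_all(["a", "b", "c"]))
print(results)  # ['A', 'B', 'C']
```
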
---

## 7. Known Issues

### MiniMax API Errors
- **Error 400:** `bad_request_error (2013)`, meaning the prompt is too long or contains problematic content
- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
- **Workaround:** `--ignore-errors` skips failed pages

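
Because `--ignore-errors` skips pages silently, it is worth listing which pages are missing once a run finishes. A minimal sketch, assuming output files keep the input file's name stem (the extension may differ); the helper names and the suffix list are hypothetical:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def page_stems(folder: Path) -> set:
    """File name stems of all images directly inside folder."""
    return {p.stem for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}


def missing_pages(src: Path, dest: Path) -> list:
    """Stems present in the source folder but absent from the output folder."""
    return sorted(page_stems(src) - page_stems(dest))
```

For the gallery in this guide that would be `missing_pages(Path("example/nhentai_652854"), Path("example-translated/nhentai_652854"))`; the returned pages can then be retried one at a time.
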
### Post-Translation Check Failures
- The tool checks if translated text is actually in the target language
- Sometimes valid Spanish translations fail the check (false negatives)
- This causes unnecessary retries and can revert translations to original text
- **Workaround:** Already handled by `--ignore-errors`

### Vertical Bubble Problem
- Japanese manga uses vertical speech bubbles (narrow, tall)
- Spanish text is horizontal and longer than Japanese
- Text overflows or doesn't fit in narrow vertical bubbles
- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
- **Known limitation:** Some vertical bubbles will always overflow

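
The sketch below illustrates why shrinking the font only goes so far in a narrow bubble. It is a rough model and not the renderer's actual algorithm. All constants are assumptions: an average glyph width of 0.6 × font size, a line height of 1.2 × font size, and words that are never split (as with `no_hyphenation: true`). The helper names are hypothetical.

```python
import textwrap


def fits(text: str, width_px: int, height_px: int, font_size: int) -> bool:
    """Rough check: does the wrapped text fit inside a width x height box?"""
    # Assumed glyph width is 0.6 * font_size, kept in integer arithmetic.
    chars_per_line = (width_px * 10) // (font_size * 6)
    if chars_per_line < 1:
        return False
    lines = textwrap.wrap(
        text, width=chars_per_line, break_long_words=False, break_on_hyphens=False
    )
    # An unbroken word longer than the line overflows horizontally.
    if any(len(line) > chars_per_line for line in lines):
        return False
    # Assumed line height is 1.2 * font_size.
    return len(lines) * font_size * 12 <= height_px * 10


def largest_font(text: str, width_px: int, height_px: int, start: int = 20, minimum: int = 8):
    """Largest size from start down to minimum that fits, or None if none does."""
    for size in range(start, minimum - 1, -1):
        if fits(text, width_px, height_px, size):
            return size
    return None


print(largest_font("Hola mundo nuevo", 30, 200))     # 10
print(largest_font("extraordinariamente", 30, 200))  # None
```

In this model a single long word defeats every font size down to the minimum, which matches the known limitation above.
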
### OCR Accuracy
- OCR sometimes misreads characters (especially damaged/low-quality scans)
- OCR errors propagate to translation (garbage in → garbage out)
- `48px` model is more accurate than `32px`

---

## 8. File Structure

```
fansub2/
├── example/                          # Source images
│   ├── nhentai_652854/               # 275-page gallery (Chinese manga)
│   ├── japanese.jpg                  # Single Japanese page test
│   ├── english.jpg                   # Single English page test
│   ├── chinese_sfw.webp              # Single Chinese page test
│   ├── coreano.jpg                   # Single Korean page test
│   └── burbujascombinadas.webp       # Single English page test
├── example-translated/               # Output translated images
│   ├── nhentai_652854/               # 275 pages (1.5h, batch-size 1)
│   ├── nhentai_652854_test/          # 275 pages (1.9h, batch-size 30 - SLOWER)
│   ├── japanese.jpg                  # Latest: font_size_offset -10
│   └── ... (other translated files)
├── translate_config.json             # Active translation config
├── OPTIMIZACIONES.md                 # Optimization notes
└── manga-image-translator/           # The tool
    ├── .env                          # API keys (DO NOT COMMIT)
    ├── venv/                         # Python virtual environment
    ├── manga_translator/             # Source code
    │   ├── translators/              # Translation backends
    │   │   ├── chatgpt.py            # OpenAI-compatible (MiniMax)
    │   │   ├── common.py             # Language codes, base classes
    │   │   ├── nllb.py               # Facebook NLLB-200 (offline)
    │   │   ├── m2m100.py             # Facebook M2M-100 (offline)
    │   │   └── keys.py               # API key env vars
    │   ├── rendering/                # Text rendering
    │   │   ├── __init__.py           # Main render pipeline
    │   │   └── text_render.py        # Font/text rendering
    │   ├── detection/                # Text detection (DBNet)
    │   ├── ocr/                      # OCR models
    │   ├── inpainting/               # Text erasure models
    │   ├── manga_translator.py       # Main orchestrator
    │   ├── config.py                 # Config schema
    │   └── mode/local.py             # Local batch mode
    ├── result/                       # Debug output (with -v flag)
    └── README.md                     # Official documentation
```

---

## 9. Running the Tool

### Single image
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
    -m manga_translator local `
    -i "example\japanese.jpg" `
    -o "example-translated" `
    --config-file "translate_config.json" `
    --ignore-errors --overwrite
```

### Full gallery
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
    -m manga_translator local `
    -i "example\nhentai_652854" `
    -o "example-translated\nhentai_652854" `
    --config-file "translate_config.json" `
    --ignore-errors --overwrite
```

---

## 10. Key Learnings

1. **LLM translators beat offline models:** MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
2. **batch-size > 1 is risky with LLMs:** Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
3. **UTF-8 is mandatory on Windows:** Must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1` or CJK characters crash the console
4. **Vertical bubbles are a fundamental limitation:** Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
5. **`--ignore-errors` is essential:** Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
6. **AOT inpainter is faster on CPU:** But `lama_large` produces better quality; the trade-off depends on the use case
7. **`manga2eng` renderer is better than `default`:** Handles text sizing and positioning more intelligently