# LLM_READY.md - manga-image-translator Context Guide
> **Purpose:** Complete context document for any AI working with this project.
> **Last updated:** 2026-05-28
> **Working directory:** `C:\Users\Administrator\Documents\fansub2`
---
## 1. Project Overview
**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:
1. **Detects** text regions in images (bounding boxes)
2. **OCRs** the text (reads what it says)
3. **Translates** the text to a target language
4. **Inpaints** (erases) the original text
5. **Renders** the translated text in the same position
### Repository
- **Location:** `manga-image-translator/` (nested git repository)
- **Language:** Python (experimental version)
- **GPU:** Not available — CPU + RAM only (VPS deployment planned)
- **Python venv:** `manga-image-translator/venv/`
---
## 2. Translation Pipeline
```
Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
```
### Detection
- **Model:** DBNet_resnet34 (default detector)
- **Default resolution:** 2048px (configurable via `detection_size`)
- **Impact on speed:** Lowering `detection_size` to 1024 is estimated to make detection ~2-4x faster
### OCR
- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
- **Default:** `48px` (ConvNext backbone, more accurate)
- **32px:** ResNet backbone, ~20-30% faster but less accurate
- **Impact:** 48px recommended for reliability; 32px for speed
### Translation (the critical part)
- **LLM translator:** `chatgpt` (OpenAI-compatible API)
- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate)
- **Target language:** `ESP` (Spanish)
### Inpainting
- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
- **Default:** `lama_large` — uses FFT (FourierUnit), expensive on CPU
- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
- **`none`:** Skip inpainting entirely
### Rendering
- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`
---
## 3. Working Configuration (PROVEN)
This configuration was tested on 275 pages and completed successfully in ~1.5 hours.
### Config file: `translate_config.json`
```json
{
"translator": {
"translator": "chatgpt",
"target_lang": "ESP"
},
"render": {
"renderer": "manga2eng",
"font_size_offset": -10,
"font_size_minimum": 8,
"no_hyphenation": true,
"alignment": "center"
}
}
```
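
If the configuration needs to be generated or checked from a script, a minimal sketch could look like the following. The keys and values are copied from the JSON above; the `write_config` helper is illustrative and not part of manga-image-translator.

```python
import json
from pathlib import Path

# Values copied from the proven configuration above.
PROVEN_CONFIG = {
    "translator": {"translator": "chatgpt", "target_lang": "ESP"},
    "render": {
        "renderer": "manga2eng",
        "font_size_offset": -10,
        "font_size_minimum": 8,
        "no_hyphenation": True,
        "alignment": "center",
    },
}


def write_config(path, config=PROVEN_CONFIG):
    """Write the config as UTF-8 JSON and return the parsed round-trip copy."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `write_config("translate_config.json")` from the working directory would recreate the file shown above.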
### Environment file: `manga-image-translator/.env`
```
OPENAI_API_KEY=<your-api-key>
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=<your-api-key>
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
```
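
Before a long batch run it can help to confirm that the `.env` file is complete. The sketch below is a simplified stand-in (plain `KEY=VALUE` lines only, no quoting rules) and is not how the tool itself loads its environment; the variable names are the ones listed above, and `parse_env` / `missing_vars` are hypothetical helpers.

```python
REQUIRED_VARS = (
    "OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL",
    "CUSTOM_OPENAI_API_KEY", "CUSTOM_OPENAI_API_BASE", "CUSTOM_OPENAI_MODEL",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return required names that are absent, empty, or still a <placeholder>."""
    return [
        name for name in required
        if not values.get(name) or values[name].startswith("<")
    ]
```

An empty list from `missing_vars(parse_env(text))`, where `text` is the content of `manga-image-translator/.env`, means every variable above has a real value.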
### Proven command
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
-m manga_translator local `
-i "example\nhentai_652854" `
-o "example-translated\nhentai_652854" `
--config-file "translate_config.json" `
--ignore-errors `
--overwrite
```
### Performance (275 pages)
| Metric | Value |
|--------|-------|
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |
---
## 4. CLI Flags Reference
### General
| Flag | Description |
|------|-------------|
| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| `--overwrite` | Overwrite existing translated files |
| `--skip-no-text` | Don't save images with no detected text |
| `-v` | Verbose output (saves intermediate images to `result/`) |
### Batch processing
| Flag | Description |
|------|-------------|
| `--batch-size N` | Process N images per batch (default: 1) |
| `--batch-concurrent` | Use concurrent mode for batch translation |
### GPU (not available on VPS)
| Flag | Description |
|------|-------------|
| `--use-gpu` | Use CUDA/MPS for all models |
| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |
### Config file options (in `translate_config.json`)
| Key | Values | Default | Recommended |
|-----|--------|---------|-------------|
| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
| `render.font_size_offset` | integer | 0 | -10 |
| `render.font_size_minimum` | integer | -1 | 8 |
| `render.no_hyphenation` | boolean | false | true |
| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
| `detector.detection_size` | integer | 2048 | 1024 (faster) |
| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |
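
The dotted keys in the table correspond to nested objects in `translate_config.json` (for example `render.alignment` is the `alignment` key inside `render`). As a rough sketch, the helper below expands dotted keys into that nested shape; `set_dotted`, `build_config`, and `FAST_OVERRIDES` are illustrative names, and the values are taken from the table above.

```python
import json


def set_dotted(config, dotted_key, value):
    """Set config['a']['b'] = value for a dotted key 'a.b', creating sections."""
    *sections, leaf = dotted_key.split(".")
    node = config
    for section in sections:
        node = node.setdefault(section, {})
    node[leaf] = value
    return config


def build_config(overrides):
    """Build a nested config dict from a {dotted_key: value} mapping."""
    config = {}
    for dotted_key, value in overrides.items():
        set_dotted(config, dotted_key, value)
    return config


# Speed-oriented values taken from the table above.
FAST_OVERRIDES = {
    "translator.translator": "chatgpt",
    "translator.target_lang": "ESP",
    "detector.detection_size": 1024,
    "inpainter.inpainting_size": 1024,
    "ocr.ocr": "48px",
}

if __name__ == "__main__":
    print(json.dumps(build_config(FAST_OVERRIDES), indent=2))
```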
---
## 5. Valid Language Codes (target_lang)
From `manga_translator/translators/common.py`:
| Code | Language |
|------|----------|
| `CHS` | Chinese (Simplified) |
| `CHT` | Chinese (Traditional) |
| `ENG` | English |
| `JPN` | Japanese |
| `KOR` | Korean |
| `ESP` | Spanish |
| `FRA` | French |
| `DEU` | German |
| `ITA` | Italian |
| `PTB` | Portuguese (Brazil) |
| `RUS` | Russian |
| `ARA` | Arabic |
| `THA` | Thai |
| `VIN` | Vietnamese |
| ... | (25+ languages total) |
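
A mistyped code such as `SPA` instead of `ESP` is easy to catch before starting a run. The sketch below checks only against the codes listed in the table above, which is truncated, so a code missing here may still be valid in `common.py`; `check_target_lang` is a hypothetical helper, and normalizing to upper case is a choice made for this example.

```python
# Only the codes listed in the table above; the full list in
# manga_translator/translators/common.py is longer.
LISTED_CODES = {
    "CHS", "CHT", "ENG", "JPN", "KOR", "ESP", "FRA",
    "DEU", "ITA", "PTB", "RUS", "ARA", "THA", "VIN",
}


def check_target_lang(code, known=LISTED_CODES):
    """Return the normalized code, or raise ValueError if it is not in `known`."""
    normalized = code.strip().upper()
    if normalized not in known:
        raise ValueError(
            f"Unknown target_lang {code!r}; listed codes: {sorted(known)}"
        )
    return normalized
```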
---
## 6. Optimization Options (Tested Results)
### OPTION A: Config only (no code changes) - TESTED
| Flag | Default | Tested | Impact |
|------|---------|--------|--------|
| `--detection-size 1024` | 2048 | Not tested in isolation | ~2-4x faster detection (estimate) |
| `--inpainting-size 1024` | 2048 | Not tested in isolation | ~2-4x faster inpainting (estimate) |
| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
| `--batch-size 10-30` | 1 | Tested (30) | **FAILED** — error 2013 with MiniMax |
| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |
**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**
- **1.9 hours** (SLOWER than 1.5h without flags)
- **266/275 pages** (9 failed)
- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
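
To make the comparison concrete, the figures reported in this document work out as follows. This is plain arithmetic on the numbers above (275 pages, 1.5 h vs 1.9 h, 266 successful pages, ~700 API calls in the baseline run); `run_stats` is just a helper for this example.

```python
def run_stats(pages_total, pages_ok, hours):
    """Per-page time and failure rate for one run."""
    seconds = hours * 3600
    return {
        "sec_per_page": seconds / pages_total,
        "failure_rate": (pages_total - pages_ok) / pages_total,
    }


baseline = run_stats(275, 275, 1.5)  # batch-size 1, default settings
flagged = run_stats(275, 266, 1.9)   # all speed flags, batch-size 30

print(f"baseline: {baseline['sec_per_page']:.1f} s/page")   # baseline: 19.6 s/page
print(f"flagged:  {flagged['sec_per_page']:.1f} s/page")    # flagged:  24.9 s/page
print(f"failed:   {flagged['failure_rate']:.1%}")           # failed:   3.3%
print(f"slowdown: {1.9 / 1.5 - 1:.0%}")                     # slowdown: 27%
print(f"API calls per page (baseline): {700 / 275:.1f}")    # API calls per page (baseline): 2.5
```

In other words, the flagged run was about 27% slower overall and lost roughly 3% of the pages.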
### OPTION B: Code changes (not implemented yet)
| Change | Expected Impact | Complexity |
|--------|----------------|------------|
| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |
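
The first row proposes `asyncio.gather`. The following is a minimal sketch of that general pattern, not the project's code: `translate_one` is a hypothetical stand-in for a single API call, and `max_concurrency` is an assumed knob to avoid flooding the API. `asyncio.gather` returns results in the same order as its inputs.

```python
import asyncio


async def translate_one(text):
    """Hypothetical stand-in for one translation API call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[ES] {text}"


async def translate_concurrently(texts, max_concurrency=4):
    """Translate all texts concurrently; results keep the input order."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def guarded(text):
        async with limiter:
            return await translate_one(text)

    # return_exceptions=True returns failures as exception objects in the
    # result list instead of raising on the first one.
    return await asyncio.gather(
        *(guarded(t) for t in texts), return_exceptions=True
    )


if __name__ == "__main__":
    print(asyncio.run(translate_concurrently(["a", "b", "c"])))
```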
---
## 7. Known Issues
### MiniMax API Errors
- **Error 400:** `bad_request_error (2013)` — prompt too long or contains problematic content
- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
- **Workaround:** `--ignore-errors` skips failed pages
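
Because error 2013 appears to correlate with prompt size, one possible mitigation is to halve a rejected batch and retry each half. This has not been implemented or tested in this project; the sketch below only illustrates the idea, and `PromptTooLargeError`, `send_batch`, and `fake_send_batch` are hypothetical names.

```python
class PromptTooLargeError(Exception):
    """Hypothetical error raised when the API rejects a prompt (e.g. code 2013)."""


def translate_with_split(texts, send_batch):
    """Translate texts via send_batch, halving the batch whenever it is rejected.

    Returns one entry per input text; None marks a text that failed even alone.
    """
    if not texts:
        return []
    try:
        return send_batch(texts)
    except PromptTooLargeError:
        if len(texts) == 1:
            return [None]  # cannot split further; caller may skip this text
        mid = len(texts) // 2
        return (translate_with_split(texts[:mid], send_batch)
                + translate_with_split(texts[mid:], send_batch))


def fake_send_batch(texts, limit=10):
    """Test double: rejects batches whose combined length exceeds `limit`."""
    if sum(len(t) for t in texts) > limit:
        raise PromptTooLargeError
    return [t.upper() for t in texts]
```

With `fake_send_batch`, a batch of three four-character strings is rejected as a whole, then succeeds once it is split into smaller pieces.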
### Post-Translation Check Failures
- The tool checks if translated text is actually in the target language
- Valid Spanish translations sometimes fail this check and are wrongly rejected
- This causes unnecessary retries and can revert translations to original text
- **Workaround:** Already handled by `--ignore-errors`
### Vertical Bubble Problem
- Japanese manga uses vertical speech bubbles (narrow, tall)
- Spanish text is horizontal and longer than Japanese
- Text overflows or doesn't fit in narrow vertical bubbles
- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
- **Known limitation:** Some vertical bubbles will always overflow
### OCR Accuracy
- OCR sometimes misreads characters (especially damaged/low-quality scans)
- OCR errors propagate to translation (garbage in → garbage out)
- `48px` model is more accurate than `32px`
---
## 8. File Structure
```
fansub2/
├── example/ # Source images
│ ├── nhentai_652854/ # 275-page gallery (Chinese manga)
│ ├── japanese.jpg # Single Japanese page test
│ ├── english.jpg # Single English page test
│ ├── chinese_sfw.webp # Single Chinese page test
│ ├── coreano.jpg # Single Korean page test
│ └── burbujascombinadas.webp # Single English page test
├── example-translated/ # Output translated images
│ ├── nhentai_652854/ # 275 pages (1.5h, batch-size 1)
│ ├── nhentai_652854_test/ # 275 pages (1.9h, batch-size 30 - SLOWER)
│ ├── japanese.jpg # Latest: font_size_offset -10
│ └── ... (other translated files)
├── translate_config.json # Active translation config
├── OPTIMIZACIONES.md # Optimization notes
└── manga-image-translator/ # The tool
├── .env # API keys (DO NOT COMMIT)
├── venv/ # Python virtual environment
├── manga_translator/ # Source code
│ ├── translators/ # Translation backends
│ │ ├── chatgpt.py # OpenAI-compatible (MiniMax)
│ │ ├── common.py # Language codes, base classes
│ │ ├── nllb.py # Facebook NLLB-200 (offline)
│ │ ├── m2m100.py # Facebook M2M-100 (offline)
│ │ └── keys.py # API key env vars
│ ├── rendering/ # Text rendering
│ │ ├── __init__.py # Main render pipeline
│ │ └── text_render.py # Font/text rendering
│ ├── detection/ # Text detection (DBNet)
│ ├── ocr/ # OCR models
│ ├── inpainting/ # Text erasure models
│ ├── manga_translator.py # Main orchestrator
│ ├── config.py # Config schema
│ └── mode/local.py # Local batch mode
├── result/ # Debug output (with -v flag)
└── README.md # Official documentation
```
---
## 9. Running the Tool
### Single image
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
-m manga_translator local `
-i "example\japanese.jpg" `
-o "example-translated" `
--config-file "translate_config.json" `
--ignore-errors --overwrite
```
### Full gallery
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
-m manga_translator local `
-i "example\nhentai_652854" `
-o "example-translated\nhentai_652854" `
--config-file "translate_config.json" `
--ignore-errors --overwrite
```
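
For scripted runs, the same invocation can be driven from Python. This is a sketch under the assumption that it is launched from the `fansub2` working directory on Windows; the flags are exactly those used in the commands above, while `build_command`, `utf8_env`, and `run_translation` are illustrative helpers.

```python
import os
import subprocess

PYTHON_EXE = r"manga-image-translator\venv\Scripts\python.exe"


def build_command(input_path, output_path, config_file="translate_config.json"):
    """Return the argument list for the local batch mode shown above."""
    return [
        PYTHON_EXE, "-m", "manga_translator", "local",
        "-i", input_path,
        "-o", output_path,
        "--config-file", config_file,
        "--ignore-errors",
        "--overwrite",
    ]


def utf8_env():
    """Copy the current environment and force UTF-8 I/O for the child process."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run_translation(input_path, output_path):
    """Run the translator and return its exit code."""
    command = build_command(input_path, output_path)
    return subprocess.run(command, env=utf8_env()).returncode
```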
---
## 10. Key Learnings
1. **LLM translators beat offline models** — MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
2. **batch-size > 1 is risky with LLMs** — Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
3. **UTF-8 is mandatory on Windows** — Must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1` or CJK characters crash the console
4. **Vertical bubbles are a fundamental limitation** — Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
5. **`--ignore-errors` is essential** — Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
6. **AOT inpainter is faster on CPU** — But `lama_large` produces better quality; trade-off depends on use case
7. **`manga2eng` renderer is better than `default`** — Handles text sizing and positioning more intelligently