
# LLM_READY.md - manga-image-translator Context Guide

> **Purpose:** Complete context document for any AI working with this project.
> **Last updated:** 2026-05-28
> **Working directory:** `C:\Users\Administrator\Documents\fansub2`

---

## 1. Project Overview

**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:
1. **Detects** text regions in images (bounding boxes)
2. **OCRs** the text (reads what it says)
3. **Translates** the text to a target language
4. **Inpaints** (erases) the original text
5. **Renders** the translated text in the same position

### Repository
- **Location:** `manga-image-translator/` (git child repo)
- **Language:** Python (experimental version)
- **GPU:** Not available — CPU + RAM only (VPS deployment planned)
- **Python venv:** `manga-image-translator/venv/`

---

## 2. Translation Pipeline

```
Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
```

### Detection
- **Model:** DBNet_resnet34 (default detector)
- **Default resolution:** 2048px (configurable via `detection_size`)
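- **Impact on speed:** Reducing to 1024 is expected to be ~2-4x faster (halving the side length gives roughly 4x fewer pixels). This has not been benchmarked on its own in this setup.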
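The 2-4x estimate comes from pixel arithmetic. This is a minimal sketch that assumes the detector scales the image so its longest side equals `detection_size`. The real code may also pad or round to a stride. The page dimensions below are hypothetical.

```python
def scaled_dims(width: int, height: int, target_long_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals target_long_side.

    Assumption: detection resizes by the longest side; padding/stride
    rounding in the real detector is ignored here.
    """
    scale = target_long_side / max(width, height)
    return round(width * scale), round(height * scale)


def pixel_ratio(width: int, height: int, big: int, small: int) -> float:
    """How many times more pixels the `big` setting processes than `small`."""
    bw, bh = scaled_dims(width, height, big)
    sw, sh = scaled_dims(width, height, small)
    return (bw * bh) / (sw * sh)


# A hypothetical portrait manga page.
w, h = 1400, 2000
print(scaled_dims(w, h, 2048))  # (1434, 2048)
print(scaled_dims(w, h, 1024))  # (717, 1024)
print(f"{pixel_ratio(w, h, 2048, 1024):.2f}x fewer pixels at 1024")
```

A 4x pixel reduction is an upper bound on the speedup. Fixed per-image overheads such as loading and post-processing do not shrink, which is why the expected gain is stated as 2-4x.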

### OCR
- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
- **Default:** `48px` (ConvNext backbone, more accurate)
- **32px:** ResNet backbone, ~20-30% faster but less accurate
- **Impact:** 48px recommended for reliability; 32px for speed

### Translation (the critical part)
- **LLM translator:** `chatgpt` (OpenAI-compatible API)
- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (run locally; all are slower on this CPU-only setup and less accurate than the LLM translator)
- **Target language:** `ESP` (Spanish)

### Inpainting
- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
- **Default:** `lama_large` — uses FFT (FourierUnit), expensive on CPU
- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
- **`none`:** Skip inpainting entirely

### Rendering
- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`

---

## 3. Working Configuration (PROVEN)

This configuration was tested on 275 pages and completed successfully in ~1.5 hours.

### Config file: `translate_config.json`
```json
{
  "translator": {
    "translator": "chatgpt",
    "target_lang": "ESP"
  },
  "render": {
    "renderer": "manga2eng",
    "font_size_offset": -10,
    "font_size_minimum": 8,
    "no_hyphenation": true,
    "alignment": "center"
  }
}
```
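To keep config variants reproducible, the proven config can also be generated from Python. This is a small sketch using only the standard `json` module. The dict holds exactly the keys shown above, and `render_config`/`write_config` are hypothetical helper names.

```python
import json
from pathlib import Path

# The proven settings from translate_config.json, as a Python dict.
PROVEN_CONFIG = {
    "translator": {"translator": "chatgpt", "target_lang": "ESP"},
    "render": {
        "renderer": "manga2eng",
        "font_size_offset": -10,
        "font_size_minimum": 8,
        "no_hyphenation": True,
        "alignment": "center",
    },
}


def render_config(config: dict) -> str:
    """Serialize a config as pretty-printed JSON with a trailing newline."""
    # ensure_ascii=False keeps any non-ASCII values readable.
    return json.dumps(config, indent=2, ensure_ascii=False) + "\n"


def write_config(path: Path, config: dict) -> None:
    """Write the config as UTF-8, matching the encoding used for the CLI."""
    path.write_text(render_config(config), encoding="utf-8")
```

Usage: `write_config(Path("translate_config.json"), PROVEN_CONFIG)` recreates the file above from the `fansub2` directory.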

### Environment file: `manga-image-translator/.env`
```
OPENAI_API_KEY=<your-api-key>
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=<your-api-key>
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
```
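A pre-flight check can catch a missing or placeholder key before a 1.5-hour run starts. This is a minimal sketch with hypothetical helpers. The parser only understands plain `KEY=VALUE` lines and `#` comments (no quoting or variable expansion), so it is not a full dotenv implementation. `REQUIRED` mirrors the `OPENAI_*` variables above. Which set the `chatgpt` translator actually reads should be confirmed in `translators/keys.py`.

```python
from pathlib import Path

# Assumption: these are the variables the chatgpt translator needs;
# confirm against manga_translator/translators/keys.py.
REQUIRED = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def load_env_file(path: Path) -> dict[str, str]:
    return parse_env(path.read_text(encoding="utf-8"))


def missing_keys(env: dict[str, str], required=REQUIRED) -> list[str]:
    """Required keys that are absent, empty, or still a <placeholder>."""
    return [k for k in required if not env.get(k) or env[k].startswith("<")]
```

Usage: `missing_keys(load_env_file(Path("manga-image-translator/.env")))` returns an empty list when every required key has a real value.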

### Proven command
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors `
  --overwrite
```
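The same command can be driven from Python with `subprocess`, for example to loop over several galleries. This is a sketch that assumes it runs from the `fansub2` directory, as the PowerShell version does. It sets the two UTF-8 variables only for the child process, and `build_command`/`run_translation` are hypothetical helper names.

```python
import os
import subprocess
from pathlib import Path

# Same interpreter path as the PowerShell example (Windows venv layout).
PYTHON = Path("manga-image-translator") / "venv" / "Scripts" / "python.exe"


def build_command(src: str, dst: str, config: str = "translate_config.json") -> list[str]:
    """Argument list equivalent to the proven PowerShell command."""
    return [
        str(PYTHON), "-m", "manga_translator", "local",
        "-i", src,
        "-o", dst,
        "--config-file", config,
        "--ignore-errors",
        "--overwrite",
    ]


def utf8_env() -> dict[str, str]:
    """Copy of the current environment with UTF-8 forced for the child."""
    env = os.environ.copy()
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run_translation(src: str, dst: str) -> int:
    """Run one translation job and return the process exit code."""
    result = subprocess.run(build_command(src, dst), env=utf8_env())
    return result.returncode
```

Usage: `run_translation(r"example\nhentai_652854", r"example-translated\nhentai_652854")`. Because `--ignore-errors` is set, an exit code of 0 does not guarantee that every page was translated.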

### Performance (275 pages)
| Metric | Value |
|--------|-------|
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |

---

## 4. CLI Flags Reference

### General
| Flag | Description |
|------|-------------|
| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| `--overwrite` | Overwrite existing translated files |
| `--skip-no-text` | Don't save images with no detected text |
| `-v` | Verbose output (saves intermediate images to `result/`) |

### Batch processing
| Flag | Description |
|------|-------------|
| `--batch-size N` | Process N images per batch (default: 1) |
| `--batch-concurrent` | Use concurrent mode for batch translation |

### GPU (not available on VPS)
| Flag | Description |
|------|-------------|
| `--use-gpu` | Use CUDA/MPS for all models |
| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |

### Config file options (in `translate_config.json`)
| Key | Values | Default | Recommended |
|-----|--------|---------|-------------|
| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
| `render.font_size_offset` | integer | 0 | -10 |
| `render.font_size_minimum` | integer | -1 | 8 |
| `render.no_hyphenation` | boolean | false | true |
| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
| `detector.detection_size` | integer | 2048 | 1024 (faster) |
| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |
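A speed-oriented variant can be derived from the proven config by deep-merging overrides. The nested keys follow the dotted names in the table above. The override values are the faster options discussed in this document: 1024 sizes, the AOT `default` inpainter, and 32px OCR. `deep_merge` is a hypothetical helper, not part of the tool.

```python
import copy

# The proven config (see translate_config.json).
BASE = {
    "translator": {"translator": "chatgpt", "target_lang": "ESP"},
    "render": {
        "renderer": "manga2eng",
        "font_size_offset": -10,
        "font_size_minimum": 8,
        "no_hyphenation": True,
        "alignment": "center",
    },
}

# Faster (lower-quality) options; the dotted table keys become nested dicts.
FAST_OVERRIDES = {
    "detector": {"detection_size": 1024},
    "inpainter": {"inpainter": "default", "inpainting_size": 1024},
    "ocr": {"ocr": "32px"},
}


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a new dict where overrides win and nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


fast_config = deep_merge(BASE, FAST_OVERRIDES)
```

Batch size is a CLI flag (`--batch-size`), not a config key, so it is left out. Given the Option A results, it is safer to trial one override at a time with batch-size 1.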

---

## 5. Valid Language Codes (target_lang)

From `manga_translator/translators/common.py`:

| Code | Language |
|------|----------|
| `CHS` | Chinese (Simplified) |
| `CHT` | Chinese (Traditional) |
| `ENG` | English |
| `JPN` | Japanese |
| `KOR` | Korean |
| `ESP` | Spanish |
| `FRA` | French |
| `DEU` | German |
| `ITA` | Italian |
| `PTB` | Portuguese (Brazil) |
| `RUS` | Russian |
| `ARA` | Arabic |
| `THA` | Thai |
| `VIN` | Vietnamese |
| ... | (25+ languages total) |
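A typo in `target_lang` is cheap to catch before a long run. This is a sketch that checks the config against the codes in this table only. It is a subset of the 25+ supported codes, so a valid code missing from the table would be flagged. The authoritative list is in `manga_translator/translators/common.py`. `check_target_lang` is a hypothetical helper.

```python
import json

# Subset: only the codes listed in the table above.
KNOWN_CODES = {
    "CHS", "CHT", "ENG", "JPN", "KOR", "ESP", "FRA",
    "DEU", "ITA", "PTB", "RUS", "ARA", "THA", "VIN",
}


def check_target_lang(config_text: str) -> str:
    """Return the configured target_lang, or raise ValueError if unknown.

    Falls back to "ENG", the documented default, when the key is absent.
    """
    config = json.loads(config_text)
    code = config.get("translator", {}).get("target_lang", "ENG")
    if code not in KNOWN_CODES:
        raise ValueError(f"Unknown target_lang {code!r}; check translators/common.py")
    return code
```

Usage: `check_target_lang(open("translate_config.json", encoding="utf-8").read())` returns `"ESP"` for the proven config.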

---

## 6. Optimization Options (Tested Results)

### Option A: Config only, no code changes (tested)
| Flag | Default | Tested | Impact |
|------|---------|--------|--------|
| `--detection-size 1024` | 2048 | Not individually (only in the combined run) | ~2-4x faster detection (expected) |
| `--inpainting-size 1024` | 2048 | Not individually (only in the combined run) | ~2-4x faster inpainting (expected) |
| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
| `--batch-size 10-30` | 1 | Tested (30) | **FAILED** — error 2013 with MiniMax |
| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |

**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**
- **1.9 hours** (SLOWER than 1.5h without flags)
- **266/275 pages** (9 failed)
- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
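Normalizing both runs by the pages actually translated makes the regression concrete. This short calculation uses only the approximate figures recorded here: ~1.5 h for 275 pages and ~1.9 h for 266 pages.

```python
def seconds_per_page(hours: float, pages_ok: int) -> float:
    """Average wall-clock seconds per successfully translated page."""
    return hours * 3600 / pages_ok


baseline = seconds_per_page(1.5, 275)  # proven config, batch-size 1
flagged = seconds_per_page(1.9, 266)   # detection/inpainting 1024, AOT, 32px OCR, batch 30

print(f"baseline: {baseline:.1f} s/page")   # baseline: 19.6 s/page
print(f"flagged:  {flagged:.1f} s/page")    # flagged:  25.7 s/page
print(f"slowdown: {flagged / baseline:.2f}x")  # slowdown: 1.31x
```

The flagged run was about 31% slower per successful page, even though its local stages (detection, inpainting, OCR) were configured to be cheaper. This is consistent with the API errors and retries dominating the runtime.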

### Option B: Code changes (not implemented yet)
| Change | Expected Impact | Complexity |
|--------|----------------|------------|
| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |
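The first Option B item proposes sending translation requests concurrently with `asyncio.gather`. Below is a generic sketch of that pattern, not the project's `_concurrent_translate_contexts`. An `asyncio.Semaphore` caps the number of requests in flight, and `translate_one` is a hypothetical stand-in that simulates an API call.

```python
import asyncio


async def translate_one(text: str) -> str:
    """Hypothetical stand-in for one API request (simulated latency)."""
    await asyncio.sleep(0.01)
    return f"[ESP] {text}"


async def translate_all(texts: list[str], max_in_flight: int = 4) -> list[str]:
    """Translate texts concurrently, with at most max_in_flight requests at once.

    asyncio.gather returns results in input order, regardless of which
    request finishes first.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(text: str) -> str:
        async with semaphore:
            return await translate_one(text)

    return await asyncio.gather(*(bounded(t) for t in texts))
```

Usage: `asyncio.run(translate_all(["page 1", "page 2"]))`. Concurrency only shortens wall-clock time if the provider accepts parallel requests, so `max_in_flight` should stay low enough to respect MiniMax rate limits.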

---

## 7. Known Issues

### MiniMax API Errors
- **Error 400:** `bad_request_error (2013)` — prompt too long or contains problematic content
- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
- **Workaround:** `--ignore-errors` skips failed pages
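Error 2013 appears with long text blocks and large batches. One possible mitigation, which this document does not record as implemented, is to cap how much text goes into a single request. This sketch greedily groups text regions under a character budget. The default `max_chars=2000` is an arbitrary placeholder, not a documented MiniMax limit. Characters are only a rough proxy for tokens, since CJK text usually tokenizes differently from Latin text.

```python
def chunk_texts(texts: list[str], max_chars: int = 2000) -> list[list[str]]:
    """Greedily group texts so each chunk's total length stays <= max_chars.

    A single text longer than max_chars gets a chunk of its own (it is
    not split), so oversized requests are still possible.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for text in texts:
        # Start a new chunk if adding this text would exceed the budget.
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        chunks.append(current)
    return chunks
```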

### Post-Translation Check Failures
- The tool checks if translated text is actually in the target language
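- Valid Spanish translations sometimes fail this check (false rejections)
- Each failure triggers unnecessary retries and can revert the translation to the original text
- **Workaround:** `--ignore-errors` keeps the batch running when a page ultimately fails; regions reverted to the original text still need manual review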
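Because a failed check can leave original text in place, it helps to audit translated strings for leftover CJK characters. This is a naive heuristic sketch, not the tool's actual language check. It counts code points in the main kana, Han, and Hangul blocks. The 0.3 threshold is an arbitrary assumption.

```python
# Rough Unicode ranges for scripts that should not survive a Spanish translation.
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (common Han characters)
    (0xAC00, 0xD7AF),  # Hangul syllables
)


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in CJK_RANGES."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if any(lo <= ord(c) <= hi for lo, hi in CJK_RANGES))
    return cjk / len(chars)


def looks_untranslated(text: str, threshold: float = 0.3) -> bool:
    """Heuristic: flag text whose CJK share exceeds the (arbitrary) threshold."""
    return cjk_ratio(text) > threshold
```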

### Vertical Bubble Problem
- Japanese manga uses vertical speech bubbles (narrow, tall)
- Spanish text is horizontal and longer than Japanese
- Text overflows or doesn't fit in narrow vertical bubbles
- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
- **Known limitation:** Some vertical bubbles will always overflow
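The effect of a negative `font_size_offset` can be illustrated with a rough fit estimate. This is a hypothetical heuristic, not the `manga2eng` renderer's algorithm. It assumes an average glyph width of 0.6x the font size and a line height of 1.2x, which is typical for Latin fonts. It also ignores word boundaries, so real wrapping fits slightly worse.

```python
import math
from typing import Optional


def fits(n_chars: int, width: int, height: int, font_size: int,
         glyph_w: float = 0.6, line_h: float = 1.2) -> bool:
    """Approximate check: does n_chars of text fit in a width x height box?"""
    chars_per_line = int(width // (font_size * glyph_w))
    if chars_per_line < 1:
        return False  # not even one glyph fits per line
    lines = math.ceil(n_chars / chars_per_line)
    return lines * font_size * line_h <= height


def largest_fitting_size(n_chars: int, width: int, height: int,
                         start: int = 40, minimum: int = 8) -> Optional[int]:
    """Largest font size in [minimum, start] that fits, or None if none does."""
    for size in range(start, minimum - 1, -1):
        if fits(n_chars, width, height, size):
            return size
    return None
```

In a narrow, tall bubble only a few Latin characters fit per line, so the required height grows quickly with text length. When `largest_fitting_size` returns `None` even at the minimum size, the text overflows at any size allowed by `font_size_minimum: 8`.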

### OCR Accuracy
- OCR sometimes misreads characters (especially damaged/low-quality scans)
- OCR errors propagate to translation (garbage in → garbage out)
- `48px` model is more accurate than `32px`

---

## 8. File Structure

```
fansub2/
├── example/                          # Source images
│   ├── nhentai_652854/              # 275-page gallery (Chinese manga)
│   ├── japanese.jpg                 # Single Japanese page test
│   ├── english.jpg                  # Single English page test
│   ├── chinese_sfw.webp             # Single Chinese page test
│   ├── coreano.jpg                  # Single Korean page test
│   └── burbujascombinadas.webp      # Single English page test
├── example-translated/              # Output translated images
│   ├── nhentai_652854/             # 275 pages (1.5h, batch-size 1)
│   ├── nhentai_652854_test/        # 275 pages (1.9h, batch-size 30 - SLOWER)
│   ├── japanese.jpg                # Latest: font_size_offset -10
│   └── ... (other translated files)
├── translate_config.json            # Active translation config
├── OPTIMIZACIONES.md               # Optimization notes
└── manga-image-translator/          # The tool
    ├── .env                         # API keys (DO NOT COMMIT)
    ├── venv/                        # Python virtual environment
    ├── manga_translator/            # Source code
    │   ├── translators/             # Translation backends
    │   │   ├── chatgpt.py          # OpenAI-compatible (MiniMax)
    │   │   ├── common.py           # Language codes, base classes
    │   │   ├── nllb.py             # Facebook NLLB-200 (offline)
    │   │   ├── m2m100.py           # Facebook M2M-100 (offline)
    │   │   └── keys.py             # API key env vars
    │   ├── rendering/              # Text rendering
    │   │   ├── __init__.py         # Main render pipeline
    │   │   └── text_render.py      # Font/text rendering
    │   ├── detection/              # Text detection (DBNet)
    │   ├── ocr/                    # OCR models
    │   ├── inpainting/             # Text erasure models
    │   ├── manga_translator.py     # Main orchestrator
    │   ├── config.py               # Config schema
    │   └── mode/local.py           # Local batch mode
    ├── result/                      # Debug output (with -v flag)
    └── README.md                    # Official documentation
```
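With `--ignore-errors`, failed pages are skipped without stopping the run, as with the 9 missing pages in the batch-30 test. This sketch lists input pages that have no matching output by comparing filename stems. It assumes the tool keeps the input stem in the output folder, even if the extension changes; check this on a real run. The extension set is also an assumption.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def image_stems(folder: Path) -> set[str]:
    """Stems (filename without extension) of image files directly in folder."""
    return {p.stem for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTS}


def missing_pages(src: Path, dst: Path) -> list[str]:
    """Sorted input page stems with no output file of the same stem."""
    done = image_stems(dst) if dst.exists() else set()
    return sorted(image_stems(src) - done)
```

Usage: `missing_pages(Path("example/nhentai_652854"), Path("example-translated/nhentai_652854"))` returns the pages to retry.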

---

## 9. Running the Tool

### Single image
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\japanese.jpg" `
  -o "example-translated" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite
```

### Full gallery
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
  -m manga_translator local `
  -i "example\nhentai_652854" `
  -o "example-translated\nhentai_652854" `
  --config-file "translate_config.json" `
  --ignore-errors --overwrite
```

---

## 10. Key Learnings

1. **LLM translators beat offline models** — MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
2. **batch-size > 1 is risky with LLMs**: batch-size 30 triggered API error 2013 with MiniMax and lost 9 of 275 pages; batch-size 1 is the only setting proven safe
3. **UTF-8 is mandatory on Windows** — Must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1` or CJK characters crash the console
4. **Vertical bubbles are a fundamental limitation** — Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
5. **`--ignore-errors` is essential** — Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
6. **AOT inpainter is faster on CPU**, but `lama_large` produces better quality; the trade-off depends on the use case
7. **`manga2eng` renderer is better than `default`** — Handles text sizing and positioning more intelligently