feat: initial commit - manga-image-translator setup with MiniMax LLM for Spanish translation

renato97
2026-05-28 20:51:35 -03:00
commit 9231d96305
18 changed files with 437 additions and 0 deletions


@@ -0,0 +1,4 @@
# Taste (Continuously Learned by [CommandCode][cmd])
[cmd]: https://commandcode.ai/

35
.gitignore vendored Normal file

@@ -0,0 +1,35 @@
# Python
__pycache__/
*.pyc
*.pyo
venv/
.venv/
# IDE
.vscode/
.idea/
# OS
Thumbs.db
Desktop.ini
.DS_Store
# manga-image-translator internals (submodule content)
manga-image-translator/.git/
manga-image-translator/models/
manga-image-translator/venv/
manga-image-translator/result/
manga-image-translator/__pycache__/
manga-image-translator/**/*.pyc
# Sensitive
.env
.env.*
manga-image-translator/.env
# Large translated galleries (>10MB)
example-translated/nhentai_652854/
example-translated/nhentai_652854_test/
# Temporary/debug
*.log

3
.gitmodules vendored Normal file

@@ -0,0 +1,3 @@
[submodule "manga-image-translator"]
path = manga-image-translator
url = https://github.com/zyddnys/manga-image-translator.git

312
LLM_READY.md Normal file

@@ -0,0 +1,312 @@
# LLM_READY.md - manga-image-translator Context Guide
> **Purpose:** Complete context document for any AI working with this project.
> **Last updated:** 2026-05-28
> **Working directory:** `C:\Users\Administrator\Documents\fansub2`
---
## 1. Project Overview
**manga-image-translator** is a tool that automatically translates text in manga/comic images. It:
1. **Detects** text regions in images (bounding boxes)
2. **OCRs** the text (reads what it says)
3. **Translates** the text to a target language
4. **Inpaints** (erases) the original text
5. **Renders** the translated text in the same position
### Repository
- **Location:** `manga-image-translator/` (git submodule, see `.gitmodules`)
- **Language:** Python (experimental version)
- **GPU:** Not available — CPU + RAM only (VPS deployment planned)
- **Python venv:** `manga-image-translator/venv/`
---
## 2. Translation Pipeline
```
Image → Detection → OCR → Translation → Mask Refinement → Inpainting → Rendering → Output
```
### Detection
- **Model:** DBNet_resnet34 (default detector)
- **Default resolution:** 2048px (configurable via `detection_size`)
- **Impact on speed:** Reducing to 1024 is ~2-4x faster
### OCR
- **Available models:** `32px`, `48px` (default), `48px_ctc`, `mocr`
- **Default:** `48px` (ConvNext backbone, more accurate)
- **32px:** ResNet backbone, ~20-30% faster but less accurate
- **Impact:** 48px recommended for reliability; 32px for speed
### Translation (the critical part)
- **LLM translator:** `chatgpt` (OpenAI-compatible API)
- **Offline translators:** `nllb`, `m2m100`, `mbart50`, `qwen2` (all slower and less accurate)
- **Target language:** `ESP` (Spanish)
### Inpainting
- **Available:** `lama_large` (default), `lama_mpe`, `default` (AOT), `sd`, `none`
- **Default:** `lama_large` — uses FFT (FourierUnit), expensive on CPU
- **AOT (`default`):** Lightweight convolutions, ~3-5x faster on CPU
- **`none`:** Skip inpainting entirely
### Rendering
- **Available:** `default`, `manga2eng` (recommended), `manga2eng_pillow`, `none`
- **Key settings:** `font_size_offset`, `font_size_minimum`, `no_hyphenation`, `alignment`
---
## 3. Working Configuration (PROVEN)
This configuration was tested on 275 pages and completed successfully in ~1.5 hours.
### Config file: `translate_config.json`
```json
{
"translator": {
"translator": "chatgpt",
"target_lang": "ESP"
},
"render": {
"renderer": "manga2eng",
"font_size_offset": -10,
"font_size_minimum": 8,
"no_hyphenation": true,
"alignment": "center"
}
}
```
### Environment file: `manga-image-translator/.env`
```
OPENAI_API_KEY=<your-api-key>
OPENAI_API_BASE=https://api.minimax.io/v1
OPENAI_MODEL=MiniMax-M2.7
CUSTOM_OPENAI_API_KEY=<your-api-key>
CUSTOM_OPENAI_API_BASE=https://api.minimax.io/v1
CUSTOM_OPENAI_MODEL=MiniMax-M2.7
```
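
Before a long batch run it can help to confirm that the `.env` file defines every variable listed above. The following is a minimal sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helper names, not part of manga-image-translator, and the parser only handles plain `KEY=VALUE` lines.

```python
# Sketch only: helper names are illustrative, not part of the tool.
REQUIRED_KEYS = (
    "OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL",
    "CUSTOM_OPENAI_API_KEY", "CUSTOM_OPENAI_API_BASE", "CUSTOM_OPENAI_MODEL",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<")
    ]
```

A typical use would be `missing_keys(parse_env(open(r"manga-image-translator\.env", encoding="utf-8").read()))`, which returns an empty list when all six variables are filled in.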
### Proven command
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
-m manga_translator local `
-i "example\nhentai_652854" `
-o "example-translated\nhentai_652854" `
--config-file "translate_config.json" `
--ignore-errors `
--overwrite
```
### Performance (275 pages)
| Metric | Value |
|--------|-------|
| Pages processed | 275/275 |
| Time | ~1.5 hours |
| API calls | ~700 |
| Translator | MiniMax-M2.7 via OpenAI-compatible API |
| Language | Japanese/Chinese → Spanish (ESP) |
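
The table's totals can be turned into per-page figures, which are easier to compare across runs. This is plain arithmetic on the numbers above; the API call count is approximate, so the calls-per-page value is only a rough estimate.

```python
# Per-page figures derived from the performance table above.
pages = 275
hours = 1.5
api_calls = 700  # approximate, as reported in the table

seconds_per_page = hours * 3600 / pages
calls_per_page = api_calls / pages

print(f"{seconds_per_page:.1f} s/page")     # 19.6 s/page
print(f"{calls_per_page:.1f} calls/page")   # 2.5 calls/page
```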
---
## 4. CLI Flags Reference
### General
| Flag | Description |
|------|-------------|
| `--ignore-errors` | Skip failed images instead of crashing (ESSENTIAL for batch jobs) |
| `--overwrite` | Overwrite existing translated files |
| `--skip-no-text` | Don't save images with no detected text |
| `-v` | Verbose output (saves intermediate images to `result/`) |
### Batch processing
| Flag | Description |
|------|-------------|
| `--batch-size N` | Process N images per batch (default: 1) |
| `--batch-concurrent` | Use concurrent mode for batch translation |
### GPU (not available on VPS)
| Flag | Description |
|------|-------------|
| `--use-gpu` | Use CUDA/MPS for all models |
| `--use-gpu-limited` | Use GPU for detection/OCR but CPU for offline translators |
### Config file options (in `translate_config.json`)
| Key | Values | Default | Recommended |
|-----|--------|---------|-------------|
| `translator.translator` | `chatgpt`, `nllb`, `m2m100`, `sugoi`, etc. | `sugoi` | `chatgpt` |
| `translator.target_lang` | `ESP`, `ENG`, `JPN`, etc. | `ENG` | `ESP` |
| `translator.translator_chain` | e.g. `"nllb:ENG;nllb:ESP"` | null | null |
| `render.renderer` | `default`, `manga2eng`, `manga2eng_pillow`, `none` | `default` | `manga2eng` |
| `render.font_size_offset` | integer | 0 | -10 |
| `render.font_size_minimum` | integer | -1 | 8 |
| `render.no_hyphenation` | boolean | false | true |
| `render.alignment` | `auto`, `left`, `center`, `right` | `auto` | `center` |
| `detector.detection_size` | integer | 2048 | 1024 (faster) |
| `inpainter.inpainter` | `default`, `lama_large`, `lama_mpe`, `sd`, `none` | `lama_large` | `lama_large` |
| `inpainter.inpainting_size` | integer | 2048 | 1024 (faster) |
| `ocr.ocr` | `32px`, `48px`, `48px_ctc`, `mocr` | `48px` | `48px` |
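
A typo in an enumerated option is easy to miss in a JSON file. The sketch below checks a loaded config against the values listed in the table above; `ALLOWED` and `check_config` are hypothetical names for illustration, and the allowed sets are copied from this document rather than from the tool's own schema in `config.py`, which may accept more values.

```python
import json

# Allowed values as listed in the table above (not read from config.py).
ALLOWED = {
    ("render", "renderer"): {"default", "manga2eng", "manga2eng_pillow", "none"},
    ("render", "alignment"): {"auto", "left", "center", "right"},
    ("inpainter", "inpainter"): {"default", "lama_large", "lama_mpe", "sd", "none"},
    ("ocr", "ocr"): {"32px", "48px", "48px_ctc", "mocr"},
}


def check_config(config: dict) -> list[str]:
    """Return one message per option whose value is not in the listed set."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = config.get(section, {}).get(key)
        # Options that are not set fall back to the tool's defaults.
        if value is not None and value not in allowed:
            problems.append(f"{section}.{key}={value!r} not in {sorted(allowed)}")
    return problems


example = json.loads('{"render": {"renderer": "manga2eng", "alignment": "centre"}}')
print(check_config(example))  # one problem: render.alignment
```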
---
## 5. Valid Language Codes (target_lang)
From `manga_translator/translators/common.py`:
| Code | Language |
|------|----------|
| `CHS` | Chinese (Simplified) |
| `CHT` | Chinese (Traditional) |
| `ENG` | English |
| `JPN` | Japanese |
| `KOR` | Korean |
| `ESP` | Spanish |
| `FRA` | French |
| `DEU` | German |
| `ITA` | Italian |
| `PTB` | Portuguese (Brazil) |
| `RUS` | Russian |
| `ARA` | Arabic |
| `THA` | Thai |
| `VIN` | Vietnamese |
| ... | (25+ languages total) |
---
## 6. Optimization Options (Tested Results)
### OPTION A: Config only, no code changes (TESTED)
| Flag | Default | Tested | Impact |
|------|---------|--------|--------|
| `--detection-size 1024` | 2048 | Only in the combined run below | ~2-4x faster detection |
| `--inpainting-size 1024` | 2048 | Only in the combined run below | ~2-4x faster inpainting |
| `--inpainter default` (AOT) | lama_large | Tested | ~3-5x faster inpainting |
| `--ocr 32px` | 48px | Tested | ~20-30% faster OCR |
| `--batch-size 10-30` | 1 | Tested (30) | **FAILED** — error 2013 with MiniMax |
| `--batch-concurrent` | off | Tested | Added overhead, no benefit |
| `--skip-no-text` | off | Tested | Saves I/O, minor benefit |
**Results with flags (detection 1024, inpainting 1024, AOT, OCR 32px, batch 30):**
- **1.9 hours** (SLOWER than 1.5h without flags)
- **266/275 pages** (9 failed)
- Root cause: batch-size 30 generates prompts too large for MiniMax (error 2013)
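
To make the comparison concrete, the two runs can be reduced to a success rate and a time per successfully translated page. The sketch uses only the figures reported in this document; the `summarize` helper is illustrative.

```python
# Figures as reported in this document for the two 275-page runs.
baseline = {"hours": 1.5, "ok_pages": 275, "total": 275}  # batch-size 1
flagged = {"hours": 1.9, "ok_pages": 266, "total": 275}   # all flags, batch-size 30


def summarize(run: dict) -> dict:
    """Success rate and wall-clock seconds per successfully translated page."""
    return {
        "success_rate": run["ok_pages"] / run["total"],
        "sec_per_ok_page": run["hours"] * 3600 / run["ok_pages"],
    }


slowdown = flagged["hours"] / baseline["hours"]

print(summarize(baseline))  # success_rate 1.0, about 19.6 s per page
print(summarize(flagged))   # success_rate about 0.967, about 25.7 s per page
print(f"{slowdown:.2f}x")   # 1.27x
```

By these numbers the flagged run was about 27% slower overall and lost 9 pages, so the per-stage speedups did not offset the cost of the failed batches.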
### OPTION B: Code changes (not implemented yet)
| Change | Expected Impact | Complexity |
|--------|----------------|------------|
| Fix `_concurrent_translate_contexts` to use `asyncio.gather` | ~30-40% faster | Low |
| Add `ProcessPoolExecutor` for detection/OCR | ~50-60% faster | High |
| Increase `_MAX_TOKENS` from 4096 to 8192 | Minor | Low |
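
The first row contrasts a sequential loop with `asyncio.gather`. The sketch below shows that difference in isolation. It is not the project's `_concurrent_translate_contexts`; `translate_one` is a hypothetical stand-in that sleeps instead of calling an API, and the `limit` semaphore is an added assumption to avoid flooding a rate-limited endpoint.

```python
import asyncio


async def translate_one(text: str) -> str:
    """Hypothetical stand-in for one API request (sleeps instead of calling)."""
    await asyncio.sleep(0.05)
    return f"[ES] {text}"


async def translate_sequential(texts: list[str]) -> list[str]:
    # Each request waits for the previous one to finish.
    return [await translate_one(t) for t in texts]


async def translate_concurrent(texts: list[str], limit: int = 4) -> list[str]:
    # At most `limit` requests are in flight; gather preserves input order.
    sem = asyncio.Semaphore(limit)

    async def bounded(text: str) -> str:
        async with sem:
            return await translate_one(text)

    return await asyncio.gather(*(bounded(t) for t in texts))


results = asyncio.run(translate_concurrent(["a", "b", "c", "d"]))
print(results)  # ['[ES] a', '[ES] b', '[ES] c', '[ES] d']
```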
---
## 7. Known Issues
### MiniMax API Errors
- **Error 400:** `bad_request_error (2013)` — prompt too long or contains problematic content
- **Frequency:** Occurs with long Chinese text blocks, especially in batch mode
- **Workaround:** `--ignore-errors` skips failed pages
### Post-Translation Check Failures
- The tool checks if translated text is actually in the target language
- Sometimes valid Spanish translations fail the check (false negatives)
- This causes unnecessary retries and can revert translations to original text
- **Workaround:** Already handled by `--ignore-errors`
### Vertical Bubble Problem
- Japanese manga uses vertical speech bubbles (narrow, tall)
- Spanish text is horizontal and longer than Japanese
- Text overflows or doesn't fit in narrow vertical bubbles
- **Mitigation:** `font_size_offset: -10` reduces font size to fit better
- **Known limitation:** Some vertical bubbles will always overflow
### OCR Accuracy
- OCR sometimes misreads characters (especially damaged/low-quality scans)
- OCR errors propagate to translation (garbage in → garbage out)
- `48px` model is more accurate than `32px`
---
## 8. File Structure
```
fansub2/
├── example/ # Source images
│ ├── nhentai_652854/ # 275-page gallery (Chinese manga)
│ ├── japanese.jpg # Single Japanese page test
│ ├── english.jpg # Single English page test
│ ├── chinese_sfw.webp # Single Chinese page test
│ ├── coreano.jpg # Single Korean page test
│ └── burbujascombinadas.webp # Single English page test
├── example-translated/ # Output translated images
│ ├── nhentai_652854/ # 275 pages (1.5h, batch-size 1)
│ ├── nhentai_652854_test/ # 275 pages (1.9h, batch-size 30 - SLOWER)
│ ├── japanese.jpg # Latest: font_size_offset -10
│ └── ... (other translated files)
├── translate_config.json # Active translation config
├── OPTIMIZACIONES.md # Optimization notes
└── manga-image-translator/ # The tool
├── .env # API keys (DO NOT COMMIT)
├── venv/ # Python virtual environment
├── manga_translator/ # Source code
│ ├── translators/ # Translation backends
│ │ ├── chatgpt.py # OpenAI-compatible (MiniMax)
│ │ ├── common.py # Language codes, base classes
│ │ ├── nllb.py # Facebook NLLB-200 (offline)
│ │ ├── m2m100.py # Facebook M2M-100 (offline)
│ │ └── keys.py # API key env vars
│ ├── rendering/ # Text rendering
│ │ ├── __init__.py # Main render pipeline
│ │ └── text_render.py # Font/text rendering
│ ├── detection/ # Text detection (DBNet)
│ ├── ocr/ # OCR models
│ ├── inpainting/ # Text erasure models
│ ├── manga_translator.py # Main orchestrator
│ ├── config.py # Config schema
│ └── mode/local.py # Local batch mode
├── result/ # Debug output (with -v flag)
└── README.md # Official documentation
```
---
## 9. Running the Tool
### Single image
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
-m manga_translator local `
-i "example\japanese.jpg" `
-o "example-translated" `
--config-file "translate_config.json" `
--ignore-errors --overwrite
```
### Full gallery
```powershell
$env:PYTHONIOENCODING="utf-8"
$env:PYTHONUTF8="1"
& "manga-image-translator\venv\Scripts\python.exe" `
-m manga_translator local `
-i "example\nhentai_652854" `
-o "example-translated\nhentai_652854" `
--config-file "translate_config.json" `
--ignore-errors --overwrite
```
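
The two invocations above differ only in their input and output paths, so they can be wrapped in a small launcher. This is a sketch under assumptions: `build_command`, `utf8_env`, and `run` are hypothetical helpers, the interpreter path is the Windows venv path used above, and only flags that appear in this document are passed.

```python
import os
import subprocess

# Interpreter path as used in the PowerShell commands above.
PYTHON = r"manga-image-translator\venv\Scripts\python.exe"


def build_command(input_path: str, output_path: str,
                  config: str = "translate_config.json") -> list[str]:
    """Argument list equivalent to the PowerShell invocations above."""
    return [
        PYTHON, "-m", "manga_translator", "local",
        "-i", input_path,
        "-o", output_path,
        "--config-file", config,
        "--ignore-errors", "--overwrite",
    ]


def utf8_env() -> dict[str, str]:
    """Copy of the current environment with the two UTF-8 variables set."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"
    env["PYTHONUTF8"] = "1"
    return env


def run(input_path: str, output_path: str) -> int:
    """Launch the translator and return its exit code."""
    completed = subprocess.run(build_command(input_path, output_path),
                               env=utf8_env(), check=False)
    return completed.returncode
```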
---
## 10. Key Learnings
1. **LLM translators beat offline models** — MiniMax produces much more natural, context-aware translations than NLLB/M2M100 for manga
2. **batch-size > 1 is risky with LLMs** — Large batches cause API errors (2013) with MiniMax; batch-size 1 is safest
3. **UTF-8 is mandatory on Windows** — Must set `PYTHONIOENCODING=utf-8` and `PYTHONUTF8=1` or CJK characters crash the console
4. **Vertical bubbles are a fundamental limitation** — Japanese vertical text bubbles don't work well with horizontal Spanish text; this is a render issue, not a translation issue
5. **`--ignore-errors` is essential** — Some pages will always fail (long text, API limits, OCR errors); skipping them is better than crashing
6. **AOT inpainter is faster on CPU** — But `lama_large` produces better quality; trade-off depends on use case
7. **`manga2eng` renderer is better than `default`** — Handles text sizing and positioning more intelligently

59
OPTIMIZACIONES.md Normal file

@@ -0,0 +1,59 @@
# manga-image-translator Optimizations
## OPTION A: Config only (no code changes)
### Flags to speed up CPU runs
| Flag | Default | Recommended | Impact | Risk |
|------|---------|-------------|--------|------|
| `--detection-size` | 2048 | **1024** | ~2-4x faster detection | May miss very small bubbles |
| `--inpainting-size` | 2048 | **1024** | ~2-4x faster inpainting | Lower erasure quality |
| `--inpainter` | lama_large | **default (AOT)** | ~3-5x faster inpainting | AOT erases slightly worse |
| `--ocr` | 48px | **32px** | ~20-30% faster OCR | May fail on small text |
| `--batch-size` | 1 | **20-50** | Cuts API calls ~20x | Very long texts may fail (error 2013) |
| `--batch-concurrent` | off | **on** | Overlaps network and CPU work | Partial improvement |
| `--skip-no-text` | off | **on** | Saves I/O on pages without text | Only saves the write, not the processing |
### Optimal command
```bash
python -m manga_translator local \
-i "input" -o "output" \
--config-file translate_config.json \
--detection-size 1024 \
--inpainting-size 1024 \
--inpainter default \
--ocr 32px \
--batch-size 30 \
--batch-concurrent \
--skip-no-text \
--ignore-errors --overwrite
```
### Estimate: 1.5h → ~20-30 min
---
## OPTION B: Code changes (requires modifying the repo)
### B1. Fix `_concurrent_translate_contexts` (chatgpt.py)
- Use `asyncio.gather` instead of a sequential loop
- Impact: ~30-40% faster translation phase
- Risk: Low
### B2. True pipeline with ProcessPoolExecutor
- `multiprocessing.Pool` for parallel detection + OCR
- Bypasses Python's GIL
- Impact: ~50-60% faster (largest gain)
- Risk: Requires refactoring the global model state
### B3. Increase `_MAX_TOKENS` (chatgpt.py)
- Default: 4096 → 8192 if MiniMax supports it
- Impact: Minor with a high batch-size
### Estimate: 1.5h → ~15-25 min
---
## OPTION C: Hybrid (recommended)
- Option A config + B1 (minor code change)
- Estimate: ~15-25 min, ~50-80 API calls

Binary file not shown (386 KiB).
Binary file not shown (400 KiB).
Binary file not shown (991 KiB).
Binary file not shown (1.1 MiB).
Binary file not shown (1.9 MiB).
Binary file not shown (1.9 MiB).
Binary file not shown (408 KiB).

example/chinese_sfw.webp: binary file not shown (401 KiB).
example/coreano.jpg: binary file not shown (753 KiB).
example/english.jpg: binary file not shown (920 KiB).
example/japanese.jpg: binary file not shown (1.1 MiB).

23
translate_config.json Normal file

@@ -0,0 +1,23 @@
{
"translator": {
"translator": "chatgpt",
"target_lang": "ESP"
},
"render": {
"renderer": "manga2eng",
"font_size_offset": -10,
"font_size_minimum": 8,
"no_hyphenation": true,
"alignment": "center"
},
"detector": {
"detection_size": 1024
},
"inpainter": {
"inpainter": "default",
"inpainting_size": 1024
},
"ocr": {
"ocr": "32px"
}
}