Sync: Complete project state with all MEGA SPRINT V1-V3 features and Codex stubs

2026-04-08 17:58:47 -03:00
parent c9d3528900
commit 6d080d43b3
372 changed files with 189715 additions and 8590 deletions
--- a/docs/RALPH_24_7_AUTOMATION_ARCHITECTURE.md
+++ b/docs/RALPH_24_7_AUTOMATION_ARCHITECTURE.md
@@ -0,0 +1,397 @@
+# Ralph 24/7 Automation Architecture
+
+This document defines the operating model for running Ralph as a persistent, local, task-driven swarm for this repository.
+
+The goal is simple:
+
+- accept tasks as Markdown files
+- turn each task into a structured task pack
+- dispatch implementation to a provider-backed agent
+- run multiple reviewers
+- run Codex as the final master reviewer
+- keep every run isolated in its own worktree
+- keep the system alive through a Windows Scheduled Task and a background daemon
+
+The design is intentionally local-first. The source of truth is always the repository on disk, the task files on disk, and the run artifacts on disk.
+
+## 1. Why this architecture exists
+
+The current project needs two things at the same time:
+
+- reliable engineering work on a shared codebase
+- long-lived orchestration that can run without manual babysitting
+
+The architecture below is meant to support both.
+
+It avoids the most common failure modes of agent swarms:
+
+- editing the main tree directly
+- losing task context after a single run
+- trusting a model's self-report instead of persisted artifacts
+- overclaiming success without validation
+- letting one provider become the entire system
+
+Ralph is the orchestration layer that prevents those problems.
+
+## 2. High-level flow
+
+The intended 24/7 flow is:
+
+1. A human drops a task Markdown file into an inbox folder.
+2. A daemon notices the file and turns it into a task pack.
+3. Ralph creates an isolated worktree for the run.
+4. One provider acts as implementer.
+5. Multiple providers act as reviewers.
+6. Codex runs as the persistent master reviewer.
+7. A fix pass can be executed if the review warrants it.
+8. The run is archived with prompts, outputs, diffs and logs.
+
+Nothing merges automatically into the main branch. Every result must be inspected through the persisted run artifacts.
+
+## 3. Core directories
+
+The architecture should use these directories under `ralph/`:
+
+- `ralph/tasks/inbox/`
+- `ralph/tasks/processing/`
+- `ralph/tasks/completed/`
+- `ralph/tasks/failed/`
+- `ralph/tasks/current/`
+- `ralph/runs/`
+- `ralph/worktrees/`
+- `ralph/state/`
+- `ralph/logs/`
+
+Suggested responsibilities:
+
+- `inbox/`: raw Markdown tasks waiting to be picked up
+- `processing/`: task packs currently under execution
+- `completed/`: task packs that completed successfully
+- `failed/`: task packs that failed validation or execution
+- `current/`: optional manually curated task pack for one-off runs
+- `runs/`: immutable run artifacts for each execution
+- `worktrees/`: isolated git worktrees per run
+- `state/`: machine-readable live state for dashboards and automation
+- `logs/`: daemon and runner logs
+
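A minimal sketch of how the layout above could be bootstrapped, assuming the directories are created relative to the repository root. The helper name `ensure_layout` is hypothetical; only the directory names come from this document.

```python
from pathlib import Path

# Directory names taken from the list above, relative to ralph/.
RALPH_DIRS = (
    "tasks/inbox",
    "tasks/processing",
    "tasks/completed",
    "tasks/failed",
    "tasks/current",
    "runs",
    "worktrees",
    "state",
    "logs",
)


def ensure_layout(repo_root: Path) -> list[Path]:
    """Create any missing ralph/ directories and return all of them."""
    paths = []
    for rel in RALPH_DIRS:
        path = repo_root / "ralph" / rel
        # exist_ok keeps the call idempotent across daemon restarts.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling it on every daemon start is harmless because existing directories are left untouched.
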
+## 4. Task contract
+
+Every task should begin as a single Markdown file.
+
+The file can be simple or structured, but the daemon should be able to extract:
+
+- task goal
+- acceptance criteria
+- context
+- constraints
+- expected outputs
+
+If a section is missing, the system should fall back to the standard task pack templates.
+
+Recommended minimum structure:
+
+```md
+# Task Title
+
+## Goal
+What needs to be built or fixed.
+
+## Acceptance Criteria
+- measurable outcome 1
+- measurable outcome 2
+
+## Context
+Relevant background, references or links.
+
+## Constraints
+- scope limits
+- runtime limits
+- no-go areas
+```
+
+The parser should prefer explicit headings, but it should also work if the task is just a plain instruction block.
+
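One way the parsing rule above could look, as a sketch rather than the actual parser. The internal section keys and the function name `parse_task` are assumptions; text outside a recognised `##` heading is treated as the goal so that a plain instruction block still works.

```python
import re

# Heading text (lowercased) mapped to hypothetical internal section keys.
KNOWN_SECTIONS = {
    "goal": "goal",
    "acceptance criteria": "acceptance",
    "context": "context",
    "constraints": "constraints",
    "expected outputs": "expected_outputs",
}

HEADING = re.compile(r"^##\s+(.+?)\s*$")


def parse_task(markdown: str) -> dict[str, str]:
    """Split a task file into known sections, defaulting to the goal."""
    sections: dict[str, list[str]] = {}
    current = "goal"
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Content under an unknown heading is appended to the goal.
            current = KNOWN_SECTIONS.get(match.group(1).lower(), "goal")
            continue
        if line.startswith("# "):
            sections.setdefault("title", []).append(line[2:].strip())
            continue
        sections.setdefault(current, []).append(line)
    joined = {key: "\n".join(lines).strip() for key, lines in sections.items()}
    # Drop sections that ended up empty so callers can detect missing ones.
    return {key: text for key, text in joined.items() if text}
```

A missing key in the result is the signal for the daemon to fall back to the standard template for that section.
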
+## 5. From Markdown to task pack
+
+The daemon should convert the inbox Markdown into a task pack directory.
+
+Each task pack directory should contain at least:
+
+- `TASK.md`
+- `ACCEPTANCE.md`
+- `CONTEXT.md`
+- `SOURCE.md` (optional, recommended for traceability)
+
+Recommended behavior:
+
+- if the source Markdown has no acceptance section, insert the standard acceptance template
+- if the source Markdown has no context section, insert the standard context template
+- preserve the original task Markdown as `SOURCE.md` for traceability
+- record the original file name, timestamp and run id in metadata
+
+This keeps the task pack stable even when the original inbox file is messy.
+
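The conversion described above might be sketched as follows. The template strings are placeholders because the standard templates are not shown in this document, and `metadata.json` is a hypothetical file name for the recorded metadata.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Placeholder text: the real standard templates are not shown in this document.
ACCEPTANCE_TEMPLATE = "- (standard acceptance template)\n"
CONTEXT_TEMPLATE = "(standard context template)\n"


def build_task_pack(source: Path, pack_dir: Path, sections: dict, run_id: str) -> Path:
    """Write TASK.md, ACCEPTANCE.md, CONTEXT.md, SOURCE.md and metadata."""
    pack_dir.mkdir(parents=True)
    files = {
        # With no explicit goal, the whole source file becomes the task.
        "TASK.md": sections.get("goal") or source.read_text(encoding="utf-8"),
        "ACCEPTANCE.md": sections.get("acceptance") or ACCEPTANCE_TEMPLATE,
        "CONTEXT.md": sections.get("context") or CONTEXT_TEMPLATE,
    }
    for name, body in files.items():
        (pack_dir / name).write_text(body, encoding="utf-8")
    # Copy the original bytes untouched so the pack stays traceable.
    (pack_dir / "SOURCE.md").write_bytes(source.read_bytes())
    metadata = {
        "source_name": source.name,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
    }
    # metadata.json is a hypothetical file name.
    (pack_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return pack_dir
```

`mkdir` is called without `exist_ok`, so building the same pack twice fails loudly instead of overwriting an earlier one.
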
+## 6. Daemon responsibilities
+
+The daemon is the always-on process that watches the inbox and triggers runs.
+
+Its responsibilities are:
+
+- watch `ralph/tasks/inbox/` for new `.md` files
+- move a file into `ralph/tasks/processing/` before starting work
+- generate a run id
+- build the task pack directory
+- launch the autopilot runner against that task pack
+- move the task into `ralph/tasks/completed/` or `ralph/tasks/failed/` after execution
+- write live state into `ralph/state/`
+- keep logs in `ralph/logs/`
+
+The daemon should be conservative:
+
+- process one task at a time
+- prevent duplicate execution with a lock or mutex
+- never run two autopilot jobs on the same inbox item
+- never delete original task content
+
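A sketch of the conservative behaviour above, assuming the inbox and processing folders live on the same volume so a rename either happens completely or not at all. The lock file and both function names are hypothetical, and stale locks left by a crashed daemon are not handled here.

```python
import os
from pathlib import Path
from typing import Optional


def acquire_lock(lock_path: Path) -> bool:
    """Try to take the daemon lock; return False if it is already held."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def claim_next_task(inbox: Path, processing: Path) -> Optional[Path]:
    """Move the first .md file (sorted by name) into processing."""
    for candidate in sorted(inbox.glob("*.md")):
        target = processing / candidate.name
        if target.exists():
            continue  # a task with this name is already claimed
        try:
            os.rename(candidate, target)
        except (FileNotFoundError, FileExistsError):
            continue  # another process won the race for this file
        return target
    return None
```

Because the file is moved rather than copied, an inbox item can only ever be claimed once and its content is never deleted.
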
+## 7. Scheduled Task behavior on Windows
+
+The daemon should be started and kept alive by a Windows Scheduled Task.
+
+Recommended behavior:
+
+- trigger at logon or startup
+- run with the current user context when possible
+- restart on failure
+- keep a log file for stdout and stderr
+- avoid hidden state outside the repository
+
+The Scheduled Task should only be a launcher. The actual orchestration logic belongs in the daemon.
+
+This separation matters:
+
+- Scheduled Task = persistence and recovery
+- daemon = queue processing and orchestration
+
+## 8. Worktree isolation
+
+Every run must happen in a dedicated git worktree.
+
+That is a hard rule.
+
+Why:
+
+- providers can make destructive or exploratory edits without polluting the main tree
+- diffs are easier to review
+- a failed run can be inspected after the fact
+- multiple runs can coexist without overwriting each other
+
+Recommended worktree lifecycle:
+
+1. create a detached worktree from the current HEAD
+2. run the implementer in that worktree
+3. capture the diff after the first pass
+4. run reviewers against the diff
+5. run Codex against the same artifacts
+6. optionally run a fix pass in the same worktree
+7. keep the worktree for inspection or cleanup later
+
+The main tree should remain untouched unless a human explicitly decides to merge.
+
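The first and third lifecycle steps map onto standard git commands. The sketch below only builds the argument lists; the function name and the choice to stage before diffing (so that new files appear in the patch) are assumptions, not something this document prescribes.

```python
from pathlib import Path


def worktree_commands(worktrees_dir: Path, run_id: str) -> dict[str, list[str]]:
    """Return git invocations for creating a run worktree and capturing its diff."""
    path = str(worktrees_dir / run_id)
    return {
        # Detached worktree at the current HEAD (run from the main repository).
        "create": ["git", "worktree", "add", "--detach", path, "HEAD"],
        # Stage everything inside the worktree, including new files.
        "stage": ["git", "-C", path, "add", "--all"],
        # Emit the staged changes relative to HEAD as a patch.
        "diff": ["git", "-C", path, "diff", "--cached", "HEAD"],
    }
```

Each list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. Staging happens in the worktree's own index, so the main tree stays untouched.
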
+## 9. Provider flow
+
+Ralph should treat providers as roles, not as interchangeable machines.
+
+Recommended role split:
+
+- implementer: one model does the first pass
+- reviewers: multiple models critique the diff
+- Codex master: final senior reviewer and sprint writer
+
+Recommended policy:
+
+- do not let the implementer self-approve
+- do not accept a run based only on compilation
+- do not trust a single provider's summary if the persisted artifacts disagree
+- do not let reviewers edit the main tree directly
+
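The role policy above can be checked before a run starts. This is an illustrative sketch: the provider names are made up, and treating "multiple reviewers" as at least two, and the master as distinct from the implementer, are assumptions read into the policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    implementer: str
    reviewers: tuple[str, ...]
    master: str = "codex"

    def problems(self) -> list[str]:
        """Return policy violations; an empty list means the split is acceptable."""
        found = []
        # The implementer must not appear in any approving role.
        if self.implementer in self.reviewers or self.implementer == self.master:
            found.append("implementer must not review its own work")
        if len(self.reviewers) < 2:
            found.append("at least two reviewers are expected")
        return found
```

For example, `RoleAssignment("provider-a", ("provider-b", "provider-c")).problems()` returns an empty list.
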
+## 10. Codex master reviewer
+
+Codex should act as the final reviewer, not the only actor.
+
+Its job is to:
+
+- read the task pack
+- read reviewer outputs
+- read the worktree diff
+- compare claims against repository truth
+- point out missing validations
+- write the next sprint document when the run is incomplete
+
+Codex is especially valuable for:
+
+- overclaim detection
+- acceptance verification
+- long-lived project memory through a persistent session
+- final synthesis after multiple providers disagree
+
+The Codex master should always prefer persisted facts over model self-reporting.
+
+## 11. Suggested run lifecycle
+
+This is the recommended state machine for a single task:
+
+### 11.1 Ingest
+
+- inbox file appears
+- daemon moves it to processing
+- task pack is created
+- run id is assigned
+
+### 11.2 Implement
+
+- isolated worktree is created
+- implementer receives the task pack and worktree
+- implementer writes code changes only inside the worktree
+
+### 11.3 Review
+
+- reviewers receive the task pack plus the diff
+- reviewers produce structured critiques
+- reviewer output is persisted in the run folder
+
+### 11.4 Codex master review
+
+- Codex receives the task pack, diff and reviewer outputs
+- Codex decides whether the work is actually complete
+- if the work is incomplete, Codex writes the next sprint
+
+### 11.5 Fix pass
+
+- if the reviews identify high-signal issues, a fix pass runs in the same worktree
+
+### 11.6 Archive
+
+- run artifacts are finalized
+- task is moved to `completed/` or `failed/`
+- state files are updated
+
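The lifecycle above can be written down as an explicit transition table. The sketch assumes that any stage may fail, that the fix pass is optional, and that a fix pass leads directly to archiving; the document does not say whether a second review follows a fix pass, so that part is a guess.

```python
from enum import Enum


class RunState(Enum):
    INGEST = "ingest"
    IMPLEMENT = "implement"
    REVIEW = "review"
    MASTER_REVIEW = "master_review"
    FIX_PASS = "fix_pass"
    ARCHIVED = "archived"
    FAILED = "failed"


# Allowed next states for each state; terminal states map to an empty set.
ALLOWED = {
    RunState.INGEST: {RunState.IMPLEMENT, RunState.FAILED},
    RunState.IMPLEMENT: {RunState.REVIEW, RunState.FAILED},
    RunState.REVIEW: {RunState.MASTER_REVIEW, RunState.FAILED},
    RunState.MASTER_REVIEW: {RunState.FIX_PASS, RunState.ARCHIVED, RunState.FAILED},
    RunState.FIX_PASS: {RunState.ARCHIVED, RunState.FAILED},
    RunState.ARCHIVED: set(),
    RunState.FAILED: set(),
}


def advance(current: RunState, target: RunState) -> RunState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal transitions keeps a run from being archived without passing through review.
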
+## 12. Run artifacts
+
+Every run should leave a complete artifact trail.
+
+At minimum, the run folder should contain:
+
+- the task pack files
+- the implementer prompt
+- reviewer prompts
+- reviewer outputs
+- Codex master output
+- patch files (for example `implementer.patch`)
+- status snapshots
+- summary Markdown (`SUMMARY.md`)
+- state metadata
+
+This is what makes the system auditable.
+
+If a model says the task is complete, the artifacts should prove it.
+
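A small completeness check illustrates the idea. It covers only the run-folder names this document mentions explicitly (`SUMMARY.md`, `reviews/`, `outputs/`, `implementer.patch`); the remaining artifacts have no stated file names, so they are left out of this sketch.

```python
from pathlib import Path

# Only the run-folder entries named in this document are checked.
EXPECTED = ("SUMMARY.md", "reviews", "outputs", "implementer.patch")


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the expected entries that are absent from the run folder."""
    return [name for name in EXPECTED if not (run_dir / name).exists()]
```

A run whose list of missing artifacts is not empty should not be reported as complete, whatever the model claims.
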
+## 13. Validation rules
+
+Ralph should not accept success based on compile-only checks.
+
+The validation standard should be:
+
+- code compiles
+- runtime behavior is validated when relevant
+- the diff matches the claim
+- reviewer feedback is addressed
+- Codex master agrees the task is actually complete
+
+For Ableton or MCP work, observed runtime behavior matters more than passing static checks.
+
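The validation standard can be expressed as a gate that lists every unmet condition. The field names and the `runtime_relevant` flag are hypothetical; the sketch only encodes the five checks listed above.

```python
from dataclasses import dataclass


@dataclass
class ValidationReport:
    compiles: bool
    diff_matches_claim: bool
    reviewer_feedback_addressed: bool
    master_approved: bool
    runtime_relevant: bool = False
    runtime_validated: bool = False

    def failures(self) -> list[str]:
        """Return the names of all checks that did not pass."""
        checks = {
            "compiles": self.compiles,
            "diff_matches_claim": self.diff_matches_claim,
            "reviewer_feedback_addressed": self.reviewer_feedback_addressed,
            "master_approved": self.master_approved,
        }
        failed = [name for name, passed in checks.items() if not passed]
        # Runtime validation only counts against the run when it is relevant.
        if self.runtime_relevant and not self.runtime_validated:
            failed.append("runtime_validated")
        return failed

    def accepted(self) -> bool:
        return not self.failures()
```

A compile-only result therefore never passes: the other checks stay false until something records evidence for them.
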
+## 14. Recommended failure handling
+
+The daemon should treat these as failures or partial failures:
+
+- provider timeout
+- missing prompt artifact
+- empty diff when changes were expected
+- divergence between claimed result and persisted state
+- Codex master flags unresolved acceptance issues
+
+On failure:
+
+- persist the error
+- mark the run as failed
+- preserve the worktree and logs
+- move the task to `failed/`
+
+Failure should be informative, not destructive.
+
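The failure steps above might be sketched like this. `ERROR.txt` is a hypothetical artifact name, and the function deliberately never touches the worktree or the logs, so both survive for inspection.

```python
import shutil
from pathlib import Path


def fail_task(task_path: Path, failed_dir: Path, run_dir: Path, error: str) -> Path:
    """Persist the error in the run folder and move the task to failed/."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # ERROR.txt is a hypothetical artifact name.
    (run_dir / "ERROR.txt").write_text(error + "\n", encoding="utf-8")
    failed_dir.mkdir(parents=True, exist_ok=True)
    destination = failed_dir / task_path.name
    # Move rather than delete: the task content must survive the failure.
    shutil.move(str(task_path), str(destination))
    return destination
```

Marking the run as failed in `ralph/state/` would be a separate step that depends on the state format, which this document leaves open.
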
+## 15. Security and secrets
+
+Provider tokens must not be embedded in task files or committed docs.
+
+Recommended practice:
+
+- keep live secrets in local config or environment variables
+- never paste tokens into a task Markdown file
+- rotate any token that is accidentally exposed outside the machine
+
+The task inbox is for instructions, not secrets.
+
+## 16. Operational examples
+
+### Submit a task
+
+Drop a file into the inbox:
+
+```text
+ralph/tasks/inbox/2026-04-03-fix-automation.md
+```
+
+### Process one task manually
+
+The daemon should be able to run a single pass and exit.
+
+### Run continuously
+
+The daemon should stay alive, poll the inbox, and process tasks as they arrive.
+
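The single-pass and continuous modes can share one loop. In this sketch `process_next` is a hypothetical callback that handles at most one task and returns `True` if it found one; the poll interval and the `should_stop` hook are also assumptions.

```python
import time
from typing import Callable


def run_daemon(
    process_next: Callable[[], bool],
    *,
    once: bool = False,
    poll_seconds: float = 5.0,
    should_stop: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process tasks until stopped; return how many were handled."""
    processed = 0
    while not should_stop():
        handled = process_next()
        processed += int(handled)
        if once:
            break  # single-pass mode: exit after one attempt
        if not handled:
            # Nothing in the inbox: wait before polling again.
            sleep(poll_seconds)
    return processed
```

Injecting `sleep` and `should_stop` keeps the loop testable without real waiting.
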
+### Review a completed run
+
+Inspect:
+
+- `ralph/runs/<run-id>/SUMMARY.md`
+- `ralph/runs/<run-id>/reviews/`
+- `ralph/runs/<run-id>/outputs/`
+- `ralph/runs/<run-id>/implementer.patch`
+
+## 17. What this architecture is not
+
+This is not:
+
+- a blind autonomous coder with no review loop
+- a merge bot
+- a cloud-only pipeline
+- a single model pretending to be a team
+- a system that trusts one summary file over the actual repository state
+
+## 18. Recommended next implementation steps
+
+If this architecture is implemented in code, the first concrete pieces should be:
+
+1. inbox daemon
+2. Windows Scheduled Task installer
+3. Markdown task submitter
+4. state snapshot writer
+5. Codex master review wrapper
+6. dashboard refresh hooks
+
+That sequence gives the highest leverage with the lowest risk.
+
+## 19. Final rule
+
+The system is only healthy if it can keep running, keep explaining itself, and keep proving its claims with files on disk.
+
+That is the standard for a real 24/7 Ralph pipeline.