Sync: Complete project state with all MEGA SPRINT V1-V3 features and Codex stubs

renato97
2026-04-08 17:58:47 -03:00
parent c9d3528900
commit 6d080d43b3
372 changed files with 189715 additions and 8590 deletions

# Ralph 24/7 Automation Architecture
This document defines the operating model for running Ralph as a persistent, local, task-driven swarm for this repository.
The goal is simple:
- accept tasks as Markdown files
- turn each task into a structured task pack
- dispatch implementation to a provider-backed agent
- run multiple reviewers
- run Codex as the final master reviewer
- keep every run isolated in its own worktree
- keep the system alive through a Windows Scheduled Task and a background daemon
The design is intentionally local-first. The source of truth is always the repository on disk, the task files on disk, and the run artifacts on disk.
## 1. Why this architecture exists
The current project needs two things at the same time:
- reliable engineering work on a shared codebase
- long-lived orchestration that can run without manual babysitting
The architecture below is meant to support both.
It avoids the most common failure modes of agent swarms:
- editing the main tree directly
- losing task context after a single run
- trusting a model's self-report instead of persisted artifacts
- overclaiming success without validation
- letting one provider become the entire system
Ralph is the orchestration layer that prevents those problems.
## 2. High-level flow
The intended 24/7 flow is:
1. A human drops a task Markdown file into an inbox folder.
2. A daemon notices the file and turns it into a task pack.
3. Ralph creates an isolated worktree for the run.
4. One provider acts as implementer.
5. Multiple providers act as reviewers.
6. Codex runs as the persistent master reviewer.
7. A fix pass can be executed if the review warrants it.
8. The run is archived with prompts, outputs, diffs and logs.
Nothing merges automatically into the main branch. Every result must be inspected through the persisted run artifacts.
## 3. Core directories
The architecture should use these directories under `ralph/`:
- `ralph/tasks/inbox/`
- `ralph/tasks/processing/`
- `ralph/tasks/completed/`
- `ralph/tasks/failed/`
- `ralph/tasks/current/`
- `ralph/runs/`
- `ralph/worktrees/`
- `ralph/state/`
- `ralph/logs/`
Suggested responsibilities:
- `inbox/`: raw Markdown tasks waiting to be picked up
- `processing/`: task packs currently under execution
- `completed/`: task packs that finished successfully
- `failed/`: task packs that failed validation or execution
- `current/`: optional manually curated task pack for one-off runs
- `runs/`: immutable run artifacts for each execution
- `worktrees/`: isolated git worktrees per run
- `state/`: machine-readable live state for dashboards and automation
- `logs/`: daemon and runner logs
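The daemon can create this layout on startup so that a fresh clone is ready to accept tasks. The following is a minimal sketch; the function name `ensure_layout` is hypothetical, and only the directory names come from the list above.

```python
from pathlib import Path

# Subdirectories of ralph/ as listed in the core directories section.
RALPH_DIRS = (
    "tasks/inbox",
    "tasks/processing",
    "tasks/completed",
    "tasks/failed",
    "tasks/current",
    "runs",
    "worktrees",
    "state",
    "logs",
)


def ensure_layout(repo_root: Path) -> dict[str, Path]:
    """Create the ralph/ directory tree if missing and return name -> path."""
    ralph = repo_root / "ralph"
    paths = {}
    for rel in RALPH_DIRS:
        path = ralph / rel
        # exist_ok makes this safe to call on every daemon start.
        path.mkdir(parents=True, exist_ok=True)
        paths[rel] = path
    return paths
```

Because it is idempotent, calling it at every start costs nothing and removes a class of "directory not found" failures.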
## 4. Task contract
Every task should begin as a single Markdown file.
The file can be simple or structured, but the daemon should be able to extract:
- task goal
- acceptance criteria
- context
- constraints
- expected outputs
If a section is missing, the system should fall back to the standard task pack templates.
Recommended minimum structure:
```md
# Task Title
## Goal
What needs to be built or fixed.
## Acceptance Criteria
- measurable outcome 1
- measurable outcome 2
## Context
Relevant background, references or links.
## Constraints
- scope limits
- runtime limits
- no-go areas
```
The parser should prefer explicit headings, but it should also work if the task is just a plain instruction block.
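A heading-based parser with a plain-text fallback might look like this. It is a sketch under simple assumptions: section names are matched case-insensitively against the headings from section 4, the first `#` heading is the title, and a file with no recognised sections becomes the goal in full. `parse_task` is a hypothetical name.

```python
import re

# Section names the daemon tries to extract; matching is case-insensitive.
KNOWN_SECTIONS = ("goal", "acceptance criteria", "context", "constraints", "expected outputs")
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def parse_task(markdown: str) -> dict[str, str]:
    """Split a task file into {'title': ..., '<section>': body} by its headings.

    A file with no recognised headings is treated as a plain instruction
    block and its whole text becomes the goal.
    """
    sections: dict[str, list[str]] = {}
    title = ""
    current = None
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) == 1 and not title:
            title = m.group(2)
            continue
        if m and m.group(2).lower() in KNOWN_SECTIONS:
            current = m.group(2).lower()
            sections.setdefault(current, [])
            continue
        # Unknown headings and text lines stay inside the current section.
        if current is not None:
            sections[current].append(line)
    result = {name: "\n".join(body).strip() for name, body in sections.items()}
    if not result:
        # Fallback: no explicit sections, so the entire file is the goal.
        result["goal"] = markdown.strip()
    result["title"] = title
    return result
```

Missing sections simply do not appear in the result, which leaves the template fallback to the task pack builder.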
## 5. From Markdown to task pack
The daemon should convert the inbox Markdown into a task pack directory.
Each task pack directory should contain at least:
- `TASK.md`
- `ACCEPTANCE.md`
- `CONTEXT.md`
- optionally `SOURCE.md`
Recommended behavior:
- if the source Markdown has no acceptance section, insert the standard acceptance template
- if the source Markdown has no context section, insert the standard context template
- preserve the original task Markdown as `SOURCE.md` for traceability
- record the original file name, timestamp and run id in metadata
This keeps the task pack stable even when the original inbox file is messy.
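The conversion rules above can be sketched as a single function. It assumes the sections have already been extracted into a dict keyed by lowercase heading name (`"goal"`, `"acceptance criteria"`, `"context"`, `"constraints"`, plus `"title"`). The template text and the `metadata.json` file name are placeholders, since the real standard templates are not specified here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Placeholder fallback text; the real "standard templates" are not specified here.
DEFAULT_ACCEPTANCE = "- [ ] Define measurable acceptance criteria before merging.\n"
DEFAULT_CONTEXT = "No additional context was provided with this task.\n"


def build_task_pack(source: Path, pack_dir: Path, run_id: str,
                    sections: dict[str, str]) -> Path:
    """Write TASK/ACCEPTANCE/CONTEXT/SOURCE files plus metadata for one run."""
    pack_dir.mkdir(parents=True, exist_ok=True)
    original = source.read_text(encoding="utf-8")
    title = sections.get("title") or source.stem
    body = f"# {title}\n\n{sections.get('goal', '')}\n"
    if sections.get("constraints"):
        body += f"\n## Constraints\n{sections['constraints']}\n"
    (pack_dir / "TASK.md").write_text(body, encoding="utf-8")
    # Missing sections fall back to the templates.
    (pack_dir / "ACCEPTANCE.md").write_text(
        sections.get("acceptance criteria") or DEFAULT_ACCEPTANCE, encoding="utf-8")
    (pack_dir / "CONTEXT.md").write_text(
        sections.get("context") or DEFAULT_CONTEXT, encoding="utf-8")
    # Preserve the untouched original for traceability.
    (pack_dir / "SOURCE.md").write_text(original, encoding="utf-8")
    metadata = {
        "run_id": run_id,
        "source_file": source.name,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    (pack_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return pack_dir
```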
## 6. Daemon responsibilities
The daemon is the always-on process that watches the inbox and triggers runs.
Its responsibilities are:
- watch `ralph/tasks/inbox/` for new `.md` files
- move the file into `ralph/tasks/processing/` before starting work
- generate a run id
- build the task pack directory
- launch the autopilot runner against that task pack
- move the task to `ralph/tasks/completed/` or `ralph/tasks/failed/` after execution
- write live state into `ralph/state/`
- keep logs in `ralph/logs/`
The daemon should be conservative:
- process one task at a time
- prevent duplicate execution with a lock or mutex
- never run two autopilot jobs on the same inbox item
- never delete original task content
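One way to get the "one task at a time" and "no duplicate execution" guarantees is an exclusive lock file plus a move into `processing/` before any work starts. This is a sketch, not a prescribed mechanism: the lock file, the run id format and the helper names are all hypothetical.

```python
import os
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def new_run_id() -> str:
    """Sortable, collision-resistant run id (the format is an assumption)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def acquire_lock(lock_path: Path) -> bool:
    """Create the lock file exclusively; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging stale locks
    return True


def release_lock(lock_path: Path) -> None:
    lock_path.unlink(missing_ok=True)


def claim_next_task(inbox: Path, processing: Path) -> Optional[Path]:
    """Move the first inbox .md file (by name) into processing/ and return it."""
    for candidate in sorted(inbox.glob("*.md")):
        target = processing / candidate.name
        if target.exists():
            continue  # never overwrite a task that is already being processed
        try:
            candidate.rename(target)
        except FileNotFoundError:
            continue  # the file disappeared between listing and moving
        return target
    return None
```

Sorting by name works well with date-prefixed file names such as the inbox example in section 16. A lock left behind by a crashed daemon has to be cleared, for example by checking whether the recorded PID is still alive.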
## 7. Scheduled Task behavior on Windows
The daemon should be started and kept alive by a Windows Scheduled Task.
Recommended behavior:
- trigger at logon or startup
- run with the current user context when possible
- restart on failure
- keep a log file for stdout and stderr
- avoid hidden state outside the repository
The Scheduled Task should only be a launcher. The actual orchestration logic belongs in the daemon.
This separation matters:
- Scheduled Task = persistence and recovery
- daemon = queue processing and orchestration
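As an illustration of the launcher role, the installer could build a `schtasks.exe /Create` command that starts the daemon at logon. This is a sketch: the task name and daemon script path are hypothetical, and it only builds the argument list rather than running it. `/SC ONLOGON`, `/RL LIMITED` and `/F` are standard `schtasks /Create` options; omitting `/RU` should register the task for the current user, and creating logon-triggered tasks may require an elevated prompt on some systems.

```python
import sys


def schtasks_create_args(task_name: str, daemon_script: str,
                         python_exe: str = sys.executable) -> list[str]:
    """Build a schtasks.exe command that launches the daemon at user logon."""
    # /TR takes a single string, so paths containing spaces are quoted inside it.
    command = f'"{python_exe}" "{daemon_script}"'
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        "/TR", command,
        "/SC", "ONLOGON",
        "/RL", "LIMITED",  # normal, non-elevated rights
        "/F",              # replace an existing task with the same name
    ]
```

Restart-on-failure is not exposed as a simple `schtasks` flag; it can be configured in a Task Scheduler XML definition imported with `/XML`. Since `/TR` does not go through a shell, stdout and stderr redirection is easier to handle inside the daemon by writing to `ralph/logs/` directly.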
## 8. Worktree isolation
Every run must happen in a dedicated git worktree.
That is a hard rule.
Why:
- providers can make destructive or exploratory edits without polluting the main tree
- diffs are easier to review
- a failed run can be inspected after the fact
- multiple runs can coexist without overwriting each other
Recommended worktree lifecycle:
1. create a detached worktree from the current HEAD
2. run the implementer in that worktree
3. capture the diff after the first pass
4. run reviewers against the diff
5. run Codex against the same artifacts
6. optionally run a fix pass in the same worktree
7. keep the worktree for inspection or cleanup later
The main tree should remain untouched unless a human explicitly decides to merge.
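Steps 1 and 3 of that lifecycle map onto ordinary git commands. The sketch below uses `git worktree add --detach` and `git diff --cached --binary`; staging with `git add -A` first makes newly created files show up in the patch, and it only touches the worktree's own index. The helper names and the patch location are hypothetical.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, worktree: Path) -> list[str]:
    # Step 1: detached worktree at the current HEAD; no branch is created.
    return ["git", "-C", str(repo), "worktree", "add", "--detach", str(worktree), "HEAD"]


def diff_cmds(worktree: Path) -> list[list[str]]:
    # Step 3: stage everything in the worktree so new files appear in the diff.
    return [
        ["git", "-C", str(worktree), "add", "-A"],
        ["git", "-C", str(worktree), "diff", "--cached", "--binary"],
    ]


def capture_patch(worktree: Path, patch_path: Path) -> str:
    """Run the diff commands and persist the result as a patch file."""
    *setup, diff = diff_cmds(worktree)
    for cmd in setup:
        subprocess.run(cmd, check=True)
    patch = subprocess.run(diff, check=True, capture_output=True, text=True).stdout
    patch_path.write_text(patch, encoding="utf-8")
    return patch
```

When a worktree is no longer needed for inspection, `git worktree remove <path>` cleans it up without touching the main tree.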
## 9. Provider flow
Ralph should assign providers to distinct roles rather than treat them as interchangeable.
Recommended role split:
- implementer: one model does the first pass
- reviewers: multiple models critique the diff
- Codex master: final senior reviewer and author of the next sprint document
Recommended policy:
- do not let the implementer self-approve
- do not accept a run based only on compilation
- do not trust a single provider's summary if the persisted artifacts disagree
- do not let reviewers edit the main tree directly
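Part of this policy can be enforced when a run is configured instead of relying on convention. The sketch below rejects assignments where the implementer also reviews or acts as master, and requires at least two reviewers to match "multiple reviewers". Provider names are placeholders and `RoleAssignment` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    """Provider names per role for one run (names are placeholders)."""
    implementer: str
    reviewers: tuple[str, ...]
    master: str = "codex"

    def __post_init__(self) -> None:
        if len(self.reviewers) < 2:
            raise ValueError("at least two reviewers are required")
        if self.implementer in self.reviewers or self.implementer == self.master:
            # The implementer must never review or approve its own work.
            raise ValueError("implementer cannot also review or approve")
```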
## 10. Codex master reviewer
Codex should act as the final reviewer, not the only actor.
Its job is to:
- read the task pack
- read reviewer outputs
- read the worktree diff
- compare claims against repository truth
- point out missing validations
- write the next sprint document when the run is incomplete
Codex is especially valuable for:
- overclaim detection
- acceptance verification
- long-lived project memory through a persistent session
- final synthesis after multiple providers disagree
The Codex master should always prefer persisted facts over model self-reporting.
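A simple example of preferring persisted facts is to compare the files a provider claims to have changed with the files actually present in the persisted patch. The sketch assumes the claimed list has already been extracted from the provider output (a hypothetical step) and that the patch uses git's `diff --git a/... b/...` headers; paths containing spaces are not handled.

```python
import re

DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def files_in_patch(patch: str) -> set[str]:
    """Paths touched by a git-style unified diff (post-change names)."""
    return {m.group(2) for m in DIFF_HEADER.finditer(patch)}


def check_claims(claimed_files: set[str], patch: str) -> dict[str, set[str]]:
    """Compare a provider's claimed file list against the persisted patch."""
    actual = files_in_patch(patch)
    return {
        "claimed_but_unchanged": claimed_files - actual,  # likely overclaim
        "changed_but_unclaimed": actual - claimed_files,  # unreported edits
    }
```

Either non-empty set is a concrete, file-backed finding the master review can cite instead of arguing with a summary.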
## 11. Suggested run lifecycle
This is the recommended state machine for a single task:
### 11.1 Ingest
- inbox file appears
- daemon moves it to processing
- task pack is created
- run id is assigned
### 11.2 Implement
- isolated worktree is created
- implementer receives the task pack and worktree
- implementer writes code changes only inside the worktree
### 11.3 Review
- reviewers receive the task pack plus the diff
- reviewers produce structured critiques
- reviewer output is persisted in the run folder
### 11.4 Codex master review
- Codex receives the task pack, diff and reviewer outputs
- Codex decides whether the work is actually complete
- if the work is incomplete, Codex writes the next sprint
### 11.5 Fix pass
- if the reviews identify high-signal issues, a fix pass runs in the same worktree
### 11.6 Archive
- run artifacts are finalized
- the task is moved to `completed/` or `failed/`
- state files are updated
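The lifecycle above can be encoded as an explicit state machine so that illegal jumps (for example, archiving a run that was never reviewed) raise immediately. This is a sketch: the enum and function names are hypothetical, any non-terminal state may move to `FAILED`, and because the document does not say whether a fix pass loops back to review, this version goes straight from `FIX` to `ARCHIVED`.

```python
from enum import Enum


class RunState(Enum):
    INGEST = "ingest"
    IMPLEMENT = "implement"
    REVIEW = "review"
    MASTER_REVIEW = "master_review"
    FIX = "fix"
    ARCHIVED = "archived"
    FAILED = "failed"


# Allowed forward transitions, following the lifecycle steps above.
TRANSITIONS = {
    RunState.INGEST: {RunState.IMPLEMENT},
    RunState.IMPLEMENT: {RunState.REVIEW},
    RunState.REVIEW: {RunState.MASTER_REVIEW},
    RunState.MASTER_REVIEW: {RunState.FIX, RunState.ARCHIVED},
    RunState.FIX: {RunState.ARCHIVED},
    RunState.ARCHIVED: set(),
    RunState.FAILED: set(),
}


def advance(current: RunState, nxt: RunState) -> RunState:
    """Return nxt if the transition is legal, otherwise raise ValueError."""
    terminal = not TRANSITIONS[current]
    if nxt is RunState.FAILED and not terminal:
        return nxt
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt
```

Persisting the current `RunState` value in `ralph/state/` after each step gives dashboards a single, checkable field.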
## 12. Run artifacts
Every run should leave a complete artifact trail.
At minimum, the run folder should contain:
- the task pack files
- the implementer prompt
- reviewer prompts
- reviewer outputs
- Codex master output
- patch files
- status snapshots
- summary markdown
- state metadata
This is what makes the system auditable.
If a model says the task is complete, the artifacts should prove it.
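An artifact check can run before a run is allowed to count as complete. In this sketch, the first four paths are the ones named in section 16; `prompts/implementer.md` and `state.json` are hypothetical names for the prompt and state-metadata artifacts listed above. Empty files and empty directories are treated as missing, so an empty `implementer.patch` is flagged and the daemon has to decide whether changes were expected.

```python
from pathlib import Path

# Relative to ralph/runs/<run-id>/.
REQUIRED_ARTIFACTS = (
    "SUMMARY.md",
    "reviews",
    "outputs",
    "implementer.patch",
    "prompts/implementer.md",  # hypothetical name
    "state.json",              # hypothetical name
)


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return required artifacts that are absent or empty."""
    missing = []
    for rel in REQUIRED_ARTIFACTS:
        path = run_dir / rel
        if path.is_dir():
            if not any(path.iterdir()):
                missing.append(rel)
        elif not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```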
## 13. Validation rules
Ralph should not accept success based on compile-only checks.
The validation standard should be:
- code compiles
- runtime behavior is validated when relevant
- the diff matches the claim
- reviewer feedback is addressed
- Codex master agrees the task is actually complete
For Ableton or MCP work, validated runtime behavior matters more than static checks such as compilation.
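These rules can be expressed as a gate that lists every failed check instead of returning a single boolean, which keeps the reason for rejection visible in the run artifacts. The field names are hypothetical; `runtime_relevant` captures "when relevant" and would be set to `True` for Ableton or MCP tasks.

```python
from dataclasses import dataclass


@dataclass
class ValidationReport:
    compiles: bool
    runtime_relevant: bool   # e.g. True for Ableton or MCP work
    runtime_validated: bool
    diff_matches_claim: bool
    reviews_addressed: bool
    master_approved: bool


def failed_checks(r: ValidationReport) -> list[str]:
    """Names of the validation checks that did not pass; empty means accept."""
    checks = {
        "compiles": r.compiles,
        # Runtime validation only counts against the run when it is relevant.
        "runtime_validated": r.runtime_validated or not r.runtime_relevant,
        "diff_matches_claim": r.diff_matches_claim,
        "reviews_addressed": r.reviews_addressed,
        "master_approved": r.master_approved,
    }
    return [name for name, ok in checks.items() if not ok]
```

A run where only `compiles` is true fails four checks, which is exactly the "no compile-only success" rule.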
## 14. Recommended failure handling
The daemon should treat these as failures or partial failures:
- provider timeout
- missing prompt artifact
- empty diff when changes were expected
- divergence between claimed result and persisted state
- Codex master flags unresolved acceptance issues
On failure:
- persist the error
- mark the run as failed
- preserve the worktree and logs
- move the task to `failed/`
Failure should be informative, not destructive.
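The failure steps can be sketched as one helper that writes an error record, marks the run failed and moves the task without deleting anything. The `error.json` file name and the reason strings are hypothetical; the worktree and logs are intentionally left where they are.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def mark_failed(run_dir: Path, task_path: Path, failed_dir: Path,
                reason: str, detail: str = "") -> Path:
    """Persist the error, mark the run failed and move the task to failed/.

    The worktree and logs are deliberately left untouched for inspection.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "status": "failed",
        "reason": reason,  # e.g. "provider_timeout", "empty_diff"
        "detail": detail,
        "failed_at": datetime.now(timezone.utc).isoformat(),
    }
    (run_dir / "error.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
    failed_dir.mkdir(parents=True, exist_ok=True)
    # Move, never delete: the original task content must survive.
    return Path(shutil.move(str(task_path), str(failed_dir / task_path.name)))
```

`shutil.move` handles both a single Markdown file and a whole task pack directory.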
## 15. Security and secrets
Provider tokens must not be embedded in task files or committed docs.
Recommended practice:
- keep live secrets in local config or environment variables
- never paste tokens into a task Markdown file
- rotate any token that is accidentally exposed outside the machine
The task inbox is for instructions, not secrets.
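A sketch of both halves of this practice: tokens are read only from environment variables, and incoming task files are screened with a simple heuristic before being forwarded to any provider. The environment variable names are up to the installation, and the regex only catches `key = <long value>` style lines, not every possible secret.

```python
import os
import re

# Heuristic only: flags "api_key = <16+ non-space chars>" style lines.
SECRET_LINE = re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b\s*[:=]\s*\S{16,}")


def provider_token(env_var: str) -> str:
    """Read a provider token from the environment, failing loudly if unset."""
    value = os.environ.get(env_var, "")
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value


def task_contains_secret(markdown: str) -> bool:
    """True if the task text looks like it embeds a credential."""
    return SECRET_LINE.search(markdown) is not None
```

A task that trips the check can be moved to `failed/` with a clear reason instead of being sent to a provider, and the exposed token should be rotated.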
## 16. Operational examples
### Submit a task
Drop a file into the inbox:
```text
ralph/tasks/inbox/2026-04-03-fix-automation.md
```
### Process one task manually
The daemon should be able to run a single pass and exit.
### Run continuously
The daemon should stay alive, poll the inbox, and process tasks as they arrive.
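Both modes can share one loop with a `--once` switch. In this sketch `process_next` is a hypothetical hook that handles at most one task and returns `True` if it found one; the flag names and the 10-second default are assumptions. Polling is used instead of filesystem notifications because it is simpler and survives missed events.

```python
import argparse
import time
from typing import Callable, Optional


def run_daemon(process_next: Callable[[], bool], once: bool,
               poll_seconds: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Drain the queue; with once=True return after a single pass."""
    handled = 0
    while True:
        # Handle every queued task before sleeping again.
        while process_next():
            handled += 1
        if once:
            return handled
        sleep(poll_seconds)


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Ralph inbox daemon (sketch)")
    parser.add_argument("--once", action="store_true", help="single pass, then exit")
    parser.add_argument("--poll-seconds", type=float, default=10.0)
    return parser.parse_args(argv)
```

The injectable `sleep` makes the continuous mode testable without real delays.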
### Review a completed run
Inspect:
- `ralph/runs/<run-id>/SUMMARY.md`
- `ralph/runs/<run-id>/reviews/`
- `ralph/runs/<run-id>/outputs/`
- `ralph/runs/<run-id>/implementer.patch`
## 17. What this architecture is not
This is not:
- a blind autonomous coder with no review loop
- a merge bot
- a cloud-only pipeline
- a single model pretending to be a team
- a system that trusts one summary file over the actual repository state
## 18. Recommended next implementation steps
If this architecture is implemented in code, the first concrete pieces should be:
1. inbox daemon
2. Windows Scheduled Task installer
3. Markdown task submitter
4. state snapshot writer
5. Codex master review wrapper
6. dashboard refresh hooks
That sequence gives the highest leverage with the lowest risk.
## 19. Final rule
The system is only healthy if it can keep running, keep explaining itself, and keep proving its claims with files on disk.
That is the standard for a real 24/7 Ralph pipeline.