Sync: Complete project state with all MEGA SPRINT V1-V3 features and Codex stubs
docs/RALPH_24_7_AUTOMATION_ARCHITECTURE.md
# Ralph 24/7 Automation Architecture

This document defines the operating model for running Ralph as a persistent, local, task-driven swarm for this repository.

The goal is simple:

- accept tasks as Markdown files
- turn each task into a structured task pack
- dispatch implementation to a provider-backed agent
- run multiple reviewers
- run Codex as the final master reviewer
- keep every run isolated in its own worktree
- keep the system alive through a Windows Scheduled Task and a background daemon

The design is intentionally local-first. The source of truth is always the repository on disk, the task files on disk, and the run artifacts on disk.

## 1. Why this architecture exists

The current project needs two things at the same time:

- reliable engineering work on a shared codebase
- long-lived orchestration that can run without manual babysitting

The architecture below is meant to support both.

It avoids the most common failure modes of agent swarms:

- editing the main tree directly
- losing task context after a single run
- trusting a model's self-report instead of persisted artifacts
- overclaiming success without validation
- letting one provider become the entire system

Ralph is the orchestration layer that prevents those problems.

## 2. High-level flow

The intended 24/7 flow is:

1. A human drops a task Markdown file into an inbox folder.
2. A daemon notices the file and turns it into a task pack.
3. Ralph creates an isolated worktree for the run.
4. One provider acts as implementer.
5. Multiple providers act as reviewers.
6. Codex runs as the persistent master reviewer.
7. A fix pass can be executed if the review warrants it.
8. The run is archived with prompts, outputs, diffs and logs.

Nothing merges automatically into the main branch. Every result must be inspected through the persisted run artifacts.

## 3. Core directories

The architecture should use these directories under `ralph/`:

- `ralph/tasks/inbox/`
- `ralph/tasks/processing/`
- `ralph/tasks/completed/`
- `ralph/tasks/failed/`
- `ralph/tasks/current/`
- `ralph/runs/`
- `ralph/worktrees/`
- `ralph/state/`
- `ralph/logs/`

Suggested responsibilities:

- `inbox/`: raw Markdown tasks waiting to be picked up
- `processing/`: task packs currently under execution
- `completed/`: task packs that completed successfully
- `failed/`: task packs that failed validation or execution
- `current/`: optional manually curated task pack for one-off runs
- `runs/`: immutable run artifacts for each execution
- `worktrees/`: isolated git worktrees per run
- `state/`: machine-readable live state for dashboards and automation
- `logs/`: daemon and runner logs
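
The layout above could be created at daemon startup with a small helper. This is a minimal sketch: the directory names come from the list above, while `ensure_layout` and the two constants are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Directory names are taken from the list above; the helper itself is hypothetical.
TASK_DIRS = ("inbox", "processing", "completed", "failed", "current")
TOP_LEVEL_DIRS = ("runs", "worktrees", "state", "logs")


def ensure_layout(repo_root: Path) -> list[Path]:
    """Create the ralph/ directory tree if missing and return every path."""
    ralph = repo_root / "ralph"
    paths = [ralph / "tasks" / name for name in TASK_DIRS]
    paths += [ralph / name for name in TOP_LEVEL_DIRS]
    for path in paths:
        # exist_ok keeps the call idempotent, so it is safe to run on every start.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Because the call is idempotent, it can run unconditionally before the first inbox poll.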

## 4. Task contract

Every task should begin as a single Markdown file.

The file can be simple or structured, but the daemon should be able to extract:

- task goal
- acceptance criteria
- context
- constraints
- expected outputs

If a section is missing, the system should fall back to the standard task pack templates.

Recommended minimum structure:

```md
# Task Title

## Goal
What needs to be built or fixed.

## Acceptance Criteria
- measurable outcome 1
- measurable outcome 2

## Context
Relevant background, references or links.

## Constraints
- scope limits
- runtime limits
- no-go areas
```

The parser should prefer explicit headings, but it should also work if the task is just a plain instruction block.
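
One way such a parser might look is sketched below. It assumes sections are marked with `##` headings as in the recommended structure, and it treats a file without any `##` heading as a plain instruction block whose text becomes the goal. The function name `parse_task` and the field names in `KNOWN_SECTIONS` are assumptions, not part of an existing implementation.

```python
import re

# Headings the daemon looks for, mapped to field names (field names are assumptions).
KNOWN_SECTIONS = {
    "goal": "goal",
    "acceptance criteria": "acceptance",
    "context": "context",
    "constraints": "constraints",
    "expected outputs": "outputs",
}

HEADING = re.compile(r"^##\s+(.+?)\s*$")


def parse_task(markdown: str) -> dict[str, str]:
    """Split a task file into known sections; plain text becomes the goal."""
    sections: dict[str, list[str]] = {}
    preamble: list[str] = []
    current = None       # field currently being filled, if any
    in_preamble = True   # True until the first "##" heading is seen
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            in_preamble = False
            # Unknown headings map to None, so their bodies are skipped.
            current = KNOWN_SECTIONS.get(match.group(1).lower())
            if current:
                sections.setdefault(current, [])
        elif current:
            sections[current].append(line)
        elif in_preamble and not line.startswith("# "):
            preamble.append(line)
    parsed = {key: "\n".join(body).strip() for key, body in sections.items()}
    if "goal" not in parsed and "".join(preamble).strip():
        # No explicit Goal heading: treat the free-form text as the goal.
        parsed["goal"] = "\n".join(preamble).strip()
    return parsed
```

Missing keys in the returned dictionary signal which standard templates need to be filled in later.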

## 5. From Markdown to task pack

The daemon should convert the inbox Markdown into a task pack directory.

Each task pack directory should contain at least:

- `TASK.md`
- `ACCEPTANCE.md`
- `CONTEXT.md`
- optionally `SOURCE.md`

Recommended behavior:

- if the source Markdown has no acceptance section, insert the standard acceptance template
- if the source Markdown has no context section, insert the standard context template
- preserve the original task Markdown as `SOURCE.md` for traceability
- record the original file name, timestamp and run id in metadata

This keeps the task pack stable even when the original inbox file is messy.
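
The conversion could be sketched as follows. The `sections` argument is assumed to be a dictionary of already-extracted section text keyed by `goal`, `acceptance` and `context`. The template strings are placeholders because the standard templates are not shown in this document, and `metadata.json` is a hypothetical file name.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Placeholder text; the real standard templates are not shown in this document.
DEFAULT_ACCEPTANCE = "- (standard acceptance template)\n"
DEFAULT_CONTEXT = "(standard context template)\n"


def build_task_pack(source: Path, pack_dir: Path, run_id: str,
                    sections: dict[str, str]) -> Path:
    """Write TASK.md, ACCEPTANCE.md, CONTEXT.md, SOURCE.md and metadata."""
    pack_dir.mkdir(parents=True, exist_ok=True)
    original = source.read_text(encoding="utf-8")
    # Fall back to the whole file when no goal section was extracted.
    (pack_dir / "TASK.md").write_text(sections.get("goal") or original, encoding="utf-8")
    (pack_dir / "ACCEPTANCE.md").write_text(
        sections.get("acceptance") or DEFAULT_ACCEPTANCE, encoding="utf-8")
    (pack_dir / "CONTEXT.md").write_text(
        sections.get("context") or DEFAULT_CONTEXT, encoding="utf-8")
    # Keep the untouched original for traceability.
    (pack_dir / "SOURCE.md").write_text(original, encoding="utf-8")
    metadata = {
        "source_file": source.name,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
    }
    # metadata.json is a hypothetical file name.
    (pack_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return pack_dir
```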

## 6. Daemon responsibilities

The daemon is the always-on process that watches the inbox and triggers runs.

Its responsibilities are:

- watch `ralph/tasks/inbox/` for new `.md` files
- move a file into `ralph/tasks/processing/` before starting work
- generate a run id
- build the task pack directory
- launch the autopilot runner against that task pack
- move the task into `completed/` or `failed/` after execution
- write live state into `ralph/state/`
- keep logs in `ralph/logs/`

The daemon should be conservative:

- process one task at a time
- prevent duplicate execution with a lock or mutex
- never run two autopilot jobs on the same inbox item
- never delete original task content
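
The conservative rules above can be approximated with an exclusive lock file plus a move-based claim. This is a sketch under assumptions: the lock file location and all three function names are hypothetical, and a production daemon would also need to handle stale locks left behind by a crashed process.

```python
import os
from pathlib import Path
from typing import Optional


def acquire_lock(lock_path: Path) -> bool:
    """Create the lock file exclusively; False means another daemon holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file if it is still present."""
    lock_path.unlink(missing_ok=True)


def claim_next_task(inbox: Path, processing: Path) -> Optional[Path]:
    """Move the oldest inbox .md file into processing and return its new path."""
    candidates = sorted(inbox.glob("*.md"), key=lambda p: p.stat().st_mtime)
    for task in candidates:
        target = processing / task.name
        if target.exists():
            continue  # Already claimed under this name; never overwrite it.
        try:
            task.rename(target)
        except (FileNotFoundError, FileExistsError):
            continue  # Another process got there first.
        return target
    return None
```

Moving the file before any work starts means an inbox item is either waiting or claimed, never both, and the original content is moved rather than deleted.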

## 7. Scheduled Task behavior on Windows

The daemon should be started and kept alive by a Windows Scheduled Task.

Recommended behavior:

- trigger at logon or startup
- run with the current user context when possible
- restart on failure
- keep a log file for stdout and stderr
- avoid hidden state outside the repository

The Scheduled Task should only be a launcher. The actual orchestration logic belongs in the daemon.

This separation matters:

- Scheduled Task = persistence and recovery
- daemon = queue processing and orchestration
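
As a rough illustration of the launcher side, the registration command could be assembled like this. The sketch only covers the logon trigger. It uses the `schtasks /Create` options `/TN`, `/TR`, `/SC ONLOGON` and `/F`, and it deliberately leaves out restart-on-failure and log redirection, which are assumed to be configured separately (for example in the task settings and in the launcher command itself). The function name is hypothetical.

```python
def build_schtasks_command(task_name: str, launcher: str) -> list[str]:
    """Return a schtasks invocation that registers a logon-triggered task."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # task name shown in Task Scheduler
        "/TR", launcher,    # command line the task runs
        "/SC", "ONLOGON",   # trigger when the user logs on
        "/F",               # overwrite an existing task with the same name
    ]
```

On Windows the resulting list can be passed to `subprocess.run(command, check=True)`. Keeping the builder separate from execution makes the launcher definition easy to inspect and test on any platform.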

## 8. Worktree isolation

Every run must happen in a dedicated git worktree.

That is a hard rule.

Why:

- providers can make destructive or exploratory edits without polluting the main tree
- diffs are easier to review
- a failed run can be inspected after the fact
- multiple runs can coexist without overwriting each other

Recommended worktree lifecycle:

1. create a detached worktree from the current HEAD
2. run the implementer in that worktree
3. capture the diff after the first pass
4. run reviewers against the diff
5. run Codex against the same artifacts
6. optionally run a fix pass in the same worktree
7. keep the worktree for inspection or cleanup later

The main tree should remain untouched unless a human explicitly decides to merge.
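
Steps 1 and 3 of the lifecycle map directly onto git commands. The sketch below assumes worktrees live under `ralph/worktrees/<run-id>` and that `git` is on the PATH; the function names are hypothetical. Note that `git diff HEAD` only covers tracked files, so newly created files would need to be staged first to appear in the captured diff.

```python
import subprocess
from pathlib import Path


def worktree_add_command(worktree_path: Path, commit: str = "HEAD") -> list[str]:
    """Build the git command that creates a detached worktree at a commit."""
    return ["git", "worktree", "add", "--detach", str(worktree_path), commit]


def create_worktree(repo_root: Path, run_id: str) -> Path:
    """Create ralph/worktrees/<run_id> from the current HEAD and return its path."""
    worktree_path = repo_root / "ralph" / "worktrees" / run_id
    subprocess.run(worktree_add_command(worktree_path), cwd=repo_root, check=True)
    return worktree_path


def capture_diff(worktree_path: Path) -> str:
    """Return changes to tracked files in the worktree as a unified diff."""
    result = subprocess.run(
        ["git", "diff", "HEAD"],
        cwd=worktree_path, check=True, capture_output=True, text=True,
    )
    return result.stdout
```

A detached worktree avoids creating a branch per run, which keeps the branch list clean while still leaving the files on disk for later inspection.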

## 9. Provider flow

Ralph should treat providers as roles, not as equal interchangeable machines.

Recommended role split:

- implementer: one model does the first pass
- reviewers: multiple models critique the diff
- Codex master: final senior review and sprint writer

Recommended policy:

- do not let the implementer self-approve
- do not accept a run based only on compilation
- do not trust a single provider's summary if the persisted artifacts disagree
- do not let reviewers edit the main tree directly

## 10. Codex master reviewer

Codex should act as the final reviewer, not the only actor.

Its job is to:

- read the task pack
- read reviewer outputs
- read the worktree diff
- compare claims against repository truth
- point out missing validations
- write the next sprint document when the run is incomplete

Codex is especially valuable for:

- overclaim detection
- acceptance verification
- long-lived project memory through a persistent session
- final synthesis after multiple providers disagree

The Codex master should always prefer persisted facts over model self-reporting.

## 11. Suggested run lifecycle

This is the recommended state machine for a single task:

### 11.1 Ingest

- inbox file appears
- daemon moves it to processing
- task pack is created
- run id is assigned

### 11.2 Implement

- isolated worktree is created
- implementer receives the task pack and worktree
- implementer writes code changes only inside the worktree

### 11.3 Review

- reviewers receive the task pack plus the diff
- reviewers produce structured critiques
- reviewer output is persisted in the run folder

### 11.4 Codex master review

- Codex receives the task pack, diff and reviewer outputs
- Codex decides whether the work is actually complete
- if the work is incomplete, Codex writes the next sprint

### 11.5 Fix pass

- if the reviews identify high-signal issues, a fix pass runs in the same worktree

### 11.6 Archive

- run artifacts are finalized
- task is moved to `completed/` or `failed/`
- state files are updated
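
The six phases can be encoded as an explicit state machine so that illegal jumps are rejected early. This sketch makes two assumptions that the document does not spell out: any phase may go straight to archive (for example when a run fails), and a fix pass is followed by archive rather than a second review round. `RunState`, `TRANSITIONS` and `advance` are hypothetical names.

```python
from enum import Enum


class RunState(Enum):
    INGEST = "ingest"
    IMPLEMENT = "implement"
    REVIEW = "review"
    MASTER_REVIEW = "master_review"
    FIX_PASS = "fix_pass"
    ARCHIVE = "archive"


# Allowed transitions. The fix pass is optional, so master review may archive directly.
TRANSITIONS = {
    RunState.INGEST: {RunState.IMPLEMENT, RunState.ARCHIVE},
    RunState.IMPLEMENT: {RunState.REVIEW, RunState.ARCHIVE},
    RunState.REVIEW: {RunState.MASTER_REVIEW, RunState.ARCHIVE},
    RunState.MASTER_REVIEW: {RunState.FIX_PASS, RunState.ARCHIVE},
    RunState.FIX_PASS: {RunState.ARCHIVE},
    RunState.ARCHIVE: set(),  # terminal state
}


def advance(current: RunState, target: RunState) -> RunState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Persisting `state.value` into `ralph/state/` after each call would give dashboards a single field to read.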

## 12. Run artifacts

Every run should leave a complete artifact trail.

At minimum, the run folder should contain:

- the task pack files
- the implementer prompt
- reviewer prompts
- reviewer outputs
- Codex master output
- patch files
- status snapshots
- summary markdown
- state metadata

This is what makes the system auditable.

If a model says the task is complete, the artifacts should prove it.
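
A completeness check over the run folder might look like the sketch below. It only covers the four paths this document names explicitly in section 16 (`SUMMARY.md`, `reviews/`, `outputs/`, `implementer.patch`); file names for the remaining artifacts are not specified here, so they are left out rather than guessed. The function name is hypothetical.

```python
from pathlib import Path

# Paths named in section 16; a full check would cover every artifact listed above.
REQUIRED_ARTIFACTS = ("SUMMARY.md", "reviews", "outputs", "implementer.patch")


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the required artifacts that are absent or empty in a run folder."""
    missing = []
    for name in REQUIRED_ARTIFACTS:
        path = run_dir / name
        if path.is_dir():
            empty = not any(path.iterdir())
        elif path.is_file():
            empty = path.stat().st_size == 0
        else:
            empty = True  # path does not exist
        if empty:
            missing.append(name)
    return missing
```

Treating empty files and empty directories as missing catches the case where a provider created the artifact but never wrote to it.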

## 13. Validation rules

Ralph should not accept success based on compile-only checks.

The validation standard should be:

- code compiles
- runtime behavior is validated when relevant
- the diff matches the claim
- reviewer feedback is addressed
- Codex master agrees the task is actually complete

For Ableton or MCP work, runtime truth matters more than static validity.
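
The validation standard can be expressed as a checklist that reports every unmet rule rather than a single pass/fail flag. This is an illustrative sketch: the `RunEvidence` fields are assumptions about how the evidence would be recorded, and how each flag gets populated is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class RunEvidence:
    compiles: bool
    runtime_relevant: bool
    runtime_validated: bool
    diff_matches_claim: bool
    review_feedback_addressed: bool
    master_approved: bool


def failed_checks(evidence: RunEvidence) -> list[str]:
    """Return the names of validation rules the run does not satisfy."""
    checks = {
        "code compiles": evidence.compiles,
        # Runtime validation only counts against the run when it is relevant.
        "runtime behavior validated": evidence.runtime_validated or not evidence.runtime_relevant,
        "diff matches the claim": evidence.diff_matches_claim,
        "reviewer feedback addressed": evidence.review_feedback_addressed,
        "Codex master agrees": evidence.master_approved,
    }
    return [name for name, passed in checks.items() if not passed]
```

A run that only compiles returns four failed checks, which makes the compile-only case visibly insufficient.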

## 14. Recommended failure handling

The daemon should treat these as failures or partial failures:

- provider timeout
- missing prompt artifact
- empty diff when changes were expected
- divergence between claimed result and persisted state
- Codex master flags unresolved acceptance issues

On failure:

- persist the error
- mark the run as failed
- preserve the worktree and logs
- move the task to `failed/`

Failure should be informative, not destructive.
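
The on-failure steps could be sketched as a single helper. It writes the error into the run folder, moves the task file into `failed/`, and never touches the worktree or logs. `record_failure` and the `status.json` file name are hypothetical.

```python
import json
import shutil
from pathlib import Path


def record_failure(task_file: Path, failed_dir: Path, run_dir: Path, reason: str) -> Path:
    """Persist the error, mark the run failed, and move the task to failed/."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # status.json is a hypothetical file name for the run's state metadata.
    status = {"state": "failed", "reason": reason, "task": task_file.name}
    (run_dir / "status.json").write_text(json.dumps(status, indent=2), encoding="utf-8")
    failed_dir.mkdir(parents=True, exist_ok=True)
    destination = failed_dir / task_file.name
    # Move rather than delete: the original task content must survive.
    shutil.move(str(task_file), str(destination))
    return destination
```

Writing the status before moving the task means a crash between the two steps still leaves a recorded reason on disk.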

## 15. Security and secrets

Provider tokens must not be embedded in task files or committed docs.

Recommended practice:

- keep live secrets in local config or environment variables
- never paste tokens into a task Markdown file
- rotate any token that is accidentally exposed outside the machine

The task inbox is for instructions, not secrets.
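
For the environment-variable route, a provider wrapper might read its token like this. The helper name is hypothetical, and the variable name is supplied by the caller because this document does not define one.

```python
import os


def load_provider_token(env_var: str) -> str:
    """Read a provider token from the environment and fail loudly if it is absent."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"{env_var} is not set; configure it locally instead of "
            "writing the token into a task file"
        )
    return value
```

Failing at startup when the variable is missing is preferable to discovering the problem midway through a run.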

## 16. Operational examples

### Submit a task

Drop a file into the inbox:

```text
ralph/tasks/inbox/2026-04-03-fix-automation.md
```

### Process one task manually

The daemon should be able to run a single pass and exit.

### Run continuously

The daemon should stay alive, poll the inbox, and process tasks as they arrive.
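
Both modes can share one loop. In this sketch `process_next` is a hypothetical callable that handles at most one task and returns `True` if it did any work. The `once` flag, the five-second default poll interval and the injectable `sleep` parameter are likewise assumptions made for illustration and testing.

```python
import time
from typing import Callable


def run_daemon(process_next: Callable[[], bool], once: bool = False,
               poll_seconds: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Call process_next until stopped; return the number of tasks handled."""
    handled = 0
    try:
        while True:
            did_work = process_next()
            handled += int(did_work)
            if once:
                break  # Single-pass mode: exit after one attempt.
            if not did_work:
                sleep(poll_seconds)  # Inbox empty: wait before polling again.
    except KeyboardInterrupt:
        pass  # Ctrl+C ends continuous mode cleanly.
    return handled
```

Calling `run_daemon(process_next, once=True)` gives the manual single pass, while the default call polls until interrupted.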

### Review a completed run

Inspect:

- `ralph/runs/<run-id>/SUMMARY.md`
- `ralph/runs/<run-id>/reviews/`
- `ralph/runs/<run-id>/outputs/`
- `ralph/runs/<run-id>/implementer.patch`

## 17. What this architecture is not

This is not:

- a blind autonomous coder with no review loop
- a merge bot
- a cloud-only pipeline
- a single model pretending to be a team
- a system that trusts one summary file over the actual repository state

## 18. Recommended next implementation steps

If this architecture is implemented in code, the first concrete pieces should be:

1. inbox daemon
2. Windows Scheduled Task installer
3. Markdown task submitter
4. state snapshot writer
5. Codex master review wrapper
6. dashboard refresh hooks

That sequence gives the highest leverage with the lowest risk.

## 19. Final rule

The system is only healthy if it can keep running, keep explaining itself, and keep proving its claims with files on disk.

That is the standard for a real 24/7 Ralph pipeline.