# Ralph 24/7 Automation Architecture

This document defines the operating model for running Ralph as a persistent, local, task-driven swarm for this repository.

The goal is simple:

- accept tasks as Markdown files
- turn each task into a structured task pack
- dispatch implementation to a provider-backed agent
- run multiple reviewers
- run Codex as the final master reviewer
- keep every run isolated in its own worktree
- keep the system alive through a Windows Scheduled Task and a background daemon

The design is intentionally local-first. The source of truth is always the repository on disk, the task files on disk, and the run artifacts on disk.

## 1. Why this architecture exists

The current project needs two things at the same time:

- reliable engineering work on a shared codebase
- long-lived orchestration that can run without manual babysitting

The architecture below is meant to support both.

It avoids the most common failure modes of agent swarms:

- editing the main tree directly
- losing task context after a single run
- trusting a model's self-report instead of persisted artifacts
- overclaiming success without validation
- letting one provider become the entire system

Ralph is the orchestration layer that prevents those problems.

## 2. High-level flow

The intended 24/7 flow is:

1. A human drops a task Markdown file into an inbox folder.
2. A daemon notices the file and turns it into a task pack.
3. Ralph creates an isolated worktree for the run.
4. One provider acts as implementer.
5. Multiple providers act as reviewers.
6. Codex runs as the persistent master reviewer.
7. A fix pass can be executed if the review warrants it.
8. The run is archived with prompts, outputs, diffs and logs.

Nothing merges automatically into the main branch. Every result must be inspected through the persisted run artifacts.

## 3. Core directories

The architecture should use these directories under `ralph/`:

- `ralph/tasks/inbox/`
- `ralph/tasks/processing/`
- `ralph/tasks/completed/`
- `ralph/tasks/failed/`
- `ralph/tasks/current/`
- `ralph/runs/`
- `ralph/worktrees/`
- `ralph/state/`
- `ralph/logs/`

Suggested responsibilities:

- `inbox/`: raw Markdown tasks waiting to be picked up
- `processing/`: task packs currently being executed
- `completed/`: task packs whose runs completed successfully
- `failed/`: task packs that failed validation or execution
- `current/`: optional manually curated task pack for one-off runs
- `runs/`: immutable run artifacts for each execution
- `worktrees/`: isolated git worktrees per run
- `state/`: machine-readable live state for dashboards and automation
- `logs/`: daemon and runner logs
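The layout above can be created idempotently at daemon startup. This is a minimal sketch: the directory names come from the list above, while `RALPH_DIRS` and `ensure_layout` are illustrative names, not part of any existing Ralph code.

```python
from pathlib import Path

# Relative paths under ralph/, copied from the directory list above.
RALPH_DIRS = (
    "tasks/inbox",
    "tasks/processing",
    "tasks/completed",
    "tasks/failed",
    "tasks/current",
    "runs",
    "worktrees",
    "state",
    "logs",
)


def ensure_layout(repo_root: Path) -> list[Path]:
    """Create any missing Ralph directories and return all of them."""
    paths = []
    for rel in RALPH_DIRS:
        path = repo_root / "ralph" / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        paths.append(path)
    return paths
```

Calling `ensure_layout(Path.cwd())` from the repository root is enough to bootstrap a fresh checkout.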

## 4. Task contract

Every task should begin as a single Markdown file.

The file can be simple or structured, but the daemon should be able to extract:

- task goal
- acceptance criteria
- context
- constraints
- expected outputs

If a section is missing, the system should fall back to the standard task pack templates.

Recommended minimum structure:

```md
# Task Title

## Goal
What needs to be built or fixed.

## Acceptance Criteria
- measurable outcome 1
- measurable outcome 2

## Context
Relevant background, references or links.

## Constraints
- scope limits
- runtime limits
- no-go areas
```

The parser should prefer explicit headings, but it should also work if the task is just a plain instruction block.
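One possible shape for such a parser is sketched below. It assumes `##` headings named as in the recommended structure and treats heading-free text as the goal; the function name `parse_task` and the dictionary keys are hypothetical choices, and the sketch does not try to handle headings inside fenced code.

```python
import re

# Heading text (lowercased) mapped to an assumed section key.
SECTION_KEYS = {
    "goal": "goal",
    "acceptance criteria": "acceptance",
    "context": "context",
    "constraints": "constraints",
    "expected outputs": "expected_outputs",
}


def parse_task(markdown: str) -> dict[str, str]:
    """Extract the title and known sections from a task Markdown file."""
    sections: dict[str, list[str]] = {}
    preamble: list[str] = []
    title = ""
    current = None  # None: before any "##" heading; "": unrecognized heading
    for line in markdown.splitlines():
        h1 = re.match(r"^#\s+(.*)$", line)
        h2 = re.match(r"^##\s+(.*)$", line)
        if h1 and not title:
            title = h1.group(1).strip()
        elif h2:
            current = SECTION_KEYS.get(h2.group(1).strip().lower(), "")
            if current:
                sections.setdefault(current, [])
        elif current is None:
            preamble.append(line)
        elif current:
            sections[current].append(line)
    result = {key: "\n".join(body).strip() for key, body in sections.items()}
    if title:
        result["title"] = title
    # No explicit Goal heading: fall back to the text before the first section.
    if "goal" not in result:
        text = "\n".join(preamble).strip()
        if text:
            result["goal"] = text
    return result
```

With this approach a one-line instruction such as `Please rename the module.` yields a dictionary containing only a `goal` entry, and the missing sections are left for the template fallback.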

## 5. From Markdown to task pack

The daemon should convert the inbox Markdown into a task pack directory.

Each task pack directory should contain at least:

- `TASK.md`
- `ACCEPTANCE.md`
- `CONTEXT.md`
- `SOURCE.md` (optional; the original task Markdown, kept for traceability)

Recommended behavior:

- if the source Markdown has no acceptance section, insert the standard acceptance template
- if the source Markdown has no context section, insert the standard context template
- preserve the original task Markdown as `SOURCE.md` for traceability
- record the original file name, timestamp and run id in metadata

This keeps the task pack stable even when the original inbox file is messy.
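The conversion could look roughly like the sketch below. The template strings are placeholders because the standard templates are not shown in this document, and `metadata.json`, `build_task_pack` and the metadata field names are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Placeholder text; the real standard templates live elsewhere.
DEFAULT_ACCEPTANCE = "- (standard acceptance template)\n"
DEFAULT_CONTEXT = "(standard context template)\n"


def build_task_pack(source: Path, pack_dir: Path, run_id: str,
                    sections: dict[str, str]) -> Path:
    """Write the task pack files, filling missing sections from templates."""
    pack_dir.mkdir(parents=True, exist_ok=True)
    original = source.read_text(encoding="utf-8")
    files = {
        "TASK.md": sections.get("goal") or original,
        "ACCEPTANCE.md": sections.get("acceptance") or DEFAULT_ACCEPTANCE,
        "CONTEXT.md": sections.get("context") or DEFAULT_CONTEXT,
        "SOURCE.md": original,  # untouched copy of the inbox file
    }
    for name, body in files.items():
        (pack_dir / name).write_text(body, encoding="utf-8")
    metadata = {
        "original_file_name": source.name,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
    }
    (pack_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8")
    return pack_dir
```

The `sections` argument is expected to be a mapping of already-extracted sections, for example the output of whatever parser the daemon uses.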

## 6. Daemon responsibilities

The daemon is the always-on process that watches the inbox and triggers runs.

Its responsibilities are:

- watch `ralph/tasks/inbox/` for new `.md` files
- move a file into `ralph/tasks/processing/` before starting work
- generate a run id
- build the task pack directory
- launch the autopilot runner against that task pack
- move the task into `completed/` or `failed/` after execution
- write live state into `ralph/state/`
- keep logs in `ralph/logs/`

The daemon should be conservative:

- process one task at a time
- prevent duplicate execution with a lock or mutex
- never run two autopilot jobs on the same inbox item
- never delete original task content
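The conservative rules above can be approximated with an exclusive lock file plus a rename into `processing/`. This is a sketch under assumptions: the lock file location and the function names are hypothetical, and stale locks left behind by a crashed daemon are not handled here.

```python
import os
from pathlib import Path
from typing import Optional


def acquire_lock(lock_path: Path) -> bool:
    """Create the lock file exclusively; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file if it is present."""
    lock_path.unlink(missing_ok=True)


def claim_next_task(inbox: Path, processing: Path) -> Optional[Path]:
    """Move the alphabetically first .md file into processing, if any."""
    for candidate in sorted(inbox.glob("*.md")):
        target = processing / candidate.name
        if target.exists():
            continue  # never overwrite a task that is already in flight
        candidate.rename(target)  # the file is moved, never deleted
        return target
    return None
```

Because the file is renamed before any work starts, a second daemon instance that slips past the lock would no longer see the same inbox item.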

## 7. Scheduled Task behavior on Windows

The daemon should be started and kept alive by a Windows Scheduled Task.

Recommended behavior:

- trigger at logon or startup
- run with the current user context when possible
- restart on failure
- keep a log file for stdout and stderr
- avoid hidden state outside the repository

The Scheduled Task should only be a launcher. The actual orchestration logic belongs in the daemon.

This separation matters:

- Scheduled Task = persistence and recovery
- daemon = queue processing and orchestration
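An installer for the launcher could shell out to the built-in `schtasks` command. The sketch below only builds the argument list; the task name and daemon command are hypothetical, creating a logon-triggered task may require an elevated prompt, and restart-on-failure is not covered by these flags and would need to be configured separately in the task settings.

```python
def schtasks_create_args(task_name: str, command: str) -> list[str]:
    """Build a `schtasks /Create` argument list for a logon-triggered task."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # task name shown in Task Scheduler
        "/TR", command,     # command line that launches the daemon
        "/SC", "ONLOGON",   # trigger when the user logs on
        "/F",               # replace an existing task with the same name
    ]
```

On Windows, the result can be passed to `subprocess.run(args, check=True)`. Keeping the builder separate from the call makes the command easy to log and review before anything is installed.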

## 8. Worktree isolation

Every run must happen in a dedicated git worktree.

That is a hard rule.

Why:

- providers can make destructive or exploratory edits without polluting the main tree
- diffs are easier to review
- a failed run can be inspected after the fact
- multiple runs can coexist without overwriting each other

Recommended worktree lifecycle:

1. create a detached worktree from the current HEAD
2. run the implementer in that worktree
3. capture the diff after the first pass
4. run reviewers against the diff
5. run Codex against the same artifacts
6. optionally run a fix pass in the same worktree
7. keep the worktree for inspection or cleanup later

The main tree should remain untouched unless a human explicitly decides to merge.
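Steps 1 and 3 of the lifecycle map onto two git invocations. The sketch assumes worktrees live under `ralph/worktrees/<run-id>` and uses hypothetical helper names; note that `git diff HEAD` only covers tracked files, so new untracked files would need to be staged or listed separately.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo_root: Path, run_id: str, ref: str = "HEAD") -> list[str]:
    """Command that creates a detached worktree for one run."""
    path = repo_root / "ralph" / "worktrees" / run_id
    return ["git", "-C", str(repo_root), "worktree", "add", "--detach", str(path), ref]


def diff_command(worktree: Path) -> list[str]:
    """Command that prints tracked changes in the worktree relative to HEAD."""
    return ["git", "-C", str(worktree), "diff", "HEAD"]


def run_git(command: list[str]) -> str:
    """Run a git command and return its stdout, raising on a non-zero exit."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout
```

A run would call `run_git(worktree_add_command(repo_root, run_id))` once, then persist the output of `run_git(diff_command(worktree))` as the patch for reviewers.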

## 9. Provider flow

Ralph should assign each provider a distinct role rather than treating providers as interchangeable workers.

Recommended role split:

- implementer: one model does the first pass
- reviewers: multiple models critique the diff
- Codex master: the final senior reviewer and sprint writer

Recommended policy:

- do not let the implementer self-approve
- do not accept a run based only on compilation
- do not trust a single provider's summary if the persisted artifacts disagree
- do not let reviewers edit the main tree directly
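Part of this policy can be checked before a run starts. The sketch below encodes two of the rules as data validation; the class name, the provider names in the usage note, and the "at least two reviewers" threshold are assumptions drawn from the wording "multiple models".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    implementer: str
    reviewers: tuple[str, ...]
    master: str = "codex"

    def problems(self) -> list[str]:
        """Return policy violations; an empty list means the split is valid."""
        found = []
        if self.implementer in self.reviewers:
            found.append("implementer must not review its own work")
        if len(self.reviewers) < 2:
            found.append("at least two reviewers are expected")
        return found
```

For example, `RoleAssignment("provider-a", ("provider-b", "provider-c")).problems()` returns an empty list, while listing `provider-a` among its own reviewers is reported.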

## 10. Codex master reviewer

Codex should act as the final reviewer, not the only actor.

Its job is to:

- read the task pack
- read reviewer outputs
- read the worktree diff
- compare claims against repository truth
- point out missing validations
- write the next sprint document when the run is incomplete

Codex is especially valuable for:

- overclaim detection
- acceptance verification
- long-lived project memory through a persistent session
- final synthesis after multiple providers disagree

The Codex master should always prefer persisted facts over model self-reporting.

## 11. Suggested run lifecycle

This is the recommended state machine for a single task:

### 11.1 Ingest

- inbox file appears
- daemon moves it to processing
- task pack is created
- run id is assigned

### 11.2 Implement

- isolated worktree is created
- implementer receives the task pack and worktree
- implementer writes code changes only inside the worktree

### 11.3 Review

- reviewers receive the task pack plus the diff
- reviewers produce structured critiques
- reviewer output is persisted in the run folder

### 11.4 Codex master review

- Codex receives the task pack, diff and reviewer outputs
- Codex decides whether the work is actually complete
- if the work is incomplete, Codex writes the next sprint

### 11.5 Fix pass

- if the reviews identify high-signal issues, a fix pass runs in the same worktree

### 11.6 Archive

- run artifacts are finalized
- task is moved to `completed/` or `failed/`
- state files are updated
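The lifecycle above can be written down as an explicit transition table. In this sketch the state names mirror sections 11.1 to 11.6, with the archive stage split into `COMPLETED` and `FAILED` terminal states; whether a fix pass should loop back into review is not specified here, so the sketch assumes it proceeds straight to archiving.

```python
from enum import Enum


class RunState(Enum):
    INGEST = "ingest"
    IMPLEMENT = "implement"
    REVIEW = "review"
    MASTER_REVIEW = "master_review"
    FIX_PASS = "fix_pass"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed next states for each state; terminal states map to an empty set.
TRANSITIONS = {
    RunState.INGEST: {RunState.IMPLEMENT, RunState.FAILED},
    RunState.IMPLEMENT: {RunState.REVIEW, RunState.FAILED},
    RunState.REVIEW: {RunState.MASTER_REVIEW, RunState.FAILED},
    RunState.MASTER_REVIEW: {RunState.FIX_PASS, RunState.COMPLETED, RunState.FAILED},
    RunState.FIX_PASS: {RunState.COMPLETED, RunState.FAILED},
    RunState.COMPLETED: set(),
    RunState.FAILED: set(),
}


def advance(current: RunState, target: RunState) -> RunState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Persisting the current `RunState` value in `ralph/state/` after every call to `advance` would give dashboards a single field to read.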

## 12. Run artifacts

Every run should leave a complete artifact trail.

At minimum, the run folder should contain:

- the task pack files
- the implementer prompt
- reviewer prompts
- reviewer outputs
- Codex master output
- patch files
- status snapshots
- summary markdown
- state metadata

This is what makes the system auditable.

If a model says the task is complete, the artifacts should prove it.

## 13. Validation rules

Ralph should not accept success based on compile-only checks.

The validation standard should be:

- code compiles
- runtime behavior is validated when relevant
- the diff matches the claim
- reviewer feedback is addressed
- Codex master agrees the task is actually complete

For Ableton or MCP work, runtime truth matters more than static validity.
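The standard above amounts to a conjunction of checks, with the runtime check applying only when relevant. A minimal sketch, assuming each check has already been reduced to a boolean by earlier steps (the class and field names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    compiles: bool
    diff_matches_claim: bool
    reviewer_feedback_addressed: bool
    master_approved: bool
    runtime_validated: Optional[bool] = None  # None: runtime checks do not apply

    def failures(self) -> list[str]:
        """Names of the checks that did not pass."""
        checks = {
            "compiles": self.compiles,
            "diff_matches_claim": self.diff_matches_claim,
            "reviewer_feedback_addressed": self.reviewer_feedback_addressed,
            "master_approved": self.master_approved,
        }
        if self.runtime_validated is not None:
            checks["runtime_validated"] = self.runtime_validated
        return [name for name, ok in checks.items() if not ok]

    @property
    def accepted(self) -> bool:
        """True only when every applicable check passed."""
        return not self.failures()
```

A run that merely compiles is therefore rejected, and the list returned by `failures()` can be written into the run summary.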

## 14. Recommended failure handling

The daemon should treat these as failures or partial failures:

- provider timeout
- missing prompt artifact
- empty diff when changes were expected
- divergence between claimed result and persisted state
- Codex master flags unresolved acceptance issues

On failure:

- persist the error
- mark the run as failed
- preserve the worktree and logs
- move the task to `failed/`

Failure should be informative, not destructive.
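The failure steps can be sketched as a single helper that records the error before moving anything. The file name `error.json`, its fields, and the function name are assumptions; the worktree and logs are deliberately left untouched.

```python
import json
import shutil
from pathlib import Path


def record_failure(task_path: Path, run_dir: Path, failed_dir: Path, reason: str) -> Path:
    """Persist the error in the run folder, then move the task to failed/."""
    run_dir.mkdir(parents=True, exist_ok=True)
    failed_dir.mkdir(parents=True, exist_ok=True)
    error = {"status": "failed", "reason": reason, "task": task_path.name}
    (run_dir / "error.json").write_text(json.dumps(error, indent=2), encoding="utf-8")
    destination = failed_dir / task_path.name
    shutil.move(str(task_path), str(destination))  # moved, not deleted
    return destination
```

Writing the error first means that even if the move fails, the run folder still explains what went wrong.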

## 15. Security and secrets

Provider tokens must not be embedded in task files or committed docs.

Recommended practice:

- keep live secrets in local config or environment variables
- never paste tokens into a task Markdown file
- rotate any token that is accidentally exposed outside the machine

The task inbox is for instructions, not secrets.
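Reading tokens from the environment keeps them out of task files and run artifacts. A small sketch, where the helper name is hypothetical and the variable name is whatever the local configuration defines:

```python
import os


def load_provider_token(env_var: str) -> str:
    """Return the token stored in env_var, or raise if it is missing or blank."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; refusing to start without a token")
    return token
```

Failing fast at startup is a deliberate choice here: a missing token surfaces as one clear error instead of a provider timeout halfway through a run.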

## 16. Operational examples

### Submit a task

Drop a file into the inbox:

```text
ralph/tasks/inbox/2026-04-03-fix-automation.md
```

### Process one task manually

The daemon should be able to run a single pass and exit.

### Run continuously

The daemon should stay alive, poll the inbox, and process tasks as they arrive.
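Both modes can share one loop. In this sketch `process_next` is a hypothetical callable that handles at most one task and returns whether it found one; `should_stop` exists so the loop can be shut down cleanly or exercised in tests.

```python
import time
from typing import Callable


def run_daemon(process_next: Callable[[], bool], once: bool = False,
               poll_seconds: float = 5.0,
               should_stop: Callable[[], bool] = lambda: False) -> int:
    """Process tasks until stopped and return how many were handled."""
    handled = 0
    while not should_stop():
        if process_next():
            handled += 1
        elif not once:
            time.sleep(poll_seconds)  # inbox empty: wait before polling again
        if once:
            break  # single-pass mode handles at most one task
    return handled
```

`run_daemon(process_next, once=True)` covers the manual single pass, while the default arguments give the continuous polling behavior.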

### Review a completed run

Inspect:

- `ralph/runs/<run-id>/SUMMARY.md`
- `ralph/runs/<run-id>/reviews/`
- `ralph/runs/<run-id>/outputs/`
- `ralph/runs/<run-id>/implementer.patch`
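A quick completeness check over those four paths can precede manual inspection. The expected names are taken from the list above; the function name is illustrative.

```python
from pathlib import Path

# Paths relative to ralph/runs/<run-id>/, as listed above.
EXPECTED_ARTIFACTS = ("SUMMARY.md", "reviews", "outputs", "implementer.patch")


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the expected artifact names that are absent from run_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (run_dir / name).exists()]
```

An empty result only shows that the files exist; it says nothing about whether their contents support the run's claims.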

## 17. What this architecture is not

This is not:

- a blind autonomous coder with no review loop
- a merge bot
- a cloud-only pipeline
- a single model pretending to be a team
- a system that trusts one summary file over the actual repository state

## 18. Recommended next implementation steps

If this architecture is implemented in code, the first concrete pieces should be:

1. inbox daemon
2. Windows Scheduled Task installer
3. Markdown task submitter
4. state snapshot writer
5. Codex master review wrapper
6. dashboard refresh hooks

That sequence is meant to deliver the most leverage for the least risk: the queue and its launcher come first, and the review and dashboard layers build on them.

## 19. Final rule

The system is only healthy if it can keep running, keep explaining itself, and keep proving its claims with files on disk.

That is the standard for a real 24/7 Ralph pipeline.