Experimental

Run-to-Done

A production runner where Agent Teams execute from request to completion. The system advances through requirements, planning, execution, verification, repair, delivery, and evidence. On failure, it loops back and converges instead of stopping mid-process.

Execution Atlas

One Request, Full Completion Graph

S-1 Requirements
S-2 Plan
S-3 Execute
S-4 Verify
S-5 Repair
S-6 Deliver
S-7 Evidence

Responsibility Matrix

Governance by Risk and Action

Research
External Action
Data Access
Delivery

Core band: ALLOW (autonomous)
Approach band: REQUIRE_APPROVAL
Boundary band: ESCALATE_COUNSEL / DENY

These bands are a discretized read-out of the continuous Dynamic Harness verdict ladder, not three static tiers. ALLOW → REQUIRE_APPROVAL → ESCALATE_COUNSEL → DENY are nested level sets of a single weighted norm ‖x‖_w, and friction rises monotonically as the trajectory nears the boundary.
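The nested level sets can be sketched as a threshold lookup on a weighted norm. This is a minimal illustration only: the Euclidean form of the norm, the cut-off values in `THRESHOLDS`, and the function names are assumptions, since the page names the four verdicts but gives no formula or numbers.

```python
import math

# Hypothetical cut-offs; the source names the verdicts but not the levels.
THRESHOLDS = [
    (0.5, "ALLOW"),
    (0.8, "REQUIRE_APPROVAL"),
    (1.0, "ESCALATE_COUNSEL"),
]


def weighted_norm(x, w):
    """Assumed form of the norm: sqrt(sum_i w_i * x_i^2)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return math.sqrt(sum(wi * xi * xi for xi, wi in zip(x, w)))


def verdict(x, w):
    """Return the innermost level set that contains x, or DENY outside all of them."""
    r = weighted_norm(x, w)
    for limit, name in THRESHOLDS:
        if r < limit:
            return name
    return "DENY"
```

Because each level set contains the previous one, moving outward along any ray can only hold or raise the verdict, never lower it.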

Convergence Twin

Repair Loops Until Done Definition Passes

Dynamic Repair Workflow

Cycle 1

[Flowchart: While Loop Generation. Nodes: Start, Initialize Variables, Condition Check, Execute Loop Body, Update Variables, End; branches labeled True and False.]

Current Verdict: verifier rejected (missing policy evidence)
Risk Pulse: high
Loop Rule: If verify fails, route to repair and re-plan automatically.
Median Cycles: 2.3
Gate Escalations: 18%
Rollback Ready: 100%

Pure Workflow with Loop Structure

1) Start: intake and normalize request
2) Initialize Variables: done_definition, risk_budget, retry_count, gates
3) Condition Check: done_definition_satisfied?
4) If TRUE: route to End (deliver + evidence + close)
5) If FALSE: execute loop body (plan -> execute -> verify -> classify failure)
6) Update Variables: retry_count++, patch constraints, refresh evidence state
7) Return to Condition Check
8) Exit rules: pass, max attempts reached, budget exhausted, policy denied, deadline breached
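The steps above can be sketched as a plain loop. This is a minimal sketch under stated assumptions: `RunState`, `run_to_done`, the `attempt` callback signature, and the default limits are hypothetical names and values, and only three of the listed exit rules (pass, max attempts reached, budget exhausted) are modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RunState:
    max_attempts: int = 5        # hypothetical default
    risk_budget: float = 1.0     # hypothetical default
    retry_count: int = 0
    evidence: List[str] = field(default_factory=list)


def run_to_done(attempt: Callable[[RunState], Tuple[bool, float, str]],
                state: RunState) -> str:
    """attempt(state) runs plan -> execute -> verify and returns (passed, cost, note)."""
    done = False
    while not done:                       # condition check: done_definition satisfied?
        if state.retry_count >= state.max_attempts:
            return "max_attempts_reached"
        if state.risk_budget <= 0:
            return "budget_exhausted"
        done, cost, note = attempt(state)  # loop body
        state.risk_budget -= cost          # update variables
        state.evidence.append(note)        # refresh evidence state
        if not done:
            state.retry_count += 1
    return "pass"


# Usage: a verifier that rejects twice, then accepts.
outcomes = iter([False, False, True])


def flaky(state: RunState) -> Tuple[bool, float, str]:
    passed = next(outcomes)
    note = "verified" if passed else "verifier rejected"
    return passed, 0.1, note


state = RunState()
result = run_to_done(flaky, state)
```

Every exit path returns a named reason, so the caller can tell a pass from a gated stop and still read whatever evidence was collected.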

RUN PROGRESS × VITAL HEARTBEAT

Why Runs Complete, Not Just Start.


Overall progress: 0%
S-1 Requirements: 0%
S-2 Plan: 0%
S-3 Execute: 0%
S-4 Verify: 0%
S-5 Repair: 0%
S-6 Deliver: 0%
S-7 Evidence: 0%

Requirements phase active

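MARIA VITAL

Heartbeat: 0.95
Periodic activity signal from the agent

I/O Flow: 0.88
Continuity of input → processing → output

Decision Quality: 0.92
Accuracy of judgments against the criteria

Recovery Potential: 0.97
Self-recovery or human escalation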

VITAL × Run-to-Done

Heartbeat drop → auto-repair / model switch
Decision quality degradation → loop back to the Repair phase
Unrecoverable → human escalation + evidence preservation
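The three reactions above can be read as a routing function over the VITAL signals. The sketch below is illustrative only: the `Vital` type, the single `FLOOR` threshold, the returned action names, and the priority order of the checks are all assumptions, since the page does not state when a signal counts as degraded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vital:
    heartbeat: float
    io_flow: float
    decision_quality: float
    recovery_potential: float


FLOOR = 0.5  # hypothetical threshold below which a signal counts as degraded


def route(v: Vital) -> str:
    """Map VITAL readings to a Run-to-Done reaction (assumed priority order)."""
    if v.recovery_potential < FLOOR:
        # Unrecoverable: hand over to a human and keep the evidence intact.
        return "escalate_to_human_and_preserve_evidence"
    if v.heartbeat < FLOOR:
        return "auto_repair_or_switch_model"
    if v.decision_quality < FLOOR:
        return "loop_back_to_repair"
    return "continue"


# Usage with the readings shown on the dashboard above.
healthy = Vital(heartbeat=0.95, io_flow=0.88, decision_quality=0.92, recovery_potential=0.97)
action = route(healthy)
```

Checking recovery potential first reflects the idea that an unrecoverable run should not attempt further automatic repair.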

WHY RUNS COMPLETE

Done Definition satisfied: all criteria pass

Convergence loop: repair cycles narrow failures to zero

VITAL Heartbeat: continuous health monitoring prevents zombie runs

Budget / Deadline gate: graceful exit with partial evidence

Drive withdrawn / max retry: freeze in place and escalate to a human (no auto-revert)

Harness Alignment

Run-to-Done Runs Inside the Dynamic Harness

The loop above is the procedural skeleton. Governance lives in the harness that wraps it: a continuous, viscous boundary that freezes rather than guesses.

Continuous verdict ladder

Risk is not three static tiers. ALLOW / REQUIRE_APPROVAL / ESCALATE_COUNSEL / DENY are nested level sets of one weighted norm; friction rises monotonically toward the boundary.

Viscous boundary, freeze-in-place

Near the wall, control is velocity-dependent drag (F ∝ −b(x)·ẋ), not a conservative spring force. When drive is withdrawn, a run freezes where it is and escalates. It never auto-reverts, and the controller never resolves the freeze itself.
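A one-dimensional toy model shows why drag freezes a run while a spring would pull it back. The sketch assumes overdamped motion, b(x)·ẋ = drive, and a hypothetical drag profile b(x) = 1 / (wall − x); neither the profile nor the function names come from the source.

```python
def drag(x: float, wall: float = 1.0) -> float:
    """Hypothetical drag coefficient b(x): grows without bound as x nears the wall."""
    return 1.0 / max(wall - x, 1e-9)


def step(x: float, drive: float, dt: float = 0.01) -> float:
    """Overdamped update: b(x) * dx/dt = drive, so zero drive means zero motion."""
    return x + dt * drive / drag(x)


def simulate(x0: float, drives, dt: float = 0.01):
    """Return the trajectory produced by a sequence of drive values."""
    xs = [x0]
    for u in drives:
        xs.append(step(xs[-1], u, dt))
    return xs


# Drive toward the wall for 300 steps, then withdraw the drive for 100 steps.
path = simulate(0.0, [1.0] * 300 + [0.0] * 100)
```

With drive applied, progress slows as the position approaches the wall and never crosses it. Once the drive is withdrawn the position stops changing, which is the freeze-in-place behavior; a spring term would instead pull the position back toward the origin.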

Procedural vs result governance

Run-to-Done guarantees that the done_definition was followed and that a faithful judgment trace was captured at decision time. Whether the done_definition itself was the right outcome (result governance) stays a human call; this is the known v0.1 gap.
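As a rough illustration of the procedural side, a verifier can record one immutable trace entry per criterion at the moment it decides. The `JudgmentTrace` fields, the `judge` function, and the shape of `done_definition` as a mapping from criterion name to check function are all assumptions made for this sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class JudgmentTrace:
    criterion: str
    observed: str       # repr of what the verifier saw
    passed: bool
    decided_at: float   # timestamp taken when the judgment was made


def judge(done_definition: Dict[str, Callable[[object], bool]],
          observations: Dict[str, object],
          now: Callable[[], float] = time.time) -> Tuple[bool, Tuple[JudgmentTrace, ...]]:
    """Check every criterion and capture a trace entry at decision time."""
    traces = []
    for criterion, check in done_definition.items():
        observed = observations.get(criterion)
        traces.append(JudgmentTrace(criterion, repr(observed), bool(check(observed)), now()))
    return all(t.passed for t in traces), tuple(traces)


# Usage: tests pass, but no policy evidence was attached.
done_definition = {
    "tests_pass": lambda v: v is True,
    "policy_evidence": lambda v: bool(v),
}
passed, traces = judge(done_definition, {"tests_pass": True, "policy_evidence": []})
```

The trace shows which criteria were applied and what was observed. It says nothing about whether those criteria were the right ones, which is the result-governance question left to a human.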

See Dynamic Harness and Quality & Governance.
