Architecture | May 24, 2026 | 22 min read

Dynamic Harness and Phase-Space Control: From virtual-talent to MARIA OS

Reframing runtime episodes, failure taxonomies, dynamic scorecards, repair proposals, and controlled self-healing as phase control for agentic society

Architecture Thesis

This is a core MARIA OS thesis article. Read it as a design and architecture position, not as a claim of new foundational theory.

Provenance: ARIA-RD-01 G1.U1.P9.Z3.A1
Reviewed by: ARIA-TECH-01, ARIA-QA-01

Abstract

The central question for agentic systems is shifting from model intelligence to runtime phase control. A long-running agent is not a single response generator. It is a dynamic system with goals, memory, identity, authority, quality, latency, cost pressure, and responsibility boundaries. Once those variables start moving together, a conventional evaluation harness can tell us whether one output passed, but it cannot tell us whether the system is drifting into retry loops, memory decay, identity fragmentation, or governance leakage.

This article defines the Dynamic Harness as a Runtime Governance Layer that observes, evaluates, and controls the phase space of an agent runtime. It connects MARIA OS research with implementation lessons from bonginkan/virtual-talent, where Producer AI already normalizes jobs into runtime episodes, classifies failures, builds dynamic scorecards, proposes repair scopes, and routes safe self-healing actions through explicit approval boundaries.

The result is a practical research frame: a harness is no longer only a test wrapper. It becomes the operating surface that converts runtime drift into reruns, quarantine, draft repair PRs, human approvals, policy changes, and measurable improvement loops.


1. From Test Harness to Control Harness

Traditional software harnesses isolate a unit, run it under fixed conditions, and compare the result against an expected contract. That remains necessary. Agentic systems still need type checks, schema checks, UI contracts, tenant boundaries, regression tests, and quality gates.

But agent runtime behavior is not a point. It is a trajectory. The system reads memory, chooses tools, coordinates agents, retries, masks failures, takes shortcuts under latency pressure, and learns from prior outcomes. A single passing output may coexist with a worsening runtime phase: correction rates increase, retry loops thicken, identity signals degrade, or an advisory that once improved quality begins to poison future runs.

The Dynamic Harness therefore asks a different question: not only "did this output pass?" but also "what phase is the runtime entering?"


2. The virtual-talent Reference Pattern

The virtual-talent Producer AI work provides a concrete implementation pattern. Producer jobs are normalized into runtime episodes. Each episode can include intent, stages, participating agents, quality gates, advisories, generated assets, retries, holds, failures, event counts, and duration.

That structure turns operational noise into a governable object. Once episodes exist, failures can be classified, owners can be assigned, scorecards can be produced, repair proposals can be scoped, and self-healing can be bounded.
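As a minimal sketch of what normalizing job events into an episode could look like, the Python below folds a flat event stream into one record per job. The event schema (`job_id`, `ts`, `kind`, `value`, `agent`) and the `RuntimeEpisode` field names are assumptions made for illustration; they are not the virtual-talent schema.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeEpisode:
    """One analyzable unit built from the events of a single job (hypothetical schema)."""
    job_id: str
    intent: str = ""
    stages: list[str] = field(default_factory=list)
    agents: set[str] = field(default_factory=set)
    retries: int = 0
    failures: list[str] = field(default_factory=list)
    event_count: int = 0
    duration_s: float = 0.0


def normalize(events: list[dict]) -> dict[str, RuntimeEpisode]:
    """Group raw events by job_id and fold them into episodes, in timestamp order."""
    episodes: dict[str, RuntimeEpisode] = {}
    first_ts: dict[str, float] = {}
    for ev in sorted(events, key=lambda e: e["ts"]):
        job_id = ev["job_id"]
        ep = episodes.setdefault(job_id, RuntimeEpisode(job_id=job_id))
        first_ts.setdefault(job_id, ev["ts"])
        ep.event_count += 1
        # Duration is measured from the first event seen for this job.
        ep.duration_s = ev["ts"] - first_ts[job_id]
        if "agent" in ev:
            ep.agents.add(ev["agent"])
        kind = ev["kind"]
        if kind == "intent":
            ep.intent = ev["value"]
        elif kind == "stage" and ev["value"] not in ep.stages:
            ep.stages.append(ev["value"])
        elif kind == "retry":
            ep.retries += 1
        elif kind == "failure":
            ep.failures.append(ev["value"])
    return episodes


events = [
    {"job_id": "j1", "ts": 0.0, "kind": "intent", "value": "promo video", "agent": "planner"},
    {"job_id": "j1", "ts": 2.0, "kind": "stage", "value": "render", "agent": "renderer"},
    {"job_id": "j1", "ts": 5.0, "kind": "retry", "agent": "renderer"},
    {"job_id": "j1", "ts": 9.0, "kind": "failure", "value": "provider_failure", "agent": "renderer"},
]
episode = normalize(events)["j1"]
```

Once events are folded this way, every later step (classification, scoring, repair scoping) can operate on episodes rather than on raw logs.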

| Dynamic Harness layer | virtual-talent pattern | MARIA OS expansion |
| --- | --- | --- |
| Runtime episode | Producer job events become one analyzable unit | Decisions, audits, sales flows, meetings, code changes |
| Failure taxonomy | intent mismatch, identity drift, retry loop, provider failure | memory drift, authority leak, responsibility mismatch |
| Owner mapping | planning, UX, quality, provider, platform | Planet, Zone, Agent, Human Gate, Executive Gate |
| Scorecard | completion, pass rate, retry, advisory usage | business, trust, responsibility, and governance KPIs |
| Repair proposal | scoped fix plus verification commands | PRs, policy updates, gate changes, memory pruning |
| Controlled healing | rerun, quarantine, draft PR, human approval | fail-closed autonomy management |

The important move is that the harness does not stop at diagnosis. It produces the next operational action.
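One way to picture "diagnosis that produces the next action" is a typed failure record that carries its owner and a suggested action. The sketch below is illustrative only: the failure types and owner names come from the pattern above, while the specific owner assignments, severities, and suggested actions in `TAXONOMY` are assumptions, not the virtual-talent mapping.

```python
from dataclasses import dataclass

# Hypothetical mapping: failure type -> (owner, severity, suggested action).
# The pairings below are illustrative assumptions.
TAXONOMY = {
    "intent_mismatch": ("planning", "medium", "draft_repair_pr"),
    "identity_drift": ("quality", "high", "human_approval"),
    "retry_loop": ("platform", "medium", "quarantine"),
    "provider_failure": ("provider", "low", "rerun"),
}


@dataclass(frozen=True)
class TypedFailure:
    kind: str
    owner: str
    severity: str
    suggested_action: str


def classify(signal: str) -> TypedFailure:
    """Map a raw failure signal to a typed failure.

    Unknown signals fail closed: high severity, routed to a human.
    """
    owner, severity, action = TAXONOMY.get(
        signal, ("platform", "high", "human_approval")
    )
    return TypedFailure(signal, owner, severity, action)


failure = classify("provider_failure")
```

The design choice worth noting is the default branch: a signal the taxonomy does not recognize is treated as high severity rather than silently ignored.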


3. Agent Runtime as Phase Space

MARIA OS can represent an agent runtime as a state vector.

$$
x_t = [G_t, M_t, I_t, Q_t, L_t, C_t, R_t, A_t]
$$

- G_t: goal coherence
- M_t: memory integrity
- I_t: identity continuity
- Q_t: quality state
- L_t: latency pressure
- C_t: cost pressure
- R_t: responsibility demand
- A_t: authority boundary

The harness does not directly observe x_t. It observes logs, outputs, user corrections, gate decisions, tool calls, memory references, latency, cost events, and approval traces. Because it must infer the hidden state from these signals and then act on that estimate, the harness is both an observation layer and a controller.

$$
y_t = O(x_t) + \text{noise}
$$

$$
u_t = H(y_{0:t})
$$

The control input u_t may be a rerun, quarantine, draft repair PR, policy update, memory pruning, gate escalation, or human approval request. This makes the harness a runtime controller rather than a static checklist.
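To make the mapping u_t = H(y_{0:t}) concrete, here is a deliberately simple rule-based policy over a window of recent observations. It is a sketch under stated assumptions: the `Observation` fields, the window size, the thresholds, and the no-op `"continue"` action are all hypothetical, and a real harness would likely use richer signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """y_t: what the harness can actually see at step t (illustrative fields)."""
    gate_passed: bool
    retries: int
    human_correction: bool


def harness_policy(history: list[Observation], window: int = 5) -> str:
    """u_t = H(y_{0:t}): choose a control input from recent observations.

    All thresholds are illustrative assumptions, not tuned values.
    """
    recent = history[-window:]
    if not recent:
        return "continue"
    n = len(recent)
    fail_rate = sum(not o.gate_passed for o in recent) / n
    correction_rate = sum(o.human_correction for o in recent) / n
    total_retries = sum(o.retries for o in recent)
    # Check the most serious signal first.
    if correction_rate >= 0.6:
        return "human_approval_request"
    if total_retries >= 2 * n:
        return "quarantine"
    if fail_rate >= 0.4:
        return "rerun"
    return "continue"


history = [Observation(gate_passed=False, retries=3, human_correction=False)] * 5
action = harness_policy(history)
```

The policy takes the history rather than the latest observation alone, which is what distinguishes it from a per-output check.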


4. Phase-Level Failure Modes

A phase-level harness detects regions of self-reinforcing behavior. These states are not single failures. They are runtime attractors.

| Phase | Symptom | Control action |
| --- | --- | --- |
| Stable production | Quality, latency, and correction rates are steady | Lightweight monitoring |
| Retry loop | The same class of failure repeats | Suppress loop, hold, route to owner |
| Identity drift | Persona, face, role, or voice continuity weakens | Identity gate, reference lock, memory pruning |
| Goal mutation | The agent optimizes away from the original goal | Goal consistency check, human gate |
| Governance leak | Authority or responsibility boundaries blur | Fail closed, escalate approval |
| Latency freeze | Slow paths collapse quality | Budgeted fallback, degradation policy |
| Advisory poisoning | Learned guidance makes future runs worse | ON/OFF evaluation, quarantine |

This is where Dynamic Harness becomes more than evaluation. It sees the slope, not just the point.
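A small illustration of "slope, not just the point": the sketch below fits a least-squares slope to a window of retry rates and flags drift even when every individual value is still under the level limit. The function names, labels, and both thresholds are assumptions chosen for the example.

```python
def slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def phase_signal(
    retry_rates: list[float],
    level_limit: float = 0.3,
    slope_limit: float = 0.02,
) -> str:
    """Label a window by its latest level and its trend (illustrative thresholds)."""
    if not retry_rates:
        return "stable"
    if retry_rates[-1] >= level_limit:
        return "retry_loop"
    if slope(retry_rates) >= slope_limit:
        return "drifting"
    return "stable"


# Every value is below the 0.3 level limit, but the trend is clearly upward.
signal = phase_signal([0.05, 0.10, 0.15, 0.20])
```

A point check on any single window entry here would pass; only the trend reveals that the runtime is heading toward the retry-loop region.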


5. The Five-Layer Harness Stack

The minimal MARIA OS Dynamic Harness has five layers.

- Runtime Episode Layer. Normalize every meaningful agent action into a durable episode with coordinates, intent, memory, tools, gates, evidence, corrections, and final state.

- Failure Taxonomy Layer. Convert raw failure signals into typed failures with severity, confidence, owner, user visibility, suggested action, and verification.

- Dynamic Scorecard Layer. Track completion, quality pass rate, retry rate, human correction rate, advisory lift, owner failure density, duration, and release blockers over time.

- Repair Proposal Layer. Convert repeated failures and scorecard drift into scoped changes with tests or harnesses that can verify the improvement.

- Controlled Self-Healing Layer. Allow low-risk reruns or quarantine while requiring human approval for schema, deployment, global policy, core prompt, tenant boundary, or authority changes.
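The Dynamic Scorecard Layer above can be sketched as a plain aggregation over episode results. This is a minimal example, assuming a hypothetical `EpisodeResult` record; the advisory lift shown is a naive difference in pass rates between advisory-on and advisory-off episodes, which is a descriptive number and not a causal estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome fields for one episode (hypothetical names)."""
    completed: bool
    quality_passed: bool
    retries: int
    human_corrected: bool
    advisory_on: bool


def rate(flags: list[bool]) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def scorecard(episodes: list[EpisodeResult]) -> dict[str, float]:
    on = [e.quality_passed for e in episodes if e.advisory_on]
    off = [e.quality_passed for e in episodes if not e.advisory_on]
    return {
        "completion_rate": rate([e.completed for e in episodes]),
        "quality_pass_rate": rate([e.quality_passed for e in episodes]),
        "retry_rate": rate([e.retries > 0 for e in episodes]),
        "human_correction_rate": rate([e.human_corrected for e in episodes]),
        # Pass rate with the advisory minus pass rate without it.
        # Reported as 0.0 when either group is empty, since no comparison exists.
        "advisory_lift": rate(on) - rate(off) if on and off else 0.0,
    }


card = scorecard([
    EpisodeResult(True, True, 0, False, True),
    EpisodeResult(True, True, 1, False, True),
    EpisodeResult(True, False, 2, True, False),
    EpisodeResult(False, True, 0, False, False),
])
```

Computing the same card over successive time windows is what would make it "dynamic": the interesting signal is how these rates move between windows.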


6. Why This Matters for MARIA OS

MARIA OS is not only an agent management surface. It is an operating system for human-agent organizations. That means it must govern the runtime, not merely orchestrate tasks.

The Dynamic Harness becomes the kernel boundary for autonomy. It determines when an agent can continue, when it must degrade gracefully, when a policy must be rewritten, when a memory should be pruned, when a draft PR is appropriate, and when the system must stop and return authority to a human.
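The boundary described here can be expressed as a fail-closed routing function. The sketch below assumes a hypothetical `route_healing` helper and string identifiers for actions and scopes; the protected scopes mirror the list given for the Controlled Self-Healing Layer, and the low-risk set is limited to rerun and quarantine.

```python
# Actions the article treats as low risk.
LOW_RISK_ACTIONS = {"rerun", "quarantine"}

# Change scopes the article says require human approval.
HUMAN_APPROVAL_SCOPES = {
    "schema",
    "deployment",
    "global_policy",
    "core_prompt",
    "tenant_boundary",
    "authority",
}


def route_healing(action: str, scopes: set[str]) -> str:
    """Return 'auto' only for a known low-risk action touching no protected scope."""
    if scopes & HUMAN_APPROVAL_SCOPES:
        return "human_approval"
    if action in LOW_RISK_ACTIONS:
        return "auto"
    # Fail closed: anything unrecognized is routed to a human.
    return "human_approval"


decision = route_healing("rerun", {"schema"})
```

Note that the allow-list is on the automatic path, not the approval path: a new action type added later needs an explicit decision before it can run unattended.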

This is also why the harness is a values layer. Values are not executed because they are written in a document. They are executed when the runtime knows when to stop, when to ask, when to quarantine, and when to preserve responsibility even if automation would be faster.


7. Research Agenda

Dynamic Harness research sits at the intersection of control theory, runtime assurance, anomaly detection, process mining, causal inference, and self-healing systems.

The open problems are clear:

  • Observability: infer hidden runtime state from partial logs, outputs, corrections, gates, and memory traces.
  • Causality: distinguish whether a quality lift came from a prompt, advisory, provider, memory, or random variation.
  • Stability: prevent self-healing from becoming control oscillation.
  • Topology: detect phase changes in high-dimensional agent state spaces.
  • Legitimacy: define who sets thresholds, who approves autonomy, and who audits the harness itself.

The last point matters most. A harness that controls autonomy is itself a governance object. It must be visible, testable, accountable, and bounded.
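Returning to the stability problem listed above, one well-known way to keep a controller from flapping is hysteresis: trip at a high threshold and release only below a lower one. The sketch below is a generic illustration of that technique, not a description of how MARIA OS or virtual-talent handles it; the class name and threshold values are assumptions.

```python
class HysteresisGate:
    """Two-threshold switch: trips at `high`, releases only below `low`."""

    def __init__(self, low: float, high: float) -> None:
        if low >= high:
            raise ValueError("low must be below high")
        self.low = low
        self.high = high
        self.active = False

    def update(self, value: float) -> bool:
        """Feed one measurement and return whether the intervention is active."""
        if not self.active and value >= self.high:
            self.active = True
        elif self.active and value < self.low:
            self.active = False
        return self.active


gate = HysteresisGate(low=0.2, high=0.4)
# A retry rate hovering between the thresholds does not toggle the gate.
states = [gate.update(v) for v in [0.1, 0.45, 0.35, 0.25, 0.15, 0.3]]
```

With a single threshold at 0.3, the same series would switch the intervention on and off repeatedly; the gap between the two thresholds is what absorbs that oscillation.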


8. Conclusion

The next AI infrastructure race is not only about larger models. It is about the ability to operate intelligence without breaking it.

Static harnesses preserve contracts. Dynamic Harnesses control phases. Static harnesses say whether the build passed. Dynamic Harnesses say whether the runtime is drifting into a dangerous attractor and what action should happen next.

The implementation pattern emerging from virtual-talent gives MARIA OS a concrete path: runtime episodes, failure taxonomy, dynamic scorecards, repair proposals, and controlled self-healing. Extending that pattern from Producer AI to companies, governance systems, and agentic society is the next step.

To run intelligence safely, we need more than smart agents. We need harnesses that can observe the phase space, detect unstable attractors, and apply responsible control inputs before the system breaks.

R&D BENCHMARKS

- State Vector: 8 axes. Goal, memory, identity, quality, latency, cost, responsibility, and authority modeled as the agent runtime phase space.

- Harness Layers: 5 layers. Runtime episodes, failure taxonomy, dynamic scorecards, repair proposals, and controlled self-healing.

- Control Actions: 4 routes. Rerun, quarantine, draft repair PR, and human approval routes for runtime drift.

- Governance Boundary: fail-closed. High-risk schema, deployment, global policy, core prompt, tenant boundary, and authority changes require human approval.

Published by Bonginkan and reviewed by the MARIA OS Editorial Pipeline.

© 2026 Bonginkan / MARIA OS. All rights reserved.