MARIA OS

DECISION OS FOR AGENT COMPANIES

Encode human judgment as an OS. AI Agents execute the business.

Not running AI. Running decisions.

Most AI tools chain prompts to accelerate automation. But what companies truly need is more than automation: a decision-making structure that defines where to delegate to AI, where to stop, and where humans take responsibility. MARIA OS defines leader judgment as an operating system and turns it into execution by AI Agents.

Preserve human authority while scaling AI execution
Make implicit judgment reusable as structured decisions
Prevent AI autonomy from becoming organizational risk through governance

This is for

  • Organizations where AI Agents execute real operations and decisions
  • Leaders who prioritize responsibility and governance, not just speed
  • Companies that demand consistent and reusable judgment structures

This is not for

  • AI tools that just chain prompts faster
  • Full automation that eliminates human responsibility
  • AI that replaces leadership judgment

Dynamic Harness / Main Concept

Autonomy needs a harness, not a bigger prompt.

MARIA OS treats each agent as a moving state in a phase space. The harness observes drift, tightens constraints, and expands autonomy only when quality, trust, and responsibility remain stable.

  • Intent: goal stability
  • Memory: context integrity
  • Authority: safe autonomy

[Figure: agent state x(t) in phase space and the control surface H(t), labeled drift, harness, and stable path.]

Observe drift → Tighten constraints → Release autonomy
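
The observe, tighten, release loop can be sketched in a few lines. This is a minimal illustration under assumptions the page does not state: drift is a scalar in [0, 1], autonomy is a level from 0 to 3, and the thresholds are arbitrary. `HarnessState` and `step` are hypothetical names, not MARIA OS APIs.

```python
from dataclasses import dataclass


@dataclass
class HarnessState:
    autonomy: int = 1      # assumed scale: 0 = human-only ... 3 = widest allowed
    stable_ticks: int = 0  # consecutive observations with low drift


def step(state: HarnessState, drift: float,
         tighten_at: float = 0.3, release_after: int = 3) -> HarnessState:
    """Tighten immediately on drift; release one level only after sustained stability."""
    if drift >= tighten_at:
        return HarnessState(autonomy=max(0, state.autonomy - 1), stable_ticks=0)
    ticks = state.stable_ticks + 1
    if ticks >= release_after:
        return HarnessState(autonomy=min(3, state.autonomy + 1), stable_ticks=0)
    return HarnessState(autonomy=state.autonomy, stable_ticks=ticks)
```

The asymmetry is deliberate in this sketch: one drifting observation costs a level, while regaining it takes several stable ones.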

Implementation Pattern / Spinal Reflex Wiring

How to implement spinal-reflex neural wiring for AI agents

Do not send every known stimulus to a large model. MARIA OS routes routine, bounded, accountable events through reflex arcs, while ambiguous or risky cases rise into deliberation.

Fast path for known work. Deliberative path for unknown work.

  • layer 1: stimulus
  • layer 2: reflex arc
  • layer 3: harness
  • layer 4: envelope
  • layer 5: trace

01

Normalize every event into a stimulus packet

A reflex cannot fire from raw text alone. First, the system converts messages, forms, workflow changes, API callbacks, and document updates into typed stimulus packets with context, actor, object, risk, and current state.

The key is not intent detection. The key is operational typing.

raw input → stimulus packet

  • actor: principal
  • object: resource
  • risk: score
  • state: phase
  • authority: scope
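
As a rough sketch of operational typing, the packet fields above could map to a typed record. The field types, the `Phase` values, and the `normalize` helper are assumptions made for illustration; the page only names the fields.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):  # hypothetical workflow phases
    OPEN = "open"
    IN_REVIEW = "in_review"
    CLOSED = "closed"


@dataclass(frozen=True)
class StimulusPacket:
    actor: str            # principal
    obj: str              # resource
    risk: float           # score, assumed to lie in [0, 1]
    state: Phase          # current phase
    authority: frozenset  # scope
    context: dict


REQUIRED = ("actor", "object", "risk", "state", "authority")


def normalize(raw: dict) -> StimulusPacket:
    """Turn a raw event into a typed packet, or refuse if it cannot be typed."""
    missing = [key for key in REQUIRED if key not in raw]
    if missing:
        raise ValueError(f"cannot type event, missing: {missing}")
    risk = float(raw["risk"])
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be in [0, 1]")
    return StimulusPacket(
        actor=str(raw["actor"]),
        obj=str(raw["object"]),
        risk=risk,
        state=Phase(raw["state"]),
        authority=frozenset(raw["authority"]),
        context=dict(raw.get("context", {})),
    )
```

An event that cannot be typed raises instead of producing a partial packet, so no reflex can fire from it.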

02

Route known stimuli through bounded reflex arcs

A reflex arc is a predesigned execution path for a known class of work: reject incomplete input, classify a request, stop a prohibited transfer, attach evidence, escalate a high-risk case, or run a deterministic workflow.

A reflex is not a shortcut. It is a decision that has already been designed.

Reflex selection matrix

  • known / low risk → reflex arc (deterministic)
  • ambiguous / novel → deliberation (LLM + human)
  • missing authority → fail closed

The routing decision is explicit: reflex for known and bounded work, deliberation for ambiguous work, fail-closed for missing authority.
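
The three-way routing decision is small enough to write down directly. In this sketch the `low_risk_max` threshold and the boolean `known` flag are assumptions; how a stimulus is judged known is outside its scope.

```python
from enum import Enum


class Route(Enum):
    REFLEX = "reflex"
    DELIBERATION = "deliberation"
    FAIL_CLOSED = "fail_closed"


def select_route(known: bool, risk: float, has_authority: bool,
                 low_risk_max: float = 0.3) -> Route:
    # Authority is checked first, so a familiar, low-risk stimulus
    # still fails closed when nobody is authorized to act on it.
    if not has_authority:
        return Route.FAIL_CLOSED
    if known and risk <= low_risk_max:
        return Route.REFLEX
    return Route.DELIBERATION
```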

03

Wrap each reflex in static and dynamic harnesses

Static harnesses define fixed authority, data, tool, and prohibition boundaries. Dynamic harnesses adjust the allowed range at runtime based on risk, confidence, state, deadline, and audit conditions.

The reflex moves fast only inside a governed action space.
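
One way to picture the two harness types: the static harness fixes a ceiling, and the dynamic harness can only shrink it at runtime. The scaling rule below (confidence times one minus risk) is an illustrative assumption, as are all names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticHarness:
    allowed_tools: frozenset
    prohibited_actions: frozenset
    max_amount: float  # fixed ceiling, e.g. a transfer limit


def dynamic_limit(static: StaticHarness, risk: float, confidence: float) -> float:
    """Shrink the static ceiling at runtime; the result never exceeds it."""
    factor = max(0.0, min(1.0, confidence * (1.0 - risk)))
    return static.max_amount * factor


def permitted(static: StaticHarness, tool: str, action: str, amount: float,
              risk: float, confidence: float) -> bool:
    if tool not in static.allowed_tools or action in static.prohibited_actions:
        return False  # static boundary: not negotiable at runtime
    return amount <= dynamic_limit(static, risk, confidence)
```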

04

Execute only through Envelope responsibility contracts

If the Envelope is missing or invalid, the action fails closed.

Control wrapper: wrap the reflex

  • static boundary
  • runtime risk
  • multi-stage control
  • stop condition

Responsibility contract: carry accountability

  • owner
  • authority scope
  • purpose scope
  • failure route
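
A minimal sketch of Envelope-gated execution, assuming the contract carries the four fields named above. `Envelope`, `FailClosed`, and `execute` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class FailClosed(Exception):
    """Raised instead of executing when the Envelope does not cover the action."""


@dataclass(frozen=True)
class Envelope:
    owner: str                  # accountable human or team
    authority_scope: frozenset  # actions this contract permits
    purpose_scope: frozenset    # purposes this contract permits
    failure_route: str          # where a failed or blocked action is sent


def execute(action: str, purpose: str, envelope: Optional[Envelope],
            run: Callable[[], str]) -> str:
    if envelope is None:
        raise FailClosed("missing envelope")
    if not envelope.owner or not envelope.failure_route:
        raise FailClosed("envelope lacks owner or failure route")
    if action not in envelope.authority_scope:
        raise FailClosed(f"action {action!r} outside authority scope")
    if purpose not in envelope.purpose_scope:
        raise FailClosed(f"purpose {purpose!r} outside purpose scope")
    return run()  # reached only when every check has passed
```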

05

Observe, tune, and promote field patterns into OS assets

Each reflex produces traces: fired, blocked, escalated, overridden, or rolled back. FDE teams use those traces to tune local reflexes, then promote stable patterns into reusable MARIA OS assets.

Field implementation becomes platform learning.

[Figure: MARIA OS trace lifecycle with stages fired, blocked, returned, tuned, and promoted. Caption: FDE traces become reusable reflex libraries.]
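
Promotion from a local reflex to a shared asset could be gated on its trace history. The rule below is purely an assumption for illustration: a minimum number of runs, and a cap on the share of overridden or rolled-back outcomes.

```python
from collections import Counter
from typing import Iterable


def promotable(outcomes: Iterable[str], min_runs: int = 50,
               max_bad_rate: float = 0.02) -> bool:
    """Decide from trace outcomes whether a reflex looks stable enough to promote."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total < min_runs:
        return False  # not enough evidence yet
    bad = counts["overridden"] + counts["rolled_back"]
    return bad / total <= max_bad_rate
```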

Operational Governance

The moat is not autonomy. It is knowing when autonomy must stop.

MARIA OS treats stopping, recovery, evidence, and human escalation as production paths, not exception handling. Internally, recovery paths are stressed aggressively. In customer environments, HITL stays heavier until trust, evidence, and repetition justify more autonomy.

What we measure

Runtime proof

Fail-closed

Stop when authority, evidence, or context is insufficient.

Auto-recovery

Recover internally with causal logs and post-recovery checks.

HITL convergence

Reduce repeated human review only after the workflow proves stable.

Responsibility envelope

No execution path is valid without an accountable owner.

Features & Products

See the reality. Fix the structure. Run it every day.

We don't connect products with integrations. We align them with judgment.

01–06 Universe
07–10 Service
11–16 Platform
01 Sales Universe

AI agent teams that execute deals with judgment-aware automation. Every deal stage has a specialist.

02 Audit Universe

Not a faster spreadsheet. A reproducible audit engine where every finding carries evidence.

03 FAQ Universe

Auto-generates FAQ from real documents. Every answer cites source, page, and evidence quality.

04 Auto-Dev Universe

Agents build, test, review, and deploy. Not 'AI writes code': the database authorizes changes.

05 CPA Universe

AI agents that study for CPA exams using knowledge graphs and spaced repetition, governed by evidence.
06 Meeting Universe

07 Value Scanning

Stated values vs. practiced values. See where your organization's behavior contradicts its beliefs.

08 Workflow Scanning

Scan real processes, identify waste, responsibility gaps, and bottlenecks. Prescribe recomposition.

09 MVV OS Consulting

Turn your mission, vision, and values into executable governance. Philosophy becomes operating constraints.

10 Agentic Company Insight

Assess your company's agentic maturity. Find where agents can act, where humans must decide, and where risk gates are missing.

11 MARIA Voice

Speak to your Decision OS. Voice commands become governed actions.

16 MARIA BOOKING

12 AI Office

A virtual office where AI agents work as departments (HR, Finance, Legal, Dev), governed by MARIA OS.

13 CEO Clone OS

Voice interview, Decision OS, 5KB Genome, meeting agents, approval gates, integrations, and Doctor Agent repair.

14 MARIA VITAL

Life Support OS for Agent Orgs

Continuously monitor agent vitals: behavior health, judgment quality, coordination state, and recoverability.

15 Agentic Company

The Destination, Not a Feature

From Human Company to Self-Improving: a structured evolution path with governance at every stage.
MARIA OS

See → Fix → Run

Harness Adoption Map

Every LP surface gets a harness placement.

Cross harnesses share episodes, gates, scorecards, and quarantine. Individual harnesses control the failure modes unique to Sales, Audit, Voice, Meeting, and the other surfaces.

17 surfaces / 68 controls / 17 dynamic

Raw harness

Turns inputs, evidence, turns, and diffs into episodes

Cross harness

Shared gates and scorecards across products

Dynamic harness

Adjusts constraints and autonomy from drift

Universe runtimes (6)

Sales Universe

G1.U1.P1.Z1.A1

P1

Deal Evidence Intake Harness / Deal Phase Harness

Attach episode scoring to proposal and estimate generation.

Audit Universe

G1.U1.P2.Z1.A1

P0

Evidence Chain Harness / Procedure-Specific Audit Harness

Evaluate every generated finding through evidence completeness and risk-tier gates.

FAQ Universe

G1.U1.P3.Z1.A1

P1

Source Crawl Harness / FAQ Voice Harness

Add source freshness and public-release gates to generated FAQ artifacts.

Auto-Dev Universe

G1.U1.P4.Z1.A1

P0

Diff Episode Harness / Repository-Specific Dev Harness

Attach dynamic harness scoring to CI failure triage and repair proposals.

CPA Universe

G1.U1.P5.Z1.A1

P2

Learning Evidence Harness / Exam Domain Harness

Gate pass readiness with source validity and repeated-correction signals.

Meeting Universe

G1.U1.P6.Z1.A1

P0

Consent Episode Harness / Meeting Phase Harness

Extend gate evaluation with harness interventions and episode severity.

Scanner & service loops (5)

Decision Scanner

G1.U2.P5.Z1.A1

P0

Decision Evidence Harness / Decision Context Harness

Score live decision scans with evidence density, branch risk, and authority-gate pressure.

Value Scanner

G1.U2.P1.Z1.A1

P0

Value Evidence Harness / Executive Values Harness

Add harness confidence and evidence density to value scan summaries.

Workflow Scanner

G1.U2.P2.Z1.A1

P0

Process Evidence Harness / Workflow Domain Harness

Score recompose plans with flow-drift and evidence-density controls.

MVV OS Consulting

G1.U2.P3.Z1.A1

P1

MVV Interview Harness / CEO Clone Harness

Add contradiction and rule-enforceability scoring to CEO Clone outputs.

Agentic Company Insight

G1.U2.P4.Z1.A1

P2

Role Mapping Harness / Department Harness

Add role autonomy confidence and rollback conditions to insight output.

Platform surfaces (6)

MARIA Voice

G1.U3.P1.Z1.A1

P0

Turn Episode Harness / Voice Mode Harness

Attach harness severity to action-chat function-call rounds.

MARIA BOOKING

G1.U3.P6.Z1.A1

P0

Booking Conversation Harness / Reservation Phase Harness

Gate booking voice and calendar-sync episodes with consent, slot, and notification evidence.

AI Office

G1.U3.P2.Z1.A1

P1

Office Event Harness / Agent Lifecycle Harness

Score task-engine events with office-health and handoff-drift signals.

CEO Clone OS

G1.U3.P3.Z1.A1

P1

Judgment Sample Harness / Executive Persona Harness

Add contradiction density and identity-boundary scoring to elicitation outputs.

MARIA VITAL

G1.U3.P4.Z1.A1

P1

Vital Signal Harness / Agent Vital Harness

Unify vital signals with the runtime harness scorecard.

Agentic Company

G1.U3.P5.Z1.A1

P2

Company Phase Harness / Evolution Path Harness

Add phase advancement criteria and rollback triggers to Agentic Company stages.

17 surfaces -> raw intake -> cross gates -> individual dynamic control

Harness Installation Plan

MARIA Self-Healing Runtime turns failures into reviewable repair PRs.

The goal is not raw autonomous repair. It is safe autonomous repair: Failure Analyzer, Meta-Harness, Envelope, Memory Store, Human Approval Gate, and Loop Control collect the episode, classify confidence, plan the smallest repair, re-run local and cross harnesses, and preserve the learning.

29 candidates / 20 P0 first / 7 layers

Collection loop: 1 collect → 2 analyze → 3 plan → 4 repair → 5 re-run → 6 learn
First six mechanisms

Three-Layer Failure Analyzer

01

Classify failures through deterministic signals, LLM root-cause hypotheses, and historical memory before any repair is attempted.

KPI: Misclassification rate
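
A sketch of how the three layers might cascade. The ordering (deterministic signals, then memory, then an LLM hypothesis) and the signature table are assumptions for illustration; the LLM is represented by an injected callable so the example stays self-contained.

```python
from typing import Callable, Dict, Optional, Tuple

# Illustrative signatures only; a real table would come from the project's own logs.
SIGNATURES = {
    "ECONNREFUSED": "dependency_unavailable",
    "AssertionError": "test_regression",
}


def classify(log: str, memory: Dict[str, str],
             hypothesize: Callable[[str], Optional[str]]) -> Tuple[str, str]:
    """Return (label, layer). 'unclassified' means no repair should be attempted."""
    for signature, label in SIGNATURES.items():  # layer 1: deterministic signals
        if signature in log:
            return label, "deterministic"
    first_line = log.splitlines()[0] if log else ""
    remembered = memory.get(first_line)          # layer 2: historical memory
    if remembered is not None:
        return remembered, "memory"
    hypothesis = hypothesize(log)                # layer 3: LLM root-cause hypothesis
    if hypothesis:
        return hypothesis, "llm"
    return "unclassified", "none"
```

Returning the layer alongside the label lets a misclassification-rate KPI be broken down by which layer produced the answer.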

Harness Coverage Meta-Harness

02

Detect new APIs, screens, agents, integrations, permissions, and prompts that lack the required harness coverage.

KPI: Coverage gap rate

Fixer Agent Envelope Router

03

Route repairs into low, medium, high, or memory-write envelopes so the fixer cannot exceed its authority.

KPI: Unauthorized mutation count
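
Envelope routing can be read as picking the narrowest envelope that covers every operation a repair needs. The page names the four tiers but not their contents, so the operation names below are hypothetical.

```python
from typing import Optional, Set

# Hypothetical operation sets per envelope, ordered from narrowest to widest.
ENVELOPES = [
    ("low", {"edit_test", "edit_docs"}),
    ("medium", {"edit_test", "edit_docs", "edit_source"}),
    ("high", {"edit_test", "edit_docs", "edit_source", "edit_config"}),
    ("memory-write", {"write_memory"}),
]


def route_repair(operations: Set[str]) -> Optional[str]:
    """Return the narrowest envelope covering every operation, or None to fail closed."""
    if not operations:
        return None
    for name, allowed in ENVELOPES:
        if operations <= allowed:
            return name
    return None  # no envelope covers this repair: the fixer may not proceed
```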

Failure Memory Store

04

Store failure evidence, cause, patch rationale, rerun result, side effects, review notes, human reviewer rationale, and prevention rules as reusable assets.

KPI: Repeat failure rate

Risk Calibration Ledger

05

Compare runtime risk scores, monitor findings, reviewer decisions, and later incidents so expert-prior thresholds can be calibrated from operational evidence.

KPI: Calibration error
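
The page does not define its calibration-error KPI. One common choice is expected calibration error: bucket the runtime risk scores, then compare each bucket's mean score with the incident rate later observed in it. The sketch below assumes scores in [0, 1] and binary incident outcomes.

```python
from typing import Sequence


def calibration_error(scores: Sequence[float], incidents: Sequence[int],
                      bins: int = 5) -> float:
    """Expected calibration error: weighted mean |mean score - incident rate| per bin."""
    n = len(scores)
    if n == 0:
        return 0.0
    buckets = [[] for _ in range(bins)]
    for score, incident in zip(scores, incidents):
        index = min(int(score * bins), bins - 1)  # a score of 1.0 falls in the last bin
        buckets[index].append((score, incident))
    error = 0.0
    for bucket in buckets:
        if not bucket:
            continue
        mean_score = sum(s for s, _ in bucket) / len(bucket)
        incident_rate = sum(y for _, y in bucket) / len(bucket)
        error += len(bucket) / n * abs(mean_score - incident_rate)
    return error
```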

PR-First Regression Loop

06

Make the final unit of autonomous repair a reviewable PR with human approval gates, loop controls, and scoped, cross, meta, deploy, and post-deploy harness evidence.

KPI: Autonomous repair success rate

Spec Contract Harness

G1.U4.P1.Z1.A1

P0
Static / Phase 1

observes: undefined errors, field drift, missing owners

control: Blocks implementation when API, UI fields, DB columns, and acceptance criteria disagree.

analyzer: Deterministic schema diff first, LLM review only for ambiguous requirement language.

envelope: May block implementation and draft spec diffs; may not approve scope changes.

coverage: Flags new API, DB, or screen files that lack a spec-contract episode.

first slice: Generate a schema-to-screen diff for product specs before agent work starts.

owner: Product Architecture
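
The deterministic part of a spec-contract check can be as simple as comparing field sets. This sketch covers three of the four sources the harness names (API, UI fields, DB columns) and leaves acceptance criteria out; the function names are hypothetical.

```python
from typing import Dict, List, Set


def contract_diff(api: Set[str], ui: Set[str], db: Set[str]) -> Dict[str, List[str]]:
    """Map each field to the layers where it is missing; an empty dict means agreement."""
    layers = {"api": api, "ui": ui, "db": db}
    diff = {}
    for field in sorted(api | ui | db):
        missing = [name for name, fields in layers.items() if field not in fields]
        if missing:
            diff[field] = missing
    return diff


def may_implement(api: Set[str], ui: Set[str], db: Set[str]) -> bool:
    # Block implementation whenever the layers disagree about any field.
    return not contract_diff(api, ui, db)
```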

Prompt Policy Harness

G1.U4.P1.Z2.A1

P0
Static / Phase 3

observes: missing output contract, unsafe delegation, weak evaluation rubric

control: Quarantines prompts that lack prohibited actions, output format, evidence rules, or gate policy.

analyzer: Rule-based prompt checklist with memory lookup for prior prompt failures.

envelope: May quarantine prompts and propose edits; core authority prompts require reviewer approval.

coverage: Detects production prompts without output format, forbidden actions, or evaluation criteria.

first slice: Score production prompts for format, authority boundary, and evaluation coverage.

owner: Agent Governance

Client Data Preflight Harness

G1.U4.P2.Z1.A1

P0
Preflight / Phase 2

observes: tenant scope, PII class, agent permission tier

control: Stops an agent before it reads customer data outside its contract, role, or approval state.

analyzer: Deterministic tenant and role policy evaluation before any LLM reasoning.

envelope: May deny or request approval; may not expand customer-data access grants.

coverage: Finds data retrieval paths without tenant, PII, and permission preflight checks.

first slice: Attach a preflight decision to every customer-data retrieval and exported artifact.

owner: Security

External Action Preflight Harness

G1.U4.P2.Z2.A1

P0
Preflight / Phase 2

observes: blast radius, recipient visibility, business-hour policy

control: Routes public, financial, destructive, or production actions to human approval before execution.

analyzer: Structured action taxonomy with confidence threshold and human fallback.

envelope: May draft outbound actions; public, financial, destructive, and deploy actions require approval.

coverage: Reports external side-effect commands not covered by action preflight policy.

first slice: Gate outbound email, invoice issue, GitHub PR creation, and deploy commands with one policy matrix.

owner: Operations
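
A single policy matrix for the four actions in the first slice might look like the sketch below. The tags assigned to each action are illustrative assumptions, and an unknown action falls back to human approval.

```python
from typing import Dict, FrozenSet

# Categories that require human approval before execution.
GATED = frozenset({"public", "financial", "destructive", "deploy"})

# Illustrative classification; real tags would be set by the owning team.
POLICY: Dict[str, FrozenSet[str]] = {
    "outbound_email": frozenset({"public"}),
    "invoice_issue": frozenset({"financial", "public"}),
    "github_pr_create": frozenset(),
    "deploy": frozenset({"deploy"}),
}


def preflight(action: str) -> str:
    tags = POLICY.get(action)
    if tags is None:
        return "needs_approval"  # unknown action: human fallback
    return "needs_approval" if tags & GATED else "allow"
```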

Agent Runtime Telemetry Harness

G1.U4.P3.Z1.A1

P0
Runtime / Phase 4

observes: latency, token spend, RAG hit rate, tool-call loop

control: Detects drift during execution and changes route, model, retrieval scope, or escalation state.

analyzer: Metric thresholds plus failure-taxonomy classifier backed by similar runtime episodes.

envelope: May reroute, degrade, retry, or escalate; may not change authority policy while running.

coverage: Finds agent runs missing cost, retrieval, gate, and correction telemetry.

first slice: Normalize every agent run into a runtime episode with cost, retrieval, gate, and correction signals.

owner: Runtime Platform

Voice Call Stability Harness

G1.U4.P3.Z2.A1

P1
Runtime / Phase 4

observes: turn gaps, TTS failure, recognition restart, emotion mismatch

control: Falls back to text, pauses tool execution, or escalates when voice state becomes unstable.

analyzer: Deterministic audio-state checks with LLM review for semantic or emotion mismatch.

envelope: May pause voice execution or switch channels; may not execute irreversible customer actions.

coverage: Flags voice flows without turn continuity, TTS completion, and fallback telemetry.

first slice: Score each voice turn for recognition continuity, TTS completion, and unsafe action pressure.

owner: Voice Platform

Artifact Evidence Review Harness

G1.U4.P4.Z1.A1

P0
Post-run / Phase 3

observes: source match, amount mismatch, missing TODO, unsupported claim

control: Returns generated artifacts for repair when evidence, numbers, deadline, or owner is missing.

analyzer: Structured source comparison first, LLM panel only for semantic support checks.

envelope: May return artifacts for repair; may not send customer-visible artifacts automatically.

coverage: Finds generated artifacts without source episode, owner, or review outcome.

first slice: Review proposal, SOW, estimate, and meeting-minute artifacts against their source episode.

owner: Quality

Model Routing Harness

G1.U4.P5.Z1.A1

P1
Dynamic / Phase 6

observes: confidence slope, retry density, provider failure, cost variance

control: Switches provider, narrows retrieval, downgrades autonomy, or regenerates queries from live signals.

analyzer: Scorecard slope and provider error analysis before model-choice LLM reasoning.

envelope: May switch models within approved tiers; budget or provider-policy changes require approval.

coverage: Detects model routes without confidence, cost, retry, and provider-failure records.

first slice: Add dynamic routing decisions to failed RAG and low-confidence answer episodes.

owner: Model Ops

CI Repair Harness

G1.U4.P6.Z1.A1

P0
Autonomous / Phase 5

observes: failing job, log signature, changed files, verification command

control: Creates scoped repair PRs, reruns failed jobs, and quarantines flaky harness paths.

analyzer: Log-signature classifier, deterministic changed-file mapping, then LLM patch planning.

envelope: May create scoped repair PRs; may not merge, deploy, or weaken required checks.

coverage: Finds CI checks, harness jobs, and changed surfaces missing repair coverage.

first slice: Convert CI failures into repair scope, candidate files, validation commands, and PR body.

owner: Auto-Dev

Company Operating Harness

G1.U4.P7.Z1.A1

P1
Organization / Phase 8

observes: stalled deal, follow-up gap, contract-invoice mismatch, branch drift

control: Turns organizational anomalies into owner alerts, follow-up tasks, policy reviews, or repair workflows.

analyzer: Business-rule anomaly detection with memory lookup for repeated operating patterns.

envelope: May create tasks and escalation briefs; may not alter contracts, invoices, or staffing authority.

coverage: Finds business processes without event source, owner, SLA, or escalation route.

first slice: Connect CRM, contract, invoice, recruiting, and support events into one operating scorecard.

owner: Executive Office

Integration Contract Runtime Harness

G1.U4.P3.Z3.A1

P0
Runtime / Phase 4

observes: schema drift, rate-limit pressure, connector auth decay, partial sync

control: Blocks write paths when connector schema, auth, or idempotency state is unsafe and creates bounded repair work for the owning integration.

analyzer: Connector telemetry and contract snapshots are compared first, then ambiguous partial-sync cases are routed to LLM-assisted impact analysis.

envelope: May pause connector writes, degrade to read-only, or open repair tasks; may not rotate credentials or expand third-party scopes.

coverage: Flags integrations that lack schema snapshots, retry policy, auth expiry telemetry, or partial-write reconciliation.

first slice: Attach runtime contract checks to Salesforce, freee, Google Calendar, and storage sync episodes.

owner: Integration Platform

Approval Latency Review Harness

G1.U4.P4.Z2.A1

P1
Post-run / Phase 5

observes: approval wait, reviewer override, stale escalation, decision reversal

control: Converts slow or unstable approval paths into owner alerts, queue reshaping proposals, and gate-policy repair tickets.

analyzer: SLA and queue metrics are inspected deterministically before LLM review summarizes why approvals are delayed or repeatedly reversed.

envelope: May recommend reviewer reassignment, SLA changes, or gate copy updates; may not bypass approval or approve work on behalf of humans.

coverage: Finds human gates without explicit SLA, reviewer owner, escalation route, reversal tracking, or stale-approval handling.

first slice: Score finance, audit, deploy, and outbound customer approval gates for wait time and reversal patterns.

owner: Risk Operations

Memory Write Harness

G1.U4.P5.Z2.A1

P0
Dynamic / Phase 6

observes: memory mutation, source evidence, retention policy, contradiction

control: Stages learning-store writes until source evidence, retention class, contradiction status, and rollback path are attached.

analyzer: Structured provenance checks and retention rules run before semantic contradiction review decides whether a memory write is safe.

envelope: May stage or reject memory writes and request reviewer rationale; may not permanently mutate shared memory without source evidence.

coverage: Detects memory-writing agents without provenance, retention class, reviewer route, rollback key, or contradiction scan.

first slice: Gate CI repair, workflow repair, and customer-operations memory writes with provenance and contradiction checks.

owner: Memory Platform

Deployment Canary Harness

G1.U4.P6.Z2.A1

P0
Autonomous / Phase 7

observes: canary error rate, feature flag state, rollback path, post-deploy probe

control: Stops rollout and produces a rollback or flag-disable proposal when canary metrics exceed the approved blast-radius envelope.

analyzer: Deployment metrics, smoke probes, and flag diffs are checked first, with LLM analysis limited to summarizing blast-radius evidence.

envelope: May disable feature flags, stop rollout, or open rollback PRs; may not promote canaries to full rollout without approval.

coverage: Finds deployable surfaces without canary probes, flag owner, rollback command, post-deploy observation, or customer-impact tier.

first slice: Add canary probes and rollback evidence to Auto-Dev repair PRs and Vercel preview promotion.

owner: Release Engineering

Customer Operations Harness

G1.U4.P7.Z2.A1

P1
Organization / Phase 8

observes: support backlog, SLA breach, renewal risk, incident comms gap

control: Turns backlog, SLA, renewal, and incident-communication gaps into routed owner work with draft evidence packs.

analyzer: Operational thresholds and account-health rules are evaluated first, then LLM review drafts customer-safe escalation summaries.

envelope: May create internal tasks and draft customer updates; may not send incident, renewal, or contractual messages without approval.

coverage: Flags customer operations flows without SLA owner, customer visibility tier, account-risk signal, or approved communication path.

first slice: Join support tickets, account health, renewal dates, and incident events into one customer-ops harness scorecard.

owner: Customer Operations

Frontend Render Contract Harness

G1.U4.P1.Z3.A1

P0
Static / Phase 1

observes: hydration risk, server-client boundary, route metadata, empty state

control: Blocks UI changes when route ownership, hydration boundaries, metadata, or user-visible fallback behavior is incomplete.

analyzer: Static route and component inspection checks client directives, async boundaries, metadata, and empty-state contracts before visual review.

envelope: May block component changes and propose boundary fixes; may not convert server components to client components without owner approval.

coverage: Flags new pages, layouts, or interactive components without render contract, loading state, empty state, or ownership evidence.

first slice: Run render-contract checks on product pages, dashboard panels, and experimental surfaces added in each PR.

owner: Frontend Platform

Responsive I18n Preflight Harness

G1.U4.P2.Z3.A1

P0
Preflight / Phase 2

observes: missing translation key, text overflow, locale route drift, mobile snap break

control: Stops pages from shipping when English and Japanese content, route availability, or mobile layout behavior diverge.

analyzer: Message-key diffs and viewport constraints are evaluated deterministically before visual checks review overflow or layout regressions.

envelope: May block release and propose copy or layout fixes; may not change product messaging intent without content owner review.

coverage: Finds locale-aware pages without message parity, mobile viewport coverage, overflow checks, or translated route validation.

first slice: Attach locale parity and mobile text-fit checks to blog, product, dashboard, and experimental pages.

owner: Frontend Platform

UI Visual Richness Harness

G1.U4.P2.Z2.A2

P0
Post-run / Phase 2

observes: text-only viewport, missing product visual, flat composition, low color variety

control: Blocks market-facing visual acceptance when a route scores below the richness threshold and queues a UI-agent repair plan.

analyzer: Playwright captures first-viewport screenshots and deterministic DOM visual metrics, then emits scoped UI-agent repair tasks for low-scoring routes.

envelope: May draft visual improvement plans and low-risk UI patches; may not ship brand direction changes or remove governance evidence without review.

coverage: Finds public routes without enough primary visual asset density, color variety, layered surfaces, hierarchy, or screenshot evidence.

first slice: Score public routes above the fold and write screenshot-backed repair tasks for any page that feels visually underbuilt.

owner: Frontend Platform

Accessibility Visual Review Harness

G1.U4.P4.Z3.A1

P1
Post-run / Phase 3

observes: focus trap, contrast drift, aria gap, canvas blank

control: Returns UI surfaces for repair when keyboard navigation, focus management, contrast, labels, or visual rendering evidence is missing.

analyzer: Automated accessibility and screenshot checks run first, with LLM review only for ambiguous visual hierarchy or interaction clarity.

envelope: May return UI artifacts for repair; may not waive accessibility regressions on production paths without documented approval.

coverage: Flags interactive screens without keyboard path, contrast check, semantic labels, screenshot evidence, or canvas fallback verification.

first slice: Add postrun accessibility and screenshot review to dense dashboards, voice UI, and canvas-heavy experimental pages.

owner: Design Systems

API Route Contract Harness

G1.U4.P1.Z4.A1

P0
Static / Phase 1

observes: input schema gap, response drift, status mismatch, missing coordinate

control: Blocks backend endpoints when request validation, response shape, error behavior, or governance coordinates are missing.

analyzer: Route-handler AST and schema checks validate methods, input parsing, status codes, and response shape before semantic contract review.

envelope: May block API route changes and draft schema repairs; may not alter public API semantics without product and backend approval.

coverage: Finds route handlers without input validation, typed response envelope, error taxonomy, MARIA coordinate, or test coverage.

first slice: Score new and modified app/api route handlers for validation, typed envelopes, and explicit error outcomes.

owner: Backend Platform

Auth Permission Preflight Harness

G1.U4.P2.Z4.A1

P0
Preflight / Phase 2

observes: missing session, tenant leak, role mismatch, tool permission drift

control: Stops frontend, API, and agent actions before they cross tenant, role, data, or tool authority boundaries.

analyzer: Deterministic session, tenant, role, and tool-scope policy evaluation runs before any request or agent action mutates state.

envelope: May deny requests, downgrade to read-only, or request approval; may not grant roles, tenants, or tool permissions.

coverage: Flags server actions, API routes, and agent tools without session checks, tenant filters, role policy, or permission envelope.

first slice: Attach auth preflight results to write APIs, customer-data reads, agent tools, and external action routes.

owner: Security

DB Migration Preflight Harness

G1.U4.P2.Z5.A1

P0
Preflight / Phase 2

observes: destructive migration, missing index, RLS policy gap, seed drift

control: Stops DB changes when reversibility, tenant policy, data migration, index coverage, or test evidence is incomplete.

analyzer: Schema diff, migration operation, index coverage, and RLS policy checks run before reviewer-guided data-risk analysis.

envelope: May block migrations and draft reversible plans; may not apply destructive DB changes or relax RLS without explicit approval.

coverage: Finds schema changes without rollback, RLS impact, seed update, data backfill, index analysis, or integration-test plan.

first slice: Evaluate db/schema changes for destructive operations, RLS coverage, rollback path, and dependent API surfaces.

owner: Data Platform

Data Provider Runtime Harness

G1.U4.P3.Z4.A1

P1
Runtime / Phase 4

observes: mock-live drift, adapter timeout, shape mismatch, fallback leak

control: Detects live adapter drift and switches views to bounded fallback states while routing repair work to the provider owner.

analyzer: Runtime adapter telemetry and response-shape checks compare mock and live provider contracts before fallback behavior is adjusted.

envelope: May degrade to mock-safe or read-only mode and open adapter repair tasks; may not silently mix tenant data across providers.

coverage: Flags data providers without mock-live parity tests, timeout policy, fallback state, tenant filter, or response-shape contract.

first slice: Monitor dashboard and product data providers for mock-live parity, adapter timeout, and shape mismatch episodes.

owner: Data Platform

Queue Cron Runtime Harness

G1.U4.P3.Z5.A1

P0
Runtime / Phase 4

observes: missed tick, duplicate job, stale lock, backlog growth

control: Prevents duplicate or stale scheduled execution and routes missed ticks, queue backlogs, and lock failures to bounded recovery.

analyzer: Schedule, idempotency, lock, and backlog telemetry are checked first, then historical incident memory ranks likely repair paths.

envelope: May pause jobs, skip duplicate ticks, or enqueue repair tasks; may not replay side-effecting jobs without approval.

coverage: Finds cron and background workflows without idempotency key, stale-lock handling, backlog metrics, or replay policy.

first slice: Add runtime checks to Civilization daily advancement, intelligence scans, and automation harness jobs.

owner: Runtime Platform
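
Duplicate and stale execution can be guarded with an idempotency key per tick and a lock with a time-to-live. This in-memory sketch is an illustration only (a real deployment would need shared storage), and `TickGuard` and its return values are hypothetical.

```python
from typing import Dict, Set


class TickGuard:
    def __init__(self, lock_ttl: float) -> None:
        self.lock_ttl = lock_ttl
        self.done: Set[str] = set()        # idempotency keys already accepted
        self.locks: Dict[str, float] = {}  # job name -> time the lock was taken

    def try_start(self, job: str, tick: str, now: float) -> str:
        key = f"{job}:{tick}"
        if key in self.done:
            return "skip_duplicate"
        taken = self.locks.get(job)
        if taken is not None:
            # A live lock means the job is still running; a stale one goes to recovery.
            return "busy" if now - taken < self.lock_ttl else "recover_stale_lock"
        self.locks[job] = now
        self.done.add(key)
        return "run"

    def finish(self, job: str) -> None:
        self.locks.pop(job, None)
```

A stale lock is reported rather than silently taken over, which keeps replay of side-effecting jobs behind an explicit recovery step.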

RAG Index Runtime Harness

G1.U4.P3.Z6.A1

P0
Runtime / Phase 4

observes: index freshness, chunk-source mismatch, retrieval miss, citation gap

control: Blocks answer generation or downgrades confidence when retrieval freshness, source integrity, or citation coverage fails.

analyzer: Index timestamps, source hashes, retrieval hit rates, and citation coverage are checked before semantic answer support review.

envelope: May narrow retrieval, mark sources stale, or request reindex; may not publish unsupported answers or delete source corpora.

coverage: Flags ingestion and RAG paths without source hash, freshness SLA, retrieval metric, citation requirement, or reindex workflow.

first slice: Attach RAG freshness checks to FAQ, CPA, knowledge graph, and document-scanner answer episodes.

owner: Knowledge Platform
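
A sketch of the deterministic checks that could run before any semantic review. Which failure blocks and which only downgrades confidence is an assumption here: integrity and citation failures block, while staleness downgrades.

```python
import hashlib


def rag_gate(index_age_s: float, freshness_sla_s: float,
             chunk_text: str, recorded_hash: str,
             claims: int, cited_claims: int) -> str:
    """Return 'answer', 'downgrade', or 'block' for one retrieval episode."""
    actual = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    if actual != recorded_hash:
        return "block"      # chunk no longer matches its recorded source hash
    if cited_claims < claims:
        return "block"      # at least one claim has no citation
    if index_age_s > freshness_sla_s:
        return "downgrade"  # supported, but the index is older than its SLA
    return "answer"
```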

Streaming Output Runtime Harness

G1.U4.P3.Z7.A1

P0
Runtime / Phase 4

observes: partial unsafe output, stream abort, tool-call leak, schema fragment

control: Stops or rewrites streamed output when partial content violates schema, authority, safety, or customer-visibility rules.

analyzer: Chunk-level schema, safety, and tool-call guards run during streaming before postrun review evaluates full artifact quality.

envelope: May stop streams, redact partial chunks, or fall back to safe summary; may not continue unsafe public output after a guard trip.

coverage: Finds streaming endpoints without chunk guard, abort policy, redaction path, final envelope validation, or audit trace.

first slice: Add chunk-level guards to audit chat, voice responses, workflow scans, and model-generated report streams.

owner: Model Ops
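
A chunk-level guard of the kind described above might look like the following sketch. The two regular expressions and the fallback text are hypothetical placeholders for a real guard policy; the point is only that the stream stops at the first trip and never resumes.

```python
import re

# Hypothetical guard patterns; a real deployment would supply its own policy.
TOOL_CALL_LEAK = re.compile(r"<tool_call>|\"tool_calls\"\s*:")
SECRET_LIKE = re.compile(r"sk-[A-Za-z0-9]{8,}")


def guarded_stream(chunks, fallback="[output withheld by stream guard]"):
    """Yield chunks until a guard trips, then yield a fallback and stop."""
    tail = ""  # carry-over so patterns split across chunks are still detected
    for chunk in chunks:
        window = tail + chunk
        if TOOL_CALL_LEAK.search(window) or SECRET_LIKE.search(window):
            yield fallback
            return  # never continue output after a guard trip
        yield chunk
        tail = window[-32:]
```

Fragments emitted before the trip cannot be recalled in this sketch, which is one reason a final envelope validation and audit trace remain necessary after the stream ends.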

Trace Observability Harness

G1.U4.P5.Z3.A1

P0
Dynamic
Phase 6

observes

missing trace, coordinate gap, metric blind spot, log PII leak

control: Prevents blind autonomous execution by requiring traceable coordinates, redacted logs, owned metrics, and alert coverage.

analyzer: Trace coverage, coordinate presence, metric completeness, and PII log policy checks run before observability repair planning.

envelope: May add instrumentation tasks and block blind automation; may not expose sensitive logs or weaken retention policy.

coverage: Finds routes, jobs, agents, and UI workflows without trace ID, MARIA coordinate, metric owner, redaction, or alert rule.

first slice: Score new APIs, cron jobs, and agent workflows for trace coverage and coordinate completeness.

owner: Observability
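
The coverage check above could be scored with something like the sketch below. The field names are hypothetical, and the coordinate pattern is inferred only from examples on this page such as "G1.U4.P5.Z3.A1"; the actual coordinate grammar may differ.

```python
import re

# Shape inferred from examples like "G1.U4.P5.Z3.A1" (an assumption).
COORDINATE = re.compile(r"^G\d+\.U\d+\.P\d+\.Z\d+\.A\d+$")
REQUIRED = ("trace_id", "coordinate", "metric_owner", "redaction", "alert_rule")


def missing_coverage(workflow: dict) -> list:
    """List the observability fields a route, job, or agent still lacks."""
    missing = [name for name in REQUIRED if not workflow.get(name)]
    coord = workflow.get("coordinate")
    if coord and not COORDINATE.match(coord):
        missing.append("coordinate")  # present but malformed
    return missing


def coverage_score(workflow: dict) -> float:
    """Fraction of required fields that are present and well-formed."""
    return 1.0 - len(missing_coverage(workflow)) / len(REQUIRED)
```

A harness could block autonomous execution for any workflow whose score falls below 1.0 and open instrumentation tasks for each missing field.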

E2E Journey Repair Harness

G1.U4.P6.Z3.A1

P1
Autonomous
Phase 5

observes

journey failure, visual diff, selector drift, navigation dead end

control: Creates scoped repair plans when user-critical flows fail because of selector drift, visual regression, navigation errors, or data fixture mismatch.

analyzer: Playwright traces, screenshots, selector changes, and route diffs are classified before repair planning proposes the smallest UI or test fix.

envelope: May update scoped selectors, fixtures, and low-risk UI defects; may not delete user-critical assertions or weaken journey coverage.

coverage: Finds product-critical flows without E2E journey, screenshot baseline, responsive coverage, fixture owner, or failure fingerprint.

first slice: Attach E2E journey repair loops to booking, workflow scanner, audit office, and dashboard critical paths.

owner: Quality Engineering
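
One way to enforce the "may not delete user-critical assertions" rule is to compare assertion counts in a proposed repair diff. The sketch below assumes test assertions are written as `expect(...)` calls, as in Playwright tests, and uses a deliberately crude line-based count; the function name and inputs are hypothetical.

```python
def violates_envelope(removed_lines, added_lines):
    """Reject a repair whose diff removes more assertions than it adds."""
    removed = sum("expect(" in line for line in removed_lines)
    added = sum("expect(" in line for line in added_lines)
    return added < removed
```

A selector-only fix that rewrites `expect(page.locator("#old"))` to `expect(page.locator("#new"))` passes this check, while a diff that simply drops the assertion does not.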

Edge Cache Runtime Harness

G1.U4.P3.Z8.A1

P1
Runtime
Phase 4

observes

cache poisoning, locale redirect loop, stale page, header drift

control: Detects stale, misrouted, or incorrectly cached responses and proposes safe cache disablement or middleware repair.

analyzer: Header, redirect, locale, and cache-control traces are checked deterministically before impact analysis reviews user-visible fallout.

envelope: May disable caching for affected routes or open middleware repair tasks; may not change global cache policy without approval.

coverage: Flags middleware and cached routes without cache-key policy, locale redirect tests, stale-content SLA, or header verification.

first slice: Monitor locale middleware, product pages, blog pages, and API cache headers for redirect and stale-content incidents.

owner: Web Platform
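
The deterministic checks named in the analyzer line can be sketched as two small functions: one follows a redirect map to find a locale redirect loop, the other reports header drift against an expected policy. Both function names and the input shapes are assumptions made for this example.

```python
def find_redirect_loop(redirects: dict, start: str, max_hops: int = 10):
    """Follow a path -> target map and return the looping paths, if any."""
    seen = []
    path = start
    while path in redirects and len(seen) < max_hops:
        if path in seen:
            return seen[seen.index(path):]
        seen.append(path)
        path = redirects[path]
    return None


def header_drift(expected: dict, actual: dict) -> dict:
    """Return expected headers whose observed value differs (names are case-insensitive)."""
    lowered = {k.lower(): v for k, v in actual.items()}
    return {k: lowered.get(k.lower())
            for k, v in expected.items() if lowered.get(k.lower()) != v}
```

For example, a map where "/" redirects to "/en" and "/en" redirects back to "/" is reported as a loop, which the harness could answer by disabling caching for those routes while a middleware repair task is opened.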

Universe Builder

Watch a zone come to life

Build Sequence

Goal > Scope > Team > Responsibility > Skills > Build > Gates > Validate > Test > Deploy

Skills (K1-K8) are dynamically fetched and auto-refilled from Skill Store

Dynamic Harness

Harnesses control the phase space of an AI organization.

Instead of judging only final outputs, MARIA OS observes goal, memory, identity, trust, quality, latency, cost, and authority as one state vector, then changes the trajectory before the system breaks.

x(t) = [g, m, i, t, q, l, c, a]

g: Goal, m: Memory, i: Identity, t: Trust, q: Quality, l: Latency, c: Cost, a: Authority

Phases: stable, adapt, quarantine

Observe the runtime

Normalize every agent run into a runtime episode with intent, memory, tools, gates, assets, latency, and corrections.

Classify the drift

Map failures to owner, severity, confidence, user visibility, and the verification command that can prove the fix.

Read the gradient

Track completion, pass rate, retry rate, advisory lift, and failure density as a time-varying scorecard.

Control the phase

Convert instability into reruns, quarantine, draft repair PRs, or human approval before autonomy can expand.
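
The last two steps can be sketched as a scorecard computed over runtime episodes and a phase decision derived from it. This is a minimal sketch: the `Episode` fields, the thresholds, and the rule that maps scores to the stable, adapt, and quarantine phases are assumptions chosen for illustration, not the actual MARIA OS control law.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    completed: bool
    passed: bool
    retries: int


def scorecard(episodes):
    """Aggregate a window of episodes into rates between 0 and 1."""
    n = len(episodes)
    if n == 0:
        raise ValueError("scorecard needs at least one episode")
    return {
        "completion": sum(e.completed for e in episodes) / n,
        "pass_rate": sum(e.passed for e in episodes) / n,
        "retry_rate": sum(e.retries > 0 for e in episodes) / n,
    }


def phase(card, adapt_below=0.9, quarantine_below=0.6, max_retry_rate=0.25):
    """Map a scorecard to a phase; thresholds are illustrative assumptions."""
    if card["pass_rate"] < quarantine_below:
        return "quarantine"
    if card["pass_rate"] < adapt_below or card["retry_rate"] > max_retry_rate:
        return "adapt"
    return "stable"
```

Recomputing the scorecard over a sliding window gives the time-varying view described above, and autonomy would expand only while the phase stays "stable".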

The episode extraction, failure taxonomy, scorecards, repair proposals, and controlled self-healing proven in virtual-talent become the runtime governance layer for MARIA OS and agentic society.
