Deep Dive

Recursive Intelligence

How MARIA OS implements recursive self-improvement — from chat-driven learning to deep reflection loops.

CHAT RECURSIVE LOOP

Chat Does Not Generate. Chat Improves.

Output is audited, results flow back into internal state, and the next judgment structure changes. This is structural recursion.

Quality Loop Pipeline

User Input → Response → Evidence Extract → Quality Score → Delta Analysis → Prompt Refactor → Re-Execute

Quality State Update

Q_{t+1} = Q_t + α(Score_t − Target) − β·Drift_t

α: improvement learning rate. β: drift suppression coefficient. Quality converges toward Target while suppressing deviation.
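The update rule can be sketched in a few lines. This is an illustrative Python rendering, not MARIA OS source; the function name and the default α and β values are hypothetical.

```python
def update_quality(q: float, score: float, target: float,
                   drift: float, alpha: float = 0.3, beta: float = 0.1) -> float:
    """One recursion step: pull quality toward Target, suppress drift.

    Implements Q_{t+1} = Q_t + alpha*(Score_t - Target) - beta*Drift_t.
    alpha/beta defaults are illustrative, not MARIA OS values.
    """
    return q + alpha * (score - target) - beta * drift
```

When Score exceeds Target the state rises; nonzero drift always pulls it back, which is what makes the loop converge instead of oscillate.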

Internal State Machine

Idle → Respond → Self-Evaluate → Error Detect → Refactor → Re-Respond

Convergence Conditions

Drift Rate: < ε
Quality Delta: ΔQ < δ
Evidence Coherence: > 99%

Dashboard Metrics

Logic Coherence · Evidence Density · Structural Stability · Drift Detection · Recursion Count

Humans evaluate by feeling. MARIA OS evaluates by structure.

CONCRETE CHAT IMPACT

What Changes for Users in Practice

Move from one-shot chat responses to recursive chat with built-in evidence audit and re-evaluation. Not only the output but the next decision structure itself is updated.

Before: Single-Pass Chat

[Output]

Draft answer emitted without contradiction audit.

[Evidence]

Sparse citations, weak traceability, manual rework needed.

After: Recursive Audited Chat

[Loop]

Output → audit → delta check → rewrite → re-verify.

[State Update]

Failed patterns are persisted and weighted in next judgment.

Agent Teams Runtime Image

Planner

Builds hypothesis tree and action structure.

Critic

Finds contradiction, drift, and unsupported claims.

Verifier

Checks evidence alignment and gate compliance.

Planner → Critic → Verifier → Updated Policy

Observed Telemetry (Chat Layer)

Completed Task Ratio / Job

+28.6pt
57.1% → 85.7%

Action Trace Density / Job

+27.2%
10.5 → 13.3

Persisted Artifacts / Job

+50.0%
4.0 → 6.0

Delivery Gate Blocked Rate

-100pt
100% → 0%

Source: artifacts/run-to-done/job_*.json (26 jobs, 2026-02-13 to 2026-02-14 UTC). Before=waiting_approval cohort (n=13), After=completed cohort (n=13).

UNIVERSE RECURSIVE LOOP

Universe Does Not Just Expand. It Evolves While Stabilizing.

Artifacts are audited, structural gaps detected, policies refactored, and re-deployed. Quality and governance improve with every cycle.

Production Loop Pipeline

Spec → Agent Exec → Artifact → Audit Engine → Gap Detect → Policy Refactor → Re-Deploy

Artifact Quality Update

A_{t+1} = A_t + γ(Compliance − Risk) + λ·InsightGain

γ: governance strength. λ: learning efficiency. Quality improves when compliance exceeds risk, accelerated by insight.
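A minimal sketch of this update, parallel to the chat-layer quality rule. Function name and the γ/λ defaults are illustrative assumptions, not MARIA OS values.

```python
def update_artifact_quality(a: float, compliance: float, risk: float,
                            insight_gain: float,
                            gamma: float = 0.4, lam: float = 0.2) -> float:
    """One production-loop step: A_{t+1} = A_t + gamma*(Compliance - Risk) + lam*InsightGain.

    Quality rises only while compliance exceeds risk; insight gain accelerates it.
    """
    return a + gamma * (compliance - risk) + lam * insight_gain
```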

Stability Conditions

01 Gate density > minimum threshold
02 Responsibility boundaries explicitly defined
03 Convergence function has negative gradient: dR/dt < 0 ∧ dError/dt < 0

Evolution Dashboard

Decision Throughput · Gate Trigger Rate · Human Override Ratio · Risk Containment · Improvement Velocity

Chat + Universe =

Generative AI → Judgment OS

Chat improves output. Universe improves structure. Together they become a Judgment OS.

CONCRETE UNIVERSE IMPACT

Visualizing the Impact of Recursive Self-Improvement

In the Universe layer, artifact audits and policy redesign run continuously, raising quality and governance together. Changes are logged and reflected in the structure of the next cycle.

Governance Control Room Image

Detect

Audit engine flags policy and structure gaps.

Refactor

Rules and responsibility gates are rewritten.

Validate

Replay tests verify regression and compliance.

Deploy

Approved policy set is promoted to runtime.

Spec → Artifact → Audit Report → Policy v(t+1)

Structural Meaning

1. Update decision rules themselves, not just output fixes.

2. Record errors and reflect them in the next design, staffing, and gate decisions.

3. Keep a recursive cycle that optimizes quality and governance together.

Observed Telemetry (Universe Layer)

Evidence Archive Completion

+100pt
0% → 100%

Delivery Gate Approved Rate

+100pt
0% → 100%

Pending Approval Task Rate

-14.3pt
14.3% → 0%

Tasks Completed per Job

+50.0%
4/7 → 6/7

Source: artifacts/run-to-done/job_*.json (26 jobs, 2026-02-13 to 2026-02-14 UTC). Before=waiting_approval cohort (n=13), After=completed cohort (n=13).

Agent Teams Deployment Example

Design Agent

Coverage 92%

Audit Agent

Detection 88%

Ops Agent

Recovery 84%

MEMORY STRATIFICATION

Estimate Interest from the Pulse of Short-Term Memory. Save only what matters. Recall only when needed.

Memory Pipeline

1 Input Stream → Short-Term Memory Buffer
2 Keyword Pulse Detector (freq, recency, revisit, emotion)
3 Interest Vector Builder → I update
4 Long-Term Memory Router (Gate decision)
5 Recall Planner → on-demand retrieval
6 Response Composer → optimized output

Interest Score per Keyword

S(k) = w1·freq + w2·recency + w3·revisit + w4·emotion − w5·noise

freq: occurrence count. recency: recency weighting. revisit: returned after absence. emotion: co-occurrence with emphasis. noise: transient suppression.
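The score is a plain weighted sum, so a sketch is short. The weight tuple here is a hypothetical default for illustration only.

```python
def interest_score(freq: float, recency: float, revisit: float,
                   emotion: float, noise: float,
                   w: tuple = (1.0, 0.8, 0.6, 0.5, 0.7)) -> float:
    """S(k) = w1*freq + w2*recency + w3*revisit + w4*emotion - w5*noise.

    The weight values are illustrative assumptions, not shipped defaults.
    """
    w1, w2, w3, w4, w5 = w
    return w1 * freq + w2 * recency + w3 * revisit + w4 * emotion - w5 * noise
```

Noise enters with a negative sign, so a keyword that spikes once and never returns scores below one that recurs quietly.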

Interest Vector Update

I_{t+1} = normalize(I_t + η·S_t)

Interest vector accumulates weighted keyword scores and re-normalizes each cycle.
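A sketch of the accumulate-and-renormalize step, assuming L2 normalization (the document does not specify which norm):

```python
import math

def update_interest(i_vec: list, s_vec: list, eta: float = 0.1) -> list:
    """I_{t+1} = normalize(I_t + eta * S_t), using the L2 norm (an assumption).

    Renormalizing each cycle keeps the vector bounded: interests compete
    for a fixed total magnitude rather than growing without limit.
    """
    raw = [i + eta * s for i, s in zip(i_vec, s_vec)]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard against zero vector
    return [v / norm for v in raw]
```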

Gate Design

Save Gate

Only save if: high repetition, impacts decisions, or high reuse value. Raw logs prohibited — summaries only.

Recall Gate

Never always-on. Retrieve only when needed. Preserves natural conversation flow.

Memory Save Gate — YAML

gate_engine:
  name: "memory-save-gate"
  defaults:
    fail_closed: true
    store_mode: "summary_only"
    pii_policy: "block"
  rules:
    - id: "MS-01-block-pii"
      if: { signal: "contains_pii", value: true }
      then: { action: "deny" }
    - id: "MS-02-allow-stable-preference"
      if: { freq: ">= 3", revisit: ">= 1" }
      then:
        action: "allow"
        store: { format: "canonical_summary" }
    - id: "MS-05-require-user-consent"
      if: { sensitivity: "high", emotion: ">= 0.7" }
      then: { action: "ask_user" }
    - id: "MS-06-fallback-deny"
      then: { action: "deny", reason: "Fail-closed" }
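The gate semantics implied by the YAML (first matching rule wins, unmatched input denied) can be mirrored in a few lines of Python. This evaluator is a sketch of those semantics, not the MARIA OS rule engine; the signal keys follow the YAML above.

```python
def evaluate_save_gate(signals: dict) -> str:
    """First-match evaluation of the memory-save-gate rules, fail-closed.

    Rule order mirrors the YAML: PII block first, then the stable-preference
    allow, then the consent check, then the fallback deny.
    """
    if signals.get("contains_pii"):
        return "deny"       # MS-01: PII is blocked before any other rule fires
    if signals.get("freq", 0) >= 3 and signals.get("revisit", 0) >= 1:
        return "allow"      # MS-02: stable preference, stored as summary only
    if signals.get("sensitivity") == "high" and signals.get("emotion", 0.0) >= 0.7:
        return "ask_user"   # MS-05: sensitive + emotional needs explicit consent
    return "deny"           # MS-06: fail-closed fallback
```

Note the ordering matters: a high-frequency keyword that also contains PII is denied, because MS-01 is checked first.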

Estimate interest from short-term pulses. Save only what matters. Recall only when needed.

DEEP REFLECTION LOOP

From Frequent Themes to Latent Intent. Detect value conflicts. Generate hypotheses. Verify with evidence.

Reflection Pipeline

1 Interest Vector I → Theme Graph Builder
2 Contradiction & Tension Finder
3 Latent Hypothesis Generator (multiple H)
4 Evidence Test Gate → discard ungrounded
5 Reflection Question Synthesizer
6 Update Policy Set → adjust dialogue strategy

Latent Hypothesis Format

H = { driver, fear, value, constraint, desired_future }
driver: motivational source
fear: outcome to avoid
value: judgment criterion
constraint: real-world limitation
desired_future: aspired outcome

Hypothesis Scoring

Score(H) = a·support + b·predictability + c·stability − d·intrusiveness

support: evidence from conversation. predictability: ability to predict next utterance. stability: robustness over time. intrusiveness: risk of overreach.
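As with the interest score, this is a weighted sum with one penalty term. A sketch with hypothetical coefficients:

```python
def hypothesis_score(support: float, predictability: float,
                     stability: float, intrusiveness: float,
                     a: float = 1.0, b: float = 0.8,
                     c: float = 0.6, d: float = 1.2) -> float:
    """Score(H) = a*support + b*predictability + c*stability - d*intrusiveness.

    Coefficient defaults are illustrative. A hypothesis with strong evidence
    but high overreach risk can still score below zero and be discarded.
    """
    return a * support + b * predictability + c * stability - d * intrusiveness
```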

Safety Design

Privacy Gate

User can halt unwanted deep dives. Depth levels are staged.

Explainability Gate

If the reasoning for a hypothesis cannot be summarized, it is not surfaced.

Dialogue Rule

Never assert directly. Present as hypothesis, verify with confirmation question.

Hypothesis: "Safety over achievement seems prioritized"
Question: "Is it the outcome or the people involved that you want to protect?"

Detect tensions, generate hypotheses from evidence, verify through dialogue. Intent clarifies as conversation progresses.

TRIGGER RULES

Auto-Trigger Deep Dives from Short-Term Pulses.

12 observation signals, 3 invasiveness levels. Hypotheses are verified by evidence. Conversation naturalness is never broken.

ID | Signal | Lvl | Action
TD-01 | Frequency | L1 | Summarize + verbalize interest
TD-02 | Revisit | L2 | Present 2 hypotheses + confirm
TD-03 | Spike | L2 | Propose deep-dive candidate
TD-04 | Co-occur cluster | L2 | Name theme + structured Q
TD-05 | Emphasis | L2 | Check value or fear
TD-06 | Emotion shift | L3 | Safety check + pace adjust
TD-07 | Open question | L3 | Surface + propose order
TD-08 | Value conflict | L3 | Conflict hypothesis + priority Q
TD-09 | Fixed term | L1 | Fix definition + glossary
TD-10 | Avoidance | L2 | Peripheral exploration
TD-11 | Decision proximity | L2 | Decision frame + options
TD-12 | High reuse value | L1 | Propose template + save

L1: Low Invasiveness

Summarize, verbalize interest, offer choices. No flow disruption.

L2: Mid Invasiveness

Present multiple hypotheses, verify with questions. Name the theme.

L3: High Invasiveness

Value conflicts, fears, constraints, desired futures. Consent required.

Auto-trigger deep dives from short-term pulses. Hypotheses verified by evidence, never breaking conversational flow.

ADAPTIVE RESPONSE ENGINE

Deep Dives Are Not One-Size-Fits-All. Adapt depth and expression to the user.

User Model — 4 Axes (KICS)

U = [K, I, C, S]
K — Knowledge Level: clarification_rate, correct_usage
I — Interest Intensity: keyword_freq, revisit_rate
C — Cognitive Resilience: multi_step_acceptance, abstraction
S — Communication Style: verbosity, directness, tone

User Model Update

U_{t+1} = (1 − ρ)·U_t + ρ·f(x_t)

ρ: adaptation rate (0.1–0.25). x_t: observation vector from latest conversation turn. Never fixed — continuously updated.
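This is an exponential moving average over the four KICS axes. A minimal sketch, with ρ inside the range the text gives:

```python
def update_user_model(u: list, x: list, rho: float = 0.2) -> list:
    """U_{t+1} = (1 - rho)*U_t + rho*f(x_t), applied element-wise to [K, I, C, S].

    rho in 0.1-0.25 (per the text): low enough that one turn cannot
    overwrite the model, high enough that it tracks the user over a session.
    """
    return [(1 - rho) * ui + rho * xi for ui, xi in zip(u, x)]
```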

Deep Dive Intensity

d = σ(a1·I + a2·tension + a3·decision_prox − a4·intrusiveness)

Sigmoid-bounded intensity. High interest + tension = deeper dive. High intrusiveness risk = suppression. Mapped to L1/L2/L3.
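A sketch of the intensity computation and its L1/L2/L3 mapping. The coefficient defaults and the two level thresholds are illustrative assumptions; the document only states that d is sigmoid-bounded and mapped to three levels.

```python
import math

def deep_dive_intensity(interest: float, tension: float,
                        decision_prox: float, intrusiveness: float,
                        a: tuple = (1.0, 0.8, 0.6, 1.2)) -> float:
    """d = sigmoid(a1*I + a2*tension + a3*decision_prox - a4*intrusiveness)."""
    a1, a2, a3, a4 = a
    z = a1 * interest + a2 * tension + a3 * decision_prox - a4 * intrusiveness
    return 1.0 / (1.0 + math.exp(-z))

def intensity_level(d: float, l2: float = 0.4, l3: float = 0.7) -> str:
    """Map bounded intensity d to an invasiveness level. Thresholds are illustrative."""
    return "L3" if d >= l3 else "L2" if d >= l2 else "L1"
```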

Template Selection — Optimal Utility

t* = argmax_t E[w_A·Align + w_L·Learn + w_S·Safety − w_F·Friction]

Alignment: matches interest vector. Learning: clarifies intent. Safety: within invasiveness bounds. Friction: pushiness penalty.
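The argmax can be sketched directly, taking each template's expected (align, learn, safety, friction) scores as given. The data shape is an assumption made for illustration.

```python
def select_template(templates: dict, weights: tuple) -> str:
    """t* = argmax_t of w_A*Align + w_L*Learn + w_S*Safety - w_F*Friction.

    templates maps template id -> (align, learn, safety, friction);
    weights is (w_A, w_L, w_S, w_F). Friction enters as a penalty.
    """
    wa, wl, ws, wf = weights

    def utility(parts: tuple) -> float:
        align, learn, safety, friction = parts
        return wa * align + wl * learn + ws * safety - wf * friction

    return max(templates, key=lambda t: utility(templates[t]))
```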

Template Selector — YAML

deep_dive_engine:
  name: "adaptive-deep-dive"
  defaults:
    fail_closed: true
    require_consent_level: 3
  math:
    deep_dive_intensity:
      formula: "sigmoid(a1*I + a2*tension
        + a3*decision_proximity
        - a4*intrusiveness_risk)"
    readability_target:
      formula: "r0 + r1*(1-K) + r2*(1-C)"
  templates:
    - id: "T-A-simple"
      when: "K <= 0.45"
      parts: [mirror, summary, hypothesis, Q]
    - id: "T-B-structured"
      when: "C >= 0.55 and tension >= 0.45"
      parts: [mirror, structure, conflict, Q]
    - id: "T-C-sensitive"
      when: "emotion >= 0.70"
      parts: [safety_check, soft_summary, Q]

Adaptive Weight Update

w_{t+1} = w_t + η(y_t − ŷ_t)·∂Utility/∂w

y_t: observed user response quality. ŷ_t: expected response. η: learning rate. Weights converge to optimal template selection over sessions.
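One step of this update, sketched element-wise over a weight vector. The gradient is taken as an input here; how MARIA OS computes ∂Utility/∂w is not specified in the text.

```python
def update_weights(w: list, y: float, y_hat: float,
                   grad: list, eta: float = 0.05) -> list:
    """w_{t+1} = w_t + eta*(y_t - y_hat_t) * dUtility/dw, element-wise.

    When observed quality y exceeds the prediction y_hat, weights move
    along the utility gradient; when it falls short, they move against it.
    """
    err = y - y_hat
    return [wi + eta * err * gi for wi, gi in zip(w, grad)]
```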

Deep dives are not one-size-fits-all. Depth and expression auto-optimize to the user. Safety gates control invasiveness.

INTEGRATION FLOW

End-to-End Pipeline. From short-term pulse to adaptive deep dive, closed-loop.

10-Step Execution Flow

01 Window Build — last 40 utterances
02 Keyword Pulse — freq, revisit, co-occur, emotion, avoidance
03 Interest Vector Update — I_{t+1}
04 User Model Update — K, I, C, S
05 Deep Dive Intensity — d → L1/L2/L3
06 Template Utility Evaluate — argmax
07 Response Plan Compose — parts + constraints
08 Response Generate — from selected template
09 Outcome Observe — y_t (response quality, continuation rate)
10 Weight Update — w_{t+1} (closed-loop learning)
w_{t+1} = w_t + η(y_t − ŷ_t)·∂U/∂w

Conversations compound. Each session makes the next more precise.

Recall Gate — YAML

gate_engine:
  name: "memory-recall-gate"
  defaults:
    fail_closed: true
    recall_mode: "on_demand"
    max_recall_items: 3
    min_relevance: 0.62
    pii_policy: "block"
  triggers:
    - id: "RG-01-explicit-request"
      if: { signal: "explicit_memory_request" }
      then: { action: "recall", max_items: 3 }
    - id: "RG-02-project-continuation"
      if: { project_continuation: true }
      then: { action: "recall", mode: "project_card" }
    - id: "RG-04-preference-needed"
      if: { preference_needed: true }
      then: { action: "recall", privacy: "strict" }
    - id: "RG-06-smalltalk"
      then: { action: "deny" }
  math:
    recall_necessity:
      formula: "sigmoid(b1*explicit + b2*coref
        + b3*missing - b7*intrusiveness)"

Operational Principles

Small talk never triggers recall — preserves naturalness
High intrusiveness without explicit request = deny
Save Gate and Recall Gate are separate systems
Reason for recall must be explainable — or it is not used

Short-term pulse → user model → template select → learning update. End-to-end, closed-loop. Precision improves with every conversation.