Recursive Intelligence
How MARIA OS implements recursive self-improvement — from chat-driven learning to deep reflection loops.
Chat Does Not Generate. Chat Improves.
Output is audited, results flow back into internal state, and the next judgment structure changes. This is structural recursion.
Quality Loop Pipeline
Quality State Update
α: improvement learning rate. β: drift suppression coefficient. Quality converges toward Target while suppressing deviation.
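The update rule described here can be sketched as a one-line recurrence. This is a minimal illustration assuming a linear update; the symbols mirror the text (α, β, Target) but the concrete coefficient and drift values are placeholders, not MARIA OS internals.

```python
def quality_step(q, target, drift, alpha=0.3, beta=0.5):
    """One quality-loop iteration.

    alpha: improvement learning rate, pulls quality toward target.
    beta: drift suppression coefficient, penalizes deviation.
    Coefficient values are illustrative placeholders.
    """
    return q + alpha * (target - q) - beta * drift

q, target = 0.40, 0.90
for _ in range(20):
    q = quality_step(q, target, drift=0.01)
# q settles just below target: the steady state is where
# alpha * (target - q) exactly balances beta * drift.
```

With these placeholder values the loop converges to roughly 0.883 rather than 0.90, which is the "converges toward Target while suppressing deviation" behavior: residual drift keeps quality a controlled margin below the target.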
Internal State Machine
Convergence Conditions
Dashboard Metrics
Humans evaluate by feeling. MARIA OS evaluates by structure.
What Changes for Users in Practice
Move from one-shot chat responses to recursive chat with built-in evidence audit and re-evaluation. Not only the output, but the next decision structure itself gets updated.
Before: Single-Pass Chat
[Output]
Draft answer emitted without contradiction audit.
[Evidence]
Sparse citations, weak traceability, manual rework needed.
After: Recursive Audited Chat
[Loop]
Output → audit → delta check → rewrite → re-verify.
[State Update]
Failed patterns are persisted and weighted in next judgment.
Agent Teams Runtime Image
Planner
Builds hypothesis tree and action structure.
Critic
Finds contradiction, drift, and unsupported claims.
Verifier
Checks evidence alignment and gate compliance.
Observed Telemetry (Chat Layer)
Completed Task Ratio / Job: +28.6pt
Action Trace Density / Job: +27.2%
Persisted Artifacts / Job: +50.0%
Delivery Gate Blocked Rate: -100pt
Source: artifacts/run-to-done/job_*.json (26 jobs, 2026-02-13 to 2026-02-14 UTC). Before=waiting_approval cohort (n=13), After=completed cohort (n=13).
Universe Does Not Just Expand. It Evolves While Stabilizing.
Artifacts are audited, structural gaps detected, policies refactored, and re-deployed. Quality and governance improve with every cycle.
Production Loop Pipeline
Artifact Quality Update
γ: governance strength. λ: learning efficiency. Quality improves when compliance exceeds risk, accelerated by insight.
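As a sketch, one cycle of the artifact quality update can be written as Q ← Q + γ·(compliance − risk) + λ·insight. The function shape follows the text (quality improves when compliance exceeds risk, accelerated by insight); the coefficients and the cap at 1.0 are illustrative assumptions.

```python
def artifact_quality_step(q, compliance, risk, insight, gamma=0.2, lam=0.1):
    """One production-loop iteration.

    gamma: governance strength. lam: learning efficiency.
    Quality rises only when compliance exceeds risk; insight adds a
    learning bonus. All coefficients are illustrative.
    """
    return min(1.0, q + gamma * (compliance - risk) + lam * insight)

q = artifact_quality_step(q=0.50, compliance=0.9, risk=0.3, insight=0.4)
```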
Stability Conditions
Evolution Dashboard
Chat + Universe =
Generative AI → Judgment OS
Chat improves output. Universe improves structure. Together they become a Judgment OS.
Visualizing the Impact of Recursive Self-Improvement
In the Universe layer, artifact audits and policy redesign run continuously, raising quality and governance together. Changes are logged and reflected in the structure of the next cycle.
Governance Control Room Image
Detect
Audit engine flags policy and structure gaps.
Refactor
Rules and responsibility gates are rewritten.
Validate
Replay tests verify regression and compliance.
Deploy
Approved policy set is promoted to runtime.
Structural Meaning
1. Update decision rules themselves, not just output fixes.
2. Record errors and reflect them in the next design, staffing, and gate decisions.
3. Keep a recursive cycle that optimizes quality and governance together.
Observed Telemetry (Universe Layer)
Evidence Archive Completion: +100pt
Delivery Gate Approved Rate: +100pt
Pending Approval Task Rate: -14.3pt
Tasks Completed per Job: +50.0%
Source: artifacts/run-to-done/job_*.json (26 jobs, 2026-02-13 to 2026-02-14 UTC). Before=waiting_approval cohort (n=13), After=completed cohort (n=13).
Agent Teams Deployment Example
Design Agent
Coverage 92%
Audit Agent
Detection 88%
Ops Agent
Recovery 84%
Estimate Interest from the Pulse of Short-Term Memory. Save only what matters. Recall only when needed.
Memory Pipeline
Interest Score per Keyword
freq: occurrence count. recency: recency weighting. revisit: returned after absence. emotion: co-occurrence with emphasis. noise: transient suppression.
Interest Vector Update
Interest vector accumulates weighted keyword scores and re-normalizes each cycle.
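A minimal sketch of the scoring and update steps above. The five signals and the "accumulate, then re-normalize" cycle are from the text; the weights and decay factor are illustrative placeholders.

```python
def interest_score(freq, recency, revisit, emotion, noise,
                   w=(0.30, 0.25, 0.20, 0.15, 0.40)):
    # Weighted sum of the five signals; noise is subtracted to suppress
    # transient spikes. Weights are illustrative placeholders.
    return w[0]*freq + w[1]*recency + w[2]*revisit + w[3]*emotion - w[4]*noise

def update_interest_vector(vec, keyword_scores, decay=0.9):
    # Accumulate weighted keyword scores onto the decayed previous vector,
    # then re-normalize so the vector sums to 1 each cycle.
    keys = set(vec) | set(keyword_scores)
    merged = {k: decay * vec.get(k, 0.0) + keyword_scores.get(k, 0.0) for k in keys}
    total = sum(merged.values()) or 1.0
    return {k: v / total for k, v in merged.items()}

vec = update_interest_vector({}, {"pricing": 1.2, "launch": 0.4})
```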
Gate Design
Save Gate
Save only items with high repetition, decision impact, or high reuse value. Raw logs are prohibited; summaries only.
Recall Gate
Never always-on. Retrieve only when needed. Preserves natural conversation flow.
Memory Save Gate — YAML
gate_engine:
  name: "memory-save-gate"
  defaults:
    fail_closed: true
    store_mode: "summary_only"
    pii_policy: "block"
  rules:
    - id: "MS-01-block-pii"
      if: { signal: "contains_pii", value: true }
      then: { action: "deny" }
    - id: "MS-02-allow-stable-preference"
      if: { freq: ">= 3", revisit: ">= 1" }
      then:
        action: "allow"
        store: { format: "canonical_summary" }
    - id: "MS-05-require-user-consent"
      if: { sensitivity: "high", emotion: ">= 0.7" }
      then: { action: "ask_user" }
    - id: "MS-06-fallback-deny"
      then: { action: "deny", reason: "Fail-closed" }

Estimate interest from short-term pulses. Save only what matters. Recall only when needed.
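The save-gate semantics can be mirrored as a first-match rule evaluator. Thresholds and rule IDs are copied from the YAML; the function itself is an illustrative sketch, not the production engine.

```python
def evaluate_save_gate(signals):
    """Apply memory-save-gate rules in order; the first match wins.

    Unmatched inputs fall through to the fail-closed deny (MS-06).
    """
    if signals.get("contains_pii"):
        return ("deny", "MS-01-block-pii")
    if signals.get("freq", 0) >= 3 and signals.get("revisit", 0) >= 1:
        return ("allow", "MS-02-allow-stable-preference")
    if signals.get("sensitivity") == "high" and signals.get("emotion", 0.0) >= 0.7:
        return ("ask_user", "MS-05-require-user-consent")
    return ("deny", "MS-06-fallback-deny")
```

A stable preference (`freq=4, revisit=2`) is allowed as a canonical summary, while anything containing PII is denied before any other rule can fire.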
From Frequent Themes to Latent Intent. Detect value conflicts. Generate hypotheses. Verify with evidence.
Reflection Pipeline
Latent Hypothesis Format
Hypothesis Scoring
support: evidence from conversation. predictability: ability to predict next utterance. stability: robustness over time. intrusiveness: risk of overreach.
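The four scoring dimensions combine naturally as a weighted sum with intrusiveness as a penalty. A minimal sketch; the weights are illustrative placeholders, only the sign structure (three positive terms, one penalty) comes from the text.

```python
def hypothesis_score(support, predictability, stability, intrusiveness,
                     w=(0.40, 0.25, 0.15, 0.50)):
    # support, predictability, and stability raise the score;
    # intrusiveness is penalized. Weights are illustrative.
    return (w[0] * support + w[1] * predictability
            + w[2] * stability - w[3] * intrusiveness)

score = hypothesis_score(support=0.8, predictability=0.6,
                         stability=0.7, intrusiveness=0.2)
```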
Safety Design
Privacy Gate
User can halt unwanted deep dives. Depth levels are staged.
Explainability Gate
If the reasoning for a hypothesis cannot be summarized, it is not surfaced.
Dialogue Rule
Never assert directly. Present as hypothesis, verify with confirmation question.
Question: "Is it the outcome or the people involved that you want to protect?"
Detect tensions, generate hypotheses from evidence, verify through dialogue. Intent clarifies as conversation progresses.
Auto-Trigger Deep Dives from Short-Term Pulses.
12 observation signals, 3 invasiveness levels. Hypotheses are verified by evidence. Conversation naturalness is never broken.
| ID | Signal | Condition | Lvl | Action |
|---|---|---|---|---|
| TD-01 | Frequency | Same keyword 3+ in N turns | L1 | Summarize + verbalize interest |
| TD-02 | Revisit | Topic returns after absence | L2 | Present 2 hypotheses + confirm |
| TD-03 | Spike | 2x occurrence rate increase | L2 | Propose deep-dive candidate |
| TD-04 | Co-occur cluster | Keyword cluster repeats | L2 | Name theme + structured Q |
| TD-05 | Emphasis | Assertive/emphatic co-occur | L2 | Check value or fear |
| TD-06 | Emotion shift | Polarity change or high amp | L3 | Safety check + pace adjust |
| TD-07 | Open question | Prior Q remains unresolved | L3 | Surface + propose order |
| TD-08 | Value conflict | Says A, chooses B | L3 | Conflict hypothesis + priority Q |
| TD-09 | Fixed term | Proper noun persists | L1 | Fix definition + glossary |
| TD-10 | Avoidance | Repeated topic evasion | L2 | Peripheral exploration |
| TD-11 | Decision proximity | 'decide','next' increase | L2 | Decision frame + options |
| TD-12 | High reuse value | Procedure/criteria talk | L1 | Propose template + save |
L1 — Low Invasiveness
Summarize, verbalize interest, offer choices. No flow disruption.
L2 — Mid Invasiveness
Present multiple hypotheses, verify with questions. Name the theme.
L3 — High Invasiveness
Value conflicts, fears, constraints, desired futures. Consent required.
Auto-trigger deep dives from short-term pulses. Hypotheses verified by evidence, never breaking conversational flow.
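The TD table and the three invasiveness levels can be condensed into a dispatch map. The signal keys and action strings below are illustrative shorthand for a subset of the TD rows, not literal engine identifiers; the consent check implements the L3 "Consent required" rule.

```python
# Condensed trigger table (subset of TD-01..TD-12): signal -> (level, action).
TRIGGERS = {
    "frequency":      (1, "summarize_interest"),    # TD-01
    "revisit":        (2, "present_hypotheses"),    # TD-02
    "spike":          (2, "propose_deep_dive"),     # TD-03
    "emotion_shift":  (3, "safety_check"),          # TD-06
    "value_conflict": (3, "conflict_hypothesis"),   # TD-08
    "fixed_term":     (1, "fix_definition"),        # TD-09
}

def dispatch(signal, consent_for_l3=False):
    # L3 (high invasiveness) triggers require explicit consent;
    # without it the engine asks first instead of diving.
    level, action = TRIGGERS[signal]
    if level == 3 and not consent_for_l3:
        return "request_consent"
    return action
```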
Deep Dives Are Not One-Size-Fits-All. Adapt depth and expression to the user.
User Model — 4 Axes (KICS)
User Model Update
ρ: adaptation rate (0.1–0.25). x_t: observation vector from latest conversation turn. Never fixed — continuously updated.
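The update described here is an exponential moving average over the four axes: u ← (1 − ρ)·u + ρ·x_t. A minimal sketch; the axis values are made up, and 0.15 is simply a midpoint of the stated ρ range.

```python
def update_user_model(u, x, rho=0.15):
    # Exponential moving average over the four KICS axes.
    # rho (adaptation rate) is 0.1-0.25 per the text; 0.15 is a midpoint choice.
    return [(1 - rho) * ui + rho * xi for ui, xi in zip(u, x)]

u = update_user_model([0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.5, 0.8])
```

Because ρ stays strictly below 1, the model is never fixed: every observation vector x_t shifts all four axes a bounded step toward the latest evidence.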
Deep Dive Intensity
Sigmoid-bounded intensity. High interest + tension = deeper dive. High intrusiveness risk = suppression. Mapped to L1/L2/L3.
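A sketch of the intensity-to-level mapping. The formula shape (positive drivers minus intrusiveness risk, squashed through a sigmoid) is from the text; the coefficients a1..a4 and the two level thresholds are illustrative assumptions.

```python
import math

def deep_dive_level(interest, tension, decision_proximity, intrusiveness_risk,
                    a=(1.5, 1.0, 1.0, 2.0)):
    """Sigmoid-bounded intensity mapped to invasiveness levels 1-3.

    High interest + tension pushes toward deeper dives; high
    intrusiveness risk suppresses. Coefficients and thresholds
    are illustrative placeholders.
    """
    z = (a[0] * interest + a[1] * tension
         + a[2] * decision_proximity - a[3] * intrusiveness_risk)
    s = 1.0 / (1.0 + math.exp(-z))
    return 1 if s < 0.45 else 2 if s < 0.70 else 3
```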
Template Selection — Optimal Utility
Alignment: matches interest vector. Learning: clarifies intent. Safety: within invasiveness bounds. Friction: pushiness penalty.
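Selecting the optimal template reduces to an argmax over a utility with two rewards and two penalties. A sketch under stated assumptions: the field names and the equal weighting are illustrative, only the four utility terms come from the text.

```python
def select_template(candidates):
    # utility = alignment + learning - safety_penalty - friction; argmax wins.
    # Equal weighting of the four terms is an illustrative assumption.
    def utility(t):
        return (t["alignment"] + t["learning"]
                - t["safety_penalty"] - t["friction"])
    return max(candidates, key=utility)["id"]

best = select_template([
    {"id": "T-A-simple",     "alignment": 0.6, "learning": 0.4,
     "safety_penalty": 0.0, "friction": 0.1},
    {"id": "T-B-structured", "alignment": 0.8, "learning": 0.6,
     "safety_penalty": 0.1, "friction": 0.3},
])
```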
Template Selector — YAML
deep_dive_engine:
  name: "adaptive-deep-dive"
  defaults:
    fail_closed: true
    require_consent_level: 3
  math:
    deep_dive_intensity:
      formula: "sigmoid(a1*I + a2*tension + a3*decision_proximity - a4*intrusiveness_risk)"
    readability_target:
      formula: "r0 + r1*(1-K) + r2*(1-C)"
  templates:
    - id: "T-A-simple"
      when: "K <= 0.45"
      parts: [mirror, summary, hypothesis, Q]
    - id: "T-B-structured"
      when: "C >= 0.55 and tension >= 0.45"
      parts: [mirror, structure, conflict, Q]
    - id: "T-C-sensitive"
      when: "emotion >= 0.70"
      parts: [safety_check, soft_summary, Q]

Adaptive Weight Update
y_t: observed user response quality. ŷ_t: expected response. η: learning rate. Weights converge to optimal template selection over sessions.
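This is a delta-rule update: w ← w + η·(y_t − ŷ_t)·x_t. A minimal sketch; the feature vector and η value are illustrative.

```python
def update_weights(w, x, y, y_hat, eta=0.05):
    # Delta-rule update: move each weight along its feature in proportion
    # to the prediction error (observed minus expected response quality).
    err = y - y_hat
    return [wi + eta * err * xi for wi, xi in zip(w, x)]

w = update_weights([0.50, 0.50], x=[1.0, 0.2], y=0.9, y_hat=0.6)
```

When the user responds better than expected (positive error), weights on the active features grow, so the templates that produced that response are favored in later sessions.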
Deep dives are not one-size-fits-all. Depth and expression auto-optimize to the user. Safety gates control invasiveness.
End-to-End Pipeline. From short-term pulse to adaptive deep dive, closed-loop.
10-Step Execution Flow
Conversations compound. Each session makes the next more precise.
Recall Gate — YAML
gate_engine:
  name: "memory-recall-gate"
  defaults:
    fail_closed: true
    recall_mode: "on_demand"
    max_recall_items: 3
    min_relevance: 0.62
    pii_policy: "block"
  triggers:
    - id: "RG-01-explicit-request"
      if: { signal: "explicit_memory_request" }
      then: { action: "recall", max_items: 3 }
    - id: "RG-02-project-continuation"
      if: { project_continuation: true }
      then: { action: "recall", mode: "project_card" }
    - id: "RG-04-preference-needed"
      if: { preference_needed: true }
      then: { action: "recall", privacy: "strict" }
    - id: "RG-06-smalltalk"
      then: { action: "deny" }
  math:
    recall_necessity:
      formula: "sigmoid(b1*explicit + b2*coref + b3*missing - b7*intrusiveness)"

Operational Principles
Short-term pulse → user model → template select → learning update. End-to-end, closed-loop. Precision improves with every conversation.