Recursive Intelligence
How MARIA OS implements recursive self-improvement — from chat-driven learning to deep reflection loops.
Chat Does Not Generate. Chat Improves.
Output is audited, results flow back into internal state, and the next judgment structure changes. This is structural recursion.
Quality Loop Pipeline
Quality State Update
α: improvement learning rate. β: drift suppression coefficient. Quality converges toward Target while suppressing deviation.
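The update rule described here can be sketched as a one-line recurrence. This is a minimal illustration assuming a linear update; the symbols mirror the text (α, β, Target) but the concrete coefficient and drift values are placeholders, not MARIA OS internals.

```python
def quality_step(q, target, drift, alpha=0.3, beta=0.5):
    """One quality-loop iteration.

    alpha: improvement learning rate, pulls quality toward target.
    beta: drift suppression coefficient, penalizes deviation.
    Coefficient values are illustrative placeholders.
    """
    return q + alpha * (target - q) - beta * drift

q, target = 0.40, 0.90
for _ in range(20):
    q = quality_step(q, target, drift=0.01)
# q settles just below target: the steady state is where
# alpha * (target - q) exactly balances beta * drift.
```

With these placeholder values the loop converges to roughly 0.883 rather than 0.90, which is the "converges toward Target while suppressing deviation" behavior: residual drift keeps quality a controlled margin below the target.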
Internal State Machine
Convergence Conditions
Dashboard Metrics
Humans evaluate by feeling. MARIA OS evaluates by structure.
What Changes for Users in Practice
Move from one-shot chat responses to recursive chat with built-in evidence audit and re-evaluation. Not only the output, but the next decision structure itself gets updated.
Before: Single-Pass Chat
[Output]
Draft answer emitted without contradiction audit.
[Evidence]
Sparse citations, weak traceability, manual rework needed.
After: Recursive Audited Chat
[Loop]
Output → audit → delta check → rewrite → re-verify.
[State Update]
Failed patterns are persisted and weighted in next judgment.
Agent Teams Runtime Image
Planner
Builds hypothesis tree and action structure.
Critic
Finds contradiction, drift, and unsupported claims.
Verifier
Checks evidence alignment and gate compliance.
Observed Telemetry (Chat Layer)
Completed Task Ratio / Job: +28.6pt
Action Trace Density / Job: +27.2%
Persisted Artifacts / Job: +50.0%
Delivery Gate Blocked Rate: -100pt
Source: artifacts/run-to-done/job_*.json (26 jobs, 2026-02-13 to 2026-02-14 UTC). Before=waiting_approval cohort (n=13), After=completed cohort (n=13).
Universe Does Not Just Expand. It Evolves While Stabilizing.
Artifacts are audited, structural gaps detected, policies refactored, and re-deployed. Quality and governance improve with every cycle.
Production Loop Pipeline
Artifact Quality Update
γ: governance strength. λ: learning efficiency. Quality improves when compliance exceeds risk, accelerated by insight.
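As a sketch, one cycle of the artifact quality update can be written as Q ← Q + γ·(compliance − risk) + λ·insight. The function shape follows the text (quality improves when compliance exceeds risk, accelerated by insight); the coefficients and the cap at 1.0 are illustrative assumptions.

```python
def artifact_quality_step(q, compliance, risk, insight, gamma=0.2, lam=0.1):
    """One production-loop iteration.

    gamma: governance strength. lam: learning efficiency.
    Quality rises only when compliance exceeds risk; insight adds a
    learning bonus. All coefficients are illustrative.
    """
    return min(1.0, q + gamma * (compliance - risk) + lam * insight)

q = artifact_quality_step(q=0.50, compliance=0.9, risk=0.3, insight=0.4)
```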
Stability Conditions
Evolution Dashboard
Chat + Universe =
Generative AI → Judgment OS
Chat improves output. Universe improves structure. Together they become a Judgment OS.
Visualizing the Impact of Recursive Self-Improvement
In the Universe layer, artifact audits and policy redesign run continuously, raising quality and governance together. Changes are logged and reflected in the structure of the next cycle.
Governance Control Room Image
Detect
Audit engine flags policy and structure gaps.
Refactor
Rules and responsibility gates are rewritten.
Validate
Replay tests verify regression and compliance.
Deploy
Approved policy set is promoted to runtime.
Structural Meaning
1. Update decision rules themselves, not just output fixes.
2. Record errors and reflect them in the next design, staffing, and gate decisions.
3. Keep a recursive cycle that optimizes quality and governance together.
Observed Telemetry (Universe Layer)
Evidence Archive Completion: +100pt
Delivery Gate Approved Rate: +100pt
Pending Approval Task Rate: -14.3pt
Tasks Completed per Job: +50.0%
Source: artifacts/run-to-done/job_*.json (26 jobs, 2026-02-13 to 2026-02-14 UTC). Before=waiting_approval cohort (n=13), After=completed cohort (n=13).
Agent Teams Deployment Example
Design Agent
Coverage 92%
Audit Agent
Detection 88%
Ops Agent
Recovery 84%
Estimate Interest from the Pulse of Short-Term Memory. Save only what matters. Recall only when needed.
Memory Pipeline
Interest Score per Keyword
freq: occurrence count. recency: recency weighting. revisit: returned after absence. emotion: co-occurrence with emphasis. noise: transient suppression.
Interest Vector Update
Interest vector accumulates weighted keyword scores and re-normalizes each cycle.
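A minimal sketch of the scoring and update steps above. The five signals and the "accumulate, then re-normalize" cycle are from the text; the weights and decay factor are illustrative placeholders.

```python
def interest_score(freq, recency, revisit, emotion, noise,
                   w=(0.30, 0.25, 0.20, 0.15, 0.40)):
    # Weighted sum of the five signals; noise is subtracted to suppress
    # transient spikes. Weights are illustrative placeholders.
    return w[0]*freq + w[1]*recency + w[2]*revisit + w[3]*emotion - w[4]*noise

def update_interest_vector(vec, keyword_scores, decay=0.9):
    # Accumulate weighted keyword scores onto the decayed previous vector,
    # then re-normalize so the vector sums to 1 each cycle.
    keys = set(vec) | set(keyword_scores)
    merged = {k: decay * vec.get(k, 0.0) + keyword_scores.get(k, 0.0) for k in keys}
    total = sum(merged.values()) or 1.0
    return {k: v / total for k, v in merged.items()}

vec = update_interest_vector({}, {"pricing": 1.2, "launch": 0.4})
```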
Gate Design
Save Gate
Save only items with high repetition, decision impact, or high reuse value. Raw logs are prohibited; summaries only.
Recall Gate
Never always-on. Retrieve only when needed. Preserves natural conversation flow.
Memory Save Gate — YAML
gate_engine:
  name: "memory-save-gate"
  defaults:
    fail_closed: true
    store_mode: "summary_only"
    pii_policy: "block"
  rules:
    - id: "MS-01-block-pii"
      if: { signal: "contains_pii", value: true }
      then: { action: "deny" }
    - id: "MS-02-allow-stable-preference"
      if: { freq: ">= 3", revisit: ">= 1" }
      then:
        action: "allow"
        store: { format: "canonical_summary" }
    - id: "MS-05-require-user-consent"
      if: { sensitivity: "high", emotion: ">= 0.7" }
      then: { action: "ask_user" }
    - id: "MS-06-fallback-deny"
      then: { action: "deny", reason: "Fail-closed" }

Estimate interest from short-term pulses. Save only what matters. Recall only when needed.
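The save-gate semantics can be mirrored as a first-match rule evaluator. Thresholds and rule IDs are copied from the YAML; the function itself is an illustrative sketch, not the production engine.

```python
def evaluate_save_gate(signals):
    """Apply memory-save-gate rules in order; the first match wins.

    Unmatched inputs fall through to the fail-closed deny (MS-06).
    """
    if signals.get("contains_pii"):
        return ("deny", "MS-01-block-pii")
    if signals.get("freq", 0) >= 3 and signals.get("revisit", 0) >= 1:
        return ("allow", "MS-02-allow-stable-preference")
    if signals.get("sensitivity") == "high" and signals.get("emotion", 0.0) >= 0.7:
        return ("ask_user", "MS-05-require-user-consent")
    return ("deny", "MS-06-fallback-deny")
```

A stable preference (`freq=4, revisit=2`) is allowed as a canonical summary, while anything containing PII is denied before any other rule can fire.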
From Frequent Themes to Latent Intent. Detect value conflicts. Generate hypotheses. Verify with evidence.
Reflection Pipeline
Latent Hypothesis Format
Hypothesis Scoring
support: evidence from conversation. predictability: ability to predict next utterance. stability: robustness over time. intrusiveness: risk of overreach.
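The four scoring dimensions combine naturally as a weighted sum with intrusiveness as a penalty. A minimal sketch; the weights are illustrative placeholders, only the sign structure (three positive terms, one penalty) comes from the text.

```python
def hypothesis_score(support, predictability, stability, intrusiveness,
                     w=(0.40, 0.25, 0.15, 0.50)):
    # support, predictability, and stability raise the score;
    # intrusiveness is penalized. Weights are illustrative.
    return (w[0] * support + w[1] * predictability
            + w[2] * stability - w[3] * intrusiveness)

score = hypothesis_score(support=0.8, predictability=0.6,
                         stability=0.7, intrusiveness=0.2)
```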
Safety Design
Privacy Gate
User can halt unwanted deep dives. Depth levels are staged.
Explainability Gate
If the reasoning for a hypothesis cannot be summarized, it is not surfaced.
Dialogue Rule
Never assert directly. Present as hypothesis, verify with confirmation question.
Question: "Is it the outcome or the people involved that you want to protect?"
Detect tensions, generate hypotheses from evidence, verify through dialogue. Intent clarifies as conversation progresses.
Auto-Trigger Deep Dives from Short-Term Pulses.
12 observation signals, 3 invasiveness levels. Hypotheses are verified by evidence. Conversation naturalness is never broken.
| ID | Signal | Condition | Lvl | Action |
|---|---|---|---|---|
| TD-01 | Frequency | Same keyword 3+ in N turns | L1 | Summarize + verbalize interest |
| TD-02 | Revisit | Topic returns after absence | L2 | Present 2 hypotheses + confirm |
| TD-03 | Spike | 2x occurrence rate increase | L2 | Propose deep-dive candidate |
| TD-04 | Co-occur cluster | Keyword cluster repeats | L2 | Name theme + structured Q |
| TD-05 | Emphasis | Assertive/emphatic co-occur | L2 | Check value or fear |
| TD-06 | Emotion shift | Polarity change or high amp | L3 | Safety check + pace adjust |
| TD-07 | Open question | Prior Q remains unresolved | L3 | Surface + propose order |
| TD-08 | Value conflict | Says A, chooses B | L3 | Conflict hypothesis + priority Q |
| TD-09 | Fixed term | Proper noun persists | L1 | Fix definition + glossary |
| TD-10 | Avoidance | Repeated topic evasion | L2 | Peripheral exploration |
| TD-11 | Decision proximity | 'decide','next' increase | L2 | Decision frame + options |
| TD-12 | High reuse value | Procedure/criteria talk | L1 | Propose template + save |
L1 — Low Invasiveness
Summarize, verbalize interest, offer choices. No flow disruption.
L2 — Mid Invasiveness
Present multiple hypotheses, verify with questions. Name the theme.
L3 — High Invasiveness
Value conflicts, fears, constraints, desired futures. Consent required.
Auto-trigger deep dives from short-term pulses. Hypotheses verified by evidence, never breaking conversational flow.
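The TD table and the three invasiveness levels can be condensed into a dispatch map. The signal keys and action strings below are illustrative shorthand for a subset of the TD rows, not literal engine identifiers; the consent check implements the L3 "Consent required" rule.

```python
# Condensed trigger table (subset of TD-01..TD-12): signal -> (level, action).
TRIGGERS = {
    "frequency":      (1, "summarize_interest"),    # TD-01
    "revisit":        (2, "present_hypotheses"),    # TD-02
    "spike":          (2, "propose_deep_dive"),     # TD-03
    "emotion_shift":  (3, "safety_check"),          # TD-06
    "value_conflict": (3, "conflict_hypothesis"),   # TD-08
    "fixed_term":     (1, "fix_definition"),        # TD-09
}

def dispatch(signal, consent_for_l3=False):
    # L3 (high invasiveness) triggers require explicit consent;
    # without it the engine asks first instead of diving.
    level, action = TRIGGERS[signal]
    if level == 3 and not consent_for_l3:
        return "request_consent"
    return action
```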
Deep Dives Are Not One-Size-Fits-All. Adapt depth and expression to the user.
User Model — 4 Axes (KICS)
User Model Update
ρ: adaptation rate (0.1–0.25). x_t: observation vector from latest conversation turn. Never fixed — continuously updated.
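The update described here is an exponential moving average over the four axes: u ← (1 − ρ)·u + ρ·x_t. A minimal sketch; the axis values are made up, and 0.15 is simply a midpoint of the stated ρ range.

```python
def update_user_model(u, x, rho=0.15):
    # Exponential moving average over the four KICS axes.
    # rho (adaptation rate) is 0.1-0.25 per the text; 0.15 is a midpoint choice.
    return [(1 - rho) * ui + rho * xi for ui, xi in zip(u, x)]

u = update_user_model([0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.5, 0.8])
```

Because ρ stays strictly below 1, the model is never fixed: every observation vector x_t shifts all four axes a bounded step toward the latest evidence.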
Deep Dive Intensity
Sigmoid-bounded intensity. High interest + tension = deeper dive. High intrusiveness risk = suppression. Mapped to L1/L2/L3.
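A sketch of the intensity-to-level mapping. The formula shape (positive drivers minus intrusiveness risk, squashed through a sigmoid) is from the text; the coefficients a1..a4 and the two level thresholds are illustrative assumptions.

```python
import math

def deep_dive_level(interest, tension, decision_proximity, intrusiveness_risk,
                    a=(1.5, 1.0, 1.0, 2.0)):
    """Sigmoid-bounded intensity mapped to invasiveness levels 1-3.

    High interest + tension pushes toward deeper dives; high
    intrusiveness risk suppresses. Coefficients and thresholds
    are illustrative placeholders.
    """
    z = (a[0] * interest + a[1] * tension
         + a[2] * decision_proximity - a[3] * intrusiveness_risk)
    s = 1.0 / (1.0 + math.exp(-z))
    return 1 if s < 0.45 else 2 if s < 0.70 else 3
```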
Template Selection — Optimal Utility
Alignment: matches interest vector. Learning: clarifies intent. Safety: within invasiveness bounds. Friction: pushiness penalty.
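Selecting the optimal template reduces to an argmax over a utility with two rewards and two penalties. A sketch under stated assumptions: the field names and the equal weighting are illustrative, only the four utility terms come from the text.

```python
def select_template(candidates):
    # utility = alignment + learning - safety_penalty - friction; argmax wins.
    # Equal weighting of the four terms is an illustrative assumption.
    def utility(t):
        return (t["alignment"] + t["learning"]
                - t["safety_penalty"] - t["friction"])
    return max(candidates, key=utility)["id"]

best = select_template([
    {"id": "T-A-simple",     "alignment": 0.6, "learning": 0.4,
     "safety_penalty": 0.0, "friction": 0.1},
    {"id": "T-B-structured", "alignment": 0.8, "learning": 0.6,
     "safety_penalty": 0.1, "friction": 0.3},
])
```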
Template Selector — YAML
deep_dive_engine:
  name: "adaptive-deep-dive"
  defaults:
    fail_closed: true
    require_consent_level: 3
  math:
    deep_dive_intensity:
      formula: "sigmoid(a1*I + a2*tension + a3*decision_proximity - a4*intrusiveness_risk)"
    readability_target:
      formula: "r0 + r1*(1-K) + r2*(1-C)"
  templates:
    - id: "T-A-simple"
      when: "K <= 0.45"
      parts: [mirror, summary, hypothesis, Q]
    - id: "T-B-structured"
      when: "C >= 0.55 and tension >= 0.45"
      parts: [mirror, structure, conflict, Q]
    - id: "T-C-sensitive"
      when: "emotion >= 0.70"
      parts: [safety_check, soft_summary, Q]

Adaptive Weight Update
y_t: observed user response quality. ŷ_t: expected response. η: learning rate. Weights converge to optimal template selection over sessions.
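This is a delta-rule update: w ← w + η·(y_t − ŷ_t)·x_t. A minimal sketch; the feature vector and η value are illustrative.

```python
def update_weights(w, x, y, y_hat, eta=0.05):
    # Delta-rule update: move each weight along its feature in proportion
    # to the prediction error (observed minus expected response quality).
    err = y - y_hat
    return [wi + eta * err * xi for wi, xi in zip(w, x)]

w = update_weights([0.50, 0.50], x=[1.0, 0.2], y=0.9, y_hat=0.6)
```

When the user responds better than expected (positive error), weights on the active features grow, so the templates that produced that response are favored in later sessions.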
Deep dives are not one-size-fits-all. Depth and expression auto-optimize to the user. Safety gates control invasiveness.
End-to-End Pipeline. From short-term pulse to adaptive deep dive, closed-loop.
10-Step Execution Flow
Conversations compound. Each session makes the next more precise.
Recall Gate — YAML
gate_engine:
  name: "memory-recall-gate"
  defaults:
    fail_closed: true
    recall_mode: "on_demand"
    max_recall_items: 3
    min_relevance: 0.62
    pii_policy: "block"
  triggers:
    - id: "RG-01-explicit-request"
      if: { signal: "explicit_memory_request" }
      then: { action: "recall", max_items: 3 }
    - id: "RG-02-project-continuation"
      if: { project_continuation: true }
      then: { action: "recall", mode: "project_card" }
    - id: "RG-04-preference-needed"
      if: { preference_needed: true }
      then: { action: "recall", privacy: "strict" }
    - id: "RG-06-smalltalk"
      then: { action: "deny" }
  math:
    recall_necessity:
      formula: "sigmoid(b1*explicit + b2*coref + b3*missing - b7*intrusiveness)"

Operational Principles
Short-term pulse → user model → template select → learning update. End-to-end, closed-loop. Precision improves with every conversation.