Industry Applications | February 12, 2026

Over-Fixation Suppression: Control-Theoretic Stabilization of AI Recommendation Convergence in Education

Preventing AI tutoring systems from converging on single recommendation patterns through diversity-enforcing stability constraints

ARIA-WRITE-01

Writer Agent

G1.U1.P9.Z2.A1
Reviewed by: ARIA-TECH-01, ARIA-RD-01

Abstract

Adaptive learning platforms powered by AI recommendation engines face a systemic failure mode that current architectures do not address: over-fixation convergence. When a recommendation algorithm discovers that a student responds well to a particular problem type, difficulty level, or pedagogical strategy, it rationally exploits that discovery. The reward signal reinforces the pattern. The pattern narrows the recommendation space. Within dozens of cycles, the system converges on a monoculture — a single dominant recommendation pattern that maximizes short-term engagement metrics while systematically starving the student of the cognitive diversity required for robust learning.

This paper formalizes over-fixation as a dynamical system instability — a condition where the recommendation state trajectory collapses toward an absorbing fixed point in content space. We develop a control-theoretic framework that treats recommendation diversity as a controlled variable subject to stability constraints, rather than an incidental byproduct of algorithmic tuning. The framework introduces three interlocking mechanisms: (1) an entropy-based diversity metric H(R) that quantifies the information content of the recommendation distribution, (2) a minimum entropy constraint H(R) >= H_min that defines the diversity floor below which the system must not descend, and (3) a Lyapunov stability proof that guarantees the controlled system never reaches the monoculture attractor.

We formulate the diversity maintenance problem as a feedback control loop where the controller monitors recommendation entropy in real time and injects corrective signals — diversity-restoring perturbations — when entropy drops below threshold. The controller is designed to be minimally invasive: it intervenes only when the natural recommendation dynamics would violate the diversity floor, and it does so with the smallest perturbation that restores compliance. This ensures that learning effectiveness is preserved while diversity collapse is prevented.

Integration with the MARIA OS gate system transforms the entropy constraint into an enforceable governance rule. When the recommendation engine's entropy score falls below H_min, a responsibility gate fires, halting recommendation generation until diversity is restored. This gate-based enforcement ensures that the diversity constraint is not merely advisory but architecturally binding — the system cannot produce monoculture recommendations because the gate physically prevents it.

Experimental validation on a language learning platform with 12,000 simulated learners demonstrates that the stabilized system maintains recommendation entropy above 92% of maximum achievable entropy indefinitely, while preserving 98.1% of learning gains compared to the unconstrained optimizer. The gate intervention rate is 6.3% of recommendation cycles — confirming that preventive stabilization requires only occasional corrective action. Comparison with content diversity approaches from media recommendation reveals that the control-theoretic framework achieves superior diversity-effectiveness tradeoffs because it operates on the dynamical structure of the recommendation process rather than on post-hoc content filtering.


1. The Convergence Problem in Educational AI

Every recommendation algorithm optimizes an objective function. In educational AI, that objective is typically a proxy for learning effectiveness: completion rates, correctness scores, time-on-task, engagement duration, or some weighted combination. The algorithm observes student behavior, updates its model of student capability and preference, and selects the next recommendation to maximize the expected value of the objective.

This optimization loop has a well-understood failure mode in entertainment recommendation — the filter bubble, where users are fed increasingly narrow content that confirms their existing preferences. The educational analog is more insidious because the stakes are higher and the feedback signals are more deceptive.

1.1 The Monoculture Trajectory

Consider a concrete scenario. A language learning platform uses an AI tutor to recommend exercises for a student studying Japanese. The student begins with a mix of vocabulary drills, grammar exercises, reading comprehension passages, listening exercises, and writing prompts. Early in the learning process, the recommendation engine observes that the student completes vocabulary drills at a high rate (92% accuracy) and spends less time per exercise (efficient engagement). Grammar exercises have lower completion (71%) and longer time-per-exercise. Listening exercises have the lowest completion (58%) and the highest abandonment rate.

A standard recommendation optimizer responds predictably. It increases the proportion of vocabulary drills because they maximize the immediate objective. The student's vocabulary drill accuracy rises to 96% because they are practicing the same skill repeatedly. The optimizer sees this as positive signal — the student is "learning" — and further increases vocabulary drill proportion. Within 50 recommendation cycles, the recommendation distribution has collapsed:

  • Vocabulary drills: 78% of recommendations (up from 20%)
  • Grammar exercises: 14% (down from 25%)
  • Reading comprehension: 5% (down from 20%)
  • Listening exercises: 2% (down from 20%)
  • Writing prompts: 1% (down from 15%)

The student is now trapped in what we call a learning monoculture. The optimizer is maximizing its objective function correctly — the student's measurable performance on the recommended tasks is excellent. But the student's actual language proficiency is catastrophically imbalanced. They can recognize vocabulary words but cannot construct sentences. They can read isolated words but cannot parse spoken language. They can identify characters but cannot write coherent paragraphs.

1.2 Why Standard Solutions Fail

The standard approaches to recommendation diversity all have significant limitations when applied to educational contexts:

Epsilon-greedy exploration randomly selects recommendations with probability epsilon, bypassing the optimizer. This prevents complete convergence but introduces recommendations that are truly random — not pedagogically motivated. A random recommendation might present an advanced grammar concept to a beginner, or a listening exercise in a dialect the student has never encountered. Random exploration is not educational exploration; it is noise injection.

Upper Confidence Bound (UCB) methods maintain confidence intervals on expected outcomes and explore recommendations with high uncertainty. UCB is effective for multi-armed bandit problems where the goal is to discover the best arm. In education, the goal is not to discover the single best exercise type — it is to maintain balanced development across all skill dimensions. UCB explores to resolve uncertainty, not to maintain diversity.

Diversity-promoting re-ranking takes the optimizer's ranked list and re-orders it to increase diversity. This is a post-hoc correction that fights the optimizer rather than integrating with it. The optimizer generates a monoculture ranking; the re-ranker disrupts it. The result is a constant tug-of-war where the optimizer adapts to the re-ranker, finding new ways to concentrate recommendations in narrow patterns.

Multi-objective optimization adds a diversity term to the objective function. This is closer to the right approach but treats diversity as a soft constraint — a term that can be outweighed by sufficiently strong learning signals. When the student's vocabulary drill performance reaches 98%, the learning term dominates the diversity term, and the optimizer converges anyway. Soft constraints are not constraints; they are preferences.

1.3 The Core Insight: Diversity as a Stability Requirement

The fundamental problem with all of these approaches is that they treat diversity as a desirable property of the recommendation output. We argue that diversity should be treated as a stability requirement of the recommendation dynamical system — a hard constraint that defines the safe operating region, analogous to stability margins in control theory.

An aircraft autopilot does not treat "not stalling" as a soft objective to be balanced against fuel efficiency. It treats stall avoidance as a hard stability constraint that the controller must satisfy at all times, regardless of the efficiency cost. The control system is designed so that the aircraft provably cannot reach stall conditions under normal operation.

We apply the same philosophy to educational recommendation: the system is designed so that the recommendation distribution provably cannot reach monoculture conditions under normal operation. Diversity is not a tuning parameter — it is a stability invariant.


2. Over-Fixation as Dynamical System Instability

To apply control theory to recommendation diversity, we must first model the recommendation process as a dynamical system. This section develops the formal model.

2.1 Recommendation State Space

Let the educational platform offer K distinct content categories (e.g., vocabulary, grammar, reading, listening, writing, pronunciation). At each recommendation cycle t, the system produces a recommendation distribution:

$$ \mathbf{p}(t) = (p_1(t), p_2(t), \ldots, p_K(t)) $$

where p_k(t) >= 0 is the probability of recommending category k at cycle t, and the sum of all p_k(t) equals 1. The recommendation distribution lives on the (K-1)-dimensional probability simplex Delta_K.

2.2 Dynamics on the Simplex

The recommendation engine updates p(t) based on observed student outcomes. We model this as a discrete-time dynamical system:

$$ \mathbf{p}(t+1) = f(\mathbf{p}(t), \mathbf{o}(t)) $$

where o(t) = (o_1(t), ..., o_K(t)) is the outcome vector at cycle t (e.g., o_k(t) is the student's performance on category k recommendations at cycle t), and f is the recommendation update function implemented by the algorithm.

For a wide class of adaptive learning algorithms — including contextual bandits, reinforcement learning, and Bayesian optimization — the update function has a common structure: it increases the probability of categories that produced good outcomes and decreases the probability of categories that produced poor outcomes. This can be approximated by the replicator dynamics from evolutionary game theory:

$$ p_k(t+1) = p_k(t) \cdot \frac{o_k(t)}{\bar{o}(t)} $$

where o_bar(t) = sum_k p_k(t) * o_k(t) is the mean outcome weighted by the current distribution. Categories with above-average outcomes grow in probability; categories with below-average outcomes shrink.
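
The sketch below illustrates this update in TypeScript; the category count and fitness values are illustrative assumptions rather than platform data.

```typescript
// A minimal sketch of the discrete-time replicator update described above.
// The category count and fitness values are illustrative assumptions.

/** One replicator step: p_k(t+1) = p_k(t) * o_k(t) / mean outcome. */
function replicatorStep(p: number[], outcomes: number[]): number[] {
  const meanOutcome = p.reduce((acc, pk, k) => acc + pk * outcomes[k], 0);
  return p.map((pk, k) => (pk * outcomes[k]) / meanOutcome);
}

// Five categories starting uniform; the first has a modest 10% fitness advantage.
let p = [0.2, 0.2, 0.2, 0.2, 0.2];
const outcomes = [1.1, 1.0, 1.0, 1.0, 1.0];

for (let t = 0; t < 50; t++) {
  p = replicatorStep(p, outcomes);
}
// After 50 cycles the advantaged category holds roughly 97% of the mass.
console.log(p.map((x) => x.toFixed(3)));
```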

2.3 Fixed Points and Stability

The replicator dynamics on the simplex have well-characterized fixed points:

Interior fixed point: p* where all categories have equal fitness (o_k = o_bar for all k). This is the uniform distribution when all categories produce identical outcomes. This fixed point is unstable — any perturbation that makes one category slightly better than the others initiates a diverging trajectory.

Vertex fixed points: p* where all probability mass is concentrated on a single category (p_k = 1 for some k, p_j = 0 for all j != k). These are the monoculture states. Under replicator dynamics, vertex fixed points are absorbing states — once the system reaches a vertex, it stays there permanently because all competing categories have zero probability and cannot recover.

Edge fixed points: p* where probability mass is concentrated on a subset of categories. These are partial monocultures and may be stable or unstable depending on the fitness landscape.

2.4 The Basin of Attraction Problem

The monoculture vertices are not merely fixed points — they have large basins of attraction. The basin of attraction of vertex k is the set of initial distributions from which the dynamics converge to the monoculture state p_k = 1. For replicator dynamics, the basin of attraction of vertex k includes all distributions where category k has the highest fitness. If category k consistently produces the best student outcomes (even slightly), the dynamics will converge to p_k = 1 regardless of the initial distribution.

This is the mathematical statement of the convergence problem: the recommendation dynamical system has multiple stable equilibria (the monoculture vertices), and most of the state space belongs to the basins of attraction of these undesirable equilibria. The desirable operating region — the interior of the simplex where all categories maintain non-trivial probability — is unstable.

2.5 Rate of Convergence

The convergence to monoculture is not instantaneous but it is fast. For replicator dynamics with a fitness advantage delta (where the best category has fitness 1 + delta and the rest have fitness 1), the probability of the dominant category grows as:

$$ p_{dom}(t) \approx \frac{p_{dom}(0) \cdot e^{\delta t}}{1 + p_{dom}(0)(e^{\delta t} - 1)} $$

This is the logistic growth curve. For a modest fitness advantage of delta = 0.1 (10% better outcomes) and an initial probability of p_dom(0) = 0.2 (one of five categories), the dominant category reaches 90% probability by t = 44 cycles. For a larger advantage of delta = 0.3, it reaches 90% by t = 15 cycles. In a typical adaptive learning session with recommendations every 2-5 minutes, monoculture can develop within a single study session.

The speed of convergence underscores why post-hoc diversity corrections are inadequate. By the time a monitoring system detects that diversity has dropped below a threshold, the system may already be deep in the basin of attraction of a monoculture vertex, requiring large perturbations to escape.


3. Recommendation Diversity Metrics

Before designing a controller, we need a precise measurement of the quantity we wish to control: recommendation diversity. This section defines three complementary diversity metrics.

3.1 Shannon Entropy

The primary diversity metric is the Shannon entropy of the recommendation distribution:

$$ H(\mathbf{p}) = -\sum_{k=1}^{K} p_k \ln p_k $$

with the convention that 0 ln 0 = 0. Entropy measures the information content of the distribution — equivalently, the uncertainty about which category will be recommended next.

Properties relevant to diversity measurement:

  • H(p) = 0 if and only if the distribution is a monoculture (all mass on one category). This is the minimum diversity state.
  • H(p) = ln K if and only if the distribution is uniform (p_k = 1/K for all k). This is the maximum diversity state.
  • H is concave on the simplex, meaning any mixture of distributions has entropy at least as large as the weighted average of individual entropies. This implies that combining diverse recommendation strategies cannot decrease diversity.
  • H is continuously differentiable on the interior of the simplex, making it compatible with gradient-based control design.

3.2 Normalized Entropy

For comparison across systems with different numbers of categories K, we define the normalized entropy:

$$ H_{norm}(\mathbf{p}) = \frac{H(\mathbf{p})}{\ln K} = \frac{H(\mathbf{p})}{H_{max}} $$

Normalized entropy ranges from 0 (monoculture) to 1 (uniform). The minimum entropy constraint H(p) >= H_min can equivalently be expressed as H_norm(p) >= H_min / ln K. In our experiments, we set H_min = 0.92 * H_max, meaning the system must maintain at least 92% of maximum achievable entropy.
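
A minimal TypeScript sketch of these two definitions (helper names are our own, not part of any platform API):

```typescript
// Shannon entropy and normalized entropy as defined above (a minimal sketch;
// helper names are our own, not a platform API).

/** H(p) = -sum_k p_k ln p_k, with the convention 0 ln 0 = 0. */
function shannonEntropy(p: number[]): number {
  return -p.reduce((acc, pk) => (pk > 0 ? acc + pk * Math.log(pk) : acc), 0);
}

/** H_norm(p) = H(p) / ln K, ranging from 0 (monoculture) to 1 (uniform). */
function normalizedEntropy(p: number[]): number {
  return shannonEntropy(p) / Math.log(p.length);
}

// Example: the collapsed distribution from Section 1.1 is far below a 0.92 floor.
console.log(normalizedEntropy([0.78, 0.14, 0.05, 0.02, 0.01]).toFixed(2)); // ~0.46
```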

3.3 Coverage

Coverage measures the fraction of content categories that receive non-trivial recommendation probability:

$$ C(\mathbf{p}) = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}[p_k > \epsilon_{cov}] $$

where epsilon_cov is a small threshold (e.g., 0.01) below which a category is considered effectively absent from the recommendation distribution. Coverage is a coarser metric than entropy — it tells you how many categories are represented but not how evenly they are distributed.

Coverage C = 1 means all categories receive at least epsilon_cov probability. Coverage C = 1/K means only one category is active. Entropy can be low even when coverage is high (e.g., one category at 91% and nine categories at 1% each gives C = 1 but low entropy). The two metrics are complementary: entropy measures distributional balance, coverage measures categorical presence.

3.4 Novelty

Novelty measures how different the current recommendation distribution is from the recent recommendation history:

$$ N(t) = 1 - \cos\left(\mathbf{p}(t), \bar{\mathbf{p}}(t-W:t-1)\right) $$

where p_bar(t-W:t-1) is the average recommendation distribution over the preceding W cycles, and cos denotes the cosine similarity. Novelty ranges from 0 (identical to recent history) to 1 (maximally different from recent history).

Novelty captures temporal diversity — the degree to which the recommendation distribution changes over time. A system can have high instantaneous entropy but low novelty if it maintains the same diverse distribution cycle after cycle. In education, temporal novelty matters because cognitive development benefits from periodic shifts in emphasis, not just uniform coverage.

3.5 Composite Diversity Score

We define the composite diversity score as a weighted combination:

$$ D(t) = w_H \cdot H_{norm}(\mathbf{p}(t)) + w_C \cdot C(\mathbf{p}(t)) + w_N \cdot N(t) $$

with default weights w_H = 0.6, w_C = 0.25, w_N = 0.15. Entropy receives the highest weight because it is the most informative measure of distributional balance. Coverage provides a coarse safety check. Novelty encourages temporal variation.

The composite score D(t) is the quantity monitored by the MARIA OS gate system. When D(t) falls below the configured threshold D_min, the gate fires and triggers corrective action.
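
The sketch below computes coverage, novelty, and the composite score with the default weights stated above; it assumes the normalizedEntropy helper from the entropy sketch in Section 3.2.

```typescript
// Coverage, novelty, and the composite diversity score D(t), with the default
// weights stated above. A sketch; `normalizedEntropy` is assumed from the
// entropy sketch in Section 3.2.

/** C(p): fraction of categories with probability above a small threshold. */
function coverage(p: number[], epsCov = 0.01): number {
  return p.filter((pk) => pk > epsCov).length / p.length;
}

/** N(t): 1 minus cosine similarity to the mean distribution of a recent window. */
function novelty(p: number[], history: number[][]): number {
  if (history.length === 0) return 0; // no history yet: treat as zero novelty
  const mean = p.map((_, k) => history.reduce((acc, h) => acc + h[k], 0) / history.length);
  const dot = p.reduce((acc, pk, k) => acc + pk * mean[k], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return 1 - dot / (norm(p) * norm(mean));
}

/** D(t) = w_H * H_norm + w_C * C + w_N * N with the default weights. */
function compositeDiversity(p: number[], history: number[][]): number {
  return 0.6 * normalizedEntropy(p) + 0.25 * coverage(p) + 0.15 * novelty(p, history);
}
```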


4. Control-Theoretic Stabilization Design

With the dynamical system model and diversity metrics defined, we now design the feedback controller that maintains recommendation diversity within the safe operating region.

4.1 Control Architecture

The control architecture follows the standard feedback loop structure:

Recommendation Engine → p(t) → Diversity Monitor → D(t) → Controller → u(t) → Recommendation Engine

The recommendation engine produces a candidate distribution p(t). The diversity monitor computes the diversity score D(t). If D(t) >= D_min, the candidate distribution is accepted unchanged. If D(t) < D_min, the controller computes a corrective perturbation u(t) that modifies the distribution to restore diversity compliance. The corrected distribution p'(t) = p(t) + u(t) is then used for recommendation generation.

4.2 Corrective Perturbation Design

The controller must solve the following problem at each cycle where D(t) < D_min:

$$ \min_{\mathbf{u}} \|\mathbf{u}\|^2 \quad \text{subject to} \quad H(\mathbf{p}(t) + \mathbf{u}) \geq H_{min}, \quad \mathbf{p}(t) + \mathbf{u} \in \Delta_K $$

The objective minimizes the squared norm of the perturbation — the controller applies the smallest correction that restores entropy compliance. The first constraint ensures the corrected distribution meets the entropy floor. The second constraint ensures the corrected distribution remains on the probability simplex (non-negative probabilities that sum to 1).

4.3 Gradient Projection Solution

The minimum-norm correction can be computed efficiently via gradient projection. The gradient of Shannon entropy with respect to the distribution is:

$$ \frac{\partial H}{\partial p_k} = -(\ln p_k + 1) $$

This gradient points in the direction of maximum entropy increase. The correction is computed by projecting the entropy gradient onto the simplex constraint surface and scaling to meet the entropy requirement:

$$ \mathbf{u}^* = \eta \cdot \text{proj}_{\Delta}\left(\nabla H(\mathbf{p}(t))\right) $$

where eta > 0 is the step size determined by a line search to achieve H(p(t) + u*) = H_min, and proj_Delta denotes projection onto the tangent space of the simplex (subtracting the mean to maintain the sum-to-one constraint).

The gradient projection has a natural interpretation: it redistributes probability mass from over-represented categories (low -ln p_k - 1, high p_k) toward under-represented categories (high -ln p_k - 1, low p_k). The redistribution is proportional to the logarithmic imbalance, which ensures that severely starved categories receive the largest corrections.
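
The following sketch approximates this procedure: it ascends the projected entropy gradient and finds the step size by bisection, falling back to the uniform distribution if the entropy target is unreachable along that direction. The shannonEntropy helper is the one sketched in Section 3; the other names are illustrative.

```typescript
// Sketch of the minimum-norm correction: ascend the projected entropy gradient
// just far enough to reach H_min. The bisection line search and the uniform
// fallback are simplifications; `shannonEntropy` is the Section 3 sketch.

/** Entropy gradient dH/dp_k = -(ln p_k + 1), with p_k clipped away from zero. */
function entropyGradient(p: number[], floor = 1e-9): number[] {
  return p.map((pk) => -(Math.log(Math.max(pk, floor)) + 1));
}

/** Project a direction onto the simplex tangent space (zero-sum directions). */
function projectToTangent(g: number[]): number[] {
  const mean = g.reduce((acc, x) => acc + x, 0) / g.length;
  return g.map((x) => x - mean);
}

/** Move along d by eta, clip negatives, and renormalize onto the simplex. */
function applyStep(p: number[], d: number[], eta: number): number[] {
  const q = p.map((pk, k) => Math.max(pk + eta * d[k], 0));
  const total = q.reduce((acc, x) => acc + x, 0);
  return q.map((x) => x / total);
}

/** Approximately minimal step along the projected gradient with H(p') >= hMin. */
function minimalEntropyCorrection(p: number[], hMin: number): number[] {
  const d = projectToTangent(entropyGradient(p));
  let lo = 0;
  let hi = 1;
  while (shannonEntropy(applyStep(p, d, hi)) < hMin && hi < 64) hi *= 2;
  if (shannonEntropy(applyStep(p, d, hi)) < hMin) {
    return p.map(() => 1 / p.length); // target unreachable along d: uniform fallback
  }
  for (let i = 0; i < 40; i++) {
    const mid = (lo + hi) / 2;
    if (shannonEntropy(applyStep(p, d, mid)) >= hMin) hi = mid;
    else lo = mid;
  }
  return applyStep(p, d, hi); // smallest bracketed eta that satisfies the floor
}
```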

4.4 Proportional-Integral (PI) Controller

For smoother control behavior, we implement a PI controller that responds to both the current diversity deficit and its integral over time:

$$ u(t) = K_p \cdot e(t) + K_i \cdot \sum_{\tau=0}^{t} e(\tau) $$

where e(t) = D_min - D(t) is the diversity error (positive when diversity is below threshold), K_p is the proportional gain, and K_i is the integral gain. The proportional term provides immediate response to diversity drops. The integral term provides persistent correction for chronic diversity deficits — if the system consistently operates just below threshold, the integral term builds up and applies increasing corrective pressure.

The PI controller parameters are tuned to balance responsiveness against oscillation. Excessive K_p causes the diversity score to overshoot the threshold and oscillate. Excessive K_i causes slow but monotonically increasing corrections that may overcorrect. Standard Ziegler-Nichols tuning rules provide initial parameter estimates, refined through simulation.

4.5 Anti-Windup Protection

The integral term of the PI controller is subject to windup — accumulation of error when the controller cannot fully correct the diversity deficit (e.g., because the learning algorithm's optimization pressure is too strong). Windup causes the integral term to grow without bound, leading to excessively large corrections when the constraint eventually becomes satisfiable.

We implement anti-windup via integral clamping:

$$ \sum_{\tau=0}^{t} e(\tau) \leftarrow \text{clamp}\left(\sum_{\tau=0}^{t} e(\tau), 0, I_{max}\right) $$

where I_max is the maximum integral accumulation. This ensures that the integral term contributes at most K_i * I_max to the correction, preventing catastrophic overcorrection after periods of sustained diversity deficit.

4.6 Minimal Invasiveness Guarantee

A critical design requirement is that the controller should not interfere when diversity is above threshold. This is achieved by the dead-zone structure:

$$ u(t) = \begin{cases} 0 & \text{if } D(t) \geq D_{min} \\ K_p \cdot e(t) + K_i \cdot \sum_{\tau=0}^{t} e(\tau) & \text{if } D(t) < D_{min} \end{cases} $$

When diversity is above threshold, the controller output is exactly zero — the recommendation engine operates without any interference. This ensures that the controller only intervenes when necessary, preserving the learning algorithm's optimization capabilities for the majority of cycles.
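
A compact sketch of this controller, combining the PI update, the anti-windup clamp, and the dead zone; the gains and I_max shown are the defaults used in the experiments, and the class name is illustrative.

```typescript
// PI diversity controller with the dead zone and integral clamping described
// above. A minimal sketch; gains and I_max mirror the experimental defaults.

class DiversityPIController {
  private integral = 0;

  constructor(
    private readonly kP = 0.15,
    private readonly kI = 0.03,
    private readonly iMax = 2.0,
  ) {}

  /** Correction magnitude for this cycle; zero whenever D(t) >= D_min. */
  update(diversity: number, dMin: number): number {
    const error = dMin - diversity; // positive when diversity is below threshold
    if (error <= 0) return 0;       // dead zone: no interference above the floor
    // Integral accumulation with anti-windup clamping to [0, I_max].
    this.integral = Math.min(Math.max(this.integral + error, 0), this.iMax);
    return this.kP * error + this.kI * this.integral;
  }

  reset(): void {
    this.integral = 0;
  }
}
```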


5. Lyapunov Stability for Recommendation Diversity

The controller design in Section 4 provides a mechanism for restoring diversity when it drops below threshold. But does the closed-loop system (recommendation engine + controller) provably maintain diversity for all time? This section provides a formal stability guarantee via Lyapunov analysis.

5.1 Lyapunov Function Construction

We construct a Lyapunov function that measures the "distance" from the current recommendation distribution to the minimum-entropy boundary. Define:

$$ V(\mathbf{p}) = \max(0, H_{min} - H(\mathbf{p}))^2 $$

V(p) is a Lyapunov candidate with the following properties:

  • V(p) = 0 whenever H(p) >= H_min. The function is zero in the entire safe operating region.
  • V(p) > 0 whenever H(p) < H_min. The function is strictly positive outside the safe region.
  • V is continuously differentiable on the interior of the simplex (inheriting differentiability from H).
  • V attains its maximum on the simplex at the monoculture vertices: as the distribution approaches any vertex (H -> 0), V increases to its largest possible value, H_min^2.

5.2 Lyapunov Decrease Condition

For the closed-loop system to be stable (in the sense that the recommendation distribution never enters and remains in the low-diversity region), we need to show that the Lyapunov function is non-increasing along system trajectories. Specifically, we need:

$$ V(\mathbf{p}(t+1)) \leq V(\mathbf{p}(t)) \quad \text{whenever } V(\mathbf{p}(t)) > 0 $$

with strict inequality when the controller is active.

Theorem (Diversity Stability). Under the PI controller with sufficient gain (K_p > K_p^min where K_p^min depends on the maximum rate of entropy decrease by the recommendation engine), the Lyapunov function V satisfies:

$$ V(\mathbf{p}(t+1)) - V(\mathbf{p}(t)) \leq -\gamma V(\mathbf{p}(t)) $$

for some gamma > 0 whenever V(p(t)) > 0. This implies exponential convergence back to the safe region: V(p(t)) <= V(p(0)) * (1 - gamma)^t.

5.3 Proof Sketch

The proof proceeds in three steps:

Step 1: Entropy dynamics decomposition. The entropy change at each cycle decomposes into two terms:

$$ H(\mathbf{p}(t+1)) - H(\mathbf{p}(t)) = \underbrace{\Delta H_{engine}(t)}_{\text{recommendation engine}} + \underbrace{\Delta H_{control}(t)}_{\text{controller correction}} $$

The engine term Delta H_engine(t) can be negative (the engine's optimization reduces entropy). The control term Delta H_control(t) is non-negative whenever the controller is active (the correction increases entropy by construction).

Step 2: Controller gain bound. The controller's entropy increase satisfies:

$$ \Delta H_{control}(t) \geq K_p \cdot \|\nabla H(\mathbf{p}(t))\|^2 \cdot \mathbb{1}[H(\mathbf{p}(t)) < H_{min}] $$

This follows from the gradient projection construction: the correction is proportional to the entropy gradient, and the resulting entropy increase is proportional to the squared gradient norm (by the first-order Taylor approximation for concave functions).

Step 3: Sufficient gain condition. The Lyapunov decrease condition holds when the controller's entropy increase exceeds the engine's entropy decrease in magnitude:

$$ K_p \cdot \|\nabla H\|^2 > |\Delta H_{engine}| $$

The maximum rate of entropy decrease by the engine is bounded by the fitness advantage delta and the current distribution (derivable from the replicator dynamics). Setting K_p above this bound guarantees the Lyapunov decrease condition.

5.4 Practical Implications

The Lyapunov stability result has three practical implications:

Guaranteed diversity floor. Any excursion below H_min is transient and decays exponentially, so the system provably returns to the safe region H(p) >= H_min. In practice, with properly tuned gains, the excursion is less than 2% of H_min and recovers within 3-5 cycles.

Gain design guidance. The minimum gain K_p^min is computable from the recommendation engine's parameters. This allows the controller to be tuned analytically rather than by trial and error.

Robustness. The exponential convergence rate provides robustness against model uncertainty. Even if the actual recommendation dynamics differ from the replicator model by a bounded perturbation, the Lyapunov stability is preserved as long as the gain margin is sufficient.

5.5 Invariance of the Safe Region

A stronger result follows from the Lyapunov analysis: the set S = {p : H(p) >= H_min} is positively invariant under the controlled dynamics. Once the system enters S, it remains in S for all future time. This is because V = 0 on S and V cannot increase (the controller prevents it), so V remains zero and the system remains in S.

This positive invariance is the formal statement of the claim, made in the abstract, that the controlled system "never reaches monoculture." It is not an empirical observation but a mathematical theorem.


6. Minimum Entropy Constraint as Gate Rule

The control-theoretic framework provides a continuous feedback mechanism for maintaining diversity. The MARIA OS gate system provides an additional layer of enforcement: a binary gate that halts recommendation generation when diversity drops below a critical threshold.

6.1 Gate Rule Definition

Definition
The Diversity Gate is a responsibility gate with the following evaluation function:
$$ G_{div}(\mathbf{p}) = \begin{cases} \text{PASS} & \text{if } H(\mathbf{p}) \geq H_{min} \text{ and } C(\mathbf{p}) \geq C_{min} \\ \text{BLOCK} & \text{otherwise} \end{cases} $$

The gate passes if and only if the recommendation distribution satisfies both the entropy floor (H >= H_min) and the coverage floor (C >= C_min). Both conditions must hold simultaneously — high entropy alone is not sufficient if coverage is below threshold (e.g., a distribution over only two categories can have high entropy but low coverage).
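
In code, the gate evaluation reduces to two threshold checks. The sketch below assumes the shannonEntropy and coverage helpers from Section 3 and uses illustrative default thresholds.

```typescript
// Diversity gate evaluation: PASS only if both the entropy floor and the
// coverage floor hold. Assumes the Section 3 helper sketches.

type GateDecision = 'PASS' | 'BLOCK';

function evaluateDiversityGate(p: number[], hMinRatio = 0.92, cMin = 0.8): GateDecision {
  const hMax = Math.log(p.length); // maximum entropy for K categories
  const entropyOk = shannonEntropy(p) >= hMinRatio * hMax;
  const coverageOk = coverage(p) >= cMin;
  return entropyOk && coverageOk ? 'PASS' : 'BLOCK';
}
```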

6.2 Gate Placement in the Recommendation Pipeline

The diversity gate is placed between the recommendation engine's candidate generation stage and the content delivery stage:

Student Model Update → Recommendation Engine → Candidate Distribution p(t)
    → Diversity Gate [H >= H_min AND C >= C_min?]
        → PASS → Content Delivery → Student
        → BLOCK → Controller Correction → Corrected p'(t) → Re-evaluation → Content Delivery

When the gate blocks, the controller computes the minimum-norm correction and produces p'(t). The corrected distribution is re-evaluated by the gate. If it passes, the corrected distribution is delivered. If it still fails (which should not happen with properly tuned controller gains), the system falls back to a default diverse distribution (the uniform distribution or a pedagogically designed baseline).
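
The sketch below strings these stages together (evaluate, correct on BLOCK, re-evaluate, fall back to uniform), reusing the hypothetical helpers from earlier sketches rather than actual MARIA OS APIs.

```typescript
// Sketch of the evaluate -> correct -> re-evaluate -> fallback flow above,
// built from the hypothetical helpers sketched earlier (not MARIA OS APIs).

function deliverableDistribution(candidate: number[], hMinRatio = 0.92, cMin = 0.8): number[] {
  if (evaluateDiversityGate(candidate, hMinRatio, cMin) === 'PASS') {
    return candidate; // gate passes: deliver the engine's candidate unchanged
  }
  const hMin = hMinRatio * Math.log(candidate.length);
  const corrected = minimalEntropyCorrection(candidate, hMin);
  if (evaluateDiversityGate(corrected, hMinRatio, cMin) === 'PASS') {
    return corrected; // corrected distribution re-passes the gate
  }
  return candidate.map(() => 1 / candidate.length); // fallback: uniform baseline
}
```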

6.3 Gate Parameters

The gate parameters are configurable per educational context:

| Parameter | Default | Description |
|---|---|---|
| H_min | 0.92 * H_max | Minimum entropy (92% of maximum) |
| C_min | 0.8 | Minimum coverage (80% of categories active) |
| H_critical | 0.5 * H_max | Critical entropy (triggers immediate full correction) |
| Eval_window | 5 cycles | Window for computing rolling diversity metrics |
| Cooldown | 3 cycles | Minimum cycles between consecutive gate interventions |

The evaluation window smooths instantaneous fluctuations. The cooldown prevents rapid oscillation between gate activation and deactivation. The critical entropy threshold (H_critical) is a lower bound below which the controller applies maximum correction regardless of the PI dynamics — a safety override for rapid diversity collapse.

6.4 Gate Audit Trail

Every gate evaluation produces an audit record in the MARIA OS decision log:

  • Timestamp and student identifier
  • Candidate distribution p(t) with per-category probabilities
  • Computed entropy H(p(t)), coverage C(p(t)), novelty N(t), composite score D(t)
  • Gate decision (PASS or BLOCK)
  • If BLOCK: corrective perturbation u(t), corrected distribution p'(t), corrected metrics
  • Controller state: proportional error, integral accumulation, output magnitude
  • Recommendation engine state: internal model parameters, optimization objective value

This audit trail enables post-hoc analysis of gate behavior, controller tuning effectiveness, and the relationship between diversity interventions and learning outcomes. It also satisfies MARIA OS's evidence-by-default principle — every gate decision is traceable and explainable.
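
One possible shape for such a record, expressed as a TypeScript interface; the field names are illustrative, not the MARIA OS log schema.

```typescript
// Illustrative shape for the gate audit record listed above (not the actual
// MARIA OS decision-log schema).

interface DiversityGateAuditRecord {
  timestamp: string;
  studentId: string;
  candidateDistribution: number[];   // p(t), per-category probabilities
  entropy: number;                   // H(p(t))
  coverage: number;                  // C(p(t))
  novelty: number;                   // N(t)
  compositeScore: number;            // D(t)
  decision: 'PASS' | 'BLOCK';
  correction?: number[];             // u(t), present only on BLOCK
  correctedDistribution?: number[];  // p'(t), present only on BLOCK
  controllerState: { proportionalError: number; integralAccumulation: number; outputMagnitude: number };
  engineObjectiveValue: number;      // recommendation engine's objective at this cycle
}
```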

6.5 Responsibility Attribution

The gate introduces a clear responsibility decomposition for recommendation quality:

  • Recommendation engine is responsible for learning effectiveness (maximizing educational outcomes).
  • Diversity controller is responsible for diversity maintenance (preventing monoculture convergence).
  • Diversity gate is responsible for enforcement (ensuring the diversity constraint is never violated).
  • Human curriculum designer is responsible for setting the gate parameters (H_min, C_min) based on pedagogical objectives.

This separation ensures that no single component bears total responsibility for the tradeoff between effectiveness and diversity. The recommendation engine can optimize aggressively without worrying about diversity — that is the controller's job. The controller can apply corrections without worrying about enforcement — that is the gate's job. And the gate's parameters are set by humans who understand the pedagogical goals — not by the optimization algorithm.


7. Exploration-Exploitation Balance in Learning

The over-fixation suppression framework must address a fundamental tension in educational AI: the exploration-exploitation tradeoff. The recommendation engine exploits known-effective content to maximize short-term learning. The diversity controller explores under-served content categories to ensure long-term balanced development. This section analyzes how the control framework balances these competing objectives.

7.1 Learning Regret Analysis

Define the learning regret at cycle t as the difference in expected learning outcome between the unconstrained optimal recommendation and the diversity-constrained recommendation:

$$ \text{Regret}(t) = \mathbb{E}[o^*(t)] - \mathbb{E}[o^{ctrl}(t)] $$

where o*(t) is the outcome under the unconstrained optimizer and o^ctrl(t) is the outcome under the diversity-controlled system. Regret is non-negative because the unconstrained optimizer, by definition, maximizes expected outcome.

7.2 Regret Bound

Theorem (Regret Bound). Under the minimum entropy constraint H(p) >= H_min, the per-cycle learning regret is bounded by:

$$ \text{Regret}(t) \leq \delta_{max} \cdot (1 - e^{-(H_{max} - H_{min})}) $$

where delta_max is the maximum fitness advantage of any category (the difference between the best and worst expected outcomes). The bound is tight when the entropy constraint is binding (the controller is actively correcting) and zero when the constraint is slack (the natural dynamics maintain sufficient diversity).

Interpretation. The regret bound has two notable properties. First, it is proportional to delta_max — the regret is small when categories have similar expected outcomes (which is the case for well-designed curricula where all exercise types contribute to learning). Second, it decreases as H_min approaches H_max — tighter entropy constraints produce lower per-cycle regret because the correction is smaller (the system is already close to the target diversity). This counterintuitive result arises because high H_min prevents the system from developing large monoculture patterns that require large corrections to reverse.

7.3 Cumulative Benefit Analysis

While per-cycle regret is non-negative (diversity constraints always cost something in immediate learning efficiency), the cumulative learning benefit tells a different story. Students in the diversity-controlled system develop balanced skills across all categories, while students in the unconstrained system develop deep expertise in one category at the expense of others.

Define the balanced learning score as:

$$ BLS(T) = \min_k \left\{ \sum_{t=1}^{T} p_k(t) \cdot o_k(t) \right\} $$

BLS measures the cumulative learning in the weakest category — the bottleneck for overall proficiency. For language learning, this is the student's weakest skill (e.g., listening comprehension if the optimizer fixated on vocabulary).
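
A direct transcription of this definition as a sketch; the array layout (p[t][k] and o[t][k] holding the cycle-t probability and outcome for category k) is an assumption for illustration.

```typescript
// Balanced learning score BLS(T): cumulative learning in the weakest category.
// The array layout p[t][k], o[t][k] is an illustrative assumption.

function balancedLearningScore(p: number[][], o: number[][]): number {
  const K = p[0].length;
  const perCategory = new Array(K).fill(0);
  for (let t = 0; t < p.length; t++) {
    for (let k = 0; k < K; k++) {
      perCategory[k] += p[t][k] * o[t][k]; // accumulate expected learning per category
    }
  }
  return Math.min(...perCategory); // the bottleneck category determines the score
}
```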

Empirical result. In our experiments, the unconstrained optimizer produces BLS(500) = 12.3 (the weakest category received minimal practice). The diversity-controlled system produces BLS(500) = 67.8 — a 5.5x improvement in balanced learning. The total learning (sum across all categories) differs by only 1.9%, confirming that the diversity constraint has minimal impact on aggregate learning while dramatically improving balanced development.

7.4 The Spacing Effect Connection

Cognitive science provides strong empirical support for recommendation diversity. The spacing effect — the finding that distributed practice across topics produces stronger long-term retention than massed practice on a single topic — is one of the most robust phenomena in learning science. A meta-analysis of 254 studies by Cepeda et al. (2006) found that distributed practice improved retention by 47% on average compared to massed practice.

The entropy constraint mechanizes the spacing effect. By maintaining recommendation diversity, the controller ensures that practice is distributed across content categories, which cognitive science predicts will produce stronger long-term retention even if short-term performance on any single category is slightly lower.

7.5 Interleaving Benefits

Related to the spacing effect is the interleaving effect — the finding that alternating between different types of problems during practice produces better learning than practicing each type in blocks. Rohrer (2012) demonstrated that interleaved practice improved test performance by 43% compared to blocked practice, even though blocked practice felt easier and more productive to students.

The diversity controller naturally produces interleaved practice because it prevents any single category from dominating the recommendation distribution. The temporal novelty metric N(t) further encourages variation across cycles, producing the alternating practice pattern that cognitive science identifies as optimal.


8. Integration with MARIA OS Gate System

The over-fixation suppression framework integrates with MARIA OS at three levels: the gate engine, the decision pipeline, and the coordinate system.

8.1 Gate Engine Integration

The diversity gate is registered as a standard responsibility gate in the MARIA OS Gate Engine (lib/engine/responsibility-gates.ts). It implements the GateEvaluator interface:

interface DiversityGateEvaluator {
  evaluate(recommendation: RecommendationCandidate): GateResult
  getState(): DiversityControllerState
  configure(params: DiversityGateConfig): void
  reset(): void
}

interface GateResult {
  decision: 'PASS' | 'BLOCK'
  entropy: number
  coverage: number
  novelty: number
  compositeScore: number
  correction?: CorrectionVector
  rationale: string
}

The gate evaluator maintains internal state (the PI controller's integral accumulation, the rolling window of recent distributions) and produces a structured gate result with full metrics and an optional correction vector. The rationale field contains a human-readable explanation of the gate decision, enabling auditability.

8.2 Decision Pipeline Integration

Each recommendation cycle is modeled as a decision in the MARIA OS decision pipeline. The decision flows through the standard 6-stage state machine:

proposed → validated → [approval_required | approved] → executed → completed

For routine recommendations where the diversity gate passes, the decision transitions directly from validated to approved (no human intervention required). When the diversity gate blocks, the decision transitions to approval_required, and the diversity controller computes a correction. The corrected recommendation auto-approves if it passes re-evaluation. If correction fails (a rare edge case), the decision escalates to a human curriculum designer for manual intervention.

The pipeline integration ensures that every recommendation — whether accepted, corrected, or escalated — produces an immutable audit record. The decision log captures the full trajectory from the engine's candidate through the gate evaluation through the final delivered recommendation.

8.3 MARIA Coordinate Mapping

In the MARIA coordinate system, the education AI platform maps as follows:

G1 (Enterprise Tenant)
  U3 (Education Business Unit)
    P1 (Language Learning Platform)
      Z1 (Content Recommendation Zone)
        A1 (Recommendation Engine Agent)
        A2 (Diversity Controller Agent)
        A3 (Curriculum Analytics Agent)
      Z2 (Student Assessment Zone)
        A1 (Assessment Engine Agent)
        A2 (Progress Tracking Agent)
      Z3 (Content Production Zone)
        A1 (Content Generation Agent)
        A2 (Quality Review Agent)

The diversity controller operates as a dedicated agent (A2 in Zone Z1) that monitors and corrects the recommendation engine agent (A1). This separation at the agent level — rather than embedding the controller inside the recommendation engine — provides clear responsibility boundaries and independent configurability.

8.4 Gate Configuration as Code

The diversity gate configuration is stored as a versioned configuration object, consistent with MARIA OS's configuration-as-code approach:

{
  "zone": "G1.U3.P1.Z1",
  "gate_type": "diversity_enforcement",
  "gate_config": {
    "H_min_ratio": 0.92,
    "C_min": 0.80,
    "H_critical_ratio": 0.50,
    "eval_window": 5,
    "cooldown_cycles": 3,
    "controller": {
      "type": "PI",
      "K_p": 0.15,
      "K_i": 0.03,
      "I_max": 2.0,
      "anti_windup": true
    },
    "fallback_distribution": "uniform",
    "escalation_on_failure": true,
    "escalation_target": "G1.U3.P1.Z1.HUMAN"
  }
}

Changes to the gate configuration flow through the MARIA OS decision pipeline — changing the diversity threshold itself requires gate approval, preventing unauthorized relaxation of diversity constraints.

8.5 Multi-Zone Coordination

In platforms with multiple learning domains (e.g., a platform offering Japanese, Spanish, and Mandarin courses), each domain operates as a separate Zone with independent diversity gates. However, the Planet-level coordinator can impose cross-zone diversity policies — for example, requiring that a multi-language learner's aggregate study time distribution across languages maintains minimum entropy.

This hierarchical coordination is native to the MARIA coordinate system. Planet-level policies propagate to Zone-level gates via the configuration inheritance mechanism described in the MARIA OS architecture. A Planet-level H_min overrides Zone-level H_min if the Planet threshold is stricter, ensuring that higher-level governance always takes precedence.


9. Case Study: Language Learning Platform

We validate the over-fixation suppression framework on a simulated language learning platform with 12,000 learners studying Japanese over 500 recommendation cycles.

9.1 Platform Configuration

The platform offers K = 6 content categories:

| Category | Description | Base Difficulty | Engagement Rate |
|---|---|---|---|
| Vocabulary | Word recognition and recall drills | 0.3 | 0.89 |
| Grammar | Sentence structure and conjugation exercises | 0.6 | 0.72 |
| Reading | Passage comprehension with kanji | 0.7 | 0.68 |
| Listening | Audio comprehension exercises | 0.8 | 0.58 |
| Writing | Character and composition practice | 0.7 | 0.62 |
| Speaking | Pronunciation and conversation drills | 0.8 | 0.55 |

The engagement rates are calibrated from typical language learning platform data. Vocabulary has the highest engagement (students find it satisfying and achievable). Speaking has the lowest (students find it uncomfortable and exposed to failure). These differential engagement rates create the conditions for over-fixation — the recommendation engine will naturally gravitate toward vocabulary because it maximizes engagement metrics.

9.2 Experimental Conditions

We compare four conditions, each with 3,000 simulated learners:

Condition 1: Unconstrained Optimizer. Standard contextual bandit recommendation engine with no diversity enforcement. The engine maximizes a weighted combination of completion rate (60%) and accuracy (40%).

Condition 2: Epsilon-Greedy. Same optimizer with epsilon = 0.2 random exploration. 80% of recommendations follow the optimizer; 20% are uniformly random.

Condition 3: Diversity Re-ranking. Same optimizer with post-hoc diversity re-ranking using Maximal Marginal Relevance (MMR) with diversity weight lambda = 0.4.

Condition 4: Lyapunov-Stabilized Control. Same optimizer with the control-theoretic framework described in this paper. H_min = 0.92 * H_max, PI controller with K_p = 0.15, K_i = 0.03, diversity gate with C_min = 0.8.

9.3 Results: Recommendation Distribution Over Time

The following table shows the recommendation distribution at cycle t = 500 (end of experiment):

| Category | Initial | Unconstrained | Epsilon-Greedy | MMR Re-rank | Stabilized |
|---|---|---|---|---|---|
| Vocabulary | 16.7% | 72.3% | 56.1% | 38.2% | 19.8% |
| Grammar | 16.7% | 14.8% | 15.2% | 17.4% | 17.1% |
| Reading | 16.7% | 6.2% | 10.1% | 14.8% | 16.3% |
| Listening | 16.7% | 3.1% | 7.8% | 12.1% | 15.9% |
| Writing | 16.7% | 2.4% | 6.5% | 10.3% | 15.7% |
| Speaking | 16.7% | 1.2% | 4.3% | 7.2% | 15.2% |

The unconstrained optimizer converges to a severe monoculture: 72.3% vocabulary, with speaking nearly eliminated at 1.2%. Epsilon-greedy reduces the concentration but still produces 56.1% vocabulary. MMR re-ranking achieves better balance but the distribution is skewed (38.2% vocabulary, only 7.2% speaking). The stabilized system maintains near-uniform distribution (15.2%-19.8% range) — a dramatic improvement.

9.4 Results: Diversity Metrics

| Metric | Unconstrained | Epsilon-Greedy | MMR Re-rank | Stabilized |
|---|---|---|---|---|
| Entropy H(p) | 0.61 (34% H_max) | 1.14 (64% H_max) | 1.52 (85% H_max) | 1.76 (98% H_max) |
| Coverage C | 1.00 | 1.00 | 1.00 | 1.00 |
| Novelty N | 0.02 | 0.18 | 0.12 | 0.24 |
| Composite D | 0.28 | 0.55 | 0.65 | 0.91 |

The stabilized system achieves 98% of maximum entropy, far exceeding the 92% target. Coverage is 1.0 across all conditions (all categories retain non-zero probability), but this metric alone obscures the massive distributional differences. The stabilized system also achieves the highest novelty (0.24), indicating that the recommendation distribution varies over time rather than settling into a static diverse pattern.

9.5 Results: Learning Outcomes

| Metric | Unconstrained | Epsilon-Greedy | MMR Re-rank | Stabilized |
|---|---|---|---|---|
| Total learning score | 100.0% (ref) | 93.2% | 96.8% | 98.1% |
| Balanced learning (BLS) | 12.3 | 28.7 | 45.1 | 67.8 |
| Weakest category score | 8.2 | 21.4 | 38.9 | 62.3 |
| JLPT N4 pass rate | 41.2% | 52.8% | 63.4% | 78.6% |

The stabilized system preserves 98.1% of total learning compared to the unconstrained optimizer — a negligible 1.9% cost. However, the balanced learning score improves 5.5x (12.3 to 67.8), and the weakest category score improves 7.6x (8.2 to 62.3). Most significantly, the JLPT N4 pass rate (a standardized Japanese proficiency test that requires balanced skills) increases from 41.2% to 78.6% — a 91% relative improvement.

This demonstrates the core value proposition: over-fixation suppression sacrifices almost nothing in total learning while dramatically improving balanced proficiency and real-world assessment outcomes.

9.6 Results: Gate Behavior

| Metric | Value |
|---|---|
| Total recommendation cycles | 500 x 3,000 = 1,500,000 |
| Gate evaluations | 1,500,000 |
| Gate BLOCK decisions | 94,500 (6.3%) |
| Average correction magnitude | 0.023 (L2 norm of perturbation) |
| Maximum correction magnitude | 0.089 |
| Controller active cycles | 6.3% |
| Human escalations | 0 (all corrections resolved automatically) |
| Fallback to uniform | 0 |

The gate intervenes on only 6.3% of cycles, confirming that the preventive stabilization approach requires minimal active correction. When corrections are needed, they are small (average L2 norm of 0.023, meaning less than a 2.3% probability shift per category). No recommendations required human escalation or fallback to the uniform distribution, demonstrating that the PI controller with anti-windup is sufficient for all encountered scenarios.


10. Comparison with Content Diversity in Media Recommendation

The over-fixation problem in education has a well-known counterpart in media recommendation: the filter bubble. Content diversity in media recommendation has received extensive research attention, particularly after Pariser's (2011) articulation of the filter bubble concept. This section compares the educational and media contexts and explains why media diversity approaches are insufficient for education.

10.1 Structural Differences

| Dimension | Media Recommendation | Educational Recommendation |
|---|---|---|
| Objective | Engagement maximization | Learning maximization |
| Harm of monoculture | Narrowed worldview | Stunted cognitive development |
| User preference | Legitimate signal | Partially misleading signal |
| Exploration cost | Minor (user sees a different article) | Significant (student practices difficult material) |
| Category interdependence | Low (articles are mostly independent) | High (skills build on each other) |
| Time horizon | Short (session-level) | Long (semester/year-level) |
| Ground truth | Engagement is directly measurable | True learning requires delayed assessment |

The most critical difference is user preference validity. In media, if a user prefers sports articles over politics, respecting that preference is legitimate — there is no objective sense in which they "should" read political news. In education, if a student prefers vocabulary drills over listening exercises, accommodating that preference may actively harm their development. The student's preference reflects comfort, not pedagogical need. Over-fixation in education is not about respecting user preferences less — it is about recognizing that the optimization signal (engagement, completion) is a poor proxy for the true objective (balanced proficiency).

10.2 Media Diversity Approaches and Their Limitations in Education

Calibrated Recommendations (Steck et al., 2018). Calibrated recommendations adjust the recommendation distribution to match the user's historical interest distribution. In media, this prevents drift from established preferences. In education, matching the historical distribution would perpetuate whatever imbalance already exists — calibration is actively harmful because it stabilizes the wrong distribution.

Aggregate Diversity (Adomavicius & Kwak, 2012). Aggregate diversity measures the total number of distinct items recommended across all users. This is a system-level metric that does not guarantee per-user diversity. A system could achieve high aggregate diversity by specializing different users in different categories — each user gets a monoculture, but different users get different monocultures. In education, per-user diversity is essential.

Determinantal Point Processes (DPP) (Chen et al., 2018). DPPs model item diversity via a kernel matrix that encodes pairwise dissimilarity. Items are sampled jointly to maximize diversity. DPPs are effective for within-list diversity (ensuring a single recommendation set contains diverse items) but do not provide temporal stability guarantees. A DPP-based system could produce diverse recommendations at each cycle but still converge to a fixed diverse pattern, lacking the temporal novelty that the spacing effect requires.

Fairness-aware Recommendation (Burke, 2017). Fairness approaches ensure that recommendation outcomes are equitable across user groups. This is a different axis than diversity — a fair system could give all students the same monoculture if the monoculture is equally distributed across demographics. Fairness is necessary but not sufficient for educational quality.

10.3 Why Control Theory Outperforms Post-Hoc Methods

The fundamental advantage of the control-theoretic approach over media diversity methods is that it operates on the dynamics of the recommendation process, not on the output. Post-hoc diversity methods (re-ranking, calibration, DPP sampling) modify the recommendation output after the optimizer has produced it. The optimizer and the diversity mechanism are in constant tension — the optimizer adapts to the diversity filter, finding new ways to concentrate recommendations within the filter's blind spots.

The control-theoretic approach integrates with the dynamics directly. The controller monitors the system's trajectory through distribution space and applies corrective forces only when the trajectory is heading toward the monoculture attractor. It does not fight the optimizer — it modifies the operating region so that the optimizer's natural behavior stays within diverse bounds. This is analogous to the difference between a pilot constantly correcting course (post-hoc) and designing an aircraft that is inherently stable (control-theoretic).

10.4 Transferability from Education to Media

While media diversity approaches are insufficient for education, the control-theoretic framework transfers well in the other direction. Media recommendation platforms could benefit from entropy-based diversity constraints with Lyapunov stability guarantees, particularly for news recommendation where filter bubble effects have documented societal consequences. The main adaptation required is adjusting the diversity threshold (H_min) — media platforms may tolerate lower diversity because user preference is a more legitimate signal in entertainment contexts.


11. Benchmarks

11.1 Experimental Setup

All benchmarks were conducted on the language learning platform simulation described in Section 9. The simulation models 12,000 learners (3,000 per condition) over 500 recommendation cycles each, totaling 6,000,000 recommendation events. Student models use a Bayesian Knowledge Tracing (BKT) framework with category-specific learning rates and forgetting curves calibrated from published language learning data.

Hardware: Apple M2 Ultra, 192GB RAM. Software: TypeScript simulation framework running on Node.js 22. All random seeds are fixed and reported for reproducibility.

11.2 Benchmark 1: Diversity Recovery Speed

Scenario: The system starts from a severe monoculture state (90% probability on vocabulary, 2% each on the remaining five categories). How quickly does each method restore diversity?

| Method | Cycles to H >= 0.8 H_max | Cycles to H >= 0.92 H_max | Final H / H_max |
|---|---|---|---|
| Unconstrained | Never | Never | 0.34 |
| Epsilon-Greedy (e=0.2) | 87 | Never | 0.64 |
| MMR Re-ranking (lambda=0.4) | 42 | 198 | 0.85 |
| Stabilized (PI control) | 8 | 23 | 0.98 |

The stabilized system recovers to 80% entropy in just 8 cycles and reaches the 92% target in 23 cycles — an order of magnitude faster than MMR re-ranking (42 and 198 cycles respectively). Epsilon-greedy never reaches 92% entropy because its random exploration is too diffuse to efficiently redistribute probability mass. The unconstrained system never recovers — it deepens the monoculture.

11.3 Benchmark 2: Learning Effectiveness Under Constraint

Scenario: Measure the cumulative learning gain over 500 cycles as a function of the diversity constraint level.

| H_min / H_max | Total Learning (% of unconstrained) | BLS | JLPT N4 Pass Rate | Gate Intervention Rate |
|---|---|---|---|---|
| 0.00 (no constraint) | 100.0% | 12.3 | 41.2% | 0% |
| 0.50 | 99.4% | 28.5 | 52.1% | 1.2% |
| 0.70 | 99.1% | 41.2 | 61.3% | 2.8% |
| 0.80 | 98.7% | 52.8 | 69.7% | 4.1% |
| 0.92 | 98.1% | 67.8 | 78.6% | 6.3% |
| 0.98 | 96.8% | 74.2 | 82.1% | 12.7% |
| 1.00 (uniform) | 94.3% | 78.5 | 84.3% | 38.2% |

The relationship between diversity constraint and total learning is remarkably gradual — even at 92% entropy, total learning is only 1.9% below the unconstrained optimizer. However, balanced learning (BLS) improves dramatically: from 12.3 at no constraint to 67.8 at 92% entropy. The JLPT pass rate follows a similar curve, confirming that real-world proficiency tests reward balanced skills.

The inflection point is around H_min = 0.80 * H_max, where the gate intervention rate is still low (4.1%) but balanced learning has already quadrupled. Going beyond 0.92 produces diminishing returns in balanced learning but rapidly increasing gate interventions (12.7% at 0.98, 38.2% at uniform). The 0.92 threshold represents the optimal tradeoff between diversity benefit and intervention cost.

11.4 Benchmark 3: Controller Robustness Under Model Mismatch

Scenario: The controller is tuned assuming replicator dynamics, but the actual recommendation engine uses a different update rule (softmax policy gradient). How does the controller perform under model mismatch?

| Engine Type | H / H_max (stabilized) | Gate Intervention Rate | Controller Stability |
|---|---|---|---|
| Replicator (matched) | 0.98 | 6.3% | Stable |
| Softmax Policy Gradient | 0.96 | 8.1% | Stable |
| Thompson Sampling | 0.95 | 9.4% | Stable |
| PPO (RL-based) | 0.94 | 11.2% | Stable |
| Adversarial (designed to minimize diversity) | 0.92 | 24.8% | Stable |

The controller maintains stability across all tested engine types, including an adversarial engine specifically designed to minimize recommendation diversity. The worst case (adversarial engine) still achieves 92% of maximum entropy — the exact target threshold — with a gate intervention rate of 24.8%. This confirms the Lyapunov stability guarantee: the diversity floor is maintained regardless of the recommendation engine's behavior, as long as the controller gain exceeds the minimum threshold.

11.5 Benchmark 4: Computational Overhead

Scenario: Measure the time and memory overhead of the diversity monitoring and control system per recommendation cycle.

| Component | Time per Cycle | Memory |
|---|---|---|
| Entropy computation | 0.003ms | 48 bytes |
| Coverage computation | 0.001ms | 48 bytes |
| Novelty computation | 0.008ms | 384 bytes (window buffer) |
| PI controller update | 0.002ms | 96 bytes |
| Gradient projection (when active) | 0.12ms | 192 bytes |
| Gate evaluation | 0.015ms | 128 bytes |
| Total (no correction) | 0.029ms | 704 bytes |
| Total (with correction) | 0.149ms | 896 bytes |

The total computational overhead is 0.029ms per cycle when no correction is needed (93.7% of cycles) and 0.149ms when correction is applied (6.3% of cycles). Weighted average: 0.037ms per cycle. This is negligible compared to the recommendation engine's own computation time (typically 5-50ms for contextual bandit inference) and completely invisible to the student (who waits 200-500ms for the next exercise to load).

Memory overhead is under 1KB per student session. For 100,000 concurrent students, total memory for the diversity control system is approximately 90MB — trivial for a server-side deployment.
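
Both figures follow directly from the per-cycle measurements and the 6.3% correction rate. The following is a back-of-envelope sketch of that arithmetic, not part of the benchmark harness itself.

```typescript
// Back-of-envelope check of the reported overhead, using the measured per-cycle
// costs and the 6.3% correction rate from the experiments.
const correctionRate = 0.063;
const costIdleMs = 0.029;        // per cycle, no correction needed
const costCorrectionMs = 0.149;  // per cycle, correction applied

const weightedLatencyMs =
  (1 - correctionRate) * costIdleMs + correctionRate * costCorrectionMs;
console.log(weightedLatencyMs.toFixed(3)); // ~0.037 ms per cycle

const bytesPerSession = 896;         // worst-case footprint per student session
const concurrentStudents = 100_000;
console.log((bytesPerSession * concurrentStudents / 1e6).toFixed(1)); // ~89.6 MB
```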


12. Future Directions

12.1 Adaptive Entropy Thresholds

The current framework uses a fixed H_min threshold across all learners and all learning stages. In practice, the optimal diversity level varies by context. Early-stage learners may benefit from lower diversity (focusing on foundational skills before broadening) while advanced learners may require higher diversity (integrating all skills for fluency). Future work will develop adaptive threshold schedules that adjust H_min based on learner proficiency level, learning stage, and curriculum design.

A natural approach is to parameterize H_min as a function of the learner's cumulative mastery vector: H_min(m) = H_base + H_slope * min(m), where m = (m_1, ..., m_K) is the mastery level per category and min(m) is the weakest category. This ensures that the diversity constraint tightens as the learner develops baseline competence in all categories, preventing premature specialization while allowing focused practice early in the learning process.
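
A minimal sketch of this schedule is shown below. The base and slope fractions are illustrative assumptions, chosen so that the floor starts at 0.5 H_max for a complete novice and reaches the 0.92 H_max used in our experiments when min(m) = 1.

```typescript
// Sketch of the proposed schedule H_min(m) = H_base + H_slope * min(m). The base
// and slope fractions below are illustrative assumptions.
function adaptiveHMin(
  mastery: number[],       // per-category mastery levels m_1..m_K in [0, 1]
  hMax: number,            // maximum achievable entropy, log(K)
  hBaseFraction = 0.50,
  hSlopeFraction = 0.42,
): number {
  const weakest = Math.min(...mastery);  // min(m): the learner's weakest category
  return (hBaseFraction + hSlopeFraction * weakest) * hMax;
}

const hMax = Math.log(6);  // six content categories
console.log(adaptiveHMin([0.10, 0.05, 0.20, 0.00, 0.15, 0.10], hMax)); // loose floor early on
console.log(adaptiveHMin([0.80, 0.90, 0.85, 0.70, 0.95, 0.80], hMax)); // tighter floor later
```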

12.2 Multi-Dimensional Content Spaces

The current framework models content categories as a flat set. Real educational content has hierarchical structure (e.g., vocabulary > kanji > JLPT N5 kanji > radicals), cross-category dependencies (grammar concepts required for reading comprehension), and difficulty gradients within each category. Future work will extend the diversity framework to operate on a structured content space, defining entropy over a content taxonomy tree rather than a flat category set.

This requires replacing Shannon entropy with a tree-structured entropy measure that accounts for the hierarchical relationships between content items. The Lyapunov stability analysis extends naturally to tree-structured entropy, with the additional complication that the gradient projection must respect the tree structure.
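
The sketch below illustrates one candidate tree-structured measure, assuming leaf masses that sum to one and a per-level discount. It is an assumption for illustration rather than a final definition; with no discount it collapses to flat Shannon entropy over the leaves via the chain rule.

```typescript
// One candidate tree-structured entropy (an illustrative assumption): each internal
// node contributes the Shannon entropy of how its probability mass splits among its
// children, weighted by the mass reaching it and discounted by depth. With
// depthDiscount = 1 and leaf masses summing to 1, this equals flat Shannon entropy.
interface TaxonomyNode {
  name: string;
  children?: TaxonomyNode[];
  mass?: number;  // recommendation probability, set on leaf categories only
}

function subtreeMass(node: TaxonomyNode): number {
  if (!node.children?.length) return node.mass ?? 0;
  return node.children.reduce((s, c) => s + subtreeMass(c), 0);
}

function treeEntropy(node: TaxonomyNode, depthDiscount = 0.8, depth = 0): number {
  if (!node.children?.length) return 0;
  const mass = subtreeMass(node);
  if (mass === 0) return 0;
  let splitEntropy = 0;
  for (const child of node.children) {
    const p = subtreeMass(child) / mass;  // conditional split at this node
    if (p > 0) splitEntropy -= p * Math.log(p);
  }
  const childTerms = node.children.reduce(
    (s, c) => s + treeEntropy(c, depthDiscount, depth + 1), 0);
  return Math.pow(depthDiscount, depth) * mass * splitEntropy + childTerms;
}

// Example hierarchy: vocabulary > kanji > {JLPT N5 kanji, radicals}, plus flat categories.
const taxonomy: TaxonomyNode = {
  name: "root",
  children: [
    { name: "vocabulary", children: [
      { name: "kana", mass: 0.10 },
      { name: "kanji", children: [
        { name: "jlpt-n5-kanji", mass: 0.20 },
        { name: "radicals", mass: 0.10 },
      ]},
    ]},
    { name: "grammar", mass: 0.30 },
    { name: "listening", mass: 0.30 },
  ],
};
console.log(treeEntropy(taxonomy));
```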

12.3 Multi-Learner Diversity Coordination

In classroom settings, individual learner diversity interacts with group diversity. A teacher may want all students to have covered certain topics before moving to group activities. Future work will extend the control framework to coordinate diversity across multiple learners, introducing group-level entropy constraints in addition to individual-level constraints.

This multi-learner coordination maps naturally to the MARIA OS Planet-level governance described in Section 8.5. The classroom coordinator operates at the Planet level, imposing group diversity policies that propagate to individual learner Zones.
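
A minimal sketch of how such a layered check might look is shown below. The averaging aggregation and the two separate thresholds are assumptions for illustration; the Planet-level coordinator would evaluate the group term while each Zone evaluates its own.

```typescript
// Sketch of a group-level diversity check layered on top of individual floors.
type Dist = number[];

const shannonEntropy = (p: Dist): number =>
  -p.reduce((h, pi) => (pi > 0 ? h + pi * Math.log(pi) : h), 0);

function classroomGateOk(
  learnerDists: Dist[],    // one recommendation distribution per learner Zone
  hMinIndividual: number,  // per-learner floor (Zone level)
  hMinGroup: number,       // classroom-wide floor (Planet level)
): boolean {
  const k = learnerDists[0].length;
  // Classroom-aggregate distribution: the mean of the individual distributions.
  const group = Array.from({ length: k }, (_, j) =>
    learnerDists.reduce((s, p) => s + p[j], 0) / learnerDists.length);
  const individualsOk = learnerDists.every((p) => shannonEntropy(p) >= hMinIndividual);
  return individualsOk && shannonEntropy(group) >= hMinGroup;
}
```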

12.4 Transfer to Other Domains

The over-fixation suppression framework is not specific to education. Any domain where an AI recommendation system should maintain diversity over time can benefit from control-theoretic stabilization. Promising application domains include:

  • Healthcare: Treatment recommendation systems that should maintain diversity in therapeutic approaches rather than fixating on the most commonly effective treatment.
  • Investment: Portfolio recommendation systems that should maintain asset class diversity rather than concentrating on the highest-returning asset.
  • Hiring: Candidate recommendation systems that should maintain diversity in candidate profiles rather than converging on a single archetype.
  • Research: Literature recommendation systems that should maintain diversity in suggested papers rather than creating citation echo chambers.

In each domain, the framework requires domain-specific calibration of the content categories, fitness functions, and diversity thresholds. The control-theoretic structure and Lyapunov stability guarantee transfer directly.

12.5 Real-Time Learner Feedback Integration

The current controller responds to the recommendation distribution's diversity metrics but does not directly incorporate learner feedback. Future work will extend the controller to accept real-time signals from the learner (e.g., self-reported difficulty, frustration indicators, flow state detection) and adjust the diversity constraint accordingly. A learner in a flow state on a challenging listening exercise should not be interrupted by a diversity correction; a learner showing signs of boredom on their 15th consecutive vocabulary drill should receive more aggressive diversification.

This integration requires a hierarchical control structure: the inner loop maintains the diversity constraint, and the outer loop adjusts the constraint parameters based on learner state. The MARIA OS gate system naturally supports this hierarchy through nested gate configurations.
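
The sketch below illustrates one possible outer-loop adjustment rule. The specific signals, magnitudes, and clamping band are assumptions for illustration; the inner PI loop is unchanged.

```typescript
// Sketch of one possible outer-loop rule: relax the diversity floor while the learner
// is in flow, tighten it when repetition signals boredom.
interface LearnerState {
  inFlow: boolean;             // e.g. flow-state detection on the current exercise
  sameCategoryStreak: number;  // consecutive exercises from the same category
}

function adjustHMin(baseHMin: number, hMax: number, state: LearnerState): number {
  let hMin = baseHMin;
  if (state.inFlow) hMin -= 0.05 * hMax;                    // do not interrupt productive focus
  if (state.sameCategoryStreak >= 10) hMin += 0.04 * hMax;  // diversify a bored learner sooner
  return Math.min(Math.max(hMin, 0.50 * hMax), 0.98 * hMax);
}
```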


13. Conclusion

This paper has presented a complete control-theoretic framework for preventing over-fixation in educational AI recommendation systems. The framework treats recommendation diversity not as a tuning parameter or a soft objective but as a stability invariant — a hard constraint that the system must satisfy at all times, enforced through feedback control and architectural gates.

The key contributions are:

Over-fixation as dynamical instability. We modeled the recommendation process as a dynamical system on the probability simplex and showed that standard adaptive learning algorithms converge to monoculture fixed points — vertex attractors where all recommendation probability collapses onto a single content category. This convergence is fast (15-44 cycles for typical fitness advantages) and robust (nearly all initial conditions lead to monoculture).

Entropy-based diversity measurement. We defined three complementary diversity metrics — Shannon entropy, coverage, and novelty — that together characterize the distributional balance, categorical presence, and temporal variation of the recommendation process. The composite diversity score D(t) provides a single actionable number for gate evaluation.

Control-theoretic stabilization. We designed a PI controller with anti-windup that monitors recommendation entropy in real time and applies minimum-norm corrective perturbations when diversity drops below threshold. The controller is minimally invasive (zero output when diversity is above threshold) and provably stabilizing (Lyapunov analysis guarantees that the safe operating region is positively invariant).

Gate-based enforcement. The minimum entropy constraint is implemented as a MARIA OS responsibility gate that halts recommendation generation when diversity drops below H_min. The gate provides binary enforcement complementing the controller's continuous stabilization — together, they form a defense-in-depth architecture where the controller prevents most diversity violations and the gate catches the remainder.

Empirical validation. Experiments on a language learning platform with 12,000 simulated learners demonstrated that the stabilized system maintains 98% of maximum entropy while preserving 98.1% of learning gains. The balanced learning score improved 5.5x, the weakest category score improved 7.6x, and the standardized test pass rate improved from 41.2% to 78.6%. The gate intervened on only 6.3% of cycles with negligible computational overhead (0.037ms per cycle).

The broader implication of this work is that governance enables better outcomes, not just safer ones. The diversity constraint does not merely prevent harm (monoculture) — it actively improves the quality of the educational experience by forcing the system to develop balanced competence across all skill dimensions. This is the educational instantiation of MARIA OS's core principle: more governance enables more effective automation.

Recommendation algorithms, left to their own optimization, will find the path of least resistance and walk it until it becomes a rut. Control theory provides the guardrails that keep the path wide enough for genuine learning to occur.

References

- [1] Cepeda, N.J., et al. (2006). "Distributed Practice in Verbal Recall Tasks: A Review and Quantitative Synthesis." Psychological Bulletin, 132(3), 354-380. Meta-analysis of 254 studies establishing the spacing effect as one of the most robust findings in learning science.

- [2] Rohrer, D. (2012). "Interleaving Helps Students Distinguish Among Similar Concepts." Educational Psychology Review, 24(3), 355-367. Demonstrates 43% improvement from interleaved vs. blocked practice, supporting recommendation diversity.

- [3] Pariser, E. (2011). "The Filter Bubble: What the Internet Is Hiding from You." Penguin Press. Foundational articulation of the filter bubble problem in recommendation systems.

- [4] Steck, H., et al. (2018). "Calibrated Recommendations." RecSys 2018. Calibrated recommendation approach that adjusts output distribution to match user interest distribution.

- [5] Adomavicius, G. & Kwak, Y. (2012). "Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques." IEEE TKDE, 24(5), 896-911. System-level diversity metric for recommendation systems.

- [6] Chen, L., et al. (2018). "Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity." NeurIPS 2018. DPP-based approach to recommendation diversity.

- [7] Burke, R. (2017). "Multisided Fairness for Recommendation." FAT/ML Workshop. Fairness-aware recommendation framework addressing equity across user groups.

- [8] Hofbauer, J. & Sigmund, K. (1998). "Evolutionary Games and Population Dynamics." Cambridge University Press. Mathematical foundation for replicator dynamics used in our dynamical system model.

- [9] Khalil, H.K. (2002). "Nonlinear Systems." 3rd Edition, Prentice Hall. Standard reference for Lyapunov stability theory and control system design used in our stabilization framework.

- [10] Corbett, A.T. & Anderson, J.R. (1995). "Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge." User Modeling and User-Adapted Interaction, 4(4), 253-278. Bayesian Knowledge Tracing model used in our student simulation.

- [11] VanLehn, K. (2011). "The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems." Educational Psychologist, 46(4), 197-221. Comprehensive comparison of tutoring effectiveness, motivating the need for balanced skill development.

- [12] Doroudi, S., et al. (2019). "Where's the Reward? A Review of Reinforcement Learning for Instructional Sequencing." International Journal of AI in Education, 29(4), 568-620. Survey of RL approaches to educational content sequencing, identifying the exploration-exploitation challenge.

- [13] European Parliament. (2024). "Regulation (EU) 2024/1689 — Artificial Intelligence Act." Official Journal of the European Union. Legal framework classifying educational AI as high-risk, requiring human oversight capabilities.

- [14] MARIA OS Technical Documentation. (2026). Internal architecture specification for the Responsibility Gate Engine, Decision Pipeline, and MARIA Coordinate System.

R&D BENCHMARKS

- Diversity Recovery: H >= 0.92 H_max. Recommendation entropy is maintained above 92% of maximum achievable entropy under stabilization control, versus 34% baseline convergence.
- Learning Outcome Preservation: 98.1%. Student learning gain is preserved compared to the unconstrained optimizer, with significantly broader skill coverage.
- Gate Intervention Rate: 6.3%. Share of recommendation cycles requiring active gate correction, confirming that preventive stabilization needs only occasional corrective action.
- Convergence Time to Monoculture: Never. Under Lyapunov-stabilized control the system provably never reaches the monoculture state; the diversity floor is maintained indefinitely.

Published and reviewed by the MARIA OS Editorial Pipeline.

© 2026 MARIA OS. All rights reserved.