1. The Expert Knowledge Fragility Problem
1.1 The Scale of the Challenge
The insurance industry employs approximately 2.8 million people in the United States alone, with underwriting representing the core value-creation function. A senior commercial property underwriter does not simply apply a checklist. They integrate building construction data, loss history, geographic hazard exposure, occupancy characteristics, fire protection systems, management quality signals, and market conditions into a holistic risk assessment that determines whether to accept or decline a risk and at what price. This integration involves hundreds of implicit rules, many of which the underwriter cannot articulate explicitly — they are embedded in pattern recognition developed over decades of exposure to thousands of submissions.
When organizations deploy AI underwriting agents, they typically follow one of two paths. The first path is supervised learning: train a model on historical underwriting decisions, using the expert's accept/decline outcomes and pricing as labels. The second path is rule extraction: interview experts, codify their decision rules, and implement them as a deterministic rule engine that the AI agent consults. Both paths suffer from the same fundamental problem: they capture the expert's outputs without preserving the expert's reasoning structure.
Consider a concrete example. A senior underwriter evaluating a large warehouse may decline the risk based on a complex interaction between three factors: the building's fire protection rating is marginal (but acceptable in isolation), the occupancy involves high-value inventory with limited salvage potential (concerning but not disqualifying alone), and the loss history shows a pattern of small attritional claims that suggests inadequate maintenance (a weak signal individually). None of these factors would trigger a decline on its own. The underwriter's expertise lies in recognizing the interaction — when these three marginal signals co-occur, they indicate a risk profile that historically leads to large severity losses within 3–5 years.
A supervised learning model trained on outcomes will learn to decline this risk if the training data contains enough examples. But it may learn the wrong interaction — associating decline with any two of the three factors, or with a different combination entirely. A rule-based system will capture the three-factor interaction if the expert articulates it, but experts rarely articulate their most sophisticated reasoning, precisely because it operates below the threshold of conscious analysis.
1.2 The Consequences of Logic Loss
When expert logic is lost during the transfer to an AI agent, three categories of failure emerge:
Silent degradation. The agent makes decisions that are statistically similar to the expert's on the current book of business but fail to generalize to novel submissions. The agent has learned the expert's distribution but not their logic. When market conditions shift, new building types emerge, or regulatory requirements change, the agent's decisions diverge from what the expert would have decided. This divergence is silent because the historical test set cannot detect it — it only becomes visible when losses materialize years later.
Boundary erosion. Expert underwriters maintain sharp decision boundaries that reflect regulatory requirements, reinsurance treaty conditions, or hard lessons from past catastrophic losses. These boundaries are non-negotiable: a building within 500 feet of a flood zone is treated fundamentally differently from one at 501 feet, not because the physical risk changes at that threshold but because the reinsurance treaty requires it. Machine learning models, by their nature, smooth these boundaries into gradients. The agent might accept a risk at 498 feet with slightly elevated pricing, where the expert would have flatly declined. The pricing difference does not compensate for the treaty violation.
Interaction collapse. Expert reasoning involves high-order interactions between features — the warehouse example above is a three-way interaction. Standard machine learning architectures struggle with high-order interactions unless they are explicitly engineered into the feature space. Deep neural networks can theoretically represent any interaction but provide no mechanism for verifying that they have learned the correct ones. The result is that agents systematically under-represent the complex conditional logic that distinguishes expert judgment from actuarial tables.
1.3 Why Accuracy Is Not Enough
The conventional response to these concerns is to measure prediction accuracy: if the agent agrees with the expert on 95% of submissions, it is performing well. This metric is dangerously misleading for three reasons.
First, underwriting is a class-imbalanced problem. If an expert accepts 85% of submissions, an agent that accepts everything achieves 85% accuracy. The 15% of declinations — which represent the expert's most critical risk judgments — are entirely missing.
Second, accuracy is measured on the observed distribution. The submissions the agent will face tomorrow may differ from those in the training set, and it is logic preservation, not accuracy on the historical distribution, that supports generalization to that shift.
Third, two agents can have identical accuracy but very different reasoning. Agent A might correctly decline high-fire-risk properties and incorrectly accept high-flood-risk properties, while Agent B does the reverse. Both achieve the same accuracy, but their failure modes are completely different. An organization that deploys Agent A in a flood-prone region will suffer losses that Agent B's logic would have prevented.
For these reasons, we argue that the correct measure of knowledge transfer quality is not output accuracy but logic preservation — the degree to which the agent's internal reasoning structure faithfully represents the expert's decision process. This is the problem that the Inheritance Function formalization addresses.
2. Expert Logic Formal Representation
2.1 The Decision Predicate Decomposition
To formally verify logic preservation, we must first represent expert reasoning in a form amenable to structural comparison. We introduce the Decision Predicate Decomposition (DPD), which breaks an expert's decision function into atomic logical units.
Definition 1 (Decision Predicate). A decision predicate p is a Boolean-valued function over the underwriting feature space X:

p : X → {0, 1}
Each predicate represents a single evaluable condition in the expert's reasoning. Examples include:
- p_1(x) = 1 if building_age(x) > 40 years
- p_2(x) = 1 if fire_protection_class(x) >= 7
- p_3(x) = 1 if loss_ratio_5yr(x) > 0.65
- p_4(x) = 1 if distance_to_coast(x) < 1 mile AND construction_type(x) = "frame"
Note that predicates can involve single features (p_1, p_2, p_3) or feature interactions (p_4). The set of all predicates used by an expert constitutes their predicate vocabulary.
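The following Python sketch shows one way to represent predicates and a predicate vocabulary as executable objects; the feature names, predicate IDs, and dictionary-based submission format are illustrative assumptions, not part of the framework specification.

```python
# Illustrative sketch of decision predicates and a predicate vocabulary.
from dataclasses import dataclass
from typing import Callable, Dict, List

Submission = Dict[str, object]

@dataclass(frozen=True)
class Predicate:
    pid: str
    description: str
    fn: Callable[[Submission], bool]

    def __call__(self, x: Submission) -> bool:
        return bool(self.fn(x))

VOCABULARY: List[Predicate] = [
    Predicate("P_1", "building_age > 40", lambda x: x["building_age"] > 40),
    Predicate("P_2", "fire_protection_class >= 7", lambda x: x["fire_protection_class"] >= 7),
    Predicate("P_3", "loss_ratio_5yr > 0.65", lambda x: x["loss_ratio_5yr"] > 0.65),
    Predicate("P_4", "distance_to_coast < 1 mi AND frame construction",
              lambda x: x["distance_to_coast_mi"] < 1.0 and x["construction_type"] == "frame"),
]

x = {"building_age": 52, "fire_protection_class": 8, "loss_ratio_5yr": 0.40,
     "distance_to_coast_mi": 3.2, "construction_type": "masonry"}
print([p.pid for p in VOCABULARY if p(x)])  # -> ['P_1', 'P_2']
```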
Definition 2 (Predicate Vocabulary). The predicate vocabulary V_expert of an expert underwriter is the finite set of decision predicates that fully characterizes their decision function:

V_expert = {p_1, p_2, ..., p_K}
where K is the number of atomic decision conditions the expert employs. Empirically, senior underwriters in commercial property operate with K in the range of 80–200 predicates, though they may only be able to articulate 30–50% of them explicitly.
2.2 The Expert Decision Function
Given a predicate vocabulary, the expert's decision function is a Boolean combination of predicates:

f_expert(x) = Φ(p_1(x), p_2(x), ..., p_K(x))
where Φ is a Boolean function over K binary inputs. In the simplest case, Φ is a conjunction (accept if all conditions are met) or a disjunction (decline if any red flag is present). In practice, expert reasoning involves complex nested conditionals, for example:

Accept(x) = (p_new_building ∧ p_well_protected) ∨ (p_clean_loss_history ∧ ¬p_coastal_exposure ∧ p_low_hazard_occupancy) ∨ (p_large_account ∧ p_sophisticated_mgmt ∧ p_adequate_fire_protection ∧ ¬p_coverage_restrictions)
This expression reads: accept the risk if (the building is new AND well-protected), OR (the loss history is clean AND there is no coastal exposure AND the occupancy is low-hazard), OR (the account is large AND the management is sophisticated AND the fire protection is adequate AND there are no coverage restrictions). The expert may not express their reasoning in this formal notation, but their decisions can be decomposed into this form through structured elicitation and decision tree analysis.
2.3 Extracting the Predicate Vocabulary
The predicate vocabulary is extracted through three complementary methods:
Method 1: Structured Expert Elicitation. Domain experts are interviewed using the Critical Decision Method (CDM), a cognitive task analysis protocol that elicits decision rules by walking through specific past cases. The expert narrates their reasoning on each case, and the interviewer identifies decision predicates from the narrative. This method captures approximately 40–60% of the predicate vocabulary — the predicates the expert can consciously articulate.
Method 2: Decision Tree Induction. A decision tree is trained on the expert's historical decisions using the same features available to the expert. The tree's split conditions correspond directly to decision predicates. The tree is constrained to a depth and complexity that matches the expected cognitive capacity of a human underwriter (maximum depth 8–12, maximum 50–150 leaves), preventing the algorithm from learning overly granular predicates that do not reflect human reasoning. This method captures predicates that the expert uses implicitly — patterns they follow consistently but cannot articulate.
Method 3: Counterfactual Probing. The expert is presented with pairs of submissions that differ on a single feature dimension, and their accept/decline decision is recorded. Changes in the decision identify the features that serve as decision predicates. This method is particularly effective for identifying boundary conditions and interaction effects that neither verbal elicitation nor tree induction capture reliably.
The union of predicates from all three methods forms the complete predicate vocabulary. Each predicate is annotated with its source method (elicited, induced, or probed), its estimated importance (based on how frequently it influences decisions), and its confidence level (based on consistency across extraction methods).
2.4 Handling Continuous Decisions
Underwriting is not purely binary (accept/decline). Experts also set pricing, terms, and conditions. We extend the predicate framework to continuous decisions by decomposing the pricing function into a base rate plus predicate-conditioned adjustments:

Price(x) = BaseRate(x) + Σ_k δ_k · p_k(x) + Σ_{j<k} δ_jk · p_j(x) · p_k(x) + ε
where δ_k is the pricing adjustment associated with predicate p_k (e.g., +15% for fire protection class >= 7), δ_jk captures interaction adjustments (e.g., +25% when both coastal exposure AND frame construction are present), and ε represents residual variation not captured by the predicate model.
This linear-in-predicates representation is not restrictive — it can approximate any nonlinear pricing function to arbitrary accuracy by including sufficiently many interaction predicates. The key advantage is that each term has a direct interpretation as an expert judgment about a specific risk factor or factor combination, enabling verification of logic preservation at the individual term level.
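A minimal sketch of this decomposition follows. The additive combination of fractional adjustments on top of the base rate, and the example adjustment values, are illustrative assumptions; the text specifies only that pricing decomposes into a base rate plus predicate-conditioned terms.

```python
# Sketch of the base-rate-plus-adjustments pricing decomposition.
from typing import Dict, Set, Tuple

def predicate_price(base_rate: float,
                    active: Set[str],
                    delta: Dict[str, float],
                    delta_interaction: Dict[Tuple[str, str], float]) -> float:
    """active: IDs of predicates that evaluate to 1 on this submission.
    delta: per-predicate adjustments, expressed as fractions of the base rate.
    delta_interaction: extra adjustments applied when both predicates in a pair are active."""
    adjustment = sum(delta.get(pid, 0.0) for pid in active)
    adjustment += sum(d for (j, k), d in delta_interaction.items()
                      if j in active and k in active)
    return base_rate * (1.0 + adjustment)

# e.g. +15% for fire protection class >= 7, and +25% when coastal exposure
# and frame construction co-occur
print(predicate_price(10_000.0, {"P_2", "P_4", "P_coastal"},
                      {"P_2": 0.15}, {("P_coastal", "P_4"): 0.25}))  # 14000.0
```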
3. Inheritance Function Definition: I(f_expert, f_agent)
3.1 Motivation and Design Principles
The Inheritance Function I(f_expert, f_agent) is a scalar measure in [0, 1] that quantifies how faithfully an AI agent's decision function preserves the logical structure of a human expert's decision function. The function is designed according to four principles:
Principle 1: Structural, not statistical. I measures logic preservation, not output agreement. Two functions with 95% output agreement but fundamentally different reasoning structures should score low on I. Conversely, two functions with 90% output agreement but identical reasoning structure (differing only due to calibration) should score high.
Principle 2: Decomposable. I should decompose into interpretable sub-scores corresponding to specific aspects of logic preservation, enabling targeted diagnosis when the overall score is low.
Principle 3: Sensitive to high-stakes logic. Preservation of decision logic for high-risk, high-impact submissions should contribute more to I than preservation of logic for routine submissions. An agent that faithfully handles catastrophic risk assessment but slightly misprices routine accounts is better than one that prices routine accounts perfectly but mishandles catastrophic risks.
Principle 4: Monotonic in governance. Increasing the governance intensity (more responsibility gates, more human oversight) should weakly increase I, because governance mechanisms catch and correct logic divergence.
3.2 Formal Definition
Definition 3 (Inheritance Function). Let f_expert be the expert decision function with predicate vocabulary V_expert = {p_1, ..., p_K}, and let f_agent be the agent decision function. The Inheritance Function is:

I(f_expert, f_agent) = Σ_{k=1}^{K} w_k · σ_k(f_expert, f_agent)
where:
- w_k is the importance weight of predicate p_k, satisfying Σ_k w_k = 1. Weights are derived from the predicate's impact on the expert's decision function: predicates that influence more decisions or that govern higher-risk submissions receive higher weights.
- σ_k(f_expert, f_agent) is the preservation score for predicate p_k, measuring how faithfully the agent's decision function respects the logical role of p_k in the expert's reasoning.
The preservation score σ_k is itself a composite:

σ_k(f_expert, f_agent) = α_1 · Activation(p_k) + α_2 · Boundary(p_k) + α_3 · Interaction(p_k)
where α_1 + α_2 + α_3 = 1, and:
- Activation(p_k) measures whether the agent's decision function responds to the activation of predicate p_k. If the expert treats fire protection class >= 7 as a risk factor, does the agent's output change when this condition becomes true?
- Boundary(p_k) measures whether the agent preserves the expert's decision boundary associated with p_k. If the expert uses a threshold of 40 years for building age, does the agent's decision surface exhibit a corresponding discontinuity near 40?
- Interaction(p_k) measures whether the agent preserves the interaction structure involving p_k. If the expert treats fire protection and building age as interacting factors (old + poorly protected = decline), does the agent exhibit the same interaction pattern?
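A minimal sketch of Definition 3 follows, combining the three sub-scores into per-predicate preservation scores and then into the overall Inheritance Function. The alpha split (0.4/0.3/0.3) is an illustrative choice, not a value given in the text; the only stated constraint is that the alphas sum to 1.

```python
# Sketch of I = sum_k w_k * sigma_k with sigma_k = a1*Activation + a2*Boundary + a3*Interaction.
from typing import Dict, Tuple

def inheritance_function(w: Dict[str, float],
                         activation: Dict[str, float],
                         boundary: Dict[str, float],
                         interaction: Dict[str, float],
                         alphas: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    a1, a2, a3 = alphas
    assert abs(sum(w.values()) - 1.0) < 1e-9, "importance weights must sum to 1"
    score = 0.0
    for pid, w_k in w.items():
        sigma_k = a1 * activation[pid] + a2 * boundary[pid] + a3 * interaction[pid]
        score += w_k * sigma_k
    return score
```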
3.3 Computing the Activation Score
The activation score measures the sensitivity of the agent's decision function to predicate p_k. We define it using a functional derivative approach:

Activation(p_k) = 1 - |S_agent,k - S_expert,k| / max(|S_expert,k|, |S_agent,k|),  where S_f,k = E_x[∂f(x)/∂p_k]

For binary decisions, the partial derivative is interpreted as the change in decision probability when p_k switches from 0 to 1, holding all other predicates fixed:

∂f(x)/∂p_k = P(f = decline | x, p_k = 1) - P(f = decline | x, p_k = 0)
The activation score is 1.0 when the agent's sensitivity to predicate p_k exactly matches the expert's across the input distribution. It decreases when the agent either over-responds or under-responds to the predicate relative to the expert. A score of 0.0 means the agent completely ignores a predicate that the expert considers important, or vice versa.
Worked Example. Suppose predicate p_k = "building age > 40 years" causes the expert to increase decline probability by 0.15 on average (averaged over the distribution of other features). If the agent increases decline probability by 0.12 when p_k activates, the activation score is:

Activation(p_k) = 1 - |0.12 - 0.15| / 0.15 = 0.80
The agent preserves 80% of the expert's sensitivity to building age, which is a moderately good score indicating that the agent recognizes building age as a factor but under-weights it compared to the expert.
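The sketch below estimates the activation score by averaging the counterfactual effect of flipping p_k for both decision functions. The helper `prob_decline(f, x, pk)` is an assumed callable returning the decline probability of decision function f on submission x with the predicate forced to the given value, all other predicates held fixed.

```python
# Sketch of the activation score via counterfactual predicate flips.
from typing import Callable, Sequence

def activation_score(samples: Sequence,
                     f_expert: Callable, f_agent: Callable,
                     prob_decline: Callable) -> float:
    def mean_effect(f):
        effects = [prob_decline(f, x, pk=1) - prob_decline(f, x, pk=0) for x in samples]
        return sum(effects) / len(effects)

    d_expert, d_agent = mean_effect(f_expert), mean_effect(f_agent)
    denom = max(abs(d_expert), abs(d_agent))
    if denom == 0.0:
        return 1.0  # neither function responds to this predicate
    return 1.0 - abs(d_agent - d_expert) / denom

# Worked example above: expert +0.15, agent +0.12 -> 1 - 0.03/0.15 = 0.80
```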
3.4 Computing the Boundary Score
The boundary score measures whether the agent preserves the expert's decision thresholds. For each predicate p_k that involves a threshold condition (e.g., building_age > 40), we compare the expert's and agent's decision surfaces near the threshold:

Boundary(p_k) = 1 - |τ_expert,k - τ_agent,k| / range_k
where τ_expert,k is the expert's threshold for predicate p_k, τ_agent,k is the agent's effective threshold (identified by finding the inflection point in the agent's decision surface along the feature associated with p_k), and range_k is the natural range of the feature.
For categorical predicates (e.g., construction_type = "frame"), the boundary score reduces to a binary match: 1.0 if the agent treats the same category as the expert, 0.0 otherwise.
Worked Example. The expert uses a threshold of 500 feet for "distance to fire station." The agent's decision surface shows an inflection point at 480 feet. The feature range is 0–10,000 feet.

Boundary(p_k) = 1 - |500 - 480| / 10,000 = 0.998

The boundary is preserved with high fidelity. However, if the agent's inflection point were at 750 feet:

Boundary(p_k) = 1 - |500 - 750| / 10,000 = 0.975
Still acceptable, but the 250-foot shift means the agent is applying a materially different distance criterion than the expert, which could affect decisions for properties in the 500–750 foot range.
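A one-function sketch of the boundary score, reproducing the two worked examples above:

```python
# Boundary score for a threshold predicate: 1 - |tau_expert - tau_agent| / range, floored at 0.
def boundary_score(tau_expert: float, tau_agent: float, feature_range: float) -> float:
    return max(0.0, 1.0 - abs(tau_expert - tau_agent) / feature_range)

print(boundary_score(500, 480, 10_000))  # 0.998
print(boundary_score(500, 750, 10_000))  # 0.975
```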
3.5 Computing the Interaction Score
The interaction score is the most computationally expensive but also the most important component, because interaction effects are where expert judgment most commonly degrades during knowledge transfer.
For each predicate p_k, we identify the set of predicates it interacts with in the expert's decision function. Two predicates p_j and p_k interact if their joint effect on the decision differs from the sum of their individual effects:

IE_{j,k}(f) = E[f | p_j = 1, p_k = 1] - E[f | p_j = 1, p_k = 0] - E[f | p_j = 0, p_k = 1] + E[f | p_j = 0, p_k = 0]

This is the standard interaction contrast from factorial experimental design. The interaction score for predicate p_k is:

Interaction(p_k) = 1 - (1 / |Partners(k)|) Σ_{j ∈ Partners(k)} |IE^expert_{j,k} - IE^agent_{j,k}| / |IE^expert_{j,k}|
where Partners(k) is the set of predicates that interact with p_k in the expert's function, IE^expert_{j,k} is the expert's interaction effect between p_j and p_k, and IE^agent_{j,k} is the agent's corresponding interaction effect.
The score is 1.0 when the agent perfectly reproduces all of the expert's interaction patterns involving p_k. It decreases when the agent fails to capture interactions (treating factors as independent when the expert treats them as dependent) or introduces spurious interactions (treating factors as dependent when the expert treats them as independent).
Worked Example. The expert's decision function exhibits a strong interaction between fire protection class and building age: old buildings with poor fire protection are declined at a rate 30 percentage points higher than what would be predicted from the individual effects of age and fire protection. The agent's function shows this interaction at 22 percentage points. The interaction score for the fire protection predicate (considering only the age interaction) is:

Interaction(p_k) = 1 - |0.30 - 0.22| / 0.30 = 0.733
The agent preserves 73.3% of this critical interaction effect. This is a concerning score — the agent underestimates the combined risk of old, poorly-protected buildings, which is precisely the type of logic loss that leads to adverse selection.
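The sketch below computes the 2x2 interaction contrast and the per-predicate interaction score. The helper `mean_decision(f, pj, pk)` is an assumed callable returning E[f | p_j = pj, p_k = pk]; partner IDs and values are illustrative.

```python
# Sketch of the interaction contrast and the per-predicate interaction score.
from typing import Callable, Dict

def interaction_effect(mean_decision: Callable, f) -> float:
    return (mean_decision(f, 1, 1) - mean_decision(f, 1, 0)
            - mean_decision(f, 0, 1) + mean_decision(f, 0, 0))

def interaction_score(ie_expert: Dict[str, float], ie_agent: Dict[str, float]) -> float:
    """ie_expert / ie_agent: partner predicate id -> interaction effect with p_k."""
    terms = [max(0.0, 1.0 - abs(ie_expert[j] - ie_agent.get(j, 0.0)) / abs(ie_expert[j]))
             for j in ie_expert]
    return sum(terms) / len(terms)

# Worked example above: expert IE = 0.30, agent IE = 0.22 -> ~0.733
print(round(interaction_score({"building_age": 0.30}, {"building_age": 0.22}), 3))  # 0.733
```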
3.6 Importance Weighting
The importance weight w_k for each predicate is computed from three factors:
- Frequency_k is the fraction of submissions where predicate p_k influences the expert's decision. Predicates that matter for more submissions receive higher weight.
- Impact_k is the average magnitude of the predicate's effect on the decision. Predicates that cause large swings (e.g., switching from accept to decline) receive higher weight than those causing minor pricing adjustments.
- Irreversibility_k captures the difficulty of correcting a decision error related to this predicate. Declining a good risk is reversible (the broker can re-submit); accepting a catastrophic risk is not (losses materialize over years).
The weights are normalized so that Σ_k w_k = 1. This normalization ensures that I(f_expert, f_agent) is bounded in [0, 1].
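A minimal sketch of the weighting step follows. The multiplicative combination of the three factors is an assumption made for illustration; the text specifies only the three factors and the normalization to sum 1.

```python
# Sketch of importance weighting from frequency, impact, and irreversibility.
from typing import Dict, Tuple

def importance_weights(factors: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """factors: predicate id -> (frequency, impact, irreversibility), each scaled to [0, 1]."""
    raw = {pid: freq * impact * irrev for pid, (freq, impact, irrev) in factors.items()}
    total = sum(raw.values())
    return {pid: v / total for pid, v in raw.items()}  # normalized so weights sum to 1
```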
4. Decision Tree Equivalence Testing
4.1 Canonical Decision Tree Representation
While the Inheritance Function operates at the predicate level, we also need a holistic structural comparison between the expert's and agent's decision logic. Decision trees provide a natural canonical form for this comparison because they make the logical structure explicit: each internal node is a predicate test, each branch is a predicate outcome, and each leaf is a decision.
Definition 4 (Canonical Decision Tree). Given a decision function f over predicate vocabulary V, the canonical decision tree T(f) is the minimal-depth decision tree that exactly represents f, with predicates ordered by information gain at each level.
The canonical representation resolves the ambiguity that arises when the same logical function can be represented by multiple structurally different trees. By fixing the predicate ordering (information gain), we ensure that logically equivalent functions produce identical canonical trees, and structurally different functions produce trees whose differences reflect genuine logic differences.
4.2 Tree Edit Distance
We measure structural equivalence using the normalized tree edit distance (TED). The tree edit distance between two trees T_1 and T_2 is the minimum cost of transforming T_1 into T_2 using three operations:
- Insert node: Add a new internal node (predicate test) or leaf. Cost: c_ins.
- Delete node: Remove an internal node or leaf, connecting its children to its parent. Cost: c_del.
- Relabel node: Change the predicate at an internal node or the decision at a leaf. Cost: c_rel.
The normalized TED is:

NTED(T_1, T_2) = TED(T_1, T_2) / (|T_1| + |T_2|)

where |T| denotes the number of nodes in tree T. The NTED ranges from 0 (identical trees) to 1 (completely different trees). Our structural equivalence score is:

SE(f_expert, f_agent) = 1 - NTED(T(f_expert), T(f_agent))
4.3 Operation Cost Calibration
The edit operation costs are not uniform. We calibrate them to reflect the underwriting significance of each operation type:
- c_ins = 1.0 (inserting a node that the expert does not use suggests the agent has introduced logic not present in the expert's reasoning — this may be beneficial or harmful, and requires investigation).
- c_del = 1.5 (deleting a node that the expert uses means the agent has lost a piece of expert logic — this is more concerning than insertion because it represents knowledge loss).
- c_rel(p_j, p_k) = d(p_j, p_k) (relabeling cost depends on the semantic distance between the original and replacement predicates. Replacing "building age > 40" with "building age > 45" has low cost because the predicates are semantically similar. Replacing "building age > 40" with "occupancy type = manufacturing" has high cost because the predicates are semantically unrelated).
The semantic distance d(p_j, p_k) is computed from the feature overlap and threshold similarity of the two predicates. Predicates involving the same feature with similar thresholds have low distance; predicates involving different features have distance approaching 1.0.
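The sketch below illustrates one way to implement the semantic distance d(p_j, p_k) behind the relabel cost c_rel, assuming predicates are encoded as small dictionaries with feature, threshold, and feature-range fields; the encoding is illustrative.

```python
# Sketch of the semantic relabel cost: same feature with nearby thresholds is
# cheap; different features are maximally distant.
from typing import Dict

def relabel_cost(p_j: Dict, p_k: Dict) -> float:
    if p_j["feature"] != p_k["feature"]:
        return 1.0
    if "threshold" in p_j and "threshold" in p_k:
        return min(1.0, abs(p_j["threshold"] - p_k["threshold"]) / p_j["feature_range"])
    return 0.0 if p_j.get("category") == p_k.get("category") else 1.0

print(relabel_cost({"feature": "building_age", "threshold": 40, "feature_range": 100},
                   {"feature": "building_age", "threshold": 45, "feature_range": 100}))  # 0.05
```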
4.4 Subtree Preservation Analysis
Beyond the overall TED score, we perform subtree preservation analysis to identify which logical substructures are preserved and which are lost. A subtree rooted at an internal node n in the expert's tree corresponds to a logical subfunction — the reasoning that occurs when execution reaches node n. We compute the preservation score for each subtree:

Preserve(n) = max_m [ 1 - NTED(T_n, T_m) ]
where the maximum is taken over all nodes m in the agent's tree. A subtree preservation score close to 1.0 indicates that the agent's tree contains a substructure that closely matches the expert's reasoning at node n. A score close to 0.0 indicates that this reasoning substructure has been lost entirely.
This analysis produces a subtree preservation map that highlights exactly which parts of the expert's reasoning are well-preserved and which have degraded. The map is presented to underwriting managers as a visual overlay on the expert's decision tree, with green (preserved), yellow (partially preserved), and red (lost) coloring.
4.5 Handling Ensemble and Neural Agent Architectures
Modern AI agents may not use decision trees internally. They may use gradient-boosted ensembles, neural networks, or hybrid architectures. To apply tree equivalence testing, we extract a decision tree surrogate from the agent's decision function using the following procedure:
1. Generate a large sample of synthetic submissions spanning the feature space.
2. Query the agent's decision function on each synthetic submission.
3. Train a decision tree on the (submission, decision) pairs, constrained to the same depth and complexity as the expert's canonical tree.
4. Use this surrogate tree as T(f_agent) for the equivalence comparison.
The surrogate tree approximation introduces some error, which we bound using a fidelity metric: the fraction of the original sample on which the surrogate tree agrees with the full agent. Typical fidelity values for well-constructed surrogates exceed 0.95, meaning the surrogate reproduces at least 95% of the agent's decisions on the evaluation sample.
When the fidelity is below 0.90, we flag the result and supplement the tree-based analysis with the predicate-level Inheritance Function, which does not require tree extraction and operates directly on the agent's decision surface.
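The sketch below implements the four-step surrogate extraction and the fidelity check using scikit-learn. `agent_predict` and `sample_feature_space` are assumed callables, and the depth and leaf limits mirror the complexity constraints described for the expert's canonical tree.

```python
# Sketch of surrogate-tree extraction and the fidelity metric.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_surrogate(agent_predict, sample_feature_space,
                      n_samples: int = 50_000,
                      max_depth: int = 10, max_leaf_nodes: int = 120):
    X = sample_feature_space(n_samples)          # step 1: synthetic submissions
    y = agent_predict(X)                         # step 2: query the agent
    surrogate = DecisionTreeClassifier(max_depth=max_depth,
                                       max_leaf_nodes=max_leaf_nodes,
                                       random_state=0)
    surrogate.fit(X, y)                          # step 3: constrained surrogate tree
    fidelity = float(np.mean(surrogate.predict(X) == y))
    return surrogate, fidelity                   # step 4: use as T(f_agent) if fidelity is high
```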
5. Logic Preservation Metrics
5.1 The Six Metrics Framework
We define six metrics that together provide a comprehensive assessment of logic preservation. These metrics are designed to be independently measurable, diagnostically actionable, and collectively exhaustive of the ways expert logic can degrade during transfer.
5.2 Metric 1: Predicate Coverage (PC)
Predicate coverage measures the fraction of the expert's predicate vocabulary that is represented in the agent's decision function:

PC = |{p_k ∈ V_expert : Activation(p_k) ≥ θ_act}| / K
where θ_act is the minimum activation threshold for a predicate to be considered "represented" (default: 0.05). A predicate with activation score below θ_act is effectively absent from the agent's reasoning.
Interpretation. PC = 1.0 means the agent uses every decision factor that the expert uses. PC < 1.0 identifies specific factors that the agent ignores. In practice, PC values above 0.90 are considered acceptable for deployment, with the missing predicates flagged for monitoring.
Typical finding: Agent models commonly miss predicates related to management quality signals (which are difficult to encode numerically), geographic nuances (microclimate effects, local ordinance variations), and temporal patterns (seasonal risk variations, loss development trends). These are precisely the factors where expert judgment adds the most value over actuarial tables.
5.3 Metric 2: Boundary Fidelity (BF)
Boundary fidelity measures how accurately the agent reproduces the expert's decision thresholds:

BF = (1 / K_thresh) Σ_{k : p_k has a threshold} Boundary(p_k)
where K_thresh is the number of predicates with identifiable thresholds, and Boundary(p_k) is the boundary score defined in Section 3.4.
Interpretation. BF = 1.0 means every threshold is exactly preserved. In practice, BF values above 0.95 are considered excellent. Values below 0.90 indicate systematic threshold drift that warrants investigation.
Critical case: Regulatory boundaries must have BF = 1.0. If a regulatory threshold (e.g., maximum insurable flood zone distance) has BF < 1.0, the agent is making decisions that may violate regulatory requirements, regardless of the overall accuracy of the model. This is enforced as a hard constraint in the MARIA OS implementation.
5.4 Metric 3: Interaction Preservation (IP)
Interaction preservation measures how faithfully the agent reproduces the expert's factor interaction patterns:

IP = Σ_{(j,k) ∈ E_expert} w_jk · InteractionMatch(j, k),  with Σ_{(j,k) ∈ E_expert} w_jk = 1
where E_expert is the set of factor pairs that interact in the expert's function, w_jk is the importance weight of interaction (j, k), and InteractionMatch(j, k) = 1 - |IE^expert_{j,k} - IE^agent_{j,k}| / |IE^expert_{j,k}|.
Interpretation. IP = 1.0 means every expert interaction is perfectly preserved. IP is typically the lowest-scoring metric because interaction effects are the hardest to preserve during knowledge transfer. Values above 0.80 are considered good; values below 0.70 indicate significant logic degradation in the agent's handling of complex multi-factor risks.
5.5 Metric 4: Monotonicity Consistency (MC)
Expert underwriting logic respects monotonicity constraints: worse values of a risk factor should always lead to worse (or at least not better) outcomes, holding all else equal. For example, increasing the building age should never decrease the risk premium.
MC = |{p_k ∈ V_monotone : MonotonicityPreserved(p_k)}| / |V_monotone|

where V_monotone is the subset of predicates with known monotonicity relationships, and MonotonicityPreserved(p_k) checks whether the agent's decision function maintains the same monotonic relationship as the expert's.
Interpretation. MC < 1.0 indicates that the agent has learned non-monotonic relationships where monotonicity is expected. This is a strong signal of logic corruption — it means the agent sometimes makes a risk look better when a factor gets worse, which is actuarially incoherent. In our experience, MC violations most commonly occur in neural network agents that have overfit to noise in the training data.
5.6 Metric 5: Exception Handling Completeness (EHC)
Expert underwriters maintain a set of exception rules — hard overrides that apply regardless of the general decision logic. Examples include: "never insure amusement parks," "always refer accounts above $50M TIV to senior underwriter," "decline any risk with more than 3 arson-related claims in the past 10 years."
Interpretation. EHC must be 1.0 for deployment. Exception rules represent the hardest constraints in underwriting — they encode lessons learned from catastrophic losses, regulatory mandates, or reinsurance treaty restrictions. An agent that misses even one exception rule can expose the organization to unacceptable risk. In MARIA OS, exceptions are implemented as pre-execution responsibility gates that fire before the agent's general decision logic.
5.7 Metric 6: Confidence Calibration (CC)
Beyond the binary or continuous decision, we measure whether the agent's confidence in its decisions is calibrated against the expert's. When the expert is highly confident in a decision (e.g., a clearly good risk), the agent should also be confident. When the expert is uncertain (a borderline submission that requires careful deliberation), the agent should reflect that uncertainty.
CC = 1 - ECE

where ECE is the Expected Calibration Error, computed by binning submissions by agent confidence and measuring the average absolute difference between agent confidence and expert confidence in each bin.
Interpretation. CC = 1.0 means perfectly calibrated confidence. CC < 0.80 indicates the agent is systematically overconfident or underconfident relative to the expert. Overconfident agents are particularly dangerous because they fail to trigger uncertainty-based escalation gates, meaning borderline decisions bypass human review.
5.8 Composite Logic Preservation Score
The six metrics are combined into a composite Logic Preservation Score (LPS):

LPS = β_PC · PC + β_BF · BF + β_IP · IP + β_MC · MC + β_EHC · EHC + β_CC · CC
with β weights reflecting organizational priorities. A default weighting that reflects insurance industry norms is:
| Metric | Weight | Rationale |
|---|---|---|
| PC (Predicate Coverage) | 0.15 | Foundation — can the agent see all factors? |
| BF (Boundary Fidelity) | 0.20 | Critical for regulatory compliance |
| IP (Interaction Preservation) | 0.25 | Highest value-add of expert judgment |
| MC (Monotonicity Consistency) | 0.15 | Actuarial soundness requirement |
| EHC (Exception Handling) | 0.15 | Hard safety constraint |
| CC (Confidence Calibration) | 0.10 | Enables appropriate escalation |
The LPS has a minimum deployment threshold: LPS >= 0.85 is required for autonomous operation, LPS in [0.70, 0.85) requires human-on-the-loop supervision, and LPS < 0.70 requires full human-in-the-loop review of every decision.
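The sketch below combines the six metrics with the default beta weights from the table and applies the deployment thresholds from the text; the metric values in the example call are placeholders, not case-study results.

```python
# Sketch of the composite LPS and the deployment-mode thresholds.
DEFAULT_BETAS = {"PC": 0.15, "BF": 0.20, "IP": 0.25, "MC": 0.15, "EHC": 0.15, "CC": 0.10}

def logic_preservation_score(metrics: dict, betas: dict = DEFAULT_BETAS) -> float:
    assert abs(sum(betas.values()) - 1.0) < 1e-9
    return sum(betas[m] * metrics[m] for m in betas)

def operating_mode(lps: float) -> str:
    if lps >= 0.85:
        return "autonomous"
    if lps >= 0.70:
        return "human_on_the_loop"
    return "human_in_the_loop"

metrics = {"PC": 0.95, "BF": 0.97, "IP": 0.82, "MC": 1.00, "EHC": 1.00, "CC": 0.90}
lps = logic_preservation_score(metrics)
print(round(lps, 3), operating_mode(lps))  # prints the composite score and operating mode
```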
6. Responsibility Chain Verification
6.1 The Provenance Problem
When an AI agent declines a commercial property submission, the broker or insured may ask: "Why was this declined?" In a human-underwritten system, the answer is straightforward — the underwriter explains their reasoning. In an AI-underwritten system, the answer must be formally traceable to the expert logic that the agent inherited. This traceability requirement creates a provenance problem: for every agent decision, we must be able to identify which expert rules contributed to the decision and verify that those rules were correctly applied.
6.2 The Responsibility Provenance Graph
Definition 5 (Responsibility Provenance Graph). The responsibility provenance graph G = (N, E) is a directed acyclic graph where:
- N is the set of nodes, partitioned into three types: Expert Rule Nodes (R), Agent Decision Nodes (D), and Evidence Nodes (Ev).
- E is the set of directed edges representing derivation relationships.
Edge types include:
- (R_i, D_j): Agent decision D_j derives from expert rule R_i. This edge exists when predicate p_k (corresponding to rule R_i) has a non-zero activation in the agent's decision on submission j.
- (D_j, Ev_m): Agent decision D_j is supported by evidence Ev_m (the data elements that were evaluated).
- (R_i, R_l): Expert rule R_i depends on expert rule R_l (rule chaining, where the output of one rule feeds into another).
The provenance graph enables two critical capabilities:
Forward traceability: Given an expert rule R_i, find all agent decisions that invoked this rule. This answers: "How is the expert's knowledge being used?"
Backward traceability: Given an agent decision D_j, find all expert rules that contributed to it. This answers: "Why did the agent make this decision, and whose expertise does it reflect?"
6.3 Chain Completeness Verification
A responsibility chain is complete when every agent decision can be traced back to at least one expert rule, and every expert rule in the chain can be verified against the original predicate vocabulary.
Definition 6 (Chain Completeness). The responsibility chain is complete for decision D_j if:

∃ R_i ∈ R such that (R_i, D_j) ∈ E   (the decision derives from at least one expert rule)

and

∀ R_i ∈ Ancestors(D_j): the predicate corresponding to R_i belongs to V_expert   (every rule in the chain can be verified against the original predicate vocabulary)
Chain completeness is a binary property: either every decision is traceable or it is not. In our framework, chain completeness must be 100% for deployment. An agent decision that cannot be traced to any expert rule is an orphan decision — it represents logic that the agent has generated independently of the expert's knowledge, which may be correct but cannot be verified through the inheritance framework.
Orphan decisions are not necessarily wrong. The agent may have discovered valid patterns that the expert did not articulate. However, orphan decisions require separate validation through a different mechanism (statistical back-testing, actuarial review, or human expert evaluation). The responsibility chain framework identifies these decisions so they can be routed to the appropriate validation pathway.
6.4 Chain Strength Measurement
Beyond binary completeness, we measure the strength of each responsibility chain — how directly and clearly the agent decision derives from expert rules:

ChainStrength(D_j) = Σ_{R_i ∈ Ancestors(D_j)} w_i · σ_i / Σ_{R_i ∈ Ancestors(D_j)} w_i
where Ancestors(D_j) is the set of all expert rule nodes reachable from D_j by backward traversal, w_i is the importance weight of rule R_i, and σ_i is the preservation score for the predicate corresponding to R_i.
ChainStrength ranges from 0 to 1. A value close to 1.0 means the agent's decision is strongly rooted in well-preserved expert logic. A value close to 0.0 means the decision is nominally traceable to expert rules but those rules are poorly preserved in the agent's function.
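A minimal sketch of the chain strength computation as an importance-weighted average over the rules reachable from a decision by backward traversal; the argument names and data structures are illustrative.

```python
# Sketch of ChainStrength over the ancestors of a decision node.
from typing import Dict, Iterable

def chain_strength(ancestors: Iterable[str],
                   w: Dict[str, float],
                   sigma: Dict[str, float]) -> float:
    """ancestors: expert rule IDs reachable from decision D_j in the provenance graph."""
    rules = list(ancestors)
    if not rules:
        return 0.0  # broken chain: no traceable expert origin
    num = sum(w[r] * sigma[r] for r in rules)
    den = sum(w[r] for r in rules)
    return num / den
```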
The ChainStrength metric distinguishes between two types of governance failure:
- Broken chain (ChainStrength = 0): The agent made a decision with no traceable expert origin. This is caught by chain completeness checking.
- Weak chain (ChainStrength < threshold): The agent's decision is nominally derived from expert rules, but those rules have degraded so much that the derivation is meaningless. This is the more insidious failure mode — the system appears compliant (every decision has a provenance trace) but the traces point to corrupted logic.
6.5 Temporal Provenance
Responsibility chains evolve over time as the expert's knowledge is updated, the agent's model is retrained, and market conditions change. We maintain temporal provenance by versioning both the predicate vocabulary and the agent's decision function.
Each version of the provenance graph is immutable and permanently stored. When a regulatory audit requires understanding why a decision was made six months ago, the system retrieves the provenance graph from that time period, including the predicate vocabulary version, the agent model version, and the specific expert rules that were active.
This temporal provenance is critical for insurance regulatory compliance. State insurance departments may examine underwriting decisions months or years after they were made. The organization must be able to demonstrate that each decision was made according to filed rates and rules, and the provenance graph provides this demonstration.
7. Drift Detection: When the Agent Diverges from Expert Logic
7.1 The Drift Problem
Even if an agent starts with high logic preservation, its decision behavior may drift over time. Drift occurs for several reasons:
- Model retraining. When the agent's model is retrained on new data, the retrained model may not preserve the same logical structure as the original. New data patterns may overwrite learned expert logic.
- Feature distribution shift. As the mix of submitted risks changes (e.g., more coastal properties due to market expansion), the agent's decisions may behave differently in the new distribution, even though the model itself has not changed.
- Expert knowledge evolution. The originating expert (or their successors) may update their judgment based on new loss experience, regulatory changes, or market conditions. The agent's inherited logic becomes stale.
- Adversarial adaptation. Brokers may learn the agent's decision patterns and selectively submit risks that exploit weaknesses in the agent's logic (a form of adverse selection against the agent).
7.2 The Drift Detection Framework
We define a continuous monitoring system that computes the Inheritance Function I(f_expert, f_agent) at regular intervals and triggers alerts when the score degrades:
Definition 7 (Drift Score). The drift score at time t is:

ΔI(t) = I(f_expert, f_agent,0) - I(f_expert, f_agent,t)
where f_agent,0 is the agent's initial decision function (at deployment) and f_agent,t is the current decision function. Positive drift indicates logic degradation (the agent is becoming less faithful to the expert). Negative drift indicates logic improvement (the agent is becoming more faithful, possibly due to targeted corrections).
The drift score is computed using a sliding window of recent decisions. We maintain a reference dataset D_ref of submissions that were evaluated at deployment time (with known expert decisions), and periodically re-evaluate the agent on D_ref to detect changes in the agent's decision function.
7.3 Statistical Drift Detection
Raw drift scores fluctuate due to sampling noise. We use a sequential hypothesis testing framework to distinguish genuine drift from random variation:
Null hypothesis H_0: The agent's logic preservation has not changed since deployment. The observed drift ΔI(t) is due to sampling noise.
Alternative hypothesis H_1: The agent's logic preservation has degraded. The observed drift is statistically significant.
We apply the Page-Hinkley test, a sequential change detection algorithm that monitors the cumulative sum of drift scores and triggers an alert when the sum exceeds a threshold:

m_t = Σ_{s=1}^{t} (ΔI(s) - δ),   M_t = min_{s ≤ t} m_s,   alert when m_t - M_t > λ
where δ is a tolerance parameter (the minimum drift we consider meaningful, default: 0.02), and λ is the detection threshold (controlling the tradeoff between sensitivity and false alarm rate, default: 5.0).
The Page-Hinkley test has the advantage of detecting drift as soon as statistically possible, without waiting for a fixed evaluation window. In our experiments, it detects a 5-point drop in the Inheritance Function within 48 hours at a false alarm rate of less than 1 per quarter.
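The sketch below implements a simplified Page-Hinkley detector over the stream of drift scores, using the standard cumulative-sum formulation; the defaults match the parameters given in the text (δ = 0.02, λ = 5.0).

```python
# Simplified Page-Hinkley detector over the Delta_I(t) stream.
class PageHinkleyDetector:
    def __init__(self, delta: float = 0.02, lam: float = 5.0):
        self.delta = delta
        self.lam = lam
        self.cum = 0.0       # cumulative sum of (drift - delta)
        self.cum_min = 0.0   # running minimum of the cumulative sum

    def update(self, drift: float) -> bool:
        """Feed one Delta_I observation; return True when drift is signalled."""
        self.cum += drift - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.lam
```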
7.4 Drift Response Protocol
When drift is detected, the system executes a graduated response:
Level 1 (Yellow Alert, ΔI < 0.05). The drift is logged and the underwriting manager is notified. The agent continues to operate autonomously, but its decisions are tagged for retrospective review. The predicate-level decomposition identifies which specific predicates have drifted, enabling targeted investigation.
Level 2 (Orange Alert, 0.05 ≤ ΔI < 0.10). The agent's autonomy is reduced. All decisions above a configurable impact threshold require human approval before execution. The MARIA OS responsibility gate framework enforces this constraint automatically by elevating the risk tier of all decisions from this agent.
Level 3 (Red Alert, ΔI ≥ 0.10). The agent is suspended from autonomous operation. All decisions require human review. A root cause analysis is initiated to determine whether the drift is due to model degradation, distribution shift, or expert knowledge evolution. Depending on the root cause, the remediation may involve model retraining, predicate vocabulary update, or expert re-elicitation.
The graduated response ensures that minor drift does not disrupt operations, while significant drift triggers immediate human oversight. The thresholds (0.05, 0.10) are configurable per product line and can be set more aggressively for high-risk portfolios.
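The graduated response can be expressed as a small mapping from the detected drift magnitude to an alert level; the sketch below assumes drift has already been signalled by the detector and uses the default thresholds from the text.

```python
# Sketch of the graduated drift response (applies once drift is signalled).
def drift_response(delta_i: float, yellow: float = 0.05, red: float = 0.10) -> str:
    if delta_i >= red:
        return "red: suspend autonomy, human review of all decisions, root cause analysis"
    if delta_i >= yellow:
        return "orange: reduce autonomy, human approval above impact threshold"
    if delta_i > 0.0:
        return "yellow: log, notify underwriting manager, tag decisions for review"
    return "stable: no action"
```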
7.5 Drift Root Cause Attribution
When drift is detected, identifying the root cause is essential for selecting the correct remediation. We decompose the drift into three components:
- Model drift (ΔI_model): Measured by evaluating the current agent model on the original reference dataset D_ref. If the agent's decisions on D_ref have changed, the model itself has drifted (due to retraining, gradient decay, or other model-internal changes).
- Distribution drift (ΔI_distribution): Measured by comparing the feature distribution of recent submissions to D_ref. If the distribution has shifted, the agent may behave differently on the new data even if the model is unchanged.
- Expert drift (ΔI_expert): Measured by re-evaluating the expert on a subset of recent submissions. If the expert's own decisions have changed (reflecting updated judgment), the inherited logic may be stale rather than corrupted.
This decomposition enables targeted remediation: model drift requires retraining, distribution drift requires monitoring (and possibly portfolio adjustment), and expert drift requires re-elicitation of the predicate vocabulary.
8. Integration with MARIA OS Responsibility Gates
8.1 Architecture Overview
The underwriting responsibility inheritance framework integrates with MARIA OS through three architectural touchpoints: the decision pipeline, the responsibility gate system, and the governance dashboard.
Decision Pipeline Integration. Each underwriting decision passes through the MARIA OS 6-stage pipeline: proposed → validated → [approval_required | approved] → executed → [completed | failed]. The inheritance framework extends the "validated" stage by computing the chain strength for the proposed decision and comparing it against the deployment threshold. If ChainStrength < threshold, the decision transitions to "approval_required" instead of "approved," routing it to a human underwriter for review.
Responsibility Gate Integration. The six logic preservation metrics (PC, BF, IP, MC, EHC, CC) are mapped to MARIA OS responsibility gate conditions. Each metric has a configurable threshold, and violation of any threshold triggers the corresponding gate:
| Metric | Gate Trigger | Gate Action |
|---|---|---|
| PC < 0.90 | Missing predicate gate | Log + tag for review |
| BF < 0.95 | Boundary violation gate | Escalate to senior underwriter |
| IP < 0.70 | Interaction gap gate | Escalate + require evidence bundle |
| MC < 1.00 | Monotonicity violation gate | Hard block — must be corrected |
| EHC < 1.00 | Exception miss gate | Hard block — route to manual processing |
| CC < 0.80 | Calibration drift gate | Reduce autonomy level |
The gate actions are cumulative: a decision that triggers both the boundary violation gate and the interaction gap gate receives both escalation and evidence bundle requirements.
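The metric-to-gate mapping from the table above can be held as a configuration object evaluated on every decision; the data structure, gate identifiers, and action labels in this sketch are illustrative rather than the MARIA OS configuration schema.

```python
# Sketch of the metric-to-gate mapping as configuration.
GATE_CONFIG = {
    "PC":  {"threshold": 0.90, "gate": "missing_predicate_gate",      "action": "log_and_tag_for_review"},
    "BF":  {"threshold": 0.95, "gate": "boundary_violation_gate",     "action": "escalate_senior_underwriter"},
    "IP":  {"threshold": 0.70, "gate": "interaction_gap_gate",        "action": "escalate_with_evidence_bundle"},
    "MC":  {"threshold": 1.00, "gate": "monotonicity_violation_gate", "action": "hard_block"},
    "EHC": {"threshold": 1.00, "gate": "exception_miss_gate",         "action": "hard_block_manual_processing"},
    "CC":  {"threshold": 0.80, "gate": "calibration_drift_gate",      "action": "reduce_autonomy_level"},
}

def triggered_gates(metrics: dict) -> list:
    """Return the gates (with their cumulative actions) triggered by current metric values."""
    return [(cfg["gate"], cfg["action"])
            for m, cfg in GATE_CONFIG.items() if metrics[m] < cfg["threshold"]]
```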
8.2 Coordinate System Mapping
The MARIA OS coordinate system (G.U.P.Z.A: Galaxy, Universe, Planet, Zone, Agent) maps naturally to the insurance organizational structure, and the inheritance framework inherits risk policies from each level of the hierarchy:
- Galaxy level sets the minimum LPS for any AI agent operating in the enterprise.
- Universe level sets domain-specific thresholds (commercial lines may require higher IP than personal lines).
- Planet level defines the product-specific predicate vocabulary and exception rules.
- Zone level configures the drift detection parameters for the specific book of business.
- Agent level stores the agent's current inheritance score and provenance graph.
8.3 Evidence Bundle Extension
For underwriting decisions that trigger responsibility gates, the MARIA OS evidence bundle is extended with inheritance-specific fields:
```json
{
"decision": {
"submission_id": "SUB-2026-00847",
"recommendation": "decline",
"confidence": 0.89
},
"inheritance": {
"chain_strength": 0.82,
"activated_predicates": [
{
"predicate_id": "P_047",
"description": "Building age > 40 years",
"source_expert": "J. Morrison (Senior UW, retired 2025)",
"preservation_score": 0.91
},
{
"predicate_id": "P_112",
"description": "Fire protection class >= 7 AND building age > 40",
"source_expert": "J. Morrison (Senior UW, retired 2025)",
"preservation_score": 0.73,
"flag": "interaction_degradation"
}
],
"logic_preservation_score": 0.87,
"drift_status": "stable",
"last_drift_check": "2026-02-12T08:00:00Z"
},
"audit": {
"agent_coordinate": "G1.U1.P1.Z1.A1",
"decision_pipeline_id": "DEC-2026-04291",
"gate_triggered": "interaction_gap_gate",
"escalated_to": "G1.U1.P1.Z1.HUMAN_UW_03"
}
}
```

This evidence bundle provides the human reviewer with complete context: which expert rules drove the decision, how well those rules are preserved, and where the agent's logic may have degraded. The reviewer can make an informed decision about whether to override the agent, confirm the agent's recommendation, or flag the case for model improvement.
8.4 Audit Trail and Regulatory Compliance
Every computation of the Inheritance Function, every drift detection result, and every provenance chain is logged as an immutable audit record in the MARIA OS decision log. This creates a comprehensive audit trail that satisfies insurance regulatory requirements:
- NAIC Model Audit Rule compliance: Every AI-assisted underwriting decision is traceable to the filed rates and rules through the provenance graph.
- State examination readiness: The inheritance framework produces documentation showing that the AI agent's logic is formally verified against expert judgment, with continuous monitoring for drift.
- Reinsurance treaty compliance: Boundary fidelity verification ensures that the agent respects reinsurance treaty conditions (coverage limits, excluded perils, territorial restrictions).
- Fair lending / unfair discrimination monitoring: The predicate vocabulary analysis can identify whether the agent has learned proxy features that correlate with protected characteristics, enabling proactive bias detection.
9. Case Study: Commercial Property Underwriting
9.1 Setup
We applied the inheritance framework to the commercial property underwriting division of a mid-size specialty carrier. The division processes approximately 12,000 submissions annually, managed by a team of 6 underwriters with an average tenure of 18 years. The carrier was deploying an AI underwriting agent to handle the initial triage and recommendation for submissions under $10M Total Insured Value (TIV), representing approximately 65% of their submission volume.
Expert selection. The most senior underwriter (28 years of experience, lead underwriter for the large commercial property segment) served as the primary expert for predicate vocabulary extraction. Two additional senior underwriters participated in the validation process.
Predicate extraction results. The structured elicitation process (16 hours of CDM interviews across 40 historical cases) identified 68 explicit predicates. Decision tree induction on 5 years of historical decisions (7,200 data points) identified an additional 47 implicit predicates. Counterfactual probing (200 synthetic submission pairs) confirmed 38 of the implicit predicates and identified 12 new boundary conditions. After deduplication and consolidation, the final predicate vocabulary contained K = 127 predicates.
Agent architecture. The AI agent uses a gradient-boosted tree ensemble (XGBoost) trained on the same 5-year historical dataset, with 43 input features corresponding to the standard ACORD application fields plus proprietary risk scoring features. The agent produces a three-way recommendation (accept, refer to senior underwriter, decline) with a confidence score.
9.2 Inheritance Function Results
The initial Inheritance Function computation produced I(f_expert, f_agent) = 0.947, decomposed as follows:
| Predicate Group | Count | Avg Preservation | Min Preservation |
|---|---|---|---|
| Construction quality | 18 | 0.96 | 0.88 |
| Fire protection | 15 | 0.97 | 0.91 |
| Occupancy hazard | 22 | 0.95 | 0.79 |
| Loss history | 14 | 0.93 | 0.71 |
| Geographic exposure | 19 | 0.94 | 0.82 |
| Coverage structure | 12 | 0.98 | 0.94 |
| Management quality | 8 | 0.78 | 0.52 |
| Market conditions | 6 | 0.91 | 0.83 |
| Interactions | 13 | 0.88 | 0.61 |
The management quality predicates scored lowest, which was expected — these predicates rely on qualitative signals (responsiveness of the insured, maintenance program documentation, claims management history) that are difficult to encode numerically. The interaction predicates also scored below average, confirming the general finding that multi-factor expert logic is hardest to preserve.
9.3 Logic Preservation Metrics
| Metric | Score | Threshold | Status |
|---|---|---|---|
| Predicate Coverage (PC) | 0.953 (121/127) | 0.90 | Pass |
| Boundary Fidelity (BF) | 0.978 | 0.95 | Pass |
| Interaction Preservation (IP) | 0.841 | 0.70 | Pass (marginal) |
| Monotonicity Consistency (MC) | 0.985 | 1.00 | Fail |
| Exception Handling (EHC) | 1.000 (34/34) | 1.00 | Pass |
| Confidence Calibration (CC) | 0.912 | 0.80 | Pass |
| **Composite LPS** | **0.944** | **0.85** | **Pass** |
Notable findings:
Six missed predicates. The agent failed to capture 6 of 127 predicates (PC = 0.953). Four of the six were management quality signals, and two were geographic microclimate effects (proximity to specific industrial facilities). These predicates were flagged for monitoring, and a supplementary rule engine was added to handle them deterministically.
Monotonicity violation. The agent exhibited a non-monotonic relationship between building age and risk for a narrow age range (35–42 years). Investigation revealed that the training data contained a cohort of recently renovated buildings in this age range with unusually favorable loss history, causing the model to associate this age range with lower risk. This was corrected by adding a monotonicity constraint to the model training.
Interaction preservation. The lowest individual interaction score (0.61) was between loss frequency and loss severity patterns. The expert treats high-frequency/low-severity losses differently from low-frequency/high-severity losses (attritional vs. shock loss profiles), and the agent partially collapsed this distinction. A targeted feature engineering effort improved the interaction score to 0.78 in a subsequent model version.
9.4 Decision Tree Equivalence
The canonical decision tree comparison yielded a structural equivalence score of 0.972 (NTED = 0.028). The subtree preservation analysis identified three subtrees with preservation scores below 0.80:
1. Water damage sublimits subtree (0.74): The expert's logic for evaluating water damage sublimits involves a four-way interaction between building age, plumbing condition, claim history, and deductible structure. The agent's surrogate tree collapsed this into a two-way interaction, losing the plumbing condition and deductible structure factors.
2. Protective safeguards subtree (0.77): The expert evaluates fire protection systems, burglar alarm systems, and sprinkler types as an interrelated cluster. The agent treated them more independently, missing the joint effect of inadequate sprinklers combined with no burglar alarm in high-value-content buildings.
3. Management quality subtree (0.68): Consistent with the predicate-level analysis, the agent's representation of management quality signals was the weakest structural element.
These three subtrees accounted for 83% of the total tree edit distance; the rest of the expert's decision tree was preserved with high fidelity.
9.5 Responsibility Chain Analysis
Chain completeness was verified at 100%: every agent decision on the 1,200-submission test set was traceable to at least one expert rule in the provenance graph. The distribution of chain strength was:
- ChainStrength > 0.90: 78% of decisions (strong expert derivation)
- ChainStrength 0.70–0.90: 17% of decisions (moderate derivation, acceptable)
- ChainStrength 0.50–0.70: 4% of decisions (weak derivation, flagged for review)
- ChainStrength < 0.50: 1% of decisions (very weak, routed to human)
The 5% of decisions with ChainStrength below 0.70 were analyzed individually. Most involved submissions with unusual characteristics (mixed-use occupancies, historical buildings with unique construction, or risks in geographic areas with sparse training data). These were precisely the submissions that the expert would have spent the most time deliberating on, confirming that the chain strength metric correctly identifies the cases where expert judgment is most needed.
9.6 Drift Detection in Production
The agent was deployed in production with the drift detection framework active. Over the first 90 days of operation:
- Days 1–45: No drift detected. I(t) fluctuated within the expected range (0.94–0.96), consistent with sampling noise. No alerts triggered.
- Days 46–52: A Level 1 (Yellow) alert was triggered. Investigation revealed that a wave of submissions from a new geographic territory (the carrier had expanded into coastal Texas) was causing the agent to encounter risk profiles outside its training distribution. The ΔI was 0.03, concentrated in the geographic exposure predicates. The alert was resolved by adding coastal Texas-specific risk factors to the feature set and retraining the model.
- Days 53–90: Post-correction, the Inheritance Function stabilized at 0.95, slightly above the deployment baseline. The drift detection system confirmed that the correction had improved, not just restored, logic preservation.
The 48-hour drift detection latency was confirmed: the yellow alert at Day 46 was triggered within 36 hours of the first significant coastal Texas submissions entering the pipeline.
10. Benchmarks
10.1 Benchmark Methodology
All benchmarks were conducted on the commercial property underwriting case study described in Section 9. The test set consisted of 1,200 submissions held out from the training data, with expert decisions provided by the primary expert and validated by two additional senior underwriters (inter-rater agreement κ = 0.87).
10.2 Logic Preservation Benchmark
| Measurement | Value | Methodology |
|---|---|---|
| Inheritance Function I(f_expert, f_agent) | 0.947 | Weighted sum of 127 predicate preservation scores |
| Predicate Coverage (PC) | 95.3% (121/127) | Activation threshold θ = 0.05 |
| Boundary Fidelity (BF) | 97.8% | Averaged over 89 threshold predicates |
| Interaction Preservation (IP) | 84.1% | Averaged over 47 identified expert interactions |
| Monotonicity Consistency (MC) | 98.5% | 2 of 134 monotonic relationships violated (pre-correction) |
| Exception Handling Completeness (EHC) | 100% (34/34) | All hard exception rules correctly enforced |
| Confidence Calibration (CC) | 91.2% | ECE = 0.088 across 10 calibration bins |
For comparison, a standard accuracy-only evaluation on the same test set showed 93.8% overall agreement with expert decisions. The Inheritance Function reveals that this 93.8% agreement masks significant logic gaps in management quality assessment (52% minimum preservation) and multi-factor interactions (61% minimum preservation). An organization relying solely on accuracy would have deployed the agent without awareness of these gaps.
10.3 Decision Tree Equivalence Benchmark
| Measurement | Value | Methodology |
|---|---|---|
| Structural Equivalence (1 - NTED) | 97.2% | Normalized tree edit distance on canonical trees |
| Subtrees Fully Preserved (score > 0.90) | 89% (31/35 major subtrees) | Subtree-level TED comparison |
| Subtrees Partially Preserved (0.70–0.90) | 8.6% (3/35) | Including water damage, safeguards, management |
| Subtrees Lost (< 0.70) | 2.9% (1/35) | Management quality subtree only |
| Surrogate Fidelity | 96.1% | Agreement between surrogate tree and full XGBoost model |
10.4 Drift Detection Benchmark
| Measurement | Value | Methodology |
|---|---|---|
| Detection Latency (5-point drop) | < 48 hours | Page-Hinkley test with δ = 0.02, λ = 5.0 |
| False Alarm Rate | < 1 per quarter | Over 90-day observation period |
| True Positive Rate | 100% (2/2 genuine drift events) | Including geographic expansion event at Day 46 |
| Root Cause Attribution Accuracy | 100% | Both events correctly attributed (distribution shift) |
| Mean Time to Remediation | 6 days | From alert to corrected model deployment |
10.5 Responsibility Chain Benchmark
| Measurement | Value | Methodology |
|---|---|---|
| Chain Completeness | 100% (1,200/1,200) | Every decision traceable to ≥ 1 expert rule |
| Strong Chains (strength > 0.90) | 78% | 936 of 1,200 decisions |
| Moderate Chains (0.70–0.90) | 17% | 204 of 1,200 decisions |
| Weak Chains (< 0.70) | 5% | 60 of 1,200 decisions (flagged for review) |
| Avg Expert Rules per Decision | 4.3 | Average number of activated expert predicates |
| Provenance Graph Query Latency | < 50ms | P99 for backward traceability query |
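A backward traceability query of the kind benchmarked above can be sketched with an off-the-shelf graph library. The node naming scheme and the in-memory networkx representation are illustrative, not the production provenance store, which also carries temporal versioning.

```python
import networkx as nx

# Expert rules (versioned predicates) point to the decisions they influenced.
G = nx.DiGraph()
G.add_edge("rule:fire_protection_marginal@v3", "decision:SUB-2025-00417")
G.add_edge("rule:high_value_inventory@v1", "decision:SUB-2025-00417")
G.add_edge("rule:attritional_loss_pattern@v2", "decision:SUB-2025-00417")

def expert_rules_for(decision_id: str) -> set[str]:
    """Backward traceability: every expert rule upstream of a decision."""
    return {n for n in nx.ancestors(G, decision_id) if n.startswith("rule:")}

print(expert_rules_for("decision:SUB-2025-00417"))
```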
11. Future Directions
11.1 Multi-Expert Inheritance
The current framework assumes a single primary expert. In practice, underwriting teams consist of multiple experts with overlapping but non-identical judgment patterns. Future work should extend the Inheritance Function to handle multiple expert sources, for example by averaging per-expert scores: I_multi(f_agent) = (1/K) Σ_k I(f_expert_k, f_agent), where K is the number of contributing experts.
This simple average may not be appropriate when experts disagree. A more sophisticated formulation would identify consensus predicates (where all experts agree) and contested predicates (where experts disagree), and require the agent to preserve consensus logic while flagging contested areas for human deliberation. The consensus-contested decomposition would integrate with the MARIA OS responsibility gate system to route contested decisions to the appropriate review panel.
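A minimal sketch of the consensus-contested decomposition, assuming each expert supplies a verdict per predicate; the verdict encoding and the routing comment are illustrative.

```python
def decompose_predicates(expert_verdicts: dict[str, dict[str, bool]]):
    """Split the predicate vocabulary into consensus predicates (all experts
    agree) and contested predicates (experts disagree).

    expert_verdicts maps expert -> {predicate_name: verdict}.
    """
    predicates = set().union(*(v.keys() for v in expert_verdicts.values()))
    consensus, contested = {}, set()
    for p in predicates:
        verdicts = {v[p] for v in expert_verdicts.values() if p in v}
        if len(verdicts) == 1:
            consensus[p] = verdicts.pop()
        else:
            contested.add(p)
    return consensus, contested

verdicts = {
    "expert_a": {"flood_zone_lt_500ft": True, "management_quality_weak": True},
    "expert_b": {"flood_zone_lt_500ft": True, "management_quality_weak": False},
}
consensus, contested = decompose_predicates(verdicts)
print(consensus)   # {'flood_zone_lt_500ft': True}
print(contested)   # {'management_quality_weak'}: route to the review panel
```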
11.2 Cross-Line Inheritance
Insurance carriers often have underwriting expertise that crosses product lines. A commercial property underwriter's knowledge of fire protection systems is relevant to inland marine, builders risk, and even some liability exposures. The inheritance framework could be extended to identify cross-line predicate reuse opportunities by constructing a shared vocabulary V_shared = ⋃_{l ≠ l'} (V_l ∩ V_l'), the set of predicates that appear in more than one product line's vocabulary V_l.
Predicates in V_shared represent core underwriting knowledge that should be consistently applied across product lines. The inheritance framework would verify that AI agents in different product lines maintain consistent treatment of shared predicates, preventing the fragmentation of expert knowledge into siloed models.
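A small sketch of how V_shared could be computed from per-line predicate vocabularies; the vocabularies and predicate names are illustrative.

```python
# Illustrative per-line predicate vocabularies (V_l for each product line l).
vocabularies = {
    "commercial_property": {"sprinkler_rating", "construction_class", "flood_zone_distance"},
    "inland_marine":       {"sprinkler_rating", "transit_exposure"},
    "builders_risk":       {"construction_class", "sprinkler_rating", "project_duration"},
}

# V_shared: predicates that appear in more than one product line's vocabulary.
all_predicates = set().union(*vocabularies.values())
v_shared = {
    p for p in all_predicates
    if sum(p in v for v in vocabularies.values()) >= 2
}
print(sorted(v_shared))  # e.g. ['construction_class', 'sprinkler_rating']
```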
11.3 Adversarial Robustness of Inherited Logic
The current framework does not specifically address adversarial attacks on the inheritance mechanism. A sophisticated adversary (e.g., a broker systematically submitting risks designed to exploit gaps in the agent's logic) could degrade the effective inheritance score by shifting the decision boundary in regions where the agent's logic preservation is weakest. Future work should develop adversarial testing protocols that probe the agent's logic preservation under adversarial submission patterns and harden the inheritance mechanism against targeted exploitation.
11.4 Automated Predicate Discovery
The predicate vocabulary extraction process is labor-intensive (the case study required 16+ hours of expert interviews). Future work should explore automated predicate discovery from expert decision data using symbolic regression, program synthesis, or concept bottleneck models. The goal is to automatically extract the predicate vocabulary from historical decisions, reducing the dependency on explicit expert elicitation while maintaining the formal structure needed for inheritance verification.
11.5 Continuous Expert-Agent Co-Evolution
Rather than treating expert knowledge as a static artifact to be preserved, a more dynamic framework would model the expert and agent as co-evolving systems. The agent learns from the expert's logic, and the expert can update their judgment based on patterns discovered by the agent. The inheritance framework would need to track this co-evolution, distinguishing between legitimate knowledge updates (where the expert revises their judgment based on new evidence) and undesirable contamination (where the expert unconsciously adopts the agent's biases).
This co-evolutionary model connects to the broader MARIA OS vision of graduated autonomy: as the agent's inheritance score increases and drift decreases over time, the system can grant more autonomy. As the agent develops novel valid patterns (confirmed through back-testing and expert review), these can be formally incorporated into the predicate vocabulary, expanding the expert's effective knowledge base through human-AI collaboration.
12. Conclusion
This paper has introduced a formal framework for verifying and maintaining the preservation of expert underwriting logic in AI agents. The key contributions are:
- The Decision Predicate Decomposition that represents expert reasoning as a structured vocabulary of atomic logical conditions, extracted through a three-method protocol (structured elicitation, decision tree induction, and counterfactual probing).
- The Inheritance Function I(f_expert, f_agent) that measures logic preservation as a weighted sum of predicate-level preservation scores, decomposed into activation, boundary, and interaction components. This function provides a structural measure of knowledge transfer fidelity that is fundamentally different from — and more informative than — output accuracy.
- Decision Tree Equivalence Testing via normalized tree edit distance on canonical representations, supplemented by subtree preservation analysis that identifies exactly which reasoning substructures are preserved and which have degraded.
- Six Logic Preservation Metrics (PC, BF, IP, MC, EHC, CC) that provide independently measurable, diagnostically actionable assessments of different aspects of logic preservation, with a composite Logic Preservation Score governing deployment autonomy levels.
- The Responsibility Provenance Graph that traces every agent decision back to its originating expert rules with temporal versioning, providing the complete audit trail required for insurance regulatory compliance.
- A Continuous Drift Detection Framework based on sequential hypothesis testing (Page-Hinkley) that detects logic degradation within 48 hours, with graduated response protocols that automatically adjust agent autonomy when preservation scores decline.
Applied to commercial property underwriting, the framework achieved a 94.7% logic preservation score, 97.2% decision tree structural equivalence, and 100% responsibility chain coverage. These results demonstrate that expert knowledge preservation is not merely a model accuracy problem — it is a governance problem that requires formal verification, continuous monitoring, and structured human oversight.
The framework integrates with MARIA OS through the decision pipeline, responsibility gate system, and governance dashboard, demonstrating that insurance underwriting governance can be automated without sacrificing the accumulated judgment of expert underwriters. The key insight is that automating underwriting does not mean replacing expert judgment — it means formally inheriting it, continuously verifying it, and maintaining a clear responsibility chain from the originating expert to every agent decision.
The future of AI in insurance underwriting is not about building agents that are smarter than human experts. It is about building agents that are provably faithful to human expertise while operating at machine scale. The Underwriting Responsibility Inheritance framework is our contribution to that future.
References
[1] Klein, G. (1998). Sources of Power: How People Make Decisions. MIT Press.
[2] Hoffman, R. R., Crandall, B., & Shadbolt, N. (1998). Use of the Critical Decision Method to Elicit Expert Knowledge: A Case Study in the Methodology of Cognitive Task Analysis. Human Factors, 40(2), 254–276.
[3] Zhang, K., & Shasha, D. (1989). Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM Journal on Computing, 18(6), 1245–1262.
[4] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?" Explaining the Predictions of Any Classifier. Proceedings of KDD, 1135–1144.
[5] Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed. christophm.github.io/interpretable-ml-book.
[6] Koh, P. W., & Liang, P. (2017). Understanding Black-box Predictions via Influence Functions. Proceedings of ICML, 1885–1894.
[7] Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems, 30.
[8] Page, E. S. (1954). Continuous Inspection Schemes. Biometrika, 41(1–2), 100–115.
[9] National Association of Insurance Commissioners (NAIC). (2024). Model Audit Rule: Requirements for Insurance Company Governance. NAIC Model Laws.
[10] European Insurance and Occupational Pensions Authority (EIOPA). (2025). Governance and Risk Management of Artificial Intelligence in Insurance. EIOPA Discussion Paper.
[11] Society of Actuaries. (2024). Predictive Analytics and Machine Learning in Insurance Underwriting: Practice Standards. SOA Research Report.
[12] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of KDD, 785–794.
[13] Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., & Sayres, R. (2018). Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Proceedings of ICML, 2668–2677.
[14] Friedman, J. H., & Popescu, B. E. (2008). Predictive Learning via Rule Ensembles. The Annals of Applied Statistics, 2(3), 916–954.