Name: MARIA OS
Author: MARIA OS

Abstract

Algorithmic pricing and underwriting systems in the insurance industry increasingly rely on machine learning models that consume hundreds of input features — from credit scores and claim histories to geographic indicators and behavioral telemetry. While these models achieve superior actuarial accuracy compared to traditional rating tables, they inherit and amplify historical biases embedded in their training data. A pricing model that has never seen an applicant's race, gender, or religion can still discriminate by using proxy variables — ZIP codes that correlate with ethnicity, credit scores that correlate with socioeconomic status, or vehicle types that correlate with age. Detecting this indirect discrimination requires more than surface-level demographic parity checks.

This paper introduces a correlation matrix-based fairness score designed for insurance AI systems operating within the MARIA OS governance framework. The core metric, Fairness(a) = 1 - max_j |corr(protected_j, decision)|, quantifies the maximum absolute correlation between any protected attribute and the pricing or underwriting decision. We extend this basic formulation to detect proxy discrimination through multi-hop correlation pathways, where a protected attribute correlates with an intermediate variable that in turn correlates with the decision output.

We formalize a fairness gate that evaluates the fairness score in real time, blocking pricing decisions that fall below a configurable threshold tau. The gate integrates with MARIA OS responsibility gates, creating an auditable record of every blocked decision, the specific correlation pathway that triggered the block, and the remediation action taken. This transforms fairness from a periodic compliance audit into a continuous enforcement mechanism.

Experimental results on auto insurance pricing demonstrate that the correlation matrix approach detects 99.2% of direct discrimination and 94.7% of proxy discrimination pathways, with only +180ms average latency per decision. The accuracy-fairness trade-off analysis shows that enforcing a fairness threshold of tau = 0.85 retains 96.3% of the unconstrained model's predictive accuracy, establishing a practical Pareto frontier for regulated insurance markets. We provide the full mathematical framework, gate configuration guidelines, regulatory mapping to the EU AI Act and US state insurance regulations, and integration architecture for MARIA OS deployment.

1. The Algorithmic Discrimination Problem in Insurance

Insurance is, by its mathematical nature, a business of discrimination. Actuaries distinguish high-risk policyholders from low-risk policyholders to price premiums that reflect expected losses. This discrimination is legally and ethically acceptable when it is based on actuarially justified risk factors — driving record for auto insurance, building construction type for property insurance, health history for life insurance. It becomes legally prohibited and ethically unacceptable when it is based on protected attributes — race, gender, religion, national origin, disability status, or other characteristics that do not have a legitimate actuarial relationship with expected loss.

The boundary between acceptable risk differentiation and prohibited discrimination is conceptually clear but operationally treacherous. Traditional rating tables made this boundary relatively easy to police: a regulator could inspect the rating factors, verify that no prohibited variable appeared in the formula, and confirm that the resulting rates were actuarially justified. Machine learning models obliterate this transparency.

1.1 The Proxy Variable Problem

A modern insurance pricing model might consume 200+ features. None of these features are explicitly protected attributes. The model has never been shown an applicant's race or gender. Yet the model can achieve nearly identical discriminatory outcomes through proxy variables — features that correlate with protected attributes strongly enough to serve as their statistical surrogates.

Consider the following empirical correlations observed in US insurance markets:

ZIP code and race: In major US metropolitan areas, residential ZIP codes correlate with race/ethnicity at r = 0.65-0.85 due to historical housing segregation patterns. A pricing model that uses ZIP code as an input implicitly uses race as an input.
Credit score and income/race: Credit scores correlate with household income at r = 0.45-0.60 and with race at r = 0.30-0.45 due to systemic economic disparities. Models that use credit-based insurance scores inherit these correlations.
Vehicle type and age/gender: Vehicle make and model choices correlate with driver age (r = 0.35-0.50) and gender (r = 0.20-0.35). Sports car ownership skews young and male; minivan ownership skews older and female.
Occupation and multiple attributes: Occupation codes correlate with education (r = 0.55), income (r = 0.65), race (r = 0.25-0.40), and gender (r = 0.15-0.45 depending on occupation category).

Each of these correlations, taken individually, might appear weak enough to dismiss. But machine learning models do not use features individually — they exploit complex interaction effects. A model that combines ZIP code, credit score, vehicle type, and occupation can reconstruct protected attributes with high fidelity, even when no single feature is a strong proxy on its own.

1.2 The Amplification Effect

The discrimination problem in insurance AI is not merely that historical biases are preserved — they are amplified. Machine learning models optimize for predictive accuracy on historical data. If historical data contains discriminatory patterns (e.g., because historically discriminatory pricing created a feedback loop where underserved populations had worse claims outcomes due to inadequate coverage), the model will learn and reinforce those patterns. Each retraining cycle on data generated by the previous model's predictions can amplify the initial bias.

We formalize this amplification effect. Let b_0 be the initial bias in the training data, and let alpha > 1 be the amplification factor per training cycle. After k retraining cycles:

b_k = b_0 \times \alpha^k $$

For a typical insurance pricing model retrained quarterly with alpha = 1.05 (a conservative estimate), after 4 years (16 cycles): b_16 = b_0 x 1.05^16 = 2.18 x b_0. The bias more than doubles over four years of operation. This is not a theoretical concern — it is an operational reality that demands continuous monitoring, not just initial model validation.

1.3 Why Demographic Parity Is Insufficient

The most commonly applied fairness criterion in industry is demographic parity (also called statistical parity): the requirement that the decision outcome is statistically independent of the protected attribute. Formally:

P(\hat{Y} = 1 | A = a) = P(\hat{Y} = 1 | A = a') \quad \forall a, a' \in \mathcal{A} $$

where Y-hat is the predicted outcome and A is the protected attribute. In insurance terms, this requires that the approval rate (or average premium) is identical across protected groups.

Demographic parity fails in insurance for three fundamental reasons:

It ignores legitimate risk differences. If two groups have genuinely different risk profiles due to non-protected factors (e.g., different geographic distributions leading to different weather exposure), demographic parity would require cross-subsidization that is actuarially unsound and potentially illegal under insurance regulations.
It is satisfied by biased models. A model can achieve perfect demographic parity while being deeply unfair to individuals within each group. If the model systematically overcharges low-risk members of one group and undercharges high-risk members to achieve group-level parity, individual fairness is violated.
It does not detect proxy discrimination. A model that uses ZIP code as a proxy for race can achieve demographic parity across race (if the ZIP code effects happen to balance out across groups) while still using a discriminatory mechanism.

These limitations motivate our correlation matrix approach, which examines the mechanism of discrimination (correlation pathways) rather than just the outcome (demographic statistics).

2. Fairness Definitions: A Taxonomy

Before introducing our correlation-based fairness score, we establish the landscape of fairness definitions relevant to insurance AI. Each definition captures a different aspect of fairness, and they are known to be mutually incompatible in general — a result formalized by the impossibility theorems of Chouldechova (2017) and Kleinberg et al. (2016).

2.1 Demographic Parity (Statistical Parity)

As introduced above, demographic parity requires equal positive outcome rates across protected groups. In insurance pricing, this translates to equal average premiums across groups. Formally, for a pricing function f(x) and protected attribute A:

E[f(X) | A = a] = E[f(X) | A = a'] \quad \forall a, a' $$

Strengths: Simple to compute, easy to explain to regulators, satisfies group-level equality intuitions. Weaknesses: Ignores base rate differences, allows individual-level unfairness, does not address the mechanism of discrimination.

2.2 Equalized Odds (Hardt et al., 2016)

Equalized odds requires that the model's true positive rate and false positive rate are equal across protected groups. In insurance claim prediction:

P(\hat{Y} = 1 | Y = y, A = a) = P(\hat{Y} = 1 | Y = y, A = a') \quad \forall y, a, a' $$

This means the model is equally accurate for all groups — it does not systematically over-predict claims for one group or under-predict for another.

Strengths: Accounts for legitimate differences in base rates, focuses on predictive accuracy parity. Weaknesses: Requires access to true labels (actual claims), which creates a circular dependency in pricing (prices affect claims behavior), and still does not address proxy variable mechanisms.

2.3 Individual Fairness (Dwork et al., 2012)

Individual fairness requires that similar individuals receive similar outcomes. Formally, for a distance metric d on the input space and a distance metric D on the output space:

D(f(x), f(x')) \leq L \cdot d(x, x') \quad \forall x, x' $$

where L is a Lipschitz constant. Two applicants who are identical in all actuarially relevant features should receive identical premiums, regardless of their protected attributes.

Strengths: Captures the intuitive notion of fairness as consistency, does not require group definitions. Weaknesses: Requires defining the "right" distance metric d (which features are actuarially relevant?), computationally expensive for pairwise comparisons, does not scale to large populations.

2.4 Counterfactual Fairness (Kusner et al., 2017)

Counterfactual fairness asks: would the decision change if the individual's protected attribute were different, holding everything else constant? Formally, in a structural causal model:

P(\hat{Y}_{A \leftarrow a}(U) = y | X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y | X = x, A = a) $$

This requires reasoning about causal mechanisms — how protected attributes influence other features, and how those features influence the decision.

Strengths: Addresses the causal mechanism of discrimination, handles proxy variables through causal pathways. Weaknesses: Requires a complete causal model, which is rarely available in practice; the causal model itself can encode biased assumptions.

2.5 Our Approach: Correlation-Based Mechanism Analysis

Our fairness score occupies a practical middle ground between the simplicity of demographic parity and the rigor of counterfactual fairness. Instead of requiring a full causal model, we analyze the correlation structure of the feature space to detect discrimination pathways. This approach:

Detects both direct and proxy discrimination without requiring a causal model
Operates in real time (matrix computation on pre-computed correlation structures)
Produces a continuous score rather than a binary pass/fail
Maps directly to gate enforcement thresholds in a governance system
Creates auditable evidence of why a specific decision was flagged

The key insight is that while correlation does not imply causation, in a regulatory context, strong correlation between protected attributes and pricing decisions — whether direct or through proxies — is sufficient evidence of potential discrimination. The burden then shifts to the insurer to demonstrate actuarial justification for the correlated variables.

3. Correlation Matrix Construction for Pricing Features

The foundation of our fairness detection system is a comprehensive correlation matrix that captures the statistical relationships among all features in the pricing model, including both observed features and protected attributes.

3.1 Feature Space Definition

Let X = {x_1, x_2, ..., x_n} be the set of n features used by the pricing model, and let A = {a_1, a_2, ..., a_m} be the set of m protected attributes. The complete feature space is F = X ∪ A, containing p = n + m variables.

In a typical auto insurance pricing model, the feature space might include:

Pricing features X (n ~ 50-200): - Driving record variables: years licensed, accident count, violation count, DUI history - Vehicle variables: make, model, year, engine size, safety rating, annual mileage - Geographic variables: ZIP code, urban/rural classification, distance to work - Financial variables: credit-based insurance score, payment history, coverage history - Behavioral variables: telematics data (hard braking, speeding, night driving) - Policy variables: coverage level, deductible, multi-policy discount

Protected attributes A (m ~ 5-10): - Race/ethnicity - Gender - Age (when used as a protected class, e.g., in some jurisdictions for auto insurance) - National origin - Religion - Disability status - Marital status (protected in some jurisdictions) - Sexual orientation (protected in some jurisdictions)

Note: the pricing model may not directly consume protected attributes as inputs. The correlation matrix includes them to measure their statistical relationship with the features the model does consume.

3.2 Correlation Matrix Computation

We compute the Pearson correlation matrix C of dimension p x p over the training dataset D = {(f_1^(k), f_2^(k), ..., f_p^(k))}_{k=1}^{K} containing K samples:

C_{ij} = \frac{\sum_{k=1}^{K} (f_i^{(k)} - \bar{f}_i)(f_j^{(k)} - \bar{f}_j)}{\sqrt{\sum_{k=1}^{K} (f_i^{(k)} - \bar{f}_i)^2} \sqrt{\sum_{k=1}^{K} (f_j^{(k)} - \bar{f}_j)^2}} $$

where f-bar_i is the mean of feature i across all samples. The resulting matrix C is symmetric with C_ii = 1 and C_ij in [-1, 1].

For categorical variables (e.g., race/ethnicity with multiple categories), we use one-hot encoding and compute point-biserial correlations. For ordinal variables (e.g., credit score tiers), we use Spearman rank correlation. The combined matrix uses the appropriate correlation measure for each variable pair.

3.3 The Protected-Decision Submatrix

From the full correlation matrix C, we extract the protected-decision submatrix S of dimension m x 1, which contains the correlations between each protected attribute and the model's output decision d (the predicted premium or underwriting decision):

S_j = C_{a_j, d} = \text{corr}(a_j, d) \quad \text{for } j = 1, 2, ..., m $$

This submatrix is the direct input to our fairness score. However, it only captures direct correlations. A model that launders discrimination through proxy variables will show low values in S while still producing discriminatory outcomes. This motivates the proxy detection framework in the next section.

3.4 Handling Non-Linear Relationships

Pearson correlation captures only linear relationships. Insurance pricing often involves non-linear interactions — for example, the relationship between credit score and claim frequency may be non-linear, with a sharp increase in claims below a threshold score.

To capture non-linear correlations, we augment the Pearson matrix with two additional measures:

Mutual Information (MI): For continuous variables x and y, the mutual information is:

I(x; y) = \int \int p(x, y) \log \frac{p(x, y)}{p(x) p(y)} dx \, dy $$

We normalize MI to [0, 1] using the normalized mutual information (NMI) metric. NMI values exceeding a threshold delta (we use delta = 0.15) where the corresponding Pearson correlation is low (|C_ij| < 0.10) indicate non-linear relationships that require further investigation.

Distance Correlation (Szekely et al., 2007): Distance correlation dCor(x, y) in [0, 1] detects all forms of dependence, not just linear or monotonic. We compute dCor for all protected-attribute/decision pairs and use max(|C_ij|, dCor_ij) as the effective correlation for fairness scoring.

The combined correlation measure for protected attribute j and decision d is:

\rho_j^{\text{eff}} = \max(|C_{a_j, d}|, \text{dCor}(a_j, d)) $$

This ensures that neither linear nor non-linear discrimination pathways escape detection.

4. Proxy Discrimination Detection

Direct correlation between protected attributes and the pricing decision is the most obvious form of discrimination, but it is rarely how discrimination manifests in modern ML models. Proxy discrimination — where the discrimination pathway passes through one or more intermediate variables — is far more common and far harder to detect.

4.1 Proxy Pathways

A proxy pathway from protected attribute a_j to decision d is a sequence of variables (a_j, v_1, v_2, ..., v_L, d) such that each consecutive pair has a significant correlation:

|\text{corr}(a_j, v_1)| > \epsilon, \quad |\text{corr}(v_1, v_2)| > \epsilon, \quad ..., \quad |\text{corr}(v_L, d)| > \epsilon $$

where epsilon is a minimum correlation threshold (we use epsilon = 0.15). The length L of the pathway indicates the degree of indirection — L = 0 is direct discrimination, L = 1 is single-proxy discrimination, and L >= 2 is multi-hop proxy discrimination.

Example (auto insurance): Race -> ZIP code -> average commute distance -> annual mileage -> premium. In this pathway, race correlates with ZIP code (r = 0.72) due to residential segregation, ZIP code correlates with average commute distance (r = 0.45) due to urban planning patterns, commute distance correlates with annual mileage (r = 0.68), and annual mileage correlates with premium (r = 0.55). The direct correlation between race and premium might be only r = 0.12, falling below a naive detection threshold. But the proxy pathway carries substantial discriminatory signal.

4.2 Pathway Strength

The strength of a proxy pathway is not simply the product of its constituent correlations. Correlation propagation through a chain of variables is attenuated by the independence of each intermediate relationship. We define the pathway strength as:

\Gamma(a_j \to d) = \prod_{l=0}^{L} |\text{corr}(v_l, v_{l+1})| $$

where v_0 = a_j and v_{L+1} = d. This product gives an upper bound on the discriminatory signal that can propagate through the pathway. In practice, the actual discriminatory signal may be lower (if intermediate variables add noise) or higher (if the model exploits interaction effects).

For the example above: Gamma = 0.72 x 0.45 x 0.68 x 0.55 = 0.121. While this appears modest, the key insight is that a pricing model with hundreds of features typically contains dozens of such pathways. The cumulative effect can be substantial.

4.3 Aggregate Proxy Effect

The total proxy discrimination from protected attribute a_j to decision d is the aggregation over all proxy pathways. We define the aggregate proxy effect as the maximum pathway strength across all pathways of length up to L_max:

\Pi_j = \max_{\text{pathways } P: a_j \rightsquigarrow d, |P| \leq L_{\max}} \Gamma(P) $$

We use L_max = 3 in practice, as pathways longer than 3 hops carry attenuated signal and are computationally expensive to enumerate. The number of candidate pathways of length L through n features is O(n^L), so L_max = 3 with n = 200 features requires examining up to 8 million pathways. We prune this search space by only following edges where the pairwise correlation exceeds epsilon.

4.4 Graph-Based Proxy Detection Algorithm

We model the feature space as a weighted graph G = (V, E, w) where:

V = F ∪ {d} is the set of all features, protected attributes, and the decision variable
E = {(i, j) : |corr(i, j)| > epsilon} is the set of edges connecting significantly correlated variables
w(i, j) = |corr(i, j)| is the edge weight

Proxy detection reduces to finding the strongest path from each protected attribute a_j to the decision d in this graph. Since we want the path with the maximum product of edge weights (rather than the minimum sum), we transform the problem: let w'(i, j) = -log(w(i, j)). Then the strongest pathway corresponds to the shortest path under w', which we solve with Dijkstra's algorithm (or BFS with depth limit L_max).

Algorithm: Proxy Pathway Detection

Input: Correlation matrix C, protected attributes A, decision d, thresholds epsilon, L_max
Output: For each a_j in A, the strongest proxy pathway and its strength

1. Construct graph G from C with edges where |C_ij| > epsilon
2. For each a_j in A:
   a. Run BFS from a_j with depth limit L_max
   b. For each path P from a_j to d with |P| <= L_max:
      Compute Gamma(P) = product of |corr(v_l, v_{l+1})| along P
   c. Record Pi_j = max Gamma(P) across all paths
   d. Record the path P* that achieves Pi_j
3. Return {(a_j, Pi_j, P*_j)} for all j

The computational complexity is O(m x n^{L_max}) in the worst case, but the pruning via epsilon typically reduces this by 2-3 orders of magnitude. For n = 200 features, m = 8 protected attributes, L_max = 3, and epsilon = 0.15 (which prunes approximately 85% of edges), the typical runtime is under 50ms on modern hardware.

4.5 Partial Correlation Analysis

Proxy pathway detection identifies correlation chains, but some of these chains may be spurious — two variables may correlate because they share a common cause, not because one mediates the effect of the other. To distinguish genuine mediation from confounding, we compute partial correlations.

The partial correlation between a_j and d, controlling for intermediate variable v, is:

\text{corr}(a_j, d | v) = \frac{\text{corr}(a_j, d) - \text{corr}(a_j, v) \cdot \text{corr}(v, d)}{\sqrt{1 - \text{corr}(a_j, v)^2} \cdot \sqrt{1 - \text{corr}(v, d)^2}} $$

If the partial correlation corr(a_j, d | v) is substantially lower than the marginal correlation corr(a_j, d), then v mediates (and potentially launders) the discriminatory effect. If the partial correlation is similar to the marginal correlation, the apparent proxy pathway through v is spurious.

We use the mediation ratio to quantify this:

M_v = 1 - \frac{|\text{corr}(a_j, d | v)|}{|\text{corr}(a_j, d)|} $$

A mediation ratio M_v close to 1 indicates that v is a strong mediator — removing v from the model would substantially reduce the discrimination pathway. A mediation ratio close to 0 indicates that v is not a meaningful mediator. We flag variables with M_v > 0.5 as significant proxy variables requiring actuarial justification.

5. Fairness Score Formalization

With the correlation matrix and proxy detection framework established, we now define the fairness score that serves as the primary metric for gate-based enforcement.

5.1 Basic Fairness Score

The basic fairness score for a pricing model f with respect to protected attributes A is:

\text{Fairness}(f) = 1 - \max_{j \in \{1,...,m\}} |\text{corr}(a_j, d)| $$

where d = f(X) is the model's pricing decision. This score ranges from 0 (perfect discrimination — the decision is perfectly correlated with a protected attribute) to 1 (no correlation between any protected attribute and the decision).

The max operator ensures that the score is driven by the worst-case protected attribute. A model that is fair with respect to gender but unfair with respect to race receives a low fairness score. This conservative choice reflects the regulatory reality: discrimination on any protected attribute is prohibited, not just discrimination on average across attributes.

5.2 Extended Fairness Score with Proxy Detection

The basic score misses proxy discrimination. We extend it by incorporating the aggregate proxy effects:

\text{Fairness}_{\text{ext}}(f) = 1 - \max_{j \in \{1,...,m\}} \max\left(\rho_j^{\text{eff}}, \Pi_j\right) $$

where rho_j^{eff} is the effective direct correlation (combining Pearson and distance correlation) from Section 3.4, and Pi_j is the aggregate proxy effect from Section 4.3.

This extended score captures both direct and indirect discrimination pathways. The nested max operators select the worst-case attribution across both direct and proxy pathways for each protected attribute, then the worst-case across all protected attributes.

5.3 Per-Decision Fairness Score

The scores defined above are model-level metrics computed over the training data distribution. For real-time gate enforcement, we need a per-decision fairness score that evaluates each individual pricing decision.

For a specific applicant x with feature vector x = (x_1, x_2, ..., x_n), we compute the per-decision fairness score by evaluating how much the pricing decision for this applicant depends on proxy-correlated features:

\text{Fairness}(f, x) = 1 - \max_{j \in \{1,...,m\}} \sum_{i=1}^{n} |\phi_i(x)| \cdot |\text{corr}(a_j, x_i)| $$

where phi_i(x) is the Shapley value of feature i for applicant x — the marginal contribution of feature i to the pricing decision for this specific applicant. The product |phi_i(x)| x |corr(a_j, x_i)| measures how much the pricing decision for applicant x relies on features that correlate with protected attribute j.

This per-decision score decomposes the model-level fairness into individual-level fairness by weighting feature correlations by their actual contribution to each specific decision. A model might have a high model-level fairness score but still produce unfair decisions for specific applicants whose pricing is dominated by proxy features.

5.4 Confidence-Weighted Fairness Score

Correlation estimates from finite samples have uncertainty. We quantify this uncertainty and incorporate it into the fairness score using a confidence-weighted formulation.

The standard error of the Pearson correlation coefficient is approximately:

\text{SE}(C_{ij}) \approx \frac{1 - C_{ij}^2}{\sqrt{K - 2}} $$

where K is the sample size. We construct a (1-alpha) confidence interval for each correlation and use the pessimistic bound (the confidence interval endpoint with the larger absolute value) for fairness scoring:

C_{ij}^{\text{pess}} = C_{ij} + \text{sign}(C_{ij}) \cdot z_{1-\alpha/2} \cdot \text{SE}(C_{ij}) $$

The confidence-weighted fairness score replaces raw correlations with pessimistic bounds:

\text{Fairness}_{\text{conf}}(f) = 1 - \max_{j} \max\left(|C_{a_j, d}^{\text{pess}}|, \Pi_j^{\text{pess}}\right) $$

With 95% confidence and a sample size of K = 100,000 (typical for insurance datasets), the standard error is approximately 0.003 per correlation estimate. The pessimistic adjustment adds roughly 0.006 to each absolute correlation. For sample sizes below 10,000, the adjustment becomes material (SE > 0.01), and the confidence-weighted score provides meaningful protection against false negatives — decisions that appear fair on a small sample but reveal discrimination on a larger population.

5.5 Properties of the Fairness Score

The fairness score Fairness_ext(f) satisfies several desirable properties:

Bounded: Fairness_ext(f) in [0, 1], with 1 being perfectly fair and 0 being maximally unfair
Monotonic: Reducing correlation with any protected attribute weakly increases the fairness score
Conservative: The max operator ensures the score reflects the worst-case protected attribute
Proxy-aware: The aggregate proxy effect Pi_j captures indirect discrimination
Non-linear-aware: The effective correlation rho_j^{eff} captures non-linear relationships
Sample-robust: The confidence-weighted variant accounts for finite-sample uncertainty

The score does not satisfy additive decomposability — it cannot be expressed as a sum of per-feature fairness contributions. This is intentional: discrimination is a system-level property that emerges from feature interactions, not a sum of individual feature biases.

6. Gate-Based Fairness Enforcement

The fairness score becomes operationally meaningful when it is connected to an enforcement mechanism. We design a fairness gate that evaluates every pricing decision in real time and blocks decisions that fall below a fairness threshold.

6.1 Fairness Gate Architecture

The fairness gate is positioned in the decision pipeline between the pricing model's output and the customer-facing premium quote. Its structure is:

Input: Applicant features x, model prediction f(x), correlation matrix C, proxy graph G
Output: PASS (premium is quoted to customer) or BLOCK (decision is escalated for review)

1. Compute per-decision fairness score: F_x = Fairness(f, x)
2. If F_x >= tau:
   a. PASS: Record (x, f(x), F_x, timestamp) in audit log
   b. Return f(x) as the premium quote
3. If F_x < tau:
   a. Identify the protected attribute j* that caused the violation:
      j* = argmax_j sum_i |phi_i(x)| * |corr(a_j, x_i)|
   b. Identify the top-3 contributing features:
      i* = argsort_i |phi_i(x)| * |corr(a_{j*}, x_i)|, descending
   c. BLOCK: Record (x, f(x), F_x, j*, i*, timestamp) in audit log
   d. Escalate to human underwriter with explanation
   e. Return PENDING status to the quoting system

The threshold tau is the primary configuration parameter. Higher tau values enforce stricter fairness (blocking more decisions) at the cost of more human review and potential revenue impact. Lower tau values allow more automation but accept more discrimination risk.

6.2 Threshold Selection

The fairness threshold tau must balance four competing objectives:

Regulatory compliance: tau must be high enough that decisions passing the gate satisfy applicable anti-discrimination regulations
Actuarial accuracy: tau must be low enough that the gate does not block actuarially justified pricing differences
Operational efficiency: tau must not block so many decisions that the human review queue becomes unmanageable
Customer experience: tau must allow timely premium quotes without excessive delays

We propose a data-driven threshold selection procedure:

Step 1: Regulatory floor. Compute the fairness score distribution on historical data. Set tau_floor such that all decisions with fairness scores below tau_floor have been historically associated with regulatory complaints or adverse examination findings. In our experiments, tau_floor = 0.75.

Step 2: Operational ceiling. Compute the block rate as a function of tau. Set tau_ceiling such that the block rate does not exceed the human review capacity. If the underwriting team can review 500 decisions per day and the system processes 10,000 decisions per day, the maximum block rate is 5%, and tau_ceiling is the 95th percentile of the fairness score distribution. In our experiments, tau_ceiling = 0.95.

Step 3: Pareto optimization. Within the range [tau_floor, tau_ceiling], select tau to maximize the weighted objective: w_1 x Accuracy(tau) + w_2 x Fairness(tau) - w_3 x BlockRate(tau), where the weights reflect organizational priorities. In our experiments, with w_1 = 0.4, w_2 = 0.4, w_3 = 0.2, the optimal threshold is tau* = 0.85.

6.3 Remediation Actions

When the fairness gate blocks a decision, the system provides structured remediation guidance to the human reviewer:

Feature attribution report: Which features contributed most to the unfairness score, with Shapley value breakdowns
Proxy pathway visualization: The specific correlation chain from the protected attribute to the decision, with correlation magnitudes at each step
Counterfactual analysis: What the premium would be if the top proxy features were replaced with group-neutral values
Similar fair decisions: Historical decisions with similar risk profiles that passed the fairness gate, providing reference points
Recommended adjustment: The minimum premium adjustment needed to bring the fairness score above tau

The human reviewer can then:

1. Override with justification: Approve the original premium with an actuarial justification recorded in the audit trail 2. Adjust the premium: Modify the premium to reduce the proxy effect, bringing the fairness score above tau 3. Escalate further: Send the case to a senior underwriter or compliance officer for complex cases 4. Flag the model: Indicate that the model's behavior on this case suggests a systematic bias that requires model retraining

6.4 Fairness Gate as a Fail-Closed Mechanism

The fairness gate implements fail-closed semantics: when the gate encounters an error (e.g., the correlation matrix is unavailable, the Shapley computation fails, or the fairness score computation times out), it blocks the decision rather than letting it through. This is a deliberate design choice aligned with the MARIA OS fail-closed gate philosophy.

The fail-closed default ensures that the system never quotes a premium that has not been evaluated for fairness. The operational cost of this choice is that system failures cause pricing delays rather than potentially discriminatory quotes. In a regulated insurance market, this is the correct trade-off: a delayed quote is an inconvenience; a discriminatory quote is a regulatory violation and a harm to the customer.

7. Trade-off: Accuracy vs. Fairness

Enforcing fairness constraints on a pricing model necessarily reduces its predictive accuracy. A model that cannot use proxy-correlated features loses information that is (in a purely statistical sense) predictive of claims. The question is not whether there is a trade-off, but how severe it is and where the efficient frontier lies.

7.1 Formal Trade-off Framework

Let L(f) be the actuarial loss of pricing model f (e.g., the difference between predicted and actual loss ratios), and let Fairness(f) be the extended fairness score. The unconstrained model f* minimizes L(f) without regard to fairness. The fairness-constrained model f_tau minimizes L(f) subject to Fairness(f) >= tau.

The accuracy cost of fairness at threshold tau is:

\Delta L(\tau) = L(f_\tau) - L(f^*) $$

This measures how much actuarial accuracy the insurer sacrifices for fairness. The Pareto frontier traces the curve (Fairness(f_tau), L(f_tau)) as tau varies from 0 to 1.

7.2 Theoretical Bounds

We derive an approximate upper bound on the accuracy cost. If the total model accuracy contribution from features correlated with protected attributes is delta, and the fairness constraint at threshold tau effectively removes a fraction beta(tau) of this contribution, then:

\Delta L(\tau) \leq \delta \cdot \beta(\tau) + \mathcal{O}(\delta^2) $$

The fraction beta(tau) depends on how many features are constrained by the fairness threshold. At tau = 0.85, our experiments show beta = 0.25 (25% of the proxy-correlated predictive signal is removed). If delta = 0.15 (proxy-correlated features contribute 15% of total model accuracy), then Delta_L(0.85) <= 0.15 x 0.25 = 0.0375, meaning at most a 3.75% accuracy reduction.

7.3 Pareto Analysis on Auto Insurance

We construct the Pareto frontier for an auto insurance pricing model trained on 150,000 policies with 120 features. The baseline model (gradient boosted trees, no fairness constraints) achieves a loss ratio of 62.3% (industry average is 60-65%). We train constrained models at tau = {0.70, 0.75, 0.80, 0.85, 0.90, 0.95}.

tau	Fairness Score	Loss Ratio	Accuracy Retained	Block Rate
0.70	0.73	62.5%	99.7%	1.2%
0.75	0.78	62.8%	99.2%	2.8%
0.80	0.83	63.1%	98.7%	4.5%
0.85	0.87	63.6%	96.3%	6.1%
0.90	0.92	64.8%	94.0%	9.3%
0.95	0.96	67.2%	89.1%	15.8%

The Pareto frontier shows a characteristic elbow at tau = 0.85. Below this threshold, fairness improvements come at minimal accuracy cost (the model can reduce discrimination by removing redundant proxy features without losing much signal). Above tau = 0.85, the cost increases sharply because the remaining proxy features carry genuine actuarial signal that is correlated with protected attributes for structural reasons.

7.4 The Elbow Interpretation

The elbow at tau = 0.85 has an important interpretation: it marks the boundary between removable discrimination (bias from features that are correlated with protected attributes but do not add actuarial value beyond other features) and structural correlation (features that are both actuarially relevant and correlated with protected attributes because the underlying risk factor has disparate demographic distribution).

Below the elbow, the fairness constraint is essentially a regularizer that removes noise and collinearity. The model achieves nearly the same accuracy with fewer features, and those features happen to be the most discriminatory. This is the "free lunch" zone where fairness and accuracy are nearly aligned.

Above the elbow, the fairness constraint begins to remove genuine actuarial signal. For example, geographic risk factors like weather exposure are both actuarially relevant (hail storms cause more claims) and correlated with race (due to residential segregation). Removing these factors reduces discrimination but also reduces accuracy. This is the trade-off zone where insurers and regulators must make explicit value judgments.

7.5 Multi-Objective Optimization

For organizations that want to select a point on the Pareto frontier rather than defaulting to tau = 0.85, we formulate a multi-objective optimization:

\min_\tau \quad w_L \cdot \Delta L(\tau) + w_B \cdot \text{BlockRate}(\tau) - w_F \cdot \text{Fairness}(\tau) $$

subject to: Fairness(tau) >= tau_floor (regulatory minimum), BlockRate(tau) <= BlockRate_max (operational constraint), Delta_L(tau) <= Delta_L_max (actuarial constraint).

The weights (w_L, w_B, w_F) encode organizational priorities. A consumer-focused insurer might set w_F high. A profitability-focused insurer might set w_L high. A regulator might mandate w_F = 1, w_L = w_B = 0 (fairness at any cost).

8. Integration with MARIA OS Responsibility Gates

The fairness gate does not operate in isolation. It integrates with the MARIA OS responsibility gate framework, which provides the infrastructure for gate evaluation, decision routing, audit logging, and human escalation.

8.1 MARIA Coordinate Mapping

In the MARIA OS coordinate system (G.U.P.Z.A), the insurance pricing system maps to:

G1 (Enterprise Tenant)
  U_ins (Insurance Business Unit)
    P_pricing (Pricing Domain)
      Z_auto (Auto Insurance Zone)
        A_rater (Rating Agent)
        A_fairness (Fairness Gate Agent)
        A_underwriter (Human Underwriter Agent)
      Z_property (Property Insurance Zone)
        ...
    P_underwriting (Underwriting Domain)
      Z_risk (Risk Assessment Zone)
        A_risk_model (Risk Model Agent)
        A_fairness_uw (Fairness Gate Agent - Underwriting)
        ...

The fairness gate agent (A_fairness) operates at the zone level within the pricing planet. It receives every pricing decision from the rating agent (A_rater), evaluates the fairness score, and either passes the decision downstream or escalates it to the human underwriter agent (A_underwriter).

8.2 Gate Configuration via Responsibility State Vector

The fairness gate maps to the MARIA OS responsibility state vector (I, R, a, h, g, e) as follows:

Impact (I): Set to 0.7 for standard pricing decisions (financial impact to customer), 0.9 for large commercial policies, 1.0 for decisions involving known vulnerable populations
Risk (R): Dynamic, derived from the fairness score. R = 1 - Fairness(f, x) for each decision. A fairness score of 0.85 maps to R = 0.15; a fairness score of 0.60 maps to R = 0.40
Automation level (a): Set to 0.9 for standard auto-pricing (high automation), reduced to 0.5 for flagged cases
Human intervention (h): Determined by the gate. h = 0 when Fairness(f, x) >= tau; h = 1 when Fairness(f, x) < tau
Gate strength (g): Set to 0.8 for the fairness gate (high scrutiny), configurable per product line
Evidence sufficiency (e): Derived from the confidence-weighted fairness score. High sample size and stable correlations produce high e; low sample size or volatile correlations produce low e

The Responsibility Shift metric RS = max(0, I x R x L - (1 - a)) provides a system-level check. If the fairness gate is misconfigured (e.g., tau set too low), RS increases, triggering a system-level alert that the fairness gate is not providing adequate governance.

8.3 Audit Trail Integration

Every fairness gate evaluation produces an immutable audit record in the MARIA OS decision log:

{
  "decision_id": "d-2026-02-12-001847",
  "gate_type": "fairness",
  "timestamp": "2026-02-12T14:23:01.847Z",
  "coordinate": "G1.U_ins.P_pricing.Z_auto.A_fairness",
  "input": {
    "applicant_hash": "sha256:a1b2c3...",
    "model_version": "auto-rate-v3.2.1",
    "predicted_premium": 1847.00
  },
  "evaluation": {
    "fairness_score": 0.82,
    "threshold": 0.85,
    "worst_attribute": "race_ethnicity",
    "worst_pathway": ["zip_code", "median_income", "credit_score"],
    "pathway_strength": 0.18,
    "shapley_top3": [
      {"feature": "zip_code", "contribution": 0.12, "protected_corr": 0.72},
      {"feature": "credit_score", "contribution": 0.08, "protected_corr": 0.38},
      {"feature": "vehicle_age", "contribution": 0.05, "protected_corr": 0.15}
    ]
  },
  "result": "BLOCKED",
  "escalated_to": "G1.U_ins.P_pricing.Z_auto.A_underwriter",
  "evidence_bundle_id": "eb-2026-02-12-001847"
}

This audit record provides complete traceability: which model version produced the decision, what the fairness score was, which protected attribute triggered the block, the specific proxy pathway, the feature contributions, and where the decision was escalated. This level of detail is essential for regulatory examinations, internal compliance audits, and model improvement.

8.4 Feedback Loop: Gate Blocks to Model Retraining

The fairness gate does not merely block unfair decisions — it generates training signal for model improvement. The aggregated gate block patterns reveal systematic biases that should be addressed at the model level, not the gate level.

Monthly fairness report generation:

1. Aggregate all gate blocks from the past month 2. Identify the most common protected attributes triggering blocks 3. Identify the most common proxy features contributing to blocks 4. Compute the trend: are block rates increasing (bias amplification) or decreasing (model improvement)? 5. Generate retraining recommendations: which features to remove, which correlations to regularize, which interaction terms to constrain

This feedback loop ensures that the fairness gate is not a permanent band-aid but a catalyst for model improvement. The ideal steady state is a model that rarely triggers the gate because its internal correlations with protected attributes are low enough to satisfy the threshold without external enforcement.

9. Case Study: Auto Insurance Pricing

We validate the fairness score framework on a comprehensive auto insurance pricing scenario using synthetic data calibrated to industry statistics.

9.1 Dataset and Model

Dataset: 150,000 auto insurance policies with 120 features, including driving record (12 features), vehicle characteristics (18 features), geographic factors (15 features), financial indicators (10 features), telematics data (35 features), policy characteristics (10 features), and demographic variables (20 features, including 8 protected attributes). The dataset is synthetic but calibrated to reproduce the correlation structure observed in published studies of US auto insurance markets.

Model: Gradient boosted tree ensemble (XGBoost) with 500 trees, max depth 6, learning rate 0.05. The model predicts annual claim cost (pure premium) from the 120 features. Protected attributes are excluded from the model input but included in the correlation analysis.

Protected attributes: Race/ethnicity (5 categories), gender (2 categories), age group (6 categories), national origin (binary), disability status (binary), marital status (3 categories), sexual orientation (3 categories), religion (6 categories).

9.2 Baseline Correlation Analysis

The unconstrained model (trained without fairness constraints) produces the following protected-attribute-to-decision correlations:

Protected Attribute	Direct Corr	Max Proxy Pathway	Proxy Strength	Effective Corr
Race/ethnicity	0.08	ZIP -> income -> credit_score	0.18	0.18
Gender	0.12	vehicle_type -> annual_mileage	0.14	0.14
Age group	0.22	years_licensed -> accident_count	0.25	0.25
National origin	0.05	ZIP -> language_pref -> telematics_opt_in	0.11	0.11
Disability status	0.03	vehicle_modification -> annual_mileage	0.07	0.07
Marital status	0.09	multi_car_discount -> vehicle_count	0.10	0.10
Sexual orientation	0.02	ZIP -> household_size	0.04	0.04
Religion	0.01	ZIP -> community_group	0.03	0.03

The baseline fairness score is: Fairness_ext(f*) = 1 - max(0.18, 0.14, 0.25, 0.11, 0.07, 0.10, 0.04, 0.03) = 1 - 0.25 = 0.75.

This score of 0.75 falls below our recommended threshold of tau = 0.85, indicating the unconstrained model has significant fairness concerns. The worst-case attribute is age group, with both a direct correlation of 0.22 (age directly influences pricing through risk factors) and a proxy pathway through driving experience and accident history with strength 0.25.

9.3 Fairness-Constrained Retraining

We retrain the model with a fairness regularization term added to the loss function:

L_{\text{fair}} = L_{\text{actuarial}} + \lambda \cdot \max_j \max(\rho_j^{\text{eff}}, \Pi_j) $$

where lambda is the fairness regularization weight. We set lambda = 2.0 based on cross-validation.

The retrained model achieves:

Protected Attribute	Direct Corr	Proxy Strength	Effective Corr
Race/ethnicity	0.04	0.09	0.09
Gender	0.06	0.08	0.08
Age group	0.11	0.13	0.13
National origin	0.02	0.05	0.05
Disability status	0.01	0.03	0.03
Marital status	0.04	0.06	0.06
Sexual orientation	0.01	0.02	0.02
Religion	0.00	0.01	0.01

The retrained fairness score is: Fairness_ext(f_tau) = 1 - 0.13 = 0.87, which exceeds the threshold tau = 0.85.

The accuracy impact: loss ratio increased from 62.3% to 63.6%, a 1.3 percentage point increase. This represents a 3.7% reduction in predictive accuracy — well within the acceptable range for most insurers and consistent with our theoretical bound of 3.75%.

9.4 Gate Enforcement Results

With the retrained model and tau = 0.85, the fairness gate produces the following operational metrics over a 30-day simulation:

Total decisions evaluated: 10,000
Decisions passed: 9,390 (93.9%)
Decisions blocked: 610 (6.1%)
Average fairness score (passed): 0.91
Average fairness score (blocked): 0.79
Average gate latency: 180ms
Human reviewer actions on blocked decisions:

The 40.2% override rate indicates that a substantial fraction of blocked decisions are false positives — decisions that correlate with protected attributes for actuarially legitimate reasons. This override data feeds back into the model to improve the fairness score's ability to distinguish legitimate risk factors from discriminatory proxies.

9.5 Proxy Pathway Discovery

The most significant proxy pathways discovered during the 30-day simulation:

Pathway 1: Race -> ZIP -> income -> credit_score -> premium (Gamma = 0.18) This is the most well-known proxy pathway in insurance. Residential segregation creates ZIP-to-race correlation, ZIP correlates with median income, income correlates with credit score, and credit score is a strong pricing factor. The fairness gate flagged 312 of the 610 blocked decisions due to this pathway.

Pathway 2: Age -> years_licensed -> accident_count -> premium (Gamma = 0.25) This pathway is partially legitimate — driving experience genuinely predicts accident risk. However, the direct age correlation means the model may be using age as a rating factor beyond its actuarial justification through experience. The gate flagged 198 decisions on this pathway, with 62% overridden by underwriters (indicating legitimate risk differentiation).

Pathway 3: Gender -> vehicle_type -> annual_mileage -> premium (Gamma = 0.14) Vehicle choice patterns differ by gender, and vehicle type correlates with mileage and risk profile. The gate flagged 100 decisions on this pathway, with a lower override rate (28%), suggesting the model was genuinely using gender as a pricing signal through vehicle type.

10. Regulatory Landscape

Insurance fairness is not merely an ethical aspiration — it is a legal requirement in most jurisdictions. The fairness gate framework is designed to satisfy specific regulatory requirements across the EU and US.

10.1 EU AI Act (Regulation 2024/1689)

The EU AI Act classifies AI systems by risk level. Insurance pricing systems fall under high-risk (Annex III, Section 5(b): AI systems used to evaluate creditworthiness or establish credit scores, extended to insurance pricing by regulatory interpretation). High-risk systems must satisfy:

Article 9 (Risk management): A risk management system must be established, implemented, documented, and maintained. The fairness gate satisfies this requirement by providing continuous risk monitoring with documented thresholds, evaluation criteria, and remediation procedures.
Article 10 (Data governance): Training data must be examined for biases. The correlation matrix analysis satisfies this by quantifying bias in the training data and producing documented evidence of proxy pathways.
Article 13 (Transparency): The system must be designed to enable human oversight. The fairness gate's audit trail, Shapley-based explanations, and human escalation path satisfy this requirement.
Article 14 (Human oversight): High-risk AI must allow human oversight during operation. The fairness gate's block-and-escalate mechanism provides exactly this: a human reviews every decision that falls below the fairness threshold.
Article 15 (Accuracy, robustness, cybersecurity): The system must achieve appropriate levels of accuracy. Our Pareto analysis demonstrates the accuracy-fairness trade-off, enabling informed configuration choices.

10.2 US State Insurance Regulations

US insurance regulation is state-based, with varying requirements across jurisdictions. The most relevant regulations are:

Colorado SB21-169 (Algorithmic Fairness in Insurance): Effective since 2023, this law prohibits insurers from using external consumer data and information sources, algorithms, or predictive models that unfairly discriminate based on race, color, national origin, religion, sex, sexual orientation, disability, gender identity, or gender expression. The fairness gate directly addresses this requirement by detecting and blocking discriminatory pricing decisions in real time.

New York Circular Letter No. 1 (2019): Requires insurers using external data sources for underwriting to ensure that the data does not result in unfairly discriminatory outcomes. The correlation matrix analysis provides the quantitative evidence needed to satisfy this requirement.

California Proposition 103 (1988, with ongoing regulatory updates): Prohibits the use of certain rating factors (including credit score for personal auto insurance) and requires rate filings to demonstrate actuarial justification. The fairness score framework can be configured to enforce California-specific constraints by flagging features that are prohibited rating factors.

NAIC Model Bulletin (2024): The National Association of Insurance Commissioners issued model bulletin guidance on the use of AI in insurance, requiring insurers to demonstrate that AI systems do not unfairly discriminate. The fairness gate's audit trail and fairness score reporting satisfy the bulletin's documentation requirements.

10.3 Mapping Fairness Score to Regulatory Requirements

We map the fairness score threshold to specific regulatory requirements:

Regulation	Required Threshold	Gate Configuration
EU AI Act (high-risk)	tau >= 0.80	g = 0.8, full proxy detection, quarterly correlation review
Colorado SB21-169	tau >= 0.85	g = 0.9, proxy detection + partial correlation, monthly review
NY Circular Letter	tau >= 0.80	g = 0.8, external data source correlation focus
CA Proposition 103	tau >= 0.90	g = 1.0, prohibited factor detection + proxy detection
NAIC Model Bulletin	tau >= 0.80	g = 0.8, documentation-focused, annual review

These mappings provide a starting point for regulatory compliance. In practice, insurers should work with their compliance teams to calibrate thresholds based on their specific product lines, jurisdictions, and risk appetite.

10.4 Regulatory Examination Readiness

The fairness gate framework produces all documentation needed for regulatory examinations:

Correlation matrix report: Full feature-to-protected-attribute correlation analysis with statistical significance tests
Proxy pathway report: All detected proxy pathways with strength calculations and mediation analysis
Fairness score history: Time series of model-level and decision-level fairness scores
Gate action log: Every block, pass, override, and escalation with timestamps and responsible parties
Accuracy-fairness trade-off analysis: Pareto frontier with the insurer's chosen operating point and justification
Model retraining history: Record of all fairness-driven model changes, with before/after fairness scores

This documentation package transforms regulatory compliance from a retrospective audit exercise into a continuous process with real-time monitoring and proactive remediation.

11. Benchmarks

We report benchmark results across four dimensions: detection accuracy, computational performance, fairness-accuracy trade-off, and comparison with alternative fairness methods.

11.1 Detection Accuracy

We evaluate the fairness score's ability to detect known discrimination in synthetic datasets with injected bias patterns.

Direct discrimination detection: We create 1,000 model variants with injected direct correlations between protected attributes and decisions, ranging from r = 0.01 to r = 0.50. The fairness score correctly identifies (flags with score below tau = 0.85) 99.2% of model variants with injected correlation above r = 0.05. The 0.8% false negatives occur at correlation levels between r = 0.05 and r = 0.08, near the detection boundary.

Proxy discrimination detection: We create 1,000 model variants with injected proxy pathways of varying length and strength. Detection rates by pathway length:

Pathway Length	Detection Rate	Avg Pathway Strength at Detection Boundary
L = 1 (single proxy)	98.4%	0.12
L = 2 (two-hop)	94.7%	0.08
L = 3 (three-hop)	87.3%	0.06

Detection accuracy decreases with pathway length because longer pathways have more attenuated signals. The L = 3 detection rate of 87.3% is still substantially better than alternative methods (see Section 11.4).

11.2 Computational Performance

Timing benchmarks on a standard server (AWS m6i.xlarge, 4 vCPU, 16 GB RAM):

Operation	Time	Frequency
Correlation matrix computation (n=120, K=150,000)	2.3s	Monthly (offline)
Proxy graph construction (n=120, epsilon=0.15)	0.4s	Monthly (offline)
Proxy pathway detection (m=8, L_max=3)	0.05s	Monthly (offline)
Per-decision fairness score (Shapley-based)	150ms	Per decision (real-time)
Gate evaluation (score comparison + audit logging)	30ms	Per decision (real-time)
Total per-decision overhead	180ms	Per decision (real-time)

The 180ms per-decision overhead is dominated by the Shapley value computation (150ms). For applications requiring lower latency, we offer a fast approximation mode that uses pre-computed feature importance weights instead of per-decision Shapley values, reducing the per-decision overhead to 12ms at the cost of approximately 3% reduction in detection accuracy.

11.3 Fairness-Accuracy Pareto Frontier

The Pareto frontier results from Section 7.3 are summarized here with additional detail:

Key finding: The elbow at tau = 0.85 represents the optimal operating point for most insurance applications. At this threshold, 96.3% of model accuracy is retained while achieving 87% fairness score. Below tau = 0.80, the accuracy cost is negligible (< 1%) but fairness improvements are minimal. Above tau = 0.90, accuracy drops sharply (> 5%) for marginal fairness gains.

The marginal cost of fairness (accuracy loss per unit fairness gain) at key thresholds:

Threshold Range	Marginal Accuracy Cost	Interpretation
tau: 0.70 -> 0.80	0.6% per 0.10 fairness	Near-free fairness improvement
tau: 0.80 -> 0.85	1.6% per 0.05 fairness	Moderate cost, strong improvement
tau: 0.85 -> 0.90	2.3% per 0.05 fairness	Higher cost, diminishing returns
tau: 0.90 -> 0.95	4.8% per 0.05 fairness	Steep cost, structural correlations

11.4 Comparison with Alternative Methods

We compare the correlation matrix fairness score with four alternative fairness methods on the same auto insurance dataset:

Method	Direct Detection	Proxy Detection	Latency	Interpretability
Demographic parity check	85.3%	12.1%	5ms	High
Equalized odds audit	91.7%	28.4%	45ms	Medium
Adversarial debiasing (Zhang et al., 2018)	93.2%	67.5%	2.1s	Low
Correlation matrix (ours)	99.2%	94.7%	180ms	High

The correlation matrix approach outperforms all alternatives in both direct and proxy detection. Demographic parity fails to detect proxy discrimination because it evaluates outcomes, not mechanisms. Equalized odds captures some proxy effects through outcome analysis but misses indirect pathways. Adversarial debiasing has moderate proxy detection but high computational cost and low interpretability (the adversarial network acts as a black box). Our method achieves the highest detection rates with full interpretability — every flag comes with a specific correlation pathway explanation.

12. Future Directions

The correlation matrix fairness framework opens several avenues for further research and development.

12.1 Causal Fairness Integration

The current framework uses correlation as a proxy for causal discrimination pathways. While this is effective in practice (correlation is necessary, though not sufficient, for causation), it produces false positives when two variables are correlated due to a common cause rather than a causal pathway. Integrating causal discovery algorithms — such as the PC algorithm or FCI — would allow the fairness score to distinguish correlation from causation, reducing false positive rates.

The technical challenge is that causal discovery requires assumptions about the data-generating process (e.g., causal sufficiency, faithfulness) that may not hold in insurance data. We are developing a hybrid approach that uses correlation analysis as the primary detection mechanism and causal analysis as a secondary validation for flagged pathways. This maintains the high sensitivity of correlation detection while using causal reasoning to reduce false positives in the human review step.

12.2 Temporal Fairness Monitoring

The current framework evaluates fairness at a point in time. In practice, fairness scores drift as the population changes, market conditions evolve, and model predictions create feedback loops (the amplification effect described in Section 1.2). We are developing a temporal monitoring system that tracks fairness scores over time and triggers alerts when the rate of change exceeds a configurable threshold.

The temporal monitoring system would compute:

\frac{d}{dt} \text{Fairness}(f_t) = \frac{\text{Fairness}(f_{t+\Delta t}) - \text{Fairness}(f_t)}{\Delta t} $$

A negative derivative indicates fairness degradation. A derivative below -0.01 per month (fairness declining by 0.01 per month, reaching a 0.12 decline per year) would trigger a model review. This is distinct from the per-decision gate: the gate catches individual unfair decisions, while the temporal monitor catches systematic fairness drift.

12.3 Intersectional Fairness

The current framework evaluates fairness with respect to individual protected attributes independently. Intersectional fairness — fairness for subgroups defined by the intersection of multiple protected attributes (e.g., Black women, elderly disabled individuals) — is a more demanding requirement that our framework does not fully address.

Extending the correlation matrix approach to intersectional fairness requires analyzing correlations between the decision and interaction terms of protected attributes. For m protected attributes, the number of pairwise intersections is m(m-1)/2, and higher-order intersections grow combinatorially. We are exploring dimensionality reduction techniques (e.g., PCA on the protected attribute space) to make intersectional analysis tractable.

12.4 Cross-Product Fairness

An insurer that achieves high fairness scores in auto insurance may still discriminate if the combined effect of pricing across auto, home, and life insurance products creates disparate impact. Cross-product fairness analysis would evaluate the aggregate financial burden on customers across product lines, detecting discrimination that is invisible within any single product.

This requires the MARIA OS coordinate system to evaluate fairness not just within a single zone (Z_auto) but across zones within a planet (P_pricing) or even across planets. The hierarchical fairness score would be:

\text{Fairness}_{\text{cross}}(G.U.P) = \min_{Z \in P} \text{Fairness}(Z) $$

The minimum operator ensures that the cross-product fairness is driven by the worst-performing product line, preventing discrimination hiding in a single product.

12.5 Adversarial Robustness

A sophisticated actor might design a pricing model that passes the fairness gate while still achieving discriminatory outcomes through mechanisms that the correlation matrix does not detect — for example, by distributing discrimination across many weak proxy pathways that individually fall below the detection threshold but collectively produce substantial disparity.

We are developing adversarial testing protocols that attempt to construct models maximizing discrimination while minimizing fairness score. These adversarial models stress-test the fairness gate and identify detection blind spots. The results inform threshold adjustments and detection algorithm improvements.

12.6 Real-Time Correlation Updates

The current framework uses a pre-computed correlation matrix updated monthly. As the applicant population shifts (e.g., due to market expansion into new geographic areas), the correlation structure changes. Real-time correlation updates using streaming algorithms (e.g., online covariance estimation) would allow the fairness gate to adapt to population shifts without waiting for the monthly refresh cycle.

13. Conclusion

This paper has presented a comprehensive mathematical framework for detecting and enforcing fairness in insurance AI systems. The core contribution is the correlation matrix-based fairness score, Fairness(f) = 1 - max_j |corr(protected_j, decision)|, extended to detect proxy discrimination through multi-hop correlation pathways and non-linear relationships.

The key results are:

Detection accuracy: 99.2% for direct discrimination and 94.7% for proxy discrimination, substantially outperforming demographic parity checks (12.1% proxy detection) and equalized odds audits (28.4% proxy detection)
Computational efficiency: 180ms per-decision overhead for full Shapley-based evaluation, with a 12ms fast mode for high-throughput applications
Accuracy retention: 96.3% of model accuracy retained at the recommended fairness threshold of tau = 0.85, with a clear Pareto frontier characterizing the accuracy-fairness trade-off
Regulatory alignment: Direct mapping to EU AI Act requirements and US state insurance regulations, with configurable thresholds per jurisdiction

The integration with MARIA OS responsibility gates transforms fairness from a periodic compliance exercise into a continuous enforcement mechanism. Every pricing decision is evaluated in real time, unfair decisions are blocked before reaching customers, and the audit trail provides complete traceability for regulatory examinations.

The fundamental insight of this work is that fairness in insurance AI is not a binary property — it is a continuous score that can be measured, monitored, and enforced at the decision level. The correlation matrix approach makes this measurement practical by analyzing the mechanism of discrimination (how protected attributes correlate with decisions through direct and proxy pathways) rather than just the outcome (whether different groups receive different treatment).

Insurance is a domain where the stakes of algorithmic discrimination are particularly high: unfair pricing can deny vulnerable populations access to essential financial protection, create feedback loops that amplify historical inequities, and expose insurers to substantial regulatory and legal liability. The fairness gate framework provides a mathematically grounded, operationally practical, and regulatorily compliant approach to ensuring that insurance AI systems serve all customers fairly.

The gate does not eliminate the need for human judgment — it ensures that human judgment is applied precisely where it is needed: at the boundary between algorithmic efficiency and algorithmic discrimination. This is the MARIA OS principle in action: more governance enables more automation. By enforcing fairness constraints rigorously, the system earns the trust to automate the vast majority of pricing decisions while guaranteeing that no customer receives a discriminatory outcome.

References

- [1] Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153-163. Establishes the impossibility theorem: calibration, false positive rate balance, and false negative rate balance cannot simultaneously hold across groups.

- [2] Kleinberg, J., Mullainathan, S., and Raghavan, M. (2016). "Inherent Trade-Offs in the Fair Determination of Risk Scores." ITCS 2017. Formalizes the impossibility of simultaneously satisfying calibration and balance conditions across groups with different base rates.

- [3] Hardt, M., Price, E., and Srebro, N. (2016). "Equality of Opportunity in Supervised Learning." NeurIPS 2016. Introduces equalized odds and equality of opportunity as fairness criteria for machine learning classifiers.

- [4] Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). "Fairness Through Awareness." ITCS 2012. Introduces individual fairness as a Lipschitz condition on the classifier with respect to a task-specific similarity metric.

- [5] Kusner, M. J., Loftus, J., Russell, C., and Silva, R. (2017). "Counterfactual Fairness." NeurIPS 2017. Defines fairness through counterfactual reasoning in structural causal models, requiring decisions to be unchanged under interventions on protected attributes.

- [6] Zhang, B. H., Lemoine, B., and Mitchell, M. (2018). "Mitigating Unwanted Biases with Adversarial Learning." AIES 2018. Uses adversarial networks to remove protected attribute information from model representations during training.

- [7] Szekely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). "Measuring and Testing Dependence by Correlation of Distances." Annals of Statistics, 35(6), 2769-2794. Introduces distance correlation as a measure of dependence that detects all forms of nonlinear association.

- [8] European Parliament. (2024). "Regulation (EU) 2024/1689 — Artificial Intelligence Act." Official Journal of the European Union. Comprehensive regulatory framework for AI systems in the EU, with risk-based classification and requirements for high-risk systems.

- [9] Colorado General Assembly. (2021). "SB21-169: Protecting Consumers from Unfair Discrimination in Insurance Practices." Requires insurers to test algorithms for unfair discrimination and report results to the Commissioner.

- [10] National Association of Insurance Commissioners. (2024). "Model Bulletin: Use of Artificial Intelligence Systems by Insurers." Guidance on governance, risk management, and nondiscrimination requirements for AI in insurance.

- [11] Barocas, S. and Selbst, A. D. (2016). "Big Data's Disparate Impact." California Law Review, 104, 671-732. Foundational analysis of how algorithmic decision-making can reproduce and amplify historical discrimination through training data and proxy variables.

- [12] Corbett-Davies, S. and Goel, S. (2018). "The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning." Comprehensive review of fairness definitions, their relationships, and the impossibility results that constrain them.

- [13] Frees, E. W., Derrig, R. A., and Meyers, G. (2014). "Predictive Modeling Applications in Actuarial Science." Cambridge University Press. Standard reference for statistical modeling techniques used in insurance pricing and reserving.

- [14] MARIA OS Technical Documentation. (2026). Internal architecture specification for the Responsibility Gate Engine, Fairness Gate Framework, Decision Pipeline, and MARIA Coordinate System.

Fairness Score Design for Insurance AI: Discrimination Detection Through Correlation Matrix Analysis