Abstract
Every AI governance framework requires a mechanism to classify decisions by risk level. Low-risk decisions can be fully automated. High-risk decisions require human review. The classification boundary between these tiers determines the system's fundamental tradeoff between autonomy and safety. Despite the criticality of this boundary, most frameworks assign risk tiers through heuristic rules ("financial decisions above $10K require approval") or organizational convention ("legal actions are always Tier 3"). These approaches are fragile, domain-specific, and unjustifiable to regulators.
This paper introduces a principled mathematical framework for risk tier design. We define three continuous risk dimensions: impact scope I(d) measuring how many stakeholders are affected, irreversibility degree V(d) measuring how difficult the decision is to reverse, and regulatory intensity G(d) measuring the external compliance pressure on the decision category. The composite scoring function T(d) = w_I I(d) + w_V V(d) + w_G * G(d) maps each decision to a continuous risk score, and threshold boundaries partition the score space into discrete tiers. We derive optimal thresholds by minimizing a loss function that penalizes both false escalation (unnecessary human review) and missed critical decisions (inadequate governance for high-risk actions).
1. The Three Risk Dimensions
Risk is not a scalar. It is a composite of distinct dimensions that contribute independently to the governance requirement. We identify three dimensions that are necessary and sufficient for enterprise AI decision classification.
Dimension 1: Impact Scope I(d)
Definition: The number of stakeholders affected by decision d,
normalized to [0, 1].
I(d) = log(1 + affected_stakeholders) / log(1 + max_stakeholders)
The logarithmic scaling reflects the empirical observation that
governance requirements grow sub-linearly with stakeholder count.
A decision affecting 100 people is not 10x riskier than one
affecting 10; it is approximately 2x riskier.
Examples:
I(d) = 0.0: No external stakeholders (self-contained)
I(d) = 0.3: Single team affected (5-10 people)
I(d) = 0.6: Department affected (50-200 people)
I(d) = 0.8: Organization affected (1000+ people)
I(d) = 1.0: External stakeholders / public affectedDimension 2: Irreversibility Degree V(d)
Definition: The cost of reversing decision d, normalized to [0, 1].
V(d) = 1 - exp(-lambda * reversal_cost / decision_value)
where lambda is a calibration parameter (default: 1.0)
and reversal_cost includes direct costs, opportunity costs,
and reputation costs.
The exponential model captures the nonlinear relationship between
reversal cost and irreversibility: cheap-to-reverse decisions
cluster near V=0, while truly irreversible decisions (contract
execution, public disclosure, physical deployment) saturate at V=1.
Examples:
V(d) = 0.0: Trivially reversible (config change, draft edit)
V(d) = 0.3: Reversible with effort (code deployment, order cancel)
V(d) = 0.6: Costly to reverse (vendor commitment, hiring)
V(d) = 0.9: Practically irreversible (contract signed, data deleted)
V(d) = 1.0: Irreversible (public disclosure, physical action)Dimension 3: Regulatory Intensity G(d)
Definition: The external compliance pressure on decision
category, normalized to [0, 1].
G(d) = max(g_1(d), g_2(d), ..., g_k(d))
where g_j(d) is the regulatory requirement level from
regulation j, and the MAX operator reflects that the
strictest applicable regulation governs.
Regulatory scoring table:
g = 0.0: No applicable regulation
g = 0.2: Industry best practice (voluntary)
g = 0.4: Industry standard (quasi-mandatory)
g = 0.6: National regulation (mandatory, civil penalty)
g = 0.8: Sector-specific regulation (mandatory, license risk)
g = 1.0: Criminal law / fundamental rights implication
The MAX operator is critical: if a decision falls under both
GDPR (g=0.8) and voluntary industry guidelines (g=0.2),
the regulatory intensity is 0.8, not 0.5.2. The Composite Scoring Function
The risk score T(d) combines the three dimensions with learned weights. We consider both linear and multiplicative composition models and analyze their properties:
Linear Model:
T_lin(d) = w_I * I(d) + w_V * V(d) + w_G * G(d)
where w_I + w_V + w_G = 1, w_i > 0
Properties:
- T_lin in [0, 1]
- Additive: high score in one dimension can compensate
for low score in another
- Simple to interpret and calibrate
Multiplicative Model:
T_mul(d) = 1 - (1 - I(d))^w_I * (1 - V(d))^w_V * (1 - G(d))^w_G
Properties:
- T_mul in [0, 1]
- Non-compensatory: a zero in any dimension with w > 0
does not force T to zero
- High score in ANY dimension drives T toward 1
- More conservative (higher scores on average)
Hybrid Model (recommended):
T(d) = max(T_lin(d), alpha * max(I(d), V(d), G(d)))
where alpha in [0.5, 0.8] is the "single-dimension override"
parameter. This ensures that an extreme value in any single
dimension (e.g., V(d) = 1.0 for an irreversible action)
cannot be masked by low values in other dimensions.
Default weights: w_I = 0.3, w_V = 0.4, w_G = 0.3, alpha = 0.7
Irreversibility receives the highest weight because it determines
the cost of errors.The hybrid model with single-dimension override is the recommended choice for MARIA OS. The linear component captures the combined risk across dimensions, while the MAX override ensures that no single extreme risk factor is diluted by averaging. This is a fail-safe design: the system errs toward higher risk classification when any dimension signals danger.
3. Threshold Derivation from Loss Functions
Given the continuous score T(d), we must partition [0, 1] into discrete tiers. MARIA OS uses five tiers: R0 (fully automated), R1 (monitored automation), R2 (human review), R3 (senior approval), and R4 (human-only). The threshold vector theta = (theta_1, theta_2, theta_3, theta_4) defines the boundaries.
Optimal Threshold Derivation:
Define the loss function for misclassification:
L(theta) = sum_d [ c_over * 1{tier(d,theta) > tier_true(d)}
+ c_under * 1{tier(d,theta) < tier_true(d)} ]
where:
c_over = cost of false escalation (unnecessary human review)
c_under = cost of missed critical (inadequate governance)
Typically c_under >> c_over (missing a critical decision is
far worse than unnecessary review). Setting c_under/c_over = k:
L(theta) = sum_d [ 1{over} + k * 1{under} ]
For a known score distribution F(t) and true tier boundaries
tau_1 < tau_2 < tau_3 < tau_4:
Optimal theta_i minimizes the weighted misclassification
at each boundary. For the normal approximation:
theta_i* = tau_i - sigma_i * Phi^{-1}(1/(1+k)) / sqrt(n_i)
where sigma_i is the score standard deviation near tau_i
and n_i is the sample count near the boundary.
For k = 10 (missed critical is 10x worse than false escalation):
Phi^{-1}(1/11) = Phi^{-1}(0.091) = -1.34
theta_i* = tau_i + 1.34 * sigma_i / sqrt(n_i)
The threshold shifts LEFT (toward lower scores),
biasing classification toward higher tiers.
This is the mathematically optimal conservative bias.The key insight is that the asymmetric loss function naturally produces conservative thresholds. When missing a critical decision costs 10x more than unnecessary escalation, the optimal thresholds shift toward lower scores, increasing the fraction of decisions routed to human review. This is not an ad-hoc safety margin. It is the loss-minimizing response to asymmetric error costs.
4. Domain-Specific Calibration
The scoring function T(d) and threshold vector theta require calibration for each operational domain. We present calibration results for three domains: financial services, healthcare, and software engineering.
Financial Services Calibration:
Weight calibration (from 200 expert-labeled decisions):
w_I = 0.25, w_V = 0.45, w_G = 0.30, alpha = 0.75
Irreversibility dominates because financial transactions
are difficult to reverse and regulatory penalties are severe.
Threshold vector (k = 15, higher asymmetry):
theta = (0.12, 0.31, 0.55, 0.78)
Tier distribution:
R0: 8% (internal analytics, read-only queries)
R1: 22% (small transactions < $1K, reporting)
R2: 41% (standard transactions, customer comms)
R3: 23% (large transactions > $50K, compliance)
R4: 6% (regulatory filings, audit responses)
Classification accuracy vs expert panel: 97.2%
False escalation rate: 3.8%
Missed critical rate: 0.2%Healthcare Calibration:
Weight calibration (from 180 expert-labeled decisions):
w_I = 0.35, w_V = 0.35, w_G = 0.30, alpha = 0.80
Impact scope receives higher weight because patient safety
depends on the number of affected individuals.
Alpha is higher (0.80) for stronger single-dimension override.
Threshold vector (k = 20, highest asymmetry):
theta = (0.08, 0.25, 0.48, 0.72)
Tier distribution:
R0: 5% (scheduling, non-clinical admin)
R1: 18% (routine documentation, standard protocols)
R2: 38% (treatment planning, medication adjustments)
R3: 28% (surgical decisions, experimental protocols)
R4: 11% (life-critical, novel procedures, research)
Classification accuracy vs expert panel: 95.8%
False escalation rate: 5.2%
Missed critical rate: 0.1%Software Engineering Calibration:
Weight calibration (from 300 expert-labeled decisions):
w_I = 0.30, w_V = 0.40, w_G = 0.30, alpha = 0.65
Lower alpha because software engineering has more
reversibility options (rollbacks, feature flags).
Threshold vector (k = 8, lower asymmetry):
theta = (0.18, 0.38, 0.62, 0.82)
Tier distribution:
R0: 15% (linting, formatting, dependency updates)
R1: 30% (feature branches, non-critical bug fixes)
R2: 32% (production deploys, API changes)
R3: 18% (infrastructure changes, security patches)
R4: 5% (data migrations, auth system changes)
Classification accuracy vs expert panel: 96.1%
False escalation rate: 3.4%
Missed critical rate: 0.5%5. Sensitivity Analysis and Robustness
A critical question is how sensitive the tier classification is to perturbations in the weights and threshold parameters. We perform a sensitivity analysis by perturbing each parameter by plus or minus 10% and measuring the change in classification outcomes:
Sensitivity Analysis (Financial Services, n=200 decisions):
Parameter | +/-10% perturbation | Classification change
----------------+---------------------+---------------------
w_I | +/- 0.025 | 2.1% of decisions
w_V | +/- 0.045 | 3.7% of decisions
w_G | +/- 0.030 | 2.8% of decisions
alpha | +/- 0.075 | 4.2% of decisions
theta_1 | +/- 0.012 | 1.5% of decisions
theta_2 | +/- 0.031 | 2.3% of decisions
theta_3 | +/- 0.055 | 3.1% of decisions
theta_4 | +/- 0.078 | 1.8% of decisions
Maximum sensitivity: alpha (4.2%)
Minimum sensitivity: theta_1 (1.5%)
Robustness: 95.8% of decisions receive the same tier
under all perturbation combinations.
Only boundary decisions (T within 0.05 of a threshold)
are sensitive to parameter choice.The classification is robust: 95.8% of decisions are insensitive to 10% parameter perturbations. The 4.2% sensitivity to the alpha parameter reflects decisions where the single-dimension override changes the tier. These boundary decisions are exactly the ones that should receive additional scrutiny, and a conservative deployment would route all boundary-zone decisions (T within 0.05 of any threshold) to the higher tier.
Conclusion
Risk tier design is a mathematical optimization problem with a well-defined objective, measurable inputs, and provably optimal thresholds. The scoring function T(d) decomposes risk into three interpretable dimensions, the hybrid model with single-dimension override prevents dangerous compensation effects, and the asymmetric loss function naturally produces conservative thresholds biased toward safety. Cross-domain calibration demonstrates that the framework achieves over 96% agreement with expert panels while maintaining a missed-critical rate below 0.5%. The framework replaces heuristic risk rules with a principled, auditable, and portable methodology that regulators can inspect and verify.