Name: MARIA OS
Author: MARIA OS

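Abstract

Every AI governance framework needs a mechanism for classifying decisions by risk level. Low-risk decisions can be fully automated; high-risk decisions require human review. The classification boundaries between these tiers determine the fundamental trade-off between system autonomy and safety. Despite the importance of these boundaries, most frameworks assign risk tiers through heuristic rules ("financial decisions over $10,000 require approval") or organizational convention ("legal actions are always Tier 3"). These approaches are brittle, domain-specific, and hard to justify to regulators.

This paper presents a principled mathematical framework for risk tier design. We define three continuous risk dimensions: impact scope I(d) measures the number of stakeholders affected, irreversibility degree V(d) measures how difficult a decision is to undo, and regulatory intensity G(d) measures the external compliance pressure on the decision category. A composite scoring function T(d) = w_I * I(d) + w_V * V(d) + w_G * G(d) maps each decision to a continuous risk score, and threshold boundaries partition the score space into discrete tiers. We derive optimal thresholds by minimizing a loss function that penalizes both false escalations (unnecessary human review) and missed critical decisions (inadequate governance of high-risk actions).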

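1. Three Dimensions of Risk

Risk is not a scalar. It is a composite of distinct dimensions that contribute independently to governance requirements. We identify three dimensions that are necessary and sufficient for classifying enterprise AI decisions.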

Dimension 1: Impact Scope I(d)

  Definition: The number of stakeholders affected by decision d,
  normalized to [0, 1].

  I(d) = log(1 + affected_stakeholders) / log(1 + max_stakeholders)

  The logarithmic scaling reflects the empirical observation that
  governance requirements grow sub-linearly with stakeholder count.
  A decision affecting 100 people is not 10x riskier than one
  affecting 10; it is approximately 2x riskier.

  Examples:
    I(d) = 0.0:  No external stakeholders (self-contained)
    I(d) = 0.3:  Single team affected (5-10 people)
    I(d) = 0.6:  Department affected (50-200 people)
    I(d) = 0.8:  Organization affected (1000+ people)
    I(d) = 1.0:  External stakeholders / public affected
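
As a minimal sketch of the formula above, the Python function below computes I(d) from a stakeholder count. The function name `impact_scope` and the clamping of counts above `max_stakeholders` are illustrative assumptions, not part of the MARIA OS specification.

```python
import math

def impact_scope(affected_stakeholders: int, max_stakeholders: int) -> float:
    """I(d) = log(1 + affected) / log(1 + max), kept inside [0, 1]."""
    if max_stakeholders <= 0:
        raise ValueError("max_stakeholders must be positive")
    affected = max(0, affected_stakeholders)  # assumption: negative counts are treated as 0
    score = math.log1p(affected) / math.log1p(max_stakeholders)
    return min(1.0, score)  # assumption: counts above the maximum are clamped

# The 100-vs-10 ratio depends only on the two counts, not on max_stakeholders.
ratio = impact_scope(100, 10_000) / impact_scope(10, 10_000)
print(round(ratio, 2))  # about 1.92, i.e. roughly 2x rather than 10x
```

The specific score bands in the examples depend on the chosen `max_stakeholders`, which the source does not fix.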

Dimension 2: Irreversibility Degree V(d)

  Definition: The cost of reversing decision d, normalized to [0, 1].

  V(d) = 1 - exp(-lambda * reversal_cost / decision_value)

  where lambda is a calibration parameter (default: 1.0)
  and reversal_cost includes direct costs, opportunity costs,
  and reputation costs.

  The exponential model captures the nonlinear relationship between
  reversal cost and irreversibility: cheap-to-reverse decisions
  cluster near V=0, while truly irreversible decisions (contract
  execution, public disclosure, physical deployment) saturate at V=1.

  Examples:
    V(d) = 0.0:  Trivially reversible (config change, draft edit)
    V(d) = 0.3:  Reversible with effort (code deployment, order cancel)
    V(d) = 0.6:  Costly to reverse (vendor commitment, hiring)
    V(d) = 0.9:  Practically irreversible (contract signed, data deleted)
    V(d) = 1.0:  Irreversible (public disclosure, physical action)
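
The exponential model can be sketched in a few lines of Python. The function name `irreversibility` and the input validation are assumptions made for illustration; `lam` stands for the calibration parameter lambda.

```python
import math

def irreversibility(reversal_cost: float, decision_value: float, lam: float = 1.0) -> float:
    """V(d) = 1 - exp(-lambda * reversal_cost / decision_value)."""
    if decision_value <= 0:
        raise ValueError("decision_value must be positive")
    if reversal_cost < 0:
        raise ValueError("reversal_cost must be non-negative")
    return 1.0 - math.exp(-lam * reversal_cost / decision_value)

# Reversal cost equal to the decision's value gives 1 - 1/e.
print(round(irreversibility(100.0, 100.0), 3))  # 0.632
```

Under this formula V(d) approaches 1 but never reaches it for a finite cost ratio, so the V(d) = 1.0 row in the examples is best read as the limiting case.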

Dimension 3: Regulatory Intensity G(d)

  Definition: The external compliance pressure on the category
  that decision d belongs to, normalized to [0, 1].

  G(d) = max(g_1(d), g_2(d), ..., g_k(d))

  where g_j(d) is the regulatory requirement level from
  regulation j, and the MAX operator reflects that the
  strictest applicable regulation governs.

  Regulatory scoring table:
    g = 0.0:  No applicable regulation
    g = 0.2:  Industry best practice (voluntary)
    g = 0.4:  Industry standard (quasi-mandatory)
    g = 0.6:  National regulation (mandatory, civil penalty)
    g = 0.8:  Sector-specific regulation (mandatory, license risk)
    g = 1.0:  Criminal law / fundamental rights implication

  The MAX operator is critical: if a decision falls under both
  GDPR (g=0.8) and voluntary industry guidelines (g=0.2),
  the regulatory intensity is 0.8, not the average of 0.5.
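
A short sketch of the MAX rule, using the GDPR example from the text. The function name `regulatory_intensity` and the dictionary layout are hypothetical; returning 0.0 when nothing applies follows the "no applicable regulation" row of the scoring table.

```python
from typing import Mapping

def regulatory_intensity(levels: Mapping[str, float]) -> float:
    """G(d): the strictest applicable regulation governs; 0.0 if none apply."""
    return max(levels.values(), default=0.0)

# Scores taken from the example in the text.
applicable = {"GDPR": 0.8, "voluntary industry guideline": 0.2}
g = regulatory_intensity(applicable)
print(g)  # 0.8
```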

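2. Composite Scoring Function

The risk score T(d) combines the three dimensions with learned weights. We consider both a linear and a multiplicative composition model and analyze their properties.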

Linear Model:
  T_lin(d) = w_I * I(d) + w_V * V(d) + w_G * G(d)
  where w_I + w_V + w_G = 1, w_i > 0

  Properties:
    - T_lin in [0, 1]
    - Additive: high score in one dimension can compensate
      for low score in another
    - Simple to interpret and calibrate

Multiplicative Model:
  T_mul(d) = 1 - (1 - I(d))^w_I * (1 - V(d))^w_V * (1 - G(d))^w_G

  Properties:
    - T_mul in [0, 1]
    - Non-compensatory: a zero in any dimension with w > 0
      does not force T to zero
    - High score in ANY dimension drives T toward 1
    - More conservative (higher scores on average)
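
The "more conservative" property can be made pointwise, assuming the multiplicative weights are normalized like the linear ones (w_I + w_V + w_G = 1, which the text states only for the linear model). Writing x_1, x_2, x_3 for I(d), V(d), G(d), the weighted AM-GM inequality gives:

```latex
\prod_{j=1}^{3} (1 - x_j)^{w_j} \;\le\; \sum_{j=1}^{3} w_j (1 - x_j) \;=\; 1 - T_{\mathrm{lin}}(d)
\quad\Longrightarrow\quad
T_{\mathrm{mul}}(d) \;=\; 1 - \prod_{j=1}^{3} (1 - x_j)^{w_j} \;\ge\; T_{\mathrm{lin}}(d).
```

Equality holds when all three dimension scores are equal, so under this assumption the multiplicative score is never below the linear one.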

Hybrid Model (recommended):
  T(d) = max(T_lin(d), alpha * max(I(d), V(d), G(d)))

  where alpha in [0.5, 0.8] is the "single-dimension override"
  parameter. This ensures that an extreme value in any single
  dimension (e.g., V(d) = 1.0 for an irreversible action)
  cannot be masked by low values in other dimensions.

  Default weights: w_I = 0.3, w_V = 0.4, w_G = 0.3, alpha = 0.7
  Irreversibility receives the highest weight because it determines
  the cost of errors.

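MARIA OS recommends the hybrid model with single-dimension override. The linear component captures aggregate risk across multiple dimensions, while the MAX override ensures that a single extreme risk factor is not diluted by averaging. This is a fail-safe design: when any one dimension signals danger, the system errs toward the higher risk classification.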
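
The three composition models can be compared side by side with a small Python sketch. The function names are illustrative assumptions; the default weights and alpha are the defaults quoted above.

```python
def linear_score(i, v, g, w=(0.3, 0.4, 0.3)):
    """T_lin(d) = w_I * I + w_V * V + w_G * G."""
    return w[0] * i + w[1] * v + w[2] * g

def multiplicative_score(i, v, g, w=(0.3, 0.4, 0.3)):
    """T_mul(d) = 1 - (1 - I)^w_I * (1 - V)^w_V * (1 - G)^w_G."""
    return 1.0 - (1 - i) ** w[0] * (1 - v) ** w[1] * (1 - g) ** w[2]

def hybrid_score(i, v, g, w=(0.3, 0.4, 0.3), alpha=0.7):
    """T(d) = max(T_lin(d), alpha * max(I, V, G))."""
    return max(linear_score(i, v, g, w), alpha * max(i, v, g))

# An irreversible action (V = 1.0) with no other risk signal.
print(linear_score(0.0, 1.0, 0.0))          # 0.4: the extreme value is averaged down
print(hybrid_score(0.0, 1.0, 0.0))          # 0.7: the override floor applies
print(multiplicative_score(0.0, 1.0, 0.0))  # 1.0: any saturated dimension saturates T
```

With the default alpha of 0.7, the override lifts this decision from 0.4 to 0.7, which is the masking case the hybrid model is designed to prevent.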

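3. Threshold Derivation from the Loss Function

Given the continuous score T(d), the interval [0, 1] must be partitioned into discrete tiers. MARIA OS uses five tiers: R0 (fully automated), R1 (monitored automation), R2 (human review), R3 (senior approval), and R4 (human only). A threshold vector theta = (theta_1, theta_2, theta_3, theta_4) defines the boundaries.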
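
Mapping a score to a tier is a sorted-boundary lookup. The sketch below uses Python's `bisect` module; the function name `assign_tier` is hypothetical, and sending a score exactly equal to a threshold to the higher tier is an assumption, since the text does not specify boundary handling. The example thresholds are the financial-services vector reported in Section 4.

```python
from bisect import bisect_right

TIERS = ("R0", "R1", "R2", "R3", "R4")

def assign_tier(score: float, theta: tuple) -> str:
    """Return the tier for a score given four ascending thresholds."""
    if len(theta) != len(TIERS) - 1 or list(theta) != sorted(theta):
        raise ValueError("theta must hold four ascending thresholds")
    # bisect_right counts the thresholds that are <= score.
    return TIERS[bisect_right(theta, score)]

theta = (0.12, 0.31, 0.55, 0.78)
print(assign_tier(0.05, theta))  # R0
print(assign_tier(0.60, theta))  # R3
```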

Optimal Threshold Derivation:

Define the loss function for misclassification:
  L(theta) = sum_d [ c_over * 1{tier(d,theta) > tier_true(d)}
                    + c_under * 1{tier(d,theta) < tier_true(d)} ]

where:
  c_over  = cost of false escalation (unnecessary human review)
  c_under = cost of missed critical (inadequate governance)

Typically c_under >> c_over (missing a critical decision is
far worse than unnecessary review). Setting c_under/c_over = k:

  L(theta) = sum_d [ 1{over} + k * 1{under} ]
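
The normalized loss can be evaluated directly on labeled data. In this sketch, tiers are encoded as integers 0 to 4 (R0 to R4); the function name and the toy labels are illustrative assumptions.

```python
def misclassification_loss(predicted, true, k=10.0):
    """Sum of 1 per false escalation and k per missed critical decision."""
    loss = 0.0
    for p, t in zip(predicted, true):
        if p > t:      # escalated higher than necessary
            loss += 1.0
        elif p < t:    # governed at too low a tier
            loss += k
    return loss

# One correct, one under-classified, one over-classified decision.
print(misclassification_loss([2, 1, 3], [2, 2, 2], k=10.0))  # 11.0
```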

For a known score distribution F(t) and true tier boundaries
tau_1 < tau_2 < tau_3 < tau_4:

  Optimal theta_i minimizes the weighted misclassification
  at each boundary. For the normal approximation:

  theta_i* = tau_i + sigma_i * Phi^{-1}(1/(1+k)) / sqrt(n_i)

  where sigma_i is the score standard deviation near tau_i
  and n_i is the sample count near the boundary. For k > 1,
  Phi^{-1}(1/(1+k)) is negative, so theta_i* lies below tau_i.

For k = 10 (missed critical is 10x worse than false escalation):
  Phi^{-1}(1/11) = Phi^{-1}(0.091) = -1.34
  theta_i* = tau_i - 1.34 * sigma_i / sqrt(n_i)

  The threshold shifts LEFT (toward lower scores),
  biasing classification toward higher tiers.
  This is the loss-minimizing conservative bias.

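The key insight is that an asymmetric loss function naturally produces conservative thresholds. When missing a critical decision costs 10 times as much as an unnecessary escalation, the optimal thresholds shift toward lower scores, increasing the share of decisions sent to human review. This is not an ad hoc safety margin; it is the loss-minimizing response to asymmetric error costs.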
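
The size of the shift can be computed with the standard library's `statistics.NormalDist`. This is a sketch under the normal approximation, with the sign chosen so that the threshold moves toward lower scores when k > 1, as the text describes. The function name and the values of tau, sigma, and n are illustrative assumptions, not calibration results.

```python
import math
from statistics import NormalDist

def conservative_threshold(tau: float, sigma: float, n: int, k: float) -> float:
    """Shift the true boundary tau by sigma * Phi^{-1}(1/(1+k)) / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 / (1.0 + k))  # negative whenever k > 1
    return tau + sigma * z / math.sqrt(n)

# Illustrative inputs only: tau = 0.50, sigma = 0.10, n = 25 samples near the boundary.
for k in (1, 10, 20):
    print(k, round(conservative_threshold(0.50, 0.10, 25, k), 4))
```

With symmetric costs (k = 1) the threshold stays at tau, and larger k moves it further toward lower scores.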

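4. Domain-Specific Calibration

The scoring function T(d) and the threshold vector theta must be calibrated for each operating domain. We present calibration results for three domains: financial services, healthcare, and software engineering.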

Financial Services Calibration:

  Weight calibration (from 200 expert-labeled decisions):
    w_I = 0.25, w_V = 0.45, w_G = 0.30, alpha = 0.75

  Irreversibility dominates because financial transactions
  are difficult to reverse and regulatory penalties are severe.

  Threshold vector (k = 15, higher asymmetry):
    theta = (0.12, 0.31, 0.55, 0.78)

  Tier distribution:
    R0: 8%   (internal analytics, read-only queries)
    R1: 22%  (small transactions < $1K, reporting)
    R2: 41%  (standard transactions, customer comms)
    R3: 23%  (large transactions > $50K, compliance)
    R4: 6%   (regulatory filings, audit responses)

  Classification accuracy vs expert panel: 97.2%
  False escalation rate: 3.8%
  Missed critical rate: 0.2%

Healthcare Calibration:

  Weight calibration (from 180 expert-labeled decisions):
    w_I = 0.35, w_V = 0.35, w_G = 0.30, alpha = 0.80

  Impact scope receives higher weight because patient safety
  depends on the number of affected individuals.
  Alpha is higher (0.80) for stronger single-dimension override.

  Threshold vector (k = 20, highest asymmetry):
    theta = (0.08, 0.25, 0.48, 0.72)

  Tier distribution:
    R0: 5%   (scheduling, non-clinical admin)
    R1: 18%  (routine documentation, standard protocols)
    R2: 38%  (treatment planning, medication adjustments)
    R3: 28%  (surgical decisions, experimental protocols)
    R4: 11%  (life-critical, novel procedures, research)

  Classification accuracy vs expert panel: 95.8%
  False escalation rate: 5.2%
  Missed critical rate: 0.1%

Software Engineering Calibration:

  Weight calibration (from 300 expert-labeled decisions):
    w_I = 0.30, w_V = 0.40, w_G = 0.30, alpha = 0.65

  Lower alpha because software engineering has more
  reversibility options (rollbacks, feature flags).

  Threshold vector (k = 8, lower asymmetry):
    theta = (0.18, 0.38, 0.62, 0.82)

  Tier distribution:
    R0: 15%  (linting, formatting, dependency updates)
    R1: 30%  (feature branches, non-critical bug fixes)
    R2: 32%  (production deploys, API changes)
    R3: 18%  (infrastructure changes, security patches)
    R4: 5%   (data migrations, auth system changes)

  Classification accuracy vs expert panel: 96.1%
  False escalation rate: 3.4%
  Missed critical rate: 0.5%
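
To show how the calibrations interact with the hybrid score, the sketch below stores the three parameter sets reported above and classifies one hypothetical decision under each. The `Calibration` class, the `classify` function, the boundary convention (a score equal to a threshold goes to the higher tier), and the example decision are assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass

TIERS = ("R0", "R1", "R2", "R3", "R4")

@dataclass(frozen=True)
class Calibration:
    weights: tuple  # (w_I, w_V, w_G)
    alpha: float    # single-dimension override
    theta: tuple    # four ascending tier thresholds

# Parameter values as reported in the calibration results above.
CALIBRATIONS = {
    "financial": Calibration((0.25, 0.45, 0.30), 0.75, (0.12, 0.31, 0.55, 0.78)),
    "healthcare": Calibration((0.35, 0.35, 0.30), 0.80, (0.08, 0.25, 0.48, 0.72)),
    "software": Calibration((0.30, 0.40, 0.30), 0.65, (0.18, 0.38, 0.62, 0.82)),
}

def classify(domain: str, i: float, v: float, g: float) -> str:
    c = CALIBRATIONS[domain]
    w_i, w_v, w_g = c.weights
    t_lin = w_i * i + w_v * v + w_g * g
    score = max(t_lin, c.alpha * max(i, v, g))
    return TIERS[bisect_right(c.theta, score)]

# Hypothetical decision with moderate risk on every dimension.
results = {name: classify(name, 0.6, 0.6, 0.6) for name in CALIBRATIONS}
print(results)
```

The same decision scores about 0.6 in every domain, yet lands in R3 under the financial and healthcare thresholds and in R2 under the software thresholds, which reflects the lower asymmetry factor k used for software engineering.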

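5. Sensitivity Analysis and Robustness

A key question is how sensitive the tier classification is to variation in the weight and threshold parameters. We perform a sensitivity analysis by perturbing each parameter by plus or minus 10% and measuring the change in classification outcomes.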

Sensitivity Analysis (Financial Services, n=200 decisions):

  Parameter | +/-10% perturbation | Classification change
  ----------+---------------------+----------------------
  w_I       | +/- 0.025           | 2.1% of decisions
  w_V       | +/- 0.045           | 3.7% of decisions
  w_G       | +/- 0.030           | 2.8% of decisions
  alpha     | +/- 0.075           | 4.2% of decisions
  theta_1   | +/- 0.012           | 1.5% of decisions
  theta_2   | +/- 0.031           | 2.3% of decisions
  theta_3   | +/- 0.055           | 3.1% of decisions
  theta_4   | +/- 0.078           | 1.8% of decisions

  Maximum sensitivity: alpha (4.2%)
  Minimum sensitivity: theta_1 (1.5%)

  Robustness: 95.8% of decisions receive the same tier
  under all perturbation combinations.
  Only boundary decisions (T within 0.05 of a threshold)
  are sensitive to parameter choice.
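
The perturbation procedure can be sketched as follows for the alpha parameter. The four decisions are synthetic and chosen only to exercise the code; they are not the 200 expert-labeled decisions, and the function names are hypothetical. Weights and thresholds are the financial-services values.

```python
from bisect import bisect_right

def hybrid_score(i, v, g, weights, alpha):
    w_i, w_v, w_g = weights
    return max(w_i * i + w_v * v + w_g * g, alpha * max(i, v, g))

def tier_index(score, theta):
    return bisect_right(theta, score)

def changed_fraction(decisions, weights, alpha, theta, alpha_scale):
    """Share of decisions whose tier changes when alpha is scaled."""
    changed = 0
    for i, v, g in decisions:
        base = tier_index(hybrid_score(i, v, g, weights, alpha), theta)
        perturbed = tier_index(hybrid_score(i, v, g, weights, alpha * alpha_scale), theta)
        changed += base != perturbed
    return changed / len(decisions)

# Synthetic (I, V, G) triples, for illustration only.
decisions = [(0.1, 0.75, 0.1), (0.6, 0.6, 0.6), (0.0, 0.2, 0.0), (0.3, 1.0, 0.2)]
fraction = changed_fraction(
    decisions, (0.25, 0.45, 0.30), 0.75, (0.12, 0.31, 0.55, 0.78), alpha_scale=0.9
)
print(fraction)  # 0.25: only the first decision, which sits near theta_3, changes tier
```

Only the decision whose override score lies close to a threshold changes tier, consistent with the observation that sensitivity is concentrated in boundary decisions.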

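The classification is robust: 95.8% of decisions are unaffected by 10% parameter perturbations. The 4.2% sensitivity to the alpha parameter reflects decisions whose tier changes because of the single-dimension override. These boundary decisions are exactly the ones that should receive additional scrutiny, and in a conservative deployment every boundary-zone decision (T within 0.05 of any threshold) is routed to the upper tier.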
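
One way to implement the boundary-zone rule is to promote any score that lies within the margin of a threshold to the tier above that threshold. This is a sketch of one possible reading of the rule; the function name and the `margin` parameter name are assumptions, and the example thresholds are the financial-services vector.

```python
from bisect import bisect_right

TIERS = ("R0", "R1", "R2", "R3", "R4")

def assign_tier_conservative(score, theta, margin=0.05):
    """Assign a tier, routing boundary-zone scores to the upper tier."""
    index = bisect_right(theta, score)
    for boundary, threshold in enumerate(theta):
        if abs(score - threshold) <= margin:
            # The tier above threshold number `boundary` has index boundary + 1.
            index = max(index, boundary + 1)
    return TIERS[index]

theta = (0.12, 0.31, 0.55, 0.78)
print(assign_tier_conservative(0.52, theta))  # R3: within 0.05 below theta_3
print(assign_tier_conservative(0.40, theta))  # R2: not in any boundary zone
```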

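Conclusion

Risk tier design is a mathematical optimization problem with a well-defined objective, measurable inputs, and thresholds whose optimality follows from the loss function. The scoring function T(d) decomposes risk into three interpretable dimensions, the hybrid model with single-dimension override prevents dangerous compensation effects, and the asymmetric loss function naturally produces conservative thresholds biased toward safety. Cross-domain calibration shows the framework achieving 95.8% to 97.2% agreement with expert panels while keeping the missed-critical rate at or below 0.5%. The framework replaces heuristic risk rules with a principled, auditable, and portable methodology that regulators can inspect and verify.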

Mathematical Criteria for Risk Tier Design: An Integrated Score of Impact, Irreversibility, and Regulatory Pressure
