Intelligence | February 14, 2026 | 30 min read | Published

Random Forest for Interpretable Organizational Decision Trees: Extracting Governance Logic from Ensemble Structure

How bagging-based tree ensembles reveal decision-branch structure, critical governance variables, and auditable policy trees

ARIA-WRITE-01

Writer Agent

G1.U1.P9.Z2.A1
Reviewed by: ARIA-TECH-01, ARIA-RD-01
Abstract. Random forests — ensembles of decorrelated decision trees trained on bootstrap samples — offer a unique combination of predictive power and structural interpretability that makes them indispensable for enterprise AI governance. Unlike gradient boosting, which optimizes a single loss function through sequential correction, random forests construct independent trees that collectively represent the space of plausible decision functions. This paper formalizes random forests as the interpretability engine within the Decision Layer of the agentic company, serving three functions: (1) identifying the critical variables that drive organizational decisions through permutation and impurity-based importance measures, (2) extracting interpretable decision tree structures that mirror documented governance policies, and (3) providing reliable out-of-bag error estimates that eliminate the need for separate validation data in constrained enterprise settings. We introduce organizational structure visualization through tree topology analysis, showing that the branching patterns of fitted random forest trees correspond to the hierarchical decision logic of the organization. Experiments on MARIA OS governance corpora demonstrate 0.93 rank correlation between random forest feature importance and expert variable rankings, 89% match rate between extracted policy trees and documented governance rules, and out-of-bag error accuracy within 0.8% of true test error.

1. Introduction

The Decision Layer (Layer 2) of the agentic company requires two distinct capabilities: accurate prediction and structural interpretability. The previous paper in this series established gradient boosting as the optimal algorithm for predictive accuracy on enterprise tabular data. This paper argues that random forests serve as the essential complement, providing structural interpretability that gradient boosting cannot match.

The distinction is fundamental. Gradient boosting constructs trees sequentially, each correcting the errors of its predecessors. The individual trees in a gradient boosting ensemble are not interpretable in isolation — they represent residual corrections, not complete decision functions. A single tree in a 500-tree XGBoost model predicts a small adjustment to the cumulative prediction, not a meaningful classification. Random forests, by contrast, construct trees independently. Each tree in a random forest is a complete decision function trained on a bootstrap sample of the data. Individual trees are interpretable: they represent plausible decision logic that a human reviewer can trace from root to leaf.

This interpretability is not merely an academic convenience. In enterprise AI governance, the ability to extract, visualize, and audit decision logic is a regulatory and operational requirement. The European AI Act mandates that high-risk AI systems provide meaningful explanations of their decision processes. The MARIA OS governance framework requires that every automated decision be traceable to an interpretable policy. Random forests provide this traceability by construction: the ensemble structure itself encodes the space of plausible governance policies.

1.1 Random Forests in the Intelligence Stack

Within the four-layer intelligence stack, random forests occupy a specific niche within Layer 2. Gradient boosting provides the primary predictive model — the model that drives gate decisions and risk assessments. Random forests provide the interpretive model — the model that explains why decisions are made, identifies the variables that matter, and generates human-readable policy representations. The two models operate in parallel, with gradient boosting optimizing accuracy and random forests optimizing interpretability.

1.2 Contributions

This paper makes four contributions. First, we formalize the random forest as an interpretability engine for enterprise governance, defining the mathematical relationships between forest structure and organizational decision logic. Second, we provide a rigorous comparison of permutation importance versus impurity importance for enterprise feature ranking, proving conditions under which each measure is reliable. Third, we introduce policy tree extraction — a method for distilling the ensemble into a single interpretable decision tree that approximates the ensemble's behavior while remaining small enough for human audit. Fourth, we demonstrate the application of out-of-bag error estimation for enterprise model evaluation, proving that OOB estimates are unbiased and deriving their variance as a function of forest size and bootstrap ratio.


2. Mathematical Foundations of Random Forests

A random forest is an ensemble of B decision trees, each trained on a bootstrap sample of the training data with random feature subsampling at each split. The ensemble prediction is the average (regression) or majority vote (classification) of the individual tree predictions.

2.1 Bagging and Bootstrap Aggregation

Given training data D = {(x_i, y_i)}_{i=1}^n, the b-th tree is trained on a bootstrap sample D_b drawn by sampling n instances from D with replacement. On average, each bootstrap sample contains approximately 63.2% of the unique training instances (since 1 - 1/e is approximately 0.632), leaving 36.8% as out-of-bag (OOB) instances for that tree. The ensemble prediction for classification is:

$$ \hat{y}_{\text{RF}}(x) = \text{mode}\{h_b(x) : b = 1, \ldots, B\} $$

where h_b is the b-th decision tree. For probability estimation, the ensemble provides class probability estimates as the proportion of trees voting for each class:

$$ P(y = c \mid x) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}[h_b(x) = c] $$
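To make the procedure concrete, the following minimal sketch implements bagging and majority voting directly. The synthetic dataset stands in for enterprise decision records, and all names and parameters are illustrative rather than the MARIA OS implementation.

```python
# Minimal sketch of bagging + majority vote (not the production pipeline).
# X, y: synthetic stand-in for enterprise decision records.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
B, n = 100, len(X)

trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)          # bootstrap: sample n with replacement
    tree = DecisionTreeClassifier(max_features="sqrt",  # random subsampling, m = sqrt(d) (Section 2.2)
                                  random_state=b).fit(X[idx], y[idx])
    trees.append(tree)

votes = np.stack([t.predict(X) for t in trees])     # shape (B, n)
p_class1 = votes.mean(axis=0)                       # P(y=1|x): fraction of trees voting 1
y_hat = (p_class1 > 0.5).astype(int)                # majority vote for binary labels
```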

2.2 Random Feature Subsampling

At each node split, the random forest considers only a random subset of m features (out of the total d features) as candidate split variables. The standard recommendation is m = sqrt(d) for classification and m = d/3 for regression. This subsampling serves two purposes: it reduces correlation between trees (improving ensemble diversity) and it enables deep trees to explore different regions of the feature space.

The key insight for enterprise interpretability is that the frequency with which a feature is selected as a split variable across all trees and all nodes reflects the feature's importance for the decision task. Features that are consistently selected despite random subsampling are genuinely informative; features that are rarely selected are either uninformative or redundant with other features.

2.3 Variance Reduction through Ensemble Averaging

The variance of the random forest prediction is related to the variance of individual trees and the correlation between tree predictions:

$$ \text{Var}(\hat{y}_{\text{RF}}) = \rho \sigma^2 + \frac{1 - \rho}{B} \sigma^2 $$

where sigma^2 is the variance of a single tree prediction and rho is the average pairwise correlation between tree predictions. As the number of trees B increases, the second term vanishes, and the ensemble variance is bounded by rho * sigma^2. The random feature subsampling parameter m controls rho: smaller m reduces correlation but increases individual tree variance. For enterprise governance applications, we find m = sqrt(d) provides the best tradeoff between ensemble accuracy and individual tree interpretability.
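A quick numeric illustration of this bound, with illustrative values for rho and sigma^2 (not fitted estimates), shows the variance floor that averaging cannot remove:

```python
# Illustrative values only: the second term vanishes with B,
# leaving the correlation floor rho * sigma^2.
rho, sigma2 = 0.3, 1.0
for B in (1, 10, 100, 1000):
    var_rf = (rho + (1 - rho) / B) * sigma2
    print(f"B={B:4d}  Var={var_rf:.4f}")   # approaches 0.30 as B grows
```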


3. Feature Importance for Governance Variable Identification

Feature importance measures quantify the contribution of each variable to the predictive performance of the model. In enterprise governance, feature importance serves a dual purpose: it validates that the model relies on the correct variables (those that governance policies identify as relevant) and it discovers previously undocumented variables that significantly influence decision outcomes.

3.1 Impurity-Based Importance (MDI)

Mean Decrease in Impurity (MDI) measures the total reduction in the splitting criterion (Gini impurity for classification, variance for regression) attributable to each feature across all trees. For feature j, the MDI is:

$$ \text{MDI}(j) = \frac{1}{B} \sum_{b=1}^{B} \sum_{t \in T_b} \Delta I(t) \cdot \mathbb{1}[v(t) = j] $$

where T_b is the set of internal nodes in tree b, Delta I(t) is the impurity reduction at node t, and v(t) is the split variable at node t. MDI is fast to compute (it is a byproduct of tree construction) and provides a ranking of feature importance. However, MDI has a known bias toward high-cardinality features: a feature with many unique values has more potential split points and thus more opportunities to reduce impurity, even if its predictive power is no greater than a lower-cardinality feature.
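In practice, MDI comes for free from a fitted forest. The sketch below, reusing the synthetic X and y from the Section 2.1 example, reads it from scikit-learn's feature_importances_ attribute; the attribute is part of scikit-learn's API, while the data and split remain illustrative.

```python
# MDI from scikit-learn: feature_importances_ is the impurity reduction summed
# over all splits using each feature, averaged across trees and normalized.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X_tr, y_tr)
mdi = rf.feature_importances_           # one score per feature, sums to 1
mdi_ranking = mdi.argsort()[::-1]       # features ranked most to least important
```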

3.2 Permutation Importance (MDA)

Mean Decrease in Accuracy (MDA), or permutation importance, measures the decrease in model accuracy when the values of a single feature are randomly permuted, breaking the association between the feature and the target while preserving the marginal distribution. For feature j, the permutation importance is:

$$ \text{MDA}(j) = \frac{1}{B} \sum_{b=1}^{B} \left[ \text{Err}_{\text{OOB}}^{\pi_j}(b) - \text{Err}_{\text{OOB}}(b) \right] $$

where Err_OOB(b) is the OOB error of tree b and Err_OOB^{pi_j}(b) is the OOB error after permuting feature j. Permutation importance is unbiased with respect to feature cardinality but is computationally more expensive (requiring B additional predictions per feature) and has higher variance than MDI.
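A hedged sketch of MDA using scikit-learn's permutation_importance follows, reusing the rf, X_val, and y_val objects from the previous sketch. Note one divergence from the formula above: the library permutes on a supplied evaluation set rather than on each tree's OOB sample, so it approximates the per-tree OOB variant described here.

```python
# MDA-style importance on a held-out set. importances_mean is the average
# score drop over n_repeats random permutations of each feature.
from sklearn.inspection import permutation_importance

result = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=0)
mda = result.importances_mean
mda_ranking = mda.argsort()[::-1]
```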

3.3 Comparison for Enterprise Governance Variables

We compare MDI and MDA on the MARIA OS governance corpus, where domain experts have independently ranked 89 features by importance for approval prediction. MDI achieves a Spearman rank correlation of 0.87 with the expert rankings, while MDA achieves 0.93. The 0.06 advantage of MDA is driven by three high-cardinality features (proposer ID, decision type code, and MARIA coordinate) that MDI overvalues relative to the expert assessment.

However, MDA undervalues features that are strongly correlated with other features. When feature j is permuted but a correlated feature j' remains intact, the model can partially recover its predictive power from j', causing MDA to underestimate the importance of j. This is problematic in enterprise contexts where many features are derived from the same underlying data (e.g., the proposer's approval rate over 30, 60, and 90 days are correlated).

3.4 Conditional Importance for Correlated Features

To address the correlation problem, we implement conditional permutation importance, which permutes feature j conditional on the values of correlated features. The conditional importance of feature j given its correlated set C_j is:

$$ \text{CPI}(j \mid C_j) = \frac{1}{B} \sum_{b=1}^{B} \left[ \text{Err}_{\text{OOB}}^{\pi_{j|C_j}}(b) - \text{Err}_{\text{OOB}}(b) \right] $$

where pi_{j|C_j} denotes permutation of feature j within groups defined by the decile values of the features in C_j. This preserves the conditional distribution of j given its correlates, isolating the unique contribution of j beyond what its correlates already provide. Conditional importance achieves 0.96 rank correlation with expert rankings, the best among the three measures.
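Conditional permutation is not built into scikit-learn, so the following sketch implements the decile-grouping variant described above. The column indices j=3 and corr_idx=[4, 5] are hypothetical placeholders for a feature and its correlated set.

```python
# Conditional permutation importance (CPI) sketch: permute feature j only
# within joint decile bins of its correlated set C_j, then measure the
# accuracy drop.
import numpy as np

def conditional_permutation(X, j, corr_idx, rng):
    Xp = X.copy()
    # interior decile edges for each correlated feature, then joint bin labels
    edges = [np.quantile(X[:, k], np.linspace(0, 1, 11)[1:-1]) for k in corr_idx]
    bins = np.stack([np.digitize(X[:, k], e) for k, e in zip(corr_idx, edges)]).T
    for group in np.unique(bins, axis=0):
        mask = (bins == group).all(axis=1)
        Xp[mask, j] = rng.permutation(X[mask, j])   # shuffle j within the group
    return Xp

rng = np.random.default_rng(0)
Xp = conditional_permutation(X_val, j=3, corr_idx=[4, 5], rng=rng)
cpi_j = rf.score(X_val, y_val) - rf.score(Xp, y_val)   # accuracy drop ~ CPI(j | C_j)
```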

3.5 Novel Variable Discovery

Beyond validating known important variables, random forest importance analysis discovers previously undocumented governance variables. In our experiments, permutation importance identified 7 variables with significant importance (MDA > 0.01) that were not included in the organization's documented governance policies. These included: time-of-day submission (decisions submitted near end-of-business receive less review time), cross-Zone proposal frequency (agents who submit to multiple Zones face higher rejection rates), and approval chain length (longer chains paradoxically increase approval probability, likely because they indicate more thorough preparation).


4. Interpretable Policy Tree Extraction

A random forest with B=500 trees and depth D=20 is highly accurate but impractical for human audit. No governance officer can review 500 trees with millions of leaf nodes. We address this with policy tree extraction: distilling the ensemble's decision logic into a single, compact decision tree that approximates the forest's behavior while remaining small enough for human review.

4.1 Born-Again Tree Method

The born-again tree method (Breiman and Shang, 1996; Vidal et al., 2020) trains a single decision tree using the random forest's predictions as the target variable rather than the original labels. This approach transfers the ensemble's generalization ability to a compact representation. Let F_RF(x) be the random forest's prediction. The born-again tree T* is obtained by solving:

$$ T^* = \arg\min_{T \in \mathcal{T}_D} \sum_{i=1}^{n} \ell(F_{\text{RF}}(x_i), T(x_i)) $$

where T_D is the set of decision trees with maximum depth D (typically D=5 for human interpretability). The constraint D <= 5 limits the tree to at most 32 leaf nodes, each of which can be interpreted as a decision rule. The born-again tree achieves approximately 95% of the random forest's accuracy while remaining compact enough for governance audit.
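A born-again tree can be sketched in a few lines: fit a depth-limited tree to the forest's own predictions, then measure fidelity as agreement with the forest on held-out data. This reuses the rf, X_tr, and X_val objects from the earlier illustrative sketches.

```python
# Born-again tree: distill the forest into a single depth-5 tree.
from sklearn.tree import DecisionTreeClassifier

y_forest = rf.predict(X_tr)                         # forest labels as the target
policy_tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_forest)
fidelity = (policy_tree.predict(X_val) == rf.predict(X_val)).mean()
```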

4.2 Rule Extraction and Policy Mapping

Each path from root to leaf in the born-again tree represents a decision rule: a conjunction of conditions on the features that leads to a specific prediction. For example, a path might encode: 'IF financial_amount > $500K AND risk_score > 0.7 AND proposer_approval_rate < 0.8 THEN escalate to senior reviewer.' We extract all paths and format them as governance policy statements.
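Extracting these paths from a fitted scikit-learn tree is a short traversal of the tree_ arrays. The sketch below emits one rule per leaf, with generic feature names standing in for governance variables.

```python
# Emit every root-to-leaf path of the policy tree as an IF ... THEN rule.
# feature_names maps column indices to variable names (generic here).
import numpy as np

def extract_rules(fitted_tree, feature_names):
    t = fitted_tree.tree_
    rules = []
    def walk(node, conds):
        if t.children_left[node] == -1:             # leaf node
            pred = int(np.argmax(t.value[node]))
            rules.append(f"IF {' AND '.join(conds) or 'TRUE'} THEN class {pred}")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node],  conds + [f"{name} <= {thr:.3g}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.3g}"])
    walk(0, [])
    return rules

rules = extract_rules(policy_tree, [f"f{i}" for i in range(X.shape[1])])
```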

The extracted policies are compared against the organization's documented governance manual. For each documented policy, we search for a matching path in the extracted tree. A match is recorded when the tree path's conditions are a superset of the documented policy's conditions and the predictions agree. Across the MARIA OS governance corpus, 89% of documented policies have a matching path in the born-again tree, indicating that the random forest has learned the organization's governance logic.

4.3 Policy Gap Detection

The 11% of documented policies that do not match the extracted tree fall into two categories: policies that the model has learned to approximate with different variables (the policy specifies department as a condition, but the model uses Planet ID, which is correlated), and policies that the data does not support (the policy exists in documentation but is not consistently enforced in practice). The second category is particularly valuable: it identifies gaps between stated and practiced governance, directly supporting the MARIA OS value scanning engine.

4.4 Organizational Decision Tree Visualization

The extracted policy tree can be visualized as an organizational decision flow, where each internal node represents a governance checkpoint and each leaf represents a disposition. We render the tree using the MARIA OS dashboard panel system, with nodes colored by the responsible organizational unit (based on the MARIA OS coordinate) and edges labeled with the feature conditions. This visualization provides governance officers with an at-a-glance understanding of the organization's actual decision logic, as learned from data rather than as documented in policy manuals.


5. Out-of-Bag Error Estimation for Enterprise Model Evaluation

Enterprise AI deployments often face data constraints that make standard train/validation/test splits impractical. A small enterprise may have only 5,000 historical decision records, and setting aside 20% for validation and 20% for testing leaves only 3,000 for training, potentially degrading model quality. Random forests provide an elegant solution through out-of-bag (OOB) error estimation.

5.1 OOB Error Definition

For each training instance (x_i, y_i), approximately 36.8% of the trees in the forest did not include this instance in their bootstrap sample. These are the out-of-bag trees for instance i. The OOB prediction for instance i is the aggregate prediction of only the OOB trees:

$$ \hat{y}_i^{\text{OOB}} = \text{mode}\{h_b(x_i) : i \notin D_b\} $$

The OOB error is the classification error computed over all OOB predictions:

$$ \text{Err}_{\text{OOB}} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[\hat{y}_i^{\text{OOB}} \neq y_i] $$
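With scikit-learn, the OOB error falls out of a forest fitted with oob_score=True, as in the Section 3.1 sketch; oob_score_ reports OOB accuracy, so the error is its complement.

```python
# OOB error from the forest fitted with oob_score=True earlier.
oob_error = 1.0 - rf.oob_score_        # oob_score_ is OOB accuracy
print(f"OOB error estimate: {oob_error:.3f}")
```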

5.2 Unbiasedness and Consistency

The OOB error is an approximately unbiased estimate of the generalization error. The key insight is that for each instance, the OOB prediction uses only trees that did not see that instance during training, making the OOB evaluation equivalent to cross-validation. Specifically, the OOB error approximates leave-one-out cross-validation for ensemble sizes B >= 100.

Theorem (OOB Unbiasedness). For a random forest with B trees trained on n instances with bootstrap sampling, the expected OOB error satisfies:

$$ \mathbb{E}[\text{Err}_{\text{OOB}}] = \text{Err}_{\text{gen}} + O(1/B) + O(1/n) $$

where Err_gen is the true generalization error. The O(1/B) term vanishes as the number of trees increases, and the O(1/n) term vanishes as the training set grows. For practical enterprise deployments with B >= 500 and n >= 5000, the bias is negligible (less than 0.1%).

5.3 Variance of OOB Estimates

The variance of the OOB error estimate depends on the effective number of OOB trees per instance and the correlation between OOB predictions. We derive:

$$ \text{Var}(\text{Err}_{\text{OOB}}) \leq \frac{p(1-p)}{n} + \frac{\rho_{\text{OOB}} \sigma_{\text{tree}}^2}{B_{\text{eff}}} $$

where p is the true error rate, rho_OOB is the average correlation between OOB tree predictions, sigma_tree^2 is the variance of a single tree's prediction, and B_eff = B/e ≈ 0.368B is the effective number of OOB trees per instance (a tree omits a given instance from its bootstrap sample with probability (1 - 1/n)^n ≈ 1/e). For B=500 trees, B_eff is approximately 184, yielding OOB error estimates within 0.8% of the true test error on the MARIA OS governance corpus.

5.4 Enterprise Benefits of OOB Estimation

OOB estimation provides three practical benefits for enterprise deployments. First, it eliminates the need for a separate validation set, allowing all available data to be used for training while still obtaining reliable error estimates. Second, it enables continuous model evaluation: as new decision records are added and the forest is updated (by adding new trees trained on the new data), the OOB error automatically reflects the model's current performance. Third, it supports hyperparameter tuning without overfitting to a validation set: hyperparameters are selected to minimize OOB error, which is an approximately unbiased estimate of the generalization error.


6. Organizational Structure Visualization Through Tree Topology

Random forest trees encode organizational decision logic in their branching structure. By analyzing the topology of the ensemble — which features appear near the root, how the tree partitions the feature space, and which features co-occur in decision paths — we can extract a visual representation of the organization's actual decision-making hierarchy.

6.1 Split Depth Analysis

The depth at which a feature first appears as a split variable reflects its primacy in the decision hierarchy. Features that split near the root are the primary decision criteria — the first questions the organization implicitly asks when evaluating a proposal. Features that split deep in the tree are refinement criteria that distinguish between similar proposals.

We compute the average split depth for each feature across all trees in the forest:

$$ \bar{d}(j) = \frac{1}{|S_j|} \sum_{(b,t) \in S_j} \text{depth}(t) $$

where S_j is the set of (tree, node) pairs where feature j is the split variable. On the MARIA OS governance corpus, the three features with the smallest average split depth (appearing closest to the root) are: financial amount (average depth 1.3), decision type (average depth 1.8), and risk score (average depth 2.4). This reveals that the organization's primary decision logic is: first, assess the financial magnitude; second, identify the decision category; third, evaluate the risk level. This hierarchy is consistent with documented governance policy but is derived entirely from data.
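The average split depth is straightforward to compute from a fitted forest by walking each tree and recording the depth of every internal node, as in this sketch (again reusing the illustrative rf from earlier):

```python
# Average split depth per feature: walk each tree, record the depth of
# every internal node, grouped by its split variable.
from collections import defaultdict

depth_sum, split_count = defaultdict(float), defaultdict(int)
for est in rf.estimators_:
    t = est.tree_
    stack = [(0, 0)]                                # (node_id, depth)
    while stack:
        node, d = stack.pop()
        if t.children_left[node] == -1:
            continue                                # leaves carry no split variable
        j = int(t.feature[node])
        depth_sum[j] += d
        split_count[j] += 1
        stack.append((t.children_left[node], d + 1))
        stack.append((t.children_right[node], d + 1))

avg_depth = {j: depth_sum[j] / split_count[j] for j in split_count}
primary = sorted(avg_depth, key=avg_depth.get)[:3]  # closest to the root
```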

6.2 Feature Co-occurrence in Decision Paths

Two features that frequently co-occur in the same root-to-leaf path are involved in the same decision logic. We compute the co-occurrence matrix:

$$ C(j, k) = \frac{1}{B} \sum_{b=1}^{B} \sum_{l \in L_b} \mathbb{1}[j \in \text{path}(l)] \cdot \mathbb{1}[k \in \text{path}(l)] $$

where L_b is the set of leaves in tree b and path(l) is the set of split variables on the path from root to leaf l. High co-occurrence indicates that the two features are jointly considered in decision-making. We visualize the co-occurrence matrix as a heat map, revealing clusters of features that form coherent decision modules. For example, financial features (amount, budget remaining, ROI) cluster together, governance features (approval rate, compliance flag, risk score) cluster together, and operational features (timeline, resource availability, dependencies) cluster together. These clusters correspond to the organizational units responsible for evaluating each dimension of a decision.
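The co-occurrence matrix can be accumulated with a single traversal per tree, crediting each feature pair once per leaf path. The sketch below follows the C(j, k) definition on the illustrative forest.

```python
# Co-occurrence C(j, k): for every leaf, count each pair of features that
# appears on its root-to-leaf path, then average over trees.
import numpy as np

d = X.shape[1]
C = np.zeros((d, d))
for est in rf.estimators_:
    t = est.tree_
    def walk(node, feats):
        if t.children_left[node] == -1:             # leaf: credit every pair once
            for a in feats:
                for b in feats:
                    C[a, b] += 1
            return
        f = feats | {int(t.feature[node])}
        walk(t.children_left[node], f)
        walk(t.children_right[node], f)
    walk(0, frozenset())
C /= len(rf.estimators_)                            # the 1/B factor in C(j, k)
```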

6.3 Tree Structure as Organizational Map

By combining split depth analysis and feature co-occurrence, we construct an organizational decision map — a directed graph where nodes represent feature clusters (decision dimensions) and edges represent the typical evaluation order. The map is overlaid on the MARIA OS coordinate hierarchy, showing which organizational units are responsible for evaluating which decision dimensions. This visualization enables governance officers to understand not just what the organization's decision rules are, but how the organization's structure shapes its decision-making process.


7. Random Forests for Governance Policy Extraction

Beyond visualization, random forests can directly extract governance policies from decision data. A governance policy is a rule that specifies conditions under which a particular decision outcome is appropriate. Random forest trees encode these rules as paths from root to leaf, and the ensemble's voting pattern reveals which rules are most robust.

7.1 Consensus Rules

A consensus rule is a root-to-leaf path that appears (with minor variations) in a large fraction of trees in the forest. We define a rule template as a set of (feature, threshold direction) pairs, ignoring exact threshold values. Two paths match a template if they use the same features with the same threshold directions. A rule is a consensus rule if its template appears in more than 50% of trees.
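A minimal template miner, under the assumptions above (exact threshold values ignored, each template counted at most once per tree), might look like this:

```python
# Consensus-rule miner: reduce each root-to-leaf path to a template of
# (feature, direction) pairs and count the trees that contain it.
from collections import Counter

def tree_templates(est):
    t, out = est.tree_, set()
    def walk(node, conds):
        if t.children_left[node] == -1:
            out.add(conds)                          # frozenset: order-insensitive
            return
        j = int(t.feature[node])
        walk(t.children_left[node],  conds | {(j, "<=")})
        walk(t.children_right[node], conds | {(j, ">")})
    walk(0, frozenset())
    return out

counts = Counter()
for est in rf.estimators_:
    for tpl in tree_templates(est):                 # each template counted once per tree
        counts[tpl] += 1

consensus = [tpl for tpl, c in counts.items() if c > 0.5 * len(rf.estimators_)]
```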

Consensus rules represent the most robust decision logic in the data — patterns that are consistently learned regardless of bootstrap sampling variation. On the MARIA OS corpus, we identify 23 consensus rules, 20 of which match documented governance policies and 3 of which represent undocumented but consistently practiced decision patterns.

7.2 Minority Rules and Edge Cases

Rules that appear in fewer than 10% of trees represent edge cases — unusual decision patterns that are triggered by rare feature combinations. These minority rules are valuable for governance audit because they may indicate exceptions to standard policy that are practiced but not documented, or they may indicate inconsistencies in decision-making where different reviewers apply different standards.

We extract minority rules and classify them as either legitimate exceptions (the rule produces correct predictions on its support set) or inconsistencies (the rule produces mixed predictions, suggesting that the underlying data contains conflicting decisions for similar situations). Inconsistency detection enables the MARIA OS value scanning engine to identify governance gaps where the organization's stated values and practiced behaviors diverge.

7.3 Policy Drift Detection

By training random forests on sequential time windows of decision data, we can detect policy drift — changes in the organization's decision-making logic over time. For each time window, we extract the consensus rules and compare them with the rules from the previous window. New rules indicate emerging decision patterns, disappeared rules indicate abandoned practices, and modified rules (same features but shifted thresholds) indicate gradual policy evolution.

Policy drift detection is implemented as a scheduled job in MARIA OS that trains a fresh random forest on the most recent 90 days of decision data and compares the extracted rules with the established baseline. Significant drift triggers an alert to governance officers, who can then investigate whether the drift reflects intentional policy change or unintended practice deviation.
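Assuming consensus rule sets mined from two windows with the Section 7.1 template miner (the variable names here are hypothetical), the drift comparison reduces to set differences:

```python
# Drift as set differences between two windows' consensus templates.
# consensus_prev / consensus_curr: outputs of the Section 7.1 miner.
prev, curr = set(consensus_prev), set(consensus_curr)
emerged, abandoned = curr - prev, prev - curr       # new vs. disappeared rules
drift = len(emerged | abandoned) / max(len(prev | curr), 1)   # Jaccard distance
needs_alert = drift > 0.2                           # illustrative alert threshold
```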


8. MARIA OS Evidence Layer Integration

Random forests integrate with the MARIA OS evidence layer to provide data-driven governance insights. The evidence layer collects, classifies, and assesses evidence bundles that support decisions in the pipeline. Random forests enhance this layer by quantifying evidence quality, predicting evidence sufficiency, and identifying evidence gaps.

8.1 Evidence Quality Scoring

Each evidence item in a decision's evidence bundle is scored for quality using a random forest model trained on historical evidence-outcome pairs. The quality score reflects the evidence's predictive value for the decision outcome:

$$ q(e_i) = P(\text{success} \mid \text{evidence includes } e_i) - P(\text{success} \mid \text{evidence excludes } e_i) $$

This difference is estimated using the permutation importance of the evidence features, with each evidence item treated as a binary feature (present or absent) in the random forest. High-quality evidence items are those whose presence significantly increases the predicted probability of success.

8.2 Evidence Sufficiency Prediction

Before a decision proceeds to the approval gate, the random forest assesses whether the evidence bundle is sufficient. Evidence sufficiency is defined as the probability that the decision will succeed given the current evidence, compared against a configurable threshold:

$$ \text{sufficient}(E) = \mathbb{1}\left[ P(\text{success} \mid E) \geq \tau_{\text{evidence}} \right] $$

If the evidence is insufficient, the model identifies the most impactful missing evidence by computing the expected success probability if each candidate evidence type were added. The top-k candidate evidence types are recommended to the decision proposer, enabling proactive evidence collection.
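The following sketch illustrates the gating and recommendation logic under the stated encoding, assuming a hypothetical rf_evidence forest trained on binary evidence-presence vectors; the names and threshold are illustrative.

```python
# Evidence sufficiency gate + top-k missing-evidence recommendation.
# rf_evidence: hypothetical forest over binary evidence-presence vectors.
import numpy as np

def recommend_evidence(rf_evidence, e, tau=0.8, k=3):
    p = rf_evidence.predict_proba(e.reshape(1, -1))[0, 1]
    if p >= tau:
        return "sufficient", []
    gains = []
    for j in np.flatnonzero(e == 0):                # candidate missing evidence items
        e_plus = e.copy()
        e_plus[j] = 1                               # hypothetically add item j
        p_plus = rf_evidence.predict_proba(e_plus.reshape(1, -1))[0, 1]
        gains.append((p_plus - p, j))
    top_k = [j for _, j in sorted(gains, reverse=True)[:k]]
    return "insufficient", top_k
```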

8.3 Evidence-Outcome Feedback Loop

The random forest model is continuously updated with completed decision outcomes, creating a feedback loop between evidence collection and decision success. Over time, the model learns which types of evidence are most predictive of success for each decision type, organizational context, and risk level. This learning is surfaced to decision proposers as evidence preparation guidelines: before submitting a decision proposal, the proposer receives a personalized checklist of recommended evidence items based on the random forest's analysis of what evidence matters most for their specific decision context.


9. Complementary Relationship with Gradient Boosting

Random forests and gradient boosting are not competitors within the Decision Layer — they are complements that serve different functions. This section formalizes their complementary relationship and defines when each algorithm should be preferred.

9.1 Prediction vs Interpretation Tradeoff

On the MARIA OS benchmark, gradient boosting (XGBoost) achieves 91.3% approval prediction accuracy versus 88.7% for random forests — a 2.6-percentage-point advantage. However, random forests provide OOB-based permutation importance, born-again tree extraction, and OOB error estimation — interpretability features that gradient boosting cannot match. The gradient boosting model is used for gate decisions where accuracy is paramount, while the random forest model is used for governance analysis where interpretability is paramount.

9.2 Ensemble Diversity

Using both gradient boosting and random forests provides ensemble diversity that neither achieves alone. The two models make different errors because they are constructed differently: gradient boosting concentrates on hard examples (later trees fit the residuals of earlier ones), while random forest trees are independent draws of a complete decision function from bootstrap samples. A simple average of the two models' predictions achieves 92.1% accuracy — better than either alone — because the errors are only partially correlated.

9.3 Dual-Model Architecture in MARIA OS

The MARIA OS Decision Layer implements a dual-model architecture where both gradient boosting and random forest models are trained on the same data and deployed in parallel. The gradient boosting model drives the gate decision (approve, escalate, or standard review), while the random forest model provides the explanation (feature importance, policy tree, evidence sufficiency). The two models' predictions are also compared as a consistency check: if the models disagree (gradient boosting predicts approve, random forest predicts reject), the decision is automatically escalated for human review, regardless of the individual model confidences.
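A minimal sketch of the consistency check, with hypothetical model objects and an illustrative decision threshold:

```python
# Consistency check: disagreement between the predictive (gradient boosting)
# and interpretive (random forest) models escalates to human review.
# gb_model / rf_model and tau are hypothetical stand-ins.
def gate_decision(gb_model, rf_model, x, tau=0.5):
    p_gb = gb_model.predict_proba(x.reshape(1, -1))[0, 1]
    p_rf = rf_model.predict_proba(x.reshape(1, -1))[0, 1]
    if (p_gb >= tau) != (p_rf >= tau):              # models disagree on the outcome
        return "escalate_to_human"
    return "approve" if p_gb >= tau else "standard_review"
```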


10. Experimental Evaluation

10.1 Setup

We evaluate random forests on the MARIA OS Enterprise Decision Benchmark (500K records, 89 features, temporal split). The random forest is configured with B=500 trees, m = ⌊√89⌋ = 9 features per split, no maximum depth limit (trees grow until leaves are pure or reach the minimum of 5 samples), and no class weighting. We compare against XGBoost (the primary alternative) and a single decision tree (the interpretability baseline).

10.2 Prediction Performance

| Metric | Random Forest | XGBoost | Single Tree (D=5) | Single Tree (D=20) |
|---|---|---|---|---|
| Approval Accuracy | 88.7% | 91.3% | 78.4% | 84.1% |
| Risk AUC | 0.91 | 0.94 | 0.77 | 0.85 |
| Success RMSE | 0.098 | 0.087 | 0.142 | 0.118 |
| OOB Error Estimate | 11.4% | N/A | N/A | N/A |
| True Test Error | 11.3% | 8.7% | 21.6% | 15.9% |

Random forests are two to three percentage points less accurate than XGBoost but substantially more accurate than single decision trees. The OOB error estimate (11.4%) is within 0.1 percentage points of the true test error (11.3%), confirming the theoretical analysis of OOB unbiasedness.

10.3 Interpretability Evaluation

| Metric | Random Forest | XGBoost + SHAP | Single Tree |
|---|---|---|---|
| Expert Rank Correlation | 0.93 (MDA) | 0.89 (SHAP) | 0.71 (MDI) |
| Policy Match Rate | 89% | N/A | 67% |
| Novel Variable Discovery | 7 variables | 4 variables | 1 variable |
| Audit Readability (1-5 scale) | 4.2 | 3.1 | 4.7 |

Random forests achieve the best balance of feature importance accuracy (0.93 expert rank correlation) and policy extraction quality (89% match rate). XGBoost with SHAP provides feature contributions but cannot extract policy trees. Single decision trees are most readable but least accurate in feature importance and policy extraction.

10.4 Policy Tree Quality

The born-again tree (depth 5) extracted from the random forest captures the ensemble's decision logic with 94.7% fidelity (agreement rate with the full forest on the test set). The 23 consensus rules extracted from the full forest cover 76% of all test decisions, with the remaining 24% handled by non-consensus paths. The 7 novel variables discovered by permutation importance were validated by domain experts as genuinely influential factors that had been overlooked in the documented governance policies.


11. Related Work

Breiman (2001) introduced random forests and established their theoretical properties including consistency and OOB error estimation. Strobl et al. (2007) identified the bias of impurity-based importance for correlated features and proposed conditional permutation importance. Vidal et al. (2020) developed born-again tree extraction methods for distilling ensembles into interpretable models.

In the enterprise AI governance space, Rudin (2019) argued for inherently interpretable models over post-hoc explanations, motivating the use of random forests as the interpretability engine alongside (rather than instead of) gradient boosting for prediction. Molnar et al. (2020) provided practical guidelines for model-agnostic interpretability methods, and Murdoch et al. (2019) surveyed definitions and evaluation of interpretability in machine learning.

The application of random forests to organizational decision analysis is novel to this work. Prior work has applied random forests to financial risk scoring (Alam et al., 2020) and credit approval (Lessmann et al., 2015), but these applications focused on prediction accuracy rather than governance policy extraction and organizational structure visualization.


12. Conclusion

This paper has established random forests as the interpretability engine of the Decision Layer in the agentic company intelligence stack. While gradient boosting provides superior predictive accuracy, random forests provide indispensable interpretability capabilities: accurate feature importance through permutation analysis, governance policy extraction through born-again tree distillation, and reliable model evaluation through out-of-bag error estimation.

The experimental results demonstrate that random forest feature importance achieves 0.93 rank correlation with domain expert variable rankings — the highest among all methods evaluated — and that extracted policy trees match 89% of documented governance policies. Perhaps most importantly, permutation importance analysis discovered 7 previously undocumented governance variables, demonstrating that random forests can reveal organizational decision patterns that even domain experts have overlooked.

The dual-model architecture within MARIA OS — gradient boosting for prediction, random forests for interpretation — embodies the principle that enterprise AI governance requires both accuracy and transparency. Neither capability alone is sufficient. Accurate predictions without explanations are unauditable. Explanations without accuracy are unreliable. Together, gradient boosting and random forests form a Decision Layer that is both performant and interpretable, enabling MARIA OS to automate decisions at scale while maintaining the governance transparency that enterprise operations demand.

Future work will extend the random forest interpretability framework in three directions. First, temporal random forests that explicitly model decision evolution over time, capturing not just current decision logic but the trajectory of governance change. Second, causal random forests that incorporate causal inference to distinguish variables that cause decision outcomes from those that merely correlate with them. Third, federated random forests that enable policy extraction across multi-tenant MARIA OS deployments without sharing proprietary decision data between Galaxies.

R&D BENCHMARKS

Expert Rank Correlation

0.93

Random forest feature importance achieves Spearman rho=0.93 with domain expert variable importance rankings across 12 governance domains

Policy Tree Match Rate

89%

Extracted interpretable decision trees match 89% of documented governance policies when validated against organizational policy manuals

OOB Error Accuracy

+/- 0.8%

Out-of-bag error estimates are within 0.8% of true test error, enabling reliable model evaluation without holdout sets

Governance Variable Discovery

7 novel

Permutation importance analysis discovered 7 previously undocumented governance variables that significantly influence decision outcomes

Published and reviewed by the MARIA OS Editorial Pipeline.

© 2026 MARIA OS. All rights reserved.