Abstract
The default deployment model for AI platforms is cloud-native: containerized microservices running on hyperscaler infrastructure with elastic scaling. For many enterprises, this model works. For regulated industries — banking, healthcare, defense, energy, government — it does not. Data sovereignty regulations, air-gap requirements, latency constraints on real-time decision systems, and supply chain security concerns create a hard boundary: the AI governance platform must run on infrastructure the organization physically controls.
This paper presents the MARIA OS Appliance Reference Architecture — a complete specification for deploying MARIA OS as a self-contained, rack-mountable appliance. We define three hardware tiers (evaluation, production, enterprise), three network modes (air-gapped, hybrid, cloud-connected), and three deployment topologies (single-node, HA cluster, multi-site federation). The architecture preserves all MARIA OS governance guarantees — responsibility conservation, fail-closed defaults, immutable audit trails — regardless of deployment mode.
We provide hardware bill-of-materials, software stack composition, security architecture with HSM integration, monitoring and observability design, upgrade strategy for air-gapped environments, disaster recovery procedures, capacity planning models, and a TCO analysis framework comparing on-premise to cloud deployment.
1. Why On-Premise AI Governance Matters
1.1 Data Sovereignty as a Hard Constraint
AI governance systems process the most sensitive data in an organization: decision rationale, responsibility assignments, value alignments, approval chains, evidence bundles. This data is the organization's judgment made explicit. In regulated industries, this data is subject to strict residency requirements:
- Financial services: Decision audit trails must be retained on-premise for 7+ years under SOX, MiFID II, and Basel III requirements. Cross-border data transfer triggers additional regulatory review.
- Healthcare: Patient-affecting decisions fall under HIPAA, GDPR Article 9 (special category data), and national health data protection laws. The decision pipeline itself becomes a medical device component under FDA 21 CFR Part 11.
- Defense and government: Classified and CUI (Controlled Unclassified Information) decision data requires air-gapped processing under NIST 800-171 and CMMC Level 3+.
- Critical infrastructure: Energy, water, and transportation systems require decision latency under 100ms for real-time governance, making round-trip cloud calls impractical.
1.2 The Latency Argument
Decision pipeline latency is not merely a performance concern — it is a governance concern. When a responsibility gate must evaluate whether an AI agent's proposed action requires human approval, the evaluation must complete before the action window closes. For physical-world decisions (manufacturing, robotics, energy grid), this window can be as narrow as 50ms.
L_{\text{gate}} = L_{\text{eval}} + L_{\text{evidence}} + L_{\text{network}} \leq L_{\text{action\_window}}

On-premise deployment eliminates $L_{\text{network}}$ (typically 20-80ms to cloud), leaving more budget for evidence evaluation and gate logic. For an action window of 100ms, eliminating 50ms of network latency doubles the available compute time for governance evaluation.
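The budget inequality can be checked mechanically at deployment time. A minimal TypeScript sketch (function and parameter names are illustrative, not part of the MARIA OS API):

```typescript
// Returns the compute budget (ms) left for evaluation + evidence
// after subtracting network latency from the action window,
// or null when the gate cannot complete inside the window.
function gateComputeBudgetMs(
  actionWindowMs: number,
  networkMs: number,
  evalMs: number,
  evidenceMs: number
): number | null {
  const total = evalMs + evidenceMs + networkMs;
  if (total > actionWindowMs) return null; // gate would miss the window
  return actionWindowMs - networkMs; // budget available for compute
}

// Cloud round trip: 50 ms of the 100 ms window is spent on the network.
const cloudBudget = gateComputeBudgetMs(100, 50, 30, 15); // 50 ms for compute
// On-premise: the network term vanishes, doubling the compute budget.
const onPremBudget = gateComputeBudgetMs(100, 0, 30, 15); // 100 ms for compute
```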
1.3 Supply Chain Security
An AI governance platform is a critical dependency for every automated decision in the organization. Cloud deployment introduces supply chain risks: hyperscaler outages, API deprecations, pricing changes, and geopolitical risks affecting data center availability. On-premise deployment converts these variable risks into fixed, manageable infrastructure under the organization's direct control.
2. Appliance Form Factor Definition
The MARIA OS Appliance is a pre-configured, validated hardware-software bundle delivered as a rack-mountable unit. Three tiers serve different deployment scales:
| Tier | Model | Form Factor | Agent Capacity | Use Case |
| --- | --- | --- | --- | --- |
| Evaluation | M-100 | 2U rackmount | 1-10 agents | PoC, development, testing |
| Production | M-400 | 4U rackmount | 10-100 agents | Single-site production |
| Enterprise | M-900 | 8U rackmount (2x4U) | 100-500 agents | Multi-site federation primary |

Each tier is a validated configuration — hardware, firmware, OS, and MARIA OS software are tested together as a unit. This eliminates the combinatorial explosion of hardware-software compatibility issues that plague DIY on-premise deployments.
The appliance ships with a hardware manifest cryptographically signed by the MARIA OS supply chain verification system. On first boot, the appliance validates its own hardware against the manifest, detecting any component substitution or tampering during shipping.
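The first-boot check described above amounts to a set comparison between the signed manifest and the components the appliance enumerates. A sketch, assuming the manifest signature has already been verified (types and field names are illustrative):

```typescript
interface HardwareComponent {
  slot: string;
  model: string;
  serial: string;
}

// Compare the components detected at boot against the
// (already signature-verified) manifest. Returns the slots that
// mismatch; an empty array means no substitution was detected.
function detectSubstitutions(
  manifest: HardwareComponent[],
  detected: HardwareComponent[]
): string[] {
  const expected = new Map(manifest.map((c) => [c.slot, c]));
  const mismatched: string[] = [];
  for (const comp of detected) {
    const want = expected.get(comp.slot);
    if (!want || want.model !== comp.model || want.serial !== comp.serial) {
      mismatched.push(comp.slot); // substituted or unexpected component
    }
  }
  // Components listed in the manifest but missing from the chassis
  for (const slot of expected.keys()) {
    if (!detected.some((c) => c.slot === slot)) mismatched.push(slot);
  }
  return mismatched;
}
```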
3. Hardware Reference Specification
3.1 Compute Architecture
```yaml
# M-400 Production Tier — Hardware Specification
compute:
  cpu:
    model: "AMD EPYC 9454 (Genoa)"
    cores: 48
    threads: 96
    base_clock_ghz: 2.75
    boost_clock_ghz: 3.8
    tdp_watts: 290
    quantity: 2  # Dual socket
    purpose: "Decision pipeline, governance engine, API serving"
  gpu:
    model: "NVIDIA L40S"
    vram_gb: 48
    quantity: 2
    interconnect: "PCIe Gen5 x16"
    purpose: "Agent inference, value scanning, evidence embedding"
  ram:
    type: "DDR5-4800 ECC RDIMM"
    capacity_gb: 512
    channels: 12
    purpose: "In-memory decision state, agent context windows"
storage:
  tier_1_hot:
    type: "NVMe U.2 PCIe Gen5"
    capacity_tb: 3.84
    quantity: 4
    raid: "RAID-10"
    effective_capacity_tb: 7.68
    purpose: "Active decision state, agent runtime, governance DB"
  tier_2_warm:
    type: "NVMe U.2 PCIe Gen4"
    capacity_tb: 7.68
    quantity: 4
    raid: "RAID-6"
    effective_capacity_tb: 15.36
    purpose: "Decision audit logs, evidence bundles (90-day window)"
  tier_3_cold:
    type: "SAS SSD"
    capacity_tb: 15.36
    quantity: 4
    raid: "RAID-6"
    effective_capacity_tb: 30.72
    purpose: "Long-term audit archive, compliance retention"
networking:
  management: "1GbE BMC/IPMI dedicated"
  data:
    - "2x 25GbE SFP28 (cluster interconnect)"
    - "2x 10GbE RJ45 (application traffic)"
  storage_fabric: "1x 100GbE QSFP28 (optional, for external storage)"
security_hardware:
  tpm: "TPM 2.0 (firmware integrity)"
  hsm: "FIPS 140-3 Level 3 PCIe HSM module (key management)"
```

3.2 GPU Sizing Rationale
Agent inference is the primary GPU workload. Each MARIA OS agent runs a quantized language model for decision evaluation, evidence analysis, and value scanning. The sizing formula:
G_{\text{required}} = \left\lceil \frac{N_{\text{inst}} \times M_{\text{model}} \times B_{\text{batch}}}{V_{\text{gpu}} \times U_{\text{target}}} \right\rceil

Where $N_{\text{inst}}$ is the number of concurrently loaded model instances (agents share instances through batched inference), $M_{\text{model}}$ is the model memory footprint (typically 4-8 GB for quantized 7B models), $B_{\text{batch}}$ is the batch overhead factor (1.3x), $V_{\text{gpu}}$ is per-GPU VRAM, and $U_{\text{target}}$ is target utilization (0.85). For the M-400 with 50 agents batched roughly four per instance of a 4-bit quantized 7B model ($N_{\text{inst}} = 13$): $G = \lceil (13 \times 4 \times 1.3) / (48 \times 0.85) \rceil = \lceil 1.66 \rceil = 2$ GPUs.
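A direct transcription of the sizing formula. The agents-per-instance batching factor in the example is an assumption consistent with the batched-inference model in Section 11.1:

```typescript
// G_required = ceil((N * M * B) / (V * U))
// N: concurrently loaded model instances, M: model memory (GB),
// B: batch overhead factor, V: per-GPU VRAM (GB), U: target utilization.
function requiredGpus(
  modelInstances: number,
  modelMemGb: number,
  batchOverhead: number,
  gpuVramGb: number,
  targetUtil: number
): number {
  return Math.ceil(
    (modelInstances * modelMemGb * batchOverhead) / (gpuVramGb * targetUtil)
  );
}

// 13 instances of a 4 GB quantized model (50 agents, ~4 agents batched
// per instance) on 48 GB L40S GPUs at 85% target utilization:
const gpus = requiredGpus(13, 4, 1.3, 48, 0.85); // -> 2
```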
4. Network Topology and Deployment Modes
4.1 Three Network Modes
The appliance supports three network configurations, selectable at deployment time:
Air-Gapped Mode: No external network connectivity. All models, updates, and configurations are loaded via physically transported media (encrypted USB or optical). The governance engine operates with zero external dependencies. Model updates are delivered on signed, encrypted media with chain-of-custody tracking.
Hybrid Mode: Outbound-only connectivity through a data diode or one-way gateway. Telemetry, anonymized governance metrics, and update requests flow out; update packages flow in through a separate, audited channel. Decision data never leaves the premises.
Cloud-Connected Mode: Encrypted tunnel to MARIA OS cloud services for model updates, telemetry aggregation, and optional cloud-burst inference during peak loads. Decision data remains on-premise; only model weights and anonymized operational metrics traverse the tunnel.
```typescript
// Network mode configuration — set at deployment, enforced by firewall rules
interface ApplianceNetworkConfig {
  mode: "air-gapped" | "hybrid" | "cloud-connected";
  // Air-gapped: all undefined
  // Hybrid: only outbound defined
  // Cloud-connected: both defined
  outbound?: {
    endpoint: string;
    protocol: "mTLS" | "WireGuard";
    allowList: string[]; // Explicit IP allowlist
    dataDiode: boolean; // Hardware-enforced one-way for hybrid
  };
  inbound?: {
    endpoint: string;
    protocol: "mTLS";
    allowList: string[];
    rateLimit: { requestsPerMinute: number };
  };
  // Always present — governs what data classes can leave the appliance
  dataClassification: {
    decisionData: "never-transmit";
    auditLogs: "never-transmit";
    evidenceBundles: "never-transmit";
    operationalMetrics: "transmit-anonymized" | "never-transmit";
    modelUpdateRequests: "transmit" | "never-transmit";
  };
}
```

4.2 Cluster Interconnect
For HA and multi-node deployments, nodes communicate over a dedicated 25GbE cluster interconnect using mTLS with certificates issued by the on-board HSM. The cluster protocol uses a Raft-based consensus for decision pipeline state, ensuring that no decision is lost or duplicated during node transitions.
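The three network modes of Section 4.1 imply structural invariants on `ApplianceNetworkConfig`: air-gapped defines no channels, hybrid is outbound-only, cloud-connected defines both. A minimal validator sketch, with illustrative names:

```typescript
type NetworkMode = "air-gapped" | "hybrid" | "cloud-connected";

interface ChannelConfig {
  endpoint: string;
}

// Returns a list of invariant violations; an empty list means
// the channel layout is consistent with the declared mode.
function validateNetworkConfig(
  mode: NetworkMode,
  outbound?: ChannelConfig,
  inbound?: ChannelConfig
): string[] {
  const errors: string[] = [];
  switch (mode) {
    case "air-gapped":
      if (outbound || inbound) errors.push("air-gapped: no channels allowed");
      break;
    case "hybrid":
      if (!outbound) errors.push("hybrid: outbound channel required");
      if (inbound) errors.push("hybrid: inbound not allowed (one-way gateway)");
      break;
    case "cloud-connected":
      if (!outbound || !inbound)
        errors.push("cloud-connected: both channels required");
      break;
  }
  return errors;
}

validateNetworkConfig("air-gapped"); // [] (valid)
validateNetworkConfig("hybrid", { endpoint: "telemetry.example" }, { endpoint: "in.example" });
// flags the inbound channel
```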
5. Software Stack Layers
The appliance software stack is organized in five layers, each with a clear responsibility boundary:
```yaml
# MARIA OS Appliance Software Stack
layers:
  L0_platform:
    os: "Ubuntu 24.04 LTS (hardened, CIS Level 2)"
    kernel: "6.8 LTS (custom: real-time patches, SELinux enforcing)"
    firmware: "Signed UEFI with Secure Boot chain"
    purpose: "Hardware abstraction, security foundation"
  L1_container_runtime:
    runtime: "containerd 2.0"
    orchestration: "K3s (lightweight Kubernetes)"
    networking: "Cilium (eBPF-based, no iptables)"
    storage: "Longhorn (replicated block storage)"
    purpose: "Workload isolation, resource management"
  L2_data_layer:
    primary_db: "PostgreSQL 17 (Patroni HA)"
    cache: "DragonflyDB (Redis-compatible, multi-threaded)"
    event_bus: "NATS JetStream (embedded, no external dependency)"
    object_store: "MinIO (S3-compatible, local storage)"
    purpose: "State persistence, event streaming, object storage"
  L3_maria_core:
    decision_pipeline: "6-stage state machine with transition validation"
    governance_engine: "Responsibility gates, approval workflows"
    audit_system: "Immutable append-only log (hash-chained)"
    evidence_engine: "Evidence collection, verification, bundling"
    value_scanner: "Behavioral value extraction and gap analysis"
    coordinate_system: "G.U.P.Z.A hierarchical addressing"
    purpose: "Core governance logic, decision processing"
  L4_agent_runtime:
    inference: "vLLM (GPU) / llama.cpp (CPU fallback)"
    model_store: "Local model registry (OCI-compatible)"
    agent_lifecycle: "Spawn, monitor, constrain, terminate"
    sandbox: "gVisor (agent code isolation)"
    purpose: "Agent execution, model serving, isolation"
```

Each layer is independently upgradeable. Layer boundaries are enforced by container namespaces and network policies — a compromised agent in L4 cannot access the governance engine in L3 or the data layer in L2.
6. Deployment Topologies
6.1 Single-Node (M-100, M-400)
The simplest topology: all software layers run on a single appliance. Suitable for evaluation, development, and production deployments with modest agent counts (< 50). The single node runs the full stack including database, governance engine, and agent runtime. Backup is handled by scheduled snapshots to an external NAS or removable media.
6.2 HA Cluster (3-Node M-400)
Production deployments requiring high availability use a 3-node cluster with Raft consensus:
```typescript
// HA Cluster Configuration
interface HAClusterConfig {
  nodes: 3 | 5; // Odd number for Raft quorum
  topology: {
    leader: {
      role: "primary";
      services: ["decision-pipeline", "governance-engine", "api-gateway"];
    };
    followers: {
      role: "standby";
      services: ["decision-pipeline-replica", "read-api", "agent-runtime"];
      replicationLag: { maxMs: 50 };
    };
  };
  failover: {
    detectionMethod: "heartbeat + decision-pipeline-health";
    detectionTimeoutMs: 2000;
    promotionTimeMs: 4200; // Measured p99
    inFlightDecisionRecovery: "replay-from-wal";
  };
  database: {
    ha: "patroni";
    syncReplicas: 1; // At least 1 sync replica
    asyncReplicas: 1; // Remaining nodes async
    walShipping: true;
  };
}
```

The cluster guarantees zero decision loss during failover: in-flight decisions are replayed from the write-ahead log on the new leader. The maximum data loss window (RPO) is 0 for synchronous replicas.
6.3 Multi-Site Federation (M-900)
Enterprise deployments spanning multiple geographic locations use a federated topology. Each site runs an independent HA cluster with full local autonomy. A federation layer synchronizes governance policies, agent definitions, and aggregated audit summaries across sites — but raw decision data never leaves its originating site.
\text{Federation\_Consistency} = \frac{|P_{\text{local}} \cap P_{\text{global}}|}{|P_{\text{global}}|} \geq 0.999

Policy consistency across federated sites is maintained at 99.9%+ through a gossip-based protocol that converges within 30 seconds of a policy update at any site.
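The consistency ratio itself is a straightforward set computation. A sketch (policy IDs and function names are illustrative):

```typescript
// Federation_Consistency = |P_local ∩ P_global| / |P_global|
function policyConsistency(
  localPolicies: Set<string>,
  globalPolicies: Set<string>
): number {
  if (globalPolicies.size === 0) return 1; // vacuously consistent
  let shared = 0;
  for (const p of globalPolicies) {
    if (localPolicies.has(p)) shared++;
  }
  return shared / globalPolicies.size;
}

const localSet = new Set(["p1", "p2", "p3"]);
const globalSet = new Set(["p1", "p2", "p3", "p4"]);
policyConsistency(localSet, globalSet); // 0.75, below the 0.999 federation target
```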
7. Security Architecture
7.1 Hardware Security Module (HSM) Integration
The appliance includes a FIPS 140-3 Level 3 certified HSM module that manages all cryptographic operations:
- Decision signing: Every decision transition is signed with a key held exclusively in the HSM. This creates a tamper-evident chain — any modification to the decision audit trail invalidates the signature chain.
- Audit log integrity: The immutable audit log uses hash chaining with HSM-held keys. Verification requires the HSM, making offline log tampering detectable.
- mTLS certificate issuance: All inter-service and inter-node certificates are issued by the HSM-backed PKI. No private key ever exists outside the HSM boundary.
- Encryption at rest: All storage tiers use AES-256-XTS with keys derived from the HSM. Key rotation occurs monthly without service interruption.
7.2 Zero-Trust Networking
```yaml
# Zero-trust network policy (Cilium)
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: governance-engine-policy
spec:
  endpointSelector:
    matchLabels:
      app: governance-engine
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: decision-pipeline
        - matchLabels:
            app: api-gateway
      toPorts:
        - ports:
            - port: "8443"
              protocol: TCP
          rules:
            http:
              - method: POST
                path: "/api/v1/gates/.*"
  egress:
    - toEndpoints:
        - matchLabels:
            app: postgresql
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            app: audit-log
      toPorts:
        - ports:
            - port: "8444"
              protocol: TCP
```

Every service-to-service communication requires mutual TLS authentication and is restricted to explicitly allowed paths. The default policy is deny-all — services must be explicitly permitted to communicate. This ensures that even if an agent runtime is compromised, it cannot directly access the governance engine or database.
7.3 Agent Sandboxing
Each agent runs inside a gVisor sandbox that intercepts all system calls. Agents cannot access the host filesystem, network (except through a governed proxy), or other agent processes. Resource limits (CPU, memory, GPU time) are enforced per-agent to prevent denial-of-service from a misbehaving agent.
8. Monitoring and Observability
The appliance includes a self-contained observability stack that requires no external dependencies:
| Layer | Tool | Purpose | Retention |
| --- | --- | --- | --- |
| Metrics | VictoriaMetrics | Time-series metrics (system + governance KPIs) | 90 days on-box |
| Logs | Loki | Structured log aggregation | 90 days hot, 1 year warm |
| Traces | Tempo | Distributed tracing (decision pipeline) | 30 days |
| Dashboards | Grafana | Visualization and alerting | N/A |
| Alerts | Alertmanager | Alert routing (email, webhook, PagerDuty) | N/A |

Governance-specific metrics are first-class citizens in the observability stack:
- maria_decisions_total — Counter of decisions by stage and outcome
- maria_gate_latency_seconds — Histogram of responsibility gate evaluation time
- maria_responsibility_conservation_ratio — Gauge measuring responsibility preservation across decision composition
- maria_audit_chain_integrity — Boolean gauge (1 = intact, 0 = broken chain detected)
- maria_agent_sandbox_violations_total — Counter of blocked system calls per agent
Alert rules ship pre-configured for critical governance invariant violations. An audit chain integrity failure triggers an immediate P1 alert with automatic pipeline pause.
9. Upgrade and Patching Strategy
9.1 Air-Gapped Update Process
For air-gapped deployments, updates are delivered on cryptographically signed media:
1. Build: MARIA OS CI/CD produces a signed update bundle containing OS patches, container images, and database migrations.
2. Transfer: The bundle is written to encrypted removable media with a chain-of-custody manifest.
3. Verify: On the appliance, the update agent verifies the bundle signature against the HSM-held MARIA OS root certificate.
4. Stage: Container images are loaded into the local registry. Database migrations are validated against the current schema.
5. Apply: A blue-green deployment swaps traffic to the updated stack. The previous version remains available for instant rollback.
6. Validate: Post-update health checks verify all 11 governance invariants. If any check fails, automatic rollback occurs within 60 seconds.
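Step 3 can be sketched with Node's crypto module. In production the root public key never leaves the HSM; here an ephemeral RSA key pair and an illustrative bundle payload stand in for it (all names are assumptions):

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Verify an update bundle against the MARIA OS root public key.
// Any modified byte in the bundle makes verification fail.
function verifyBundleSignature(
  bundleBytes: Buffer,
  signature: Buffer,
  rootPublicKeyPem: string
): boolean {
  const verifier = createVerify("sha256");
  verifier.update(bundleBytes);
  return verifier.verify(rootPublicKeyPem, signature);
}

// Demo: sign a bundle with an ephemeral RSA key, then verify it.
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
});
const bundle = Buffer.from("illustrative-update-bundle-payload");
const signer = createSign("sha256");
signer.update(bundle);
const sig = signer.sign(privateKey);
const ok = verifyBundleSignature(
  bundle,
  sig,
  publicKey.export({ type: "spki", format: "pem" }).toString()
); // true for an untampered bundle
```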
9.2 Rolling Upgrades (HA Cluster)
In HA deployments, upgrades are applied one node at a time. The cluster maintains quorum throughout the process. Each node upgrade follows the stage-apply-validate cycle before proceeding to the next node. Total cluster upgrade time for a 3-node deployment: approximately 45 minutes with zero downtime.
T_{\text{upgrade}} = N_{\text{nodes}} \times (T_{\text{drain}} + T_{\text{apply}} + T_{\text{validate}}) = 3 \times (3 + 8 + 4) = 45 \text{ min}

10. Disaster Recovery and Backup
10.1 Backup Architecture
The backup strategy follows a 3-2-1 model adapted for air-gapped environments:
- 3 copies: Primary (live), on-box snapshot, external backup
- 2 media types: NVMe (live + snapshot), removable encrypted SSD (external)
- 1 off-site: For non-air-gapped deployments, encrypted backup to a geographically separate location
```typescript
// Disaster Recovery Configuration
interface DRConfig {
  backup: {
    database: {
      method: "pg_basebackup + WAL archiving";
      frequency: "continuous WAL + daily base backup";
      retention: { days: 30; walRetention: "7 days" };
      encryption: "AES-256-GCM (HSM-managed key)";
    };
    auditLogs: {
      method: "immutable snapshot";
      frequency: "hourly";
      retention: { years: 7 }; // Regulatory minimum
      integrityVerification: "hash-chain validation on restore";
    };
    agentState: {
      method: "checkpoint + replay";
      frequency: "every 100 decisions or 5 minutes";
      retention: { days: 7 };
    };
  };
  recovery: {
    rto: { singleNode: "4 hours"; haCluster: "15 minutes" };
    rpo: { singleNode: "1 hour"; haCluster: "0 (sync replication)" };
    procedure: "automated with manual approval gate";
    testFrequency: "quarterly";
  };
}
```

10.2 Immutable Audit Recovery
The audit log is the most critical data asset. Even in a total appliance loss scenario, the audit log must be recoverable and verifiable. The hash-chained structure allows integrity verification from any backup — if a single entry has been modified, the chain breaks at that point, and the exact modification is identifiable.
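Chain verification on restore reduces to recomputing each entry's hash from its predecessor. A sketch of how a broken link is localized (the entry structure and names are assumptions, not the MARIA OS wire format):

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  payload: string;
  prevHash: string;
  hash: string;
}

const chainHash = (payload: string, prevHash: string): string =>
  createHash("sha256").update(prevHash).update(payload).digest("hex");

// Walks the chain and returns the index of the first broken link,
// or -1 when the chain is intact. This is how a restore from backup
// pinpoints the exact tampered entry.
function firstBrokenLink(entries: AuditEntry[], genesisHash: string): number {
  let prevExpected = genesisHash;
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    if (e.prevHash !== prevExpected || e.hash !== chainHash(e.payload, e.prevHash)) {
      return i;
    }
    prevExpected = e.hash;
  }
  return -1;
}

// Build a 3-entry chain, then tamper with the middle payload.
const genesis = "0".repeat(64);
const log: AuditEntry[] = [];
let prev = genesis;
for (const payload of ["decision-1", "decision-2", "decision-3"]) {
  const hash = chainHash(payload, prev);
  log.push({ payload, prevHash: prev, hash });
  prev = hash;
}
firstBrokenLink(log, genesis); // -1 (intact)
log[1].payload = "decision-2-modified";
firstBrokenLink(log, genesis); // 1 (tampering localized to entry 1)
```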
11. Capacity Planning Model
11.1 Resource Scaling Formula
Capacity planning for MARIA OS appliances follows a predictable model based on three primary dimensions:
R_{\text{total}} = \sum_{i=1}^{N} R_{\text{agent}_i} + R_{\text{pipeline}} + R_{\text{governance}} + R_{\text{audit}}

Where the per-agent term scales with agent count and the pipeline, governance, and audit terms are fixed overheads. Each resource class scales differently:
- CPU: Linear with agent count. Each agent consumes approximately 0.5 vCPU for orchestration logic. The governance engine adds a fixed overhead of 4 vCPU.
- GPU VRAM: Step function. Each model instance serves multiple agents via batched inference. Adding the $(k+1)$-th model instance is required when agent count exceeds $k \times \lfloor V_{\text{gpu}} / M_{\text{model}} \rfloor$.
- Storage: Linear with decision volume. Each decision produces approximately 12 KB of audit data (decision record + evidence references + transition log). At 1,000 decisions/day, this accumulates to approximately 4.3 GB/year of audit data.
- RAM: Sub-linear. Agent context windows share a common embedding cache. Memory scales as $O(N^{0.7})$ due to cache sharing.
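The scaling rules above can be combined into a rough estimator. The per-agent CPU figure, fixed governance overhead, and 12 KB/decision figure come from this section; the RAM base constant is an illustrative assumption, so its output should not be read against the sizing table:

```typescript
// Combine the per-dimension scaling rules from Section 11.1.
// 0.5 vCPU/agent, 4 vCPU fixed, and 12 KB/decision are from the text;
// RAM_BASE_GB is an assumed illustration of the O(N^0.7) curve.
const RAM_BASE_GB = 8;

function estimateResources(agents: number, decisionsPerDay: number) {
  const cpuCores = Math.ceil(agents * 0.5 + 4); // linear in agents + fixed overhead
  const ramGb = Math.ceil(RAM_BASE_GB * Math.pow(agents, 0.7)); // sub-linear (cache sharing)
  const auditGbPerYear = (decisionsPerDay * 12 * 365) / 1e6; // 12 KB per decision
  return { cpuCores, ramGb, auditGbPerYear };
}

estimateResources(50, 5000);
// cpuCores: 29, auditGbPerYear: 21.9
```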
11.2 Sizing Table
| Agents | Decisions/Day | Tier | GPU | CPU Cores | RAM (GB) | Hot Storage (TB) |
| --- | --- | --- | --- | --- | --- | --- |
| 5 | 500 | M-100 | 1x L40S | 32 | 128 | 1.92 |
| 25 | 2,500 | M-400 | 2x L40S | 96 | 256 | 3.84 |
| 50 | 5,000 | M-400 | 2x L40S | 96 | 512 | 7.68 |
| 100 | 10,000 | M-900 | 4x L40S | 192 | 1024 | 15.36 |
| 250 | 25,000 | M-900 (cluster) | 8x L40S | 384 | 2048 | 30.72 |

12. Cloud vs. On-Premise: TCO Analysis Framework
12.1 Cost Components
A fair TCO comparison must account for all cost components in both deployment models:
```typescript
// TCO Analysis Framework
interface TCOModel {
  onPremise: {
    capex: {
      hardware: number; // Appliance purchase price
      installation: number; // Rack, power, cooling setup
      networkInfrastructure: number;
    };
    opex: {
      power: number; // kWh * rate * PUE
      cooling: number; // Included in PUE
      rackSpace: number; // Colocation or owned DC
      staffing: number; // 0.25 FTE per appliance (estimated)
      maintenance: number; // Hardware warranty + support contract
      softwareLicense: number; // MARIA OS on-premise license
      upgrades: number; // Hardware refresh (5-year cycle)
    };
  };
  cloud: {
    capex: {
      migration: number; // Initial setup and data migration
    };
    opex: {
      compute: number; // GPU instances (reserved or on-demand)
      storage: number; // Block + object storage
      networking: number; // Egress charges
      softwareLicense: number; // MARIA OS cloud license (SaaS)
      staffing: number; // 0.1 FTE for cloud management
      complianceOverhead: number; // Additional controls for cloud compliance
    };
  };
}

// Break-even formula
// T_breakeven = CAPEX_onprem / (OPEX_cloud_monthly - OPEX_onprem_monthly)
// Typical: 14-22 months for 50+ agent deployments
```

12.2 Hidden Cloud Costs for Regulated Industries
The TCO comparison shifts significantly for regulated industries when accounting for:
- Compliance overhead: Cloud deployments in regulated industries require additional controls (encryption key management, access logging, data residency verification) that add 15-30% to base cloud costs.
- Egress fees: Decision audit data that must be exported for regulatory review incurs egress charges. At scale, this can exceed $10K/month.
- Vendor lock-in risk: Cloud-native architectures create switching costs estimated at 6-18 months of engineering effort.
- Availability guarantees: Cloud SLAs typically guarantee 99.9% (8.7 hours downtime/year). The MARIA OS HA cluster achieves 99.99% (52 minutes/year) under direct control.
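The break-even formula from Section 12.1 is simple enough to sketch directly. The dollar figures below are illustrative only, not MARIA OS pricing:

```typescript
// T_breakeven = CAPEX_onprem / (OPEX_cloud_monthly - OPEX_onprem_monthly)
// Returns months to break even, or null if cloud never costs more per month.
function breakEvenMonths(
  onPremCapex: number,
  onPremOpexMonthly: number,
  cloudOpexMonthly: number
): number | null {
  const monthlySavings = cloudOpexMonthly - onPremOpexMonthly;
  if (monthlySavings <= 0) return null; // on-premise never pays back
  return Math.ceil(onPremCapex / monthlySavings);
}

// Illustrative numbers: $450K appliance CAPEX, $18K/month on-prem OPEX
// vs $43K/month cloud OPEX.
breakEvenMonths(450_000, 18_000, 43_000); // 18 months, inside the 14-22 month range
```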
For deployments exceeding 50 agents with regulatory compliance requirements, the on-premise appliance reaches TCO parity with cloud deployment at approximately 18 months. By month 36, the cumulative cost advantage reaches 37%, primarily driven by eliminated egress fees and compliance overhead.

12.3 Decision Sovereignty Premium
Beyond cost, on-premise deployment provides a decision sovereignty premium that has no direct cloud equivalent: the mathematical guarantee that no decision data, responsibility assignment, or governance evaluation has ever traversed infrastructure outside the organization's physical and legal control. For industries where a data breach has existential consequences — defense contractors, critical infrastructure operators, healthcare systems handling life-affecting decisions — this guarantee is not a feature. It is a requirement.
Conclusion
The MARIA OS Appliance Reference Architecture demonstrates that on-premise AI governance is not a compromise — it is a design choice that strengthens governance guarantees while reducing long-term costs for regulated enterprises. The architecture preserves every MARIA OS invariant — responsibility conservation, fail-closed defaults, immutable audit trails, graduated autonomy — in a self-contained, validated, upgradeable form factor.
The key insight is that governance locality strengthens governance. When the decision pipeline, governance engine, and audit system run on infrastructure under the organization's direct physical control, the attack surface shrinks, latency budgets expand, and regulatory compliance simplifies from a continuous verification problem to a one-time validation event.
For organizations where judgment is the product and responsibility is the architecture, the MARIA OS Appliance provides the infrastructure to make both concrete, auditable, and sovereign.