Abstract
Audit procedures, as codified in the International Standards on Auditing (ISA) and JICPA standards, are fundamentally executable specifications disguised as prose. Each standard defines preconditions, required evidence, decision logic, and post-conditions — the same structure as a software function. Yet audit execution remains a manual, paper-driven process where experienced professionals translate written standards into ad hoc workflows, losing traceability and introducing inconsistency at every step.
This paper introduces the Audit Universe Runtime — a multi-agent execution engine within the MARIA OS governance framework that treats audit procedures as first-class runtime operations. We compile ISA and JICPA standards into typed agent task definitions, execute them through a governed pipeline with immutable audit trails, and coordinate human-agent collaboration at precisely defined materiality thresholds. The result is not "AI replacing auditors" but rather audit procedures executing themselves under human authority, with every judgment call traceable, every sample statistically justified, and every conclusion formally linked to its supporting evidence.
1. Audit Procedures as Executable Specifications
The insight that drives the Audit Universe Runtime is structural: ISA standards already contain the semantics of executable programs. Consider ISA 500 (Audit Evidence). It specifies that the auditor shall design and perform audit procedures to obtain sufficient appropriate audit evidence. This decomposes into: (a) an evidence sufficiency predicate, (b) an evidence appropriateness classifier, (c) a procedure selection function, and (d) an execution protocol.
We formalize this decomposition as an Audit Procedure Specification (APS):
// Audit Procedure Specification — compiled from ISA/JICPA standards
interface AuditProcedureSpec {
id: string // e.g., "ISA-500-AP-03"
standard: "ISA" | "JICPA"
standardRef: string // e.g., "ISA 500.6(a)"
preconditions: PredicateExpr[] // Must all hold before execution
requiredEvidence: EvidenceRequirement[]
samplingStrategy: SamplingConfig
assertionsCovered: AuditAssertion[] // Existence, Completeness, Valuation, etc.
executionSteps: ProcedureStep[]
postConditions: PredicateExpr[] // Must all hold after execution
materialityThreshold: MonetaryAmount
humanGate: GateConfig // When to escalate to human auditor
}
interface ProcedureStep {
order: number
action: AgentAction
evidenceOutput: EvidenceType
failMode: "halt" | "flag" | "escalate"
timeout: Duration
coordinate: MARIACoordinate // Agent responsible for this step
}
type AuditAssertion =
| "existence"
| "completeness"
| "valuation"
| "rights_and_obligations"
| "presentation_and_disclosure"
| "accuracy"
| "cutoff"
| "classification"

Each ISA standard is parsed into one or more APS definitions. The compilation process is not generative — it is a structural mapping from the standard's normative requirements to typed interfaces. Where a standard says "the auditor shall," we produce a ProcedureStep with failMode: "halt". Where it says "the auditor should consider," we produce a step with failMode: "flag" and a human gate.
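The shall/should mapping can be sketched directly. A minimal sketch: the NormativeClause shape and the mapModality helper are illustrative names for this example, not part of the APS schema above.

```typescript
// Hypothetical sketch of compiling a clause's normative modality into
// a step's failMode. NormativeClause and mapModality are illustrative.
type FailMode = "halt" | "flag" | "escalate"

interface NormativeClause {
  text: string
  modality: "shall" | "should_consider" | "may"
}

function mapModality(clause: NormativeClause): { failMode: FailMode; humanGate: boolean } {
  switch (clause.modality) {
    case "shall":
      // Unconditional requirements halt execution on failure
      return { failMode: "halt", humanGate: false }
    case "should_consider":
      // Judgment-dependent requirements flag and open a human gate
      return { failMode: "flag", humanGate: true }
    case "may":
      // Discretionary actions escalate so a human can decide
      return { failMode: "escalate", humanGate: true }
  }
}
```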
2. ISA/JICPA Standard Mapping to Agent Tasks
The Audit Universe maintains a Standard Registry — a compiled database of every ISA and JICPA standard mapped to agent task specifications. The registry is versioned and immutable: when standards are updated, new versions are appended, never overwritten, preserving the ability to demonstrate which version of a standard governed any historical engagement.
| Standard | Agent Task Domain | Assertions Covered | Agent Count |
|----------|------------------|-------------------|-------------|
| ISA 240 | Fraud Risk Assessment | All | 3 |
| ISA 315 | Risk Identification & Assessment | All | 5 |
| ISA 330 | Responses to Assessed Risks | All | 4 |
| ISA 500 | Evidence Collection & Evaluation | All | 6 |
| ISA 520 | Analytical Procedures | Completeness, Valuation | 2 |
| ISA 530 | Audit Sampling | All (indirect) | 3 |
| ISA 540 | Accounting Estimates | Valuation, Completeness | 4 |
| ISA 550 | Related Party Transactions | Existence, Disclosure | 2 |
| ISA 570 | Going Concern Assessment | All | 3 |
| JICPA IT | IT General Controls Testing | Completeness, Accuracy | 4 |
| JICPA QC | Quality Control Standards | N/A (meta) | 2 |

Each agent task inherits responsibility constraints from the MARIA coordinate system. An ISA 315 risk assessment agent at coordinate G1.U2.P1.Z1.A3 cannot access evidence outside its zone boundary without explicit cross-zone authorization — enforcing the segregation of duties that ISA 220 requires.
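The zone-boundary rule can be sketched as a simple check. This is an assumption-laden sketch: the coordinate format follows the paper's G.U.P.Z.A convention, but zoneOf, canAccessEvidence, and the grant registry are illustrative names.

```typescript
// Illustrative zone-boundary enforcement for MARIA coordinates.
type MARIACoordinate = string // e.g., "G1.U2.P1.Z1.A3"

function zoneOf(coord: MARIACoordinate): string {
  // Zone prefix = Galaxy.Universe.Planet.Zone segments
  return coord.split(".").slice(0, 4).join(".")
}

function canAccessEvidence(
  agent: MARIACoordinate,
  evidenceZone: string,
  crossZoneGrants: Set<string> // entries of the form "agentZone->evidenceZone"
): boolean {
  const agentZone = zoneOf(agent)
  if (agentZone === evidenceZone) return true
  // Outside the agent's own zone, an explicit grant is required
  return crossZoneGrants.has(`${agentZone}->${evidenceZone}`)
}
```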
3. Evidence Collection Automation
Evidence is the fundamental currency of audit. The Audit Universe Runtime implements an Evidence Collection Engine that automates the acquisition, verification, and linking of audit evidence to the assertions it supports.
interface EvidenceBundle {
id: string
engagementId: string
procedureRef: string // Links to AuditProcedureSpec.id
assertionsCovered: AuditAssertion[]
sources: EvidenceSource[]
collectedAt: ISOTimestamp
collectedBy: MARIACoordinate // Agent coordinate
verificationStatus: "unverified" | "auto_verified" | "human_verified"
hashChain: string // SHA-256 chain for immutability
sufficiencyScore: number // 0.0 - 1.0
appropriatenessScore: number // 0.0 - 1.0
}
interface EvidenceSource {
type: "erp_extract" | "bank_confirmation" | "document_scan"
| "external_api" | "management_representation" | "recomputation"
rawPayload: EncryptedBlob
extractedAt: ISOTimestamp
sourceSystem: string
reconciliationKey: string // For cross-referencing
}
// Evidence sufficiency is evaluated continuously
function evaluateSufficiency(
bundle: EvidenceBundle,
requirement: EvidenceRequirement,
materialityThreshold: MonetaryAmount
): SufficiencyResult {
const coverage = computeAssertionCoverage(bundle, requirement)
const reliability = assessSourceReliability(bundle.sources)
const sufficiency = coverage * reliability
return {
score: sufficiency,
sufficient: sufficiency >= requirement.minimumThreshold,
gaps: identifyEvidenceGaps(bundle, requirement),
recommendation: sufficiency >= requirement.minimumThreshold
? "sufficient"
: "additional_procedures_required"
}
}

The engine operates on a pull model: when an audit procedure requires evidence, it issues a typed evidence request. The Evidence Collection Engine resolves the request by querying connected source systems (ERP, bank portals, document management), applying extraction transformers, and returning a verified EvidenceBundle. Every bundle is hash-chained to prevent post-collection tampering.
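The pull model can be sketched as follows. The EvidenceRequest and SourceConnector shapes, and the resolveRequest helper, are assumptions for this example; the real engine would also apply transformers and hash-chain the result.

```typescript
// Sketch of pull-model evidence resolution, under assumed shapes.
interface EvidenceRequest {
  procedureRef: string
  assertions: string[]
  sourceTypes: string[] // e.g., ["erp_extract", "bank_confirmation"]
}

interface SourceConnector {
  type: string
  fetch(req: EvidenceRequest): Promise<unknown>
}

async function resolveRequest(
  req: EvidenceRequest,
  connectors: SourceConnector[]
): Promise<{ procedureRef: string; sources: unknown[] }> {
  // Query only the connectors the request names, in parallel
  const relevant = connectors.filter(c => req.sourceTypes.includes(c.type))
  const payloads = await Promise.all(relevant.map(c => c.fetch(req)))
  // A real engine would apply extraction transformers and hash-chain here
  return { procedureRef: req.procedureRef, sources: payloads }
}
```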
4. Sampling Strategy Agents
ISA 530 requires that audit sampling be designed to provide a reasonable basis for conclusions. The Audit Universe Runtime implements dedicated Sampling Strategy Agents that compute statistically valid sample sizes, select samples using appropriate methods, and evaluate results against tolerable misstatement thresholds.
The sampling decision is formalized as follows:
n = \frac{N \cdot z_{\alpha/2}^2 \cdot p(1-p)}{(N-1) \cdot e^2 + z_{\alpha/2}^2 \cdot p(1-p)}

Where N is the population size, z_{\alpha/2} is the confidence coefficient, p is the expected error rate, and e is the tolerable error. The Sampling Agent dynamically adjusts these parameters based on the assessed risk of material misstatement for the relevant assertion.
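The formula translates directly into code. A minimal sketch: the caller is assumed to supply the confidence coefficient (e.g., 1.96 for 95%) rather than deriving it from a normal-quantile function.

```typescript
// The finite-population sample-size formula above, computed directly.
function computeSampleSize(
  N: number, // population size
  zAlphaHalf: number, // confidence coefficient, e.g., 1.96 for 95%
  p: number, // expected error rate
  e: number // tolerable error
): number {
  const z2pq = zAlphaHalf ** 2 * p * (1 - p)
  const n = (N * z2pq) / ((N - 1) * e ** 2 + z2pq)
  return Math.ceil(n)
}

// For a population of 10,000 items at 95% confidence, 5% expected error
// rate, and 2% tolerable error, this yields a sample of 437 items.
```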
interface SamplingDecision {
method: "monetary_unit" | "classical_variable" | "attribute" | "stratified"
populationSize: number
sampleSize: number
confidenceLevel: number // e.g., 0.95
tolerableError: MonetaryAmount
expectedError: MonetaryAmount
riskOfMaterialMisstatement: "low" | "moderate" | "high"
selectionMethod: "random" | "systematic" | "haphazard" | "stratified_random"
stratification?: StratificationConfig
projectedMisstatement?: MonetaryAmount // After evaluation
}
// Monetary Unit Sampling (MUS) — preferred for overstatement testing
function computeMUSSampleSize(
bookValue: number,
materialityThreshold: number,
confidenceLevel: number,
expectedMisstatement: number
): number {
const reliabilityFactor = getReliabilityFactor(confidenceLevel, 0)
const adjustedMateriality = materialityThreshold - expectedMisstatement
// Guard: expected misstatement at or above materiality means sampling
// cannot provide assurance; the population must be examined in full
if (adjustedMateriality <= 0) {
throw new Error("expected misstatement exceeds materiality")
}
return Math.ceil((bookValue * reliabilityFactor) / adjustedMateriality)
}

Critically, the Sampling Agent does not merely compute sample sizes — it justifies every sampling decision by producing a SamplingRationale document that links the chosen parameters to the assessed risk level, the nature of the population, and the specific assertion being tested. This rationale becomes part of the immutable audit trail.
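A worked MUS example helps make the computation concrete. The reliability factor of 3.0 is the standard Poisson-based factor for 95% confidence with zero expected overstatement errors; the lookup table here is deliberately truncated and the surrounding names are illustrative.

```typescript
// Truncated reliability-factor table (zero expected errors only).
const RELIABILITY_FACTORS: Record<number, number> = {
  0.90: 2.31,
  0.95: 3.0,
  0.99: 4.61
}

function getReliabilityFactor(confidenceLevel: number, expectedErrors: number): number {
  if (expectedErrors !== 0) throw new Error("table truncated to zero expected errors")
  const factor = RELIABILITY_FACTORS[confidenceLevel]
  if (factor === undefined) throw new Error("unsupported confidence level")
  return factor
}

// Book value 500,000,000 with materiality 10,000,000 and no expected
// misstatement at 95% confidence:
const n = Math.ceil((500_000_000 * getReliabilityFactor(0.95, 0)) / 10_000_000)
// n === 150 items selected for testing
```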
5. Risk Assessment Runtime
ISA 315 (Revised 2019) requires the auditor to identify and assess risks of material misstatement through understanding the entity and its environment. The Risk Assessment Runtime implements this as a continuous evaluation engine rather than a point-in-time exercise.
Traditional audit treats risk assessment as a planning-phase activity. The Audit Universe Runtime evaluates risk continuously — as evidence is collected, as transactions are processed, and as anomalies are detected. Risk scores are living values that trigger re-evaluation of audit responses in real time.

The risk model decomposes into three layers following ISA 315's structure:
AR = IR \times CR \times DR

Where AR is audit risk, IR is inherent risk, CR is control risk, and DR is detection risk. The runtime maintains these as continuous variables rather than discrete categories, enabling precise calibration of audit response intensity.
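In practice the runtime fixes a target audit risk and solves the model for the detection risk that substantive procedures must achieve: DR = AR / (IR × CR). A minimal sketch, assuming risks are expressed as 0-1 scores:

```typescript
// Solve the audit risk model for required detection risk.
function requiredDetectionRisk(
  targetAuditRisk: number, // e.g., 0.05
  inherentRisk: number,
  controlRisk: number
): number {
  const product = inherentRisk * controlRisk
  // No inherent/control exposure: any detection risk is acceptable
  if (product === 0) return 1.0
  // Clamp to 1.0 because detection risk is a probability
  return Math.min(1.0, targetAuditRisk / product)
}

// High inherent and control risk demand a low detection risk, i.e.,
// more persuasive substantive evidence:
// requiredDetectionRisk(0.05, 0.9, 0.8) is roughly 0.069
```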
interface RiskAssessment {
entityId: string
assessmentCycle: number
inherentRisk: RiskVector // Per-assertion risk scores
controlRisk: RiskVector // Evaluated from control testing
detectionRisk: RiskVector // Computed to achieve target audit risk
significantRisks: SignificantRisk[]
riskResponses: RiskResponse[] // ISA 330 responses
lastUpdated: ISOTimestamp
triggerEvents: RiskTriggerEvent[] // What caused re-assessment
}
interface RiskVector {
existence: number // 0.0 - 1.0
completeness: number
valuation: number
rights: number
presentation: number
accuracy: number
cutoff: number
classification: number
}

When a risk score crosses a predefined threshold, the runtime automatically adjusts the planned audit response — increasing sample sizes, adding substantive procedures, or escalating to human partner review through a responsibility gate.
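A sketch of the threshold logic. The 0.15 escalation delta mirrors the collaboration table in Section 8; the response actions themselves are illustrative simplifications.

```typescript
// Threshold-triggered adjustment of the planned audit response.
type RiskResponseAction =
  | { kind: "increase_sample"; factor: number }
  | { kind: "escalate_to_partner" }
  | { kind: "none" }

function onRiskUpdate(previous: number, current: number): RiskResponseAction {
  const delta = current - previous
  if (delta > 0.15) {
    // Material jump in assessed risk: halt automation, route to a human
    return { kind: "escalate_to_partner" }
  }
  if (delta > 0) {
    // Modest increase: scale the sample size with the risk increase
    return { kind: "increase_sample", factor: 1 + delta }
  }
  return { kind: "none" }
}
```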
6. Substantive Testing Execution Engine
Substantive procedures — tests of details and substantive analytical procedures — form the core of audit evidence gathering. The execution engine orchestrates these as parallel agent workflows, respecting dependency ordering and evidence prerequisites.
The engine processes substantive tests through a state machine:
type SubstantiveTestState =
| "queued"
| "prerequisites_checking"
| "sampling"
| "executing"
| "evaluating_results"
| "anomaly_review"
| "concluded"
| "escalated"
interface SubstantiveTestExecution {
procedureId: string
state: SubstantiveTestState
sampleSelected: SamplingDecision
itemsTested: number
itemsWithExceptions: number
projectedMisstatement: MonetaryAmount
conclusionReached: boolean
evidenceBundles: EvidenceBundle[]
anomaliesDetected: AnomalyRecord[]
validTransitions: Map<SubstantiveTestState, SubstantiveTestState[]>
}
// State machine enforces valid transitions
const VALID_TRANSITIONS: Record<SubstantiveTestState, SubstantiveTestState[]> = {
queued: ["prerequisites_checking"],
prerequisites_checking: ["sampling", "escalated"],
sampling: ["executing"],
executing: ["evaluating_results"],
evaluating_results: ["concluded", "anomaly_review", "escalated"],
anomaly_review: ["concluded", "escalated"],
concluded: [],
escalated: ["queued"] // Can be re-queued after human review
}

Each substantive test produces a formal conclusion that includes the projected misstatement, the assessed likelihood that the account balance is materially misstated, and the evidence chain supporting the conclusion. The engine prevents a test from reaching the concluded state unless its evidence sufficiency score meets the minimum threshold for the relevant assertion.
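Enforcement of the transition table can be sketched as a guard function. The state union and table are repeated here so the example is self-contained; the transition helper is an illustrative name.

```typescript
// Guarded state transition over the VALID_TRANSITIONS table.
type SubstantiveTestState =
  | "queued" | "prerequisites_checking" | "sampling" | "executing"
  | "evaluating_results" | "anomaly_review" | "concluded" | "escalated"

const VALID_TRANSITIONS: Record<SubstantiveTestState, SubstantiveTestState[]> = {
  queued: ["prerequisites_checking"],
  prerequisites_checking: ["sampling", "escalated"],
  sampling: ["executing"],
  executing: ["evaluating_results"],
  evaluating_results: ["concluded", "anomaly_review", "escalated"],
  anomaly_review: ["concluded", "escalated"],
  concluded: [],
  escalated: ["queued"] // re-queued after human review
}

function transition(
  from: SubstantiveTestState,
  to: SubstantiveTestState
): SubstantiveTestState {
  // Reject any move not listed for the current state
  if (!VALID_TRANSITIONS[from].includes(to)) {
    throw new Error(`invalid transition: ${from} -> ${to}`)
  }
  return to
}
```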
7. Audit Trail Immutability
The integrity of audit evidence depends on the guarantee that no evidence, conclusion, or decision can be modified after the fact without detection. The Audit Universe Runtime implements immutability through a hash-chain evidence ledger.
interface AuditTrailEntry {
sequence: number
timestamp: ISOTimestamp
actorCoordinate: MARIACoordinate
action: AuditAction
payload: EncryptedPayload
previousHash: string
currentHash: string // SHA-256(sequence + timestamp + action + payload + previousHash)
signature: AgentSignature // Cryptographic signature of acting agent
}
function appendToTrail(
trail: AuditTrailEntry[],
action: AuditAction,
payload: unknown,
actor: MARIACoordinate
): AuditTrailEntry {
// Genesis handling: the first entry chains from a fixed all-zero hash
const previous = trail.length > 0 ? trail[trail.length - 1] : null
const entry: AuditTrailEntry = {
sequence: previous ? previous.sequence + 1 : 0,
timestamp: getCurrentTimestamp(),
actorCoordinate: actor,
action,
payload: encrypt(payload),
previousHash: previous ? previous.currentHash : "0".repeat(64),
currentHash: "",
signature: signWithAgentKey(actor)
}
entry.currentHash = computeHash(entry)
return entry
}
// Verification: detect any tampering in the chain
function verifyTrailIntegrity(trail: AuditTrailEntry[]): IntegrityResult {
// Verify every entry's own hash; linkage starts from the second entry
for (let i = 0; i < trail.length; i++) {
const recomputed = computeHash({ ...trail[i], currentHash: "" })
if (recomputed !== trail[i].currentHash) {
return { valid: false, brokenAt: i, reason: "hash_mismatch" }
}
if (i > 0 && trail[i].previousHash !== trail[i - 1].currentHash) {
return { valid: false, brokenAt: i, reason: "chain_break" }
}
}
return { valid: true }
}

Every action in the runtime — evidence collection, sampling decisions, risk re-assessments, test conclusions, human overrides — appends an entry to the hash chain. The chain is verified at engagement close, at quality review checkpoints, and on any regulatory inquiry. Because each entry contains the hash of the previous entry, modifying any historical record invalidates the entire subsequent chain.
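The tamper-detection property can be demonstrated end to end with Node's built-in crypto module. This is a reduced sketch: payloads are left unencrypted and the entry shape is trimmed to what hashing needs.

```typescript
// Self-contained hash-chain demonstration using SHA-256.
import { createHash } from "crypto"

interface Entry { sequence: number; payload: string; previousHash: string; currentHash: string }

function hashEntry(e: Omit<Entry, "currentHash">): string {
  return createHash("sha256")
    .update(`${e.sequence}|${e.payload}|${e.previousHash}`)
    .digest("hex")
}

function append(chain: Entry[], payload: string): Entry[] {
  const prev = chain[chain.length - 1]
  const base = {
    sequence: prev ? prev.sequence + 1 : 0,
    payload,
    previousHash: prev ? prev.currentHash : "0".repeat(64)
  }
  return [...chain, { ...base, currentHash: hashEntry(base) }]
}

function verify(chain: Entry[]): boolean {
  return chain.every((e, i) => {
    const { currentHash, ...base } = e
    // Each entry must match its own hash and link to its predecessor
    return hashEntry(base) === currentHash &&
      (i === 0 || e.previousHash === chain[i - 1].currentHash)
  })
}

let chain = append(append([], "evidence collected"), "sample selected")
// verify(chain) === true; altering entry 0 afterward makes verify false
chain = [{ ...chain[0], payload: "evidence deleted" }, chain[1]]
// verify(chain) === false
```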
8. Auditor-Agent Collaboration Model
The Audit Universe Runtime does not aim to replace human auditors. It implements a graduated collaboration model where agents handle procedural execution and humans retain authority over judgment-intensive decisions.
| Decision Type | Agent Authority | Human Authority | Gate Trigger |
|--------------|----------------|-----------------|-------------|
| Sample selection from computed parameters | Full | None | None |
| Evidence extraction from source systems | Full | Review on exception | Source unavailable |
| Routine reconciliation (< materiality/10) | Full | Spot-check | None |
| Anomaly classification | Propose | Confirm/Override | Always |
| Risk assessment adjustment | Propose | Approve | Risk increase > 0.15 |
| Significant risk identification | Flag | Decide | Always |
| Going concern evaluation | Compile evidence | Full authority | Always |
| Audit opinion formation | Compile summary | Full authority | Always |
| Engagement partner sign-off | N/A | Full authority | Always |

The collaboration model embodies MARIA OS's core thesis. Agents execute the 80% of audit procedures that are deterministic — extraction, reconciliation, recalculation, sampling. Humans focus on the 20% that requires professional judgment — risk assessment, anomaly interpretation, going concern evaluation, and opinion formation.

The collaboration is enforced through MARIA OS responsibility gates. When an agent reaches a decision point that exceeds its authority level, the gate halts execution and routes the decision to the appropriate human auditor with a pre-compiled evidence package. The human's decision is recorded in the immutable trail, attributed to their identity, and linked to the evidence they reviewed.
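The gate decision implied by the table can be sketched as a routing function. The DecisionType union is a small illustrative subset of the table's rows, and the authority set is an assumption for the example.

```typescript
// Sketch of responsibility-gate routing for decision types.
type DecisionType =
  | "sample_selection" | "anomaly_classification"
  | "significant_risk" | "opinion_formation"

type Route = { handler: "agent" } | { handler: "human"; withEvidencePackage: true }

// Decisions the table grants agents full authority over (subset)
const AGENT_AUTHORITY: ReadonlySet<DecisionType> = new Set<DecisionType>(["sample_selection"])

function routeDecision(decision: DecisionType): Route {
  if (AGENT_AUTHORITY.has(decision)) {
    return { handler: "agent" }
  }
  // Everything else halts and routes to a human with compiled evidence
  return { handler: "human", withEvidencePackage: true }
}
```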
9. Real-Time Anomaly Detection During Audit
Traditional audit discovers anomalies retrospectively — during evidence evaluation after testing is complete. The Audit Universe Runtime implements streaming anomaly detection that operates concurrently with evidence collection and substantive testing.
interface AnomalyDetector {
type: "statistical" | "pattern" | "temporal" | "relational"
threshold: number
windowSize: number // Number of transactions in sliding window
detect(stream: TransactionStream): AsyncIterable<AnomalyCandidate>
}
interface AnomalyCandidate {
transactionIds: string[]
anomalyType: AnomalyClassification
severity: "low" | "medium" | "high" | "critical"
confidence: number // 0.0 - 1.0
explanation: string
suggestedProcedure: string // Additional audit procedure to perform
relatedAssertions: AuditAssertion[]
}
type AnomalyClassification =
| "benford_violation" // Digit distribution anomaly
| "round_number_excess" // Unusual frequency of round amounts
| "timing_anomaly" // Transactions clustered near period-end
| "counterparty_concentration" // Unusual concentration of counterparties
| "reversal_pattern" // Entry-reversal patterns suggesting manipulation
| "segregation_violation" // Same actor in incompatible roles
| "threshold_manipulation" // Amounts just below approval thresholds
| "journal_entry_anomaly" // Unusual manual journal entries

The anomaly detection system runs four parallel detectors. The statistical detector applies Benford's Law analysis, ratio analysis, and distribution testing. The pattern detector identifies known fraud indicators (round-number bias, threshold manipulation). The temporal detector flags transactions clustered near period boundaries or posted at unusual times. The relational detector maps transaction networks to identify unusual counterparty patterns or circular flows.
When an anomaly is detected with severity high or above, the runtime immediately triggers a human gate — suspending related automated procedures until an auditor reviews the finding.
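The Benford check run by the statistical detector can be sketched briefly. A real detector would apply a chi-square or mean-absolute-deviation test with calibrated thresholds; this sketch compares observed first-digit frequencies against the expected distribution P(d) = log10(1 + 1/d), assumes nonzero amounts, and uses illustrative helper names.

```typescript
// First significant digit of a (nonzero) amount.
function firstDigit(amount: number): number {
  // Strip sign, decimal point, and zeros so the leading significant
  // digit comes first, e.g., 0.042 -> "42" -> 4
  const s = Math.abs(amount).toString().replace(/[.0]/g, "")
  return Number(s[0])
}

// Mean absolute deviation from Benford's expected digit frequencies.
function benfordDeviation(amounts: number[]): number {
  const counts = new Array(10).fill(0)
  for (const a of amounts) counts[firstDigit(a)]++
  let mad = 0
  for (let d = 1; d <= 9; d++) {
    const expected = Math.log10(1 + 1 / d)
    mad += Math.abs(counts[d] / amounts.length - expected)
  }
  return mad / 9
}
```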
10. Formal Model of Audit Completeness
A fundamental question in audit is: have we done enough? The Audit Universe Runtime formalizes audit completeness as a mathematical property that can be verified rather than subjectively assessed.
Define the Audit Completeness Function C(E, A, M):
C(E, A, M) = \min_{a \in A} \left( \frac{\sum_{e \in E_a} w(e) \cdot r(e)}{\theta(a, M)} \right)

Where E is the set of all collected evidence, A is the set of all assertions to be covered, M is the materiality threshold, E_a is the subset of evidence relevant to assertion a, w(e) is the weight of evidence item e (based on source reliability), r(e) is the relevance score of evidence e to assertion a, and \theta(a, M) is the sufficiency threshold for assertion a at materiality level M.
Audit is complete when C(E, A, M) >= 1.0 for every material account balance and class of transactions. The function is computed continuously as evidence accumulates, providing a real-time progress metric toward audit completion.
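The completeness function computes directly from the definition: the minimum, across assertions, of weighted relevant evidence over the sufficiency threshold. The EvidenceItem shape and the threshold callback are illustrative assumptions.

```typescript
// Direct sketch of C(E, A, M).
interface EvidenceItem {
  weight: number // w(e), from source reliability
  relevance: Record<string, number> // r(e) per assertion
}

function completeness(
  evidence: EvidenceItem[],
  assertions: string[],
  threshold: (assertion: string) => number // theta(a, M)
): number {
  let min = Infinity
  for (const a of assertions) {
    // Weighted support for assertion a across all evidence
    const support = evidence.reduce(
      (sum, e) => sum + e.weight * (e.relevance[a] ?? 0), 0)
    min = Math.min(min, support / threshold(a))
  }
  return min
}

// The audit is complete for the covered assertions when the result >= 1.0
```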
If evidence collection is non-destructive (no evidence is discarded) and evidence weights are non-negative, then C(E, A, M) is monotonically non-decreasing during the engagement. This guarantees that progress toward completeness is irreversible — a property that traditional audit cannot formally assert.

11. Continuous Auditing vs. Periodic Auditing Agents
The Audit Universe Runtime supports two execution modes: periodic mode (traditional engagement-based audit) and continuous mode (real-time monitoring with rolling evidence accumulation).
| Dimension | Periodic Agents | Continuous Agents |
|-----------|----------------|-------------------|
| Activation | Engagement start date | Always running |
| Evidence window | Fiscal period | Rolling 30/90/365 day |
| Risk re-assessment | Planning phase only | Every risk trigger event |
| Sample selection | Once per cycle | Adaptive resampling |
| Anomaly detection | Batch post-collection | Streaming real-time |
| Human review cadence | Milestone-based | Threshold-triggered |
| Report output | Engagement close | Daily/weekly dashboards |
| Cost model | Per-engagement fee | Subscription retainer |

Continuous auditing agents introduce a new challenge: evidence staleness. Evidence collected six months ago may no longer support current assertions if the entity's control environment has changed. The runtime addresses this through an evidence decay function:
w_t(e) = w_0(e) \cdot e^{-\lambda(t - t_e)}

Where w_t(e) is the weight of evidence e at time t, w_0(e) is its initial weight at collection time t_e, and \lambda is the decay rate determined by the volatility of the source system. High-volatility sources (cash balances, inventory counts) have higher decay rates than low-volatility sources (fixed asset registers, long-term debt agreements).
When evidence weight decays below the sufficiency threshold, the continuous agent automatically triggers re-collection — maintaining audit completeness without manual scheduling.
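The decay function is a one-liner once decay rates are chosen. The per-day rates below are illustrative assumptions, not calibrated values.

```typescript
// Exponential evidence decay with illustrative per-day rates.
const DECAY_RATES: Record<string, number> = {
  erp_extract: 0.02, // high volatility: half-life around 35 days
  bank_confirmation: 0.01,
  document_scan: 0.002 // low volatility: fixed asset registers, etc.
}

function decayedWeight(
  initialWeight: number,
  sourceType: string,
  ageDays: number
): number {
  const lambda = DECAY_RATES[sourceType] ?? 0.01
  return initialWeight * Math.exp(-lambda * ageDays)
}

// Re-collection triggers when decayedWeight(...) falls below the
// sufficiency threshold for the supported assertion
```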
12. Quality Review Gates and Engagement Management Orchestration
The Audit Universe Runtime implements ISA 220 quality management requirements as formal gate structures within the MARIA OS responsibility framework.
// MARIA Coordinate Mapping to Audit Engagement Structure
// Galaxy = Audit Firm
// Universe = Engagement (client audit)
// Planet = Audit Domain (Revenue, Expenses, Assets, Liabilities, Equity)
// Zone = Account Group (e.g., Trade Receivables, Allowances)
// Agent = Individual procedure executor
interface EngagementOrchestrator {
coordinate: MARIACoordinate // G1.U2 (Firm.Engagement)
engagementPartner: HumanIdentity
engagementQualityReviewer: HumanIdentity
planets: AuditDomain[]
qualityGates: QualityGate[]
timeline: EngagementTimeline
}
interface QualityGate {
id: string
name: string
trigger: "milestone" | "risk_event" | "completeness_threshold"
reviewLevel: "manager" | "partner" | "eqr" // Engagement Quality Reviewer
requiredEvidence: string[]
approved: boolean
approvedBy?: HumanIdentity
approvedAt?: ISOTimestamp
}
// Engagement lifecycle as state machine
type EngagementPhase =
| "planning"
| "risk_assessment"
| "control_testing"
| "substantive_testing"
| "completion"
| "reporting"
| "archiving"
const ENGAGEMENT_GATES: Record<EngagementPhase, QualityGate[]> = {
planning: [
{ id: "QG-01", name: "Engagement Acceptance", trigger: "milestone",
reviewLevel: "partner", requiredEvidence: ["independence_confirmation",
"risk_acceptance_memo", "engagement_letter"], approved: false }
],
risk_assessment: [
{ id: "QG-02", name: "Risk Assessment Approval", trigger: "milestone",
reviewLevel: "manager", requiredEvidence: ["risk_assessment_summary",
"significant_risks_memo"], approved: false }
],
control_testing: [
{ id: "QG-03", name: "Control Deficiency Review", trigger: "risk_event",
reviewLevel: "partner", requiredEvidence: ["control_test_results",
"deficiency_classification"], approved: false }
],
substantive_testing: [
{ id: "QG-04", name: "Substantive Completion Review", trigger: "completeness_threshold",
reviewLevel: "manager", requiredEvidence: ["completeness_matrix",
"misstatement_summary"], approved: false }
],
completion: [
{ id: "QG-05", name: "Engagement Quality Review", trigger: "milestone",
reviewLevel: "eqr", requiredEvidence: ["full_evidence_package",
"opinion_draft", "significant_judgments_memo"], approved: false }
],
reporting: [
{ id: "QG-06", name: "Report Issuance Authorization", trigger: "milestone",
reviewLevel: "partner", requiredEvidence: ["signed_representations",
"final_analytics", "subsequent_events_review"], approved: false }
],
archiving: []
}

The engagement orchestrator coordinates all agents within the engagement universe, enforcing phase ordering, gate clearance, and resource allocation. No agent can begin substantive testing until the risk assessment gate (QG-02) is approved by a human manager. No audit report can be issued until the engagement quality review gate (QG-05) is approved by the EQR — a human who is independent of the engagement team.
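Phase ordering over the gate table can be sketched as a single check. The reduced Gate shape and the canEnterPhase helper are illustrative; the real orchestrator also handles risk-event and completeness-threshold triggers.

```typescript
// Sketch: a phase may begin only when every gate of every earlier
// phase has been approved.
interface Gate { id: string; approved: boolean }

type Phase = "planning" | "risk_assessment" | "control_testing"
  | "substantive_testing" | "completion" | "reporting" | "archiving"

const PHASE_ORDER: Phase[] = [
  "planning", "risk_assessment", "control_testing",
  "substantive_testing", "completion", "reporting", "archiving"
]

function canEnterPhase(target: Phase, gates: Record<Phase, Gate[]>): boolean {
  const idx = PHASE_ORDER.indexOf(target)
  return PHASE_ORDER.slice(0, idx)
    .every(phase => gates[phase].every(g => g.approved))
}
```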
This architecture ensures that the Audit Universe Runtime, despite its high degree of automation, preserves the human authority structures that professional standards require. The agents execute procedures. The humans exercise judgment. The system ensures that no judgment is bypassed, no evidence is lost, and no conclusion is reached without a formally sufficient evidentiary foundation.
Conclusion
The Audit Universe Runtime demonstrates that audit procedures are not merely amenable to automation — they are, in their essential structure, already executable specifications. The ISA and JICPA standards define preconditions, evidence requirements, decision logic, and quality gates with sufficient precision to compile into agent task specifications. What has been missing is not the formalization of audit logic, but the governance infrastructure to execute it safely: immutable audit trails, responsibility gates, graduated human-agent collaboration, and formal completeness verification.
MARIA OS provides this infrastructure. By mapping engagement structures to the MARIA coordinate system, implementing audit procedures as governed state machines, and enforcing human authority at every materiality-sensitive decision point, the Audit Universe Runtime achieves something that neither fully manual nor fully automated approaches can: audit procedures that execute themselves, under human authority, with mathematical guarantees of completeness and immutable evidence chains.
The future of audit is not artificial intelligence replacing professional judgment. It is professional judgment operating through an intelligent runtime that makes every procedure traceable, every sample defensible, and every conclusion formally linked to its evidence.