CATEGORY ARCHIVE

Engineering

20 MARIA OS articles in the Engineering category, covering Bonginkan, Judgment OS, AI governance, decision intelligence, and agentic organization architecture.

20 articles | Published by Bonginkan

Judgment OS / Decision Intelligence OS

Core MARIA OS research on turning organizational judgment into executable decision systems.

Agentic Company Architecture

Research on human-agent organizations, delegation boundaries, role topology, and governed autonomy.

Responsibility Gates and AI Governance

Safety, accountability, fail-closed gates, auditability, and human-in-the-loop control for AI agents.

Multi-Agent Mathematics

Formal models for convergence, stability, game theory, graph dynamics, and multi-agent evaluation.

Evidence, RAG, and Knowledge Governance

Evidence bundles, retrieval architecture, Graph RAG, knowledge trust, and auditable reasoning pipelines.

Agentic R&D and Judgment Science

Research operations, simulation labs, judgment science, recursive improvement, and experimental AI governance.

Engineering | June 1, 2026 | 19 min read

Why AI Agents Fail at Real Work: It Is Not the LLM, It Is the Harness Shortage

Understanding why agents work in PoC but never reach production — through the design of purpose, authority, memory, stop conditions, recovery paths, and audit trails

The primary reason enterprise AI agents fail is not model performance alone. The essence of the failure is letting AI act without a harness that encloses purpose, authority, memory, quality, stop conditions, recovery paths, and audit trails.

AI-agent, Dynamic-Harness, enterprise-AI, HITL, MARIA-OS

Engineering | May 30, 2026 | 10 min read

Applications Maintained by Dynamic Harness-Driven Development

A general operating model for collecting runtime evidence, planning repairs, and keeping AI-assisted products stable

This application is maintained through dynamic harness-driven development. The method treats harness results as operational evidence, converts failures into bounded repair plans, and preserves learning without exposing internal implementation details.

dynamic-harness, harness-driven-development, software-maintenance, runtime-governance, quality-engineering

Engineering | May 30, 2026 | 18 min read

Harness-Driven Development: Building Agentic Systems from Runtime Evidence Backward

A development method where scenarios, gates, scorecards, and repair boundaries are designed before implementation

Harness-driven development treats the dynamic harness as the primary specification. Instead of writing agent code first and testing it later, teams define runtime episodes, failure taxonomies, gates, and evidence contracts first, then let implementation converge toward measurable behavior.

dynamic-harness, harness-driven-development, agent-engineering, runtime-governance, evaluation-harness

Engineering | May 30, 2026 | 22 min read

MARIA Self-Healing Runtime: Safe Autonomous Repair for Agentic Systems

A Self-Evolving Harness Runtime design for failure analysis, patch planning, scoped fixing, cross-cutting replay, memory-driven prevention, and human approval

MARIA Self-Healing Runtime is the safety-first repair layer inside MARIA OS. It observes failures, diagnoses root causes, plans bounded repairs, creates reviewable PRs, replays cross-cutting evidence, learns prevention patterns, and keeps human authority over high-risk change.

self-evolving-harness, maria-self-healing-runtime, autonomous-harness-runtime, self-healing-ai-systems, autonomous-fixing-agents, runtime-governance, failure-analyzer, patch-planner, memory-store

Engineering | May 30, 2026 | 24 min read

Dynamic Workflow Agent Monitoring Harness: Mass-Producing Safe Operational Agents

Monitoring tools, quality and manufacturing-management harnesses, loop guards, and agent blueprints for scaling workflow agents inside MARIA OS

Dynamic Workflow Agents should not be mass-produced by cloning prompts. MARIA OS treats every operational agent as a monitored production unit with a blueprint, harness binding plan, quality observatory, settlement ledger, loop guard, and memory-backed improvement path.

dynamic-workflow-agent, maria-os, monitoring-harness, manufacturing-management, quality-engineering, agent-operations

Engineering | May 30, 2026 | 28 min read

Safety Lives in the Fan-In: Designing Fail-Closed Parallel Multi-Harness Systems

Five implementation disciplines for running multiple harnesses in parallel on an agent platform without weakening safety

On an agent platform, you want to run identity, authority, trust, and surface-specific harnesses simultaneously against a single action. But in a fail-closed system, naive parallelization quietly weakens safety. This article works through the design disciplines at the implementation level: a fan-in fold over a normalized sequence of envelopes, restrictive-side conversion of timeouts, DAG dependencies, budgets, and snapshots.

parallel-harness, fail-closed, agent-governance, fan-in, runtime-safety
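
As an illustration of the fan-in idea described in this abstract, here is a minimal sketch, not the article's implementation. It assumes each harness returns a verdict on a three-level scale and that a timed-out or missing harness is represented as `None`. The names `Verdict`, `normalize`, and `fan_in` are hypothetical.

```python
from enum import IntEnum
from functools import reduce
from typing import Iterable, Optional


class Verdict(IntEnum):
    """Hypothetical verdict scale: a higher value is more restrictive."""
    ALLOW = 0
    ESCALATE = 1
    DENY = 2


def normalize(raw: Optional[Verdict]) -> Verdict:
    """Restrictive-side conversion: a timeout or missing result (None) becomes DENY."""
    return Verdict.DENY if raw is None else raw


def fan_in(results: Iterable[Optional[Verdict]]) -> Verdict:
    """Fold the normalized verdicts, keeping the most restrictive one.

    An empty result set is treated as DENY, so nothing passes by default.
    """
    normalized = [normalize(r) for r in results]
    if not normalized:
        return Verdict.DENY
    return reduce(max, normalized)
```

Under these assumptions, `fan_in([Verdict.ALLOW, None])` yields `Verdict.DENY`: one harness that fails to answer in time is enough to block the action, which is the property naive parallelization tends to lose.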

Engineering | March 8, 2026 | 40 min read

MARIA Voice: AGI Partner Architecture — From Emotion Detection to Meta-Cognitive Response Generation

How a 7-layer prompt hierarchy, 5 conversation modes, zero-latency knowledge injection, and sentence-level streaming create a voice AI that understands before it speaks

Voice assistants answer questions. MARIA Voice understands people. Built on a 7-layer prompt hierarchy (Constitution, Identity, Response Style, Meta-Cognition, Safety, Persona, Memory), MARIA Voice implements a full cognitive pipeline: keyword-based emotion detection, context-sensitive mode switching, 2-tier knowledge injection, 6-layer persistent memory, and mode-adaptive response generation — all optimized for real-time voice with sub-800ms first-sentence latency. This paper presents the theoretical foundations in cognitive science and therapeutic dialogue, the complete system architecture, the mathematical models underlying emotion and mode detection, and production results from thousands of voice sessions.

MARIA-Voice, AGI-assistant, voice-ui, emotion-detection, meta-cognition, prompt-engineering, conversation-mode, knowledge-injection, memory-system, streaming

Engineering | March 8, 2026 | 30 min read

Agent Tool Compiler: From Natural Language Intent to Executable Tool Code via Compilation Pipeline

Agents as compilers — a formal framework mapping NL intent through intermediate representation to optimized, type-safe runtime tools

Tool-generating agents are ad-hoc code producers. We reframe tool synthesis as a compilation problem: natural language intent is parsed into an Intent AST, lowered to a Tool IR (intermediate representation), optimized through security hardening and dead code elimination passes, and emitted as type-safe executable code that hot-loads into the agent runtime. This paper presents the Agent Tool Compiler architecture with formal language theory foundations.

tool-compiler, code-generation, api-design, self-extending-agent, agentic-company

Engineering | March 8, 2026 | 30 min read

Agents That Write Their Own Tools: A 4-Phase Architecture for Tool Discovery, Synthesis, Validation, and Registration in Autonomous Systems

From static tool chains to self-extending capability — how MARIA OS agents create the tools they need at runtime

Normal agents wait for humans to build tools. MARIA OS agents create their own. This paper details the 4-phase tool lifecycle — Discovery, Synthesis, Validation, Registration — that enables agents to identify missing capabilities, generate tool implementations, verify correctness and safety in sandboxed environments, and hot-load new tools into the OS runtime. We formalize tool generation rate, quality convergence, and multi-agent tool sharing, and present a case study of an Audit agent creating an OCR extraction tool at runtime.

tool-synthesis, tool-discovery, tool-validation, self-extending-agent, agentic-company

Engineering | March 8, 2026 | 30 min read

MARIA OS Evaluation Harness: A Standard Testing Infrastructure for Measuring Agent Quality

Formal test categories, composite scoring, and continuous evaluation pipelines that transform agent quality from subjective assessment into reproducible engineering measurement

Agent quality cannot be managed if it cannot be measured. Traditional software testing verifies deterministic input-output mappings, but AI agents operate in stochastic, multi-step decision spaces where correctness is contextual, safety is probabilistic, and governance compliance is structural. This paper introduces the MARIA OS Evaluation Harness — a standardized testing infrastructure that defines four test categories (correctness, safety, performance, governance compliance), four primary metrics (decision accuracy, gate compliance rate, evidence quality score, latency under load), and a formal composite scoring framework. We present the harness architecture comprising a test runner, scenario generator, oracle comparator, and regression detector, all scoped through MARIA coordinates for hierarchical test targeting. We prove that the composite agent score is monotonically responsive to genuine quality improvements and demonstrate that continuous evaluation pipelines catch 94.7% of quality regressions before production deployment.

evaluation-harness, agent-quality, testing, benchmarks, agentic-company
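
To make the composite-scoring idea in this abstract concrete, the sketch below uses a plain weighted sum over the four named metrics. This is an assumption for illustration only: the abstract does not state the paper's actual formula or weights. The weights, the `latency_score` normalization (higher is better, in [0, 1]), and the function name `composite_score` are all hypothetical.

```python
from typing import Mapping

# Hypothetical weights; every metric is assumed to be pre-normalized to [0, 1]
# with higher meaning better (so latency is expressed as a score, not raw ms).
WEIGHTS = {
    "decision_accuracy": 0.4,
    "gate_compliance_rate": 0.3,
    "evidence_quality_score": 0.2,
    "latency_score": 0.1,
}


def composite_score(metrics: Mapping[str, float]) -> float:
    """Weighted sum of normalized metrics.

    Because every weight is strictly positive, raising any single metric
    while holding the others fixed strictly raises the composite.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = metrics[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total
```

A positive-weight linear form is the simplest scoring rule with the monotonic-responsiveness property the abstract mentions. A real harness may well use a different aggregation.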

Engineering | February 22, 2026 | 48 min read

Robot Judgment OS Lab: Designing Responsibility-Bounded Physical-World AI with Multi-Universe Gates

An agentic R&D team architecture for robot governance research — two lab divisions, eleven specialized agents, and five research themes bridging MARIA OS Multi-Universe evaluation with physical-world robotic systems

Physical-world robots demand governance architectures that digital-only agent systems cannot provide: sub-millisecond fail-closed gates, real-time multi-universe conflict detection, embodied ethical learning under sensor noise, and quantitative human-robot responsibility allocation at every decision node. This paper presents the Robot Judgment OS Lab — an agentic R&D team design embedded within the MARIA OS coordinate system, organized into two divisions (Robot Gate Architecture Lab and Embodied Learning & Conflict Lab) with eleven specialized agents operating under fail-closed research gates. We formalize five research themes: Responsibility-Bounded Robot Decision, Physical-World Conflict Mapping, Embodied Ethical Learning, Human-Robot Responsibility Matrix, and ROS2 Multi-Universe Bridge. Mathematical contributions include a real-time ConflictScore function, constrained RL for embodied ethics calibration, a four-factor responsibility decomposition protocol, safety-bounded action spaces, and a layered architecture formalization from ROS2 base through Multi-Universe, Gate, and Conflict layers. The lab design demonstrates that structured R&D governance — where research teams are themselves governed by the infrastructure they study — produces faster, safer, and more auditable advances in robot judgment than traditional unstructured robotics research.

robotics, robot-os, physical-world, multi-universe, fail-closed, embodied-ethics, conflict-mapping, responsibility-matrix, MARIA-OS, ROS2

Engineering | February 16, 2026 | 30 min read

Real-Time Meeting Session Orchestration: State Machine Design for Multi-Component Bot Systems

How a seven-state machine coordinates browser automation, audio capture, speech recognition, and live streaming into a coherent meeting intelligence pipeline

A meeting AI bot is not a single component — it is an orchestra of subsystems that must start, coordinate, and stop in precise sequence. The browser must launch before audio can be captured. Audio must flow before speech recognition begins. Recognition must produce segments before minutes can be generated. And when the meeting ends, all components must shut down gracefully without losing data. This paper presents the state machine design of MARIA Meeting AI's session manager, which coordinates Playwright browser automation, CDP audio capture, Gemini Live Audio ASR, and incremental minutes generation through a seven-state lifecycle with EventEmitter-based real-time streaming to dashboard clients.

meeting-ai, state-machine, orchestration, event-driven, sse, real-time, playwright, session-management

Engineering | February 15, 2026 | 41 min read

The Complete Action Router: From Theory to Implementation to Scaling in MARIA OS

End-to-end architecture of the three-layer Action Router stack (Intent Parser, Action Resolver, Gate Controller), with recursive optimization and scaling patterns for 100+ agent deployments

The Action Router Intelligence Theory established that routing must control actions, not classify words. This paper presents the full implementation architecture: a three-layer stack of Intent Parser (context-aware goal extraction), Action Resolver (state-dependent action selection with precondition-effect semantics), and Gate Controller (risk-tiered execution envelopes integrated with MARIA OS governance). We detail a recursive optimization loop in which routing policies learn from execution outcomes, formalized as an online convex optimization problem with O(√T) regret. We then present a scaling architecture for 100+ concurrent agents using coordinate-based sharding, hierarchical action caches, and zone-local resolution. Integration with the MARIA OS Decision Pipeline state machine is formalized as a product automaton. Production benchmarks show sub-30ms P99 latency at 10,000 routing decisions per second, with first-attempt accuracy improving from 93.4% to 97.8% after 30 days of recursive learning.

action-router, scaling, implementation, MARIA-OS, multi-agent, state-machine, recursive-improvement

Engineering | February 15, 2026 | 32 min read

Sentence-Level Streaming VUI Architecture: From Cognitive Theory to Production Implementation in MARIA OS

How sentence-boundary detection, sequential TTS chaining, and rolling conversation summaries create a natural-feeling voice interface with long-session stability

Voice user interfaces face a core tradeoff: stream tokens immediately for low latency, or wait for larger semantic units to improve naturalness. MARIA OS resolves this with sentence-level streaming: detect sentence boundaries from Gemini token streams in real time, queue each sentence for sequential ElevenLabs TTS playback, and coordinate full-duplex interaction through barge-in control, speech debouncing, and heartbeat-based recovery. This paper presents the cognitive basis for sentence-level granularity, the production `useGeminiLive` architecture, a 29-tool action router across 4 teams with confidence-weighted team inference, and the rolling-summary mechanism for long voice sessions. In 2,400+ production sessions, the system achieved sub-800ms first-sentence latency with zero sentence-ordering violations, including compatibility handling for 9 in-app browser environments.

voice-ui, streaming, TTS, speech-recognition, real-time, Gemini, ElevenLabs, action-router, MARIA-OS, cognitive-science
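
The sentence-boundary step described in this abstract can be sketched as a small generator that buffers streamed tokens and emits one complete sentence at a time, in order. This is a simplified illustration, not the `useGeminiLive` implementation: the terminator set and the name `sentences` are assumptions, and a production detector would also need to handle abbreviations and decimals.

```python
from typing import Iterable, Iterator

# Assumed sentence terminators (ASCII plus common full-width forms).
SENTENCE_END = ".!?。！？"


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences, in order, from a stream of text tokens."""
    buffer = ""
    for token in tokens:
        for ch in token:
            buffer += ch
            if ch in SENTENCE_END:
                sentence = buffer.strip()
                if sentence:
                    yield sentence  # hand off to the TTS queue here
                buffer = ""
    # Flush whatever remains when the stream ends without a terminator.
    tail = buffer.strip()
    if tail:
        yield tail
```

Because the generator yields sentences strictly in arrival order, a consumer that plays them one after another keeps the ordering guarantee, and the first sentence can start playing before the model has finished generating the rest.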

Engineering | February 14, 2026 | 44 min read

Communication Topology and Information Cascading in Planet 100: Bottleneck Detection and Bandwidth Optimization in 100+ Agent Clusters

Spectral analysis of the 111-agent communication matrix identifies eigenvalue-based bottleneck signatures and routing strategies

We analyze Planet 100's communication network as a weighted directed graph over 111 agents. Using the eigenvalue spectrum of the normalized communication matrix, we identify bottleneck regions from spectral partitions, derive routing strategies with minimum-cost flow optimization, and show that spectral-guided bandwidth allocation reduces cascading failures by 84% while improving end-to-end throughput by 2.3x.

planet-100, communication-topology, information-cascading, bottleneck-detection, bandwidth-optimization, spectral-analysis, agent-clusters

Engineering | February 14, 2026 | 17 min read

Cognitive Load Balancing in Human-Agent Hybrid Teams: Scheduling Human Attention as a Limited Resource

A practical workload model for routing review to people who still have real attention left

Human oversight fails when review demand is treated as infinite capacity. This article presents a practical control model for supervisor load, priority routing, and rest-aware scheduling. The emphasis is operational: estimate available attention, protect high-priority reviews, and avoid the common failure mode where humans are technically in the loop but cognitively saturated.

team-design, cognitive-load, workload-distribution, human-agent-hybrid, attention-allocation, queueing-theory, fatigue-model, oversight-quality

Engineering | February 14, 2026 | 18 min read

Fault-Tolerant Team Architectures: Reliability Patterns for Multi-Agent Systems Without Mathematical Overclaim

Use redundant role coverage, graceful degradation, and recovery drills instead of fragile point estimates

Multi-agent teams fail when a required role disappears and nobody can safely take over. This article reframes fault tolerance around role coverage, standby design, and recovery speed. Rather than overpromising precise MTTF values, it focuses on the operational question that matters: how many failures can the team absorb before a critical function becomes unstaffed?

team-design, fault-tolerance, resilience, reliability-engineering, redundancy, graceful-degradation, MTTF, single-point-of-failure
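
The operational question posed in this abstract (how many failures can the team absorb before a critical function becomes unstaffed?) has a simple worst-case answer once role coverage is written down. The sketch below assumes coverage is a mapping from role name to the set of agents able to perform it. The data shape and function names are hypothetical, not taken from the article.

```python
from typing import Dict, List, Set


def failures_absorbed(coverage: Dict[str, Set[str]]) -> int:
    """Worst-case number of agent failures that still leaves every role staffed.

    The thinnest role decides the answer: losing all of its agents unstaffs it,
    so the team tolerates one fewer failure than that role has agents.
    """
    if not coverage:
        return 0
    thinnest = min(len(agents) for agents in coverage.values())
    return max(0, thinnest - 1)


def single_points_of_failure(coverage: Dict[str, Set[str]]) -> List[str]:
    """Roles covered by exactly one agent, sorted by name."""
    return sorted(role for role, agents in coverage.items() if len(agents) == 1)
```

For example, with `{"reviewer": {"a1", "a2", "a3"}, "approver": {"a2", "a4"}}` the team absorbs one failure in the worst case, because the approver role has only two agents behind it.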

Engineering | February 14, 2026 | 38 min read

Productive Disagreement Protocol for Agent Teams: Structured Dissent for Higher-Quality Decisions

Operationalize evidence-backed dissent, validation diversity, and anti-groupthink interventions

Structured disagreement channels dissent into testable claims, improving decision quality without collapsing throughput.

agent-teams, disagreement-protocol, groupthink-prevention, meta-insight, decision-quality, organizational-learning, multi-agent-governance, validation-diversity, SEO-research

Engineering | February 12, 2026 | 36 min read

Engineering Case Study: Quality Gate Control Theory for Manufacturing AI

Applying established control theory, R2R-aware manufacturing practice, and MARIA OS audit gates to simulated semiconductor quality cascades

Manufacturing AI systems face a stability problem that traditional software governance often does not: defect rates evolve as continuous dynamical variables under material variation, tool wear, and environmental drift. This engineering case study applies established PID, Lyapunov, and BIBO analysis to quality gates, positions the approach against semiconductor run-to-run control, and shows how MARIA OS adds fail-closed escalation, evidence bundles, and audit coordinates. The reported 94.7% defect containment, sub-200ms gate response, and 0.12x/stage attenuation are simulation results on a tuned linear model, not production fab measurements.

manufacturing, quality-gate, control-theory, stability-analysis, real-time, defect-rate, governance

Engineering | February 12, 2026 | 45 min read

Responsible Robot Judgment OS: Multi-Universe Gate Control for Physical-World Autonomous Decision Systems

Extending fail-closed responsibility gates from digital agents to physical-world robotic systems

Physical-world robots operate under hard real-time constraints where fail-closed gates must halt actuators within milliseconds. This paper introduces a multi-universe evaluation architecture for robotic decision systems across Safety, Regulatory, Efficiency, Ethics, and Human Comfort universes. We analyze how responsibility-bounded judgment can be maintained under latency constraints, sensor noise, and embodied ethical drift, and describe components including a Robot Gate Engine, real-time conflict heatmap, ethics-calibration model, responsibility protocol, and a layered architecture bridging MARIA OS with ROS2.

robotics, robot-judgment, physical-world, fail-closed, embodied-ethics, ROS2, MARIA-OS