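Engineering · June 1, 2026 · 19 min read

Why AI Agents Fail in Business Operations: The Cause Is a Missing Harness, Not the LLM

Why agents that work in a PoC never reach production, examined through the design of goals, permissions, memory, stop conditions, recovery paths, and audit trails

Enterprise AI agents do not fail because of model performance alone. The core problem is that organizations ask AI to take actions without a harness that encloses its goals, permissions, memory, quality, stop conditions, recovery paths, and audit trails.

AI-agent, Dynamic-Harness, enterprise-AI, HITL, MARIA-OS, japanese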
Engineering · May 30, 2026 · 10 min read

Applications Maintained by Dynamic Harness-Driven Development

A general operating model for collecting runtime evidence, planning repairs, and keeping AI-assisted products stable

This application is maintained through dynamic harness-driven development. The method treats harness results as operational evidence, converts failures into bounded repair plans, and preserves learning without exposing internal implementation details.

dynamic-harness, harness-driven-development, software-maintenance, runtime-governance, quality-engineering
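Engineering · May 30, 2026 · 12 min read

Applications Maintained by Dynamic Harness-Driven Development

A general-purpose model for collecting runtime evidence, converting it into repair plans, and keeping AI-assisted products stable in operation

This application is maintained through dynamic harness-driven development, a method that treats harness results as operational evidence, converts failures into bounded repair plans, and retains what is learned without exposing internal implementation details.

dynamic-harness, harness-driven-development, software-maintenance, runtime-governance, quality-engineering, japanese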
Engineering · May 30, 2026 · 18 min read

Harness-Driven Development: Building Agentic Systems from Runtime Evidence Backward

A development method where scenarios, gates, scorecards, and repair boundaries are designed before implementation

Harness-driven development treats the dynamic harness as the primary specification. Instead of writing agent code first and testing it later, teams define runtime episodes, failure taxonomies, gates, and evidence contracts first, then let implementation converge toward measurable behavior.

dynamic-harness, harness-driven-development, agent-engineering, runtime-governance, evaluation-harness
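Engineering · May 30, 2026 · 24 min read

Harness-Driven Development: Building Agentic Systems Backward from Runtime Evidence

A development methodology that designs scenarios, gates, scorecards, and repair boundaries before implementation

In harness-driven development, the dynamic harness is the primary specification rather than a testing aid. Before writing prompts or tools, the team defines runtime episodes, a failure taxonomy, scorecards, and authority boundaries, and then lets the implementation converge toward measurable behavior.

dynamic-harness, harness-driven-development, agent-engineering, runtime-governance, evaluation-harness, japanese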
Engineering · May 30, 2026 · 22 min read

MARIA Self-Healing Runtime: Safe Autonomous Repair for Agentic Systems

A Self-Evolving Harness Runtime design for failure analysis, patch planning, scoped fixing, cross-cutting replay, memory-driven prevention, and human approval

MARIA Self-Healing Runtime is the safety-first repair layer inside MARIA OS. It observes failures, diagnoses root causes, plans bounded repairs, creates reviewable PRs, replays cross-cutting evidence, learns prevention patterns, and keeps human authority over high-risk changes.

self-evolving-harness, maria-self-healing-runtime, autonomous-harness-runtime, self-healing-ai-systems, autonomous-fixing-agents, runtime-governance, failure-analyzer, patch-planner, memory-store
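Engineering · May 30, 2026 · 28 min read

MARIA Self-Healing Runtime: A Foundation for Safe Autonomous Repair of Agentic Systems

Governing self-repair with the Failure Analyzer, Meta-Harness, Envelope, Memory Store, Human Approval Gate, and Loop Control

MARIA Self-Healing Runtime is the safety-first repair runtime inside MARIA OS. It detects failures, analyzes their causes, plans bounded repairs, creates reviewable PRs, re-verifies changes with cross-cutting harnesses, and records prevention patterns in memory, while returning final responsibility for high-risk changes to humans.

self-evolving-harness, maria-self-healing-runtime, autonomous-harness-runtime, self-healing-ai-systems, runtime-governance, failure-analyzer, memory-store, japanese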
Engineering · May 30, 2026 · 24 min read

Dynamic Workflow Agent Monitoring Harness: Mass-Producing Safe Operational Agents

Monitoring tools, quality and manufacturing-management harnesses, loop guards, and agent blueprints for scaling workflow agents inside MARIA OS

Dynamic Workflow Agents should not be mass-produced by cloning prompts. MARIA OS treats every operational agent as a monitored production unit with a blueprint, harness binding plan, quality observatory, settlement ledger, loop guard, and memory-backed improvement path.

dynamic-workflow-agent, maria-os, monitoring-harness, manufacturing-management, quality-engineering, agent-operations
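The summary lists a loop guard as part of every operational agent but does not describe its rules. As a minimal sketch, assuming a guard that halts a run when it exceeds a step budget or repeats the same action on the same input too often, it could look like this; `LoopGuard`, `max_steps`, `max_repeats`, and the halt reason strings are hypothetical names, not the MARIA OS API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LoopGuard:
    """Hypothetical loop guard: halts a run that exceeds its step budget
    or keeps repeating the same action on the same payload."""
    max_steps: int = 50
    max_repeats: int = 3
    steps: int = 0
    seen: Counter = field(default_factory=Counter)

    def check(self, action: str, payload: str) -> str:
        """Record one step and return "continue" or a halt reason."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "halt:step_budget_exceeded"
        key = (action, payload)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return "halt:repeated_action"
        return "continue"
```

With `LoopGuard(max_steps=10, max_repeats=2)`, three identical `check("fetch_invoice", "INV-1")` calls return `"continue"`, `"continue"`, `"halt:repeated_action"`. A halt is meant to hand the run to a human or a repair path rather than retry silently.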
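Engineering · May 30, 2026 · 28 min read

Dynamic Workflow Agent Monitoring Harness: How to Mass-Produce Safe Operational Agents

A MARIA OS design for mass-producing Dynamic Workflow Agents with monitoring tools, quality and manufacturing-management harnesses, a Loop Guard, and Agent Blueprints

Dynamic Workflow Agents must not be mass-produced by copying prompts. MARIA OS treats every operational agent as a production unit with a Blueprint, a Harness Binding Plan, a Quality Observatory, a Settlement Ledger, a Loop Guard, and a memory-backed improvement path.

dynamic-workflow-agent, maria-os, monitoring-harness, manufacturing-management, quality-engineering, agent-operations, japanese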
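Engineering · May 30, 2026 · 28 min read

Safety Lives in the Fan-In: Fail-Closed Parallel Multi-Harness Design

Five implementation disciplines for running multiple harnesses in parallel on an agent platform without weakening safety

On an agent platform, it is tempting to run identity, authority, trust, and surface-specific harnesses concurrently for a single action. In a fail-closed system, however, naive parallelization quietly weakens safety. This article lays out, at the implementation level, the design disciplines for a fan-in fold over a normalized sequence of envelopes, conversion of timeouts to the restrictive side, DAG dependencies, budgets, and snapshots.

parallel-harness, fail-closed, agent-governance, fan-in, runtime-safety, japanese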
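Two of the disciplines named above, the fan-in fold over normalized envelopes and converting timeouts to the restrictive side, can be sketched briefly. This assumes a three-level verdict (allow, escalate, deny) and a hypothetical envelope status vocabulary; the article's actual envelope schema is not shown in this summary.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verdict(IntEnum):
    # Ordered from least to most restrictive, so max() always keeps the stricter one.
    ALLOW = 0
    ESCALATE = 1
    DENY = 2


@dataclass(frozen=True)
class Envelope:
    harness: str  # e.g. "identity", "authority", "trust"
    status: str   # assumed vocabulary: "allow", "escalate", "deny", "timeout", "error"


_KNOWN = {"allow": Verdict.ALLOW, "escalate": Verdict.ESCALATE, "deny": Verdict.DENY}


def normalize(env: Envelope) -> Verdict:
    """Timeouts, errors, and unknown statuses are converted to the restrictive side."""
    return _KNOWN.get(env.status, Verdict.DENY)


def fan_in(envelopes: list[Envelope], required: set[str]) -> Verdict:
    """Fold parallel harness results into a single fail-closed verdict."""
    if not envelopes:
        return Verdict.DENY  # no evidence is never an implicit allow
    if not required <= {e.harness for e in envelopes}:
        return Verdict.DENY  # a required harness never reported back
    verdict = Verdict.ALLOW
    for env in envelopes:
        verdict = max(verdict, normalize(env))  # can only become stricter
    return verdict
```

For example, `fan_in([Envelope("identity", "allow"), Envelope("authority", "timeout")], {"identity", "authority"})` returns `Verdict.DENY`. Because `max` is commutative and associative, the result does not depend on the order in which parallel harnesses finish, which is what makes merging them safe.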
Engineering · March 8, 2026 · 40 min read

MARIA Voice: AGI Partner Architecture — From Emotion Detection to Meta-Cognitive Response Generation

How a 7-layer prompt hierarchy, 5 conversation modes, zero-latency knowledge injection, and sentence-level streaming create a voice AI that understands before it speaks

Voice assistants answer questions. MARIA Voice understands people. Built on a 7-layer prompt hierarchy (Constitution, Identity, Response Style, Meta-Cognition, Safety, Persona, Memory), MARIA Voice implements a full cognitive pipeline: keyword-based emotion detection, context-sensitive mode switching, 2-tier knowledge injection, 6-layer persistent memory, and mode-adaptive response generation — all optimized for real-time voice with sub-800ms first-sentence latency. This paper presents the theoretical foundations in cognitive science and therapeutic dialogue, the complete system architecture, the mathematical models underlying emotion and mode detection, and production results from thousands of voice sessions.

MARIA-Voice, AGI-assistant, voice-ui, emotion-detection, meta-cognition, prompt-engineering, conversation-mode, knowledge-injection, memory-system, streaming
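Engineering · March 8, 2026 · 40 min read

MARIA Voice AGI Partner Architecture: From Emotion Detection to Meta-Cognitive Response Generation

How a 7-layer prompt hierarchy, 5 conversation modes, zero-latency knowledge injection, and sentence-level streaming produce a voice AI that understands before it speaks

Voice assistants answer questions. MARIA Voice understands people. Built on a 7-layer prompt hierarchy (Constitution, Identity, Response Style, Meta-Cognition, Safety Gate, Persona, Memory), MARIA Voice implements a complete cognitive pipeline: keyword-based emotion detection, context-sensitive mode switching, 2-tier knowledge injection, 6-layer persistent memory, and mode-adaptive response generation, all optimized for real-time voice and achieving first-sentence latency under 800 ms. This paper reports the theoretical foundations in cognitive science and therapeutic dialogue, the complete system architecture, the mathematical models for emotion and mode detection, and operational results from thousands of voice sessions.

MARIA-Voice, AGI-assistant, voice-ui, emotion-detection, meta-cognition, prompt-engineering, conversation-mode, knowledge-injection, memory-system, streaming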
Engineering · March 8, 2026 · 30 min read

Agent Tool Compiler: From Natural Language Intent to Executable Tool Code via Compilation Pipeline

Agents as compilers — a formal framework mapping NL intent through intermediate representation to optimized, type-safe runtime tools

Tool-generating agents are ad-hoc code producers. We reframe tool synthesis as a compilation problem: natural language intent is parsed into an Intent AST, lowered to a Tool IR (intermediate representation), optimized through security hardening and dead code elimination passes, and emitted as type-safe executable code that hot-loads into the agent runtime. This paper presents the Agent Tool Compiler architecture with formal language theory foundations.

tool-compiler, code-generation, api-design, self-extending-agent, agentic-company
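Engineering · March 8, 2026 · 30 min read

Agent Tool Compiler: A Compilation Pipeline from Natural Language to API Design, Code Generation, and Execution

The agent as compiler: a formal framework that transforms NL intent, via an intermediate representation, into optimized, type-safe runtime tools

Tool-generating agents are ad-hoc code producers. This paper redefines tool synthesis as a compilation problem. Natural language intent is parsed into an Intent AST (an abstract syntax tree of intent), lowered to a Tool IR (intermediate representation), passed through optimization passes such as security hardening and dead code elimination, and hot-loaded into the agent runtime as type-safe executable code. We present the Agent Tool Compiler architecture, grounded in formal language theory.

tool-compiler, code-generation, api-design, self-extending-agent, agentic-company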
Engineering · March 8, 2026 · 30 min read

Agents That Write Their Own Tools: A 4-Phase Architecture for Tool Discovery, Synthesis, Validation, and Registration in Autonomous Systems

From static tool chains to self-extending capability — how MARIA OS agents create the tools they need at runtime

Normal agents wait for humans to build tools. MARIA OS agents create their own. This paper details the 4-phase tool lifecycle — Discovery, Synthesis, Validation, Registration — that enables agents to identify missing capabilities, generate tool implementations, verify correctness and safety in sandboxed environments, and hot-load new tools into the OS runtime. We formalize tool generation rate, quality convergence, and multi-agent tool sharing, and present a case study of an Audit agent creating an OCR extraction tool at runtime.

tool-synthesis, tool-discovery, tool-validation, self-extending-agent, agentic-company
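Engineering · March 8, 2026 · 30 min read

Agents That Write Their Own Tools: A 4-Phase Design for Tool Discovery, Synthesis, Validation, and Registration

From static tool chains to self-extending capability: how MARIA OS agents generate the tools they need at runtime

Ordinary agents wait for humans to build tools. MARIA OS agents build their own. This paper details the 4-phase architecture (Discovery, Synthesis, Validation, Registration) through which agents identify missing capabilities, generate tool implementations, verify correctness and safety in a sandbox, and hot-load new tools into the OS runtime. We formalize tool generation rate, quality convergence, and multi-agent tool sharing, and present a case study in which an audit agent generated an OCR extraction tool at runtime.

tool-synthesis, tool-discovery, tool-validation, self-extending-agent, agentic-company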
Engineering · March 8, 2026 · 30 min read

MARIA OS Evaluation Harness: A Standard Testing Infrastructure for Measuring Agent Quality

Formal test categories, composite scoring, and continuous evaluation pipelines that transform agent quality from subjective assessment into reproducible engineering measurement

Agent quality cannot be managed if it cannot be measured. Traditional software testing verifies deterministic input-output mappings, but AI agents operate in stochastic, multi-step decision spaces where correctness is contextual, safety is probabilistic, and governance compliance is structural. This paper introduces the MARIA OS Evaluation Harness — a standardized testing infrastructure that defines four test categories (correctness, safety, performance, governance compliance), four primary metrics (decision accuracy, gate compliance rate, evidence quality score, latency under load), and a formal composite scoring framework. We present the harness architecture comprising a test runner, scenario generator, oracle comparator, and regression detector, all scoped through MARIA coordinates for hierarchical test targeting. We prove that the composite agent score is monotonically responsive to genuine quality improvements and demonstrate that continuous evaluation pipelines catch 94.7% of quality regressions before production deployment.

evaluation-harness, agent-quality, testing, benchmarks, agentic-company
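The summary names four primary metrics and a composite score that is monotonically responsive to quality improvements, but does not give the formula. One simple construction with that property is a weighted sum with non-negative weights over per-metric scores that are each non-decreasing in quality. The sketch below assumes that construction; the weights and the latency budget are hypothetical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMetrics:
    decision_accuracy: float           # in [0, 1]
    gate_compliance_rate: float        # in [0, 1]
    evidence_quality: float            # in [0, 1]
    latency_under_load_ms: float       # lower is better


# Hypothetical weights: non-negative and summing to 1, so the score stays in [0, 1].
WEIGHTS = {"accuracy": 0.35, "gate": 0.30, "evidence": 0.20, "latency": 0.15}
LATENCY_BUDGET_MS = 2000.0  # hypothetical: at or above this, latency earns 0


def latency_score(ms: float) -> float:
    """Map latency to [0, 1]; non-increasing in ms, so faster never scores lower."""
    return max(0.0, 1.0 - ms / LATENCY_BUDGET_MS)


def composite_score(m: AgentMetrics) -> float:
    """Weighted sum of per-metric scores, each non-decreasing in quality."""
    return (WEIGHTS["accuracy"] * m.decision_accuracy
            + WEIGHTS["gate"] * m.gate_compliance_rate
            + WEIGHTS["evidence"] * m.evidence_quality
            + WEIGHTS["latency"] * latency_score(m.latency_under_load_ms))
```

Under this construction the score is only weakly monotone: once latency passes the budget, further latency changes have no effect. That is one reason a real harness would pair a composite score with per-metric regression checks.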
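Engineering · March 8, 2026 · 30 min read

MARIA OS Evaluation Harness: A Standard Testing Infrastructure for Measuring Agent Quality

Turning agent quality from subjective assessment into reproducible engineering measurement through formal test categories, composite scoring, and continuous evaluation pipelines

Agent quality cannot be managed if it cannot be measured. Traditional software testing verifies deterministic input-output mappings, but AI agents operate in stochastic, multi-step decision spaces where correctness depends on context, safety is probabilistic, and governance compliance is structural. This paper introduces the MARIA OS Evaluation Harness, a standardized testing infrastructure that defines four test categories (correctness, safety, performance, governance compliance), four primary metrics (decision accuracy, gate compliance rate, evidence quality score, latency under load), and a formal composite scoring framework. We present a harness architecture composed of a test runner, scenario generator, oracle comparator, and regression detector, with every component scoped through the MARIA coordinate system. We prove that the composite agent score responds monotonically to genuine quality improvements and show that continuous evaluation pipelines detect 94.7% of quality regressions before production deployment.

evaluation-harness, agent-quality, testing, benchmarks, agentic-company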
Engineering · February 22, 2026 · 48 min read

Robot Judgment OS Lab: Designing Responsibility-Bounded Physical-World AI with Multi-Universe Gates

An agentic R&D team architecture for robot governance research — two lab divisions, eleven specialized agents, and five research themes bridging MARIA OS Multi-Universe evaluation with physical-world robotic systems

Physical-world robots demand governance architectures that digital-only agent systems cannot provide: sub-millisecond fail-closed gates, real-time multi-universe conflict detection, embodied ethical learning under sensor noise, and quantitative human-robot responsibility allocation at every decision node. This paper presents the Robot Judgment OS Lab — an agentic R&D team design embedded within the MARIA OS coordinate system, organized into two divisions (Robot Gate Architecture Lab and Embodied Learning & Conflict Lab) with eleven specialized agents operating under fail-closed research gates. We formalize five research themes: Responsibility-Bounded Robot Decision, Physical-World Conflict Mapping, Embodied Ethical Learning, Human-Robot Responsibility Matrix, and ROS2 Multi-Universe Bridge. Mathematical contributions include a real-time ConflictScore function, constrained RL for embodied ethics calibration, a four-factor responsibility decomposition protocol, safety-bounded action spaces, and a layered architecture formalization from ROS2 base through Multi-Universe, Gate, and Conflict layers. The lab design demonstrates that structured R&D governance — where research teams are themselves governed by the infrastructure they study — produces faster, safer, and more auditable advances in robot judgment than traditional unstructured robotics research.

robotics, robot-os, physical-world, multi-universe, fail-closed, embodied-ethics, conflict-mapping, responsibility-matrix, MARIA-OS, ROS2
Engineering · February 16, 2026 · 30 min read

Real-Time Meeting Session Orchestration: State Machine Design for Multi-Component Bot Systems

How a seven-state machine coordinates browser automation, audio capture, speech recognition, and live streaming into a coherent meeting intelligence pipeline

A meeting AI bot is not a single component — it is an orchestra of subsystems that must start, coordinate, and stop in precise sequence. The browser must launch before audio can be captured. Audio must flow before speech recognition begins. Recognition must produce segments before minutes can be generated. And when the meeting ends, all components must shut down gracefully without losing data. This paper presents the state machine design of MARIA Meeting AI's session manager, which coordinates Playwright browser automation, CDP audio capture, Gemini Live Audio ASR, and incremental minutes generation through a seven-state lifecycle with EventEmitter-based real-time streaming to dashboard clients.

meeting-ai, state-machine, orchestration, event-driven, sse, real-time, playwright, session-management
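The ordering constraints in the summary (browser before audio, audio before recognition, graceful shutdown at the end) are what a transition table enforces. The seven states are not named here, so the state names below are hypothetical. Listener callbacks stand in for the EventEmitter-based streaming to dashboard clients.

```python
from enum import Enum, auto
from typing import Callable


class SessionState(Enum):
    # Hypothetical names for a seven-state lifecycle.
    IDLE = auto()
    LAUNCHING = auto()   # browser starting
    JOINING = auto()     # entering the meeting
    RECORDING = auto()   # audio capture and speech recognition running
    FINALIZING = auto()  # flushing segments and generating final minutes
    COMPLETED = auto()
    FAILED = auto()


S = SessionState
ALLOWED: dict[SessionState, set[SessionState]] = {
    S.IDLE: {S.LAUNCHING},
    S.LAUNCHING: {S.JOINING, S.FAILED},
    S.JOINING: {S.RECORDING, S.FAILED},
    S.RECORDING: {S.FINALIZING, S.FAILED},
    S.FINALIZING: {S.COMPLETED, S.FAILED},
    S.COMPLETED: set(),  # terminal
    S.FAILED: set(),     # terminal
}

Listener = Callable[[SessionState, SessionState], None]


class MeetingSession:
    def __init__(self) -> None:
        self.state = S.IDLE
        self._listeners: list[Listener] = []

    def on_change(self, listener: Listener) -> None:
        """Subscribe to transitions, similar in spirit to an EventEmitter listener."""
        self._listeners.append(listener)

    def transition(self, target: SessionState) -> None:
        """Move to target only if the table allows it; otherwise refuse loudly."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        previous, self.state = self.state, target
        for listener in self._listeners:
            listener(previous, target)  # e.g. push the change to dashboard clients
```

Routing every change through one `transition` method makes out-of-order starts (for example, recording before the browser exists) an immediate error rather than a silent race.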
Engineering · February 15, 2026 · 41 min read

The Complete Action Router: From Theory to Implementation to Scaling in MARIA OS

End-to-end architecture of the three-layer Action Router stack (Intent Parser, Action Resolver, Gate Controller), with recursive optimization and scaling patterns for 100+ agent deployments

The Action Router Intelligence Theory established that routing must control actions, not classify words. This paper presents the full implementation architecture: a three-layer stack of Intent Parser (context-aware goal extraction), Action Resolver (state-dependent action selection with precondition-effect semantics), and Gate Controller (risk-tiered execution envelopes integrated with MARIA OS governance). We detail a recursive optimization loop in which routing policies learn from execution outcomes, formalized as an online convex optimization problem with O(√T) regret. We then present a scaling architecture for 100+ concurrent agents using coordinate-based sharding, hierarchical action caches, and zone-local resolution. Integration with the MARIA OS Decision Pipeline state machine is formalized as a product automaton. Production benchmarks show sub-30ms P99 latency at 10,000 routing decisions per second, with first-attempt accuracy improving from 93.4% to 97.8% after 30 days of recursive learning.

action-router, scaling, implementation, MARIA-OS, multi-agent, state-machine, recursive-improvement
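The summary frames routing-policy learning as online convex optimization with O(√T) regret but does not name the algorithm. Online gradient descent with step size η_t = D/(G√t) is one standard method known to reach that bound for convex losses with bounded gradients over a bounded set. The sketch assumes linear per-round costs over routing weights on the probability simplex; the cost vector and `OnlineRouter` are hypothetical, not the paper's design.

```python
import math


def project_to_simplex(v: list[float]) -> list[float]:
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    theta = 0.0
    cumulative = 0.0
    for j, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        candidate = (cumulative - 1.0) / j
        if u - candidate > 0:
            theta = candidate  # the last j that satisfies this fixes the threshold
    return [max(x - theta, 0.0) for x in v]


class OnlineRouter:
    """Online gradient descent over routing weights for n candidate actions.

    Each round reveals a cost vector c_t in [0, 1]^n (hypothetical: e.g. the
    observed failure cost of each action); the loss is the linear function w . c_t.
    """

    def __init__(self, n_actions: int) -> None:
        self.weights = [1.0 / n_actions] * n_actions
        self.t = 0
        self.diameter = math.sqrt(2.0)                 # diameter D of the simplex
        self.grad_bound = math.sqrt(float(n_actions))  # bound G on ||c_t||_2 for costs in [0, 1]

    def update(self, costs: list[float]) -> list[float]:
        self.t += 1
        eta = self.diameter / (self.grad_bound * math.sqrt(self.t))
        # The gradient of the linear loss w . c_t is c_t itself.
        stepped = [w - eta * c for w, c in zip(self.weights, costs)]
        self.weights = project_to_simplex(stepped)
        return self.weights
```

The decaying step size is what trades early adaptation for late stability; a fixed step size would not give the O(√T) guarantee without knowing T in advance.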
Engineering · February 15, 2026 · 32 min read

Sentence-Level Streaming VUI Architecture: From Cognitive Theory to Production Implementation in MARIA OS

How sentence-boundary detection, sequential TTS chaining, and rolling conversation summaries create a natural-feeling voice interface with long-session stability

Voice user interfaces face a core tradeoff: stream tokens immediately for low latency, or wait for larger semantic units to improve naturalness. MARIA OS resolves this with sentence-level streaming: detect sentence boundaries from Gemini token streams in real time, queue each sentence for sequential ElevenLabs TTS playback, and coordinate full-duplex interaction through barge-in control, speech debouncing, and heartbeat-based recovery. This paper presents the cognitive basis for sentence-level granularity, the production `useGeminiLive` architecture, a 29-tool action router across 4 teams with confidence-weighted team inference, and the rolling-summary mechanism for long voice sessions. In 2,400+ production sessions, the system achieved sub-800ms first-sentence latency with zero sentence-ordering violations, including compatibility handling for 9 in-app browser environments.

voice-ui, streaming, TTS, speech-recognition, real-time, Gemini, ElevenLabs, action-router, MARIA-OS, cognitive-science
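The core of sentence-level streaming is a buffer that turns a token stream into complete sentences and hands them to TTS in strict FIFO order. The sketch below is a simplified version under stated assumptions: the boundary rules are hypothetical, and no Gemini or ElevenLabs API is called. `。！？!?` end a sentence immediately, while `.` ends one only when followed by whitespace, so `3.5` is not split.

```python
from collections import deque

TERMINATORS = set("。！？!?")


class SentenceChunker:
    """Accumulates streamed tokens and releases complete sentences in order."""

    def __init__(self) -> None:
        self.buffer = ""
        self.queue: deque[str] = deque()  # sentences waiting for TTS, FIFO

    def feed(self, token: str) -> None:
        self.buffer += token
        start = 0
        for i, ch in enumerate(self.buffer):
            # A period counts only once the next character is known to be whitespace.
            is_period_end = (ch == "." and i + 1 < len(self.buffer)
                             and self.buffer[i + 1].isspace())
            if ch in TERMINATORS or is_period_end:
                sentence = self.buffer[start:i + 1].strip()
                if sentence:
                    self.queue.append(sentence)
                start = i + 1
        self.buffer = self.buffer[start:]  # keep only the unfinished tail

    def flush(self) -> None:
        """Emit any trailing text when the model stream ends."""
        tail = self.buffer.strip()
        if tail:
            self.queue.append(tail)
        self.buffer = ""
```

Feeding `"Hello"`, `" there."`, `" The price is 3"`, `".5 dollars"`, `"! Next"` queues `"Hello there."` and `"The price is 3.5 dollars!"` and leaves `" Next"` buffered until `flush()`. A single consumer draining the deque one sentence at a time is what keeps playback order identical to generation order.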
Engineering · February 14, 2026 · 44 min read

Communication Topology and Information Cascading in Planet 100: Bottleneck Detection and Bandwidth Optimization in 100+ Agent Clusters

Spectral analysis of the 111-agent communication matrix identifies eigenvalue-based bottleneck signatures and routing strategies

We analyze Planet 100's communication network as a weighted directed graph over 111 agents. Using the eigenvalue spectrum of the normalized communication matrix, we identify bottleneck regions from spectral partitions, derive routing strategies with minimum-cost flow optimization, and show that spectral-guided bandwidth allocation reduces cascading failures by 84% while improving end-to-end throughput by 2.3x.

planet-100, communication-topology, information-cascading, bottleneck-detection, bandwidth-optimization, spectral-analysis, agent-clusters
Engineering · February 14, 2026 · 17 min read

Cognitive Load Balancing in Human-Agent Hybrid Teams: Scheduling Human Attention as a Limited Resource

A practical workload model for routing review to people who still have real attention left

Human oversight fails when review demand is treated as infinite capacity. This article presents a practical control model for supervisor load, priority routing, and rest-aware scheduling. The emphasis is operational: estimate available attention, protect high-priority reviews, and avoid the common failure mode where humans are technically in the loop but cognitively saturated.

team-design, cognitive-load, workload-distribution, human-agent-hybrid, attention-allocation, queueing-theory, fatigue-model, oversight-quality
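The three operational rules in the summary are to estimate available attention, protect high-priority reviews, and avoid saturation. They can be sketched as a greedy router. This assumes attention is expressed in hypothetical "attention units" per shift and that a fixed fraction of each reviewer's capacity (`reserve`) is held back for high-priority work. It is an illustration, not the article's model.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    name: str
    capacity: float   # hypothetical attention units available this shift
    used: float = 0.0

    @property
    def remaining(self) -> float:
        return self.capacity - self.used


@dataclass(frozen=True)
class Review:
    review_id: str
    cost: float       # estimated attention units this review needs
    high_priority: bool


def route(reviews: list[Review], reviewers: list[Reviewer],
          reserve: float = 0.2) -> tuple[dict[str, str], list[str]]:
    """Assign each review to the reviewer with the most attention left who can take it.

    Routine reviews may not push a reviewer below `reserve` of their capacity;
    that headroom is kept for high-priority work. Reviews that fit nowhere are
    deferred instead of being forced onto a saturated reviewer.
    Reviewer objects are updated in place.
    """
    assignments: dict[str, str] = {}
    deferred: list[str] = []
    # sorted() is stable: high-priority reviews go first, arrival order is otherwise kept.
    for review in sorted(reviews, key=lambda r: not r.high_priority):
        for candidate in sorted(reviewers, key=lambda rv: rv.remaining, reverse=True):
            floor = 0.0 if review.high_priority else reserve * candidate.capacity
            if candidate.remaining - review.cost >= floor:
                candidate.used += review.cost
                assignments[review.review_id] = candidate.name
                break
        else:
            deferred.append(review.review_id)
    return assignments, deferred
```

Deferring is the deliberate design choice here. A review queued for later is visible backlog, while a review assigned to someone with no attention left is the "technically in the loop but cognitively saturated" failure the article warns about.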
Engineering · February 14, 2026 · 18 min read

Fault-Tolerant Team Architectures: Reliability Patterns for Multi-Agent Systems Without Mathematical Overclaim

Use redundant role coverage, graceful degradation, and recovery drills instead of fragile point estimates

Multi-agent teams fail when a required role disappears and nobody can safely take over. This article reframes fault tolerance around role coverage, standby design, and recovery speed. Rather than overpromising precise MTTF values, it focuses on the operational question that matters: how many failures can the team absorb before a critical function becomes unstaffed?

team-design, fault-tolerance, resilience, reliability-engineering, redundancy, graceful-degradation, MTTF, single-point-of-failure
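The question the summary poses, how many failures the team can absorb before a critical function becomes unstaffed, has an exact worst-case answer under a coverage model. A critical role goes unstaffed only when every agent able to cover it has failed, so the least-covered role sets the limit. The sketch assumes any covering agent can take over immediately. It does not model capacity or recovery time, and `failure_tolerance` is a hypothetical helper.

```python
def failure_tolerance(coverage: dict[str, set[str]], critical_roles: set[str]) -> int:
    """Worst-case number of agent failures the team absorbs with every critical role staffed.

    `coverage` maps each agent to the roles it can safely take over.
    Result: min over critical roles of (number of coverers) - 1.
    Returns -1 if some critical role has no coverer today.
    """
    if not critical_roles:
        raise ValueError("at least one critical role is required")
    coverers = {role: 0 for role in critical_roles}
    for roles in coverage.values():
        for role in roles & critical_roles:
            coverers[role] += 1
    return min(coverers.values()) - 1
```

For a team where `triage` has three coverers but `audit` and `approve` have two each, the tolerance is 1: losing the two `audit` coverers is the worst case. Adding one more `audit` and `approve` standby raises it to 2, which is a more actionable signal than a point MTTF estimate.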
Engineering · February 14, 2026 · 38 min read

Productive Disagreement Protocol for Agent Teams: Structured Dissent for Higher-Quality Decisions

Operationalize evidence-backed dissent, validation diversity, and anti-groupthink interventions

Structured disagreement channels dissent into testable claims, improving decision quality without collapsing throughput.

agent-teams, disagreement-protocol, groupthink-prevention, meta-insight, decision-quality, organizational-learning, multi-agent-governance, validation-diversity, SEO-research
Engineering · February 12, 2026 · 36 min read

Engineering Case Study: Quality Gate Control Theory for Manufacturing AI

Applying established control theory, R2R-aware manufacturing practice, and MARIA OS audit gates to simulated semiconductor quality cascades

Manufacturing AI systems face a stability problem that traditional software governance often does not: defect rates evolve as continuous dynamical variables under material variation, tool wear, and environmental drift. This engineering case study applies established PID, Lyapunov, and BIBO analysis to quality gates, positions the approach against semiconductor run-to-run control, and shows how MARIA OS adds fail-closed escalation, evidence bundles, and audit coordinates. The reported 94.7% defect containment, sub-200ms gate response, and 0.12x/stage attenuation are simulation results on a tuned linear model, not production fab measurements.

manufacturing, quality-gate, control-theory, stability-analysis, real-time, defect-rate, governance
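To make "applying PID to quality gates" concrete, here is a textbook discrete PID controller acting on a measured defect rate, wrapped in a fail-closed check that escalates invalid or out-of-limit readings instead of letting the controller act on them. The gains, sampling unit, output interpretation (negative means tighten the process), and the `quality_gate` wrapper are hypothetical, and the sketch omits the anti-windup a production controller would need.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDGate:
    """Discrete positional PID on a measured defect rate (one sample per lot)."""
    kp: float
    ki: float
    kd: float
    setpoint: float                     # target defect rate, e.g. 0.01 for 1%
    dt: float = 1.0                     # sampling interval in lots (hypothetical unit)
    output_limits: tuple[float, float] = (-1.0, 1.0)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measured: float) -> float:
        """Return a clamped correction; negative means tighten, positive means relax."""
        error = self.setpoint - measured
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        lo, hi = self.output_limits
        return min(max(u, lo), hi)


def quality_gate(pid: PIDGate, measured: float, hard_limit: float) -> tuple[str, float]:
    """Fail-closed wrapper: invalid or out-of-limit readings escalate to a human."""
    if math.isnan(measured) or measured > hard_limit:
        return "escalate", 0.0  # the controller never acts on this reading
    return "pass", pid.update(measured)
```

Separating the hard limit from the controller mirrors the split the case study describes. Control theory handles continuous drift inside the stable region, and governance (escalation with evidence) takes over outside it.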
Engineering · February 12, 2026 · 45 min read

Responsible Robot Judgment OS: Multi-Universe Gate Control for Physical-World Autonomous Decision Systems

Extending fail-closed responsibility gates from digital agents to physical-world robotic systems

Physical-world robots operate under hard real-time constraints where fail-closed gates must halt actuators within milliseconds. This paper introduces a multi-universe evaluation architecture for robotic decision systems across Safety, Regulatory, Efficiency, Ethics, and Human Comfort universes. We analyze how responsibility-bounded judgment can be maintained under latency constraints, sensor noise, and embodied ethical drift, and describe components including a Robot Gate Engine, real-time conflict heatmap, ethics-calibration model, responsibility protocol, and a layered architecture bridging MARIA OS with ROS2.

robotics, robot-judgment, physical-world, fail-closed, embodied-ethics, ROS2, MARIA-OS