TAG ARCHIVE
actor-critic
Two MARIA OS blog articles tagged actor-critic, organized as a Bonginkan topic archive for search engines and LLM retrieval.
Judgment OS / Decision Intelligence OS
Core MARIA OS research on turning organizational judgment into executable decision systems.
Agentic Company Architecture
Research on human-agent organizations, delegation boundaries, role topology, and governed autonomy.
Responsibility Gates and AI Governance
Safety, accountability, fail-closed gates, auditability, and human-in-the-loop control for AI agents.
Multi-Agent Mathematics
Formal models for convergence, stability, game theory, graph dynamics, and multi-agent evaluation.
Evidence, RAG, and Knowledge Governance
Evidence bundles, retrieval architecture, Graph RAG, knowledge trust, and auditable reasoning pipelines.
Agentic R&D and Judgment Science
Research operations, simulation labs, judgment science, recursive improvement, and experimental AI governance.
The Algorithm Stack for Agentic Organizations: 10 Essential Algorithms Mapped to a 7-Layer Architecture
Beyond generative AI: a practical computational substrate for self-governing enterprises
An agentic company is not built on generative AI alone. We present 10 core algorithms across language, tabular prediction, state-transition control, graph structure, and anomaly detection, organized into a 7-layer architecture for enterprise governance workloads.
Actor-Critic Reinforcement Learning for Gated Autonomy: PPO-Based Policy Optimization Under Responsibility Constraints
How Proximal Policy Optimization enables medium-risk task automation while respecting human approval gates
Gated autonomy requires reinforcement learning that respects responsibility boundaries. This paper positions actor-critic methods, specifically PPO, as a core algorithm in the Control Layer, showing how the actor learns policies, the critic estimates state value, and responsibility gates constrain the action space dynamically. We derive a gate-constrained policy-gradient formulation, analyze PPO clipping behavior under trust-region constraints, and model human-in-the-loop approval as part of environment dynamics.
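The abstract above describes gates that constrain the action space and PPO's clipped update. The following is a minimal sketch of one common way to realize both ideas, not the article's own formulation: it assumes a gate can be represented as a boolean mask over a discrete action set (action masking), and it uses the standard PPO clipped surrogate. The function names `masked_policy` and `ppo_clip_objective`, the mask representation, and the clip range `eps=0.2` are all illustrative assumptions.

```python
import numpy as np


def masked_policy(logits, allowed):
    """Softmax over actor logits, with gate-blocked actions forced to probability 0.

    `allowed` is a hypothetical boolean mask standing in for a responsibility gate.
    """
    logits = np.asarray(logits, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)
    if not allowed.any():
        # Assumption: a fully closed gate means the decision goes to a human.
        raise ValueError("gate blocks every action; escalate to a human")
    masked = np.where(allowed, logits, -np.inf)
    masked = masked - masked[allowed].max()  # shift for numerical stability
    exp = np.exp(masked)                     # exp(-inf) == 0 for blocked actions
    return exp / exp.sum()


def ppo_clip_objective(new_prob, old_prob, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single (state, action) sample."""
    ratio = new_prob / old_prob
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return np.minimum(ratio * advantage, clipped * advantage)


# Three candidate actions; the gate blocks the second one.
probs = masked_policy([1.0, 2.0, 3.0], [True, False, True])
objective = ppo_clip_objective(new_prob=0.9, old_prob=0.5, advantage=1.0)
```

In this sketch the blocked action receives exactly zero probability, so the actor can never sample it, and the surrogate stops rewarding a probability ratio beyond `1 + eps` when the advantage is positive. The advantage would normally come from the critic's value estimates; here it is passed in as a plain number to keep the example short.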