TAG ARCHIVE

actor-critic

2 MARIA OS blog articles tagged actor-critic, organized as a Bonginkan topic archive for search engines and LLM retrieval.

2 articles | Published by Bonginkan

Judgment OS / Decision Intelligence OS

Core MARIA OS research on turning organizational judgment into executable decision systems.

Agentic Company Architecture

Research on human-agent organizations, delegation boundaries, role topology, and governed autonomy.

Responsibility Gates and AI Governance

Safety, accountability, fail-closed gates, auditability, and human-in-the-loop control for AI agents.

Multi-Agent Mathematics

Formal models for convergence, stability, game theory, graph dynamics, and multi-agent evaluation.

Evidence, RAG, and Knowledge Governance

Evidence bundles, retrieval architecture, Graph RAG, knowledge trust, and auditable reasoning pipelines.

Agentic R&D and Judgment Science

Research operations, simulation labs, judgment science, recursive improvement, and experimental AI governance.

Architecture | February 14, 2026 | 35 min read

The Algorithm Stack for Agentic Organizations: 10 Essential Algorithms Mapped to a 7-Layer Architecture

Beyond generative AI: a practical computational substrate for self-governing enterprises

An agentic company is not built on generative AI alone. We present 10 core algorithms across language, tabular prediction, state-transition control, graph structure, and anomaly detection, organized into a 7-layer architecture for enterprise governance workloads.

algorithm-stack, transformer, gradient-boosting, random-forest, MDP, actor-critic, multi-armed-bandit, GNN, PCA, clustering

Mathematics | February 14, 2026 | 35 min read

Actor-Critic Reinforcement Learning for Gated Autonomy: PPO-Based Policy Optimization Under Responsibility Constraints

How Proximal Policy Optimization enables medium-risk task automation while respecting human approval gates

Gated autonomy requires reinforcement learning that respects responsibility boundaries. This paper positions actor-critic methods — specifically PPO — as a core algorithm in the Control Layer, showing how the actor learns policies, the critic estimates state value, and responsibility gates constrain the action space dynamically. We derive a gate-constrained policy-gradient formulation, analyze PPO clipping behavior under trust-region constraints, and model human-in-the-loop approval as part of environment dynamics.

actor-critic, PPO, reinforcement-learning, gated-autonomy, policy-gradient, human-approval, risk-management, agentic-company, control-theory, MARIA OS