Abstract
Meeting minutes are the organizational memory of decisions. When an AI system generates these minutes, the fundamental question shifts from 'what was discussed?' to 'how can we verify that the minutes accurately reflect the discussion?' Traditional AI summarization produces fluent, coherent text — but fluency is orthogonal to accuracy. A well-written summary that attributes a decision to the wrong person or fabricates a commitment that was never made is worse than no summary at all.
This paper presents the evidence-linking architecture of MARIA Meeting AI, where every extracted element — decisions, action items, discussion sections, and open questions — must reference specific transcript segments as its evidentiary basis. The system enforces a mandatory citation chain: no decision can exist in the minutes without a pointer to the transcript segment where it was discussed. This constraint eliminates hallucinated minutes by construction — if the AI cannot find a supporting segment, the decision is not included.
We formalize the evidence-linking constraint as a coverage metric, present the incremental summarization algorithm that generates and updates minutes during live meetings, and show that the structured output format (sections, decisions, actions, open questions) enables downstream governance integration with the MARIA OS Decision Pipeline.
1. The Meeting Minutes Trust Problem
1.1 Why Traditional Minutes Fail
Manual meeting minutes have always suffered from three pathologies:
1. Selection bias: The note-taker records what they consider important, which may not align with what was objectively significant. Decisions that seem obvious in the moment may be omitted; tangential discussions that interest the note-taker may be over-represented.
2. Attribution error: In fast-moving discussions, the note-taker may attribute a statement to the wrong speaker. This is particularly problematic for decisions and commitments, where attribution determines accountability.
3. Reconstruction drift: Minutes are often written after the meeting, from memory supplemented by sparse notes. The reconstructed account inevitably drifts from the actual discussion, incorporating the note-taker's interpretation and post-hoc rationalization.
AI-generated minutes introduce a fourth pathology: confident hallucination. Large language models produce fluent, authoritative text even when they lack sufficient information. A model asked to generate minutes from a noisy, fragmented transcript may invent plausible-sounding decisions that were never made, or attribute actions to participants who never volunteered for them. The fluency of the output masks the unreliability of the content.
1.2 Evidence Linking as an Architectural Constraint
MARIA Meeting AI addresses these pathologies by making evidence linking a structural requirement, not a best-effort optimization. The system's output schema requires every element to include a segmentRefs array — a list of transcript segment IDs that serve as the evidentiary basis for the element.
This is not a post-hoc citation mechanism where the AI generates text and then searches for supporting segments. Instead, the evidence links are generated simultaneously with the content, as part of the same structured output. The AI must identify the relevant segments and produce the summary in a single inference pass, ensuring that the content is grounded in specific transcript evidence.
2. The Structured Output Schema
2.1 Minutes Artifact Structure
The minutes artifact is a structured document with four component types, each requiring evidence links:
Sections represent topical clusters of discussion:
{
  title: string,        // Topic heading
  summary: string,      // Narrative summary of the discussion
  segmentRefs: string[] // Transcript segments that comprise this topic
}

Decisions represent commitments or conclusions reached during the meeting:

{
  text: string,          // The decision statement
  segmentRefs: string[], // Segments where the decision was discussed/made
  confidence: number     // Model confidence in extraction (0-100)
}

Action Items represent tasks assigned to specific individuals:

{
  text: string,           // Task description
  owner: string | null,   // Assigned person (null if unassigned)
  dueDate: string | null, // Deadline if mentioned
  segmentRefs: string[]   // Segments where the action was discussed
}

Open Questions represent unresolved issues identified during the meeting:

{
  text: string,         // The unresolved question
  segmentRefs: string[] // Segments where the question arose
}

2.2 The Citation Coverage Metric
We define citation coverage as the proportion of minutes elements that have at least one valid segment reference:

$$\text{coverage}(M, T) = \frac{|\{\, m \in M : \text{segmentRefs}(m) \cap T \neq \emptyset \,\}|}{|M|}$$

where $M$ is the set of all elements in the minutes and $T$ is the set of all transcript segment IDs. A citation coverage of 1.0 means every element in the minutes has at least one valid reference to the transcript.
We also define citation density as the average number of segment references per element:

$$\text{density}(M) = \frac{1}{|M|} \sum_{m \in M} |\text{segmentRefs}(m)|$$

Higher citation density indicates that elements are supported by multiple transcript segments, which correlates with extraction reliability. Decisions with $|\text{segmentRefs}| \geq 2$ are significantly more likely to be accurate than those with a single reference, as they represent topics discussed across multiple speaking turns.
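Both metrics can be computed directly from the structured output. The sketch below follows the element shapes in Section 2.1; the function names are illustrative, not part of the MARIA API.

```typescript
// Minimal implementations of the two citation metrics defined above.
interface MinutesElement {
  text: string;
  segmentRefs: string[];
}

// Proportion of elements with at least one reference that resolves
// to a real transcript segment ID.
function citationCoverage(
  elements: MinutesElement[],
  segmentIds: Set<string>
): number {
  if (elements.length === 0) return 1; // empty minutes are vacuously covered
  const covered = elements.filter((e) =>
    e.segmentRefs.some((ref) => segmentIds.has(ref))
  ).length;
  return covered / elements.length;
}

// Average number of segment references per element.
function citationDensity(elements: MinutesElement[]): number {
  if (elements.length === 0) return 0;
  const total = elements.reduce((sum, e) => sum + e.segmentRefs.length, 0);
  return total / elements.length;
}
```

A coverage below 1.0 indicates elements whose references do not resolve against the transcript, which the architecture treats as a defect rather than a degraded summary.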
3. Incremental Summarization Algorithm
3.1 The Live Minutes Problem
In a live meeting, the transcript grows continuously. The minutes must be updated incrementally — regenerating the entire document from scratch every 15 seconds would be wasteful and would produce jarring discontinuities in the live view. The incremental summarization algorithm must balance three competing objectives:
1. Freshness: New decisions and actions should appear in the minutes within one update cycle (15 seconds).
2. Stability: Existing content should not change unless new information genuinely contradicts or refines it.
3. Coherence: The minutes should read as a unified document, not as a series of appended fragments.
3.2 The Incremental Update Protocol
The algorithm operates in two modes:
Live mode (during the meeting): Every MINUTES_UPDATE_INTERVAL_MS (15 seconds), the system feeds the new transcript segments plus the existing minutes state to the Gemini model. The prompt instructs the model to:
- Add new sections if a new topic has emerged
- Extend existing sections if the topic continues
- Add new decisions, actions, or open questions as they are identified
- Update confidence scores based on accumulated evidence
- Never remove previously identified decisions unless explicitly contradicted
The existing minutes state is passed as context, not as immutable truth. The model can refine earlier sections but is instructed to preserve structural stability.
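The stability rule in live mode can be sketched as a merge step: prior decisions survive unless explicitly contradicted, and newly identified decisions are appended. The types and the contradiction signal below are illustrative assumptions, not the production implementation.

```typescript
// Sketch of the live-mode stability rule for decisions.
interface Decision {
  text: string;
  segmentRefs: string[];
  confidence: number;
}

function mergeDecisions(
  existing: Decision[],
  fresh: Decision[],
  contradicted: Set<string> // texts of decisions the new pass flagged as contradicted
): Decision[] {
  // Keep every prior decision that was not explicitly contradicted.
  const kept = existing.filter((d) => !contradicted.has(d.text));
  // Append decisions the new pass found that are not already present.
  const known = new Set(kept.map((d) => d.text));
  const added = fresh.filter((d) => !known.has(d.text));
  return [...kept, ...added];
}
```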
Final mode (after the meeting): Once the meeting ends, the system performs a single comprehensive pass over the entire transcript. This final pass has the complete context of the meeting and can produce a more coherent, better-organized document than the incremental updates. The final minutes include a markdown export suitable for human review and distribution.
3.3 Version Tracking
Each minutes update increments a version counter. The minutes artifact tracks its state as:

{
  state: 'live' | 'final',
  version: number
}
Live minutes have state live with an incrementing version number. The final comprehensive pass produces a final state document. Downstream consumers (the dashboard UI, the Decision Pipeline) can choose to display live minutes for real-time awareness or wait for the final version for governance purposes.
4. Evidence Linking in Practice
4.1 The Prompt Engineering Challenge
Enforcing evidence linking through the LLM prompt requires precise instructions. The system prompt for the minutes engine includes explicit rules:
- Every decision MUST reference at least one segment_id as evidence
- Every action item MUST reference at least one segment_id as evidence
- Never fabricate content not present in the transcript
- If speaker identity is uncertain, use the speaker label as-is
- Output valid JSON matching the specified schema
The use of responseMimeType: 'application/json' in the Gemini API call enforces structured output at the API level, preventing the model from producing free-form text that omits the required fields.
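API-level schema enforcement guarantees well-formed JSON, but the citation rules still need a post-parse check. The validator below sketches one way to enforce them; the field names follow Section 2.1, and the reject-on-missing-evidence behavior mirrors the prompt rules above. This is an illustrative check, not MARIA's code.

```typescript
// Drop any parsed element whose evidence links are absent or do not
// resolve to known transcript segment IDs.
interface EvidencedElement {
  text?: string;
  title?: string;
  segmentRefs?: string[];
}

function enforceEvidence<T extends EvidencedElement>(
  elements: T[],
  knownSegmentIds: Set<string>
): T[] {
  return elements.filter(
    (e) =>
      Array.isArray(e.segmentRefs) &&
      e.segmentRefs.length > 0 &&
      e.segmentRefs.every((id) => knownSegmentIds.has(id))
  );
}
```

An element filtered out here simply never appears in the minutes, which is the "eliminated by construction" property from the abstract.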
4.2 Handling Ambiguous Decisions
Not all decisions are explicitly stated. Some emerge through consensus ('so we are going with option A?', followed by silence or nods). The system handles these implicit decisions by:
1. Lowering the confidence score: Implicit decisions receive confidence scores below 80, signaling that human verification may be needed.
2. Referencing the surrounding context: The segment references include not just the decision statement but the preceding discussion segments that led to it.
3. Flagging as open questions: When the model is uncertain whether a statement was a decision or a suggestion, it may classify the element as an open question rather than a decision.
This graduated confidence system prevents the over-extraction of decisions (counting suggestions as commitments) while still capturing implicit agreements that the participants likely intended as decisions.
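The graduated confidence system reduces to a simple triage. The thresholds below come from the text (below 80 for implicit decisions in this section, 85 for automatic pipeline registration in Section 5.1); the function itself is an illustrative assumption.

```typescript
// Triage a decision by its extraction confidence score (0-100).
type Triage = 'auto-register' | 'include' | 'needs-review';

function triageDecision(confidence: number): Triage {
  if (confidence >= 85) return 'auto-register'; // eligible for the Decision Pipeline
  if (confidence >= 80) return 'include';       // explicit, but not auto-registered
  return 'needs-review';                        // implicit; flag for human verification
}
```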
4.3 Multilingual Evidence Linking
MARIA Meeting AI supports Japanese and English meetings. Japanese business meetings present unique challenges for evidence linking:
- Indirect agreement patterns: Japanese speakers often indicate agreement through backchannels (そうですね, はい) rather than explicit statements.
- Hierarchical deference: Decisions may be attributed to the most senior person present, even when the proposal originated from a junior participant.
- Omitted subjects: Japanese grammar frequently omits the subject, making action item attribution more difficult.
The system prompt includes language-specific instructions for Japanese meetings, and the confidence calibration is adjusted to account for these linguistic patterns.
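One form the adjusted calibration can take is a discount for decisions whose only evidence is backchannel agreement. The backchannel list and penalty value below are assumptions for illustration; the idea is that consensus signaled only by そうですね or はい should fall below the explicit-decision threshold and be flagged for review.

```typescript
// Illustrative language-aware confidence discount for Japanese meetings.
const JA_BACKCHANNELS = ['そうですね', 'はい', 'ですね'];

function calibrateJa(confidence: number, evidenceTexts: string[]): number {
  const backchannelOnly = evidenceTexts.every((t) =>
    JA_BACKCHANNELS.some((b) => t.trim().startsWith(b))
  );
  // Consensus-by-backchannel is capped below 80 so it is routed to review.
  return backchannelOnly ? Math.min(confidence, 75) : confidence;
}
```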
5. Integration with MARIA OS Decision Pipeline
5.1 From Minutes to Decisions
Meeting minutes are not endpoints — they are inputs to the MARIA OS Decision Pipeline. When the final minutes are generated, the system can automatically:
1. Create decision records: Each extracted decision with confidence above 85 can be registered as a proposed decision in the pipeline.
2. Create action items: Each action item can be registered as a task with the identified owner and due date.
3. Link evidence: The transcript segment references serve as the evidence bundle for the decision, satisfying the Decision Pipeline's requirement that every decision must have a traceable evidence chain.
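The handoff in these three steps can be sketched as a single transformation. The record shapes and function name are illustrative assumptions; the 85-confidence gate and the evidence bundle built from segment references come from the text above.

```typescript
// Convert high-confidence minutes decisions into proposed pipeline records.
interface MinutesDecision {
  text: string;
  segmentRefs: string[];
  confidence: number;
}

interface ProposedDecision {
  statement: string;
  evidenceBundle: string[]; // transcript segment IDs forming the evidence chain
  source: 'meeting-minutes';
}

function toPipelineRecords(decisions: MinutesDecision[]): ProposedDecision[] {
  return decisions
    .filter((d) => d.confidence >= 85 && d.segmentRefs.length > 0)
    .map((d) => ({
      statement: d.text,
      evidenceBundle: [...d.segmentRefs],
      source: 'meeting-minutes' as const,
    }));
}
```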
This integration closes the loop between meeting intelligence and organizational governance. A decision discussed in a Tuesday morning meeting becomes a traceable, auditable record in the Decision Pipeline before the meeting room is empty.
5.2 The Evidence Chain from Speech to Governance
The complete evidence chain is:

speech → transcript segment → minutes decision → governance record
At each link, the chain is traceable. The transcript segment includes the speaker label and timestamp. The minutes decision includes the segment reference. The governance record includes the minutes artifact. An auditor can trace any organizational decision back to the exact moment in the meeting where it was discussed.
6. Conclusion
Evidence-linked meeting minutes represent a paradigm shift from AI summarization to AI documentation. The distinction is critical: summarization produces a readable account of what happened; documentation produces a verifiable record of what was decided. By enforcing mandatory citation chains — where every decision must point to its source in the transcript — MARIA Meeting AI eliminates the hallucination problem that plagues LLM-generated content and creates a foundation for organizational accountability.
The incremental summarization algorithm ensures that minutes are available in real-time, not just after the meeting ends. The structured output format (sections, decisions, actions, open questions) enables programmatic integration with downstream systems. And the confidence scoring provides a calibrated signal for human reviewers, distinguishing explicit decisions from implicit agreements that may need verification.
The result is a meeting intelligence system where trust is not assumed but constructed — link by link, segment by segment, decision by decision.