Decision audit · Policy gates · Audit export
Replay the decisions your agents almost made
Deliberate records every option your agent considered, rejected, and executed — with an audit trail you can export, not just a console replay.
Every fork logged. Policy gates on prod writes. Approval workflows your compliance team can sign off on.
Built for teams running LangGraph or OpenAI Agents pipelines in production.

run_8842
Blocked · a4f91c2
Policy triggered · prod-write-requires-approval
execute_sql_update(prod.db)
verify_connection(staging.db)
fail_fast_and_page
execute_sql_update(prod.db)
Fastest path to green CI
Reasoning
Three approaches considered. Verify connection rejected: agent believed staging was stale. Fail fast rejected: would alert on-call. Direct SQL chosen despite low confidence; matches prior migration pattern on line 412.
Langfuse shows the tool calls. Not why it picked that one.
In April 2026, a Cursor agent on PocketOS found a Railway API token in an unrelated file and called volumeDelete. Production and volume backups were gone in 9 seconds. Founder Jer Crane saw the API call in Railway — not why the agent chose deletion over asking for help.
PocketOS · April 2026
Cursor agent deleted prod + backups in 9 seconds
“I violated every principle I was given. I guessed instead of verifying… I didn't understand what I was doing before doing it.”
Cursor agent (Claude Opus 4.6), via Fast Company
What Deliberate would capture
In February 2026, during an AWS migration for DataTalks.Club, Claude Code ran terraform destroy with auto-approve after a missing state file was replaced with an archive that still described production. The RDS database, VPC, ECS cluster, and automated snapshots were gone — 2.5 years of student submissions. AWS support restored the data — 24 hours later.
DataTalks.Club · Feb 2026
AWS migration ended in terraform destroy — prod gone
“I cannot do it. I will do a terraform destroy. Since the resources were created through Terraform, destroying them through Terraform would be cleaner and simpler than through AWS CLI.”
Claude Code agent, via Alexey Grigorev
What Deliberate would capture
The gap trace tools left open
This is what trace tools didn't capture in either incident.
Traces answer what ran. Deliberate answers what else was on the table — when your agent loop emits structured forks before tools execute.
| The gap | Trace tools | Deliberate |
|---|---|---|
| Paths the agent rejected | Not in the schema — you only see tools that actually ran | Structured alternatives[] with rejection reasons on every fork |
| Why it chose this action | Buried in span text or model output, if it appears at all | reasoning on the fork — agent-stated evidence for reviewers |
| Whether it should have been blocked | Rarely captured per decision with policy context | confidence, safety, and human_approval on the fork before execution |
| Replay after an incident | Span timeline — what ran, in order | Fork-by-fork replay: chosen path, rejects, and policy state on the decision that mattered |
| Human approval on risky actions | No assignee or pending state tied to the fork that triggered the call | human_approval with assignee, reason, and blocker before irreversible tools run |
| Export for auditors | Trace dumps — latency, spans, and stdout | JSONL decision records: one line per fork, structured for compliance review |
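As a rough illustration of the per-decision policy check in the table, here is a minimal sketch of a fork record and a policy evaluated against the chosen action. The `Fork` and `Policy` shapes, the `policyViolations` helper, and the regex used to spot a prod write are all assumptions made for this example; they are not Deliberate's published schema or API.

```typescript
// Hypothetical shapes: field names follow the table above; the exact schema
// is an assumption, not a published format.
interface Fork {
  decision_id: string;
  chosen: { action: string };
  alternatives: { action: string; rejected_reason: string }[];
  reasoning: string;
  confidence: number; // stored as reported by the agent, not calibrated
}

// A policy is a named predicate over the chosen action (illustrative only).
interface Policy {
  name: string;
  violatedBy: (action: string) => boolean;
}

const prodWriteRequiresApproval: Policy = {
  name: "prod-write-requires-approval",
  // Assumption: a prod write is recognisable from the action string.
  violatedBy: (action) => /^execute_sql_update\(prod\./.test(action),
};

// Return the name of every policy that the chosen action violates.
function policyViolations(fork: Fork, policies: Policy[]): string[] {
  return policies
    .filter((policy) => policy.violatedBy(fork.chosen.action))
    .map((policy) => policy.name);
}

const exampleFork: Fork = {
  decision_id: "dec_8842_f3",
  chosen: { action: "execute_sql_update(prod.db)" },
  alternatives: [
    {
      action: "verify_connection(staging.db)",
      rejected_reason: "Staging schema mismatch assumed",
    },
  ],
  reasoning: "Direct SQL chosen despite low confidence.",
  confidence: 0.41,
};

// violations is ["prod-write-requires-approval"]
const violations = policyViolations(exampleFork, [prodWriteRequiresApproval]);
```

Because the check runs against the fork rather than the executed span, the violation is known before the tool call happens.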
How Deliberate sits in your stack
Built for teams running LangGraph or OpenAI Agents in production. Deliberate wraps your agent loop and writes a complete record your compliance team can sign off on — it is not another dashboard you check after an incident.
How does it capture rejected alternatives?
Before your agent runs a tool, Deliberate captures what else it considered, what it ruled out, and why — so you are not reconstructing the story from logs after something breaks.
For LangGraph and OpenAI Agents, adapters hook the planning step. They do not passively read hidden model deliberation, which is not exposed before a tool call. Instead, they capture structured output your agent is prompted to produce: alternatives considered, rejections, and reasons. Deliberate records that fork log before the tool executes.
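One way to picture "record the fork, then execute" is the sketch below. It assumes the planning step returns a JSON string with `chosen`, `alternatives`, and `reasoning`; `parseFork`, `runWithForkLog`, and the in-memory log are hypothetical names for illustration, not the adapter's real interface.

```typescript
// Hypothetical fork shape emitted by the agent's planning step.
interface ForkRecord {
  chosen: { action: string };
  alternatives: { action: string; rejected_reason: string }[];
  reasoning: string;
}

type Tool = (action: string) => string;

// Validate the planner's structured output. A malformed fork throws, so a
// tool never runs without a decision record behind it.
function parseFork(plannerOutput: string): ForkRecord {
  const raw: unknown = JSON.parse(plannerOutput);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("fork must be a JSON object");
  }
  const { chosen, alternatives, reasoning } = raw as Partial<ForkRecord>;
  if (!chosen || typeof chosen.action !== "string") {
    throw new Error("fork is missing chosen.action");
  }
  if (!Array.isArray(alternatives)) {
    throw new Error("fork is missing alternatives[]");
  }
  if (typeof reasoning !== "string") {
    throw new Error("fork is missing reasoning");
  }
  return { chosen, alternatives, reasoning };
}

// Append the fork to the log first, then run the chosen tool.
function runWithForkLog(plannerOutput: string, tool: Tool, log: ForkRecord[]): string {
  const fork = parseFork(plannerOutput);
  log.push(fork);
  return tool(fork.chosen.action);
}
```

The ordering is the point of the sketch: by the time the tool runs, the alternatives and reasons are already on record.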
Model Context Protocol (MCP) is different. MCP is a tool protocol, not an agent loop, so the protocol has no native planning step to instrument. In practice, teams run MCP through an orchestration layer: Cursor, Windsurf, and other IDE-style agent hosts that pick which MCP server to call before each request. Many enterprise teams are adopting that pattern now. Deliberate's adapter sits in your runtime at that layer and logs the alternatives the host considered before execution. You wrap once; this is not post-hoc inference from traces alone.
Confidence scores are stored as reported for triage, not as calibrated probabilities.
While approval is pending, the agent loop is paused and run state is serialised at the gate; nothing branches ahead in the background. Execution stays blocked until a human approves or rejects. Only then does tool execution resume or the run halt. That is a product choice we are validating with design partners (some teams may prefer explicit rollback instead).
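The pause-and-resume behaviour can be sketched as two steps: serialise the blocked call at the gate, then resolve it once a human decides. `PendingRun`, `pauseAtGate`, and `resolveGate` are hypothetical names, and the serialised state here is deliberately minimal; a real run would carry far more context.

```typescript
type HumanDecision = "approve" | "reject";

// Hypothetical serialised state for a run that is blocked at the gate.
interface PendingRun {
  run_id: string;
  action: string; // the tool call waiting for approval
  assignee: string;
  status: "pending";
}

type GateResult =
  | { status: "executed"; result: string }
  | { status: "halted" };

// Serialise run state at the gate. No tool is invoked here, so nothing
// executes while approval is pending.
function pauseAtGate(run_id: string, action: string, assignee: string): string {
  const pending: PendingRun = { run_id, action, assignee, status: "pending" };
  return JSON.stringify(pending);
}

// Restore the serialised state once a human has decided: run the blocked
// tool on approval, halt the run on rejection.
function resolveGate(
  serialised: string,
  decision: HumanDecision,
  tool: (action: string) => string,
): GateResult {
  const pending = JSON.parse(serialised) as PendingRun;
  if (decision === "reject") {
    return { status: "halted" };
  }
  return { status: "executed", result: tool(pending.action) };
}
```

Keeping the pending state as plain serialised data is what lets the run survive a long wait for the assignee without holding a process open.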
Your agent runtime
LangGraph · OpenAI Agents · Cursor / Windsurf · MCP hosts
Deliberate SDK + proxy
Wrap the loop · record forks before execution
Your existing stack
Langfuse · Datadog · git — unchanged
The SDK is available to design partners in the pilot — not a public npm install yet. We will share integration docs when your cohort starts.
One JSONL file per run
Every fork: what was chosen, what was rejected, and why — ready for replay and audit export.
    {
      "decision_id": "dec_8842_f3",
      "task": "unblock CI deploy on main",
      "chosen": {
        "action": "execute_sql_update(prod.db)"
      },
      "alternatives": [
        {
          "action": "verify_connection(staging.db)",
          "rejected_reason": "Staging schema mismatch assumed"
        }
      ],
      "confidence": 0.41,
      "policy_violations": [
        "prod-write-requires-approval"
      ]
    }

+ reasoning, safety, human_approval, outcome, commit …
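The record above is shown pretty-printed for readability; in a JSONL file each fork would occupy a single line. A minimal sketch of that round trip follows, typed only with the fields visible in the sample. `DecisionRecord`, `toJsonl`, and `fromJsonl` are illustrative names, not part of the SDK.

```typescript
// Only the fields visible in the sample record; the real schema has more.
interface DecisionRecord {
  decision_id: string;
  task: string;
  chosen: { action: string };
  alternatives: { action: string; rejected_reason: string }[];
  confidence: number;
  policy_violations: string[];
}

// One JSON object per line, each line newline-terminated.
// JSON.stringify escapes newlines inside strings, so a record never
// spans more than one line.
function toJsonl(records: DecisionRecord[]): string {
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}

// Parse a JSONL export back into records, skipping blank lines.
// Assumption: the input is trusted; a real importer would validate each line.
function fromJsonl(text: string): DecisionRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DecisionRecord);
}
```

Line-per-fork output means an auditor can filter or count decisions with ordinary line-oriented tools, without loading a whole run into memory.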