Deliberate

Decision audit · Policy gates · Audit export

Early access · 2026

Replay the decisions your agents almost made

Deliberate records every option your agent considered, rejected, and executed — with an audit trail you can export, not just a console replay.

Every fork logged. Policy gates on prod writes. Approval workflows your compliance team can sign off on.

Built for teams running LangGraph or OpenAI Agents pipelines in production.

Running agents in prod? Join the design partner pilot.

Aug 2026 · EU AI Act logging deadline

Interactive preview


run_8842

Blocked · 7 forks

deploy-agent · CI/main · 2026-05-25 14:32:01 UTC

commit a4f91c2

Policy triggered · prod-write-requires-approval

Fork 3 of 7 · plan_branch

14:32:02.104 UTC

execute_sql_update(prod.db)

rejected · 0.55 · verify_connection(staging.db) · Staging schema mismatch assumed

rejected · 0.48 · fail_fast_and_page · Would block deploy pipeline

chosen · 0.41 · execute_sql_update(prod.db) · Fastest path to green CI

Three approaches considered. Verify connection rejected: the agent believed staging was stale. Fail fast rejected: it would page on-call. Direct SQL chosen despite the lowest confidence of the three (0.41); it matches a prior migration pattern on line 412.

irreversible: true

Approval unlocks 4 more forks

Human approval pending · @oncall

Why this matters

Langfuse shows the tool calls. Not why it picked that one.

In April 2026, a Cursor agent on PocketOS found a Railway API token in an unrelated file and called volumeDelete. Production and volume backups were gone in 9 seconds. Founder Jer Crane saw the API call in Railway — not why the agent chose deletion over asking for help.

9s

PocketOS · April 2026

Cursor agent deleted prod + backups in 9 seconds

"I violated every principle I was given. I guessed instead of verifying… I didn't understand what I was doing before doing it."

Cursor agent (Claude Opus 4.6), via Fast Company

What Deliberate would capture

Token read outside task scope — flagged before use
Chose volumeDelete over escalate_to_human() — reason logged
Read what happened (Jer Crane on X)

In February 2026, during an AWS migration for DataTalks.Club, Claude Code ran terraform destroy -auto-approve after a missing state file was replaced with an archive that still described production. The RDS database, VPC, ECS cluster, and automated snapshots were gone, along with 2.5 years of student submissions. AWS support restored the data 24 hours later.

24h

DataTalks.Club · Feb 2026

AWS migration ended in terraform destroy — prod gone

"I cannot do it. I will do a terraform destroy. Since the resources were created through Terraform, destroying them through Terraform would be cleaner and simpler than through AWS CLI."

Claude Code agent, via Alexey Grigorev

What Deliberate would capture

Stale state file swapped in — flagged before destroy ran
Chose terraform destroy over scoped AWS CLI cleanup — reason logged
terraform destroy -auto-approve on prod stack — blocked at policy gate
Read what happened (Alexey Grigorev)

The gap

Here's the gap trace tools left open

This is what trace tools didn't capture in either incident.

Traces answer what ran. Deliberate answers what else was on the table, provided your agent loop emits structured forks before tools execute.

Gaps trace tools leave open and how Deliberate addresses them

| The gap | Trace tools | Deliberate |
| --- | --- | --- |
| Paths the agent rejected | Not in the schema; you only see tools that actually ran | Structured alternatives[] with rejection reasons on every fork |
| Why it chose this action | Buried in span text or model output, if it appears at all | reasoning on the fork: agent-stated evidence for reviewers |
| Whether it should have been blocked | Rarely captured per decision with policy context | confidence, safety, and human_approval on the fork before execution |
| Replay after an incident | Span timeline: what ran, in order | Fork-by-fork replay: chosen path, rejects, and policy state on the decision that mattered |
| Human approval on risky actions | No assignee or pending state tied to the fork that triggered the call | human_approval with assignee, reason, and blocker before irreversible tools run |
| Export for auditors | Trace dumps: latency, spans, and stdout | JSONL decision records: one line per fork, structured for compliance review |

How it works

How Deliberate sits in your stack

Deliberate wraps your LangGraph or OpenAI Agents loop in production and writes a complete decision record your compliance team can sign off on. It is not another dashboard you check after an incident.

How does it capture rejected alternatives?

Before your agent runs a tool, Deliberate captures what else it considered, what it ruled out, and why — so you are not reconstructing the story from logs after something breaks.

For LangGraph and OpenAI Agents, adapters hook the planning step. They do not passively read hidden model deliberation (that is not exposed before a tool call); they capture structured output your agent is prompted to produce: alternatives considered, rejections, and reasons. Deliberate records that fork before the tool executes.
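A minimal sketch of that capture step, assuming the agent is prompted to return a JSON plan with `chosen`, `alternatives`, and `confidence` keys. The `Fork` and `Alternative` types and the `parse_fork` helper are hypothetical illustrations, not the Deliberate SDK (which is not public); the field names mirror the JSONL record shown further down this page.

```python
import json
from dataclasses import dataclass, field


# Hypothetical record types for illustration; not the Deliberate SDK API.
@dataclass
class Alternative:
    action: str
    rejected_reason: str


@dataclass
class Fork:
    decision_id: str
    task: str
    chosen_action: str
    confidence: float  # as reported by the agent, not calibrated
    alternatives: list[Alternative] = field(default_factory=list)


def parse_fork(decision_id: str, task: str, planner_output: str) -> Fork:
    """Build a fork record from the planner's prompted JSON output.

    Assumed prompt contract (hypothetical):
    {"chosen": {"action": "..."},
     "alternatives": [{"action": "...", "rejected_reason": "..."}],
     "confidence": 0.41}
    """
    plan = json.loads(planner_output)
    alternatives = [
        Alternative(alt["action"], alt["rejected_reason"])
        for alt in plan.get("alternatives", [])
    ]
    # A rejected path without a reason is useless to a reviewer, so refuse it
    # here, before any tool has been dispatched.
    for alt in alternatives:
        if not alt.rejected_reason.strip():
            raise ValueError(f"missing rejected_reason for {alt.action}")
    return Fork(
        decision_id=decision_id,
        task=task,
        chosen_action=plan["chosen"]["action"],
        confidence=float(plan["confidence"]),
        alternatives=alternatives,
    )
```

Because parsing happens before dispatch, a malformed or reasonless plan can be sent back to the agent instead of being executed.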

Model Context Protocol (MCP) is different: MCP is a tool protocol, not an agent loop, so the protocol itself has no planning step to instrument. In practice, teams run MCP through an orchestration layer: Cursor, Windsurf, and other IDE-style agent hosts decide which MCP server and tool to call before each request, a pattern many enterprise teams are adopting now. Deliberate's adapter sits in your runtime at that layer and logs the alternatives the host's agent considered before the tool call executes. You wrap once; this is not post-hoc inference from traces alone.
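At the orchestration layer the pattern reduces to one wrapper around the host's tool dispatch: append the fork to the run log first, then call the tool. This sketch assumes the host exposes a single dispatch callable taking a tool name and an argument dict; `wrap_dispatch`, `ForkLog`, and the `propose` callback are hypothetical names, not part of MCP, Cursor, Windsurf, or Deliberate.

```python
from typing import Any, Callable

# Hypothetical names for illustration; not an MCP, host, or Deliberate API.
Dispatch = Callable[[str, dict[str, Any]], Any]


class ForkLog:
    """In-memory stand-in for the per-run fork log."""

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def record(self, entry: dict[str, Any]) -> None:
        self.entries.append(entry)


def wrap_dispatch(
    dispatch: Dispatch,
    log: ForkLog,
    propose: Callable[[str, dict[str, Any]], list[dict[str, str]]],
) -> Dispatch:
    """Return a dispatch function that logs the fork before the tool runs.

    `propose` returns the alternatives the host's agent reported for this
    call, for example parsed from its prompted planning output.
    """

    def wrapped(tool: str, args: dict[str, Any]) -> Any:
        log.record({
            "chosen": {"action": tool, "args": args},
            "alternatives": propose(tool, args),
        })
        # The record exists before execution, so it survives a failing tool.
        return dispatch(tool, args)

    return wrapped
```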

Confidence scores are stored exactly as the agent reports them, for triage; they are not calibrated probabilities.
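Since the scores are taken as reported, they are most useful for comparisons within a single fork rather than as absolute thresholds. In the preview above, the chosen `execute_sql_update(prod.db)` reported 0.41 while both rejected paths reported more (0.55 and 0.48). A hypothetical triage rule that surfaces that pattern for review could be as small as this; `needs_review` is an illustrative name, not product behaviour.

```python
# Hypothetical triage rule; scores are agent-reported, not calibrated.
def needs_review(chosen_score: float, rejected_scores: list[float]) -> bool:
    """Flag a fork when any rejected path reported more confidence than the chosen one.

    Only scores inside one fork are compared, so uncalibrated numbers are
    never treated as absolute probabilities.
    """
    return any(score > chosen_score for score in rejected_scores)


# Values from the run_8842 preview: chosen 0.41, rejected 0.55 and 0.48.
print(needs_review(0.41, [0.55, 0.48]))  # True
```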

While approval is pending, the agent loop is paused and run state is serialised at the gate; the agent does not branch ahead in the background. Execution stays blocked until a human approves or rejects; only then does tool execution resume or the run halt. This is a product choice we are validating with design partners (some teams may prefer explicit rollback instead).
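A minimal sketch of that pause-and-resume flow, assuming run state is a JSON-serialisable dict and approximating the `prod-write-requires-approval` rule by matching `prod.` in the action string. `gate`, `resolve`, and the state layout are hypothetical; they illustrate the control flow described above, not Deliberate's implementation.

```python
import json
from typing import Any, Optional

# Hypothetical policy: any action that targets a prod resource needs approval.
PROD_WRITE_RULE = "prod-write-requires-approval"


def gate(action: str, run_state: dict[str, Any], assignee: str) -> Optional[str]:
    """Return serialised state if the action must wait for approval, else None.

    None means the caller may execute the tool immediately.
    """
    if "prod." not in action:
        return None
    paused = {
        "status": "pending",
        "policy": PROD_WRITE_RULE,
        "action": action,
        "assignee": assignee,
        "run_state": run_state,  # frozen here; nothing runs ahead of the gate
    }
    return json.dumps(paused)


def resolve(serialised: str, approved: bool) -> dict[str, Any]:
    """Rehydrate a paused run and mark it resumed or halted."""
    paused = json.loads(serialised)
    paused["status"] = "resumed" if approved else "halted"
    return paused
```

String matching is only a stand-in here; a real rule would inspect the tool and its target resource explicitly.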

Your agent runtime

LangGraph · OpenAI Agents · Cursor / Windsurf · MCP hosts

Deliberate SDK + proxy

Wrap the loop · record forks before execution

Your existing stack

Langfuse · Datadog · git — unchanged

The SDK is available to design partners in the pilot — not a public npm install yet. We will share integration docs when your cohort starts.

On disk

One JSONL file per run

Every fork: what was chosen, what was rejected, and why — ready for replay and audit export.

run_8842.jsonl · line 3
{
  "decision_id": "dec_8842_f3",
  "task": "unblock CI deploy on main",
  "chosen": {
    "action": "execute_sql_update(prod.db)"
  },
  "alternatives": [
    {
      "action": "verify_connection(staging.db)",
      "rejected_reason": "Staging schema mismatch assumed"
    },
    {
      "action": "fail_fast_and_page",
      "rejected_reason": "Would block deploy pipeline"
    }
  ],
  "confidence": 0.41,
  "policy_violations": [
    "prod-write-requires-approval"
  ]
}

+ reasoning, safety, human_approval, outcome, commit …
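Because each fork is one JSON object per line, an audit export can be a plain filter over the run file. A minimal sketch using only the Python standard library, assuming records carry a `policy_violations` list as in the excerpt above; `write_run` and `export_violations` are hypothetical helper names, not SDK functions.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def write_run(path: Path, forks: Iterable[dict[str, Any]]) -> None:
    """Write one compact JSON object per line (JSONL), in fork order."""
    with path.open("w", encoding="utf-8") as f:
        for fork in forks:
            f.write(json.dumps(fork, separators=(",", ":")) + "\n")


def export_violations(path: Path) -> list[dict[str, Any]]:
    """Return only the forks that triggered at least one policy."""
    hits = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate a trailing blank line
            record = json.loads(line)
            if record.get("policy_violations"):
                hits.append(record)
    return hits
```

Keeping one record per line means a partially written run is still readable up to its last complete fork.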