Dry-Run Evaluation¶
You need to test whether a tool call would be allowed or denied without actually executing it. The evaluate() and evaluate_batch() methods on the Edictum class check a tool call against all matching contracts and return a detailed result -- no tool execution, no session state changes, no audit events.
When to use this¶
Use evaluate() when you want to know how contracts would rule on a tool call without running the tool. It is synchronous, produces no audit events, and evaluates every matching contract rather than stopping at the first denial. Typical uses are CI/CD gating, contract change impact analysis, and approval workflows. For the full pipeline with session state and audit events, use run(). For command-line spot-checks, use edictum check or edictum test. See the comparison table below.
Quick Example¶
```python
from edictum import Edictum

guard = Edictum.from_yaml("contracts.yaml")
result = guard.evaluate("read_file", {"path": ".env"})

print(result.verdict)       # "deny"
print(result.deny_reasons)  # ["Sensitive file '.env' denied."]
```
evaluate()¶
```python
def evaluate(
    self,
    tool_name: str,
    args: dict[str, Any],
    *,
    principal: Principal | None = None,
    output: str | None = None,
    environment: str | None = None,
) -> EvaluationResult
```
Evaluates a single tool call against all matching contracts. This method is synchronous -- no await required.
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `tool_name` | `str` | required | The tool being called |
| `args` | `dict[str, Any]` | required | Tool call arguments |
| `principal` | `Principal \| None` | `None` | Identity context for the call |
| `output` | `str \| None` | `None` | Simulated tool output. When provided, postconditions are evaluated against this value |
| `environment` | `str \| None` | `None` | Override the guard's default environment |
Behavior¶
- Exhaustive evaluation. All matching contracts are evaluated. The pipeline does not short-circuit on the first denial -- you see every contract that would fire.
- No tool execution. The tool function is never called.
- No session state. Session contracts are skipped because there is no session context in a dry-run.
- Sandbox contracts are evaluated. Unlike session contracts, sandbox contracts are stateless and are always included in dry-run evaluation.
- Postconditions require output. Postconditions are only evaluated when `output` is provided. Without it, only preconditions and sandbox contracts are checked.
- Synchronous. Unlike `guard.run()`, this method does not require `asyncio`.
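The difference between exhaustive evaluation and stopping at the first denial can be sketched in plain Python. This is an illustration of the idea only, not Edictum's implementation: `Check`, `evaluate_exhaustive`, `evaluate_short_circuit`, and the `block-hidden-files` contract ID are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Check:
    """Hypothetical stand-in for a contract: an ID plus a pass/fail predicate."""
    contract_id: str
    passes: Callable[[dict[str, Any]], bool]


def evaluate_exhaustive(checks: list[Check], args: dict[str, Any]) -> list[str]:
    """Run every check and return the IDs of all that fail."""
    return [c.contract_id for c in checks if not c.passes(args)]


def evaluate_short_circuit(checks: list[Check], args: dict[str, Any]) -> list[str]:
    """Stop at the first failing check and return only its ID."""
    for c in checks:
        if not c.passes(args):
            return [c.contract_id]
    return []


checks = [
    Check("block-dotenv", lambda a: not a["path"].endswith(".env")),
    Check("block-hidden-files", lambda a: not a["path"].startswith(".")),
]
args = {"path": ".env"}

# Both checks fail for ".env"; only the exhaustive pass reports both.
exhaustive_failures = evaluate_exhaustive(checks, args)
short_circuit_failures = evaluate_short_circuit(checks, args)
```

The exhaustive pass is what makes a dry-run useful for impact analysis: a single call reveals every contract that objects, not just the first one in evaluation order.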
Examples¶
Test a precondition:
```python
result = guard.evaluate("read_file", {"path": ".env"})
assert result.verdict == "deny"
assert result.contracts[0].contract_id == "block-dotenv"
```
Test with principal context:
```python
from edictum import Principal

result = guard.evaluate(
    "deploy_service",
    {"service": "api", "env": "production"},
    principal=Principal(role="sre", ticket_ref="JIRA-456"),
)
assert result.verdict == "allow"
```
Test postconditions by providing output:
```python
result = guard.evaluate(
    "read_file",
    {"path": "data.txt"},
    output="SSN: 123-45-6789",
)
assert result.verdict == "warn"
assert len(result.warn_reasons) > 0
```
Test with environment override:
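As a minimal sketch of an environment override, the hypothetical helper below calls evaluate() once per environment using the documented `environment` keyword and collects the verdicts. `verdicts_by_environment` and `FakeGuard` are illustrative names, not part of Edictum; the fake guard exists only so the snippet runs without a contracts file, and its deny-in-production rule is an assumption made for the example.

```python
from types import SimpleNamespace
from typing import Any


def verdicts_by_environment(
    guard: Any,
    tool_name: str,
    args: dict[str, Any],
    environments: list[str],
) -> dict[str, str]:
    """Evaluate the same call under each environment; map environment -> verdict."""
    return {
        env: guard.evaluate(tool_name, args, environment=env).verdict
        for env in environments
    }


class FakeGuard:
    """Illustrative stand-in: denies deploys in production, allows them elsewhere."""

    def evaluate(self, tool_name, args, *, principal=None, output=None, environment=None):
        denied = tool_name == "deploy_service" and environment == "production"
        return SimpleNamespace(verdict="deny" if denied else "allow")


verdicts = verdicts_by_environment(
    FakeGuard(), "deploy_service", {"service": "api"}, ["staging", "production"]
)
```

With a real guard, pass the object returned by `Edictum.from_yaml("contracts.yaml")` in place of `FakeGuard()`.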
Test sandbox path allowlists:
```python
# Sandbox contracts are evaluated during dry-run
result = guard.evaluate("read_file", {"path": "/etc/shadow"})
assert result.verdict == "deny"

# Sandbox contracts appear in results
sandbox_results = [c for c in result.contracts if c.contract_type == "sandbox"]
assert len(sandbox_results) == 1
assert sandbox_results[0].passed is False
```
evaluate_batch()¶
Evaluates multiple tool calls. Each call is evaluated independently via evaluate(). This method is synchronous.
Call Format¶
Each dict in the calls list accepts these keys:
| Key | Type | Required | Description |
|---|---|---|---|
| `tool` | `str` | yes | Tool name |
| `args` | `dict` | no | Tool arguments (defaults to `{}`) |
| `principal` | `dict` | no | Principal as a dict with keys: `role`, `user_id`, `ticket_ref`, `claims` |
| `output` | `str \| dict` | no | Simulated output. Dicts are JSON-serialized automatically |
| `environment` | `str` | no | Environment override |
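The defaults in the table can be sketched as a small normalization step that turns one batch entry into keyword arguments for evaluate(). This is a hedged illustration, not the library's code: `normalize_call` is a hypothetical helper, and it leaves `principal` as a plain dict rather than building a `Principal`.

```python
import json
from typing import Any


def normalize_call(call: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults to one batch entry (illustrative only)."""
    if "tool" not in call:
        raise ValueError("each call needs a 'tool' key")
    output = call.get("output")
    if isinstance(output, dict):
        # Dict outputs are JSON-serialized so postconditions see a string.
        output = json.dumps(output)
    return {
        "tool_name": call["tool"],
        "args": call.get("args", {}),        # missing args default to {}
        "principal": call.get("principal"),  # left as a dict in this sketch
        "output": output,
        "environment": call.get("environment"),
    }


minimal = normalize_call({"tool": "read_file"})
with_output = normalize_call({"tool": "read_file", "output": {"ssn": "123-45-6789"}})
```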
Example¶
```python
results = guard.evaluate_batch([
    {"tool": "read_file", "args": {"path": ".env"}},
    {"tool": "read_file", "args": {"path": "readme.txt"}},
    {"tool": "read_file", "args": {"path": "data.txt"}, "output": "SSN: 123-45-6789"},
    {
        "tool": "deploy_service",
        "args": {"service": "api"},
        "principal": {"role": "sre", "ticket_ref": "JIRA-123"},
    },
])

assert results[0].verdict == "deny"
assert results[1].verdict == "allow"
assert results[2].verdict == "warn"
assert results[3].verdict == "allow"
```
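For CI/CD gating, batch results can be reduced to a process exit code. The sketch below assumes the batch contains only calls that are expected to be allowed, and it treats a policy error as a failure; both are design assumptions for the example. `ci_exit_code` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that carry only the documented `tool_name`, `verdict`, `deny_reasons`, and `policy_error` fields.

```python
from types import SimpleNamespace


def ci_exit_code(results) -> int:
    """Return 1 if any result is a denial or hit a policy error, else 0."""
    failed = [r for r in results if r.verdict == "deny" or r.policy_error]
    for r in failed:
        # Report each failing call so the CI log shows why the gate closed.
        print(f"{r.tool_name}: {r.verdict} {r.deny_reasons}")
    return 1 if failed else 0


sample = [
    SimpleNamespace(tool_name="read_file", verdict="deny", policy_error=False,
                    deny_reasons=["Sensitive file '.env' denied."]),
    SimpleNamespace(tool_name="read_file", verdict="allow", policy_error=False,
                    deny_reasons=[]),
]
exit_code = ci_exit_code(sample)
```

In a pipeline script, the list returned by `guard.evaluate_batch(...)` would take the place of `sample`, and the script would end with `sys.exit(exit_code)`.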
EvaluationResult¶
Returned by evaluate(); evaluate_batch() returns one per call. Contains the overall verdict and per-contract details.
| Field | Type | Description |
|---|---|---|
| `verdict` | `str` | `"allow"`, `"deny"`, or `"warn"` |
| `tool_name` | `str` | The tool name that was evaluated |
| `contracts` | `list[ContractResult]` | Per-contract results |
| `deny_reasons` | `list[str]` | Messages from failed preconditions |
| `warn_reasons` | `list[str]` | Messages from failed postconditions |
| `contracts_evaluated` | `int` | Total number of contracts checked |
| `policy_error` | `bool` | `True` if any contract raised an exception during evaluation |
The verdict is determined by:
- `"deny"` -- at least one precondition or sandbox contract failed (and was not in observe mode)
- `"warn"` -- no precondition or sandbox failures, but at least one postcondition failed
- `"allow"` -- all contracts passed
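These rules can be written out as a short function. This is a sketch of the stated rules rather than Edictum's actual code: `Result` is a hypothetical stand-in with only the `ContractResult` fields the rules mention, and `overall_verdict` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in carrying the ContractResult fields used below."""
    contract_type: str  # "precondition", "postcondition", or "sandbox"
    passed: bool
    observed: bool = False


def overall_verdict(results: list[Result]) -> str:
    """Apply the deny > warn > allow precedence described above."""
    blocking = {"precondition", "sandbox"}
    if any(
        r.contract_type in blocking and not r.passed and not r.observed
        for r in results
    ):
        return "deny"
    if any(r.contract_type == "postcondition" and not r.passed for r in results):
        return "warn"
    return "allow"


denied = overall_verdict([Result("sandbox", False), Result("postcondition", False)])
warned = overall_verdict([Result("precondition", True), Result("postcondition", False)])
observed_only = overall_verdict([Result("precondition", False, observed=True)])
```

A failure in observe mode does not contribute to `"deny"`, which is why `observed_only` falls through to `"allow"`.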
ContractResult¶
One entry per evaluated contract. Found in EvaluationResult.contracts.
| Field | Type | Description |
|---|---|---|
| `contract_id` | `str` | The contract's ID (from YAML `id:` or function `__name__`) |
| `contract_type` | `str` | `"precondition"`, `"postcondition"`, or `"sandbox"` |
| `passed` | `bool` | Whether the contract passed |
| `message` | `str \| None` | The contract's message (from `then.message` in YAML) |
| `tags` | `list[str]` | Tags attached to the contract |
| `observed` | `bool` | `True` if the contract is in observe mode and would have fired |
| `effect` | `str` | Postcondition effect: `"warn"`, `"redact"`, or `"deny"` |
| `policy_error` | `bool` | `True` if the contract raised an exception |
evaluate() vs run() vs CLI¶
| | `evaluate()` | `run()` | `edictum check` / `edictum test` |
|---|---|---|---|
| Executes the tool | No | Yes | No |
| Session tracking | No | Yes | No |
| Audit events | No | Yes | No |
| Async required | No | Yes | N/A |
| Preconditions | Yes | Yes | Yes |
| Sandbox contracts | Yes | Yes | Yes |
| Postconditions | Only with `output` | Always | `--calls` only |
| Short-circuits | No (exhaustive) | Yes (first deny) | No |
Use evaluate() for fast, synchronous contract logic testing. Use run() when you need the full pipeline including session state, hooks, and audit. Use the CLI for quick spot-checks and CI pipelines.
Next Steps¶
- Testing contracts -- YAML test cases, CI integration, and testing patterns
- CLI reference -- `edictum check` and `edictum test` commands
- Contracts -- the four contract types