Testing Your Contracts
This guide covers how to validate, dry-run, unit test, and regression test your Edictum contracts.
When to use this
Use this guide when you need to verify that contracts produce correct verdicts before deploying them. That covers gating contract changes in CI with edictum validate and edictum test, spot-checking a specific tool call with edictum check, running programmatic dry-run evaluations with guard.evaluate(), and regression testing against historical audit logs with edictum replay.
CLI Validation
Run edictum validate against a contract file (for example, edictum validate contracts/production.yaml) to catch schema, syntax, and semantic errors before deployment.
Validation checks include:
- YAML parse errors
- Missing required fields (apiVersion, kind, metadata.name, defaults.mode)
- Invalid regex patterns in matches/matches_any
- Duplicate contract IDs within a bundle
- Invalid effect for contract type (preconditions only allow deny; postconditions allow warn, redact, or deny)
- Use of output.text in a precondition
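To make two of these checks concrete, here is a minimal sketch of duplicate-ID and invalid-regex detection. It is not how edictum validate is implemented. It assumes a hypothetical bundle that has already been parsed into a dict with a `contracts` list, where each entry has an `id` and an optional `patterns` list; those key names are illustrative only.

```python
import re
from collections import Counter


def find_bundle_problems(bundle: dict) -> list[str]:
    """Return human-readable problems for a parsed bundle (hypothetical shape)."""
    problems = []
    contracts = bundle.get("contracts", [])

    # Check 1: the same contract ID appears more than once in the bundle.
    counts = Counter(c.get("id") for c in contracts)
    for contract_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate contract id: {contract_id}")

    # Check 2: a regex pattern does not compile.
    for c in contracts:
        for pattern in c.get("patterns", []):
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append(f"invalid regex in {c.get('id')}: {pattern!r} ({exc})")
    return problems


# Illustrative bundle: one duplicated ID and one unterminated character set.
bundle = {
    "contracts": [
        {"id": "block-secret-reads", "patterns": [r"\.env$"]},
        {"id": "block-secret-reads", "patterns": ["([unclosed"]},
    ]
}
problems = find_bundle_problems(bundle)
for problem in problems:
    print(problem)
```

The real validator works on the YAML bundle directly and covers the full list above, so treat this only as a picture of what "semantic error" means here.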
CLI Contract Check
Use edictum check to simulate a tool call against your contracts without executing anything:
$ edictum check contracts.yaml \
--tool read_file \
--args '{"path": ".env"}' \
--principal-role analyst
DENIED by contract block-secret-reads
Message: Analysts cannot read '.env'. Ask an admin for help.
Tags: secrets, dlp
Contracts evaluated: 1
Verify allowed calls:
$ edictum check contracts.yaml \
--tool read_file \
--args '{"path": "readme.txt"}' \
--principal-role analyst
ALLOWED
Contracts evaluated: 1
This is useful for quick spot-checks during development. For batch testing, use edictum test.
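If you script spot-checks from Python, quoting the JSON passed to --args by hand is error-prone. The sketch below assembles the argument list using only the flags shown in the examples above (--tool, --args, --principal-role); the helper name `build_check_command` is hypothetical.

```python
import json
import shlex


def build_check_command(bundle_path, tool, args, principal_role=None):
    """Assemble argv for `edictum check` using only the flags shown above."""
    argv = ["edictum", "check", bundle_path, "--tool", tool, "--args", json.dumps(args)]
    if principal_role is not None:
        argv += ["--principal-role", principal_role]
    return argv


argv = build_check_command("contracts.yaml", "read_file", {"path": ".env"}, "analyst")
# Print a shell-safe rendering of the command for logs or copy-paste.
print(shlex.join(argv))
```

The list can be handed to subprocess.run(argv, capture_output=True, text=True). How the CLI signals a denial through its exit code is not covered in this guide, so inspect the printed output rather than assuming a particular return code.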
Batch Testing With YAML Test Cases
Use edictum test to run a suite of test cases against your contracts. Define expected
outcomes in a YAML file and let the CLI verify them all at once:
# tests/contract-cases.yaml
cases:
  - id: block-env-file
    tool: read_file
    args:
      path: "/app/.env"
    principal:
      role: analyst
    expect: deny
    match_contract: block-sensitive-reads
  - id: allow-readme
    tool: read_file
    args:
      path: "README.md"
    principal:
      role: analyst
    expect: allow
  - id: deny-deploy-without-ticket
    tool: deploy_service
    args:
      service: api
      env: production
    principal:
      role: sre
    expect: deny
    match_contract: require-ticket
  - id: allow-deploy-with-ticket
    tool: deploy_service
    args:
      service: api
      env: production
    principal:
      role: sre
      ticket_ref: JIRA-456
    expect: allow
  - id: platform-team-access
    tool: deploy_service
    args:
      env: production
    principal:
      role: developer
      claims:
        department: platform
        clearance: high
    expect: allow
Run it:
$ edictum test contracts.yaml --cases tests/contract-cases.yaml
block-env-file: read_file {"path": "/app/.env"} -> DENIED (block-sensitive-reads)
allow-readme: read_file {"path": "README.md"} -> ALLOWED
deny-deploy-without-ticket: deploy_service {"service": "api", "env": "production"} -> DENIED (require-ticket)
allow-deploy-with-ticket: deploy_service {"service": "api", "env": "production"} -> ALLOWED
platform-team-access: deploy_service {"env": "production"} -> ALLOWED
5/5 passed, 0 failed
Key features:
- expect -- allow or deny. The test passes if the precondition verdict matches.
- match_contract -- optional. When set, verifies that the specific contract ID triggered the denial. Catches cases where the right verdict happens for the wrong reason.
- principal -- supports role, user_id, ticket_ref, and claims (arbitrary key-value pairs). Omit to test without principal context.
Preconditions only
--cases evaluates preconditions only. For postcondition testing, use --calls
(see below) or pytest with guard.evaluate().
This is the recommended approach for contract regression testing in CI. Keep your test
cases file alongside your contracts and run edictum test on every PR.
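As a cases file grows, small authoring mistakes creep in. The sketch below lints case entries after they have been loaded into Python dicts. The rules are house conventions assumed for illustration, not checks performed by edictum test: unique case IDs, an expect value of allow or deny, and match_contract used only with expect: deny (since it verifies which contract triggered a denial).

```python
def lint_cases(cases: list[dict]) -> list[str]:
    """Return a list of problems found in test-case dicts (assumed conventions)."""
    problems = []
    seen = set()
    for case in cases:
        case_id = case.get("id", "<missing id>")
        if case_id in seen:
            problems.append(f"{case_id}: duplicate case id")
        seen.add(case_id)
        if case.get("expect") not in ("allow", "deny"):
            problems.append(f"{case_id}: expect must be 'allow' or 'deny'")
        if case.get("expect") == "allow" and "match_contract" in case:
            problems.append(f"{case_id}: match_contract only makes sense with expect: deny")
    return problems


# Illustrative input: the second and third entries are deliberately wrong.
cases = [
    {"id": "block-env-file", "expect": "deny", "match_contract": "block-sensitive-reads"},
    {"id": "allow-readme", "expect": "allow", "match_contract": "block-sensitive-reads"},
    {"id": "allow-readme", "expect": "permit"},
]
problems = lint_cases(cases)
for problem in problems:
    print(problem)
```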
Evaluating Tool Calls With --calls
When you need to test postconditions or want a quick evaluation without defining expected verdicts, use --calls with a JSON file:
[
  {"tool": "read_file", "args": {"path": "README.md"}},
  {"tool": "read_file", "args": {"path": "/app/.env"}},
  {"tool": "read_file", "args": {"path": "data.txt"}, "output": "SSN: 123-45-6789"}
]
Run it:
$ edictum test contracts.yaml --calls tests/calls.json
# Tool Verdict Contracts Details
1 read_file ALLOW 1 all contracts passed
2 read_file DENY 1 Sensitive file '/app/.env' denied.
3 read_file WARN 1 PII detected.
Key differences from --cases:
- Postconditions supported -- include an output field to trigger postcondition evaluation.
- Exhaustive evaluation -- all matching contracts run, no short-circuit on first denial.
- No expected verdicts -- results report what happened, not pass/fail against expectations.
- JSON output -- add --json for machine-readable output in CI pipelines.
See the CLI reference for the full format.
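When the list of calls is derived from fixtures or recorded traffic, it can be easier to generate the JSON file than to maintain it by hand. This sketch writes a file in the shape shown above (tool, args, and an optional output); the helper name `make_call` and the temporary location are illustrative choices.

```python
import json
import tempfile
from pathlib import Path


def make_call(tool, args, output=None):
    """Build one entry; `output` is included only when postconditions should run."""
    call = {"tool": tool, "args": args}
    if output is not None:
        call["output"] = output
    return call


calls = [
    make_call("read_file", {"path": "README.md"}),
    make_call("read_file", {"path": "/app/.env"}),
    make_call("read_file", {"path": "data.txt"}, output="SSN: 123-45-6789"),
]

# Written to a temporary directory here; in a repository this would live
# next to the contracts, e.g. tests/calls.json.
calls_path = Path(tempfile.mkdtemp()) / "calls.json"
calls_path.write_text(json.dumps(calls, indent=2), encoding="utf-8")
print(calls_path)
```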
Unit Testing With pytest
For programmatic testing, use guard.evaluate() for dry-run checks or guard.run() to test with actual tool execution.
Dry-run with evaluate()
evaluate() checks a tool call against all matching contracts without executing the tool. It evaluates exhaustively (all matching contracts, no short-circuit) and returns an EvaluationResult:
from edictum import Edictum, Principal
guard = Edictum.from_yaml("contracts.yaml")
# Test a precondition denial
result = guard.evaluate("read_file", {"path": ".env"})
assert result.verdict == "deny"
assert "block-dotenv" in result.contracts[0].contract_id
# Test an allowed call
result = guard.evaluate("read_file", {"path": "readme.txt"})
assert result.verdict == "allow"
# Test a postcondition warning (pass output to trigger postconditions)
result = guard.evaluate("read_file", {"path": "data.txt"}, output="SSN: 123-45-6789")
assert result.verdict == "warn"
assert len(result.warn_reasons) > 0
# Test with principal context
result = guard.evaluate(
    "deploy_service",
    {"service": "api"},
    principal=Principal(role="sre", ticket_ref="JIRA-123"),
)
assert result.verdict == "allow"
evaluate() is sync and does not require asyncio. The EvaluationResult contains:
| Field | Type | Description |
|---|---|---|
| verdict | str | "allow", "deny", or "warn" |
| tool_name | str | The tool name evaluated |
| contracts | list[ContractResult] | Per-contract results with contract_id, passed, message, tags, observed, policy_error |
| deny_reasons | list[str] | Messages from failed preconditions |
| warn_reasons | list[str] | Messages from failed postconditions |
| contracts_evaluated | int | Total number of contracts checked |
| policy_error | bool | True if any contract had an evaluation error |
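These fields make table-driven tests straightforward. The sketch below compares expected verdicts with actual ones and includes the deny and warn reasons in each mismatch message. To stay runnable without a contract bundle it uses `fake_evaluate`, a stand-in that mimics only the verdict, deny_reasons, and warn_reasons fields; in a real test you would pass guard.evaluate instead.

```python
from types import SimpleNamespace


def find_mismatches(evaluate, expectations):
    """Compare expected verdicts with what `evaluate(tool, args)` returns."""
    mismatches = []
    for tool, args, expected in expectations:
        result = evaluate(tool, args)
        if result.verdict != expected:
            reasons = result.deny_reasons + result.warn_reasons
            detail = "; ".join(reasons) or "no reasons"
            mismatches.append(
                f"{tool} {args}: expected {expected}, got {result.verdict} ({detail})"
            )
    return mismatches


def fake_evaluate(tool, args):
    # Stand-in for guard.evaluate: denies any path ending in ".env".
    denied = args.get("path", "").endswith(".env")
    return SimpleNamespace(
        verdict="deny" if denied else "allow",
        deny_reasons=["Sensitive file denied."] if denied else [],
        warn_reasons=[],
    )


expectations = [
    ("read_file", {"path": ".env"}, "deny"),
    ("read_file", {"path": "readme.txt"}, "allow"),
    ("read_file", {"path": "notes.env"}, "allow"),  # deliberately wrong expectation
]
mismatches = find_mismatches(fake_evaluate, expectations)
for line in mismatches:
    print(line)
```

Collecting every mismatch before failing, rather than stopping at the first assert, shows the full impact of a contract change in a single test run.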
For batch evaluation, use evaluate_batch():
results = guard.evaluate_batch([
    {"tool": "read_file", "args": {"path": ".env"}},
    {"tool": "read_file", "args": {"path": "readme.txt"}},
])
assert results[0].verdict == "deny"
assert results[1].verdict == "allow"
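Verdict assertions alone can hide a contract that errored during evaluation. The sketch below tallies verdicts across a batch and collects the indexes of results whose policy_error flag is set. The SimpleNamespace objects are stand-ins for EvaluationResult that carry only the two fields used here; the helper name `summarize_batch` is hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_batch(results):
    """Return (verdict counts, indexes of results flagged with policy_error)."""
    verdicts = Counter(r.verdict for r in results)
    errored = [i for i, r in enumerate(results) if r.policy_error]
    return verdicts, errored


# Stand-in results; with a real guard these would come from guard.evaluate_batch().
results = [
    SimpleNamespace(verdict="deny", policy_error=False),
    SimpleNamespace(verdict="allow", policy_error=False),
    SimpleNamespace(verdict="allow", policy_error=True),
]
verdicts, errored = summarize_batch(results)
print(dict(verdicts), errored)
```

A test can then assert that the errored list is empty in addition to checking individual verdicts.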
Full execution with run()
Use guard.run() when you need to test the complete pipeline including tool execution, session tracking, and audit:
import asyncio

import pytest

from edictum import Edictum, EdictumDenied


@pytest.fixture
def guard():
    return Edictum.from_yaml("contracts.yaml")


def test_sensitive_read_denied(guard):
    async def read_file(path):
        return f"contents of {path}"

    with pytest.raises(EdictumDenied):
        asyncio.run(guard.run("read_file", {"path": ".env"}, read_file))


def test_normal_read_allowed(guard):
    async def read_file(path):
        return f"contents of {path}"

    result = asyncio.run(guard.run("read_file", {"path": "readme.txt"}, read_file))
    assert "contents" in result
Test patterns to cover:
- Denied calls -- assert that EdictumDenied is raised for calls that should be denied.
- Allowed calls -- assert that the tool result is returned for calls that should pass.
- Edge cases -- test boundary values, missing principal fields, wildcard tool targets.
- Session limits -- call guard.run() in a loop to verify session-level limits fire at the correct count.
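The session-limit pattern can be sketched as a loop that counts successful calls until the first denial. To keep the example self-contained, `FakeGuard` and `FakeDenied` are stand-ins invented for illustration; in a real test you would pass an Edictum guard and EdictumDenied, and the limit would come from your contracts.

```python
import asyncio


class FakeDenied(Exception):
    """Stand-in for edictum.EdictumDenied so the sketch runs on its own."""


class FakeGuard:
    """Stand-in guard that denies once `limit` calls have succeeded."""

    def __init__(self, limit):
        self.limit = limit
        self.calls = 0

    async def run(self, tool_name, args, tool_fn):
        if self.calls >= self.limit:
            raise FakeDenied(f"session limit of {self.limit} calls reached")
        self.calls += 1
        return await tool_fn(**args)


async def count_allowed_calls(guard, denied_exc, tool_name, args, tool_fn, max_calls=100):
    """Call guard.run() repeatedly; return how many calls succeeded before a denial."""
    allowed = 0
    for _ in range(max_calls):
        try:
            await guard.run(tool_name, args, tool_fn)
        except denied_exc:
            return allowed
        allowed += 1
    return allowed


async def read_file(path):
    return f"contents of {path}"


allowed = asyncio.run(
    count_allowed_calls(FakeGuard(limit=3), FakeDenied, "read_file", {"path": "readme.txt"}, read_file)
)
print(allowed)
```

The max_calls cap keeps the loop from running indefinitely if the limit never fires, which is itself a useful failure signal.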
When to use evaluate() vs run()
Use evaluate() for contract logic testing -- it's sync, fast, and doesn't need
mock tool functions. Use run() when you need to test the full pipeline including
session state, hooks, and audit.
Integration Testing With Observe Mode
Test contracts in a running system without denying real tool calls. Deploy with mode: observe and collect audit events:
from edictum import Edictum, Principal
from edictum.audit import FileAuditSink, RedactionPolicy
redaction = RedactionPolicy()
sink = FileAuditSink("test-audit.jsonl", redaction=redaction)
guard = Edictum.from_yaml("contracts.yaml", audit_sink=sink, redaction=redaction)
# defaults.mode should be "observe" in the YAML
After running your agent through a test scenario, inspect test-audit.jsonl for:
- CALL_WOULD_DENY events -- these are calls that would be denied in enforce mode.
- Absence of false positives -- legitimate calls should not produce would-deny events.
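Inspecting the log can be automated with a small script that counts events by type. This guide does not document the audit record schema, so the sketch assumes each line is a JSON object and that the event type lives under a key named `action`; that key name, and the CALL_ALLOWED value in the sample data, are assumptions to adjust against your actual log.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_events(audit_path, event_key="action"):
    """Count JSONL audit records by event type (event_key is an assumed field name)."""
    counts = Counter()
    with open(audit_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            counts[json.loads(line).get(event_key, "<unknown>")] += 1
    return counts


# Illustrative sample log; a real run would read test-audit.jsonl instead.
events = [
    {"action": "CALL_ALLOWED", "tool_name": "read_file"},
    {"action": "CALL_WOULD_DENY", "tool_name": "read_file"},
]
audit_path = Path(tempfile.mkdtemp()) / "test-audit.jsonl"
audit_path.write_text("\n".join(json.dumps(e) for e in events) + "\n", encoding="utf-8")

counts = count_events(audit_path)
print(counts["CALL_WOULD_DENY"])
```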
Regression Testing
Save audit logs from a known-good run and compare against updated contracts using edictum replay:
$ edictum replay contracts/v2.yaml --audit-log audit/baseline.jsonl
Replayed 340 events, 0 would change
If the replay shows changes, investigate before deploying:
$ edictum replay contracts/v2.yaml --audit-log audit/baseline.jsonl
Replayed 340 events, 2 would change
Changed verdicts:
read_file: call_allowed -> denied
Contract: block-config-reads
bash: call_allowed -> denied
Contract: block-destructive-commands
Incorporate replay into your CI pipeline to catch unintended contract regressions:
# GitHub Actions example
- name: Validate contracts
  run: edictum validate contracts/production.yaml
- name: Replay baseline audit log
  run: |
    edictum replay contracts/production.yaml \
      --audit-log tests/audit-baseline.jsonl
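If your pipeline needs an explicit gate on the number of changed verdicts, the summary line can be parsed from captured output. The sketch below relies only on the "Replayed N events, M would change" line shown in the examples above; whether that wording is stable across versions is an assumption, and the helper name is hypothetical.

```python
import re

SUMMARY_RE = re.compile(r"Replayed (\d+) events, (\d+) would change")


def parse_replay_summary(output: str) -> tuple[int, int]:
    """Return (events replayed, verdicts that would change) from replay output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no replay summary line found")
    return int(match.group(1)), int(match.group(2))


# Sample output copied from the example above.
sample = """Replayed 340 events, 2 would change
Changed verdicts:
read_file: call_allowed -> denied
Contract: block-config-reads
"""
replayed, changed = parse_replay_summary(sample)
print(replayed, changed)
```

A CI step could fail the build when changed is greater than zero, or compare it with an approved count when a change is intentional.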
Testing Checklist
- Validate -- edictum validate passes with zero errors.
- Dry-run -- edictum check produces expected deny/allow for key scenarios.
- Batch test (cases) -- edictum test --cases passes all YAML test cases with correct verdicts and contract matches.
- Batch test (calls) -- edictum test --calls evaluates representative tool calls including postconditions.
- Unit tests -- pytest tests with guard.evaluate() cover preconditions, postconditions, and edge cases. Use guard.run() for session limit tests.
- Observe mode -- deploy in observe mode and review CALL_WOULD_DENY events.
- Replay -- edictum replay against a baseline audit log shows no regressions.
- Enforce -- flip to mode: enforce after all checks pass.
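The automated parts of this checklist can be chained in a small script. The sketch below runs the CLI steps in order and stops at the first failure. It assumes each command exits non-zero on failure, which this guide implies by using them as CI gates but does not state outright. The file paths are the ones used in the examples above, and `fake_runner` exists only so the demo runs without the CLI installed.

```python
import subprocess
from types import SimpleNamespace

STEPS = [
    ["edictum", "validate", "contracts/production.yaml"],
    ["edictum", "test", "contracts/production.yaml", "--cases", "tests/contract-cases.yaml"],
    ["edictum", "replay", "contracts/production.yaml", "--audit-log", "tests/audit-baseline.jsonl"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order; return the first failing argv, or None."""
    for argv in steps:
        completed = runner(argv)
        if completed.returncode != 0:
            return argv
    return None


# Demo with a fake runner that pretends the `test` step fails.
seen = []


def fake_runner(argv):
    seen.append(argv)
    return SimpleNamespace(returncode=1 if argv[1] == "test" else 0)


failed = run_steps(STEPS, runner=fake_runner)
print(failed)
```

With the default runner, run_steps(STEPS) invokes the real CLI. The observe-mode review and the final switch to enforce mode remain manual steps.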