Data Protection Patterns
Data protection contracts prevent sensitive information from leaking through agent tool calls. They cover two sides: denying access to sensitive files (preconditions) and scanning tool output for sensitive patterns (postconditions).
PII Detection in Tool Output
Scan tool output for personally identifiable information using regex patterns. This is a postcondition because it inspects the result after the tool has run.
When to use: Your agent calls tools that return data from databases, APIs, or files that may contain personal data. You want an audit trail of PII exposure and a warning to the agent to redact before proceeding.
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: pii-detection
defaults:
  mode: enforce
contracts:
  - id: pii-in-output
    type: post
    tool: "*"
    when:
      output.text:
        matches_any:
          - '\b\d{3}-\d{2}-\d{4}\b'
          - '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
          - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
          - '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    then:
      effect: warn
      message: "PII pattern detected in output. Redact before using in summaries or responses."
      tags: [pii, compliance]
import re

from edictum import Verdict
from edictum.contracts import postcondition


@postcondition("*")
def detect_pii_in_output(envelope, tool_response):
    # Only string output can be scanned with regexes.
    if not isinstance(tool_response, str):
        return Verdict.pass_()

    pii_patterns = {
        "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
        "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        "credit_card": r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b",
        "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    }
    # Collect the name of every PII type with at least one match.
    found = [name for name, pat in pii_patterns.items() if re.search(pat, tool_response)]
    if found:
        return Verdict.fail(
            f"Tool output contains potential PII: {', '.join(found)}. "
            "Do NOT include this data in summaries or outputs. "
            "Redact before processing further.",
            pii_types=found,
        )
    return Verdict.pass_()
The patterns above detect:
| Pattern | Regex | Example Match |
|---|---|---|
| US SSN | \b\d{3}-\d{2}-\d{4}\b | 123-45-6789 |
| Email address | \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b | user@example.com |
| Credit card | \b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b | 4111-1111-1111-1111 |
| Phone number | \b\d{3}[-.]?\d{3}[-.]?\d{4}\b | 555-867-5309 |
Gotchas:
- With effect: warn, postconditions detect but do not modify the output. Use on_postcondition_warn callbacks or switch to effect: redact for automatic pattern replacement on READ/PURE tools.
- Regex-based PII detection is a baseline. Production deployments should use ML-based PII scanners (Presidio, Phileas, etc.) behind the same postcondition contract interface.
- matches_any short-circuits on the first match. Order patterns from most common to least common for performance.
- The phone number regex also matches non-phone strings, such as dotted version-like numbers (e.g., 123.456.7890) and any bare 10-digit number. Tune patterns based on your data.
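The false-positive gotcha is easy to check with Python's standard re module before the patterns go into a contract. The sketch below is a standalone illustration, not part of the Edictum API; PII_PATTERNS and pii_types_in are hypothetical names, and the three patterns follow the table above.

```python
import re

# Patterns as listed in the table above; the dictionary name is illustrative.
PII_PATTERNS = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "credit_card": r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b",
    "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
}


def pii_types_in(text: str) -> list[str]:
    """Return the names of all patterns that match somewhere in text."""
    return [name for name, pat in PII_PATTERNS.items() if re.search(pat, text)]


samples = [
    "Call 555-867-5309 after 5pm",     # genuine phone number
    "Upgraded to build 123.456.7890",  # version-like string
    "Order id 1234567890 shipped",     # bare 10-digit identifier
    "SSN on file: 123-45-6789",        # genuine SSN format
]

for sample in samples:
    print(f"{sample!r} -> {pii_types_in(sample)}")
```

The second and third samples are reported as phone numbers even though they are not, which is the kind of noise to look for when tuning patterns against real tool output.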
Tip: For automatic redaction, change effect: warn to effect: redact. The pipeline uses the same matches_any patterns from the when clause to replace matched text with [REDACTED]. This works for READ/PURE tools; WRITE/IRREVERSIBLE tools fall back to warn.
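To get a feel for what pattern-based redaction does to a piece of output, the replacement step can be approximated with re.sub. This is a minimal sketch of the general technique using only the standard library; it is not Edictum's implementation, and the redact function, its parameters, and the PATTERNS list are hypothetical names.

```python
import re

# Two of the patterns from the contract above, used here for illustration.
PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",                               # US SSN
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",  # email address
]


def redact(text: str, patterns: list[str] = PATTERNS, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with the placeholder."""
    for pat in patterns:
        text = re.sub(pat, placeholder, text)
    return text


print(redact("Contact jane@example.com, SSN 123-45-6789."))
```

Running candidate patterns through a helper like this on sample output shows how much surrounding text survives redaction, and whether false positives would remove data the agent still needs.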
Secret Scanning in Output
Detect credentials, tokens, and private keys in tool output. Even if a precondition allowed the read, the output may contain secrets that should not enter the conversation.
When to use: As defense in depth when your agent reads files, calls APIs, or queries databases. Even when the call itself was allowed, secrets can surface in logs, configs, or error messages.
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: secret-scanning
defaults:
  mode: enforce
contracts:
  - id: secrets-in-output
    type: post
    tool: "*"
    when:
      output.text:
        matches_any:
          - 'AKIA[0-9A-Z]{16}'
          - 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+'
          - '-----BEGIN (RSA |EC )?PRIVATE KEY-----'
    then:
      effect: warn
      message: "Secret detected in output. Do not reference, log, or output this value."
      tags: [secrets, dlp]
      metadata:
        severity: critical
import re

from edictum import Verdict
from edictum.contracts import postcondition


@postcondition("*")
def detect_secrets_in_output(envelope, tool_response):
    # Only string output can be scanned with regexes.
    if not isinstance(tool_response, str):
        return Verdict.pass_()

    secret_patterns = {
        "AWS Access Key": r"AKIA[0-9A-Z]{16}",
        "JWT Token": r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+",
        "Private Key": r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",
    }
    # Collect the name of every secret type with at least one match.
    found = [name for name, pat in secret_patterns.items() if re.search(pat, tool_response)]
    if found:
        return Verdict.fail(
            f"Tool output contains secrets: {', '.join(found)}. "
            "Do NOT reference, log, or output these values.",
            secret_types=found,
        )
    return Verdict.pass_()
The patterns above detect:
| Pattern | Regex | Example Match |
|---|---|---|
| AWS Access Key | AKIA[0-9A-Z]{16} | AKIAIOSFODNN7EXAMPLE |
| JWT Token | eyJ... (three dot-separated base64 segments) | eyJhbGciOiJ... |
| Private Key | PEM header format | -----BEGIN RSA PRIVATE KEY----- |
Gotchas:
- The AWS key pattern only matches access key IDs (starting with AKIA). It does not detect secret access keys, which are harder to distinguish from random strings. Add a separate pattern for aws_secret_access_key\s*[:=]\s*\S+ if needed.
- JWT patterns match the structure but do not validate the token. Expired or invalid JWTs still trigger the warning, which is the desired behavior.
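Both gotchas can be checked with the standard re module. The sketch below is a standalone illustration rather than Edictum code: SECRET_PATTERNS, secret_types_in, and b64url are hypothetical names. The AWS values are the placeholder credentials AWS uses in its own documentation, and the JWT is built on the spot with a junk signature.

```python
import base64
import re

SECRET_PATTERNS = {
    "AWS Access Key": r"AKIA[0-9A-Z]{16}",
    # Keyed on the setting name, because the value alone looks like random text.
    "AWS Secret Key": r"aws_secret_access_key\s*[:=]\s*\S+",
    "JWT Token": r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+",
}


def secret_types_in(text: str) -> list[str]:
    """Return the names of all patterns that match somewhere in text."""
    return [name for name, pat in SECRET_PATTERNS.items() if re.search(pat, text)]


def b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# AWS documentation placeholders, not real credentials.
aws_config = (
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
)

# JWT-shaped, but the signature is junk and would fail validation.
fake_jwt = ".".join(
    [b64url(b'{"alg":"HS256"}'), b64url(b'{"sub":"1"}'), "not-a-real-signature"]
)

print(secret_types_in(aws_config))
print(secret_types_in(f"Authorization: Bearer {fake_jwt}"))
print(secret_types_in("AWS_SECRET_ACCESS_KEY=abc123"))
```

The last call returns an empty list because the suggested pattern is case-sensitive. If the upper-case environment-variable form appears in your tool output, a case-insensitive variant of that one pattern is worth considering.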
Sensitive File Blocking
Block reads of files that commonly contain secrets, credentials, or private keys. This is a precondition -- it runs before the tool executes, so no data is exposed.
When to use: Your agent has access to read_file and you want to prevent it from reading files that could expose secrets, even accidentally.
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: sensitive-file-denial
defaults:
  mode: enforce
contracts:
  - id: block-secret-files
    type: pre
    tool: read_file
    when:
      args.path:
        contains_any:
          - ".env"
          - ".secret"
          - "credentials"
          - ".pem"
          - "id_rsa"
          - ".key"
          - "kubeconfig"
    then:
      effect: deny
      message: "Reading sensitive file '{args.path}' is denied. Skip and continue with non-sensitive files."
      tags: [secrets, dlp]
  - id: block-config-with-secrets
    type: pre
    tool: read_file
    when:
      any:
        - args.path: { ends_with: ".tfvars" }
        - args.path: { ends_with: ".npmrc" }
        - args.path: { ends_with: ".pypirc" }
        - args.path: { ends_with: ".netrc" }
    then:
      effect: deny
      message: "Config file '{args.path}' may contain credentials. Access denied."
      tags: [secrets, dlp]
from edictum import Verdict, precondition


@precondition("read_file")
def block_secret_files(envelope):
    path = envelope.args.get("path", "")
    sensitive = [".env", ".secret", "credentials", ".pem", "id_rsa", ".key", "kubeconfig"]
    # Substring check: any occurrence anywhere in the path denies the read.
    for s in sensitive:
        if s in path:
            return Verdict.fail(
                f"Reading sensitive file '{path}' is denied. "
                "Skip and continue with non-sensitive files."
            )
    return Verdict.pass_()


@precondition("read_file")
def block_config_with_secrets(envelope):
    path = envelope.args.get("path", "")
    secret_exts = [".tfvars", ".npmrc", ".pypirc", ".netrc"]
    # Suffix check: only the end of the path is compared.
    for ext in secret_exts:
        if path.endswith(ext):
            return Verdict.fail(
                f"Config file '{path}' may contain credentials. Access denied."
            )
    return Verdict.pass_()
Gotchas:
- contains_any is a substring match. A path like /reports/prod.environment.log would match on .env, and /docs/credentials-policy.md would match on credentials. Use ends_with or matches with word boundaries for more precise matching.
- This pattern only protects read_file. If your agent has a bash tool, it could read the same files with cat. Add corresponding contracts for all file-reading tools.
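The substring gotcha can be explored outside the contract engine by comparing a plain substring check with a check on the final path component. The sketch below uses only pathlib from the standard library; naive_match, precise_match, and the two name sets are hypothetical and only one possible way to tighten the rule.

```python
from pathlib import PurePosixPath

NEEDLES = [".env", ".secret", "credentials", ".pem", "id_rsa", ".key", "kubeconfig"]

# Hypothetical tighter rule: exact file names plus file extensions.
SENSITIVE_NAMES = {".env", "id_rsa", "kubeconfig", "credentials"}
SENSITIVE_SUFFIXES = {".pem", ".key", ".secret"}


def naive_match(path: str) -> bool:
    """Substring check over the whole path, similar in spirit to contains_any."""
    return any(needle in path for needle in NEEDLES)


def precise_match(path: str) -> bool:
    """Check only the final path component: exact name, .env.* variant, or extension."""
    name = PurePosixPath(path).name
    return (
        name in SENSITIVE_NAMES
        or name.startswith(".env.")  # e.g. .env.local, .env.production
        or PurePosixPath(name).suffix in SENSITIVE_SUFFIXES
    )


paths = [
    "/app/.env",
    "/app/.env.local",
    "/home/u/.ssh/id_rsa",
    "/srv/certs/server.pem",
    "/reports/prod.environment.log",  # harmless, but contains ".env"
    "/docs/credentials-policy.md",    # harmless, but contains "credentials"
]

for p in paths:
    print(f"{p}: naive={naive_match(p)} precise={precise_match(p)}")
```

Tighter matching trades false positives for possible misses. For example, this rule would let a backup copy such as id_rsa.bak through, so review the name and suffix sets against the files that actually exist in your environment.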
Output Size Monitoring
Warn when tool output is unusually large, which can waste context window tokens and cause the agent to lose track of its task.
When to use: Your agent reads files or queries databases where unbounded results are possible. Large outputs dilute the agent's focus and increase token costs.
apiVersion: edictum/v1
kind: ContractBundle
metadata:
  name: output-monitoring
defaults:
  mode: enforce
contracts:
  - id: large-output-warning
    type: post
    tool: "*"
    when:
      output.text:
        matches: '.{50000,}'
    then:
      effect: warn
      message: "Tool output is very large. Use pagination, head/tail, or more specific filters."
      tags: [performance, output-size]
from edictum import Verdict
from edictum.contracts import postcondition


@postcondition("*")
def monitor_output_size(envelope, tool_response):
    if tool_response is None:
        return Verdict.pass_()

    # Measure the string form so non-string responses are covered too.
    size = len(str(tool_response))
    if size > 50_000:
        return Verdict.fail(
            f"Tool output is very large ({size:,} chars). "
            "Consider using head/tail, pagination, or more specific "
            "filters to reduce the output before processing.",
            output_size=size,
        )
    return Verdict.pass_()
Gotchas:
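- The .{50000,} regex matches a run of 50,000 or more characters. In most regex engines, including Python's re, . does not match a newline by default, so the pattern may only fire when a single line is that long, and multi-line output made of shorter lines can slip past it. Treat it as a rough proxy for output size and adjust the threshold to your context window budget.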
- Large regex matches can be slow. If performance is a concern, consider implementing output size monitoring as a Python postcondition instead, where you can use len() directly.
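The difference between the regex proxy and a direct length check shows up on multi-line output. The sketch below assumes Python re semantics, where . does not match a newline unless DOTALL is enabled; whether the YAML engine applies that flag is not stated here, so treat this as something to verify rather than as a description of Edictum's behavior. All names are illustrative.

```python
import re

THRESHOLD = 50_000

one_long_line = "x" * 60_000
many_short_lines = ("x" * 99 + "\n") * 600  # 60,000 chars, no line longer than 99

size_regex = re.compile(r".{50000,}")        # "." stops at newlines
dotall_regex = re.compile(r"(?s).{50000,}")  # "(?s)" lets "." match newlines too

for label, text in [("one long line", one_long_line), ("many short lines", many_short_lines)]:
    print(
        f"{label}: len={len(text):,} "
        f"regex={bool(size_regex.search(text))} "
        f"dotall={bool(dotall_regex.search(text))} "
        f"len_check={len(text) >= THRESHOLD}"
    )
```

The plain pattern misses the multi-line case while len() catches both, which is one more reason to prefer the Python postcondition when the output is likely to contain newlines.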