Version: v1.1.0
Applies to: Policy Engine, Policy Design, Audit & Telemetry
Audience: Developers, Platform Engineers, Security Architects

Purpose

This guide shows how to write good TealTiger policies end-to-end: from intent → policy structure → testing → rollout → operations. A “good policy” in TealTiger is:
  • Deterministic: same inputs → same decision
  • Explainable: clear reason codes and traceable conditions
  • Auditable: stable, structured audit output
  • Composable: small policies that scale, not monoliths
  • Safe to change: versioned, reviewed, and rolled out intentionally
If you are new, start with Policy Anatomy and Authoring Workflow.
Where this fits: Start with /concepts/decision-lifecycle → then author policies here → validate outputs in /audit/audit-event-schema

Mental Model

This section explains what a policy does and what it does not do. A TealTiger policy is a governance contract that evaluates an input context and produces a decision:
  • Allow: proceed
  • Deny: block
  • Modify: enforce changes (e.g., clamp max tokens, force lower tier)
  • Redact: remove or mask sensitive fields from logs/audit trails
  • Annotate: emit structured metadata (risk score, cost class, tags)
Policies do not replace:
  • Business logic
  • IAM/KMS/Secrets
  • SIEM dashboards
  • Model hosting/inference
Policies define guardrails around AI behavior.
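The five decision types can be sketched as a small data model. This is an illustrative Python sketch, not TealTiger's actual API; names like `Effect` and `Decision` are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum

class Effect(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"
    REDACT = "redact"
    ANNOTATE = "annotate"

@dataclass
class Decision:
    effect: Effect
    reason_code: str = ""                               # explainability contract
    modifications: dict = field(default_factory=dict)   # e.g. {"max_tokens": 2048}
    redact_fields: list = field(default_factory=list)   # fields masked in audit output
    annotations: dict = field(default_factory=dict)     # e.g. {"risk": 42, "cost_class": "low"}
```

Note that a decision carries both an effect and the metadata (reason code, modifications, annotations) that later sections treat as mandatory for explainability.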

Policy Anatomy

1) Inputs (Context)

Policies evaluate a context that typically includes:
  • Execution identity: who/what is running (service, user, role, tenant)
  • Environment: prod/stage/dev
  • Request intent: purpose tag or operation type
  • Model/provider metadata: model, tier, tool access, etc.
  • Cost metadata: budget class, token limits, estimated cost fields
  • Risk signals: precomputed scores/flags from your pipeline (optional)
  • Audit preferences: redaction rules and logging level (policy-driven)
Best practice: keep context explicit, stable, and documented.
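As an illustration, a context covering the fields above might look like this (field names are hypothetical; your deployment's context schema is the source of truth):

```python
# Illustrative evaluation context -- explicit, stable, documented.
context = {
    "identity": {"role": "service", "tenant": "acme", "trust": "trusted"},
    "env": "prod",                                       # prod / stage / dev
    "request": {"intent": "summarize", "max_tokens": 4096, "contains_pii": False},
    "model": {"name": "tt-large", "tier": "premium", "tool_access": True},
    "cost": {"budget_class": "standard", "estimated_usd": 0.04},
    "risk": {"score": 12},                               # optional precomputed signal
    "audit": {"log_level": "info", "redact": []},        # policy-driven preferences
}
```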

2) Conditions

Conditions are the “if” checks:
  • equality checks
  • ranges and thresholds
  • membership (allow/deny lists)
  • boolean flags
  • environment/identity gating
Best practice: conditions should be readable without needing tribal knowledge.

3) Actions (Outcomes)

Actions define what happens:
  • allow / deny
  • modifications to the request (clamps, overrides, downgrades)
  • redaction directives
  • telemetry annotations (tags, severity, risk score, cost class)
  • reason codes emitted for explainability
Best practice: actions must be predictable and bounded.

4) Reason Codes (Explainability Contract)

Every non-trivial policy decision should emit a reason code that is:
  • stable
  • descriptive
  • versioned by policy lifecycle (not by runtime)
  • usable in dashboards/alerts
Best practice: “reason codes” are the API between governance and humans.

5) Audit & Telemetry Output

A policy should lead to:
  • consistent audit event emission
  • consistent log/redaction behavior
  • consistent tagging for later analysis
Best practice: treat audit output as evidence, not debug noise.
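For example, a single deny decision might emit an audit event shaped like this. The shape is illustrative only; the real contract lives in /audit/audit-event-schema:

```python
# Illustrative audit event: stable, structured, usable as evidence.
audit_event = {
    "policy": "policy_tool_access",
    "policy_version": "v1.1.0",
    "decision": "deny",
    "reason_codes": ["SEC_TOOL_ACCESS_UNTRUSTED_PROD"],
    "env": "prod",
    "identity": {"role": "service", "trust": "untrusted"},
    "redacted_fields": [],                # redaction applied consistently
    "tags": {"severity": "high"},         # consistent tagging for later analysis
}
```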

Authoring Workflow

Step 1: Start from a Governance Question

Write a one-line question:
  • “Should this request be allowed to use tools in production?”
  • “Should this request exceed 4k tokens under free-tier budgets?”
  • “Should we redact user PII before emitting audit logs?”
  • “Should we block untrusted identities from high-cost models?”
If your policy doesn’t answer a clear question, it will become messy.

Step 2: Pick the Policy Category (Cost / Security / Reliability)

Cost governance policies typically control:
  • token limits
  • model tiers
  • concurrency ceilings
  • budget class constraints
  • routing decisions that impact spend
Security governance policies typically control:
  • tool access and permissions
  • data handling and redaction
  • environment-based restrictions
  • identity-based authorization boundaries
Reliability governance policies typically control:
  • timeouts and retries (as constraints)
  • request size and rate constraints
  • fallback behavior (as deterministic rules)
  • safe defaults when signals are missing
Best practice: keep each policy focused on one category to avoid monoliths.

Step 3: Define Inputs Explicitly (Contract First)

Document the required inputs your policy expects. Keep it short:
  • Required: env, identity.role, request.intent, model.tier
  • Optional: risk.score, cost.estimated_usd
Design rule:
  • Missing required inputs should lead to a deterministic safe behavior (often deny or safe downgrade), with a reason code like INPUT_MISSING_*.
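A minimal sketch of that design rule, assuming a dict-shaped context (helper names are hypothetical):

```python
REQUIRED = ["env", "identity.role", "request.intent", "model.tier"]

def get_path(ctx, dotted):
    """Walk a dotted path like 'identity.role'; return None if any hop is missing."""
    cur = ctx
    for key in dotted.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur

def check_required(ctx):
    """Return a deterministic INPUT_MISSING_* reason code for the first
    missing required input, or None when the contract is satisfied."""
    for path in REQUIRED:
        if get_path(ctx, path) is None:
            return "INPUT_MISSING_" + path.upper().replace(".", "_")
    return None
```

Whether a missing input maps to deny or to a safe downgrade is your call per policy category; the point is that the check itself is deterministic and always names the gap.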

Step 4: Write Conditions in Plain Language First

Before writing policy code, write bullet conditions:
  • If env == "prod" and tool_access == true and identity.trust != "trusted" → deny
  • If budget_class == "free" and max_tokens > 2048 → clamp to 2048
  • If request.contains_pii == true → redact fields X/Y/Z
This prevents policy code from becoming a logic puzzle.
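The three bullets above translate almost line-for-line into code. A sketch in Python (assuming required inputs were already validated, so plain key access is safe):

```python
def evaluate(ctx):
    """Translate the plain-language bullets into deterministic checks.
    Returns a list of (effect, reason_code) pairs; clamps mutate the request."""
    decisions = []
    if ctx["env"] == "prod" and ctx["model"]["tool_access"] and ctx["identity"]["trust"] != "trusted":
        decisions.append(("deny", "SEC_TOOL_ACCESS_UNTRUSTED_PROD"))
    if ctx["cost"]["budget_class"] == "free" and ctx["request"]["max_tokens"] > 2048:
        ctx["request"]["max_tokens"] = 2048          # clamp, not reject
        decisions.append(("modify", "COST_TOKEN_CLAMP_FREE_TIER"))
    if ctx["request"]["contains_pii"]:
        decisions.append(("redact", "SEC_REDACT_PII_FIELDS"))
    return decisions
```

Because each condition mirrors one bullet, a reviewer can diff the code against the plain-language spec directly.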

Step 5: Choose Actions that are Minimal and Bounded

Good actions:
  • deny with reason code
  • clamp a numeric field within a known range
  • require redaction for specific fields
  • tag with severity/risk score
  • route to a cheaper model tier deterministically (if that is your pattern)
Bad actions:
  • “auto-learn” or “adapt”
  • random sampling
  • vague “soft block” behavior
  • unbounded transformations

Step 6: Assign Reason Codes (Design for Humans)

Reason codes should:
  • be short but meaningful
  • indicate category + root cause
  • stay stable over time
Examples (pattern):
  • COST_TOKEN_CLAMP_FREE_TIER
  • SEC_TOOL_ACCESS_UNTRUSTED_PROD
  • REL_INPUT_MISSING_SAFE_DEFAULT
  • SEC_REDACT_PII_FIELDS
Best practice: every deny/modify/redact should have a reason code.

Step 7: Test with a Golden Corpus (Determinism Check)

Create a small set of inputs and expected outcomes:
  • happy path
  • boundary cases
  • missing input cases
  • high-risk identity cases
  • prod vs non-prod
A good corpus answers:
  • “If we change policy logic, what breaks?”
Best practice: your “golden corpus” is the fastest way to prevent regressions.
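A golden corpus can be as simple as a table of inputs and expected outcomes run on every change. A toy sketch using the environment-gate pattern (function and corpus names hypothetical):

```python
def tool_access(env, trust, wants_tools):
    """Toy policy: deny tool use in prod for untrusted identities."""
    if env == "prod" and wants_tools and trust != "trusted":
        return "deny", "SEC_TOOL_ACCESS_UNTRUSTED_PROD"
    return "allow", None

GOLDEN = [
    # (env, trust, wants_tools, expected_decision)
    ("prod", "trusted", True, "allow"),      # happy path
    ("prod", "untrusted", True, "deny"),     # high-risk identity
    ("prod", "untrusted", False, "allow"),   # boundary: no tools requested
    ("dev", "untrusted", True, "allow"),     # prod vs non-prod
]

for env, trust, wants_tools, expected in GOLDEN:
    decision, _reason = tool_access(env, trust, wants_tools)
    assert decision == expected, (env, trust, wants_tools)
```

Any logic change that flips a row fails loudly, which is exactly the "what breaks?" signal the corpus exists to give.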

Step 8: Rollout Safely (Policy Change Management)

Recommended rollout stages:
  1. Shadow mode (observe): evaluate policy and emit audit events, but do not enforce
  2. Warn mode (soft enforcement): allow but tag high-risk decisions
  3. Enforce mode: deny/modify/redact becomes active
  4. Tighten thresholds: only after you have evidence
Always ship:
  • policy version metadata
  • reason code mapping updates
  • audit expectations
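One way to sketch the rollout stages: the policy always evaluates and always audits, and only the enforcement layer changes per mode. A hedged sketch; function names and return shapes are hypothetical:

```python
def emit_audit(decision, mode):
    """Placeholder: in every mode the raw decision is recorded as evidence."""
    pass

def apply_decision(decision, mode):
    """Gate enforcement by rollout mode: shadow observes, warn tags, enforce acts.
    Returns (effective_decision, tags)."""
    emit_audit(decision, mode)
    if mode == "shadow":
        return "allow", []                       # observe only, never enforce
    if mode == "warn" and decision != "allow":
        return "allow", ["would_" + decision]    # soft enforcement: tag, don't block
    return decision, []                          # enforce mode
```

Comparing `would_*` tags from warn mode against expected volumes is the evidence you need before flipping to enforce.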

Policy Design Principles

Principle 1: Deterministic Always

  • Avoid “confidence”-based outcomes unless confidence is an explicit input and its thresholds are explicit.
  • No randomization.
  • No time-dependent decisions unless time is an explicit input.

Principle 2: Prefer Simple Policies + Composition

Instead of one mega-policy:
  • policy_cost_limits
  • policy_tool_access
  • policy_redaction
  • policy_environment_gates
Evaluate independently; merge outcomes deterministically.
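A sketch of one deterministic merge rule, assuming dict-shaped per-policy outcomes (a common convention, not TealTiger's mandated one): any deny wins outright; otherwise directives are unioned.

```python
def merge(outcomes):
    """Combine per-policy outcomes deterministically: any deny wins outright;
    otherwise all modifications and redaction directives are unioned.
    Evaluate policies in a fixed registration order so ties resolve stably."""
    reasons = sorted(r for o in outcomes for r in o.get("reasons", []))
    if any(o["effect"] == "deny" for o in outcomes):
        return {"effect": "deny", "reasons": reasons}
    mods, redact = {}, set()
    for o in outcomes:
        mods.update(o.get("modifications", {}))
        redact.update(o.get("redact_fields", []))
    effect = "modify" if (mods or redact) else "allow"
    return {"effect": effect, "reasons": reasons,
            "modifications": mods, "redact_fields": sorted(redact)}
```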

Principle 3: Fail Safe on Missing Inputs

Define a consistent missing-input behavior, e.g.:
  • Security-critical inputs missing → deny
  • Cost inputs missing → clamp to conservative defaults
  • Risk signal missing → treat as unknown, apply stricter path
Always emit reason codes:
  • INPUT_MISSING_ENV
  • INPUT_MISSING_IDENTITY_TRUST
  • INPUT_MISSING_COST_ESTIMATE

Principle 4: Keep Thresholds Explicit and Owned

All thresholds should be:
  • visible in policy
  • documented (why this number)
  • versioned and reviewed
Examples:
  • token clamp: 2048 / 4096
  • max tool calls: 3
  • risk threshold: >= 70 triggers deny in prod
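In practice this can mean keeping every threshold in one named, reviewed place rather than inlined in conditions. An illustrative sketch (keys and values are this page's examples, not shipped defaults):

```python
# Thresholds live in one versioned, reviewed location -- never inline magic numbers.
THRESHOLDS = {
    "token_clamp_free": 2048,      # bounds worst-case spend on the free tier
    "token_clamp_standard": 4096,  # matches the standard budget class
    "max_tool_calls": 3,           # limits blast radius of tool loops
    "risk_deny_prod": 70,          # scores >= 70 trigger deny in prod
}
```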

Principle 5: Preserve Explainability

If a policy fails, the developer should be able to answer in 30 seconds:
  • which rule triggered?
  • which input caused it?
  • which reason code explains it?

Common Policy Patterns

Pattern A: Environment Gate

Use environment gating to ensure stricter controls in prod.
  • In prod: deny untrusted tool use
  • In non-prod: allow but tag
Reason codes:
  • SEC_TOOL_USE_PROD_DENY_UNTRUSTED
  • SEC_TOOL_USE_NONPROD_ALLOW_TAGGED

Pattern B: Budget Class Token Clamp

Use budget classes to bound cost deterministically.
  • free tier: clamp to 2k tokens
  • standard: clamp to 8k tokens
  • premium: allow configured max
Reason codes:
  • COST_TOKEN_CLAMP_FREE
  • COST_TOKEN_CLAMP_STANDARD
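This pattern can be sketched as a pure function (a minimal sketch, assuming the tier limits above; names hypothetical):

```python
TOKEN_LIMITS = {"free": 2048, "standard": 8192}   # premium: no tier clamp

def clamp_tokens(budget_class, requested, configured_max):
    """Deterministically bound max_tokens by budget class. Returns the
    effective value plus a reason code only when a clamp actually fired."""
    limit = min(TOKEN_LIMITS.get(budget_class, configured_max), configured_max)
    if requested > limit:
        return limit, "COST_TOKEN_CLAMP_" + budget_class.upper()
    return requested, None
```

Same inputs, same effective value, same reason code, every time, which is what makes cost bounds auditable.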

Pattern C: Identity-Based Model Tier Control

  • untrusted identities → only low-cost model tier
  • trusted services → allow high tier
Reason codes:
  • COST_MODEL_TIER_DOWNGRADE_UNTRUSTED
  • COST_MODEL_TIER_ALLOW_TRUSTED

Pattern D: Redaction as Default for Sensitive Paths

  • if request contains PII flags → redact before audit
  • if tenant is regulated → force stricter redaction always
Reason codes:
  • SEC_REDACT_PII
  • SEC_REDACT_REGULATED_TENANT

Pattern E: Risk Score Thresholding (Deterministic)

You may pass probabilistic signals (like classifier scores) as inputs, but enforce deterministically:
  • risk >= 80 in prod → deny
  • risk 50–79 → allow with tags + enhanced logging/redaction
  • risk < 50 → allow
Reason codes:
  • SEC_RISK_DENY_HIGH_PROD
  • SEC_RISK_ALLOW_TAG_MEDIUM
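The banding above reduces to a few explicit comparisons. A sketch, with one stated assumption: scores >= 80 outside prod fall through to the tagged path, since the source only mandates deny in prod:

```python
def risk_gate(score, env):
    """Deterministic enforcement over a probabilistic input score (0-100).
    Assumption: high scores outside prod take the tagged-allow path."""
    if score >= 80 and env == "prod":
        return "deny", "SEC_RISK_DENY_HIGH_PROD"
    if score >= 50:
        return "allow_tagged", "SEC_RISK_ALLOW_TAG_MEDIUM"
    return "allow", None
```

The classifier may be probabilistic, but given the same score and environment the gate always lands in the same band.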

Anti-Patterns

Avoid these (see: /about/anti-patterns):
  • encoding business logic inside policies
  • frequent ad-hoc policy edits
  • treating audit logs like debug logs
  • making giant monolithic policies
  • expecting policies to “learn” or adapt

Testing and Validation

Include at least:
  1. Allow case (trusted identity, safe intent)
  2. Deny case (untrusted tool access in prod)
  3. Modify case (token clamp for budget class)
  4. Redact case (PII flagged)
  5. Missing input case (safe default + reason code)
  6. Boundary case (token == limit, risk == threshold)
  7. Non-prod case (more permissive but tagged)

What to Assert

  • decision outcome (allow/deny/modify/redact)
  • emitted reason codes
  • final effective values (e.g., clamped tokens)
  • audit event fields and stability expectations
For deterministic validation, see Golden Corpus → /policy/golden-corpus

Rollout and Operations

Keep Policies Versioned

  • store policies in Git
  • tag releases
  • ensure audit events include policy version metadata

Monitor for Drift

  • track frequency of each reason code
  • spikes indicate behavior changes or misuse
  • use reason codes as the stable “signal API”

Review Policies Periodically

  • evaluate top denies / clamps
  • adjust thresholds intentionally
  • capture learnings in policy docs

Review Checklist

A policy is ready when:
  • Purpose is stated as a governance question
  • Inputs are defined (required vs optional)
  • Conditions are readable
  • Actions are minimal and bounded
  • All deny/modify/redact outcomes emit reason codes
  • Golden corpus exists with boundary + missing-input cases
  • Rollout mode is chosen (shadow/warn/enforce)
  • Audit and telemetry expectations are documented

Related pages

  • /policy/conditions-and-actions
  • /policy/reason-codes
  • /policy/risk-scores
  • /audit/audit-event-schema
  • /audit/logging-behavior
  • /about/best-practices
  • /about/decision-philosophy

Connect the dots

  • Lifecycle context: /concepts/decision-lifecycle
  • Decision mechanics: /policy/conditions-and-actions · /policy/reason-codes · /policy/risk-scores
  • Audit evidence: /audit/audit-event-schema · /audit/logging-behavior