
Purpose

A Golden Corpus is a small, curated set of test cases used to validate that TealTiger policies behave deterministically and as intended. This document explains:
  • What a golden corpus is
  • Why it is essential for governance
  • How to design one effectively
  • When it should be updated
A golden corpus is not optional for serious policy authoring — it is the safety net that makes deterministic governance reliable.

What Is a Golden Corpus?

A golden corpus is a collection of input contexts and expected outcomes that serve as the canonical source of truth for policy behavior. Each test case answers:
  • Given this input
  • Under this policy version
  • The decision must be exactly this
If the outcome changes unexpectedly, the corpus detects it immediately.
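
The three-part question above can be sketched as a small check in Python. This is a minimal illustration, not TealTiger's API: `GoldenCase`, `evaluate`, and `check` are hypothetical names, and `evaluate` is a toy stand-in for the real policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    test_case_id: str
    policy_version: str
    input_context: dict
    expected_decision: str
    expected_reason_codes: tuple


def evaluate(policy_version: str, input_context: dict) -> tuple:
    """Hypothetical stand-in for the real policy engine (not a TealTiger API)."""
    untrusted = input_context.get("identity.role") == "untrusted"
    if untrusted and input_context.get("tool_access"):
        return "deny", ("SEC_TOOL_ACCESS_UNTRUSTED_PROD",)
    return "allow", ("SEC_ALLOW_TRUSTED_BASELINE",)


def check(case: GoldenCase) -> bool:
    """True only if the decision and reason codes match the expectation exactly."""
    decision, reason_codes = evaluate(case.policy_version, case.input_context)
    return (decision, tuple(reason_codes)) == (
        case.expected_decision,
        tuple(case.expected_reason_codes),
    )


deny_case = GoldenCase(
    test_case_id="TC-DENY-UNTRUSTED-TOOL-PROD",
    policy_version="v1.1.0",
    input_context={
        "env": "prod",
        "identity.role": "untrusted",
        "request.intent": "tool_use",
        "tool_access": True,
    },
    expected_decision="deny",
    expected_reason_codes=("SEC_TOOL_ACCESS_UNTRUSTED_PROD",),
)
print(check(deny_case))
```

The comparison is an exact match on purpose: a case that "almost" passes still counts as a changed outcome.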

Why Determinism Requires a Golden Corpus

TealTiger is designed around deterministic enforcement:

Same inputs → same policy → same decision

Without a golden corpus:
  • Policy changes become risky
  • Regressions go unnoticed
  • Audit confidence erodes
  • “It worked yesterday” becomes common
With a golden corpus:
  • Every change is intentional
  • Every decision is reproducible
  • Every audit trail is defensible

What Belongs in a Golden Corpus

A good golden corpus is small but representative. At minimum, include the following case types:
  1. Allow case
  2. Deny case
  3. Modify case
  4. Redact case
  5. Missing input case
  6. Boundary case
  7. Environment case
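
One way to keep this minimum honest is a coverage check over the corpus. The sketch below assumes each case carries a `case_type` label; that field is hypothetical and is not part of the contract defined in this document.

```python
# The seven case types listed above, as hypothetical label values.
REQUIRED_CASE_TYPES = (
    "allow",
    "deny",
    "modify",
    "redact",
    "missing_input",
    "boundary",
    "environment",
)


def missing_case_types(cases: list) -> list:
    """Return the required case types that no case in the corpus covers."""
    covered = {case.get("case_type") for case in cases}
    return [case_type for case_type in REQUIRED_CASE_TYPES if case_type not in covered]


corpus = [
    {"test_case_id": "TC-ALLOW-BASELINE", "case_type": "allow"},
    {"test_case_id": "TC-DENY-UNTRUSTED-TOOL-PROD", "case_type": "deny"},
    {"test_case_id": "TC-MODIFY-TOKEN-CLAMP-FREE", "case_type": "modify"},
]
print(missing_case_types(corpus))
# ['redact', 'missing_input', 'boundary', 'environment']
```

An empty result means every case type in the list has at least one representative.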

Canonical Test Case Structure

machine_spec:
  version: v1.1.0
  id: tealtiger.golden_corpus.contract
  type: golden_corpus_contract
  stability: immutable
  required_fields:
    - test_case_id
    - description
    - policy_version
    - category
    - input_context
    - expected.decision
    - expected.reason_codes
  optional_fields:
    - expected.effective_values
    - expected.audit_attributes
  decision_enum: [allow, deny, modify, redact]
  category_enum: [cost, security, reliability]
  rules:
    - id: deterministic_outcome
      description: Same inputs and policy version must yield the same decision.
    - id: reason_codes_required
      description: deny/modify/redact must emit at least one reason code.
    - id: immutable_test_ids
      description: Test case IDs must never be reused for different behavior.
This structure is a contract, not a suggestion.
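
A contract like this can be checked mechanically before any policy is evaluated. The following Python sketch validates one case against the required fields, the two enums, and the `reason_codes_required` rule. The function names and the nested-dict layout (with `expected` as a sub-dictionary) are assumptions for illustration, and the `immutable_test_ids` rule is not covered because it needs corpus history.

```python
DECISION_ENUM = {"allow", "deny", "modify", "redact"}
CATEGORY_ENUM = {"cost", "security", "reliability"}
REQUIRED_FIELDS = (
    "test_case_id",
    "description",
    "policy_version",
    "category",
    "input_context",
    "expected.decision",
    "expected.reason_codes",
)


def lookup(case: dict, dotted_path: str):
    """Follow a dotted path such as 'expected.decision'; None if absent."""
    node = case
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def validate_case(case: dict) -> list:
    """Return a list of contract violations; an empty list means the case conforms."""
    errors = []
    for path in REQUIRED_FIELDS:
        if lookup(case, path) is None:
            errors.append(f"missing required field: {path}")
    decision = lookup(case, "expected.decision")
    if decision is not None and decision not in DECISION_ENUM:
        errors.append(f"unknown decision: {decision}")
    category = lookup(case, "category")
    if category is not None and category not in CATEGORY_ENUM:
        errors.append(f"unknown category: {category}")
    reason_codes = lookup(case, "expected.reason_codes")
    if decision in {"deny", "modify", "redact"} and not reason_codes:
        errors.append("reason_codes_required: deny/modify/redact needs a reason code")
    return errors


case = {
    "test_case_id": "TC-REDACT-PII",
    "description": "Request flagged as containing PII.",
    "policy_version": "v1.1.0",
    "category": "security",
    "input_context": {"env": "prod", "request.contains_pii": True},
    "expected": {"decision": "redact", "reason_codes": ["SEC_REDACT_PII_FIELDS"]},
}
print(validate_case(case))  # []
```

Returning every violation at once, rather than stopping at the first, keeps review feedback complete for a malformed case.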

Example Golden Corpus

Test Case 01 — Allow (Baseline Safe Case)

machine_spec:
  version: v1.1.0
  id: TC-ALLOW-BASELINE
  type: golden_corpus_case
  stability: immutable
  category: security
  policy_version: v1.1.0
  input_context:
    env: prod
    identity.role: trusted_service
    request.intent: inference
    tool_access: false
  expected:
    decision: allow
    reason_codes:
      - SEC_ALLOW_TRUSTED_BASELINE
Test Case ID: TC-ALLOW-BASELINE
Description: Trusted identity performing an allowed action in production.
Policy Version: v1.1.0
Category: Security
Input Context:
  • env: prod
  • identity.role: trusted_service
  • request.intent: inference
  • tool_access: false
Expected Decision: allow
Expected Reason Codes:
  • SEC_ALLOW_TRUSTED_BASELINE
Expected Effective Values:
  • none
Expected Audit Attributes:
  • decision = allow
  • environment = prod

Test Case 02 — Deny (Clear Policy Violation)

machine_spec:
  version: v1.1.0
  id: TC-DENY-UNTRUSTED-TOOL-PROD
  type: golden_corpus_case
  stability: immutable
  category: security
  policy_version: v1.1.0
  input_context:
    env: prod
    identity.role: untrusted
    request.intent: tool_use
    tool_access: true
  expected:
    decision: deny
    reason_codes:
      - SEC_TOOL_ACCESS_UNTRUSTED_PROD
Test Case ID: TC-DENY-UNTRUSTED-TOOL-PROD
Description: Untrusted identity attempting tool access in production.
Policy Version: v1.1.0
Category: Security
Input Context:
  • env: prod
  • identity.role: untrusted
  • request.intent: tool_use
  • tool_access: true
Expected Decision: deny
Expected Reason Codes:
  • SEC_TOOL_ACCESS_UNTRUSTED_PROD
Expected Effective Values:
  • none
Expected Audit Attributes:
  • decision = deny
  • severity = high

Test Case 03 — Modify (Deterministic Clamp)

machine_spec:
  version: v1.1.0
  id: TC-MODIFY-TOKEN-CLAMP-FREE
  type: golden_corpus_case
  stability: immutable
  category: cost
  policy_version: v1.1.0
  input_context:
    env: prod
    budget_class: free
    requested_max_tokens: 8192
  expected:
    decision: modify
    reason_codes:
      - COST_TOKEN_CLAMP_FREE_TIER
    effective_values:
      max_tokens: 2048
Test Case ID: TC-MODIFY-TOKEN-CLAMP-FREE
Description: Free-tier request exceeding token limit.
Policy Version: v1.1.0
Category: Cost
Input Context:
  • env: prod
  • budget_class: free
  • requested_max_tokens: 8192
Expected Decision: modify
Expected Reason Codes:
  • COST_TOKEN_CLAMP_FREE_TIER
Expected Effective Values:
  • max_tokens = 2048
Expected Audit Attributes:
  • decision = modify
  • cost_class = free
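
The clamp in this case can be sketched as a pure function of the requested value and the limit. The 2048 limit comes from the test case above; the function name and the pass-through behavior for requests at or under the limit are assumptions, and reason codes are left out to avoid inventing any.

```python
FREE_TIER_MAX_TOKENS = 2048  # limit taken from TC-MODIFY-TOKEN-CLAMP-FREE


def clamp_max_tokens(requested_max_tokens: int, limit: int) -> dict:
    """Clamp a token request to the limit; the result depends only on the inputs."""
    if requested_max_tokens > limit:
        return {"decision": "modify", "effective_values": {"max_tokens": limit}}
    # Assumption: requests at or under the limit pass through unchanged.
    return {
        "decision": "allow",
        "effective_values": {"max_tokens": requested_max_tokens},
    }


result = clamp_max_tokens(8192, FREE_TIER_MAX_TOKENS)
print(result)
# {'decision': 'modify', 'effective_values': {'max_tokens': 2048}}
```

Because the function reads no clock, counter, or external state, repeating the call with the same arguments always yields the same effective value.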

Test Case 04 — Redact (Sensitive Data Handling)

machine_spec:
  version: v1.1.0
  id: TC-REDACT-PII
  type: golden_corpus_case
  stability: immutable
  category: security
  policy_version: v1.1.0
  input_context:
    env: prod
    request.contains_pii: true
  expected:
    decision: redact
    reason_codes:
      - SEC_REDACT_PII_FIELDS
Test Case ID: TC-REDACT-PII
Description: Request flagged as containing PII.
Policy Version: v1.1.0
Category: Security
Input Context:
  • env: prod
  • request.contains_pii: true
Expected Decision: redact
Expected Reason Codes:
  • SEC_REDACT_PII_FIELDS
Expected Effective Values:
  • none
Expected Audit Attributes:
  • redaction_applied = true

Test Case 05 — Missing Input (Safe Default)

machine_spec:
  version: v1.1.0
  id: TC-MISSING-INPUT-IDENTITY
  type: golden_corpus_case
  stability: immutable
  category: security
  policy_version: v1.1.0
  input_context:
    env: prod
    identity: missing
  expected:
    decision: deny
    reason_codes:
      - INPUT_MISSING_IDENTITY
Test Case ID: TC-MISSING-INPUT-IDENTITY
Description: Required identity field is missing.
Policy Version: v1.1.0
Category: Security
Input Context:
  • env: prod
  • identity: missing
Expected Decision: deny
Expected Reason Codes:
  • INPUT_MISSING_IDENTITY
Expected Effective Values:
  • none
Expected Audit Attributes:
  • decision = deny
  • failure_type = missing_input
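
A safe default of this kind is usually implemented as a fail-closed precheck that runs before any other rule. The sketch below is one possible shape, not TealTiger's implementation: `identity_is_missing` and `precheck` are hypothetical names, and treating the literal value `missing` as an absent identity is an assumption based on how the example above is written.

```python
MISSING_MARKER = "missing"  # assumption: how the example above spells an absent identity


def identity_is_missing(input_context: dict) -> bool:
    """True if no usable identity information is present in the context."""
    if input_context.get("identity") == MISSING_MARKER:
        return True
    has_identity = any(
        (key == "identity" or key.startswith("identity.")) and value not in (None, "")
        for key, value in input_context.items()
    )
    return not has_identity


def precheck(input_context: dict):
    """Deny when identity is missing; return None to continue normal evaluation."""
    if identity_is_missing(input_context):
        return {"decision": "deny", "reason_codes": ["INPUT_MISSING_IDENTITY"]}
    return None


print(precheck({"env": "prod", "identity": "missing"}))
# {'decision': 'deny', 'reason_codes': ['INPUT_MISSING_IDENTITY']}
```

Denying on absence, rather than falling through to later rules, keeps the outcome the same no matter which other fields happen to be present.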

Test Case 06 — Boundary Condition

machine_spec:
  version: v1.1.0
  id: TC-BOUNDARY-TOKEN-LIMIT
  type: golden_corpus_case
  stability: immutable
  category: cost
  policy_version: v1.1.0
  input_context:
    env: prod
    budget_class: standard
    requested_max_tokens: 4096
  expected:
    decision: allow
    reason_codes:
      - COST_TOKEN_ALLOW_AT_LIMIT
    effective_values:
      max_tokens: 4096
Test Case ID: TC-BOUNDARY-TOKEN-LIMIT
Description: Requested tokens exactly at the allowed threshold.
Policy Version: v1.1.0
Category: Cost
Input Context:
  • env: prod
  • budget_class: standard
  • requested_max_tokens: 4096
Expected Decision: allow
Expected Reason Codes:
  • COST_TOKEN_ALLOW_AT_LIMIT
Expected Effective Values:
  • max_tokens = 4096
Expected Audit Attributes:
  • decision = allow
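
Boundary cases are most useful when the values on either side of the threshold are pinned as well, since off-by-one errors show up there first. The sketch below generates the three probe values around a limit. Only the at-limit expectation (4096 is allowed) is confirmed by the test case above; the expectation that one token over the limit yields `modify` is an assumption, and the helper names are hypothetical.

```python
STANDARD_MAX_TOKENS = 4096  # threshold taken from TC-BOUNDARY-TOKEN-LIMIT


def boundary_probes(limit: int) -> list:
    """Values just below, exactly at, and just above the threshold."""
    return [limit - 1, limit, limit + 1]


def expected_decision(requested_max_tokens: int, limit: int) -> str:
    # Assumption: the limit is inclusive, and anything above it is clamped (modify).
    return "allow" if requested_max_tokens <= limit else "modify"


probes = [
    (requested, expected_decision(requested, STANDARD_MAX_TOKENS))
    for requested in boundary_probes(STANDARD_MAX_TOKENS)
]
print(probes)
# [(4095, 'allow'), (4096, 'allow'), (4097, 'modify')]
```

Each probe would become its own golden case with its own immutable ID, so a change from an inclusive to an exclusive limit fails exactly one case.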

Rules for Maintaining the Golden Corpus

  • Update the corpus only when behavior changes intentionally
  • Never update the corpus to “fix” failing tests
  • Add new cases for new policies or thresholds
  • Do not remove cases without explicit review
If a policy change breaks the golden corpus and the new behavior was not intended and reviewed, the change is not safe.
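
These rules are easier to enforce when edits to the corpus itself are made visible in review. One possible approach, sketched below, fingerprints each case and compares the corpus against a previously approved baseline. The function names and the dictionary layout are assumptions for illustration; only standard-library calls are used.

```python
import hashlib
import json


def fingerprint(case: dict) -> str:
    """Stable SHA-256 hash of a case, independent of key order."""
    canonical = json.dumps(case, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def corpus_diff(baseline: list, current: list) -> dict:
    """Report case IDs that were removed, changed, or added since the baseline."""
    base = {case["test_case_id"]: fingerprint(case) for case in baseline}
    cur = {case["test_case_id"]: fingerprint(case) for case in current}
    return {
        "removed": sorted(base.keys() - cur.keys()),
        "changed": sorted(i for i in base.keys() & cur.keys() if base[i] != cur[i]),
        "added": sorted(cur.keys() - base.keys()),
    }


baseline = [
    {"test_case_id": "TC-ALLOW-BASELINE", "expected": {"decision": "allow"}},
    {"test_case_id": "TC-REDACT-PII", "expected": {"decision": "redact"}},
]
current = [
    # Expectation silently edited: this should surface as "changed".
    {"test_case_id": "TC-ALLOW-BASELINE", "expected": {"decision": "deny"}},
    {"test_case_id": "TC-BOUNDARY-TOKEN-LIMIT", "expected": {"decision": "allow"}},
]
print(corpus_diff(baseline, current))
```

A non-empty `removed` or `changed` list does not mean the edit is wrong; it means the edit needs explicit review, while `added` is the expected result of covering new policies or thresholds.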

Review Checklist

  • All golden corpus cases pass
  • New behavior has new test cases
  • Reason codes are explicit and stable
  • Boundary conditions are covered
  • Missing‑input behavior is deterministic

Summary

The golden corpus is the proof system for deterministic governance. If policies define what should happen,
the golden corpus proves that it always does.