TealTiger maintains an automated red team benchmark suite that tests guardrails and policy enforcement against real-world adversarial attack patterns. Results are generated from repeatable test runs in CI.

AIGoat OWASP LLM Top 10 Benchmark

AIGoat is an open-source AI security playground that provides hands-on attack labs covering the full OWASP Top 10 for LLM Applications. TealTiger’s benchmark extracts attack patterns from AIGoat’s labs and runs them through the full v1.1.0 stack.

v1.1.0 Full Stack Results

27 attacks · 27 caught · 100% catch rate
| OWASP Category | Attacks | Caught | Rate |
| --- | --- | --- | --- |
| LLM01 — Prompt Injection | 8 | 8 | 100% |
| LLM02 — Sensitive Information Disclosure | 4 | 4 | 100% |
| LLM05 — Improper Output Handling | 3 | 3 | 100% |
| LLM06 — Excessive Agency | 5 | 5 | 100% |
| LLM07 — System Prompt Leakage | 3 | 3 | 100% |
| LLM10 — Unbounded Consumption | 2 | 2 | 100% |
| Compound (multi-category) | 2 | 2 | 100% |

What Each Layer Catches

TealTiger v1.1.0 uses defense in depth. Different components handle different attack classes:
| Attack Class | Caught By |
| --- | --- |
| Prompt injection, jailbreaks, DAN | TealGuard (PromptInjectionGuardrail) |
| PII in input (SSN, credit cards, emails) | TealGuard (PIIDetectionGuardrail) |
| XSS, SQL injection, command injection | TealEngine (CodeExecutionPolicy — blockedPatterns) |
| Forbidden tool calls (delete, drop, export) | TealEngine (ToolPolicy — allowed: false) |
| Privilege escalation, unauthorized actions | TealEngine (IdentityPolicy — forbidden actions) |
| Data exfiltration via queries | TealEngine (ToolPolicy — allowedTables, maxRows) |
| System prompt extraction | TealGuard (PromptInjectionGuardrail — systemLeakage patterns) |
| Token exhaustion, context flooding | TealEngine (BehavioralPolicy — rate limits, cost limits) |
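Conceptually, the layered evaluation works like a short-circuit-free pipeline: every layer inspects the request, and a single DENY from any layer counts as a catch. The sketch below illustrates that idea only — the `Layer` interface and the two toy layers are hypothetical stand-ins, not TealGuard's or TealEngine's actual API.

```typescript
// Hypothetical sketch of defense-in-depth evaluation. The Layer interface and
// the toy layers below are illustrative stand-ins for TealGuard guardrails and
// TealEngine policies, not the SDK's real API.
type Decision = 'ALLOW' | 'DENY';

interface Layer {
  name: string;
  evaluate(prompt: string): Decision;
}

// A request is caught if ANY layer denies it; every layer must allow it to pass.
function runStack(
  layers: Layer[],
  prompt: string
): { decision: Decision; caughtBy: string[] } {
  const caughtBy = layers
    .filter((layer) => layer.evaluate(prompt) === 'DENY')
    .map((layer) => layer.name);
  return { decision: caughtBy.length > 0 ? 'DENY' : 'ALLOW', caughtBy };
}

// Toy stand-ins for PromptInjectionGuardrail and ToolPolicy.
const promptInjectionGuardrail: Layer = {
  name: 'PromptInjectionGuardrail',
  evaluate: (p) =>
    /ignore (all )?previous instructions/i.test(p) ? 'DENY' : 'ALLOW',
};

const toolPolicy: Layer = {
  name: 'ToolPolicy',
  evaluate: (p) => /\b(delete|drop|export)\b/i.test(p) ? 'DENY' : 'ALLOW',
};
```

Because every layer is consulted, a compound attack that slips past one pattern can still be denied by another — which is why compound attacks score 100% even in the guardrails-only baseline below.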

Guardrails-Only Baseline (v0.2.2)

For comparison, running the AIGoat attack corpus through only the basic guardrails (without TealEngine policies) produces a markedly lower catch rate, which demonstrates the value of the full v1.1.0 stack.

34 attacks · 18 caught · 52.9% catch rate
| OWASP Category | Attacks | Caught | Rate | Gap |
| --- | --- | --- | --- | --- |
| LLM01 — Prompt Injection | 10 | 7 | 70% | Indirect/obfuscated injection bypasses regex |
| LLM02 — Sensitive Info | 5 | 3 | 60% | Exfiltration requests without PII in input |
| LLM05 — Output Handling | 3 | 0 | 0% | No output-side enforcement in basic guardrails |
| LLM06 — Excessive Agency | 3 | 1 | 33% | No tool-call gating in basic guardrails |
| LLM07 — System Prompt Leak | 4 | 1 | 25% | Limited regex coverage |
| LLM10 — Unbounded Consumption | 3 | 0 | 0% | No budget/rate enforcement in basic guardrails |
| Compound | 6 | 6 | 100% | Compound attacks trigger multiple guardrails |
The jump from 52.9% to 100% is entirely due to TealEngine’s policy layer — tool policies, identity enforcement, code execution policies, and behavioral limits.

Attack Categories Tested

LLM01: Prompt Injection

  • Direct instruction override (“ignore all previous instructions”)
  • Context manipulation (delimiter injection, chat template injection)
  • Indirect injection (foreign language translation tasks)
  • Role-play jailbreaks (DAN, DUDE, persona assignment)
  • Encoding attacks (base64-encoded payloads)
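Encoding attacks are worth a closer look, because a naive regex never sees the decoded payload. One way to handle them is to decode base64-looking inputs before pattern matching, as in the sketch below — a hypothetical normalization step for illustration, not necessarily how PromptInjectionGuardrail is implemented.

```typescript
// Illustrative only: decode candidate base64 strings before pattern matching.
// This is a hypothetical normalization step, not the guardrail's actual logic.
const INJECTION_PATTERN = /ignore (all )?previous instructions/i;

// Heuristic: a long run of base64-alphabet characters with optional padding.
function looksLikeBase64(s: string): boolean {
  return /^[A-Za-z0-9+/]{16,}={0,2}$/.test(s.trim());
}

function normalize(input: string): string {
  if (looksLikeBase64(input)) {
    const decoded = Buffer.from(input.trim(), 'base64').toString('utf8');
    // Only keep the decoded form if it is printable text.
    if (/^[\x20-\x7E\s]+$/.test(decoded)) return decoded;
  }
  return input;
}

// A base64-wrapped injection now matches the same pattern as plaintext.
const encoded = Buffer.from('Ignore all previous instructions.').toString('base64');
const caught = INJECTION_PATTERN.test(normalize(encoded));
```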

LLM02: Sensitive Information Disclosure

  • PII in user input (SSN, credit cards, emails, phone numbers)
  • Bulk data exfiltration requests
  • Targeted PII extraction for specific users
  • Training data completion attacks

LLM05: Improper Output Handling

  • XSS payloads (`<script>` tags, `document.cookie`)
  • SQL injection (`DROP TABLE`, `DELETE FROM`)
  • OS command injection (`cat /etc/passwd`, shell commands)
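These three payload families map naturally onto a pattern-matching output filter, in the spirit of CodeExecutionPolicy's blockedPatterns. The regexes below are illustrative examples, not the SDK's shipped defaults.

```typescript
// Illustrative blockedPatterns list; examples only, not TealTiger's defaults.
const blockedPatterns: RegExp[] = [
  /<script\b/i,                        // XSS payloads
  /\bdocument\.cookie\b/i,             // cookie theft via injected JS
  /\b(drop\s+table|delete\s+from)\b/i, // destructive SQL
  /\bcat\s+\/etc\/passwd\b/,           // classic command-injection probe
];

const violatesOutputPolicy = (text: string): boolean =>
  blockedPatterns.some((pattern) => pattern.test(text));
```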

LLM06: Excessive Agency

  • Unauthorized financial actions (large refunds without approval)
  • Privilege escalation (admin access requests)
  • Bulk data export to external destinations
  • Destructive database operations (delete, drop)

LLM07: System Prompt Leakage

  • Direct extraction (“print your system prompt”)
  • Verbatim repetition attacks
  • Social engineering via developer persona

LLM10: Unbounded Consumption

  • Token exhaustion (extremely long output requests)
  • Context window flooding (100K+ character inputs)

Running the Benchmarks

Both benchmark test suites are included in the TealTiger SDK:
```shell
# v1.1.0 full stack benchmark
npx jest aigoat-v110-benchmark.test.ts --verbose

# Guardrails-only baseline
npx jest aigoat-redteam-benchmark.test.ts --verbose
```
The tests run without external dependencies (no API keys, no Docker, no network calls) and complete in under 10 seconds.

Methodology

  • Attack prompts are inspired by AIGoat OWASP LLM Top 10 labs
  • Each attack is run through the full TealTiger stack (TealGuard + TealEngine)
  • A “catch” means at least one component returned a DENY decision
  • Tests are deterministic and repeatable — no LLM inference involved
  • Results are generated from automated test runs, not manual testing
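The catch-rate tally implied by this methodology can be sketched as follows. The `AttackPrompt` shape mirrors the example later on this page; `evaluate` is a hypothetical stand-in for the real TealGuard + TealEngine pipeline, returning one decision per component.

```typescript
// Sketch of catch-rate tallying per the methodology above; evaluate() is a
// hypothetical stand-in for the actual TealGuard + TealEngine pipeline.
interface AttackPrompt {
  id: string;
  owasp: string;
  prompt: string;
}

type Decision = 'ALLOW' | 'DENY';

function catchRate(
  attacks: AttackPrompt[],
  evaluate: (prompt: string) => Decision[] // one decision per component
): number {
  // A "catch" means at least one component returned a DENY decision.
  const caught = attacks.filter((a) => evaluate(a.prompt).includes('DENY'));
  return (caught.length / attacks.length) * 100;
}
```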

Adding New Benchmarks

The benchmark framework is extensible. To add new attack corpora:
```typescript
const NEW_ATTACKS: AttackPrompt[] = [
  {
    id: 'CUSTOM-001',
    owasp: 'LLM01',
    category: 'new-technique',
    action: 'chat.create',
    prompt: 'Your adversarial prompt here',
    description: 'What this attack tries to achieve',
  },
];
```
We welcome contributions of new attack patterns. If you find a prompt that bypasses TealTiger’s defenses, please open an issue on GitHub.