Red Team Benchmark Results

AIGoat OWASP LLM Top 10 Benchmark
v1.1.0 Full Stack Results
What Each Layer Catches
Guardrails-Only Baseline (v0.2.2)
Attack Categories Tested
LLM01: Prompt Injection
LLM02: Sensitive Information Disclosure
LLM05: Improper Output Handling
LLM06: Excessive Agency
LLM07: System Prompt Leakage
LLM10: Unbounded Consumption
Running the Benchmarks
Methodology
Adding New Benchmarks

TealTiger maintains an automated red team benchmark suite that tests guardrails and policy enforcement against real-world adversarial attack patterns. Results are generated from repeatable test runs in CI.

AIGoat OWASP LLM Top 10 Benchmark

AIGoat is an open-source AI security playground that provides hands-on attack labs covering the full OWASP Top 10 for LLM Applications. TealTiger’s benchmark extracts attack patterns from AIGoat’s labs and runs them through the full v1.1.0 stack.

v1.1.0 Full Stack Results

27 attacks · 27 caught · 100% catch rate

OWASP Category	Attacks	Caught	Rate
LLM01 — Prompt Injection	8	8	100%
LLM02 — Sensitive Information Disclosure	4	4	100%
LLM05 — Improper Output Handling	3	3	100%
LLM06 — Excessive Agency	5	5	100%
LLM07 — System Prompt Leakage	3	3	100%
LLM10 — Unbounded Consumption	2	2	100%
Compound (multi-category)	2	2	100%

What Each Layer Catches

TealTiger v1.1.0 uses defense in depth. Different components handle different attack classes:

Attack Class	Caught By
Prompt injection, jailbreaks, DAN	TealGuard (PromptInjectionGuardrail)
PII in input (SSN, credit cards, emails)	TealGuard (PIIDetectionGuardrail)
XSS, SQL injection, command injection	TealEngine (CodeExecutionPolicy — blockedPatterns)
Forbidden tool calls (delete, drop, export)	TealEngine (ToolPolicy — allowed: false)
Privilege escalation, unauthorized actions	TealEngine (IdentityPolicy — forbidden actions)
Data exfiltration via queries	TealEngine (ToolPolicy — allowedTables, maxRows)
System prompt extraction	TealGuard (PromptInjectionGuardrail — systemLeakage patterns)
Token exhaustion, context flooding	TealEngine (BehavioralPolicy — rate limits, cost limits)

Guardrails-Only Baseline (v0.2.2)

For comparison, running the same attack corpus through only the basic guardrails (without TealEngine policies) produces a lower catch rate. This demonstrates the value of the full v1.1.0 stack. 34 attacks · 18 caught · 52.9% catch rate

OWASP Category	Attacks	Caught	Rate	Gap
LLM01 — Prompt Injection	10	7	70%	Indirect/obfuscated injection bypasses regex
LLM02 — Sensitive Info	5	3	60%	Exfiltration requests without PII in input
LLM05 — Output Handling	3	0	0%	No output-side enforcement in basic guardrails
LLM06 — Excessive Agency	3	1	33%	No tool-call gating in basic guardrails
LLM07 — System Prompt Leak	4	1	25%	Limited regex coverage
LLM10 — Unbounded Consumption	3	0	0%	No budget/rate enforcement in basic guardrails
Compound	6	6	100%	Compound attacks trigger multiple guardrails

The jump from 52.9% to 100% is entirely due to TealEngine’s policy layer — tool policies, identity enforcement, code execution policies, and behavioral limits.

Attack Categories Tested

LLM01: Prompt Injection

Direct instruction override (“ignore all previous instructions”)
Context manipulation (delimiter injection, chat template injection)
Indirect injection (foreign language translation tasks)
Role-play jailbreaks (DAN, DUDE, persona assignment)
Encoding attacks (base64-encoded payloads)

LLM02: Sensitive Information Disclosure

PII in user input (SSN, credit cards, emails, phone numbers)
Bulk data exfiltration requests
Targeted PII extraction for specific users
Training data completion attacks

LLM05: Improper Output Handling

XSS payloads (<script> tags, document.cookie)
SQL injection (DROP TABLE, DELETE FROM)
OS command injection (cat /etc/passwd, shell commands)

LLM06: Excessive Agency

Unauthorized financial actions (large refunds without approval)
Privilege escalation (admin access requests)
Bulk data export to external destinations
Destructive database operations (delete, drop)

LLM07: System Prompt Leakage

Direct extraction (“print your system prompt”)
Verbatim repetition attacks
Social engineering via developer persona

LLM10: Unbounded Consumption

Token exhaustion (extremely long output requests)
Context window flooding (100K+ character inputs)

Running the Benchmarks

Both benchmark test suites are included in the TealTiger SDK:

# v1.1.0 full stack benchmark
npx jest aigoat-v110-benchmark.test.ts --verbose

# Guardrails-only baseline
npx jest aigoat-redteam-benchmark.test.ts --verbose

The tests run without external dependencies (no API keys, no Docker, no network calls) and complete in under 10 seconds.

Methodology

Attack prompts are inspired by AIGoat OWASP LLM Top 10 labs
Each attack is run through the full TealTiger stack (TealGuard + TealEngine)
A “catch” means at least one component returned a DENY decision
Tests are deterministic and repeatable — no LLM inference involved
Results are generated from automated test runs, not manual testing

Adding New Benchmarks

The benchmark framework is extensible. To add new attack corpora:

const NEW_ATTACKS: AttackPrompt[] = [
  {
    id: 'CUSTOM-001',
    owasp: 'LLM01',
    category: 'new-technique',
    action: 'chat.create',
    prompt: 'Your adversarial prompt here',
    description: 'What this attack tries to achieve',
  },
];

We welcome contributions of new attack patterns. If you find a prompt that bypasses TealTiger’s defenses, please open an issue on GitHub.

Governance Validation & Verification (v1.2.x)TypeScript SDK API Reference

Start Here

Concepts

Policy

Deployment

Operations

Integrations

AI Tools

Governance

API Reference

Cookbook

Architecture & Telemetry

Guides

Reference

Versions & Roadmap

About

Playground

Red Team Benchmark Results

AIGoat OWASP LLM Top 10 Benchmark

v1.1.0 Full Stack Results

What Each Layer Catches

Guardrails-Only Baseline (v0.2.2)

Attack Categories Tested

LLM01: Prompt Injection

LLM02: Sensitive Information Disclosure

LLM05: Improper Output Handling

LLM06: Excessive Agency

LLM07: System Prompt Leakage

LLM10: Unbounded Consumption

Running the Benchmarks

Methodology

Adding New Benchmarks

Start Here

Concepts

Policy

Deployment

Operations

Integrations

AI Tools

Governance

API Reference

Cookbook

Architecture & Telemetry

Guides

Reference

Versions & Roadmap

About

Playground

​AIGoat OWASP LLM Top 10 Benchmark

​v1.1.0 Full Stack Results

​What Each Layer Catches

​Guardrails-Only Baseline (v0.2.2)

​Attack Categories Tested

​LLM01: Prompt Injection

​LLM02: Sensitive Information Disclosure

​LLM05: Improper Output Handling

​LLM06: Excessive Agency

​LLM07: System Prompt Leakage

​LLM10: Unbounded Consumption

​Running the Benchmarks

​Methodology

​Adding New Benchmarks

AIGoat OWASP LLM Top 10 Benchmark

v1.1.0 Full Stack Results

What Each Layer Catches

Guardrails-Only Baseline (v0.2.2)

Attack Categories Tested

LLM01: Prompt Injection

LLM02: Sensitive Information Disclosure

LLM05: Improper Output Handling

LLM06: Excessive Agency

LLM07: System Prompt Leakage

LLM10: Unbounded Consumption

Running the Benchmarks

Methodology

Adding New Benchmarks