AIGoat OWASP LLM Top 10 Benchmark
AIGoat is an open-source AI security playground that provides hands-on attack labs covering the full OWASP Top 10 for LLM Applications. TealTiger’s benchmark extracts attack patterns from AIGoat’s labs and runs them through the full v1.1.0 stack.v1.1.0 Full Stack Results
27 attacks · 27 caught · 100% catch rate| OWASP Category | Attacks | Caught | Rate |
|---|---|---|---|
| LLM01 — Prompt Injection | 8 | 8 | 100% |
| LLM02 — Sensitive Information Disclosure | 4 | 4 | 100% |
| LLM05 — Improper Output Handling | 3 | 3 | 100% |
| LLM06 — Excessive Agency | 5 | 5 | 100% |
| LLM07 — System Prompt Leakage | 3 | 3 | 100% |
| LLM10 — Unbounded Consumption | 2 | 2 | 100% |
| Compound (multi-category) | 2 | 2 | 100% |
What Each Layer Catches
TealTiger v1.1.0 uses defense in depth. Different components handle different attack classes:| Attack Class | Caught By |
|---|---|
| Prompt injection, jailbreaks, DAN | TealGuard (PromptInjectionGuardrail) |
| PII in input (SSN, credit cards, emails) | TealGuard (PIIDetectionGuardrail) |
| XSS, SQL injection, command injection | TealEngine (CodeExecutionPolicy — blockedPatterns) |
| Forbidden tool calls (delete, drop, export) | TealEngine (ToolPolicy — allowed: false) |
| Privilege escalation, unauthorized actions | TealEngine (IdentityPolicy — forbidden actions) |
| Data exfiltration via queries | TealEngine (ToolPolicy — allowedTables, maxRows) |
| System prompt extraction | TealGuard (PromptInjectionGuardrail — systemLeakage patterns) |
| Token exhaustion, context flooding | TealEngine (BehavioralPolicy — rate limits, cost limits) |
Guardrails-Only Baseline (v0.2.2)
For comparison, running the same attack corpus through only the basic guardrails (without TealEngine policies) produces a lower catch rate. This demonstrates the value of the full v1.1.0 stack. 34 attacks · 18 caught · 52.9% catch rate| OWASP Category | Attacks | Caught | Rate | Gap |
|---|---|---|---|---|
| LLM01 — Prompt Injection | 10 | 7 | 70% | Indirect/obfuscated injection bypasses regex |
| LLM02 — Sensitive Info | 5 | 3 | 60% | Exfiltration requests without PII in input |
| LLM05 — Output Handling | 3 | 0 | 0% | No output-side enforcement in basic guardrails |
| LLM06 — Excessive Agency | 3 | 1 | 33% | No tool-call gating in basic guardrails |
| LLM07 — System Prompt Leak | 4 | 1 | 25% | Limited regex coverage |
| LLM10 — Unbounded Consumption | 3 | 0 | 0% | No budget/rate enforcement in basic guardrails |
| Compound | 6 | 6 | 100% | Compound attacks trigger multiple guardrails |
Attack Categories Tested
LLM01: Prompt Injection
- Direct instruction override (“ignore all previous instructions”)
- Context manipulation (delimiter injection, chat template injection)
- Indirect injection (foreign language translation tasks)
- Role-play jailbreaks (DAN, DUDE, persona assignment)
- Encoding attacks (base64-encoded payloads)
LLM02: Sensitive Information Disclosure
- PII in user input (SSN, credit cards, emails, phone numbers)
- Bulk data exfiltration requests
- Targeted PII extraction for specific users
- Training data completion attacks
LLM05: Improper Output Handling
- XSS payloads (
<script>tags,document.cookie) - SQL injection (
DROP TABLE,DELETE FROM) - OS command injection (
cat /etc/passwd, shell commands)
LLM06: Excessive Agency
- Unauthorized financial actions (large refunds without approval)
- Privilege escalation (admin access requests)
- Bulk data export to external destinations
- Destructive database operations (delete, drop)
LLM07: System Prompt Leakage
- Direct extraction (“print your system prompt”)
- Verbatim repetition attacks
- Social engineering via developer persona
LLM10: Unbounded Consumption
- Token exhaustion (extremely long output requests)
- Context window flooding (100K+ character inputs)
Running the Benchmarks
Both benchmark test suites are included in the TealTiger SDK:Methodology
- Attack prompts are inspired by AIGoat OWASP LLM Top 10 labs
- Each attack is run through the full TealTiger stack (TealGuard + TealEngine)
- A “catch” means at least one component returned a DENY decision
- Tests are deterministic and repeatable — no LLM inference involved
- Results are generated from automated test runs, not manual testing

