Guardrail Internals

TealTiger ships three built-in guardrails. This page explains exactly how each one works — what techniques they use, what they call, and what they don’t.

TealTiger guardrails are deterministic by default. The same input produces the same result, unless you opt into API-based detection (content moderation with OpenAI).

Detection Techniques at a Glance

Guardrail	Technique	External API Calls	Local ML Model	Deterministic
PII Detection	Regex pattern matching	None	No	Yes
Prompt Injection	Multi-category regex + confidence scoring	None	No	Yes
Content Moderation	OpenAI Moderation API + local regex fallback	Optional (OpenAI)	No	Yes (local) / No (API)

PII Detection

Technique: Pre-compiled regular expressions. PII detection is entirely local. No data leaves your process. It scans text against a set of regex patterns for common PII types.

Detected Types

PII Type	Pattern	Risk Score
Email	Standard email format (`user@domain.tld`)	30
Phone	US/international formats with optional country code	40
SSN	`XXX-XX-XXXX` format	90
Credit Card	16-digit with optional spaces/dashes	95
Name	Two consecutive capitalized words (basic heuristic)	20

How It Works

Text is extracted from the input (handles strings, prompt objects, and message arrays)
Each enabled pattern runs against the text using pre-compiled regex with global matching
Matches are collected with position, length, and type metadata
Risk score is the maximum score across all detected PII types

Performance Optimizations

Patterns compiled once at construction, reused across calls
LRU pattern cache (up to 100 entries) for repeated text
Early exit for text shorter than 3 characters
Configurable via detectTypes, action, riskScores

What It Does NOT Do

No named entity recognition (NER) or ML-based detection
No external API calls — all processing is in-process
Name detection is a basic heuristic (two capitalized words) and will produce false positives
Does not detect PII in non-Latin scripts

import { PIIDetectionGuardrail } from 'tealtiger';

const pii = new PIIDetectionGuardrail({
  detectTypes: ['email', 'phone', 'ssn', 'creditCard'],
  action: 'redact',  // block | redact | mask | allow
});

from tealtiger.guardrails import PIIDetectionGuardrail

pii = PIIDetectionGuardrail({
    "detect_types": ["email", "phone", "ssn", "credit_card"],
    "action": "redact",
})

Prompt Injection Detection

Technique: Multi-category regex pattern matching with confidence scoring. Prompt injection detection is entirely local. It matches input text against categorized attack patterns and assigns a confidence score per detection.

Attack Categories

Category	What It Detects	Risk Score	Example Pattern
Instruction Injection	”Ignore previous instructions”	90	`ignore all previous instructions`
Role Playing	”You are now a…“	70	`pretend you are a hacker`
System Leakage	”Show me your system prompt”	95	`repeat your original instructions`
Jailbreak	DAN mode, developer mode	100	`do anything now`
Encoding Attacks	Base64/hex/unicode obfuscation	80	`decode the following base64`
Delimiter Manipulation	Injected system/user/assistant tags	75	`[SYSTEM] new instructions`

How It Works

Input text is scanned against all pattern categories
Each match produces a detection with type, matched text, and confidence score (0.7–0.98)
Sensitivity level controls the threshold:
- High: 1 match triggers detection
- Medium: 1 match triggers detection
- Low: 2+ matches required
Overall risk score is the maximum across all detections

What It Does NOT Do

No ML-based semantic analysis — it’s pattern matching only
No external API calls
Cannot detect novel injection techniques not covered by patterns
Encoding detection flags the presence of encoding keywords, not decoded payloads

import { PromptInjectionGuardrail } from 'tealtiger';

const injection = new PromptInjectionGuardrail({
  sensitivity: 'high',  // low | medium | high
  action: 'block',      // block | transform | allow
});

from tealtiger.guardrails import PromptInjectionGuardrail

injection = PromptInjectionGuardrail({
    "sensitivity": "high",
    "action": "block",
})

Content Moderation

Technique: Hybrid — OpenAI Moderation API (primary) with local regex fallback. This is the only guardrail that can make external API calls. When configured with an OpenAI API key, it sends text to the OpenAI Moderation endpoint. If the API is unavailable or no key is provided, it falls back to local pattern matching.

Detection Categories

Category	OpenAI API	Local Fallback	Default Threshold	Risk Score
Hate	✅	✅ (keyword regex)	0.5	70
Hate/Threatening	✅	—	0.5	90
Self-Harm	✅	✅ (keyword regex)	0.5	85
Sexual	✅	✅ (keyword regex)	0.5	60
Sexual/Minors	✅	—	0.3	100
Violence	✅	✅ (keyword regex)	0.5	70
Violence/Graphic	✅	—	0.5	85
Harassment	✅	✅ (keyword regex)	0.5	60
Harassment/Threatening	✅	—	0.5	80

How It Works

With OpenAI API (recommended for production):

Text is sent to https://api.openai.com/v1/moderations via HTTPS POST
OpenAI returns per-category scores (0.0–1.0) and flagged booleans
Scores are compared against configurable thresholds
Categories exceeding thresholds are flagged as violations

Local fallback (no API key or API failure):

Text is scanned against keyword-based regex patterns per category
Matches produce a binary flagged/not-flagged result (no confidence scores)
Fewer categories are covered (no threatening or graphic sub-categories)

Data Flow Considerations

When useOpenAI: true, input text is sent to OpenAI’s Moderation API. If you handle sensitive data and cannot send it to external services, set useOpenAI: false to use local-only detection.

Mode	Data Leaves Process?	Coverage	Accuracy
OpenAI API	Yes (to OpenAI)	9 categories	High (ML-based)
Local fallback	No	5 categories	Lower (keyword matching)

import { ContentModerationGuardrail } from 'tealtiger';

// With OpenAI API (higher accuracy)
const moderation = new ContentModerationGuardrail({
  apiKey: process.env.OPENAI_API_KEY,
  useOpenAI: true,
  action: 'block',
});

// Local-only (no external calls)
const localModeration = new ContentModerationGuardrail({
  useOpenAI: false,
  action: 'block',
});

from tealtiger.guardrails import ContentModerationGuardrail

# With OpenAI API
moderation = ContentModerationGuardrail({
    "api_key": os.environ["OPENAI_API_KEY"],
    "use_openai": True,
    "action": "block",
})

# Local-only
local_moderation = ContentModerationGuardrail({
    "use_openai": False,
    "action": "block",
})

Execution Architecture

All three guardrails run through the GuardrailEngine, which provides:

Parallel execution: Guardrails run concurrently by default (configurable)
Timeout handling: 5-second default per guardrail
Error isolation: One guardrail failure doesn’t block others (continueOnError: true)
Result aggregation: Combined pass/fail, maximum risk score, list of failed guardrails

TealGuard sits on top and adds:

Policy integration (optional TealEngine evaluation)
Result caching with LRU eviction
Decision mapping to reason codes (PII_DETECTED, PROMPT_INJECTION_DETECTED, HARMFUL_CONTENT_DETECTED)
Correlation ID propagation for audit trails

Input → TealGuard.check()
         ├── Cache lookup (if enabled)
         ├── GuardrailEngine.execute() [parallel]
         │    ├── PIIDetectionGuardrail
         │    ├── PromptInjectionGuardrail
         │    └── ContentModerationGuardrail
         ├── TealEngine.evaluate() [if policy-driven]
         └── Decision { action, reason_codes, risk_score, metadata }

Extending with Custom Guardrails

You can register custom guardrails that follow the same interface:

import { Guardrail, GuardrailResult } from 'tealtiger';

class MyCustomGuardrail extends Guardrail {
  async evaluate(input: any, context?: any): Promise<GuardrailResult> {
    // Your detection logic here
    return { passed: true, action: 'allow', reason: 'OK', metadata: {}, risk_score: 0 };
  }
}

guard.registerGuardrail(new MyCustomGuardrail({ name: 'MyCustom' }));

Summary

PII and prompt injection detection are fully local, deterministic, and regex-based
Content moderation optionally calls OpenAI’s Moderation API for higher accuracy
No embedded ML models — the SDK stays lightweight and predictable
All guardrails are configurable, extensible, and run in parallel with timeout protection

For policy-level controls that wrap these guardrails, see Policy Overview and Conditions & Actions.

Start Here

Concepts

Policy

Deployment

Operations

Integrations

API Reference

Cookbook

Architecture & Telemetry

Guides

Reference

Versions & Roadmap

About

Playground

Guardrail Internals: How Detection Works

Guardrail Internals

Detection Techniques at a Glance

PII Detection

Detected Types

How It Works

Performance Optimizations

What It Does NOT Do

Prompt Injection Detection

Attack Categories

How It Works

What It Does NOT Do

Content Moderation

Detection Categories

How It Works

Data Flow Considerations

Execution Architecture

Extending with Custom Guardrails

Summary

Start Here

Concepts

Policy

Deployment

Operations

Integrations

API Reference

Cookbook

Architecture & Telemetry

Guides

Reference

Versions & Roadmap

About

Playground

​Guardrail Internals

​Detection Techniques at a Glance

​PII Detection

​Detected Types

​How It Works

​Performance Optimizations

​What It Does NOT Do

​Prompt Injection Detection

​Attack Categories

​How It Works

​What It Does NOT Do

​Content Moderation

​Detection Categories

​How It Works

​Data Flow Considerations

​Execution Architecture

​Extending with Custom Guardrails

​Summary

Guardrail Internals

Detection Techniques at a Glance

PII Detection

Detected Types

How It Works

Performance Optimizations

What It Does NOT Do

Prompt Injection Detection

Attack Categories

How It Works

What It Does NOT Do

Content Moderation

Detection Categories

How It Works

Data Flow Considerations

Execution Architecture

Extending with Custom Guardrails

Summary