Optimize infrastructure and LLM API costs with TealTiger’s built-in cost optimization features. This guide covers provider selection, request batching, caching strategies, and cost tracking.

Why Cost Optimization Matters

Benefits:
  • Reduce LLM spend - save 40-60% on API costs
  • Prevent budget overruns - set limits and receive alerts
  • Track spending - per-user and per-feature cost attribution
  • Optimize performance - balance cost against latency
  • Forecast costs - predict future spending
Cost Drivers:
  • LLM API calls (tokens, models, providers)
  • Infrastructure (compute, memory, storage)
  • Data transfer (egress, cross-region)
  • Monitoring and logging
  • Secrets management

Quick Start

Enable Cost Tracking

import { TealOpenAI } from 'tealtiger';

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  costTracking: {
    enabled: true,
    budgetLimit: 1000, // $1000/month
    budgetWindow: 'monthly',
    alerts: ['team@example.com']
  }
});

// Get cost report
const report = await client.getCostReport({
  timeRange: 'last_30_days'
});

console.log(`Total cost: $${report.totalCost.toFixed(2)}`);
console.log(`Cost by provider:`, report.costByProvider);

Automatic Provider Selection

Cost-Optimized Client

import { TealCostOptimizer } from 'tealtiger';

const optimizer = new TealCostOptimizer({
  providers: {
    openai: { apiKey: process.env.OPENAI_API_KEY },
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
    gemini: { apiKey: process.env.GEMINI_API_KEY }
  },
  strategy: 'lowest_cost', // or 'lowest_latency', 'balanced'
  constraints: {
    maxCost: 0.01, // $0.01 per request
    maxLatency: 2000 // 2 seconds
  }
});

// Automatically selects cheapest provider
const response = await optimizer.chat({
  messages: [{ role: 'user', content: 'Hello!' }],
  model: 'gpt-4-equivalent' // Maps to cheapest equivalent
});

Provider Cost Comparison

// Compare costs before making request
const comparison = await optimizer.compareCosts({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  model: 'gpt-4-equivalent'
});

console.log('Cost comparison:');
comparison.providers.forEach(p => {
  console.log(`${p.provider}: $${p.estimatedCost.toFixed(4)} (${p.estimatedLatency}ms)`);
});

// Use recommended provider
const response = await optimizer.chat({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  provider: comparison.recommended
});

Optimization Strategies

Lowest Cost:
  • Always selects cheapest provider
  • Best for batch processing
  • May have higher latency
Lowest Latency:
  • Selects fastest provider
  • Best for real-time applications
  • May have higher costs
Balanced:
  • Optimizes cost-latency tradeoff
  • Best for most applications
  • Configurable weights
const optimizer = new TealCostOptimizer({
  strategy: 'balanced',
  weights: {
    cost: 0.6,    // 60% weight on cost
    latency: 0.4  // 40% weight on latency
  }
});
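Under the hood, a balanced strategy can be reduced to a weighted score over normalized cost and latency. The sketch below is illustrative only (pickBalanced and Candidate are not part of the TealTiger API); it shows one plausible scoring rule, not the library's actual implementation.

```typescript
// One way a 'balanced' strategy might score providers. Lower is better.
// Cost (dollars) and latency (ms) are normalized against the worst
// observed value so the two dimensions are comparable.
interface Candidate { provider: string; cost: number; latency: number }

function pickBalanced(
  candidates: Candidate[],
  weights: { cost: number; latency: number }
): Candidate {
  const maxCost = Math.max(...candidates.map(c => c.cost));
  const maxLatency = Math.max(...candidates.map(c => c.latency));
  let best = candidates[0];
  let bestScore = Infinity;
  for (const c of candidates) {
    const score =
      weights.cost * (c.cost / maxCost) +
      weights.latency * (c.latency / maxLatency);
    if (score < bestScore) { bestScore = score; best = c; }
  }
  return best;
}
```

With the 60/40 weights above, a provider that is 5x cheaper but roughly 2x slower still wins, which matches the intent of favoring cost.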

Request Batching

Batch Multiple Requests

import { TealBatchProcessor } from 'tealtiger';

const batchProcessor = new TealBatchProcessor({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  batchSize: 10,
  batchWindow: 1000 // 1 second
});

// Add requests to batch
const promises = [];
for (let i = 0; i < 100; i++) {
  promises.push(
    batchProcessor.add({
      model: 'gpt-4',
      messages: [{ role: 'user', content: `Request ${i}` }]
    })
  );
}

// Automatically batched and sent
const responses = await Promise.all(promises);

// Cost savings: ~30-40% compared to individual requests
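The grouping behind batchSize can be sketched as a plain chunking step; chunkRequests is an illustrative helper, not part of the TealTiger API:

```typescript
// Group queued requests into batches of at most `batchSize`,
// each of which would then be sent as a single API call.
function chunkRequests<T>(queue: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < queue.length; i += batchSize) {
    batches.push(queue.slice(i, i + batchSize));
  }
  return batches;
}
```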

Caching Strategies

Response Caching

import { TealOpenAI } from 'tealtiger';

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  cache: {
    enabled: true,
    ttl: 3600, // 1 hour
    maxSize: 1000, // 1000 entries
    strategy: 'lru' // Least Recently Used
  }
});

// First request - cache miss
const response1 = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'What is AI?' }]
});

// Second request - cache hit (no API call, no cost)
const response2 = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'What is AI?' }]
});
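The LRU policy configured above can be illustrated with a minimal cache; LruCache is a hypothetical sketch, not the library's internal implementation:

```typescript
// Minimal LRU cache keyed on the serialized request. JavaScript Maps
// iterate in insertion order, so the first key is the least recently used.
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.maxSize) {
      // Evict the least recently used entry.
      this.map.delete(this.map.keys().next().value as string);
    }
    this.map.set(key, value);
  }
}
```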

Semantic Caching

// Cache similar requests
const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  cache: {
    enabled: true,
    type: 'semantic',
    similarityThreshold: 0.95 // 95% similarity
  }
});

// These will be cached as similar:
// "What is AI?" and "What is artificial intelligence?"

Budget Management

Set Budget Limits

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  budgetLimit: 100, // $100
  budgetWindow: 'daily', // or 'hourly', 'weekly', 'monthly'
  budgetAction: 'block' // or 'warn', 'throttle'
});

// When budget exceeded:
// - 'block': Reject requests
// - 'warn': Log warning, continue
// - 'throttle': Rate limit requests
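The three actions can be sketched as a dispatch on current spend; checkBudget and its return shape are illustrative, not TealTiger's API:

```typescript
type BudgetAction = 'block' | 'warn' | 'throttle';

// Decide how to handle a request given spend so far in the window.
function checkBudget(
  spent: number,
  limit: number,
  action: BudgetAction
): { allowed: boolean; delayMs: number } {
  if (spent < limit) return { allowed: true, delayMs: 0 };
  switch (action) {
    case 'block':
      return { allowed: false, delayMs: 0 };       // reject outright
    case 'warn':
      console.warn(`Budget exceeded: $${spent} >= $${limit}`);
      return { allowed: true, delayMs: 0 };        // log and continue
    case 'throttle':
      return { allowed: true, delayMs: 1000 };     // slow requests down
  }
}
```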

Budget Alerts

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  budgetLimit: 1000,
  budgetAlerts: [
    { threshold: 0.5, action: 'email', recipients: ['team@example.com'] },
    { threshold: 0.8, action: 'slack', webhook: process.env.SLACK_WEBHOOK },
    { threshold: 0.95, action: 'pagerduty', key: process.env.PAGERDUTY_KEY }
  ]
});

Cost Attribution

Per-User Cost Tracking

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  costTracking: {
    enabled: true,
    dimensions: ['user_id', 'feature', 'environment']
  }
});

// Track cost per user
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  metadata: {
    user_id: 'user123',
    feature: 'chatbot',
    environment: 'production'
  }
});

// Get cost report by user
const report = await client.getCostReport({
  groupBy: 'user_id',
  timeRange: 'last_7_days'
});

Cost Allocation Tags

// Tag requests for cost allocation
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  tags: {
    team: 'engineering',
    project: 'chatbot-v2',
    cost_center: 'R&D'
  }
});
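Rolling tagged request costs up for chargeback is a plain group-and-sum; costByTag below is an illustrative helper, not part of the client:

```typescript
interface CostRecord { cost: number; tags: Record<string, string> }

// Sum per-request costs grouped by one tag dimension (e.g. 'team').
function costByTag(records: CostRecord[], tag: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    const key = r.tags[tag] ?? 'untagged';
    totals[key] = (totals[key] ?? 0) + r.cost;
  }
  return totals;
}
```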

Infrastructure Cost Optimization

Serverless Optimization

# Lambda configuration
FunctionName: tealtiger-app
MemorySize: 512  # Right-size memory
Timeout: 30      # Reduce timeout
ReservedConcurrentExecutions: 10  # Limit concurrency

# Cost savings: ~40% compared to default settings

Container Optimization

# Multi-stage build for smaller images. Both stages use the same base
# (python:3.11-slim); mixing a glibc builder with an Alpine/musl runtime
# would break compiled packages copied between stages.
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app.py .

# Image size: ~150MB (vs ~800MB without optimization)

Kubernetes Resource Limits

resources:
  requests:
    cpu: 100m      # Right-size CPU
    memory: 256Mi  # Right-size memory
  limits:
    cpu: 500m
    memory: 512Mi

# Cost savings: ~50% compared to over-provisioning

Cost Monitoring

Real-Time Cost Dashboard

import { TealCostDashboard } from 'tealtiger';

const dashboard = new TealCostDashboard({
  refreshInterval: 60000, // 1 minute
  metrics: [
    'cost_per_hour',
    'cost_per_request',
    'cost_by_provider',
    'cost_by_model',
    'budget_utilization'
  ]
});

// Start dashboard server
dashboard.start(3000);
// Access at http://localhost:3000

Cost Reports

// Daily cost report
const dailyReport = await client.getCostReport({
  timeRange: 'today',
  groupBy: ['provider', 'model']
});

// Monthly cost report
const monthlyReport = await client.getCostReport({
  timeRange: 'this_month',
  groupBy: ['user_id', 'feature']
});

// Export to CSV
await monthlyReport.exportCSV('cost-report.csv');

Best Practices

  1. Set budget limits to prevent overruns
  2. Enable caching for repeated requests
  3. Use batch processing when possible
  4. Right-size infrastructure resources
  5. Monitor costs in real-time
  6. Use cost-optimized providers for non-critical workloads
  7. Implement rate limiting to control usage
  8. Track costs per user/feature for attribution
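Practice 7 (rate limiting) is often implemented as a token bucket, which caps request volume, and therefore spend, per unit time. The sketch below is a generic, self-contained example, not a TealTiger feature:

```typescript
// Token bucket: holds up to `capacity` tokens, refilled at
// `refillPerSec`. Each request consumes one token; when the bucket
// is empty, the request is rejected (or queued by the caller).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryAcquire(now = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}
```

Passing `now` explicitly makes the bucket easy to test; in production you would use the `Date.now()` defaults.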

Cost Comparison

Provider Pricing (per 1M tokens)

Provider    Model             Input    Output   Blended (1:1)
OpenAI      GPT-4             $30.00   $60.00   $45.00
OpenAI      GPT-3.5           $0.50    $1.50    $1.00
Anthropic   Claude 3 Opus     $15.00   $75.00   $45.00
Anthropic   Claude 3 Sonnet   $3.00    $15.00   $9.00
Google      Gemini Pro        $0.50    $1.50    $1.00
AWS         Bedrock Claude    $15.00   $75.00   $45.00
Cohere      Command           $1.00    $2.00    $1.50
Mistral     Large             $4.00    $12.00   $8.00
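Reading the table: a request's cost is its token count divided by one million, times the per-1M price, summed over input and output. For example, a GPT-4 call with 1,000 input and 500 output tokens costs 0.001 × $30 + 0.0005 × $60 = $0.06. As an illustrative helper (not part of the SDK):

```typescript
// Estimate the dollar cost of one request from token counts and
// per-1M-token prices (as listed in the table above).
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricePer1M: { input: number; output: number }
): number {
  return (inputTokens / 1_000_000) * pricePer1M.input +
         (outputTokens / 1_000_000) * pricePer1M.output;
}
```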

Infrastructure Pricing

Platform                Compute unit   Memory   Cost
AWS Lambda              GB-second      128 MB   $0.0000166667 per GB-s
Azure Functions         GB-second      128 MB   $0.000016 per GB-s
Google Cloud Functions  GB-second      128 MB   $0.0000025 per GB-s
Kubernetes (EKS)        1 vCPU         2 GB     $73 per month
Heroku                  1 dyno         512 MB   $25 per month
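Reading the serverless rows: monthly compute cost is GB-seconds consumed times the per-GB-second price (ignoring per-request charges and free tiers). For example, 1M invocations at 512 MB running 1 second each is 500,000 GB-s, roughly $8.33 on Lambda. As an illustrative helper:

```typescript
// Monthly serverless compute cost from memory size, average duration,
// and invocation volume. Excludes per-request fees and free tiers.
function lambdaMonthlyCost(
  memoryMb: number,
  avgDurationSec: number,
  invocationsPerMonth: number,
  pricePerGbSec = 0.0000166667 // AWS Lambda rate from the table above
): number {
  const gbSeconds = (memoryMb / 1024) * avgDurationSec * invocationsPerMonth;
  return gbSeconds * pricePerGbSec;
}
```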

Support