Optimize infrastructure and LLM API costs with TealTiger’s built-in cost optimization features. This guide covers provider selection, request batching, caching strategies, and cost tracking.

Why Cost Optimization Matters

Benefits:
  • Reduce LLM spend - save 40-60% on API costs
  • Prevent budget overruns - set limits and receive alerts
  • Track spending - per-user and per-feature cost attribution
  • Optimize performance - balance cost against latency
  • Forecast costs - predict future spending
Cost Drivers:
  • LLM API calls (tokens, models, providers)
  • Infrastructure (compute, memory, storage)
  • Data transfer (egress, cross-region)
  • Monitoring and logging
  • Secrets management

Quick Start

Enable Cost Tracking

import { TealOpenAI } from 'tealtiger';

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  costTracking: {
    enabled: true,
    budgetLimit: 1000, // $1000/month
    budgetWindow: 'monthly',
    alerts: ['team@example.com']
  }
});

// Get cost report
const report = await client.getCostReport({
  timeRange: 'last_30_days'
});

console.log(`Total cost: $${report.totalCost.toFixed(2)}`);
console.log(`Cost by provider:`, report.costByProvider);

Automatic Provider Selection

Cost-Optimized Client

import { TealCostOptimizer } from 'tealtiger';

const optimizer = new TealCostOptimizer({
  providers: {
    openai: { apiKey: process.env.OPENAI_API_KEY },
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
    gemini: { apiKey: process.env.GEMINI_API_KEY }
  },
  strategy: 'lowest_cost', // or 'lowest_latency', 'balanced'
  constraints: {
    maxCost: 0.01, // $0.01 per request
    maxLatency: 2000 // 2 seconds
  }
});

// Automatically selects cheapest provider
const response = await optimizer.chat({
  messages: [{ role: 'user', content: 'Hello!' }],
  model: 'gpt-4-equivalent' // Maps to cheapest equivalent
});

Provider Cost Comparison

// Compare costs before making request
const comparison = await optimizer.compareCosts({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  model: 'gpt-4-equivalent'
});

console.log('Cost comparison:');
comparison.providers.forEach(p => {
  console.log(`${p.provider}: $${p.estimatedCost.toFixed(4)} (${p.estimatedLatency}ms)`);
});

// Use recommended provider
const response = await optimizer.chat({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  provider: comparison.recommended
});

Optimization Strategies

Lowest Cost:
  • Always selects cheapest provider
  • Best for batch processing
  • May have higher latency
Lowest Latency:
  • Selects fastest provider
  • Best for real-time applications
  • May have higher costs
Balanced:
  • Optimizes cost-latency tradeoff
  • Best for most applications
  • Configurable weights
const optimizer = new TealCostOptimizer({
  strategy: 'balanced',
  weights: {
    cost: 0.6,    // 60% weight on cost
    latency: 0.4  // 40% weight on latency
  }
});
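Under the hood, a balanced strategy can be reduced to a weighted score over normalized cost and latency. The sketch below is illustrative only (pickBalanced and Candidate are not part of the TealTiger API); it shows one plausible scoring rule, not the library's actual implementation.

```typescript
// One way a 'balanced' strategy might score providers. Lower is better.
// Cost (dollars) and latency (ms) are normalized against the worst
// observed value so the two dimensions are comparable.
interface Candidate { provider: string; cost: number; latency: number }

function pickBalanced(
  candidates: Candidate[],
  weights: { cost: number; latency: number }
): Candidate {
  const maxCost = Math.max(...candidates.map(c => c.cost));
  const maxLatency = Math.max(...candidates.map(c => c.latency));
  let best = candidates[0];
  let bestScore = Infinity;
  for (const c of candidates) {
    const score =
      weights.cost * (c.cost / maxCost) +
      weights.latency * (c.latency / maxLatency);
    if (score < bestScore) { bestScore = score; best = c; }
  }
  return best;
}
```

With the 60/40 weights above, a provider that is 5x cheaper but roughly 2x slower still wins, which matches the intent of favoring cost.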

Request Batching

Batch Multiple Requests

import { TealBatchProcessor } from 'tealtiger';

const batchProcessor = new TealBatchProcessor({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY,
  batchSize: 10,
  batchWindow: 1000 // 1 second
});

// Add requests to batch
const promises = [];
for (let i = 0; i < 100; i++) {
  promises.push(
    batchProcessor.add({
      model: 'gpt-4',
      messages: [{ role: 'user', content: `Request ${i}` }]
    })
  );
}

// Automatically batched and sent
const responses = await Promise.all(promises);

// Cost savings: ~30-40% compared to individual requests
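The grouping behind batchSize can be sketched as a plain chunking step; chunkRequests is an illustrative helper, not part of the TealTiger API:

```typescript
// Group queued requests into batches of at most `batchSize`,
// each of which would then be sent as a single API call.
function chunkRequests<T>(queue: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < queue.length; i += batchSize) {
    batches.push(queue.slice(i, i + batchSize));
  }
  return batches;
}
```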

Caching Strategies

Response Caching

import { TealOpenAI } from 'tealtiger';

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  cache: {
    enabled: true,
    ttl: 3600, // 1 hour
    maxSize: 1000, // 1000 entries
    strategy: 'lru' // Least Recently Used
  }
});

// First request - cache miss
const response1 = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'What is AI?' }]
});

// Second request - cache hit (no API call, no cost)
const response2 = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'What is AI?' }]
});
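The LRU policy configured above can be illustrated with a minimal cache; LruCache is a hypothetical sketch, not the library's internal implementation:

```typescript
// Minimal LRU cache keyed on the serialized request. JavaScript Maps
// iterate in insertion order, so the first key is the least recently used.
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.maxSize) {
      // Evict the least recently used entry.
      this.map.delete(this.map.keys().next().value as string);
    }
    this.map.set(key, value);
  }
}
```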

Semantic Caching

// Cache similar requests
const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  cache: {
    enabled: true,
    type: 'semantic',
    similarityThreshold: 0.95 // 95% similarity
  }
});

// These will be cached as similar:
// "What is AI?" and "What is artificial intelligence?"

Budget Management

Set Budget Limits

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  budgetLimit: 100, // $100
  budgetWindow: 'daily', // or 'hourly', 'weekly', 'monthly'
  budgetAction: 'block' // or 'warn', 'throttle'
});

// When budget exceeded:
// - 'block': Reject requests
// - 'warn': Log warning, continue
// - 'throttle': Rate limit requests
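The three actions can be sketched as a dispatch on current spend; checkBudget and its return shape are illustrative, not TealTiger's API:

```typescript
type BudgetAction = 'block' | 'warn' | 'throttle';

// Decide how to handle a request given spend so far in the window.
function checkBudget(
  spent: number,
  limit: number,
  action: BudgetAction
): { allowed: boolean; delayMs: number } {
  if (spent < limit) return { allowed: true, delayMs: 0 };
  switch (action) {
    case 'block':
      return { allowed: false, delayMs: 0 };       // reject outright
    case 'warn':
      console.warn(`Budget exceeded: $${spent} >= $${limit}`);
      return { allowed: true, delayMs: 0 };        // log and continue
    case 'throttle':
      return { allowed: true, delayMs: 1000 };     // slow requests down
  }
}
```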

Budget Alerts

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  budgetLimit: 1000,
  budgetAlerts: [
    { threshold: 0.5, action: 'email', recipients: ['team@example.com'] },
    { threshold: 0.8, action: 'slack', webhook: process.env.SLACK_WEBHOOK },
    { threshold: 0.95, action: 'pagerduty', key: process.env.PAGERDUTY_KEY }
  ]
});

Cost Attribution

Per-User Cost Tracking

const client = new TealOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  costTracking: {
    enabled: true,
    dimensions: ['user_id', 'feature', 'environment']
  }
});

// Track cost per user
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  metadata: {
    user_id: 'user123',
    feature: 'chatbot',
    environment: 'production'
  }
});

// Get cost report by user
const report = await client.getCostReport({
  groupBy: 'user_id',
  timeRange: 'last_7_days'
});

Cost Allocation Tags

// Tag requests for cost allocation
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  tags: {
    team: 'engineering',
    project: 'chatbot-v2',
    cost_center: 'R&D'
  }
});
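Rolling tagged request costs up for chargeback is a plain group-and-sum; costByTag below is an illustrative helper, not part of the client:

```typescript
interface CostRecord { cost: number; tags: Record<string, string> }

// Sum per-request costs grouped by one tag dimension (e.g. 'team').
function costByTag(records: CostRecord[], tag: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    const key = r.tags[tag] ?? 'untagged';
    totals[key] = (totals[key] ?? 0) + r.cost;
  }
  return totals;
}
```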

Infrastructure Cost Optimization

Serverless Optimization

# Lambda configuration
FunctionName: tealtiger-app
MemorySize: 512  # Right-size memory
Timeout: 30      # Reduce timeout
ReservedConcurrentExecutions: 10  # Limit concurrency

# Cost savings: ~40% compared to default settings

Container Optimization

# Multi-stage build for smaller images. Both stages use the same base
# (python:3.11-slim); mixing a glibc builder with an Alpine/musl runtime
# would break compiled packages copied between stages.
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app.py .

# Image size: ~150MB (vs ~800MB without optimization)

Kubernetes Resource Limits

resources:
  requests:
    cpu: 100m      # Right-size CPU
    memory: 256Mi  # Right-size memory
  limits:
    cpu: 500m
    memory: 512Mi

# Cost savings: ~50% compared to over-provisioning

Cost Monitoring

Real-Time Cost Dashboard

import { TealCostDashboard } from 'tealtiger';

const dashboard = new TealCostDashboard({
  refreshInterval: 60000, // 1 minute
  metrics: [
    'cost_per_hour',
    'cost_per_request',
    'cost_by_provider',
    'cost_by_model',
    'budget_utilization'
  ]
});

// Start dashboard server
dashboard.start(3000);
// Access at http://localhost:3000

Cost Reports

// Daily cost report
const dailyReport = await client.getCostReport({
  timeRange: 'today',
  groupBy: ['provider', 'model']
});

// Monthly cost report
const monthlyReport = await client.getCostReport({
  timeRange: 'this_month',
  groupBy: ['user_id', 'feature']
});

// Export to CSV
await monthlyReport.exportCSV('cost-report.csv');

Best Practices

  1. Set budget limits to prevent overruns
  2. Enable caching for repeated requests
  3. Use batch processing when possible
  4. Right-size infrastructure resources
  5. Monitor costs in real-time
  6. Use cost-optimized providers for non-critical workloads
  7. Implement rate limiting to control usage
  8. Track costs per user/feature for attribution
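Practice 7 (rate limiting) is often implemented as a token bucket, which caps request volume, and therefore spend, per unit time. The sketch below is a generic, self-contained example, not a TealTiger feature:

```typescript
// Token bucket: holds up to `capacity` tokens, refilled at
// `refillPerSec`. Each request consumes one token; when the bucket
// is empty, the request is rejected (or queued by the caller).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryAcquire(now = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}
```

Passing `now` explicitly makes the bucket easy to test; in production you would use the `Date.now()` defaults.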

Cost Comparison

Provider Pricing (per 1M tokens)

Provider    Model             Input    Output   Blended (1:1)
OpenAI      GPT-4             $30.00   $60.00   $45.00
OpenAI      GPT-3.5           $0.50    $1.50    $1.00
Anthropic   Claude 3 Opus     $15.00   $75.00   $45.00
Anthropic   Claude 3 Sonnet   $3.00    $15.00   $9.00
Google      Gemini Pro        $0.50    $1.50    $1.00
AWS         Bedrock Claude    $15.00   $75.00   $45.00
Cohere      Command           $1.00    $2.00    $1.50
Mistral     Large             $4.00    $12.00   $8.00
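Reading the table: a request's cost is its token count divided by one million, times the per-1M price, summed over input and output. For example, a GPT-4 call with 1,000 input and 500 output tokens costs 0.001 × $30 + 0.0005 × $60 = $0.06. As an illustrative helper (not part of the SDK):

```typescript
// Estimate the dollar cost of one request from token counts and
// per-1M-token prices (as listed in the table above).
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricePer1M: { input: number; output: number }
): number {
  return (inputTokens / 1_000_000) * pricePer1M.input +
         (outputTokens / 1_000_000) * pricePer1M.output;
}
```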

Infrastructure Pricing

Platform                Compute unit   Memory   Cost
AWS Lambda              GB-second      128 MB   $0.0000166667 per GB-s
Azure Functions         GB-second      128 MB   $0.000016 per GB-s
Google Cloud Functions  GB-second      128 MB   $0.0000025 per GB-s
Kubernetes (EKS)        1 vCPU         2 GB     $73 per month
Heroku                  1 dyno         512 MB   $25 per month
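Reading the serverless rows: monthly compute cost is GB-seconds consumed times the per-GB-second price (ignoring per-request charges and free tiers). For example, 1M invocations at 512 MB running 1 second each is 500,000 GB-s, roughly $8.33 on Lambda. As an illustrative helper:

```typescript
// Monthly serverless compute cost from memory size, average duration,
// and invocation volume. Excludes per-request fees and free tiers.
function lambdaMonthlyCost(
  memoryMb: number,
  avgDurationSec: number,
  invocationsPerMonth: number,
  pricePerGbSec = 0.0000166667 // AWS Lambda rate from the table above
): number {
  const gbSeconds = (memoryMb / 1024) * avgDurationSec * invocationsPerMonth;
  return gbSeconds * pricePerGbSec;
}
```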

Support