Caching Strategies

Intelligent caching that reduces costs and latency while maintaining freshness

MCPify's caching layer is designed specifically for AI agents, providing complete transparency about cache state, freshness, and invalidation options. This allows agents to make intelligent decisions about when to use cached data versus fetching fresh results.

Multi-Tier Cache Architecture

Edge Cache (CDN)

Globally distributed edge locations for static content and frequently accessed data

Cached Content

  • API schemas and metadata
  • Tool descriptions
  • Static configuration
  • Public responses

Configuration

  • TTL: 1-24 hours
  • 150+ edge locations
  • < 50ms global latency
  • Automatic purge on update

Application Cache (Redis)

High-performance in-memory cache for API responses and computed results

Cached Content

  • API responses
  • Aggregated data
  • Session state
  • Temporary results

Configuration

  • TTL: 1-60 minutes
  • 100GB+ memory
  • < 1ms latency
  • LRU eviction policy

Query Cache (Database)

Persistent cache for expensive queries and historical data

Cached Content

  • Complex query results
  • Historical snapshots
  • Materialized views
  • Pre-computed reports

Configuration

  • TTL: 1-7 days
  • Unlimited storage
  • < 10ms latency
  • Version control

Cache Transparency for AI Agents

Every cached response includes metadata that helps AI agents understand the cache state:

{
  "data": {
    // ... actual response data ...
  },
  "meta": {
    "cache": {
      "status": "hit",              // hit, miss, refresh
      "key": "api:crm:contacts:list:page1",
      "timestamp": "2025-08-23T10:30:00Z",
      "ttl": 300,                   // seconds remaining
      "maxAge": 3600,               // configured max age
      "stale": false,               // is data stale?
      "revalidating": false,        // background refresh?
      "source": "redis",            // edge, redis, database, origin
      "tags": ["contacts", "crm"]   // cache tags for invalidation
    },
    "performance": {
      "responseTime": 12,           // milliseconds
      "cacheTime": 2,               // time to fetch from cache
      "originTime": null            // time to fetch from origin (if miss)
    }
  }
}
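Given this metadata, an agent can decide locally whether a cached response is fresh enough to use. The helper below is a minimal sketch of that decision, assuming the `meta.cache` shape shown above; `isFreshEnough` is an illustrative name, not part of the MCPify SDK.

```javascript
// Decide whether a cached response satisfies an agent's freshness needs.
// `meta.cache` follows the metadata shape shown above; `maxAgeSeconds` is
// the oldest data the agent will accept (hypothetical helper, not SDK code).
function isFreshEnough(meta, maxAgeSeconds, now = Date.now()) {
  const cache = meta.cache;
  if (cache.status === "miss") return true;              // came straight from origin
  if (cache.stale && !cache.revalidating) return false;  // stale, no refresh in flight
  const ageSeconds = (now - Date.parse(cache.timestamp)) / 1000;
  return ageSeconds <= maxAgeSeconds;
}
```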

Agent Controls

Agents can control caching behavior per request:

  • cache: "no-cache" - Skip cache
  • cache: "force-cache" - Use cache if available
  • maxAge: 60 - Accept cache up to 60s old
  • staleWhileRevalidate: true - Use stale while refreshing
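One way to picture how these directives interact with a cache entry is a small policy resolver. This is a sketch under assumed names (`resolveCachePolicy`, `cacheEntry.storedAt` as a millisecond timestamp, and the string results), not the actual MCPify client API.

```javascript
// Interpret the per-request controls above against a cache entry.
function resolveCachePolicy(controls, cacheEntry, now = Date.now()) {
  if (controls.cache === "no-cache" || !cacheEntry) return "fetch-origin";
  if (controls.cache === "force-cache") return "serve-cache";
  const ageSeconds = (now - cacheEntry.storedAt) / 1000;
  if (controls.maxAge == null || ageSeconds <= controls.maxAge) {
    return "serve-cache";
  }
  return controls.staleWhileRevalidate
    ? "serve-stale-revalidate"   // answer now, refresh in the background
    : "fetch-origin";
}
```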

Invalidation Tools

Explicit cache invalidation capabilities:

  • Invalidate by key pattern
  • Invalidate by tags
  • Purge entire service cache
  • Webhook-triggered invalidation
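Key-pattern and tag invalidation can be sketched over an in-memory store, where each entry carries the `tags` array shown in the cache metadata above. The `Map`-backed store is a stand-in for Redis, and `invalidate` is an illustrative helper, not MCPify internals.

```javascript
// Purge entries whose key matches a regex pattern or whose tags intersect
// the given tag list; returns the number of entries removed.
function invalidate(store, { pattern, tags }) {
  const tagSet = new Set(tags ?? []);
  let purged = 0;
  for (const [key, entry] of store) {
    const keyMatch = pattern ? new RegExp(pattern).test(key) : false;
    const tagMatch = (entry.tags ?? []).some((t) => tagSet.has(t));
    if (keyMatch || tagMatch) {
      store.delete(key);  // deleting during Map iteration is safe in JS
      purged++;
    }
  }
  return purged;
}
```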

Caching Strategies

Time-Based Caching (TTL)

Simple time-to-live based expiration for predictable data:

{
  "caching": {
    "strategy": "ttl",
    "ttl": 3600,              // 1 hour
    "vary": ["user", "lang"], // Cache key variations
    "private": false          // Can be shared across users
  }
}

Best for: Reference data, catalog information, slowly changing content
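The TTL strategy above, including the `vary` fields that become part of the cache key, can be sketched as a tiny in-memory cache. This is illustrative only; MCPify's real TTL cache lives in Redis and the CDN.

```javascript
// Minimal TTL cache: entries expire after ttlSeconds, and the cache key
// varies by the listed context fields (e.g. ["user", "lang"]).
class TtlCache {
  constructor(ttlSeconds, varyFields = []) {
    this.ttlMs = ttlSeconds * 1000;
    this.varyFields = varyFields;
    this.entries = new Map();
  }
  key(base, context = {}) {
    const parts = this.varyFields.map((f) => `${f}=${context[f] ?? ""}`);
    return [base, ...parts].join("|");
  }
  set(base, context, value, now = Date.now()) {
    this.entries.set(this.key(base, context), { value, expires: now + this.ttlMs });
  }
  get(base, context, now = Date.now()) {
    const entry = this.entries.get(this.key(base, context));
    if (!entry || entry.expires <= now) return undefined;  // missing or expired
    return entry.value;
  }
}
```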

Event-Based Invalidation

Invalidate cache when specific events occur:

{
  "caching": {
    "strategy": "event",
    "ttl": 86400,            // 1 day max
    "invalidateOn": [
      "product.updated",
      "inventory.changed",
      "price.modified"
    ],
    "tags": ["products", "inventory"]
  }
}

Best for: Dynamic data with known change triggers, real-time systems
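Wiring the `invalidateOn` events above to tag purges might look like the following. `purgeByTag` stands in for a call into the cache layer, and the helper names are illustrative, not MCPify's actual event bus API.

```javascript
// Build a handler that, on any watched event, purges all of the
// configured cache tags; returns whether the event triggered a purge.
function makeEventInvalidator(config, purgeByTag) {
  const watched = new Set(config.invalidateOn);
  return function onEvent(eventName) {
    if (!watched.has(eventName)) return false;
    for (const tag of config.tags) purgeByTag(tag);
    return true;
  };
}
```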

Adaptive Caching

AI-driven cache strategy based on access patterns:

{
  "caching": {
    "strategy": "adaptive",
    "minTTL": 60,           // Minimum 1 minute
    "maxTTL": 3600,         // Maximum 1 hour
    "factors": {
      "hitRate": 0.3,     // Weight for hit rate
      "cost": 0.4,        // Weight for computation cost
      "frequency": 0.3    // Weight for access frequency
    }
  }
}

Best for: Unpredictable access patterns, cost optimization
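One plausible reading of the adaptive configuration above is a weighted blend of the factors (each normalized to 0..1) scaled between `minTTL` and `maxTTL`. The formula below is an assumption for illustration, not MCPify's actual algorithm.

```javascript
// Combine the weighted factors into a 0..1 score, then interpolate the
// TTL between minTTL and maxTTL (score is clamped before scaling).
function adaptiveTtl(metrics, config) {
  const { factors, minTTL, maxTTL } = config;
  const score =
    factors.hitRate * metrics.hitRate +
    factors.cost * metrics.cost +
    factors.frequency * metrics.frequency;
  const clamped = Math.min(1, Math.max(0, score));
  return Math.round(minTTL + (maxTTL - minTTL) * clamped);
}
```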

Performance Impact

Typical Performance Gains

  • Response time reduction: 95%
  • API cost reduction: 85%
  • Token usage optimization: 60%
  • Average cache hit rate: 78%

Configuration Examples

Global Cache Configuration

// mcpify.config.js
export default {
  caching: {
    enabled: true,
    defaultTTL: 300,        // 5 minutes default
    maxTTL: 86400,         // 24 hours maximum
    
    // Redis configuration
    redis: {
      cluster: true,
      nodes: ["redis-1:6379", "redis-2:6379"],
      password: process.env.REDIS_PASSWORD,
      db: 0
    },
    
    // Edge cache configuration
    cdn: {
      provider: "cloudflare",
      zones: ["us", "eu", "asia"],
      purgeApi: process.env.CDN_PURGE_API
    },
    
    // Cache key generation
    keyStrategy: {
      includeHeaders: ["Authorization", "X-Tenant-ID"],
      includeQuery: true,
      hash: "sha256"
    },
    
    // Performance monitoring
    monitoring: {
      trackHitRate: true,
      trackLatency: true,
      exportMetrics: true
    }
  }
}

Per-Endpoint Cache Rules

{
  "endpoints": {
    "/api/products": {
      "cache": {
        "ttl": 3600,
        "tags": ["products"],
        "vary": ["category", "sort"]
      }
    },
    "/api/user/profile": {
      "cache": {
        "ttl": 300,
        "private": true,
        "vary": ["user_id"]
      }
    },
    "/api/realtime/stock": {
      "cache": {
        "enabled": false  // No caching for real-time data
      }
    }
  }
}
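Resolving these per-endpoint rules at request time reduces to a lookup that also honors `"enabled": false`. A minimal sketch, assuming exact-path matching (the real router may support patterns):

```javascript
// Return the cache rule for a path, or null when the endpoint is unknown
// or caching is explicitly disabled for it.
function cacheRuleFor(endpoints, path) {
  const rule = endpoints[path]?.cache;
  if (!rule || rule.enabled === false) return null;  // bypass the cache
  return rule;
}
```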

Caching Best Practices

✅ Do's

  • Set appropriate TTLs based on data volatility
  • Use cache tags for granular invalidation
  • Monitor cache hit rates and adjust
  • Implement cache warming for critical data
  • Use stale-while-revalidate for better UX
  • Respect cache headers from origin

❌ Don'ts

  • Don't cache sensitive or personal data
  • Don't use excessive TTLs for dynamic data
  • Don't ignore cache invalidation events
  • Don't cache error responses
  • Don't forget to vary cache by user context
  • Don't disable caching without measurement

Cache Monitoring

MCPify provides comprehensive cache analytics to optimize your caching strategy:

Hit Rate Analysis

Track cache effectiveness per endpoint

TTL Optimization

AI-recommended TTL adjustments

Invalidation Tracking

Monitor cache purge patterns
