Multi-Tenant Gateway Architecture

One gateway, unlimited APIs. Built for scale, designed for simplicity.

Evolution to Gateway-Only

As of August 2025, we've consolidated to a true gateway-only architecture. There are no more individual wrapper services: the gateway handles ALL MCP services through multi-tenant routing based on the Host header, so you can add 100+ services with nothing but configuration files.

System Architecture

┌──────────────────────────────────────────────┐
│         Client (Claude.ai, ChatGPT)           │
└────────────────┬─────────────────────────────┘
                 │ Calls your-api-mcp.mcpify.org
                 ↓
┌──────────────────────────────────────────────┐
│              Nginx (Wildcard)                 │
│  Routes ALL *-mcp.mcpify.org → Gateway:8080   │
└────────────────┬─────────────────────────────┘
                 │ Preserves Host header
                 ↓
┌──────────────────────────────────────────────┐
│         MCP Gateway Service (Port 8080)       │
│                                                │
│  • Parses service from Host header            │
│  • Handles MCP protocol for ALL services      │
│  • Token counting (90%+ cache hit rate)       │
│  • OAuth vault with auto-refresh              │
│  • Cross-service cache sharing                │
│  • Global rate limiting                       │
└────────────────┬─────────────────────────────┘
                 │ Loads config from storage
                 ↓
┌──────────────────────────────────────────────┐
│    Tool Configurations (Cloud Storage)        │
│                                                │
│  • hubspot.json    • stripe.json              │
│  • slack.json      • YOUR-API.json            │
└──────────────────────────────────────────────┘

Why Gateway-First?

The Numbers Don't Lie

When calling external APIs:

  • Your MCP → Gateway: ~10ms (same region)
  • Your MCP → External API: 200-800ms

Gateway overhead is less than 5% of total latency!
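The overhead figure is simple arithmetic: the gateway hop divided by the total round trip. Here is a quick check using the ~10ms hop and the 200-800ms external API range quoted above. The function name is illustrative only.

```python
def gateway_overhead_pct(gateway_ms: float, upstream_ms: float) -> float:
    """Share of end-to-end latency spent on the gateway hop, in percent."""
    total_ms = gateway_ms + upstream_ms
    return 100.0 * gateway_ms / total_ms


# Fastest and slowest upstream cases from the figures above (~10ms hop).
for upstream_ms in (200, 800):
    pct = gateway_overhead_pct(10, upstream_ms)
    print(f"{upstream_ms}ms upstream -> {pct:.1f}% overhead")
```

Even against the fastest upstream call the hop stays under 5% (about 4.8%), and at 800ms it drops to about 1.2%.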

Performance Characteristics

| Operation | Library (Local) | Gateway | Impact |
|-----------|----------------|---------|--------|
| Token Count (cached) | 1ms | 10ms | +9ms |
| Token Count (uncached) | 50ms | 15ms | -35ms ✅ |
| Truncation | 10ms | 15ms | +5ms |
| Cache Check | 1ms | 10ms | +9ms |
| **Total per API call** | 62ms | 50ms | **-12ms** ✅ |

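Gateway is actually FASTER for real workloads: in the per-call totals above it saves 12ms, driven by 35ms-faster uncached token counting, and its 90%+ cache hit rate and cross-service cache sharing add further benefit.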

The 90/10 Rule

90% of functionality runs locally; the other 10% uses the gateway for compute-intensive or shared operations.

┌─────────────────────────────────────────────┐
│           Your MCP Service                   │
├─────────────────────────────────────────────┤
│   90% Local Library  │  10% Gateway (opt)    │
├─────────────────────┼───────────────────────┤
│ • Tool definitions   │ • Heavy token ops     │
│ • Basic validation   │ • OAuth vault         │
│ • Local caching      │ • Cross-service cache │
│ • Field filtering    │ • Analytics           │
│ • Error handling     │ • Global rate limits  │
└─────────────────────┴───────────────────────┘
         ↓                      ↓
   Local Execution      Network Call to Gateway
    (No overhead)         (When beneficial)
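A minimal sketch of the 90/10 split, assuming a hypothetical `gateway_count` callable for the optional network path. Cache lookups and small payloads stay in-process, and only large payloads (the "heavy token ops" in the right column) go to the gateway, and only when one is configured. The size threshold and class name are illustrative assumptions, not documented settings.

```python
from typing import Callable, Optional

# Hypothetical threshold: payloads larger than this count as "heavy token ops".
HEAVY_PAYLOAD_CHARS = 50_000


class HybridClient:
    """Handles cheap work in-process and sends only heavy token counts to the gateway."""

    def __init__(self, gateway_count: Optional[Callable[[str], int]] = None):
        self._gateway_count = gateway_count  # None = library-only mode (gateway is optional)
        self._local_cache: dict[str, int] = {}

    @staticmethod
    def _local_count(text: str) -> int:
        # Stand-in estimate (whitespace-separated words), not a real tokenizer.
        return len(text.split())

    def count_tokens(self, text: str) -> int:
        if text in self._local_cache:  # local cache hit: no network overhead
            return self._local_cache[text]
        if self._gateway_count is not None and len(text) > HEAVY_PAYLOAD_CHARS:
            result = self._gateway_count(text)  # network call, only when beneficial
        else:
            result = self._local_count(text)
        self._local_cache[text] = result
        return result
```

Passing no `gateway_count` keeps everything local, which mirrors the "(opt)" label on the gateway column.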

Core Services

🧮 Token Intelligence

  • Centralized token counting
  • 90%+ cache hit rate in production
  • Uses tiktoken for precise counts
  • ~5000 counts/second with caching
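A minimal sketch of cached counting: counts are keyed by a hash of the text, so repeated payloads skip the tokenizer entirely. The encoder is injected. With tiktoken installed you could pass `tiktoken.get_encoding("cl100k_base").encode`, while the demo below uses `str.split` purely as a stand-in. The class name and cache layout are assumptions, not the gateway's actual code.

```python
import hashlib
from typing import Callable, Sequence


class CachedTokenCounter:
    """Counts tokens with an injected encoder and memoizes results by content hash."""

    def __init__(self, encode: Callable[[str], Sequence]):
        self._encode = encode
        self._cache: dict[str, int] = {}
        self.hits = 0
        self.misses = 0

    def count(self, text: str) -> int:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = self._cache.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        n = len(self._encode(text))  # the expensive tokenizer call happens only on a miss
        self._cache[key] = n
        return n

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


counter = CachedTokenCounter(str.split)  # stand-in tokenizer for the demo
for _ in range(10):
    counter.count("the same prompt text")
print(counter.count("the same prompt text"), f"{counter.hit_rate:.0%}")
```

Hashing the text keeps cache keys a fixed size regardless of payload length, which matters if the dict is later swapped for a shared store.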

💾 Smart Caching

  • Cross-service response caching
  • Field filtering support
  • Cache full, serve partial (store complete responses, return only the requested fields)
  • Redis-backed with TTL control
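"Cache full, serve partial" can be sketched as follows: store the complete upstream response once, then apply each caller's field filter on read, so callers asking for different fields share one cache entry. This in-memory dict with expiry timestamps stands in for Redis (with redis-py, the write would typically be `setex(key, ttl_seconds, payload)`). The class and method names are hypothetical.

```python
import json
import time
from typing import Any, Callable, Optional


class FieldFilterCache:
    """Caches full responses with a TTL and returns only the requested fields."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}  # key -> (expires_at, JSON payload)

    def put(self, key: str, full_response: dict[str, Any]) -> None:
        self._store[key] = (self._clock() + self._ttl, json.dumps(full_response))

    def get(self, key: str, fields: Optional[list[str]] = None) -> Optional[dict[str, Any]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a Redis TTL eviction
            return None
        full = json.loads(payload)
        if fields is None:
            return full
        return {f: full[f] for f in fields if f in full}  # serve partial
```

Because filtering happens on read, a later request for more fields is still a cache hit rather than a fresh upstream call.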

🔐 OAuth Vault

  • Secure token storage
  • Automatic refresh handling
  • Multi-tenant isolation
  • Encrypted at rest (Fernet)
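A minimal sketch of the vault's behavior, under stated assumptions: tokens are kept only as ciphertext, keyed by a hypothetical `(tenant, service)` pair for isolation, and refreshed shortly before expiry. The cipher is duck-typed to the `encrypt`/`decrypt` shape of `cryptography.fernet.Fernet`. `refresh_fn` is a hypothetical hook that would call the provider's token endpoint, and the 60-second skew is illustrative.

```python
import json
import time
from typing import Callable, Protocol


class Cipher(Protocol):
    """Matches the encrypt/decrypt shape of cryptography.fernet.Fernet."""
    def encrypt(self, data: bytes) -> bytes: ...
    def decrypt(self, token: bytes) -> bytes: ...


# Hypothetical hook: refresh_fn(refresh_token) -> {"access_token", "refresh_token", "expires_at"}
RefreshFn = Callable[[str], dict]


class TokenVault:
    """Stores encrypted OAuth tokens per (tenant, service) and refreshes them near expiry."""

    def __init__(self, cipher: Cipher, refresh_fn: RefreshFn,
                 skew_seconds: float = 60, clock: Callable[[], float] = time.time):
        self._cipher = cipher
        self._refresh = refresh_fn
        self._skew = skew_seconds
        self._clock = clock
        self._blobs: dict[tuple[str, str], bytes] = {}  # only ciphertext is kept

    def store(self, tenant: str, service: str, tokens: dict) -> None:
        raw = json.dumps(tokens).encode("utf-8")
        self._blobs[(tenant, service)] = self._cipher.encrypt(raw)

    def access_token(self, tenant: str, service: str) -> str:
        blob = self._blobs[(tenant, service)]  # KeyError: this tenant never connected
        tokens = json.loads(self._cipher.decrypt(blob))
        if tokens["expires_at"] - self._skew <= self._clock():
            tokens = self._refresh(tokens["refresh_token"])  # refresh before it expires
            self.store(tenant, service, tokens)
        return tokens["access_token"]
```

With the `cryptography` package you would pass a real `Fernet` instance, for example `Fernet(key)` where `key` came from `Fernet.generate_key()`. That key has to be persisted securely, because tokens encrypted with a lost key cannot be decrypted.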

📊 Analytics

  • Usage tracking per service
  • Performance metrics
  • Cost optimization insights
  • Real-time dashboards
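A minimal in-memory sketch of per-service usage tracking: each call records its service, latency, and outcome, and a summary reports call count, error count, and median latency. The class and field names are hypothetical. A real deployment would presumably export these numbers to whatever backend feeds the dashboards rather than keep them in process.

```python
import statistics
from collections import defaultdict


class UsageTracker:
    """Aggregates call counts, errors, and latencies per service."""

    def __init__(self) -> None:
        self._latencies_ms: dict[str, list[float]] = defaultdict(list)
        self._errors: dict[str, int] = defaultdict(int)

    def record(self, service: str, latency_ms: float, ok: bool = True) -> None:
        self._latencies_ms[service].append(latency_ms)
        if not ok:
            self._errors[service] += 1

    def summary(self, service: str) -> dict:
        samples = self._latencies_ms.get(service, [])
        if not samples:
            return {"calls": 0, "errors": 0, "median_ms": 0.0}
        return {
            "calls": len(samples),
            "errors": self._errors[service],
            "median_ms": statistics.median(samples),  # robust to a few slow outliers
        }
```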

🚦 Rate Limiting

  • Global API protection
  • Multi-tier limits
  • Per-service overrides
  • Burst handling
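The source doesn't name the limiting algorithm. A token bucket is one common way to get all four properties, and it is swapped in here as an assumption. In this sketch a request must pass both a global bucket and its service's bucket (two tiers), services can override the default limit, and the bucket capacity sets the allowed burst. All names and parameters are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Limit:
    rate_per_sec: float  # sustained refill rate
    burst: int           # bucket capacity: how many requests may arrive at once


class RateLimiter:
    """Two-tier token bucket: one global bucket plus one bucket per service."""

    def __init__(self, global_limit: Limit, per_service: Limit,
                 overrides: Optional[dict[str, Limit]] = None,
                 clock: Callable[[], float] = time.monotonic):
        self._global_limit = global_limit
        self._per_service = per_service
        self._overrides = overrides or {}
        self._clock = clock
        self._global: Optional[tuple[float, float]] = None  # (tokens, last_update)
        self._services: dict[str, tuple[float, float]] = {}

    @staticmethod
    def _refill(state: Optional[tuple[float, float]], limit: Limit, now: float) -> float:
        if state is None:
            return float(limit.burst)  # new buckets start full
        tokens, last = state
        return min(float(limit.burst), tokens + (now - last) * limit.rate_per_sec)

    def allow(self, service: str) -> bool:
        now = self._clock()
        limit = self._overrides.get(service, self._per_service)
        g = self._refill(self._global, self._global_limit, now)
        s = self._refill(self._services.get(service), limit, now)
        allowed = g >= 1.0 and s >= 1.0  # must pass BOTH tiers
        if allowed:
            g, s = g - 1.0, s - 1.0
        self._global = (g, now)
        self._services[service] = (s, now)
        return allowed
```

Checking both buckets before consuming from either means a request rejected by one tier never spends capacity in the other.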

📄 Pagination

  • Consistent pagination handling
  • Data snapshots
  • Prevents mixing data versions across pages
  • Session management
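One way to get snapshot-based pagination, sketched under assumptions: the first request freezes the full result set under a session id, and every later page is sliced from that frozen copy. Pages therefore never mix data versions, even if the upstream data changes mid-iteration. The `session:offset` cursor format and the class name are hypothetical.

```python
import uuid
from typing import Any, Callable, Optional


class SnapshotPaginator:
    """Serves every page of a query from one frozen snapshot, addressed by a cursor."""

    def __init__(self, fetch_all: Callable[[], list[Any]], page_size: int):
        self._fetch_all = fetch_all
        self._page_size = page_size
        self._sessions: dict[str, list[Any]] = {}  # session id -> frozen snapshot

    def start(self) -> tuple[list[Any], Optional[str]]:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = list(self._fetch_all())  # snapshot taken once
        return self.next_page(f"{session_id}:0")

    def next_page(self, cursor: str) -> tuple[list[Any], Optional[str]]:
        session_id, offset_str = cursor.rsplit(":", 1)
        offset = int(offset_str)
        snapshot = self._sessions[session_id]
        items = snapshot[offset:offset + self._page_size]
        end = offset + len(items)
        if end >= len(snapshot):
            del self._sessions[session_id]  # last page served: release the session
            return items, None
        return items, f"{session_id}:{end}"
```

A production version would also expire abandoned sessions, for example with a TTL, so snapshots of half-read result sets don't accumulate.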

Multi-Tenant Routing

How It Works

  1. Request arrives at a subdomain: hubspot-mcp.mcpify.org
  2. Nginx routes ALL *-mcp domains to the gateway → gateway:8080
  3. Gateway parses the service from the Host header → service = "hubspot"
  4. Loads configuration from storage → configs/hubspot.json
  5. Handles the MCP protocol for that service → returns tools, executes calls
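Steps 3 and 4 can be sketched as a Host-header parse followed by a config lookup. This is a minimal sketch that assumes service names are lowercase letters, digits, and dashes, and it uses a local `configs/` directory as a stand-in for cloud storage. The function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Optional

# "<service>-mcp.mcpify.org", optionally with a port (Nginx preserves the Host header).
HOST_PATTERN = re.compile(r"^(?P<service>[a-z0-9-]+)-mcp\.mcpify\.org(?::\d+)?$")


def service_from_host(host: str) -> Optional[str]:
    """Return the service encoded in the Host header, or None for non-MCP hosts."""
    match = HOST_PATTERN.match(host.strip().lower())
    return match.group("service") if match else None


def config_path(service: str, root: Path = Path("configs")) -> Path:
    # The restricted character class above keeps "/" and ".." out of this path.
    return root / f"{service}.json"


def load_tool_config(host: str, root: Path = Path("configs")) -> Optional[dict]:
    service = service_from_host(host)
    if service is None:
        return None  # not an MCP subdomain: reject the request upstream
    return json.loads(config_path(service, root).read_text(encoding="utf-8"))
```

Validating the service name before building the path matters here, because the Host header is client-controlled input.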

Resource Usage

  • Memory: ~500MB with full caches
  • CPU: 0.5-1 cores under normal load
  • Redis: ~100MB for typical usage

Response Times

  • Cached responses: <10ms
  • Token counting (cached): <50ms
  • Token counting (uncached): <200ms
  • Truncation (1000 items): <200ms

Benefits at Scale

For 20-30 Wrappers

Development Time

30 wrappers × 50 lines = 1,500 lines, vs 15,000 lines without the gateway

Bug Fixes

Fix once in the gateway, vs in 30 different places

Monitoring

1 dashboard, vs 30 separate dashboards

Updates

A single gateway deployment; all wrappers benefit instantly

Security

API Key Authentication

All endpoints except /health require an API key.
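A minimal sketch of that rule as a request check. The `X-API-Key` header name and the in-memory key set are assumptions, not documented details. Keys are compared with `hmac.compare_digest`, which avoids leaking key contents through comparison timing.

```python
import hmac
from typing import Mapping

PUBLIC_PATHS = {"/health"}  # the only unauthenticated endpoint


def is_authorized(path: str, headers: Mapping[str, str], valid_keys: set[str]) -> bool:
    """Allow /health unconditionally; every other path needs a known API key."""
    if path in PUBLIC_PATHS:
        return True
    presented = headers.get("X-API-Key")  # hypothetical header name
    if not presented:
        return False
    # Compare against every key (no short-circuit), each in constant time.
    matches = [hmac.compare_digest(presented.encode(), key.encode()) for key in valid_keys]
    return any(matches)
```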

OAuth Token Encryption

Tokens are encrypted at rest using Fernet.

CORS Protection

Only trusted origins are allowed.

Service Isolation

Multi-tenant, with namespace separation between services.

Minimal Permissions

The service account follows the principle of least privilege.