Multi-Tenant Gateway Architecture

One gateway, unlimited APIs. Built for scale, designed for simplicity.

Evolution to Gateway-Only

As of August 2025, we've consolidated to a true gateway-only architecture. Individual wrapper services are gone: the gateway handles ALL MCP services through multi-tenant routing based on the Host header. Adding a service takes only a configuration file, so 100+ services need no extra deployments.

System Architecture

┌──────────────────────────────────────────────┐
│         Client (Claude.ai, ChatGPT)           │
└────────────────┬─────────────────────────────┘
                 │ Calls your-api-mcp.mcpify.org
                 ↓
┌──────────────────────────────────────────────┐
│              Nginx (Wildcard)                 │
│  Routes ALL *-mcp.mcpify.org → Gateway:8080   │
└────────────────┬─────────────────────────────┘
                 │ Preserves Host header
                 ↓
┌──────────────────────────────────────────────┐
│         MCP Gateway Service (Port 8080)       │
│                                                │
│  • Parses service from Host header            │
│  • Handles MCP protocol for ALL services      │
│  • Token counting (90%+ cache hit rate)       │
│  • OAuth vault with auto-refresh              │
│  • Cross-service cache sharing                │
│  • Global rate limiting                       │
└────────────────┬─────────────────────────────┘
                 │ Loads config from storage
                 ↓
┌──────────────────────────────────────────────┐
│    Tool Configurations (Cloud Storage)        │
│                                                │
│  • hubspot.json    • stripe.json              │
│  • slack.json      • YOUR-API.json            │
└──────────────────────────────────────────────┘

Why Gateway-First?

The Numbers Don't Lie

When calling external APIs:

• Your MCP → Gateway: ~10ms (same region)
• Your MCP → External API: 200-800ms

At ~10ms against 200-800ms for the external call, gateway overhead is less than 5% of total latency!
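The overhead figure follows directly from the two latencies quoted above. A small worked calculation, assuming the gateway hop and the external API call happen one after the other (the function name is illustrative only):

```python
def overhead_share(gateway_ms: float, api_ms: float) -> float:
    """Fraction of end-to-end latency spent on the gateway hop."""
    return gateway_ms / (gateway_ms + api_ms)


# Bounds taken from the figures above: ~10ms gateway hop, 200-800ms external API.
worst_case = overhead_share(10, 200)  # fastest external API, largest relative overhead
best_case = overhead_share(10, 800)   # slowest external API, smallest relative overhead

print(f"{worst_case:.1%} down to {best_case:.1%}")
```

Even against the fastest external API in that range, the hop stays under the 5% mark.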

Performance Characteristics

| Operation | Library (Local) | Gateway | Impact |
|-----------|----------------|---------|--------|
| Token Count (cached) | 1ms | 10ms | +9ms |
| Token Count (uncached) | 50ms | 15ms | -35ms ✅ |
| Truncation | 10ms | 15ms | +5ms |
| Cache Check | 1ms | 10ms | +9ms |
| **Total per API call** | 62ms | 50ms | **-12ms** ✅ |

Gateway is actually FASTER for real workloads due to 90%+ cache hit rate and cross-service sharing.

The 90/10 Rule

90% of functionality runs locally, 10% leverages the gateway for compute-intensive operations.

┌─────────────────────────────────────────────┐
│           Your MCP Service                   │
├─────────────────────────────────────────────┤
│   90% Local Library  │  10% Gateway (opt)    │
├─────────────────────┼───────────────────────┤
│ • Tool definitions   │ • Heavy token ops     │
│ • Basic validation   │ • OAuth vault         │
│ • Local caching      │ • Cross-service cache │
│ • Field filtering    │ • Analytics           │
│ • Error handling     │ • Global rate limits  │
└─────────────────────┴───────────────────────┘
         ↓                      ↓
   Local Execution      Network Call to Gateway
    (No overhead)         (When beneficial)

Core Services

🧮 Token Intelligence

• Centralized token counting
• 90%+ cache hit rate in production
• Uses tiktoken for precise counts
• ~5000 counts/second with caching
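To illustrate why a cache in front of the tokenizer pays off, here is a minimal sketch of a content-addressed count cache. It is not the gateway's implementation: the class name is hypothetical, the cache is a plain in-process dict, and the tokenizer is passed in as a function so the example runs without tiktoken installed.

```python
import hashlib
from typing import Callable, Dict


class CachedTokenCounter:
    """Hypothetical sketch: memoize token counts by a hash of the text."""

    def __init__(self, count_fn: Callable[[str], int]) -> None:
        self._count_fn = count_fn
        self._cache: Dict[str, int] = {}
        self.hits = 0
        self.misses = 0

    def count(self, text: str) -> int:
        # Hashing keeps cache keys small even for very large payloads.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._count_fn(text)
        self._cache[key] = result
        return result


# Stand-in tokenizer (whitespace split) so the sketch is self-contained.
# With tiktoken installed, count_fn could instead wrap an encoding's encode().
counter = CachedTokenCounter(lambda text: len(text.split()))
```

Calling `counter.count(...)` twice with the same text runs the tokenizer once; the second call is a dictionary lookup.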

💾 Smart Caching

• Cross-service response caching
• Field filtering support
• Cache full, serve partial
• Redis-backed with TTL control
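"Cache full, serve partial" can be sketched as follows: the complete upstream response is stored once, and field filtering is applied on the way out, so requests for different field subsets share one cache entry. This is an assumption-laden illustration: the class and method names are hypothetical, and an in-memory dict with a TTL stands in for Redis.

```python
import time
from typing import Any, Dict, Iterable, Optional, Tuple


class ResponseCache:
    """Hypothetical sketch: store full responses, filter fields on read."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self._ttl = ttl_seconds
        # key -> (expiry on the monotonic clock, full response)
        self._store: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def put(self, key: str, full_response: Dict[str, Any]) -> None:
        self._store[key] = (time.monotonic() + self._ttl, full_response)

    def get(
        self, key: str, fields: Optional[Iterable[str]] = None
    ) -> Optional[Dict[str, Any]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, full = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        if fields is None:
            return dict(full)
        wanted = set(fields)
        return {k: v for k, v in full.items() if k in wanted}
```

A caller that wants only `id` and `email` would use `cache.get(key, fields=["id", "email"])`, while another caller can read the full record from the same entry.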

🔐 OAuth Vault

• Secure token storage
• Automatic refresh handling
• Multi-tenant isolation
• Encrypted at rest (Fernet)
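Automatic refresh and tenant isolation can be illustrated with a small sketch. Everything here is hypothetical: the names `StoredToken` and `TokenVault`, the 60-second refresh margin, and the in-memory storage are assumptions, and the Fernet encryption at rest mentioned above is deliberately left out to keep the example dependency-free.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class StoredToken:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp in seconds


class TokenVault:
    """Hypothetical in-memory vault; encryption at rest is omitted."""

    def __init__(
        self,
        refresh_fn: Callable[[StoredToken], StoredToken],
        margin_seconds: float = 60.0,
    ) -> None:
        self._refresh_fn = refresh_fn
        self._margin = margin_seconds
        # Keying on (tenant, service) keeps each tenant's tokens separate.
        self._tokens: Dict[Tuple[str, str], StoredToken] = {}

    def store(self, tenant: str, service: str, token: StoredToken) -> None:
        self._tokens[(tenant, service)] = token

    def get_access_token(
        self, tenant: str, service: str, now: Optional[float] = None
    ) -> str:
        now = time.time() if now is None else now
        token = self._tokens[(tenant, service)]
        # Refresh shortly before expiry so callers never receive a stale token.
        if token.expires_at - now <= self._margin:
            token = self._refresh_fn(token)
            self._tokens[(tenant, service)] = token
        return token.access_token
```

The `refresh_fn` callback is where a real provider's token endpoint would be called; passing it in keeps the refresh policy testable without network access.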

📊 Analytics

• Usage tracking per service
• Performance metrics
• Cost optimization insights
• Real-time dashboards

🚦 Rate Limiting

• Global API protection
• Multi-tier limits
• Per-service overrides
• Burst handling
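Burst handling with per-service overrides is commonly built on a token bucket. The sketch below shows that technique under stated assumptions; it is not confirmed to be what the gateway uses, it keeps state in process memory rather than a shared store, and the class names and limit values are illustrative.

```python
import time
from typing import Dict, Optional, Tuple


class TokenBucket:
    """Allows short bursts up to `burst`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int, now: Optional[float] = None) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """One bucket per service; overrides replace the default (rate, burst)."""

    def __init__(
        self,
        default: Tuple[float, int],
        overrides: Optional[Dict[str, Tuple[float, int]]] = None,
    ) -> None:
        self._default = default
        self._overrides = overrides or {}
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, service: str, now: Optional[float] = None) -> bool:
        bucket = self._buckets.get(service)
        if bucket is None:
            rate, burst = self._overrides.get(service, self._default)
            bucket = TokenBucket(rate, burst, now)
            self._buckets[service] = bucket
        return bucket.allow(now)
```

For example, `RateLimiter(default=(1.0, 2), overrides={"stripe": (1.0, 5)})` lets most services burst to 2 requests while `stripe` may burst to 5, with both refilling at one request per second.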

📄 Pagination

• Consistent handling
• Data snapshots
• Prevent mixed versions
• Session management
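The idea behind data snapshots is that every page of one pagination session is cut from the same frozen result set, so changes to the underlying data between requests cannot mix versions. A minimal sketch, assuming in-memory sessions and hypothetical names (`SnapshotPaginator`, `start`, `page`):

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple


class SnapshotPaginator:
    """Hypothetical sketch: paginate over a frozen copy of the results."""

    def __init__(self, page_size: int = 2) -> None:
        self._page_size = page_size
        self._sessions: Dict[str, List[Any]] = {}

    def start(self, items: List[Any]) -> str:
        session_id = uuid.uuid4().hex
        # Copy the list so later changes to `items` do not affect this session.
        self._sessions[session_id] = list(items)
        return session_id

    def page(self, session_id: str, page_number: int) -> Tuple[List[Any], Optional[int]]:
        snapshot = self._sessions[session_id]
        start = page_number * self._page_size
        chunk = snapshot[start:start + self._page_size]
        has_more = start + self._page_size < len(snapshot)
        # Return the chunk plus the next page number, or None on the last page.
        return chunk, (page_number + 1 if has_more else None)
```

A production version would also expire old sessions; that is left out here for brevity.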

Multi-Tenant Routing

How It Works

1. Request arrives at a subdomain: hubspot-mcp.mcpify.org
2. Nginx routes ALL *-mcp domains to the gateway: → gateway:8080
3. Gateway parses the service from the Host header: service = "hubspot"
4. Gateway loads configuration from storage: configs/hubspot.json
5. Gateway handles the MCP protocol for that service: returns tools, executes calls
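Steps 3 and 4 can be sketched in a few lines. This is an illustration under assumptions rather than the gateway's actual code: the function names are hypothetical, the local filesystem stands in for cloud storage, and the character check on the service name is an added safeguard so a crafted Host header cannot point outside the config directory.

```python
import json
from pathlib import Path
from typing import Any, Dict

SUFFIX = "-mcp.mcpify.org"


def service_from_host(host: str) -> str:
    """Extract the service name from a Host header value."""
    hostname = host.split(":", 1)[0].strip().lower()  # drop an optional port
    if not hostname.endswith(SUFFIX):
        raise ValueError(f"unexpected host: {host!r}")
    service = hostname[: -len(SUFFIX)]
    # Accept only ASCII letters, digits, and hyphens in the service name.
    if not service or not all(
        (c.isascii() and c.isalnum()) or c == "-" for c in service
    ):
        raise ValueError(f"invalid service name in host: {host!r}")
    return service


def config_path(service: str, root: Path = Path("configs")) -> Path:
    """Map a service name to its configuration file."""
    return root / f"{service}.json"


def load_config(service: str, root: Path = Path("configs")) -> Dict[str, Any]:
    """Read the service configuration (local files stand in for cloud storage)."""
    with config_path(service, root).open(encoding="utf-8") as fh:
        return json.load(fh)
```

With this sketch, `service_from_host("hubspot-mcp.mcpify.org")` yields `"hubspot"`, which maps to `configs/hubspot.json`.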

Resource Usage

• Memory with full caches: ~500MB
• CPU cores under normal load: 0.5-1
• Redis for typical usage: ~100MB

Response Times

• Cached responses: <10ms
• Token counting (cached): <50ms
• Token counting (uncached): <200ms
• Truncation (1000 items): <200ms

Benefits at Scale

For 20-30 Wrappers

• Development Time: 30 wrappers × 50 lines = 1,500 lines, vs 15,000 lines without the gateway
• Bug Fixes: fix once in the gateway, vs 30 different places
• Monitoring: 1 dashboard, vs 30 separate dashboards
• Updates: single gateway deployment; all wrappers benefit instantly

Security

• API Key Authentication: all endpoints except /health require an API key
• OAuth Token Encryption: tokens are encrypted at rest using Fernet
• CORS Protection: only trusted origins are allowed
• Service Isolation: multi-tenant with namespace separation
• Minimal Permissions: the service account follows the principle of least privilege
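The API key rule in this section (every endpoint except /health requires a key) could be enforced by a check like the one below. This is a sketch under assumptions: the header name `X-API-Key` and the function name are not taken from the gateway, and real deployments would typically look up keys per client rather than compare against a single value.

```python
import hmac
from typing import Mapping

# Only the health check is reachable without a key, per the rule above.
PUBLIC_PATHS = frozenset({"/health"})


def is_authorized(path: str, headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the request may proceed."""
    if path in PUBLIC_PATHS:
        return True
    supplied = headers.get("X-API-Key", "")  # header name is an assumption
    # Reject when no key is configured, and compare in constant time
    # so response timing does not leak how much of the key matched.
    return bool(expected_key) and hmac.compare_digest(
        supplied.encode("utf-8"), expected_key.encode("utf-8")
    )
```

The constant-time comparison via `hmac.compare_digest` is a standard precaution for secret comparison and costs nothing in readability.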