Multi-Tenant Gateway Architecture
One gateway, unlimited APIs. Built for scale, designed for simplicity.
Evolution to Gateway-Only
As of August 2025, we've consolidated to a true gateway-only architecture. There are no more individual wrapper services: the gateway handles ALL MCP services through multi-tenant routing based on the Host header. Adding 100+ services takes nothing more than configuration files.
System Architecture
┌──────────────────────────────────────────────┐
│         Client (Claude.ai, ChatGPT)          │
└────────────────┬─────────────────────────────┘
                 │ Calls your-api-mcp.mcpify.org
                 ↓
┌──────────────────────────────────────────────┐
│               Nginx (Wildcard)               │
│  Routes ALL *-mcp.mcpify.org → Gateway:8080  │
└────────────────┬─────────────────────────────┘
                 │ Preserves Host header
                 ↓
┌──────────────────────────────────────────────┐
│       MCP Gateway Service (Port 8080)        │
│                                              │
│  • Parses service from Host header           │
│  • Handles MCP protocol for ALL services     │
│  • Token counting (90%+ cache hit rate)      │
│  • OAuth vault with auto-refresh             │
│  • Cross-service cache sharing               │
│  • Global rate limiting                      │
└────────────────┬─────────────────────────────┘
                 │ Loads config from storage
                 ↓
┌──────────────────────────────────────────────┐
│     Tool Configurations (Cloud Storage)      │
│                                              │
│  • hubspot.json    • stripe.json             │
│  • slack.json      • YOUR-API.json           │
└──────────────────────────────────────────────┘
Why Gateway-First?
The Numbers Don't Lie
When calling external APIs:
- Your MCP → Gateway: ~10ms (same region)
- Your MCP → External API: 200-800ms
Against a 200-800ms external call, the ~10ms gateway hop is less than 5% of total latency!
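The overhead figure follows directly from the two latencies above. A quick check, assuming the gateway hop is simply added in series to the external call (the function name is illustrative):

```python
def gateway_overhead_share(gateway_ms: float, external_ms: float) -> float:
    """Fraction of end-to-end latency spent on the gateway hop."""
    return gateway_ms / (gateway_ms + external_ms)


fast_api = gateway_overhead_share(10, 200)  # fastest external API in the range
slow_api = gateway_overhead_share(10, 800)  # slowest external API in the range

print(f"{fast_api:.1%} of total latency at 200ms")  # 4.8%
print(f"{slow_api:.1%} of total latency at 800ms")  # 1.2%
```

Even in the worst case of the quoted range (a fast 200ms external API), the hop stays under the 5% mark.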
Performance Characteristics
| Operation | Library (Local) | Gateway | Impact |
|-----------|----------------|---------|--------|
| Token Count (cached) | 1ms | 10ms | +9ms |
| Token Count (uncached) | 50ms | 15ms | -35ms ✅ |
| Truncation | 10ms | 15ms | +5ms |
| Cache Check | 1ms | 10ms | +9ms |
| **Total per API call** | 62ms | 50ms | **-12ms** ✅ |
For real workloads the gateway is actually FASTER: the 90%+ cache hit rate and cross-service cache sharing outweigh the cost of the extra network hop.
The 90/10 Rule
90% of functionality runs locally, 10% leverages the gateway for compute-intensive operations.
┌─────────────────────────────────────────────┐
│              Your MCP Service               │
├─────────────────────────────────────────────┤
│ 90% Local Library   │ 10% Gateway (opt)     │
├─────────────────────┼───────────────────────┤
│ • Tool definitions  │ • Heavy token ops     │
│ • Basic validation  │ • OAuth vault         │
│ • Local caching     │ • Cross-service cache │
│ • Field filtering   │ • Analytics           │
│ • Error handling    │ • Global rate limits  │
└─────────────────────┴───────────────────────┘
          ↓                       ↓
   Local Execution     Network Call to Gateway
    (No overhead)         (When beneficial)
Core Services
🧮 Token Intelligence
- Centralized token counting
- 90%+ cache hit rate in production
- Uses tiktoken for precise counts
- ~5,000 counts/second with caching
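The caching idea behind these numbers can be sketched as a count cache keyed by content hash. This is a minimal illustration, not the gateway's implementation: `TokenCountCache` and `approx_count` are hypothetical names, and the default counter is a whitespace split standing in for a real tokenizer. With tiktoken installed, one could pass something like `lambda t: len(enc.encode(t))`, where `enc` comes from `tiktoken.get_encoding(...)`; which encoding the gateway uses is not stated here.

```python
import hashlib
from typing import Callable, Dict


def approx_count(text: str) -> int:
    """Stand-in counter (whitespace split); NOT a real tokenizer."""
    return len(text.split())


class TokenCountCache:
    """Caches token counts by content hash so repeated text is counted once."""

    def __init__(self, count_fn: Callable[[str], int] = approx_count) -> None:
        self._count_fn = count_fn
        self._counts: Dict[str, int] = {}
        self.hits = 0
        self.misses = 0

    def count(self, text: str) -> int:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._counts:
            self.hits += 1
            return self._counts[key]
        self.misses += 1
        result = self._count_fn(text)  # the expensive step on a cache miss
        self._counts[key] = result
        return result

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = TokenCountCache()
for _ in range(10):
    cache.count("list all open deals in the pipeline")
print(cache.hit_rate())  # 0.9: nine of ten lookups were served from cache
```

Because the key is derived from the text alone, identical payloads from different services share one cache entry.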
💾 Smart Caching
- Cross-service response caching
- Field filtering support
- Cache full, serve partial
- Redis-backed with TTL control
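"Cache full, serve partial" can be read as: store the complete upstream response once, then answer each request with only the fields it asked for. The sketch below illustrates that under stated assumptions: an in-memory dict stands in for Redis, and `ResponseCache`, its method names, and the key format are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


class ResponseCache:
    """In-memory stand-in for a Redis-backed cache with per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def put(self, key: str, response: Dict[str, Any], ttl_seconds: float) -> None:
        # Always store the full response, whatever fields the caller wanted.
        self._entries[key] = (self._clock() + ttl_seconds, response)

    def get(self, key: str,
            fields: Optional[Iterable[str]] = None) -> Optional[Dict[str, Any]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        if fields is None:
            return dict(response)
        # Serve only the requested fields out of the cached full response.
        return {name: response[name] for name in fields if name in response}


cache = ResponseCache()
cache.put("hubspot:contact:42",
          {"id": 42, "email": "a@example.com", "phone": "555-0100"},
          ttl_seconds=60)
print(cache.get("hubspot:contact:42", fields=["id", "email"]))
# {'id': 42, 'email': 'a@example.com'}
```

A later request for a different field subset is still a cache hit, which is the point of caching the full payload.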
🔐 OAuth Vault
- Secure token storage
- Automatic refresh handling
- Multi-tenant isolation
- Encrypted at rest (Fernet)
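A rough sketch of automatic refresh combined with tenant isolation follows. Everything named here (`TokenVault`, `StoredToken`, the `refresh_fn` signature, the 60-second skew) is an assumption for illustration, and encryption at rest is deliberately left out to keep the example dependency-free.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class StoredToken:
    access_token: str
    refresh_token: str
    expires_at: float  # absolute time, in the same units as the clock


# Hypothetical signature: takes a refresh token, returns a fresh StoredToken.
RefreshFn = Callable[[str], StoredToken]


class TokenVault:
    def __init__(self, refresh_fn: RefreshFn,
                 clock: Callable[[], float] = time.time,
                 skew_seconds: float = 60.0) -> None:
        self._refresh_fn = refresh_fn
        self._clock = clock
        self._skew = skew_seconds
        # Keyed by (tenant, service): a lookup can never cross tenants.
        self._tokens: Dict[Tuple[str, str], StoredToken] = {}

    def store(self, tenant: str, service: str, token: StoredToken) -> None:
        self._tokens[(tenant, service)] = token

    def get_access_token(self, tenant: str, service: str) -> str:
        token = self._tokens[(tenant, service)]  # KeyError if nothing is stored
        # Refresh slightly early so callers never receive a token about to expire.
        if self._clock() >= token.expires_at - self._skew:
            token = self._refresh_fn(token.refresh_token)
            self._tokens[(tenant, service)] = token
        return token.access_token
```

Callers only ever ask for an access token; whether it came from storage or from a refresh is invisible to them.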
📊 Analytics
- Usage tracking per service
- Performance metrics
- Cost optimization insights
- Real-time dashboards
🚦 Rate Limiting
- Global API protection
- Multi-tier limits
- Per-service overrides
- Burst handling
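One common way to get burst handling with per-service overrides is a token bucket per service. The sketch below is an assumption about how such a limiter could look, not a description of the gateway's actual algorithm; `RateLimiter`, `TokenBucket`, and the `(rate, burst)` tuple are illustrative, and multi-tier limits are not modeled.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Limit = Tuple[float, int]  # (refill rate per second, burst capacity)


class TokenBucket:
    def __init__(self, rate: float, burst: int, clock: Callable[[], float]) -> None:
        self._rate = rate
        self._capacity = float(burst)
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class RateLimiter:
    def __init__(self, default: Limit,
                 overrides: Optional[Dict[str, Limit]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default = default
        self._overrides = dict(overrides or {})
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, service: str) -> bool:
        bucket = self._buckets.get(service)
        if bucket is None:
            # A per-service override wins over the default limit.
            rate, burst = self._overrides.get(service, self._default)
            bucket = TokenBucket(rate, burst, self._clock)
            self._buckets[service] = bucket
        return bucket.allow()
```

A global limit could be layered on top by checking one shared bucket before the per-service one.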
📄 Pagination
- Consistent handling
- Data snapshots
- Prevent mixed versions
- Session management
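The snapshot idea can be illustrated as follows: fetch the result set once per pagination session, freeze it, and serve every page from that frozen copy so upstream changes cannot mix versions across pages. `SnapshotPaginator` and its methods are hypothetical names, and an in-memory dict stands in for whatever session store the gateway uses.

```python
import uuid
from typing import Any, Callable, Dict, List, Tuple


class SnapshotPaginator:
    """Serves every page of a session from one frozen copy of the results."""

    def __init__(self, page_size: int) -> None:
        if page_size < 1:
            raise ValueError("page_size must be at least 1")
        self.page_size = page_size
        self._snapshots: Dict[str, List[Any]] = {}

    def start(self, fetch_all: Callable[[], List[Any]]) -> str:
        """Fetch once, freeze the result, and return a session id."""
        session_id = uuid.uuid4().hex
        self._snapshots[session_id] = list(fetch_all())  # copy = frozen view
        return session_id

    def page(self, session_id: str, page_number: int) -> Tuple[List[Any], bool]:
        """Return (items, has_more) for a zero-based page number."""
        rows = self._snapshots[session_id]
        start = page_number * self.page_size
        end = start + self.page_size
        return rows[start:end], end < len(rows)

    def close(self, session_id: str) -> None:
        self._snapshots.pop(session_id, None)


live_data = ["a", "b", "c"]
paginator = SnapshotPaginator(page_size=2)
session = paginator.start(lambda: live_data)
first_page, more = paginator.page(session, 0)
live_data.insert(0, "new")  # upstream changes mid-pagination
second_page, more_after = paginator.page(session, 1)
print(first_page, second_page)  # ['a', 'b'] ['c']
```

Without the snapshot, the insert would shift every row by one and "b" would appear on both pages.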
Multi-Tenant Routing
How It Works
1. A request arrives at a subdomain: hubspot-mcp.mcpify.org
2. Nginx routes ALL *-mcp domains to the gateway: → gateway:8080
3. The gateway parses the service from the Host header: service = "hubspot"
4. The gateway loads the configuration from storage: configs/hubspot.json
5. The gateway handles the MCP protocol for that service: returns tools, executes calls
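Steps 3 and 4 above can be sketched in a few lines. The hostname shape (`<service>-mcp.mcpify.org`) and the `configs/<service>.json` path come from the examples in the steps; the function names and the strict character whitelist are assumptions added for illustration.

```python
import re
from typing import Optional

# Assumed hostname shape, taken from the example: <service>-mcp.mcpify.org
HOST_PATTERN = re.compile(r"(?P<service>[a-z0-9][a-z0-9-]*)-mcp\.mcpify\.org")


def service_from_host(host_header: str) -> Optional[str]:
    """Extract the service name from a Host header, or None if it does not match."""
    host = host_header.strip().lower()
    host = host.split(":", 1)[0]  # drop an optional :port suffix
    match = HOST_PATTERN.fullmatch(host)
    return match.group("service") if match else None


def config_path(service: str) -> str:
    """Map a service name to its configuration file."""
    return f"configs/{service}.json"


service = service_from_host("hubspot-mcp.mcpify.org")
print(service)               # hubspot
print(config_path(service))  # configs/hubspot.json
```

Restricting the service name to lowercase letters, digits, and hyphens means a crafted Host header cannot inject path separators into the config lookup.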
Benefits at Scale
For 20-30 Wrappers
Development Time
30 wrappers × 50 lines = 1,500 lines
vs 15,000 lines without the gateway (about 500 lines per wrapper)
Bug Fixes
Fix once in gateway
vs 30 different places
Monitoring
1 dashboard
vs 30 separate dashboards
Updates
Single gateway deployment
All wrappers benefit instantly
Security
API Key Authentication
All endpoints except /health require an API key
OAuth Token Encryption
Tokens encrypted at rest using Fernet
CORS Protection
Only trusted origins allowed
Service Isolation
Multi-tenant with namespace separation
Minimal Permissions
Service account follows principle of least privilege
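The API key rule above (every endpoint except /health) could be enforced with a check like the one below. This is a sketch under assumptions: the `X-API-Key` header name and the `is_authorized` function are hypothetical, since the header the gateway actually expects is not specified here.

```python
import hmac
from typing import Mapping

PUBLIC_PATHS = {"/health"}  # the only endpoint exempt from authentication


def is_authorized(path: str, headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the request may proceed."""
    if path in PUBLIC_PATHS:
        return True
    if not expected_key:
        return False  # refuse everything if no key is configured
    supplied = headers.get("X-API-Key", "")  # assumed header name
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode("utf-8"),
                               expected_key.encode("utf-8"))


print(is_authorized("/health", {}, "s3cret"))                        # True
print(is_authorized("/tools", {}, "s3cret"))                         # False
print(is_authorized("/tools", {"X-API-Key": "s3cret"}, "s3cret"))    # True
```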