Overview
Visca AI Gateway implements flexible rate limiting to protect infrastructure, ensure fair usage, and help you manage costs. Rate limits can be configured per API key, per user, or per model.

Default Limits
| Limit | Free Tier | Pro | Enterprise |
|---|---|---|---|
| Requests per minute | 60 RPM | 600 RPM | Custom |
| Tokens per minute | 40,000 TPM | 400,000 TPM | Custom |
| Concurrent requests | 5 | 20 | Unlimited |
| Daily requests | 5,000 | Unlimited | Unlimited |
Model-Specific Limits
Different models have different rate limits based on provider constraints:

| Model | Requests/Min | Tokens/Min | Notes |
|---|---|---|---|
| GPT-4 Turbo | 500 | 300,000 | Shared across all GPT-4 variants |
| GPT-4 | 200 | 40,000 | Lower limit due to capacity |
| GPT-3.5 Turbo | 3,500 | 350,000 | Highest throughput |
| Claude 3.5 Sonnet | 1,000 | 400,000 | High capacity model |
| Claude 3 Opus | 200 | 40,000 | Premium model, lower limits |
| Gemini 1.5 Pro | 360 | 4,000,000 | Highest token limit |
| DALL-E 3 | 50 | N/A | Image generation only |
| Embeddings | 3,000 | 1,000,000 | High throughput for RAG |
Rate Limit Headers
Every API response includes rate limit information.

Header Meanings
- `X-RateLimit-Limit-Requests`: Maximum requests per minute
- `X-RateLimit-Remaining-Requests`: Requests left in the current window
- `X-RateLimit-Reset-Requests`: When the request limit resets
- `X-RateLimit-Limit-Tokens`: Maximum tokens per minute
- `X-RateLimit-Remaining-Tokens`: Tokens left in the current window
- `X-RateLimit-Reset-Tokens`: When the token limit resets
- `Retry-After`: Seconds to wait before retrying (sent on 429 errors)
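As a sketch, these headers can be read into a small dict after each call. The header names are the ones listed above; a plain dict stands in here for a real response object:

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Pull rate-limit state out of a response's headers."""
    return {
        "requests_limit": int(headers["X-RateLimit-Limit-Requests"]),
        "requests_remaining": int(headers["X-RateLimit-Remaining-Requests"]),
        "tokens_limit": int(headers["X-RateLimit-Limit-Tokens"]),
        "tokens_remaining": int(headers["X-RateLimit-Remaining-Tokens"]),
    }

# Headers as they might appear on a Pro-tier response:
headers = {
    "X-RateLimit-Limit-Requests": "600",
    "X-RateLimit-Remaining-Requests": "599",
    "X-RateLimit-Limit-Tokens": "400000",
    "X-RateLimit-Remaining-Tokens": "399250",
}
state = parse_rate_limit_headers(headers)
print(state["requests_remaining"])  # 599
```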
Handling Rate Limits
Exponential Backoff
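A minimal retry loop in Python, as a sketch. `send` stands in for whatever function issues your request; it is assumed to return the HTTP status and body:

```python
import random
import time

def with_backoff(send, max_retries=5, base_delay=1.0):
    """Retry `send()` on 429 responses with exponentially growing, jittered delays."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Prefer the Retry-After header when the response carries one;
        # otherwise back off as base_delay * 2^attempt plus a little jitter.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError(f"still rate limited after {max_retries} retries")

# Simulated calls: two 429s, then success.
responses = iter([(429, None), (429, None), (200, "ok")])
status, body = with_backoff(lambda: next(responses), base_delay=0.01)
print(status, body)  # 200 ok
```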
Implement exponential backoff when you hit limits.

Request Queuing
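One client-side approach is a sliding-window queue that delays dispatch once the window is full. This is a sketch, not a gateway feature; set `rpm` to your tier's limit:

```python
import collections
import time

class RequestQueue:
    """Dispatch at most `rpm` calls in any rolling 60-second window."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = collections.deque()  # dispatch timestamps inside the window

    def submit(self, call):
        now = time.monotonic()
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()                    # drop timestamps outside the window
        if len(self.sent) >= self.rpm:
            time.sleep(60 - (now - self.sent[0]))  # wait for the oldest to expire
        self.sent.append(time.monotonic())
        return call()

queue = RequestQueue(rpm=600)
result = queue.submit(lambda: "response")
```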
Queue requests to stay within limits.

Configuring Custom Limits
Per API Key
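The exact admin API is deployment-specific, but the request body might look like the following. The field names and endpoint are illustrative assumptions, not a confirmed schema:

```python
# Illustrative payload for creating a key with custom limits;
# the field names and endpoint below are assumptions, not a confirmed schema.
new_key_request = {
    "name": "batch-pipeline",
    "rate_limits": {
        "requests_per_minute": 300,
        "tokens_per_minute": 200_000,
        "concurrent_requests": 10,
    },
}
# e.g. POST /v1/api-keys with this JSON body
print(new_key_request["rate_limits"]["requests_per_minute"])  # 300
```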
Set limits when creating API keys.

Per User
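Again as an illustrative sketch (the identifier and field names are hypothetical), a per-user override caps an individual user below the key-wide limits:

```python
# Hypothetical per-user limit payload; check your gateway's admin API for the real schema.
user_limits = {
    "user_id": "user_123",           # hypothetical identifier
    "requests_per_minute": 120,
    "tokens_per_minute": 80_000,
    "daily_request_limit": 10_000,
}
```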
Set user-specific limits.

Per Model
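A per-model override could be expressed as a mapping keyed by model name; omitted fields fall back to the defaults in the table above. The shape is an assumption for illustration:

```python
# Illustrative per-model overrides; unspecified fields keep their defaults.
model_overrides = {
    "gpt-4": {"requests_per_minute": 100, "tokens_per_minute": 20_000},
    "claude-3-opus": {"requests_per_minute": 50},
}
```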
Override default model limits.

Monitoring Usage
Dashboard
View real-time usage in the dashboard:

- Current RPM and TPM
- Historical usage graphs
- Top consumers
- Rate limit violations
API
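A sketch of working with a usage payload, for example to drive alerts. The payload's shape here is assumed for illustration, not the gateway's documented schema:

```python
# Assumed shape of a usage response; consult the real API reference.
usage = {
    "window": "2024-06-01/2024-06-02",
    "requests": 4200,
    "tokens": 1_850_000,
    "rate_limit_violations": 3,
}

def near_daily_limit(usage: dict, daily_limit: int, threshold: float = 0.8) -> bool:
    """True once request volume crosses `threshold` of the daily cap."""
    return usage["requests"] >= threshold * daily_limit

print(near_daily_limit(usage, daily_limit=5000))  # True (4200 >= 4000)
```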
Query usage programmatically.

Best Practices
Check Headers
Always check rate limit headers before making requests
Implement Backoff
Use exponential backoff for automatic retry
Batch Requests
Combine multiple operations when possible
Cache Responses
Cache common responses to reduce API calls
Use Streaming
Streaming doesn’t reduce limits but improves UX
Monitor Usage
Set up alerts for approaching rate limits
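The caching practice above can be as small as an in-process memo table for repeated identical prompts. This is a sketch: `call_gateway` is a stand-in for your real client call, with a counter to show the cache working:

```python
import functools

calls = {"count": 0}

def call_gateway(model: str, prompt: str) -> str:
    """Stand-in for the real API call; counts how often the network would be hit."""
    calls["count"] += 1
    return f"reply from {model}"

@functools.cache
def cached_completion(model: str, prompt: str) -> str:
    """Identical (model, prompt) pairs are answered from the cache."""
    return call_gateway(model, prompt)

cached_completion("gpt-3.5-turbo", "hello")
cached_completion("gpt-3.5-turbo", "hello")  # cache hit, no second API call
print(calls["count"])  # 1
```

Note that a real cache needs an eviction policy and should never be used for requests whose answers must be fresh.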
Common Errors
429 Too Many Requests
Cause: Exceeded requests per minute or tokens per minute.
Solution: Implement exponential backoff and check the Retry-After header.
402 Payment Required
Cause: Exceeded the daily request limit on the free tier.
Solution: Upgrade to the Pro plan or wait until the limit resets.
503 Service Unavailable
Cause: The upstream model provider is temporarily overloaded or unavailable.
Solution: Retry with exponential backoff, or fall back to another model until capacity recovers.
Enterprise Features
Unlimited Requests
No rate limits on number of requests
Custom Token Limits
Set token limits based on your needs
Priority Queue
Your requests are processed first during high load
Dedicated Capacity
Reserved infrastructure for your workload