Overview
Visca AI Gateway’s intelligent routing automatically selects the best provider for each request based on your specified strategy. This ensures optimal cost, latency, and reliability for your AI applications.
Cost Optimization
Save up to 90% by routing to the cheapest provider
Low Latency
Route to the fastest provider for your region
High Availability
Automatic failover when providers are down
Load Balancing
Distribute load across multiple providers
Routing Strategies
Cost-Optimized Routing
Automatically routes requests to the cheapest provider that supports the requested model capabilities.
- Analyzes model requirements (context length, capabilities)
- Finds equivalent models across providers
- Compares pricing per 1M tokens
- Routes to the cheapest option
Example:
- OpenAI GPT-4o: $5.00 / 1M input tokens
- Alternative provider: $0.50 / 1M input tokens
- Savings: 90%
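The selection steps above can be sketched as a filter followed by a minimum. This is a minimal illustration, not the gateway's implementation: the `Offer` type, the function names, the context-window figures, and the capability labels are all assumptions made for the example. Only the two prices come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical record describing one provider's equivalent model."""
    provider: str
    input_price_per_1m: float  # USD per 1M input tokens
    context_window: int        # illustrative values, not real limits
    capabilities: frozenset


def cheapest_offer(offers, min_context, required_caps):
    """Return the lowest-priced offer that meets the request's requirements."""
    eligible = [
        o for o in offers
        if o.context_window >= min_context and required_caps <= o.capabilities
    ]
    if not eligible:
        raise LookupError("no provider satisfies the requested capabilities")
    return min(eligible, key=lambda o: o.input_price_per_1m)


def savings_percent(baseline_price, chosen_price):
    """Percentage saved relative to the baseline price."""
    return round((baseline_price - chosen_price) / baseline_price * 100, 1)


offers = [
    Offer("openai-gpt-4o", 5.00, 128_000, frozenset({"vision", "function_calling"})),
    Offer("alternative-provider", 0.50, 128_000, frozenset({"function_calling"})),
]
best = cheapest_offer(offers, min_context=32_000,
                      required_caps=frozenset({"function_calling"}))
print(best.provider, savings_percent(5.00, best.input_price_per_1m))
```

Note that a request requiring `vision` would stay on the more expensive offer in this sketch, because the cheaper one does not advertise that capability.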
Latency-Optimized Routing
Routes to the provider with the lowest latency for your geographic region.
- Continuously monitors provider latency
- Considers geographic distance
- Routes to fastest responding provider
- Adapts to real-time performance
Example:
- Standard routing: 500-1000ms
- Latency-optimized: 150-300ms
- Improvement: up to 3x faster
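One common way to track "real-time performance" is an exponentially weighted moving average (EWMA) per provider. The page does not say how the gateway aggregates latency, so the sketch below is an assumption: `LatencyTracker`, its `alpha` parameter, and the provider names are hypothetical.

```python
class LatencyTracker:
    """Keeps a smoothed latency estimate per provider (EWMA)."""

    def __init__(self, alpha=0.5):
        # alpha close to 1 reacts quickly to new samples; close to 0 smooths more.
        self.alpha = alpha
        self.ewma_ms = {}

    def record(self, provider, latency_ms):
        """Fold one observed latency sample into the provider's estimate."""
        previous = self.ewma_ms.get(provider)
        if previous is None:
            self.ewma_ms[provider] = float(latency_ms)
        else:
            self.ewma_ms[provider] = (
                self.alpha * latency_ms + (1 - self.alpha) * previous
            )

    def fastest(self):
        """Return the provider with the lowest smoothed latency."""
        if not self.ewma_ms:
            raise LookupError("no latency samples recorded yet")
        return min(self.ewma_ms, key=self.ewma_ms.get)


tracker = LatencyTracker(alpha=0.5)
tracker.record("provider-a", 200)
tracker.record("provider-a", 1000)  # a slow sample pulls the estimate up
tracker.record("provider-b", 300)
print(tracker.fastest())
```

Because the estimate adapts with each sample, a provider that slows down loses its place without any manual intervention.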
Priority-Based Routing
Define a custom provider preference order with automatic failover.
Use cases:
- Compliance requirements (prefer specific providers)
- Contract commitments (use specific quotas first)
- Quality preferences (prioritize certain providers)
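Conceptually, priority-based routing walks the preference list and takes the first provider that is currently usable. The function and provider names below are hypothetical and only illustrate the idea.

```python
def pick_by_priority(priority_order, healthy):
    """Return the first provider in priority_order that is in the healthy set."""
    for provider in priority_order:
        if provider in healthy:
            return provider
    raise LookupError("no provider in the priority list is healthy")


priority_order = ["contracted-provider", "compliant-backup", "general-fallback"]

# All providers healthy: the top preference wins.
print(pick_by_priority(priority_order, set(priority_order)))

# Top preference down: the next entry is used.
print(pick_by_priority(priority_order, {"compliant-backup", "general-fallback"}))
```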
Load-Balanced Routing
Distribute requests evenly across multiple providers to maximize throughput and reliability.
- Avoid rate limits on individual providers
- Better resilience during high traffic
- Improved overall throughput
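An even distribution can be approximated with simple round-robin selection. The page does not specify which balancing algorithm the gateway uses, so treat this as one possible sketch; `RoundRobin` and the provider names are made up for the example.

```python
import itertools
from collections import Counter


class RoundRobin:
    """Cycles through providers in a fixed order, one per request."""

    def __init__(self, providers):
        providers = list(providers)
        if not providers:
            raise ValueError("at least one provider is required")
        self._cycle = itertools.cycle(providers)

    def next_provider(self):
        return next(self._cycle)


balancer = RoundRobin(["provider-a", "provider-b", "provider-c"])
assignments = [balancer.next_provider() for _ in range(6)]
print(Counter(assignments))
```

Each provider receives the same share of requests, which keeps any single provider further from its rate limit.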
Automatic Failover
If a provider is unavailable or returns an error, requests automatically fail over to a backup provider.
1. Primary provider fails: the request sent to the primary provider returns 503 or times out.
2. Automatic retry: the gateway immediately retries with the next available provider.
3. Seamless response: the user receives a response without knowing about the failover.
4. Health monitoring: the failed provider is marked as unhealthy and health checks resume.
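The four steps can be pictured as a loop over an ordered provider list. This is a simplified sketch under stated assumptions: providers are modelled as plain Python callables, `ProviderError` stands in for a 503 or timeout, and none of these names belong to the gateway's API.

```python
class ProviderError(Exception):
    """Stands in for a 503, a timeout, or a similar provider failure."""


def call_with_failover(providers, request, unhealthy):
    """Try each (name, call) pair in order; return (name, response).

    Providers already in `unhealthy` are skipped, and a provider that
    fails is added to it so later requests avoid it.
    """
    last_error = None
    for name, call in providers:
        if name in unhealthy:
            continue
        try:
            return name, call(request)
        except ProviderError as exc:
            unhealthy.add(name)   # step 4: mark the failed provider
            last_error = exc      # step 2: fall through to the next one
    raise ProviderError("all providers failed") from last_error


def flaky_primary(_request):
    raise ProviderError("503 Service Unavailable")


def working_backup(request):
    return f"echo: {request}"


unhealthy = set()
served_by, response = call_with_failover(
    [("primary", flaky_primary), ("backup", working_backup)], "hello", unhealthy
)
print(served_by, response, unhealthy)
```

The caller only sees the final response, which matches the "seamless response" step above.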
Configuring Failover
Failover is enabled by default with these settings:
- Max retries: 3
- Timeout: 30 seconds
- Backoff: Exponential (1s, 2s, 4s)
- Fallback providers: Automatic selection
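The default backoff sequence follows the usual exponential pattern, where each delay is the base delay multiplied by a power of two. The helper below is a hypothetical illustration of that arithmetic, not a gateway function.

```python
def backoff_delays(max_retries=3, base_seconds=1.0, factor=2.0):
    """Delay before each retry: base * factor**attempt, for attempt 0..max_retries-1."""
    return [base_seconds * factor ** attempt for attempt in range(max_retries)]


print(backoff_delays())  # [1.0, 2.0, 4.0]
```

With the defaults listed above (3 retries, 1s base), the total wait across all retries is 7 seconds, well inside the 30-second timeout.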
Failover Scenarios
Provider Downtime
Scenario: OpenAI experiences an outage
Behavior:
- Request fails with 503 error
- Gateway routes to Anthropic Claude
- Response returned seamlessly
- OpenAI marked unhealthy for 5 minutes
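The 5-minute unhealthy window can be modelled as a per-provider cooldown timestamp. This sketch assumes the caller supplies the current time (for example from `time.monotonic()`); `HealthRegistry` and its methods are hypothetical names.

```python
class HealthRegistry:
    """Tracks until when each provider should be treated as unhealthy."""

    def __init__(self, cooldown_seconds=300.0):  # 300 s = the 5 minutes above
        self.cooldown_seconds = cooldown_seconds
        self._unhealthy_until = {}

    def mark_unhealthy(self, provider, now):
        """Start (or restart) the cooldown for a provider at time `now`."""
        self._unhealthy_until[provider] = now + self.cooldown_seconds

    def is_healthy(self, provider, now):
        """A provider is healthy if it has no cooldown or the cooldown has passed."""
        return now >= self._unhealthy_until.get(provider, 0.0)


registry = HealthRegistry()
registry.mark_unhealthy("openai", now=0.0)
print(registry.is_healthy("openai", now=299.0))  # False: still cooling down
print(registry.is_healthy("openai", now=300.0))  # True: cooldown elapsed
```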
Rate Limiting
Scenario: Hit rate limit on primary provider
Behavior:
1. Receives 429 rate limit error
2. Immediately routes to backup provider
3. Original provider recovers after rate limit window
Timeout
Scenario: Provider takes too long to respond
Behavior:
1. Request times out after 30 seconds
2. Gateway cancels and retries with faster provider
3. Slow provider latency tracked for future routing
Invalid Response
Scenario: Provider returns malformed response
Behavior:
- Gateway detects invalid JSON/format
- Automatically retries with another provider
- Logs error for investigation
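The four scenarios above amount to classifying a provider's result before deciding whether to fail over. The sketch below shows one way to express that classification; the function name, its parameters, and the returned labels are assumptions made for illustration.

```python
import json


def classify_failure(status_code=None, timed_out=False, body=None):
    """Return a failure label that should trigger failover, or None if the result is usable."""
    if timed_out:
        return "timeout"
    if status_code == 429:
        return "rate_limited"
    if status_code == 503:
        return "provider_down"
    if body is not None:
        try:
            json.loads(body)
        except json.JSONDecodeError:
            return "invalid_response"
    return None


print(classify_failure(status_code=503))
print(classify_failure(status_code=429))
print(classify_failure(timed_out=True))
print(classify_failure(status_code=200, body="{not json"))
print(classify_failure(status_code=200, body='{"ok": true}'))
```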
Model Equivalency
Gateway automatically maps model requests to equivalent models across providers:

| Original Request | Alternative Providers |
|---|---|
| gpt-4o | Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro |
| gpt-4o-mini | Claude 3 Haiku, Gemini 1.5 Flash |
| gpt-3.5-turbo | Llama 3.1 70B, Mixtral 8x7B |
| claude-3-opus | GPT-4 Turbo, Gemini 1.5 Pro |
Model equivalency considers:
- Context window size
- Capabilities (vision, function calling)
- Performance characteristics
- Output quality
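The mapping in the table can be represented as a simple lookup from the requested model to its listed alternatives. The dictionary below copies the table's entries verbatim; the `candidates` helper is a hypothetical name, and the real gateway presumably weighs the criteria above rather than using a static list.

```python
EQUIVALENTS = {
    "gpt-4o": ["Anthropic Claude 3.5 Sonnet", "Google Gemini 1.5 Pro"],
    "gpt-4o-mini": ["Claude 3 Haiku", "Gemini 1.5 Flash"],
    "gpt-3.5-turbo": ["Llama 3.1 70B", "Mixtral 8x7B"],
    "claude-3-opus": ["GPT-4 Turbo", "Gemini 1.5 Pro"],
}


def candidates(requested_model):
    """Requested model first, then its listed alternatives (none if unmapped)."""
    return [requested_model] + EQUIVALENTS.get(requested_model, [])


print(candidates("gpt-4o-mini"))
print(candidates("some-unmapped-model"))
```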
Advanced Routing Rules
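The rule types in this section (conditional, time-based, and budget-based) can all be viewed as checks evaluated in order against a request, each selecting a routing strategy. The sketch below is purely illustrative: the `RoutedRequest` fields, the thresholds, the business-hours window, and the rule-to-strategy mapping are assumptions, and the gateway's actual rule syntax is not shown on this page.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RoutedRequest:
    """Hypothetical request metadata used by the rules below."""
    prompt_tokens: int
    sent_at: datetime
    spent_this_month_usd: float


def choose_strategy(req, monthly_cap_usd=100.0, long_prompt_tokens=8000):
    """Evaluate rules in order and return the first matching strategy name."""
    # Budget-based: once the spending cap is reached, prefer the cheapest route.
    if req.spent_this_month_usd >= monthly_cap_usd:
        return "cost-optimized"
    # Time-based: weekends and hours outside 08:00-20:00 favour cost over speed.
    is_weekend = req.sent_at.weekday() >= 5
    is_business_hours = 8 <= req.sent_at.hour < 20
    if is_weekend or not is_business_hours:
        return "cost-optimized"
    # Conditional: very long prompts are spread across providers.
    if req.prompt_tokens >= long_prompt_tokens:
        return "load-balanced"
    # Default for interactive traffic.
    return "latency-optimized"


weekday_morning = datetime(2024, 1, 3, 10, 0)  # a Wednesday
saturday = datetime(2024, 1, 6, 10, 0)
print(choose_strategy(RoutedRequest(500, weekday_morning, 10.0)))
print(choose_strategy(RoutedRequest(500, saturday, 10.0)))
print(choose_strategy(RoutedRequest(500, weekday_morning, 150.0)))
```

Rule order matters in a scheme like this: placing the budget check first means a spending cap always wins over the other rules.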
Conditional Routing
Route based on request characteristics.
Time-Based Routing
Route differently based on time of day or day of week.
Budget-Based Routing
Set spending caps and automatically switch to cheaper providers.
Monitoring Routing Performance
View Routing Analytics
Access your dashboard to see:
- Cost savings from intelligent routing
- Latency improvements by strategy
- Failover statistics and success rates
- Provider health and uptime metrics
Routing Metrics API
Best Practices
Cost Optimization
- Use `cost-optimized` for batch processing
- Set budget limits to prevent overruns
- Monitor savings in dashboard
- Consider model equivalency tradeoffs
Performance Optimization
- Use `latency-optimized` for user-facing features
- Specify user regions for better routing
- Enable caching for repeated queries
- Monitor latency metrics
Reliability
- Always enable failover for production
- Use `load-balanced` for high-traffic applications
- Set appropriate timeout values
- Monitor failover rates
Testing Strategies
Configuration Examples
Startup Cost Optimization
Enterprise High Availability
Global Application
Troubleshooting
Routing not working as expected
Check response headers for routing information:
Not seeing cost savings
- Verify that the cost-optimized strategy is enabled
- Check whether model equivalents are available
- Review provider pricing in the dashboard
- Ensure failover is not overriding the strategy
Higher latency than expected
- Switch to latency-optimized strategy
- Check provider health status
- Verify user region is correct
- Consider geographic proximity to providers
Too many failovers
- Check provider health dashboard
- Increase timeout values
- Review error logs for patterns
- Consider different provider priority