Overview
Extend Visca AI Gateway by integrating your own AI models, self-hosted providers, or custom endpoints alongside managed providers like OpenAI and Anthropic.

- Bring Your Own Models: host your own fine-tuned or custom models
- Unified Interface: access all models through one API
- Intelligent Routing: route between managed and custom providers
- Cost Control: use custom models for cost-sensitive workloads
Supported Custom Providers
- Self-hosted OpenAI-compatible APIs (vLLM, Text Generation Inference)
- Local models (Ollama, LocalAI, LM Studio)
- Custom endpoints (your own model API)
- Other cloud providers (Together AI, Replicate, Hugging Face)
Adding a Custom Provider
Configuration
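A custom provider entry typically carries an identifier, a base URL, a credentials reference, and per-model pricing. The sketch below shows one plausible shape; the field names are illustrative assumptions, not Visca AI Gateway's documented configuration schema.

```python
# Illustrative custom-provider entry. Field names are assumptions,
# not Visca AI Gateway's confirmed configuration schema.
custom_provider = {
    "name": "my-vllm",                           # identifier used in routing rules
    "type": "openai-compatible",                 # vLLM/TGI expose OpenAI-style APIs
    "base_url": "http://vllm.internal:8000/v1",  # endpoint reachable from the gateway
    "api_key_env": "MY_VLLM_API_KEY",            # key is read from the environment
    "models": [
        {
            "id": "my-org/llama-3-ft",           # model name clients will request
            "cost_per_token": {"input": 2e-7, "output": 6e-7},
        }
    ],
    "timeout_seconds": 60,                       # custom models may be slower
}
```

A model registered this way becomes addressable through the gateway's normal completions API, with requests forwarded to `base_url`.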
Using Custom Models
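Requests to custom models go through the same OpenAI-style chat-completions payload as managed models; only the model name changes. A sketch of the request body (the endpoint URL, key handling, and model name are illustrative, not confirmed gateway details):

```python
import json

# Same OpenAI-style chat payload used for managed models; only the
# model name differs. Model name below is illustrative.
payload = {
    "model": "my-org/llama-3-ft",  # custom model registered with the gateway
    "messages": [{"role": "user", "content": "Summarize our Q3 report."}],
    "max_tokens": 256,
}
body = json.dumps(payload).encode()
# POST `body` to the gateway's chat-completions endpoint with your gateway
# API key in the Authorization header; the gateway forwards the call to
# the configured custom provider.
```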
Once configured, use custom models like any other.

Routing with Custom Models
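Custom providers can participate in the same routing rules as managed ones. A hedged sketch of such a policy: self-hosted models and cost-sensitive traffic go to the custom provider, everything else to a managed one (the rule shape and provider names are hypothetical, not the gateway's confirmed schema):

```python
# Hypothetical routing rules mixing a self-hosted provider with a managed one.
ROUTING_RULES = [
    # Requests for self-hosted models go straight to the custom provider.
    {"match": {"model_prefix": "my-org/"}, "provider": "my-vllm"},
    # Cost-sensitive traffic prefers the custom provider.
    {"match": {"tag": "cost-sensitive"}, "provider": "my-vllm"},
    # Everything else defaults to a managed provider.
    {"match": {}, "provider": "openai"},
]

def select_provider(model, tags=()):
    """Return the first provider whose rule matches the request."""
    for rule in ROUTING_RULES:
        match = rule["match"]
        if "model_prefix" in match and not model.startswith(match["model_prefix"]):
            continue
        if "tag" in match and match["tag"] not in tags:
            continue
        return rule["provider"]
    return "openai"
```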
Mix custom and managed providers.

Request/Response Transforms
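When a backend is not OpenAI-compatible, the gateway must rewrite outgoing requests into the backend's native shape. A sketch of a request transform for a hypothetical backend that expects a flat `prompt` field (the target field names are invented for illustration):

```python
def transform_request(openai_req):
    """Map an OpenAI-style chat request to a hypothetical native API shape."""
    # Flatten the chat history into a single prompt string.
    prompt = "\n".join(
        f"{m['role']}: {m['content']}" for m in openai_req["messages"]
    )
    return {
        "prompt": prompt,
        "max_new_tokens": openai_req.get("max_tokens", 256),
        "temperature": openai_req.get("temperature", 1.0),
    }
```

A matching response transform maps the backend's reply back into the OpenAI response shape before it is returned to the client.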
Transform requests for non-OpenAI-compatible APIs.

Authentication Methods
- API Key
- Custom Headers
- OAuth 2.0
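One way to picture the three methods is as different header-building strategies. A sketch (the auth config keys shown here are illustrative, not the gateway's documented schema):

```python
def build_auth_headers(auth):
    """Build HTTP headers for a provider from an auth config (illustrative keys)."""
    method = auth["method"]
    if method == "api_key":
        # Standard bearer-token auth used by OpenAI-compatible servers.
        return {"Authorization": f"Bearer {auth['key']}"}
    if method == "custom_headers":
        # Providers with non-standard schemes take headers verbatim.
        return dict(auth["headers"])
    if method == "oauth2":
        # OAuth 2.0: a previously fetched (and periodically refreshed) token.
        return {"Authorization": f"Bearer {auth['access_token']}"}
    raise ValueError(f"unknown auth method: {method}")
```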
Health Checks
Configure automatic health monitoring.

Load Balancing
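Load balancing pairs naturally with health checks: rotate round-robin across endpoints and skip any that are currently failing probes. A minimal sketch of that idea (endpoint URLs are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Rotate across endpoints, skipping any currently marked unhealthy."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.healthy = set(self.endpoints)  # updated by periodic health checks
        self._cycle = itertools.cycle(self.endpoints)

    def mark_unhealthy(self, endpoint):
        self.healthy.discard(endpoint)

    def mark_healthy(self, endpoint):
        self.healthy.add(endpoint)

    def next_endpoint(self):
        # Try at most one full rotation before giving up.
        for _ in range(len(self.endpoints)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy endpoints available")
```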
Distribute load across multiple custom endpoints.

Best Practices
Monitoring & Observability

- Enable health checks for all custom providers
- Set appropriate timeouts (custom models may be slower)
- Monitor error rates and latency
- Set up alerts for provider downtime
Cost Management

- Configure an accurate cost_per_token for billing
- Use custom models for high-volume, cost-sensitive workloads
- Fall back to managed providers for mission-critical requests
- Monitor actual costs vs. configured costs
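Billing is only as accurate as the configured rates, so it is worth sanity-checking what a typical request costs. A quick worked example under assumed per-token rates (the rates below are made up for illustration):

```python
def request_cost(input_tokens, output_tokens, rate_in, rate_out):
    """Cost of one request given per-token rates (currency units per token)."""
    return input_tokens * rate_in + output_tokens * rate_out

# Assumed self-hosted rates vs. assumed managed-provider rates, for a request
# with 2,000 input tokens and 500 output tokens.
custom = request_cost(2_000, 500, rate_in=2e-7, rate_out=6e-7)    # 0.0007
managed = request_cost(2_000, 500, rate_in=2.5e-6, rate_out=1e-5)  # 0.01
```

Comparing configured costs against the provider's actual invoices catches rate drift early.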
Security

- Store API keys in environment variables
- Use HTTPS for all custom endpoints
- Implement rate limiting on custom providers
- Rotate credentials regularly
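For the first point, a common pattern is to keep only the variable *name* in configuration and resolve the secret at startup, so keys never land in config files or version control (a sketch; the `api_key_env` field is illustrative):

```python
import os

def resolve_api_key(provider):
    """Read a provider's API key from the environment, never from config."""
    env_var = provider["api_key_env"]  # e.g. "MY_VLLM_API_KEY"
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"missing credential: set {env_var}")
    return key
```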
Performance

- Deploy custom models close to the gateway for low latency
- Use connection pooling for custom endpoints
- Configure appropriate timeout values
- Load balance across multiple instances
Example Integrations
- vLLM
- Ollama
- Hugging Face
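All three typically expose OpenAI-compatible endpoints, so integration is mostly a matter of pointing the gateway at the right base URL. The ports below are these servers' common defaults, but adjust them to your deployment; the surrounding dict shape is illustrative, not the gateway's confirmed schema:

```python
# Typical OpenAI-compatible base URLs for popular self-hosted servers.
# Ports are common defaults; adjust to your deployment. The dict shape
# is illustrative, not Visca AI Gateway's confirmed schema.
EXAMPLE_PROVIDERS = {
    "vllm": {"base_url": "http://localhost:8000/v1"},     # vLLM OpenAI-compatible server
    "ollama": {"base_url": "http://localhost:11434/v1"},  # Ollama's OpenAI-compatible endpoint
    "tgi": {"base_url": "http://localhost:8080/v1"},      # Text Generation Inference (Messages API)
}
```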
Troubleshooting
Connection Errors

- Verify the base_url is accessible from the gateway
- Check firewall rules and network connectivity
- Ensure the custom provider is running
- Test with curl or Postman first
Authentication Failures

- Verify the API key is correct
- Check that the header format matches provider requirements
- Ensure OAuth tokens have not expired
- Review provider-specific auth documentation
Response Format Issues

- Configure the correct response transform
- Check that the provider returns the expected format
- Enable debug logging to inspect raw responses
- Validate against the OpenAI API spec
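For the first two bullets, a response transform normalizes a provider's payload into the OpenAI chat-completion shape before it leaves the gateway. A sketch for a hypothetical provider that returns a flat `{"output": ..., "usage": ...}` payload (both shapes are assumptions for illustration):

```python
def to_openai_format(raw, model):
    """Normalize a hypothetical provider response into OpenAI chat-completion shape."""
    usage = raw.get("usage", {})
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": raw["output"]},
            "finish_reason": raw.get("stop_reason", "stop"),
        }],
        "usage": {
            "prompt_tokens": usage.get("input_tokens", 0),
            "completion_tokens": usage.get("output_tokens", 0),
        },
    }
```

If responses still look wrong, enable debug logging and compare the raw payload against what the transform expects.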
Next Steps
- Model Routing: configure intelligent routing with custom models
- Load Balancing: distribute load across providers
- Monitoring: monitor custom provider performance
- Cost Tracking: track costs for custom models