Overview
Extend Visca AI Gateway by integrating your own AI models, self-hosted providers, or custom endpoints alongside managed providers like OpenAI and Anthropic.

- Bring Your Own Models: host your own fine-tuned or custom models
- Unified Interface: access all models through one API
- Intelligent Routing: route between managed and custom providers
- Cost Control: use custom models for cost-sensitive workloads
Supported Custom Providers
- Self-hosted OpenAI-compatible APIs (vLLM, Text Generation Inference)
- Local models (Ollama, LocalAI, LM Studio)
- Custom endpoints (your own model API)
- Other cloud providers (Together AI, Replicate, Hugging Face)
Adding a Custom Provider
1. Navigate to Providers: go to Settings → Custom Providers in your dashboard
2. Add Provider: click “Add Custom Provider” and enter the provider details
3. Configure Endpoint: enter the base URL, authentication, and model mappings
4. Test Connection: run a test request to verify the configuration
5. Enable Routing: add the custom models to your routing configuration
Configuration
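The provider definition created in the dashboard pairs an endpoint with authentication and model mappings. The sketch below shows the general shape of such a definition; the field names (base_url, api_key_env, cost_per_token, and so on) are illustrative assumptions, not Visca's confirmed schema.

```python
# Illustrative provider definition; all field names are assumptions,
# not Visca's confirmed configuration schema.
custom_provider = {
    "name": "my-vllm",                            # identifier used in routing rules
    "base_url": "https://vllm.internal:8000/v1",  # OpenAI-compatible endpoint
    "auth": {
        "type": "api_key",
        "api_key_env": "MY_VLLM_API_KEY",         # read from an environment variable
    },
    "models": {
        # gateway-facing model name -> name the endpoint expects
        "my-llama-70b": "meta-llama/Llama-3-70B-Instruct",
    },
    "cost_per_token": {"input": 2e-7, "output": 6e-7},  # used for cost tracking
    "timeout_seconds": 60,                        # custom models may be slower
}
```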
Using Custom Models
Once configured, use custom models like any other:
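Assuming the gateway exposes an OpenAI-compatible endpoint, a request to a custom model looks identical to one for a managed model; only the model name changes. The base URL and model name below are placeholders.

```python
# Placeholder URL and model name; assumes an OpenAI-compatible gateway endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.visca.ai/v1",  # hypothetical gateway endpoint
    api_key="YOUR_VISCA_API_KEY",
)

response = client.chat.completions.create(
    model="my-llama-70b",  # the custom model mapped in your provider config
    messages=[{"role": "user", "content": "Summarize our release notes."}],
)
print(response.choices[0].message.content)
```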
Routing with Custom Models
Mix custom and managed providers:
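A common pattern is to send high-volume traffic to a self-hosted model while keeping a managed provider as the primary or fallback for critical paths. The rule structure and field names below are illustrative assumptions.

```python
# Illustrative routing rules; structure and field names are assumptions.
routing_config = {
    "routes": [
        {
            # bulk work goes mostly to the cheap self-hosted model
            "match": {"task": "bulk-summarization"},
            "targets": [
                {"provider": "my-vllm", "model": "my-llama-70b", "weight": 0.8},
                {"provider": "openai", "model": "gpt-4o-mini", "weight": 0.2},
            ],
        },
        {
            # critical traffic stays on a managed provider,
            # with the custom model as fallback
            "match": {"task": "support-chat"},
            "targets": [{"provider": "anthropic", "model": "claude-sonnet"}],
            "fallback": [{"provider": "my-vllm", "model": "my-llama-70b"}],
        },
    ]
}
```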
Request/Response Transforms
Transform requests for non-OpenAI-compatible APIs:
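Conceptually, a transform is a pair of functions: one maps the OpenAI-style request into the provider's native shape, the other maps the native response back. The native field names in this sketch (prompt, generated_text) are assumptions for illustration.

```python
# Sketch of a request/response transform for a hypothetical native prompt API.
# The native field names (prompt, generated_text) are illustrative assumptions.
def transform_request(openai_request: dict) -> dict:
    """Flatten OpenAI chat messages into the provider's prompt format."""
    prompt = "\n".join(m["content"] for m in openai_request["messages"])
    return {
        "prompt": prompt,
        "max_new_tokens": openai_request.get("max_tokens", 256),
    }

def transform_response(native_response: dict) -> dict:
    """Wrap the provider's raw output back into OpenAI chat format."""
    return {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": native_response["generated_text"],
                },
                "finish_reason": "stop",
            }
        ]
    }
```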
Authentication Methods
- API Key
- Custom Headers
- OAuth 2.0
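The sketch below shows how each method might be expressed in a provider definition; the keys are illustrative assumptions, not a confirmed schema.

```python
# Illustrative auth blocks for each method; all keys are assumptions.
auth_examples = {
    "api_key": {
        "type": "api_key",
        "api_key_env": "PROVIDER_API_KEY",  # keep secrets in environment variables
    },
    "custom_headers": {
        "type": "headers",
        "headers": {"X-Api-Token": "${PROVIDER_TOKEN}", "X-Org-Id": "acme"},
    },
    "oauth2": {
        "type": "oauth2",
        "token_url": "https://auth.example.com/oauth/token",
        "client_id_env": "PROVIDER_CLIENT_ID",
        "client_secret_env": "PROVIDER_CLIENT_SECRET",
    },
}
```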
Health Checks
Configure automatic health monitoring:
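A typical health-check block probes a cheap endpoint on a schedule and removes an endpoint from rotation after repeated failures. The field names below are illustrative assumptions.

```python
# Illustrative health-check settings; field names are assumptions.
health_check = {
    "enabled": True,
    "path": "/v1/models",       # a cheap probe on OpenAI-compatible servers
    "interval_seconds": 30,     # how often to probe
    "timeout_seconds": 5,       # probe fails if no answer within this window
    "unhealthy_threshold": 3,   # consecutive failures before removal from rotation
    "healthy_threshold": 2,     # consecutive successes before reinstatement
}
```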
Load Balancing
Distribute load across multiple custom endpoints:
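For example, two replicas of the same model can share traffic by weight. Again, the strategy name and fields are illustrative assumptions.

```python
# Illustrative load-balancing block; strategy name and fields are assumptions.
load_balancing = {
    "strategy": "weighted_round_robin",
    "endpoints": [
        {"base_url": "https://vllm-a.internal:8000/v1", "weight": 2},  # larger replica
        {"base_url": "https://vllm-b.internal:8000/v1", "weight": 1},
    ],
}
```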
Best Practices
Monitoring & Observability
- Enable health checks for all custom providers
- Set appropriate timeouts (custom models may be slower)
- Monitor error rates and latency
- Set up alerts for provider downtime
Cost Management
- Configure accurate cost_per_token for billing
- Use custom models for high-volume, cost-sensitive workloads
- Fall back to managed providers for mission-critical requests
- Monitor actual costs vs. configured costs
Security
- Store API keys in environment variables
- Use HTTPS for all custom endpoints
- Implement rate limiting on custom providers
- Regularly rotate credentials
Performance
- Deploy custom models close to the gateway (low latency)
- Use connection pooling for custom endpoints
- Configure appropriate timeout values
- Load balance across multiple instances
Example Integrations
- vLLM
- Ollama
- Hugging Face
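vLLM and Ollama both serve OpenAI-compatible APIs out of the box, so a provider entry usually only needs the right base URL. The hosts and model names below are placeholders; Hugging Face endpoints running Text Generation Inference expose a compatible Messages API and follow the same pattern.

```python
# Placeholder hosts and model names; both servers speak the OpenAI-compatible
# protocol, so only the base URL and model mapping differ.
vllm_provider = {
    "name": "vllm",
    "base_url": "http://vllm-host:8000/v1",     # vLLM's OpenAI-compatible server
    "models": {"my-llama": "meta-llama/Llama-3-8B-Instruct"},
}

ollama_provider = {
    "name": "ollama",
    "base_url": "http://ollama-host:11434/v1",  # Ollama's OpenAI-compatible API
    "models": {"local-llama": "llama3"},
}
```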
Troubleshooting
Connection Errors
- Verify base_url is accessible from the gateway
- Check firewall rules and network connectivity
- Ensure the custom provider is running
- Test with curl/Postman (or a short script like the sketch below) first
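Before digging into gateway settings, confirm the endpoint answers at all; the URL below is a placeholder.

```python
# Quick connectivity check against a custom endpoint (placeholder URL).
import requests

resp = requests.get("http://vllm-host:8000/v1/models", timeout=5)
print(resp.status_code)
print(resp.json())
```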
Authentication Failures
- Verify the API key is correct
- Check the header format matches provider requirements
- Ensure OAuth tokens are not expired
- Review provider-specific auth documentation
Response Format Issues
- Configure the correct response transform
- Check the provider returns the expected format
- Enable debug logging to inspect raw responses
- Validate against the OpenAI API spec