
Overview

Extend Visca AI Gateway by integrating your own AI models, self-hosted providers, or custom endpoints alongside managed providers like OpenAI and Anthropic.

  • Bring Your Own Models: Host your own fine-tuned or custom models.
  • Unified Interface: Access all models through one API.
  • Intelligent Routing: Route between managed and custom providers.
  • Cost Control: Use custom models for cost-sensitive workloads.

Supported Custom Providers

  • Self-hosted OpenAI-compatible APIs (vLLM, Text Generation Inference)
  • Local models (Ollama, LocalAI, LM Studio)
  • Custom endpoints (your own model API)
  • Other cloud providers (Together AI, Replicate, Hugging Face)

Adding a Custom Provider

1. Navigate to Providers: Go to Settings → Custom Providers in your dashboard.
2. Add Provider: Click “Add Custom Provider” and enter the provider details.
3. Configure Endpoint: Enter the base URL, authentication, and model mappings.
4. Test Connection: Run a test request to verify the configuration (see the sketch below).
5. Enable Routing: Add the custom models to your routing configuration.
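
The connection test in step 4 can also be run by hand before enabling routing. A minimal sketch, assuming the endpoint, API key, and model ID from the configuration example in the next section:

import openai

# Point the client directly at the custom provider (not the gateway)
# to confirm the endpoint is reachable and OpenAI-compatible.
client = openai.OpenAI(
    base_url="https://my-vllm.example.com/v1",  # your provider's base URL
    api_key="your-provider-api-key",
)

response = client.chat.completions.create(
    model="llama-3-70b",  # a model ID the provider actually serves
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(response.choices[0].message.content)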

Configuration

Each provider definition specifies a name, type, base URL, authentication, and the models it exposes:
{
  "name": "my-vllm-server",
  "type": "openai-compatible",
  "base_url": "https://my-vllm.example.com/v1",
  "api_key": "your-api-key",
  "models": [
    {
      "id": "llama-3-70b",
      "name": "Llama 3 70B",
      "context_window": 8192,
      "cost_per_1k_input": 0.0001,
      "cost_per_1k_output": 0.0002
    }
  ]
}
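
Before pasting a definition into the dashboard, it can help to sanity-check the JSON locally. A small, hypothetical helper (not part of the gateway API) that verifies the fields used above:

import json

REQUIRED_PROVIDER_KEYS = {"name", "type", "base_url", "models"}

def validate_provider_config(path: str) -> None:
    # Load a provider definition and check the top-level fields shown
    # in the example configuration above.
    with open(path) as f:
        config = json.load(f)

    missing = REQUIRED_PROVIDER_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for model in config["models"]:
        if "id" not in model:
            raise ValueError("every model entry needs an 'id'")

validate_provider_config("my-vllm-server.json")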

Using Custom Models

Once configured, use custom models like any other:
import openai

client = openai.OpenAI(
    base_url="https://gateway.visca.ai/v1",
    api_key="your-api-key"
)

# Use your custom model
response = client.chat.completions.create(
    model="llama-3-70b",  # Your custom model ID
    messages=[{"role": "user", "content": "Hello!"}]
)
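
Streaming works the same way, assuming the gateway proxies OpenAI-style streaming for custom models unchanged:

# Continues the example above: stream tokens from the custom model
stream = client.chat.completions.create(
    model="llama-3-70b",
    messages=[{"role": "user", "content": "Write a haiku about gateways."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)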

Routing with Custom Models

Mix custom and managed providers:
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Analyze this data..."}],
    extra_body={
        "routing": {
            "strategy": "cost-optimized",
            "providers": ["my-vllm-server", "openai", "anthropic"],
            "fallback": True
        }
    }
)

Request/Response Transforms

Transform requests for non-OpenAI-compatible APIs:
def transform_request(openai_request):
    # Map an OpenAI-style chat request onto a simple prompt-based API:
    # send only the latest message and carry over common parameters.
    return {
        "prompt": openai_request["messages"][-1]["content"],
        "max_tokens": openai_request.get("max_tokens", 100),
        "temperature": openai_request.get("temperature", 0.7)
    }
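
The reverse direction is handled the same way. A hedged sketch of a response transform (the upstream field names and the transform_response hook are assumptions for illustration) that maps a provider's reply back into the OpenAI chat-completion shape clients expect:

import time
import uuid

def transform_response(provider_response, model_id):
    # Map a hypothetical {"text": ..., "usage": {...}} reply back into
    # the OpenAI chat-completion shape returned to clients.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model_id,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": provider_response["text"]},
            "finish_reason": "stop",
        }],
        "usage": provider_response.get("usage", {}),
    }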

Authentication Methods

Custom providers support three authentication methods:

  • API Key (bearer token)
  • Custom Headers
  • OAuth 2.0

For example, bearer-token authentication:
{
  "auth_type": "bearer",
  "api_key": "${API_KEY}"
}
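
For reference, a rough sketch of the headers each method typically adds to the gateway's outbound requests (everything except the standard Authorization bearer header is an illustrative assumption; check your provider's requirements):

import os

def build_auth_headers(auth_type: str) -> dict:
    # Illustrative only: the kind of header each auth method attaches
    # to requests sent from the gateway to your provider.
    if auth_type == "bearer":
        return {"Authorization": f"Bearer {os.environ['API_KEY']}"}
    if auth_type == "custom-header":
        return {"X-Custom-Key": os.environ["API_KEY"]}  # hypothetical header name
    if auth_type == "oauth2":
        # OAuth 2.0 access tokens are usually sent as bearer tokens too,
        # after being obtained from the provider's token endpoint.
        return {"Authorization": f"Bearer {os.environ['OAUTH_ACCESS_TOKEN']}"}
    raise ValueError(f"unknown auth_type: {auth_type}")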

Health Checks

Configure automatic health monitoring:
{
  "health_check": {
    "enabled": true,
    "endpoint": "/health",
    "interval_seconds": 60,
    "timeout_seconds": 5,
    "failure_threshold": 3
  }
}
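
Conceptually, this makes the gateway run a loop like the following against each provider; a simplified sketch of the behavior, not the gateway's actual implementation:

import time
import requests

def monitor(base_url, endpoint="/health", interval=60, timeout=5, failure_threshold=3):
    # Poll the provider's health endpoint; after `failure_threshold`
    # consecutive failures the provider is taken out of routing.
    failures = 0
    while True:
        try:
            resp = requests.get(base_url + endpoint, timeout=timeout)
            failures = 0 if resp.ok else failures + 1
        except requests.RequestException:
            failures += 1
        if failures >= failure_threshold:
            print("provider unhealthy, removing from routing")
            return
        time.sleep(interval)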

Load Balancing

Distribute load across multiple custom endpoints:
{
  "name": "llama-3-cluster",
  "type": "openai-compatible",
  "endpoints": [
    {
      "base_url": "https://llama-1.example.com/v1",
      "weight": 1
    },
    {
      "base_url": "https://llama-2.example.com/v1",
      "weight": 1
    },
    {
      "base_url": "https://llama-3.example.com/v1",
      "weight": 2
    }
  ]
}
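
With these weights, the third endpoint receives roughly half of the traffic. A sketch of weighted random selection to illustrate the effect (the gateway's actual balancing algorithm may differ):

import random

endpoints = [
    {"base_url": "https://llama-1.example.com/v1", "weight": 1},
    {"base_url": "https://llama-2.example.com/v1", "weight": 1},
    {"base_url": "https://llama-3.example.com/v1", "weight": 2},
]

def pick_endpoint():
    # Weighted random choice: llama-3 (weight 2) gets ~50% of requests,
    # the other two endpoints ~25% each.
    weights = [e["weight"] for e in endpoints]
    return random.choices(endpoints, weights=weights, k=1)[0]

print(pick_endpoint()["base_url"])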

Best Practices

  • Enable health checks for all custom providers, set appropriate timeouts (custom models may be slower), monitor error rates and latency, and set up alerts for provider downtime.
  • Configure accurate cost_per_1k_input and cost_per_1k_output values for billing, use custom models for high-volume, cost-sensitive workloads, fall back to managed providers for mission-critical requests, and monitor actual costs against configured costs.
  • Store API keys in environment variables (see the sketch after this list), use HTTPS for all custom endpoints, implement rate limiting on custom providers, and rotate credentials regularly.
  • Deploy custom models close to the gateway for low latency, use connection pooling for custom endpoints, configure appropriate timeout values, and load balance across multiple instances.
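
As referenced in the list above, a minimal sketch of a client that reads its key from the environment and allows extra time for slower self-hosted models (the variable name and timeout are assumptions; tune them to your deployment):

import os
import openai

# Read the gateway key from the environment instead of hard-coding it,
# and allow extra time for slower self-hosted models.
client = openai.OpenAI(
    base_url="https://gateway.visca.ai/v1",
    api_key=os.environ["VISCA_API_KEY"],  # assumed variable name
    timeout=120.0,                        # seconds; tune to your models
    max_retries=2,
)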

Example Integrations

  • vLLM
  • Ollama
  • Hugging Face

For example, to integrate a self-hosted vLLM server, start its OpenAI-compatible API:
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-2-70b-hf \
  --host 0.0.0.0 \
  --port 8000

Then register it as a custom provider:
{
  "name": "vllm-llama2",
  "type": "openai-compatible",
  "base_url": "http://localhost:8000/v1",
  "models": [{"id": "meta-llama/Llama-2-70b-hf"}]
}
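
Before registering the server, you can confirm the endpoint responds and lists the expected model (vLLM accepts any placeholder API key unless one was set with --api-key):

import openai

# Talk to the vLLM server directly to confirm it is up and serving
# the expected model before adding it as a custom provider.
vllm = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

for model in vllm.models.list():
    print(model.id)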

Troubleshooting

  • Connection failures: verify the base_url is reachable from the gateway, check firewall rules and network connectivity, make sure the custom provider is running, and test with curl or Postman first.
  • Authentication errors: verify the API key is correct, check that the header format matches the provider's requirements, ensure OAuth tokens have not expired, and review the provider's auth documentation.
  • Unexpected responses: configure the correct response transform, check that the provider returns the expected format, enable debug logging or inspect the raw response (see the sketch below), and validate against the OpenAI API spec.
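
For the raw-response debugging mentioned above, one option is to bypass the SDK and call the gateway's chat completions endpoint directly, which shows exactly what came back before any client-side parsing (VISCA_API_KEY is an assumed variable name):

import os
import httpx

# Inspect the raw JSON the gateway returns for a custom model; format
# and transform problems are easy to spot in the unparsed body.
resp = httpx.post(
    "https://gateway.visca.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['VISCA_API_KEY']}"},
    json={
        "model": "llama-3-70b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
print(resp.status_code)
print(resp.text)  # raw body, before any client-side parsing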
