## Overview
Visca AI Gateway provides native support for the Anthropic Messages API, allowing you to use Claude models with their native format while benefiting from gateway features like routing, caching, and analytics.
## Endpoint

```
POST https://gateway.visca.ai/v1/messages
```
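If you are not using the SDK, the same endpoint can be called with a plain HTTP POST. A minimal sketch of assembling the request by hand; the header names follow the standard Anthropic convention and should be confirmed against your gateway configuration:

```python
import json

# Request pieces for a direct HTTP call to the gateway's Messages endpoint.
# Header names follow the Anthropic convention (x-api-key, anthropic-version);
# verify them against your gateway configuration before relying on this.
url = "https://gateway.visca.ai/v1/messages"
headers = {
    "x-api-key": "your-visca-api-key",   # gateway key, not an Anthropic key
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude!"}],
}

body = json.dumps(payload)
print(body)
```

Send it with any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`.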
## Basic Usage

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://gateway.visca.ai/v1",
    api_key="your-visca-api-key",  # Not your Anthropic key
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)

print(message.content[0].text)
```
## Supported Models

All Claude models are available:

- `claude-3-5-sonnet-20241022` (latest, recommended)
- `claude-3-5-haiku-20241022`
- `claude-3-opus-20240229`
- `claude-3-sonnet-20240229`
- `claude-3-haiku-20240307`
## Streaming

Stream responses for real-time output:

```python
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
## System Messages

Claude uses a separate `system` parameter rather than a system role inside `messages`:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful AI assistant specializing in Python programming.",
    messages=[{"role": "user", "content": "How do I read a CSV file?"}],
)
```
## Vision

Send images with Claude 3 models:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "url", "url": "https://example.com/image.jpg"},
            },
            {"type": "text", "text": "What's in this image?"},
        ],
    }],
)
```
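Besides URL sources, the Messages API also accepts images inline as base64. A small sketch of building such a content block; the placeholder bytes below stand in for a real file read with `open(path, "rb").read()`:

```python
import base64

def image_block_from_bytes(data: bytes, media_type: str = "image/jpeg") -> dict:
    """Build an inline (base64) image content block for the Messages API."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }

# Placeholder bytes stand in for real image data from disk.
block = image_block_from_bytes(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
print(block["source"]["type"])  # base64
```

The resulting dict drops into the `content` list exactly where the URL-source block sits in the example above.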
## Prompt Caching

Enable prompt caching to save costs on repeated content:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "You are an AI assistant...",
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Hello!"}],
)

# Check cache usage
print(f"Cache read tokens: {message.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {message.usage.cache_creation_input_tokens}")
```
## Tool Use

Define tools for Claude to use:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
)

# Handle tool use
if message.stop_reason == "tool_use":
    tool_use = next(block for block in message.content if block.type == "tool_use")
    print(f"Claude wants to call: {tool_use.name}")
    print(f"With arguments: {tool_use.input}")
```
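To complete the loop, run the tool yourself and return the output in a `tool_result` block on a follow-up request. A minimal sketch using plain dicts; the tool-use values below are stand-ins for what the first response actually returned, and the weather string stands in for a real `get_weather` implementation:

```python
def tool_result_messages(prior_messages, assistant_content, tool_use_id, result):
    """Extend the conversation with the assistant's tool call and our result."""
    return prior_messages + [
        {"role": "assistant", "content": assistant_content},
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": result,
            }],
        },
    ]

# Stand-in for message.content from the first response.
assistant_content = [{
    "type": "tool_use",
    "id": "toolu_123",
    "name": "get_weather",
    "input": {"location": "San Francisco"},
}]

followup = tool_result_messages(
    [{"role": "user", "content": "What's the weather in San Francisco?"}],
    assistant_content,
    "toolu_123",
    "65°F and sunny",  # whatever your real tool returned
)
print(len(followup))  # 3 messages: user, assistant, tool result
```

Pass `followup` as `messages` (with the same `tools` list) in a second `client.messages.create(...)` call so Claude can compose its final answer.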
## Gateway Features
The Anthropic endpoint supports all gateway features:
### Routing

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "routing": {
            "strategy": "latency-optimized",
            "fallback": True,
        },
    },
)
```
### Metadata

Attach user and session metadata for analytics:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "user_id": "user-123",
        "session_id": "sess-456",
    },
)
```
### Cost Tracking
All requests through the Anthropic endpoint are tracked in analytics with accurate cost attribution.
## Differences from Native Anthropic API

| Feature | Native Anthropic | Via Gateway |
| --- | --- | --- |
| Endpoint | `https://api.anthropic.com` | `https://gateway.visca.ai` |
| API Key | Anthropic API key | Visca gateway key |
| Analytics | Limited | Full dashboard |
| Routing | Single provider | Multi-provider fallback |
| Caching | Prompt caching | Prompt caching + response caching |
| Rate Limits | Anthropic limits | Configurable limits |
## Best Practices

- **Use system messages**: define behavior with the `system` parameter, not in the first user message.
- **Enable caching**: cache large system prompts and documents to cut their input cost by up to 90%.
- **Stream when possible**: streaming gives a better user experience on long responses.
- **Handle tool use**: implement proper tool-use loops for agentic workflows.
## Error Handling

```python
from anthropic import APIError, RateLimitError

try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError:
    print("Rate limit exceeded")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
```
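Rate-limit errors are usually transient, so a retry with exponential backoff is often all that's needed. A hedged sketch, written generically over exception types so you can pass in `RateLimitError` from the `anthropic` package:

```python
import time

def with_backoff(fn, retriable, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on `retriable` exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage against the gateway (RateLimitError imported from anthropic):
# message = with_backoff(
#     lambda: client.messages.create(
#         model="claude-3-5-sonnet-20241022",
#         max_tokens=1024,
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retriable=RateLimitError,
# )
```

Tune `max_attempts` and `base_delay` to your gateway's configured rate limits.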
## Migration from Direct Anthropic

1. **Change base URL**: update `base_url` to `https://gateway.visca.ai/v1`.
2. **Update API key**: use your Visca gateway API key instead of your Anthropic key.
3. **Test requests**: verify all requests work as expected.
4. **Enable features**: add routing, metadata, and analytics as needed.
## Next Steps

- **OpenAI Endpoint**: use the OpenAI format with cross-provider routing.
- **Model Routing**: route between Claude and other models.
- **Prompt Caching**: advanced caching strategies.
- **Tool Use Guide**: build agents with Claude's tool use.