Overview

Visca AI Gateway provides unified access to vision-capable AI models from multiple providers, enabling image analysis, OCR, visual Q&A, and more through a single API.

Multi-Provider Support

Access GPT-4 Vision, Claude 3, Gemini Vision, and more

Automatic Routing

Route to the best model based on image type and task

Image Optimization

Automatic image resizing and format conversion

Cost Optimization

Intelligent caching and token-efficient processing

Supported Models

  • OpenAI: GPT-4 Vision (gpt-4-vision-preview), GPT-4o (gpt-4o), GPT-4o Mini (gpt-4o-mini). Best for: general visual understanding, OCR, detailed analysis.
  • Anthropic: Claude 3 models (e.g. claude-3-sonnet, used in the moderation example below). Best for: complex document layouts.
  • Google: Gemini vision models (e.g. gemini-1.5-pro, used in the conversation example below).

Basic Usage

Send images via URL or base64:
import openai

client = openai.OpenAI(
    base_url="https://gateway.visca.ai/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg"
                }
            }
        ]
    }],
    max_tokens=300
)

print(response.choices[0].message.content)
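Images can also be sent inline as base64 data URLs instead of remote URLs. A minimal sketch of the encoding step (the helper name and local file path are illustrative, not part of the gateway SDK):

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Drop the data URL in wherever a remote URL would go:
# {"type": "image_url", "image_url": {"url": image_to_data_url("photo.jpg")}}
```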

Multiple Images

Analyze multiple images in a single request:
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Compare these two images and describe the differences"
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image1.jpg"}
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image2.jpg"}
            }
        ]
    }]
)

Image Detail Levels

Control cost and processing time with the detail parameter, which accepts "low", "high", or "auto":
{
    "type": "image_url",
    "image_url": {
        "url": "https://example.com/image.jpg",
        "detail": "low"  # Faster, cheaper, less detailed
    }
}
Low detail:
  • Lower resolution (512x512)
  • ~85 tokens per image
  • Best for: simple classifications, basic detection
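For budgeting, the figures above can be turned into a rough per-image token estimate. This sketch follows OpenAI's published accounting for gpt-4-vision (other providers count differently; verify against current pricing docs):

```python
import math

def estimate_vision_tokens(width: int, height: int, detail: str = "auto") -> int:
    """Rough per-image token estimate, per OpenAI's published formula."""
    if detail == "low":
        return 85  # flat cost at 512x512
    # "high" (and "auto", assumed high here for worst-case budgeting):
    # fit within 2048x2048, scale the shortest side down to 768px,
    # then charge 170 tokens per 512px tile plus an 85-token base.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    shrink = min(1.0, 768 / min(w, h))
    w, h = w * shrink, h * shrink
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

For example, a 1024x1024 image at high detail works out to 765 tokens, while low detail is always 85.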

Common Use Cases

Document text extraction (OCR):
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this document. Format as markdown."},
            {"type": "image_url", "image_url": {"url": "document.jpg", "detail": "high"}}
        ]
    }]
)
Tips:
  • Use “high” detail for documents with small text
  • Consider Claude 3 for complex document layouts
  • Request structured output (JSON, markdown) for easier parsing

Product analysis with structured JSON output:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": """Analyze this product image and provide:
                1. Product category
                2. Key features visible
                3. Condition assessment
                4. Estimated retail value
                Return as JSON."""
            },
            {"type": "image_url", "image_url": {"url": "product.jpg"}}
        ]
    }],
    response_format={"type": "json_object"}
)
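Because response_format={"type": "json_object"} forces a JSON reply, the message content can be parsed directly; the keys are whatever the prompt requested. A sketch with a sample payload:

```python
import json

# Sample of what response.choices[0].message.content might look like
# for the prompt above (field names depend entirely on your prompt):
content = '{"category": "electronics", "features": ["HDMI port"], "condition": "good"}'

data = json.loads(content)
print(data["category"], data["condition"])
```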

Content moderation screening:
response = client.chat.completions.create(
    model="claude-3-sonnet",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": """Analyze this image for:
                - Inappropriate content
                - Violent imagery
                - Adult content
                Return: safe/unsafe with reasoning"""
            },
            {"type": "image_url", "image_url": {"url": "user-upload.jpg", "detail": "low"}}
        ]
    }]
)
Note: Use low detail for faster moderation screening.

Multi-turn visual conversation:
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What type of animal is this?"},
                {"type": "image_url", "image_url": {"url": "animal.jpg"}}
            ]
        },
        {
            "role": "assistant",
            "content": "This is a golden retriever dog."
        },
        {
            "role": "user",
            "content": "What is the dog doing?"
        }
    ]
)

Image Formats

Supported formats:
  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • WebP (.webp)
  • GIF (.gif) - non-animated only
Maximum image size: 20MB
Maximum resolution: 2048x2048 (varies by model)
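A client-side pre-flight check against these limits avoids a wasted round trip. A minimal sketch (extension-based only; a production check would also sniff magic bytes and reject animated GIFs):

```python
import os

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_BYTES = 20 * 1024 * 1024  # 20MB limit from the list above

def validate_image(path: str) -> None:
    """Raise ValueError for unsupported formats or oversized files."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("Image exceeds the 20MB limit")
```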

Cost Optimization

1. Use appropriate detail level: set detail: "low" for simple tasks to reduce token usage by 90%.
2. Resize large images: pre-process images to optimal size before sending.
3. Enable response caching: cache identical image analysis results for 24 hours.
4. Batch processing: process multiple images in parallel requests with rate limiting.
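The resizing step can be done client-side before upload. A sketch of the size computation, assuming the 2048px ceiling noted earlier (the actual resampling could then be done with a library such as Pillow):

```python
def target_size(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Downscaled dimensions whose longest side is at most max_side.

    Never upscales; preserves aspect ratio.
    """
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))

# With Pillow (a third-party dependency), the resize itself would be:
# from PIL import Image
# with Image.open("big.jpg") as img:
#     img.thumbnail((2048, 2048))  # in-place, aspect-preserving shrink
#     img.save("small.jpg")
```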

Automatic Model Selection

Let the gateway choose the best vision model:
response = client.chat.completions.create(
    model="vision-auto",  # Gateway selects optimal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this receipt"},
            {"type": "image_url", "image_url": {"url": "receipt.jpg"}}
        ]
    }],
    extra_body={
        "routing": {
            "strategy": "cost-optimized",  # or "latency-optimized"
            "fallback": True
        }
    }
)

Error Handling

try:
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this"},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]
        }]
    )
except openai.BadRequestError as e:
    if "image_parse_error" in str(e):
        print("Image format not supported or corrupted")
    elif "content_policy_violation" in str(e):
        print("Image violates content policy")
    else:
        print(f"Request error: {e}")
except openai.RateLimitError:
    print("Rate limit exceeded, retry with backoff")
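The RateLimitError branch above usually pairs with an exponential-backoff wrapper. A minimal sketch (attempt counts and delays are arbitrary choices; retry_on is generic so the helper works with any exception type):

```python
import random
import time

def with_backoff(fn, retry_on=Exception, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on retry_on with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage against the gateway:
# response = with_backoff(
#     lambda: client.chat.completions.create(...),
#     retry_on=openai.RateLimitError,
# )
```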

Best Practices

Image Quality

  • Use clear, well-lit images
  • Avoid blurry or low-resolution images
  • Ensure text is readable (for OCR)
  • Compress images to reduce latency

Prompting

  • Be specific about what to extract
  • Request structured output (JSON, lists)
  • Provide context about the image
  • Use few-shot examples for complex tasks

Performance

  • Cache frequently analyzed images
  • Use webhooks for async processing
  • Process images in parallel
  • Monitor token usage
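The caching point above can be sketched with a content-hash key; a plain dict stands in here for a real store such as Redis with a 24-hour TTL:

```python
import hashlib

_cache: dict[str, str] = {}

def analyze_cached(image_bytes: bytes, prompt: str, analyze) -> str:
    """Memoize analysis results keyed by image content plus prompt.

    `analyze` is whatever callable performs the actual gateway request.
    """
    key = hashlib.sha256(image_bytes + prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = analyze(image_bytes, prompt)
    return _cache[key]
```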

Security

  • Validate image sources
  • Scan for malware before processing
  • Don’t send sensitive PII in images
  • Use private URLs with expiration

Next Steps