Overview

Visca AI Gateway provides unified access to vision-capable AI models from multiple providers, enabling image analysis, OCR, visual Q&A, and more through a single API.

Multi-Provider Support

Access GPT-4 Vision, Claude 3, Gemini Vision, and more

Automatic Routing

Route to the best model based on image type and task

Image Optimization

Automatic image resizing and format conversion

Cost Optimization

Intelligent caching and token-efficient processing

Supported Models

  • OpenAI: GPT-4 Vision (gpt-4-vision-preview), GPT-4o (gpt-4o), GPT-4o Mini (gpt-4o-mini). Best for: general visual understanding, OCR, detailed analysis.
  • Anthropic: Claude 3 models (e.g. claude-3-sonnet, used in the moderation example below). Best for: complex document layouts.
  • Google: Gemini vision models (e.g. gemini-1.5-pro, used in the conversation example below).

Basic Usage

Send images via URL or base64:
import openai

client = openai.OpenAI(
    base_url="https://gateway.visca.ai/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg"
                }
            }
        ]
    }],
    max_tokens=300
)

print(response.choices[0].message.content)
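Images can also be sent inline as base64 data URLs instead of remote URLs. A minimal sketch of the encoding step (the helper name and local file path are illustrative, not part of the gateway SDK):

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Drop the data URL in wherever a remote URL would go:
# {"type": "image_url", "image_url": {"url": image_to_data_url("photo.jpg")}}
```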

Multiple Images

Analyze multiple images in a single request:
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Compare these two images and describe the differences"
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image1.jpg"}
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image2.jpg"}
            }
        ]
    }]
)

Image Detail Levels

Control cost and processing time with the detail parameter, which accepts "low", "high", or "auto":
{
    "type": "image_url",
    "image_url": {
        "url": "https://example.com/image.jpg",
        "detail": "low"  # Faster, cheaper, less detailed
    }
}
Low detail:
  • Lower resolution (512x512)
  • ~85 tokens per image
  • Best for: simple classifications, basic detection
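For budgeting, the figures above can be turned into a rough per-image token estimate. This sketch follows OpenAI's published accounting for gpt-4-vision (other providers count differently; verify against current pricing docs):

```python
import math

def estimate_vision_tokens(width: int, height: int, detail: str = "auto") -> int:
    """Rough per-image token estimate, per OpenAI's published formula."""
    if detail == "low":
        return 85  # flat cost at 512x512
    # "high" (and "auto", assumed high here for worst-case budgeting):
    # fit within 2048x2048, scale the shortest side down to 768px,
    # then charge 170 tokens per 512px tile plus an 85-token base.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    shrink = min(1.0, 768 / min(w, h))
    w, h = w * shrink, h * shrink
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

For example, a 1024x1024 image at high detail works out to 765 tokens, while low detail is always 85.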

Common Use Cases

Document text extraction (OCR):
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this document. Format as markdown."},
            {"type": "image_url", "image_url": {"url": "document.jpg", "detail": "high"}}
        ]
    }]
)
Tips:
  • Use “high” detail for documents with small text
  • Consider Claude 3 for complex document layouts
  • Request structured output (JSON, markdown) for easier parsing

Product analysis with structured JSON output:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": """Analyze this product image and provide:
                1. Product category
                2. Key features visible
                3. Condition assessment
                4. Estimated retail value
                Return as JSON."""
            },
            {"type": "image_url", "image_url": {"url": "product.jpg"}}
        ]
    }],
    response_format={"type": "json_object"}
)
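Because response_format={"type": "json_object"} forces a JSON reply, the message content can be parsed directly; the keys are whatever the prompt requested. A sketch with a sample payload:

```python
import json

# Sample of what response.choices[0].message.content might look like
# for the prompt above (field names depend entirely on your prompt):
content = '{"category": "electronics", "features": ["HDMI port"], "condition": "good"}'

data = json.loads(content)
print(data["category"], data["condition"])
```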

Content moderation screening:
response = client.chat.completions.create(
    model="claude-3-sonnet",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": """Analyze this image for:
                - Inappropriate content
                - Violent imagery
                - Adult content
                Return: safe/unsafe with reasoning"""
            },
            {"type": "image_url", "image_url": {"url": "user-upload.jpg", "detail": "low"}}
        ]
    }]
)
Note: Use low detail for faster moderation screening.

Multi-turn visual conversation:
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What type of animal is this?"},
                {"type": "image_url", "image_url": {"url": "animal.jpg"}}
            ]
        },
        {
            "role": "assistant",
            "content": "This is a golden retriever dog."
        },
        {
            "role": "user",
            "content": "What is the dog doing?"
        }
    ]
)

Image Formats

Supported formats:
  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • WebP (.webp)
  • GIF (.gif) - non-animated only
Maximum image size: 20MB
Maximum resolution: 2048x2048 (varies by model)
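A client-side pre-flight check against these limits avoids a wasted round trip. A minimal sketch (extension-based only; a production check would also sniff magic bytes and reject animated GIFs):

```python
import os

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_BYTES = 20 * 1024 * 1024  # 20MB limit from the list above

def validate_image(path: str) -> None:
    """Raise ValueError for unsupported formats or oversized files."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("Image exceeds the 20MB limit")
```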

Cost Optimization

1. Use appropriate detail level: set detail: "low" for simple tasks to reduce token usage by 90%.
2. Resize large images: pre-process images to optimal size before sending.
3. Enable response caching: cache identical image analysis results for 24 hours.
4. Batch processing: process multiple images in parallel requests with rate limiting.
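The resizing step can be done client-side before upload. A sketch of the size computation, assuming the 2048px ceiling noted earlier (the actual resampling could then be done with a library such as Pillow):

```python
def target_size(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Downscaled dimensions whose longest side is at most max_side.

    Never upscales; preserves aspect ratio.
    """
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))

# With Pillow (a third-party dependency), the resize itself would be:
# from PIL import Image
# with Image.open("big.jpg") as img:
#     img.thumbnail((2048, 2048))  # in-place, aspect-preserving shrink
#     img.save("small.jpg")
```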

Automatic Model Selection

Let the gateway choose the best vision model:
response = client.chat.completions.create(
    model="vision-auto",  # Gateway selects optimal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this receipt"},
            {"type": "image_url", "image_url": {"url": "receipt.jpg"}}
        ]
    }],
    extra_body={
        "routing": {
            "strategy": "cost-optimized",  # or "latency-optimized"
            "fallback": True
        }
    }
)

Error Handling

try:
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this"},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]
        }]
    )
except openai.BadRequestError as e:
    if "image_parse_error" in str(e):
        print("Image format not supported or corrupted")
    elif "content_policy_violation" in str(e):
        print("Image violates content policy")
    else:
        print(f"Request error: {e}")
except openai.RateLimitError:
    print("Rate limit exceeded, retry with backoff")
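The RateLimitError branch above usually pairs with an exponential-backoff wrapper. A minimal sketch (attempt counts and delays are arbitrary choices; retry_on is generic so the helper works with any exception type):

```python
import random
import time

def with_backoff(fn, retry_on=Exception, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on retry_on with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage against the gateway:
# response = with_backoff(
#     lambda: client.chat.completions.create(...),
#     retry_on=openai.RateLimitError,
# )
```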

Best Practices

Image Quality

  • Use clear, well-lit images
  • Avoid blurry or low-resolution images
  • Ensure text is readable (for OCR)
  • Compress images to reduce latency

Prompting

  • Be specific about what to extract
  • Request structured output (JSON, lists)
  • Provide context about the image
  • Use few-shot examples for complex tasks

Performance

  • Cache frequently analyzed images
  • Use webhooks for async processing
  • Process images in parallel
  • Monitor token usage
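The caching point above can be sketched with a content-hash key; a plain dict stands in here for a real store such as Redis with a 24-hour TTL:

```python
import hashlib

_cache: dict[str, str] = {}

def analyze_cached(image_bytes: bytes, prompt: str, analyze) -> str:
    """Memoize analysis results keyed by image content plus prompt.

    `analyze` is whatever callable performs the actual gateway request.
    """
    key = hashlib.sha256(image_bytes + prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = analyze(image_bytes, prompt)
    return _cache[key]
```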

Security

  • Validate image sources
  • Scan for malware before processing
  • Don’t send sensitive PII in images
  • Use private URLs with expiration

Next Steps