
Overview

Reasoning models like OpenAI’s o1 and o3 use extended thinking time to solve complex problems through chain-of-thought processing, making them ideal for mathematics, coding, scientific reasoning, and strategic planning.

Extended Thinking

Models think step-by-step before answering

Complex Problem Solving

Excel at math, logic, code, and science

Transparent Reasoning

See the model’s thinking process

Higher Accuracy

Fewer errors on complex tasks

Supported Models

Model: o3 - Latest reasoning model
  • Highest accuracy on complex tasks
  • Extended context window (128k tokens)
  • $15 / 1M input tokens
  • Best for: Research, advanced mathematics, complex code generation

Basic Usage

import openai

client = openai.OpenAI(
    base_url="https://gateway.visca.ai/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="o1",
    messages=[{
        "role": "user",
        "content": """Solve this problem step by step:
A train travels from City A to City B at 60 mph,
then returns at 40 mph. What is the average speed for the entire trip?"""
    }]
)

print(response.choices[0].message.content)

Reasoning Tokens

Reasoning models use additional “reasoning tokens” for internal thinking:

response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Complex math problem..."}]
)

# Check token usage
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Reasoning tokens: {response.usage.completion_tokens_details.reasoning_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
Reasoning tokens are counted separately and billed at a lower rate than output tokens.
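Because reasoning tokens are metered separately, you can estimate per-request cost directly from the usage object. The rates below are illustrative placeholders rather than published prices, and the sketch assumes the three token counts are reported as disjoint buckets; check the gateway’s pricing page before relying on the numbers:

```python
def estimate_cost(prompt_tokens, reasoning_tokens, completion_tokens,
                  input_rate=15.0, reasoning_rate=30.0, output_rate=60.0):
    """Estimate request cost in USD from token counts.

    Rates are per 1M tokens and are placeholder values, not actual
    gateway pricing. Assumes the three counts do not overlap.
    """
    return (
        prompt_tokens * input_rate
        + reasoning_tokens * reasoning_rate
        + completion_tokens * output_rate
    ) / 1_000_000


usage = {"prompt_tokens": 1000, "reasoning_tokens": 2000, "completion_tokens": 500}
print(f"Estimated cost: ${estimate_cost(**usage):.4f}")
```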

Use Cases

Mathematical Problem Solving

response = client.chat.completions.create(
    model="o1",
    messages=[{
        "role": "user",
        "content": """Solve this physics problem with full explanation:
        
        A 2kg object is thrown upward with initial velocity of 20 m/s.
        Ignoring air resistance, calculate:
        1. Maximum height reached
        2. Time to reach maximum height
        3. Total time in air
        4. Velocity when it returns to starting point
        
        Show all work and formulas used."""
    }]
)

Code Review and Debugging

response = client.chat.completions.create(
    model="o1-mini",
    messages=[{
        "role": "user",
        "content": """Review this Python code and fix any bugs:
        
        def quicksort(arr):
            if len(arr) <= 1:
                return arr
            pivot = arr[0]
            left = [x for x in arr if x < pivot]
            right = [x for x in arr if x > pivot]
            return quicksort(left) + [pivot] + quicksort(right)
        
        What's wrong and how can it be optimized?"""
    }]
)

Strategic Analysis

response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": """Analyze this chess position and suggest the best move:
        
        Position (FEN): rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq e6 0 2
        
        Provide:
        1. Best move in algebraic notation
        2. Strategic reasoning
        3. Alternative moves and why they're inferior
        4. Evaluation of the position"""
    }]
)

Data Analysis

response = client.chat.completions.create(
    model="o1",
    messages=[{
        "role": "user",
        "content": """Analyze this dataset and provide insights:
        
        Sales data:
        Q1: $1.2M, Q2: $1.5M, Q3: $1.1M, Q4: $2.3M
        
        Marketing spend:
        Q1: $200K, Q2: $250K, Q3: $180K, Q4: $400K
        
        Calculate:
        1. ROI for each quarter
        2. Correlation between marketing and sales
        3. Forecast for Q1 next year
        4. Recommendations for marketing budget allocation"""
    }]
)

Best Practices

1. Be Explicit About Reasoning: ask the model to “think step by step” or “show your work”.
2. Provide Context: include all relevant information and constraints.
3. Structured Output: request structured responses (numbered lists, tables).
4. Verify Results: cross-check critical calculations and logic.
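These practices can be baked into a small prompt builder so every request follows them. This helper and its defaults are our own convenience sketch, not part of the gateway API:

```python
def build_reasoning_prompt(problem, context="", output_format="numbered list"):
    """Assemble a prompt applying the practices above: explicit
    step-by-step reasoning, full context, and a structured output request.

    A convenience sketch; the function name and defaults are assumptions,
    not a gateway feature.
    """
    parts = ["Solve this problem step by step, showing your work."]
    if context:
        parts.append(f"Context and constraints:\n{context}")
    parts.append(f"Problem:\n{problem}")
    parts.append(f"Format the final answer as a {output_format}.")
    return "\n\n".join(parts)


prompt = build_reasoning_prompt(
    "What is the average speed for a 60 mph / 40 mph round trip?",
    context="Equal distances each way; answer in mph.",
)
```

Pass the returned string as the `content` of the user message.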

Limitations

Reasoning models:
  • Cannot use system messages (user/assistant roles only)
  • Don’t support streaming
  • Don’t support function calling
  • Have higher latency (5-30 seconds typical)
  • Cost more per token than standard models
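Because system messages are rejected, instructions that would normally go in a system message have to ride along in the user prompt. One way to adapt an existing message list, sketched as a hypothetical helper:

```python
def fold_system_messages(messages):
    """Merge any system messages into the first user turn so the
    message list contains only user/assistant roles.

    A sketch of one workaround; not a gateway-provided function.
    """
    system_text = "\n".join(
        m["content"] for m in messages if m["role"] == "system"
    )
    rest = [m for m in messages if m["role"] != "system"]
    if system_text and rest and rest[0]["role"] == "user":
        # Prepend the system instructions to the first user message.
        rest[0] = {
            "role": "user",
            "content": f"{system_text}\n\n{rest[0]['content']}",
        }
    return rest


messages = fold_system_messages([
    {"role": "system", "content": "Answer concisely."},
    {"role": "user", "content": "Explain quicksort."},
])
```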

Model Selection

Choose the right reasoning model:
| Task | Recommended Model | Why |
|------|-------------------|-----|
| Advanced research | o3 | Highest accuracy, worth the cost |
| General math/coding | o1 | Balanced performance and cost |
| Quick calculations | o1-mini | Fast and affordable |
| Production APIs | o1-mini | Lower latency and cost |
| One-off analysis | o3 | Best results for important decisions |
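If you route requests programmatically, the recommendations above can be encoded as a simple lookup. The task labels here are our own invention; only the model names come from this page:

```python
# Hypothetical routing table based on the recommendations above;
# the category keys are assumptions, not a gateway API.
TASK_TO_MODEL = {
    "advanced-research": "o3",
    "general-math-coding": "o1",
    "quick-calculation": "o1-mini",
    "production-api": "o1-mini",
    "one-off-analysis": "o3",
}


def choose_reasoning_model(task, default="o1"):
    """Return the recommended model for a task category,
    falling back to a balanced default for unknown tasks."""
    return TASK_TO_MODEL.get(task, default)
```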

Cost Optimization

# Use O1-Mini for simpler reasoning tasks
response = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "Simple math problem"}]
)

# Use O1 for complex tasks
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Complex multi-step analysis"}]
)

# Enable automatic model selection
response = client.chat.completions.create(
    model="reasoning-auto",  # Gateway selects o1-mini, o1, or o3
    messages=[{"role": "user", "content": "Your problem"}],
    extra_body={
        "routing": {
            "strategy": "cost-optimized"
        }
    }
)

Monitoring Reasoning

Track reasoning performance:

response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Complex problem"}]
)

# Log reasoning metrics
print(f"Reasoning tokens: {response.usage.completion_tokens_details.reasoning_tokens}")
print(f"Reasoning time: {response.usage.completion_tokens_details.reasoning_time_ms}ms")
print(f"Finish reason: {response.choices[0].finish_reason}")

Next Steps

Function Calling

Use standard models with function calling for structured tasks

Custom Providers

Integrate your own reasoning models

Analytics

Track reasoning model performance and costs

Rate Limits

Understanding reasoning model rate limits