OpenAI: o4 Mini
openai/o4-mini
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute.
Recent activity on o4 Mini: tokens processed per day (chart)
Throughput (tokens/s)
Provider   Min      Max      Avg
Azure      15.65    138.49   65.33
OpenAI     36.37    100.14   48.77
First Token Latency (ms)
Provider   Min      Max      Avg
Azure      1945     223821   30199.85
OpenAI     1677     93499    7973.91
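For a reasoning model like o4-mini, time-to-first-token dominates short requests while throughput dominates long ones. A rough sketch of end-to-end time, using the average figures reported above (illustrative arithmetic only, not a guarantee of real-world latency):

```python
# Rough end-to-end estimate: first-token latency plus generation time
# at the average throughput (averages taken from the tables above).
def estimated_seconds(first_token_ms: float, tokens: int, tokens_per_s: float) -> float:
    return first_token_ms / 1000 + tokens / tokens_per_s

# OpenAI provider averages: 7973.91 ms first token, 48.77 tokens/s
openai_time = estimated_seconds(7973.91, 1000, 48.77)
# Azure provider averages: 30199.85 ms first token, 65.33 tokens/s
azure_time = estimated_seconds(30199.85, 1000, 65.33)

print(f"1000 output tokens: OpenAI ~{openai_time:.1f}s, Azure ~{azure_time:.1f}s")
```

Note that despite Azure's higher average throughput, its longer first-token latency makes it slower for this hypothetical 1000-token completion.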
Providers for o4 Mini
ZenMux routes requests to the best providers able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency: 1.75 s
Throughput: 14.2 tps
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
Input: $1.10 / M tokens
Output: $4.40 / M tokens
Cache read: $0.275 / M tokens
Cache write (5m): -
Cache write (1h): -
Cache write: -
Web search: -
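The per-million-token prices above can be turned into a simple cost estimate. A minimal sketch, assuming cached input tokens are billed at the cache-read rate in place of the normal input rate (verify against your provider's billing rules):

```python
# Per-million-token prices as listed above.
PRICE_PER_M = {"input": 1.10, "output": 4.40, "cache_read": 0.275}

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated USD cost; cached tokens billed at the cache-read rate (assumption)."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_PER_M["input"]
            + cached_tokens * PRICE_PER_M["cache_read"]
            + output_tokens * PRICE_PER_M["output"]) / 1_000_000

# Example: 100K input tokens (half served from cache) plus 10K output tokens.
print(f"${request_cost(100_000, 10_000, cached_tokens=50_000):.4f}")
```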
Model limitations
Context: 200.00K tokens
Max output: 100.00K tokens
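These two limits can be checked before sending a request. A minimal sketch, assuming the prompt and the completion share the 200K context window (the usual convention, but worth confirming for your provider):

```python
# Limits as listed above.
CONTEXT_LIMIT = 200_000   # total context window (tokens)
MAX_OUTPUT = 100_000      # maximum completion length (tokens)

def fits(prompt_tokens: int, requested_output: int) -> bool:
    # Assumption: prompt + completion must fit in the context window together.
    return (requested_output <= MAX_OUTPUT
            and prompt_tokens + requested_output <= CONTEXT_LIMIT)

print(fits(150_000, 40_000))   # 190K total: within limits
print(fits(150_000, 60_000))   # 210K total: exceeds the context window
```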
Supported Parameters
max_completion_tokens: supported
temperature: -
top_p: -
frequency_penalty: -
presence_penalty: -
seed: supported
logit_bias: -
logprobs: -
top_logprobs: -
response_format: supported
stop: -
tools: supported
tool_choice: supported
parallel_tool_calls: -
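In practice this means a request should use `max_completion_tokens` rather than sampling knobs like `temperature` or `top_p`. A sketch of a payload restricted to the parameters listed above as supported (the `json_object` shape for `response_format` follows the standard OpenAI convention):

```python
# Request payload using only parameters this model supports:
# max_completion_tokens, seed, response_format (plus tools / tool_choice).
payload = {
    "model": "openai/o4-mini",
    "messages": [
        {"role": "user", "content": "Reply with a JSON object containing a 'joke' field."}
    ],
    "max_completion_tokens": 2048,            # not max_tokens
    "seed": 42,                               # for more reproducible sampling
    "response_format": {"type": "json_object"},
    # Note: temperature, top_p, stop, logit_bias etc. are unsupported and omitted.
}
print(sorted(payload))
```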
Model Protocol Compatibility
openai: supported
anthropic: -
Data policy
Prompt training: false
Prompt logging: Zero retention
Moderation: Responsibility of developer
Status page: status page
Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
Input: $1.10 / M tokens
Output: $4.40 / M tokens
Cache read: $0.28 / M tokens
Cache write (5m): -
Cache write (1h): -
Cache write: -
Web search: -
Model limitations
Context: 200.00K tokens
Max output: 100.00K tokens
Supported Parameters
max_completion_tokens: supported
temperature: -
top_p: -
frequency_penalty: -
presence_penalty: -
seed: supported
logit_bias: -
logprobs: -
top_logprobs: -
response_format: supported
stop: -
tools: supported
tool_choice: supported
parallel_tool_calls: -
Model Protocol Compatibility
openai: supported
anthropic: -
Data policy
Prompt training: false
Prompt logging: 30 day retention
Moderation: Responsibility of developer
Status page: status page
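The two provider listings above differ mainly in cache-read price ($0.275 vs $0.28 / M tokens) and logging policy (zero retention vs 30-day retention), which matters when routing privacy-sensitive traffic. A sketch of filtering provider metadata by retention policy; the provider names are placeholders, since this page does not attribute the two listings:

```python
# Provider metadata transcribed from the listings above; names are hypothetical.
providers = [
    {"name": "provider-a", "cache_read_per_m": 0.275, "logging": "zero retention"},
    {"name": "provider-b", "cache_read_per_m": 0.28,  "logging": "30 day retention"},
]

# Keep only providers that do not retain prompts.
zero_retention = [p["name"] for p in providers if p["logging"] == "zero retention"]
print(zero_retention)
```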
Sample code and API for o4 Mini
ZenMux normalizes requests and responses across providers for you.
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="openai/o4-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          # Attach the image the question refers to
          # (placeholder URL; replace with your own image)
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)
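Since `tools` and `tool_choice` are listed as supported, the same request can carry tool definitions in the OpenAI function-calling format. A sketch of the keyword arguments such a call would take; the `get_weather` tool is hypothetical, for illustration only:

```python
# A tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",   # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Keyword arguments for client.chat.completions.create(...):
request_kwargs = {
    "model": "openai/o4-mini",
    "tools": tools,
    "tool_choice": "auto",   # let the model decide whether to call the tool
}
print(request_kwargs["tool_choice"])
```

Note that `parallel_tool_calls` is marked unsupported above, so the model should be expected to issue at most one tool call per turn.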