OpenAI: o4 Mini
openai/o4-mini
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains. Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute.
Recent activity on o4 Mini: tokens processed per day (chart)
Throughput (tokens/s)
Provider   Min      Max      Avg
Azure      15.65    138.49   65.33
OpenAI     36.37    100.14   48.77
First Token Latency (ms)
Provider   Min      Max      Avg
Azure      1945     223821   30199.85
OpenAI     1677     93499    7973.91
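For a reasoning model like o4-mini, time-to-first-token dominates short requests while throughput dominates long ones. A rough sketch of end-to-end time, using the average figures reported above (illustrative arithmetic only, not a guarantee of real-world latency):

```python
# Rough end-to-end estimate: first-token latency plus generation time
# at the average throughput (averages taken from the tables above).
def estimated_seconds(first_token_ms: float, tokens: int, tokens_per_s: float) -> float:
    return first_token_ms / 1000 + tokens / tokens_per_s

# OpenAI provider averages: 7973.91 ms first token, 48.77 tokens/s
openai_time = estimated_seconds(7973.91, 1000, 48.77)
# Azure provider averages: 30199.85 ms first token, 65.33 tokens/s
azure_time = estimated_seconds(30199.85, 1000, 65.33)

print(f"1000 output tokens: OpenAI ~{openai_time:.1f}s, Azure ~{azure_time:.1f}s")
```

Note that despite Azure's higher average throughput, its longer first-token latency makes it slower for this hypothetical 1000-token completion.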
Providers for o4 Mini
ZenMux routes requests to the best providers able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency: 1.75 s
Throughput: 14.2 tps
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
Input: $1.10 / M tokens
Output: $4.40 / M tokens
Cache read: $0.275 / M tokens
Cache write (5m): -
Cache write (1h): -
Cache write: -
Web search: -
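The per-million-token prices above can be turned into a simple cost estimate. A minimal sketch, assuming cached input tokens are billed at the cache-read rate in place of the normal input rate (verify against your provider's billing rules):

```python
# Per-million-token prices as listed above.
PRICE_PER_M = {"input": 1.10, "output": 4.40, "cache_read": 0.275}

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated USD cost; cached tokens billed at the cache-read rate (assumption)."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICE_PER_M["input"]
            + cached_tokens * PRICE_PER_M["cache_read"]
            + output_tokens * PRICE_PER_M["output"]) / 1_000_000

# Example: 100K input tokens (half served from cache) plus 10K output tokens.
print(f"${request_cost(100_000, 10_000, cached_tokens=50_000):.4f}")
```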
Model limitations
Context: 200.00K tokens
Max output: 100.00K tokens
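These two limits can be checked before sending a request. A minimal sketch, assuming the prompt and the completion share the 200K context window (the usual convention, but worth confirming for your provider):

```python
# Limits as listed above.
CONTEXT_LIMIT = 200_000   # total context window (tokens)
MAX_OUTPUT = 100_000      # maximum completion length (tokens)

def fits(prompt_tokens: int, requested_output: int) -> bool:
    # Assumption: prompt + completion must fit in the context window together.
    return (requested_output <= MAX_OUTPUT
            and prompt_tokens + requested_output <= CONTEXT_LIMIT)

print(fits(150_000, 40_000))   # 190K total: within limits
print(fits(150_000, 60_000))   # 210K total: exceeds the context window
```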
Supported Parameters
max_completion_tokens: supported
temperature: -
top_p: -
frequency_penalty: -
presence_penalty: -
seed: supported
logit_bias: -
logprobs: -
top_logprobs: -
response_format: supported
stop: -
tools: supported
tool_choice: supported
parallel_tool_calls: -
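In practice this means a request should use `max_completion_tokens` rather than sampling knobs like `temperature` or `top_p`. A sketch of a payload restricted to the parameters listed above as supported (the `json_object` shape for `response_format` follows the standard OpenAI convention):

```python
# Request payload using only parameters this model supports:
# max_completion_tokens, seed, response_format (plus tools / tool_choice).
payload = {
    "model": "openai/o4-mini",
    "messages": [
        {"role": "user", "content": "Reply with a JSON object containing a 'joke' field."}
    ],
    "max_completion_tokens": 2048,            # not max_tokens
    "seed": 42,                               # for more reproducible sampling
    "response_format": {"type": "json_object"},
    # Note: temperature, top_p, stop, logit_bias etc. are unsupported and omitted.
}
print(sorted(payload))
```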
Model Protocol Compatibility
openai: supported
anthropic: -
Data policy
Prompt training: false
Prompt logging: Zero retention
Moderation: Responsibility of developer
Status page: status page
Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
Input: $1.10 / M tokens
Output: $4.40 / M tokens
Cache read: $0.28 / M tokens
Cache write (5m): -
Cache write (1h): -
Cache write: -
Web search: -
Model limitations
Context: 200.00K tokens
Max output: 100.00K tokens
Supported Parameters
max_completion_tokens: supported
temperature: -
top_p: -
frequency_penalty: -
presence_penalty: -
seed: supported
logit_bias: -
logprobs: -
top_logprobs: -
response_format: supported
stop: -
tools: supported
tool_choice: supported
parallel_tool_calls: -
Model Protocol Compatibility
openai: supported
anthropic: -
Data policy
Prompt training: false
Prompt logging: 30 day retention
Moderation: Responsibility of developer
Status page: status page
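The two provider listings above differ mainly in cache-read price ($0.275 vs $0.28 / M tokens) and logging policy (zero retention vs 30-day retention), which matters when routing privacy-sensitive traffic. A sketch of filtering provider metadata by retention policy; the provider names are placeholders, since this page does not attribute the two listings:

```python
# Provider metadata transcribed from the listings above; names are hypothetical.
providers = [
    {"name": "provider-a", "cache_read_per_m": 0.275, "logging": "zero retention"},
    {"name": "provider-b", "cache_read_per_m": 0.28,  "logging": "30 day retention"},
]

# Keep only providers that do not retain prompts.
zero_retention = [p["name"] for p in providers if p["logging"] == "zero retention"]
print(zero_retention)
```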
Sample code and API for o4 Mini
ZenMux normalizes requests and responses across providers for you.
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="openai/o4-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          # Attach the image the question refers to
          # (placeholder URL; replace with your own image)
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)
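Since `tools` and `tool_choice` are listed as supported, the same request can carry tool definitions in the OpenAI function-calling format. A sketch of the keyword arguments such a call would take; the `get_weather` tool is hypothetical, for illustration only:

```python
# A tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",   # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Keyword arguments for client.chat.completions.create(...):
request_kwargs = {
    "model": "openai/o4-mini",
    "tools": tools,
    "tool_choice": "auto",   # let the model decide whether to call the tool
}
print(request_kwargs["tool_choice"])
```

Note that `parallel_tool_calls` is marked unsupported above, so the model should be expected to issue at most one tool call per turn.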