OpenAI: GPT-4.1
openai/gpt-4.1
GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
By openai
Recent activity on GPT-4.1 (chart: tokens processed per day)
Throughput (tokens/s)
Provider   Min     Max     Avg
OpenAI     6.46    90.27   26.16
Azure      5.75    70.44   21.97
First Token Latency (ms)
Provider   Min     Max      Avg
OpenAI     685     1139     923.47
Azure      1165    28649    3552.50
Providers for GPT-4.1
ZenMux routes each request to the best providers able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
  Input: $2 / M tokens
  Output: $8 / M tokens
  Cache read: $0.5 / M tokens
  Cache write (5m): -
  Cache write (1h): -
  Cache write: -
  Web search: $0.025 / request
Model limitations
  Context: 1.05M
  Max output: 32.77K
Supported Parameters: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls
Model Protocol Compatibility: openai, anthropic
Data policy
  Prompt training: false
  Prompt logging: Zero retention
  Moderation: Responsibility of developer
Status page: status page
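At the rates listed above ($2/M input, $8/M output, $0.5/M cache read), per-request cost can be estimated with a small helper. This is an illustrative sketch (the `estimate_cost` function is not part of any API; actual billing is determined by ZenMux and the provider):

```python
# Estimate request cost from the listed per-million-token rates.
INPUT_PER_M = 2.00        # $ per 1M input tokens
OUTPUT_PER_M = 8.00       # $ per 1M output tokens
CACHE_READ_PER_M = 0.50   # $ per 1M cached input tokens

def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated dollar cost of one request."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. 100k input tokens (half of them cache hits) plus 2k output tokens:
print(round(estimate_cost(100_000, 2_000, cached_tokens=50_000), 4))  # → 0.141
```

Cache reads cost a quarter of fresh input tokens, so repeated large-context prompts (the model's 1M-token window) benefit substantially from prompt caching.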
Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
  Input: $2 / M tokens
  Output: $8 / M tokens
  Cache read: $0.5 / M tokens
  Cache write (5m): -
  Cache write (1h): -
  Cache write: -
  Web search: -
Model limitations
  Context: 1.05M
  Max output: 32.77K
Supported Parameters: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls
Model Protocol Compatibility: openai, anthropic
Data policy
  Prompt training: false
  Prompt logging: 30 day retention
  Moderation: Responsibility of developer
Status page: status page
Latency: 1.19 s
Throughput: 33.23 tps
Uptime: 97.96%
Recent uptime: Oct 10, 2025, 3 PM: 97.96%
Price
  Input: $2 / M tokens
  Output: $8 / M tokens
  Cache read: $0.5 / M tokens
  Cache write (5m): -
  Cache write (1h): -
  Cache write: -
  Web search: $0.025 / request
Model limitations
  Context: 1.05M
  Max output: 32.77K
Supported Parameters: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls
Model Protocol Compatibility: openai, anthropic
Data policy
  Prompt training: false
  Prompt logging: Zero retention
  Moderation: Responsibility of developer
Status page: status page
Sample code and API for GPT-4.1
ZenMux normalizes requests and responses across providers for you.
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="openai/gpt-4.1",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          # Placeholder: replace with the URL of your own image
          "type": "image_url",
          "image_url": {"url": "https://example.com/image.png"}
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)
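Since ZenMux exposes an OpenAI-compatible API, the SDK call above boils down to one HTTP POST. A sketch of the request the SDK constructs, assuming the standard `/chat/completions` path convention (the API key is a placeholder):

```python
import json

BASE_URL = "https://zenmux.ai/api/v1"
API_KEY = "<ZenMux_API_KEY>"  # placeholder: substitute your own key

# Endpoint and headers the OpenAI SDK would send on your behalf.
url = f"{BASE_URL}/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# JSON body mirroring the chat.completions.create(...) call above.
payload = {
    "model": "openai/gpt-4.1",
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "What is in this image?"}],
        }
    ],
}
body = json.dumps(payload)
print(url)  # → https://zenmux.ai/api/v1/chat/completions
```

Posting `body` to `url` with any HTTP client (e.g. cURL with `-d @-`) yields the same normalized response the SDK returns.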