OpenAI: GPT-4.1
openai/gpt-4.1
GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
By openai
Recent activity on GPT-4.1 (chart: tokens processed per day)
Throughput (tokens/s)
Provider   Min     Max     Avg
OpenAI     6.46    90.27   26.16
Azure      5.75    70.44   21.97
First Token Latency (ms)
Provider   Min     Max      Avg
OpenAI     685     1139     923.47
Azure      1165    28649    3552.50
Providers for GPT-4.1
ZenMux routes each request to the best providers able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
  Input: $2 / M tokens
  Output: $8 / M tokens
  Cache read: $0.5 / M tokens
  Cache write (5m): -
  Cache write (1h): -
  Cache write: -
  Web search: $0.025 / request
Model limitations
  Context: 1.05M
  Max output: 32.77K
Supported Parameters: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls
Model Protocol Compatibility: openai, anthropic
Data policy
  Prompt training: false
  Prompt logging: Zero retention
  Moderation: Responsibility of developer
Status page: status page
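At the rates listed above ($2/M input, $8/M output, $0.5/M cache read), per-request cost can be estimated with a small helper. This is an illustrative sketch (the `estimate_cost` function is not part of any API; actual billing is determined by ZenMux and the provider):

```python
# Estimate request cost from the listed per-million-token rates.
INPUT_PER_M = 2.00        # $ per 1M input tokens
OUTPUT_PER_M = 8.00       # $ per 1M output tokens
CACHE_READ_PER_M = 0.50   # $ per 1M cached input tokens

def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated dollar cost of one request."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. 100k input tokens (half of them cache hits) plus 2k output tokens:
print(round(estimate_cost(100_000, 2_000, cached_tokens=50_000), 4))  # → 0.141
```

Cache reads cost a quarter of fresh input tokens, so repeated large-context prompts (the model's 1M-token window) benefit substantially from prompt caching.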
Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime: Oct 10, 2025, 3 PM: 100.00%
Price
  Input: $2 / M tokens
  Output: $8 / M tokens
  Cache read: $0.5 / M tokens
  Cache write (5m): -
  Cache write (1h): -
  Cache write: -
  Web search: -
Model limitations
  Context: 1.05M
  Max output: 32.77K
Supported Parameters: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls
Model Protocol Compatibility: openai, anthropic
Data policy
  Prompt training: false
  Prompt logging: 30 day retention
  Moderation: Responsibility of developer
Status page: status page
Latency: 1.19 s
Throughput: 33.23 tps
Uptime: 97.96%
Recent uptime: Oct 10, 2025, 3 PM: 97.96%
Price
  Input: $2 / M tokens
  Output: $8 / M tokens
  Cache read: $0.5 / M tokens
  Cache write (5m): -
  Cache write (1h): -
  Cache write: -
  Web search: $0.025 / request
Model limitations
  Context: 1.05M
  Max output: 32.77K
Supported Parameters: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls
Model Protocol Compatibility: openai, anthropic
Data policy
  Prompt training: false
  Prompt logging: Zero retention
  Moderation: Responsibility of developer
Status page: status page
Sample code and API for GPT-4.1
ZenMux normalizes requests and responses across providers for you.
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="openai/gpt-4.1",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          # Placeholder: replace with the URL of your own image
          "type": "image_url",
          "image_url": {"url": "https://example.com/image.png"}
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)
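Since ZenMux exposes an OpenAI-compatible API, the SDK call above boils down to one HTTP POST. A sketch of the request the SDK constructs, assuming the standard `/chat/completions` path convention (the API key is a placeholder):

```python
import json

BASE_URL = "https://zenmux.ai/api/v1"
API_KEY = "<ZenMux_API_KEY>"  # placeholder: substitute your own key

# Endpoint and headers the OpenAI SDK would send on your behalf.
url = f"{BASE_URL}/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# JSON body mirroring the chat.completions.create(...) call above.
payload = {
    "model": "openai/gpt-4.1",
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "What is in this image?"}],
        }
    ],
}
body = json.dumps(payload)
print(url)  # → https://zenmux.ai/api/v1/chat/completions
```

Posting `body` to `url` with any HTTP client (e.g. cURL with `-d @-`) yields the same normalized response the SDK returns.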