Qwen: Qwen3-Max

qwen/qwen3-max

Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.

By qwen

Recent activity on Qwen3-Max

Throughput (tokens/s)

Providers	Min (tokens/s)	Max (tokens/s)	Avg (tokens/s)
Alibaba Cloud	6.97	31.09	12.74

First Token Latency (ms)

Providers	Min (ms)	Max (ms)	Avg (ms)
Alibaba Cloud	631	998	854.53
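
The two tables above can be combined into a rough response-time estimate: time to first token plus output length divided by throughput. The following is a minimal sketch, not a ZenMux API. The helper name `estimate_response_seconds` is hypothetical, and it assumes tokens arrive at a steady rate once streaming starts, which real traffic may not do.

```python
def estimate_response_seconds(output_tokens, first_token_ms, tokens_per_second):
    """Rough wall-clock estimate: first-token latency plus generation time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Assumption: tokens stream at a constant rate after the first token.
    return first_token_ms / 1000 + output_tokens / tokens_per_second


# Average values for Alibaba Cloud, copied from the tables above.
AVG_FIRST_TOKEN_MS = 854.53
AVG_TOKENS_PER_SECOND = 12.74

typical = estimate_response_seconds(500, AVG_FIRST_TOKEN_MS, AVG_TOKENS_PER_SECOND)
print(f"500 output tokens: about {typical:.1f} s")
```

Swapping in the Min and Max columns gives a pessimistic and an optimistic bound for the same output length.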

Providers for Qwen3-Max

ZenMux routes requests to the best providers that are able to handle your prompt size and parameters, with fallbacks to maximize uptime.

Provider	Latency	Throughput (tps)	Uptime
Alibaba Cloud	0.94	24.23	100.00

Recent uptime

Oct 10, 2025 - 3 PM: 100.00%

Price

Tiered pricing

Tier: 0 <= Input < 32k

Item	Price ($ / M tokens)
Input	1.2
Output	6
Cache read	0.24
Cache write 5m	1.5

Cache write 1h

Cache write

Web search
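
As a worked example of the per-million-token prices listed above, the sketch below estimates the cost of a single request. The function name `estimate_cost` is hypothetical. It assumes "32k" means 32,000 tokens and that cached tokens are billed separately from regular input tokens, neither of which this page confirms. Only the one listed tier is covered, so larger prompts are rejected rather than priced by guesswork.

```python
# USD per million tokens, copied from the listed tier (0 <= input < 32k).
PRICES_PER_M = {
    "input": 1.2,
    "output": 6.0,
    "cache_read": 0.24,
    "cache_write_5m": 1.5,
}
TIER_INPUT_LIMIT = 32_000  # assumption: "32k" means 32,000 tokens


def estimate_cost(input_tokens, output_tokens,
                  cache_read_tokens=0, cache_write_5m_tokens=0):
    """Return the estimated request cost in USD for the listed tier."""
    if not 0 <= input_tokens < TIER_INPUT_LIMIT:
        raise ValueError("input size is outside the only tier listed here")
    # Assumption: cache reads and writes are billed on top of regular input.
    total = (
        input_tokens * PRICES_PER_M["input"]
        + output_tokens * PRICES_PER_M["output"]
        + cache_read_tokens * PRICES_PER_M["cache_read"]
        + cache_write_5m_tokens * PRICES_PER_M["cache_write_5m"]
    )
    return total / 1_000_000


cost = estimate_cost(input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.4f}")
```

Because output tokens cost five times as much as input tokens in this tier, capping `max_completion_tokens` is usually the more effective way to bound spend.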

Model limitation

Context

256.00K

Max output

32.00K

Supported Parameters

max_completion_tokens

temperature

top_p

frequency_penalty

presence_penalty

seed

logit_bias

logprobs

top_logprobs

response_format

stop

tools

tool_choice

parallel_tool_calls
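
The parameter list above can be used to catch unsupported options before a request is sent. The following is a minimal sketch: `build_request` is a hypothetical helper, not part of ZenMux or the OpenAI SDK, and it only checks parameter names, not their values.

```python
# Parameter names copied from the "Supported Parameters" list above.
SUPPORTED_PARAMETERS = {
    "max_completion_tokens", "temperature", "top_p", "frequency_penalty",
    "presence_penalty", "seed", "logit_bias", "logprobs", "top_logprobs",
    "response_format", "stop", "tools", "tool_choice", "parallel_tool_calls",
}


def build_request(prompt, **params):
    """Build keyword arguments for a chat completion, rejecting unlisted parameters."""
    unsupported = sorted(set(params) - SUPPORTED_PARAMETERS)
    if unsupported:
        raise ValueError("not listed as supported: " + ", ".join(unsupported))
    return {
        "model": "qwen/qwen3-max",
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


kwargs = build_request(
    "Summarize the Qwen3 release notes in one sentence.",
    temperature=0.2,
    max_completion_tokens=256,
    seed=42,
)
print(sorted(kwargs))
```

The resulting dictionary can be passed to an OpenAI-compatible client as `client.chat.completions.create(**kwargs)`, assuming a client configured as in the sample code on this page.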

Model Protocol Compatibility

openai

anthropic

Data policy

Prompt training

false

Prompt Logging

Zero retention

Moderation

Responsibility of developer


Sample code and API for Qwen3-Max

ZenMux normalizes requests and responses across providers for you.

python
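from openai import OpenAI

# Point the OpenAI SDK at the ZenMux OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<ZenMux_API_KEY>",  # replace with your own key
)

# Send a single user message made of one text content part.
completion = client.chat.completions.create(
    model="qwen/qwen3-max",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                }
            ],
        }
    ],
)

# Print the text of the first returned choice.
print(completion.choices[0].message.content)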