Z.AI: GLM 4.6
z-ai/glm-4.6
GLM-4.6 is the latest flagship model from Z.AI (Zhipu AI), with 355 billion total parameters and 32 billion active parameters. It surpasses GLM-4.5 across all core capabilities:
Advanced coding: On both public benchmarks and real-world programming tasks, GLM-4.6's coding ability matches Claude Sonnet 4, making it the strongest known domestic coding model.
Context length: The context window has grown from 128K to 200K tokens, allowing longer code and agent tasks.
Reasoning: Improved reasoning capability, with support for calling tools during the reasoning process.
Search: Stronger tool calling and search agents, performing better within agent frameworks.
Writing: Style, readability, and role-playing scenarios align more closely with human preferences.
Multilingual translation: Further strengthened handling of cross-lingual tasks.
By z-ai
Recent activity on GLM 4.6
Tokens processed per day
Throughput (tokens/s)
Providers    Min (tokens/s)    Max (tokens/s)    Avg (tokens/s)
Z.AI         47.12             87.15             62.31
First Token Latency (ms)
Providers    Min (ms)    Max (ms)    Avg (ms)
Z.AI         517         903         758.70
Providers for GLM 4.6
ZenMux routes your request to the best providers able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency: 0.69 s
Throughput: 36.28 tps
Uptime: 100.00%
Recent uptime
Oct 10, 2025, 3 PM: 100.00%
Price
Tiered pricing (0 <= Input < 32K)
Input: $0.35 / M tokens
Output: $1.54 / M tokens
Cache read: $0.07 / M tokens
Cache write (5m): -
Cache write (1h): -
Cache write: -
Web search: -
Model limitations
Context: 200.00K
Max output: 128.00K
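These limits can be checked before sending a request. A small sketch, assuming the prompt plus the requested completion must fit within the 200K context window and the completion alone within the 128K output cap (actual enforcement may differ by provider):

```python
# Limits from the table above: 200K-token context window, 128K max output.
CONTEXT_LIMIT = 200_000
MAX_OUTPUT = 128_000

def fits(prompt_tokens: int, requested_output: int) -> bool:
    # Assumption: prompt + completion share the context window,
    # and the completion is separately capped at MAX_OUTPUT.
    return (requested_output <= MAX_OUTPUT
            and prompt_tokens + requested_output <= CONTEXT_LIMIT)

print(fits(150_000, 40_000))  # True
print(fits(150_000, 60_000))  # False: exceeds the context window
```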
Supported Parameters
Supported: max_completion_tokens, temperature, top_p, stop, tools, tool_choice
Not supported: frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, parallel_tool_calls
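A request should include only the parameters the model supports. A sketch of such a payload (the prompt and values are illustrative; the commented call assumes an OpenAI-compatible `client` as in the sample code section):

```python
# Illustrative request payload limited to this model's supported
# parameters; unsupported ones (seed, logprobs, response_format,
# frequency_penalty, ...) are deliberately absent.
request = {
    "model": "z-ai/glm-4.6",
    "messages": [
        {"role": "user", "content": "Summarize the GLM-4.6 release notes."}
    ],
    "temperature": 0.7,            # supported
    "top_p": 0.9,                  # supported
    "max_completion_tokens": 512,  # supported
    "stop": ["###"],               # supported
}

# With an OpenAI-compatible client (see the sample code section):
# completion = client.chat.completions.create(**request)
```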
Model Protocol Compatibility
openai: supported
anthropic: not supported
Data policy
Prompt training: false
Prompt logging: Zero retention
Moderation: Responsibility of developer
Sample code and API for GLM 4.6
ZenMux normalizes requests and responses across providers for you.
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="z-ai/glm-4.6",
  messages=[
    {
      "role": "user",
      "content": "Explain what a 200K context window is useful for."
    }
  ]
)
print(completion.choices[0].message.content)
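Since the model supports `tools` and `tool_choice`, a function-calling request can be built the same way. A sketch in the OpenAI tools format, assuming a hypothetical `get_weather` function (not part of any real API) and reusing the `client` from the sample above:

```python
# Hypothetical tool definition in the OpenAI function-calling format;
# `get_weather` is illustrative only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

# Reusing the `client` from the sample above:
# completion = client.chat.completions.create(
#     model="z-ai/glm-4.6",
#     messages=[{"role": "user", "content": "What's the weather in Paris?"}],
#     tools=tools,
#     tool_choice="auto",
# )
```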