Z.AI: GLM 4.6
z-ai/glm-4.6
GLM-4.6 is the latest flagship model from Z.AI (Zhipu AI), with 355 billion total parameters and 32 billion active parameters. It surpasses GLM-4.5 across all core capabilities:
Advanced coding: On both public benchmarks and real-world programming tasks, GLM-4.6's coding ability matches Claude Sonnet 4, making it the strongest known domestic coding model.
Context length: The context window has grown from 128K to 200K tokens, allowing longer code and agent tasks.
Reasoning: Improved reasoning capability, with support for calling tools during the reasoning process.
Search: Stronger tool calling and search agents, performing better within agent frameworks.
Writing: Style, readability, and role-playing scenarios align more closely with human preferences.
Multilingual translation: Further strengthened handling of cross-lingual tasks.
By z-ai
Recent activity on GLM 4.6
Tokens processed per day
Throughput (tokens/s)
Providers    Min (tokens/s)    Max (tokens/s)    Avg (tokens/s)
Z.AI         47.12             87.15             62.31
First Token Latency (ms)
Providers    Min (ms)    Max (ms)    Avg (ms)
Z.AI         517         903         758.70
Providers for GLM 4.6
ZenMux routes your request to the best providers able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency: 0.69 s
Throughput: 36.28 tps
Uptime: 100.00%
Recent uptime
Oct 10, 2025, 3 PM: 100.00%
Price
Tiered pricing (0 <= Input < 32K)
Input: $0.35 / M tokens
Output: $1.54 / M tokens
Cache read: $0.07 / M tokens
Cache write (5m): -
Cache write (1h): -
Cache write: -
Web search: -
Model limitations
Context: 200.00K
Max output: 128.00K
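These limits can be checked before sending a request. A small sketch, assuming the prompt plus the requested completion must fit within the 200K context window and the completion alone within the 128K output cap (actual enforcement may differ by provider):

```python
# Limits from the table above: 200K-token context window, 128K max output.
CONTEXT_LIMIT = 200_000
MAX_OUTPUT = 128_000

def fits(prompt_tokens: int, requested_output: int) -> bool:
    # Assumption: prompt + completion share the context window,
    # and the completion is separately capped at MAX_OUTPUT.
    return (requested_output <= MAX_OUTPUT
            and prompt_tokens + requested_output <= CONTEXT_LIMIT)

print(fits(150_000, 40_000))  # True
print(fits(150_000, 60_000))  # False: exceeds the context window
```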
Supported Parameters
Supported: max_completion_tokens, temperature, top_p, stop, tools, tool_choice
Not supported: frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, parallel_tool_calls
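A request should include only the parameters the model supports. A sketch of such a payload (the prompt and values are illustrative; the commented call assumes an OpenAI-compatible `client` as in the sample code section):

```python
# Illustrative request payload limited to this model's supported
# parameters; unsupported ones (seed, logprobs, response_format,
# frequency_penalty, ...) are deliberately absent.
request = {
    "model": "z-ai/glm-4.6",
    "messages": [
        {"role": "user", "content": "Summarize the GLM-4.6 release notes."}
    ],
    "temperature": 0.7,            # supported
    "top_p": 0.9,                  # supported
    "max_completion_tokens": 512,  # supported
    "stop": ["###"],               # supported
}

# With an OpenAI-compatible client (see the sample code section):
# completion = client.chat.completions.create(**request)
```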
Model Protocol Compatibility
openai: supported
anthropic: not supported
Data policy
Prompt training: false
Prompt logging: Zero retention
Moderation: Responsibility of developer
Sample code and API for GLM 4.6
ZenMux normalizes requests and responses across providers for you.
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="z-ai/glm-4.6",
  messages=[
    {
      "role": "user",
      "content": "Explain what a 200K context window is useful for."
    }
  ]
)
print(completion.choices[0].message.content)
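Since the model supports `tools` and `tool_choice`, a function-calling request can be built the same way. A sketch in the OpenAI tools format, assuming a hypothetical `get_weather` function (not part of any real API) and reusing the `client` from the sample above:

```python
# Hypothetical tool definition in the OpenAI function-calling format;
# `get_weather` is illustrative only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

# Reusing the `client` from the sample above:
# completion = client.chat.completions.create(
#     model="z-ai/glm-4.6",
#     messages=[{"role": "user", "content": "What's the weather in Paris?"}],
#     tools=tools,
#     tool_choice="auto",
# )
```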