MoonshotAI: Kimi K2 0711
moonshotai/kimi-k2-0711
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
BymoonshotaiInput typeOutput type
Recent activity on Kimi K2 0711
Tokens processed per day
Thoughput
(tokens/s)
ProvidersMin (tokens/s)Max (tokens/s)Avg (tokens/s)
Volcengine5.6834.479.07
Theta47.5462.4356.54
First Token Latency
(ms)
ProvidersMin (ms)Max (ms)Avg (ms)
Volcengine5281061853.24
Theta10745735315156.75
Providers for Kimi K2 0711
ZenMux Provider to the best providers that are able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Latency
0.66
s
Throughput
8.01
tps
Uptime
100.00
%
Recent uptime
Oct 10,2025 - 3 PM100.00%
Price
Input
$ 0.56
/ M tokens
Output
$ 2.23
/ M tokens
Cache read
$ 0.2
/ M tokens
Cache write 5m
-
Cache write 1h
$ 0.003
/ M tokens
Cache write
-
Web search
-
Model limitation
Context
128.00K
Max output
32.00K
Supported Parameters
max_completion_tokens
temperature
top_p
frequency_penalty
presence_penalty
seed
-
logit_bias
-
logprobs
-
top_logprobs
-
response_format
-
stop
tools
tool_choice
-
parallel_tool_calls
Model Protocol Compatibility
openai
anthropic
-
Data policy
Prompt training
false
Prompt Logging
Zero retention
Moderation
Responsibility of developer
Latency
-
Throughput
-
Uptime
100.00
%
Recent uptime
Oct 10,2025 - 3 PM100.00%
Price
Input
$ 0.56
/ M tokens
Output
$ 2.23
/ M tokens
Cache read
-
Cache write 5m
-
Cache write 1h
-
Cache write
-
Web search
-
Model limitation
Context
128.00K
Max output
32.00K
Supported Parameters
max_completion_tokens
temperature
top_p
frequency_penalty
presence_penalty
seed
logit_bias
-
logprobs
top_logprobs
-
response_format
stop
tools
tool_choice
parallel_tool_calls
-
Model Protocol Compatibility
openai
anthropic
-
Data policy
Prompt training
false
Prompt Logging
Zero retention
Moderation
Responsibility of developer
Sample code and API for Kimi K2 0711
ZenMux normalizes requests and responses across providers for you.
OpenAI-PythonPythonTypeScriptOpenAI-TypeScriptcURL
python
from openai import OpenAI

client = OpenAI(
  base_url="https://zenmux.ai/api/v1",
  api_key="<ZenMux_API_KEY>",
)

completion = client.chat.completions.create(
  model="moonshotai/kimi-k2-0711",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)