
MoonshotAI: Kimi K2 0711

moonshotai/kimi-k2-0711

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

By moonshotai

Recent activity on Kimi K2 0711

Tokens processed per day

Throughput (tokens/s)

Providers	Min (tokens/s)	Max (tokens/s)	Avg (tokens/s)
Volcengine	5.68	34.47	9.07
Theta	47.54	62.43	56.54

First Token Latency (ms)

Providers	Min (ms)	Max (ms)	Avg (ms)
Volcengine	528	1061	853.24
Theta	1074	57353	15156.75
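
The throughput and first-token latency figures can be combined into a rough end-to-end estimate: time to first token plus output length divided by throughput. The sketch below uses the average values from the two tables; the function name and the simple additive model are assumptions for illustration, and real requests will vary with load and prompt size.

```python
# Rough response-time estimate from the average figures in the tables above.
# Assumption: total time = first-token latency + output tokens / throughput.

AVG_STATS = {
    # provider: (avg first-token latency in ms, avg throughput in tokens/s)
    "Volcengine": (853.24, 9.07),
    "Theta": (15156.75, 56.54),
}


def estimate_seconds(provider: str, output_tokens: int) -> float:
    """Estimate the seconds needed to stream `output_tokens` from `provider`."""
    latency_ms, tokens_per_s = AVG_STATS[provider]
    return latency_ms / 1000 + output_tokens / tokens_per_s


for name in AVG_STATS:
    print(f"{name}: ~{estimate_seconds(name, 1000):.1f} s for 1,000 output tokens")
```

Under this model, Volcengine's lower first-token latency wins for short replies, while Theta's higher throughput wins once the output grows past roughly 155 tokens.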

Providers for Kimi K2 0711

ZenMux routes requests to the best providers that are able to handle your prompt size and parameters, with fallbacks to maximize uptime.
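
ZenMux performs this routing on its side, and its actual selection logic is not described on this page. Purely as an illustration of the general idea (filter providers by prompt size and parameters, then fall back in order on failure), here is a minimal sketch. The `Provider` class, `eligible`, and `call_with_fallback` are hypothetical names, not part of any ZenMux API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record, used only for this illustration."""
    name: str
    context_limit: int           # largest prompt (in tokens) the provider accepts
    supported_params: frozenset  # request parameters the provider understands


def eligible(providers: Iterable[Provider], prompt_tokens: int,
             params: Iterable[str]) -> List[Provider]:
    """Keep providers that can handle the prompt size and every requested parameter."""
    wanted = set(params)
    return [
        p for p in providers
        if prompt_tokens <= p.context_limit and wanted <= p.supported_params
    ]


def call_with_fallback(candidates: Iterable[Provider],
                       send: Callable[[Provider], str]) -> Tuple[str, str]:
    """Try each candidate in order; return (provider name, response) on first success."""
    errors = []
    for provider in candidates:
        try:
            return provider.name, send(provider)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Here `send` stands in for whatever function actually issues the request to a given provider.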

Volcengine

Latency: 0.66
Throughput: 8.01 tps
Uptime: 100.00
Recent uptime: Oct 10, 2025 - 3 PM, 100.00%

Price
Input: $0.56 / M tokens
Output: $2.23 / M tokens
Cache read: $0.2 / M tokens
Cache write 5m: not listed
Cache write 1h: $0.003 / M tokens
Cache write: not listed
Web search: not listed

Model limitation
Context: 128.00K
Max output: 32.00K

Supported Parameters
max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls

Model Protocol Compatibility
openai, anthropic

Data policy
Prompt training: false
Prompt logging: Zero retention
Moderation: Responsibility of developer
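
The per-million-token prices above translate into a per-request cost with simple arithmetic. The sketch below uses the Volcengine input, output, and cache-read prices from this listing; the function name is hypothetical, and the treatment of cached tokens (billed at the cache-read rate instead of the input rate) is an assumption rather than documented billing behavior.

```python
# Per-request cost estimate from the listed Volcengine prices (USD per million tokens).
INPUT_PRICE = 0.56
OUTPUT_PRICE = 2.23
CACHE_READ_PRICE = 0.2


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated cost in USD for one request.

    Assumption: `cached_tokens` is the part of `input_tokens` served from cache,
    billed at the cache-read rate instead of the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * INPUT_PRICE
        + cached_tokens * CACHE_READ_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


print(f"${estimate_cost(10_000, 2_000):.5f}")                       # no cache hits
print(f"${estimate_cost(10_000, 2_000, cached_tokens=4_000):.5f}")  # 4,000 tokens cached
```

For a request with 10,000 input tokens and 2,000 output tokens this gives about $0.01006, or about $0.00862 if 4,000 of the input tokens are read from cache.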

Theta

Latency: not listed
Throughput: not listed
Uptime: 100.00
Recent uptime: Oct 10, 2025 - 3 PM, 100.00%

Price
Input: $0.56 / M tokens
Output: $2.23 / M tokens
Cache read: not listed
Cache write 5m: not listed
Cache write 1h: not listed
Cache write: not listed
Web search: not listed

Model limitation
Context: 128.00K
Max output: 32.00K

Supported Parameters
max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, seed, logit_bias, logprobs, top_logprobs, response_format, stop, tools, tool_choice, parallel_tool_calls

Model Protocol Compatibility
openai, anthropic

Data policy
Prompt training: false
Prompt logging: Zero retention
Moderation: Responsibility of developer
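
Both providers list the same limits (128.00K context, 32.00K max output), so a client can check its token budget before sending a request. The sketch below is one way to do that. It assumes the rounded figures mean 128,000 and 32,000 tokens and that the context window covers prompt plus completion; neither assumption is confirmed by this page, and the function name is hypothetical.

```python
# Token-budget check against the listed limits.
# Assumptions: 128.00K = 128,000 tokens, 32.00K = 32,000 tokens, and the
# context window is shared by the prompt and the completion.
CONTEXT_LIMIT = 128_000
MAX_OUTPUT = 32_000


def clamp_max_completion(prompt_tokens: int, requested: int) -> int:
    """Return the largest usable max_completion_tokens value, up to `requested`."""
    remaining = CONTEXT_LIMIT - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt leaves no room for a completion")
    return min(requested, MAX_OUTPUT, remaining)


print(clamp_max_completion(1_000, 50_000))    # capped by the max output limit
print(clamp_max_completion(120_000, 32_000))  # capped by the remaining context
```

The returned value can be passed as `max_completion_tokens`, one of the supported parameters listed above.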

Sample code and API for Kimi K2 0711

ZenMux normalizes requests and responses across providers for you.


python
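from openai import OpenAI

# ZenMux is OpenAI-compatible, so the official OpenAI SDK can point at its base URL.
client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<ZenMux_API_KEY>",  # replace with your ZenMux API key
)

completion = client.chat.completions.create(
    model="moonshotai/kimi-k2-0711",
    messages=[
        {
            "role": "user",
            # Content is a list of typed parts; this request sends a single text part.
            "content": [
                {
                    "type": "text",
                    "text": "Explain what a Mixture-of-Experts model is in two sentences.",
                }
            ],
        }
    ],
)

# Print the text of the first returned choice.
print(completion.choices[0].message.content)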
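
The supported parameters listed for both providers, including `tools` and `tool_choice`, can be passed as keyword arguments to the same call. The sketch below builds such a request as a plain dictionary and rejects parameter names that are not in the list. `build_request` and the `get_weather` tool are hypothetical and exist only for illustration; the tool definition follows the OpenAI function-calling format.

```python
# Parameters listed as supported on this page.
SUPPORTED_PARAMS = {
    "max_completion_tokens", "temperature", "top_p", "frequency_penalty",
    "presence_penalty", "seed", "logit_bias", "logprobs", "top_logprobs",
    "response_format", "stop", "tools", "tool_choice", "parallel_tool_calls",
}


def build_request(prompt: str, **params) -> dict:
    """Build keyword arguments for chat.completions.create, rejecting unlisted parameters."""
    unknown = set(params) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {
        "model": "moonshotai/kimi-k2-0711",
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


# Hypothetical tool, described in the OpenAI function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = build_request(
    "What is the weather in Paris?",
    temperature=0.2,
    max_completion_tokens=256,
    tools=[weather_tool],
    tool_choice="auto",
)
print(sorted(request))
```

With a configured client, the dictionary can be sent as `client.chat.completions.create(**request)`. If the model decides to call the tool, the call should appear in `completion.choices[0].message.tool_calls` rather than in `content`.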