DeepSeek: DeepSeek V3.1
deepseek/deepseek-chat-v3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds DeepSeek V3-0324 as the current general-purpose model in the V3 line.
By deepseek
Recent activity on DeepSeek V3.1
(chart: tokens processed per day)
Throughput (tokens/s)

| Provider | Min (tokens/s) | Max (tokens/s) | Avg (tokens/s) |
|---|---|---|---|
| Theta | 14.05 | 81.82 | 52.61 |
| Volcengine | 14.67 | 41.3 | 23.38 |
First Token Latency (ms)

| Provider | Min (ms) | Max (ms) | Avg (ms) |
|---|---|---|---|
| Theta | 570 | 9357 | 1606.59 |
| Volcengine | 882 | 2354 | 1257.34 |
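For context, first-token latency and throughput figures like these can be sanity-checked from the client side with a streaming request. Below is a rough sketch using the OpenAI-compatible endpoint shown later on this page; it counts streamed chunks, which is only a loose proxy for tokens:

```python
import time
from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<ZenMux_API_KEY>")

start = time.perf_counter()
first_token_at = None
chunks = 0  # streamed chunks, a rough proxy for output tokens

stream = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Count from 1 to 50."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first visible token
        chunks += 1

end = time.perf_counter()
if first_token_at is not None:
    print(f"first-token latency: {(first_token_at - start) * 1000:.0f} ms")
    decode_s = end - first_token_at
    if decode_s > 0:
        print(f"decode rate: {chunks / decode_s:.1f} chunks/s")
```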
Providers for DeepSeek V3.1
ZenMux routes each request to the best provider able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Provider 1

Latency: 0.62 s
Throughput: 16.41 tps
Uptime: 100.00%
Recent uptime (Oct 10, 2025, 3 PM): 100.00%
Price

Input: $0.28 / M tokens
Output: $1.11 / M tokens
Cache read: -
Cache write (5 min): -
Cache write (1 h): -
Cache write: -
Web search: -
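To make the per-token rates concrete, here is a small illustrative calculation at this provider's listed prices (arithmetic only; actual billing is whatever ZenMux meters):

```python
# Listed rates for this provider (USD per 1M tokens).
INPUT_PER_M = 0.28
OUTPUT_PER_M = 1.11

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the rates above."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a 4,000-token prompt producing a 1,000-token completion.
print(f"${estimate_cost(4_000, 1_000):.5f}")  # $0.00223
```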
Model limits

Context: 128.00K tokens
Max output: 65.54K tokens
Supported Parameters

Supported: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, response_format, stop, tools, tool_choice
Not supported: seed, logit_bias, logprobs, top_logprobs, parallel_tool_calls
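As an illustration of how these parameters map onto an OpenAI-style call, a minimal sketch (the values are placeholders, not tuned recommendations):

```python
from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<ZenMux_API_KEY>")

completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Summarize FP8 microscaling in two sentences."}],
    temperature=0.7,             # placeholder sampling settings
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_completion_tokens=1024,  # must stay within the 65.54K output cap
    stop=["\n\n"],               # optional stop sequence
)
print(completion.choices[0].message.content)
```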
Model Protocol Compatibility

Supported: openai
Not supported: anthropic
Data policy

Prompt training: false
Prompt logging: zero retention
Moderation: responsibility of developer
Provider 2

Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime (Oct 10, 2025, 3 PM): 100.00%
Price

Input: $0.56 / M tokens
Output: $1.68 / M tokens
Cache read: $0.11 / M tokens
Cache write (5 min): -
Cache write (1 h): $0.0024 / M tokens
Cache write: -
Web search: -
Model limits

Context: 128.00K tokens
Max output: 65.54K tokens
Supported Parameters

Supported: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, response_format, stop, tools, tool_choice
Not supported: seed, logit_bias, logprobs, top_logprobs, parallel_tool_calls
Model Protocol Compatibility

Supported: openai
Not supported: anthropic
Data policy

Prompt training: false
Prompt logging: zero retention
Moderation: responsibility of developer
Sample code and API for DeepSeek V3.1
ZenMux normalizes requests and responses across providers for you.
```python
from openai import OpenAI

# Point the standard OpenAI client at the ZenMux endpoint.
client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<ZenMux_API_KEY>",  # replace with your ZenMux API key
)

completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
)
print(completion.choices[0].message.content)
```
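The page lists tools and tool_choice as supported, so a tool-calling request should follow the standard OpenAI shape. A minimal sketch, assuming a hypothetical get_weather function (the schema below is illustrative, not a ZenMux-defined tool); since parallel_tool_calls is not supported here, expect at most one call per turn:

```python
import json

from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<ZenMux_API_KEY>")

# Hypothetical tool schema, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

message = completion.choices[0].message
if message.tool_calls:
    # The model requested a tool call; arguments arrive as a JSON string.
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```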