DeepSeek: DeepSeek V3.1
deepseek/deepseek-chat-v3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds DeepSeek V3-0324 as the current general-purpose model in the V3 line.
By deepseek
Recent activity on DeepSeek V3.1
(chart: tokens processed per day)
Throughput (tokens/s)

| Provider | Min (tokens/s) | Max (tokens/s) | Avg (tokens/s) |
|---|---|---|---|
| Theta | 14.05 | 81.82 | 52.61 |
| Volcengine | 14.67 | 41.3 | 23.38 |
First Token Latency (ms)

| Provider | Min (ms) | Max (ms) | Avg (ms) |
|---|---|---|---|
| Theta | 570 | 9357 | 1606.59 |
| Volcengine | 882 | 2354 | 1257.34 |
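For context, first-token latency and throughput figures like these can be sanity-checked from the client side with a streaming request. Below is a rough sketch using the OpenAI-compatible endpoint shown later on this page; it counts streamed chunks, which is only a loose proxy for tokens:

```python
import time
from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<ZenMux_API_KEY>")

start = time.perf_counter()
first_token_at = None
chunks = 0  # streamed chunks, a rough proxy for output tokens

stream = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Count from 1 to 50."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first visible token
        chunks += 1

end = time.perf_counter()
if first_token_at is not None:
    print(f"first-token latency: {(first_token_at - start) * 1000:.0f} ms")
    decode_s = end - first_token_at
    if decode_s > 0:
        print(f"decode rate: {chunks / decode_s:.1f} chunks/s")
```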
Providers for DeepSeek V3.1
ZenMux routes each request to the best provider able to handle your prompt size and parameters, with fallbacks to maximize uptime.
Provider 1

Latency: 0.62 s
Throughput: 16.41 tps
Uptime: 100.00%
Recent uptime (Oct 10, 2025, 3 PM): 100.00%
Price

Input: $0.28 / M tokens
Output: $1.11 / M tokens
Cache read: -
Cache write (5 min): -
Cache write (1 h): -
Cache write: -
Web search: -
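To make the per-token rates concrete, here is a small illustrative calculation at this provider's listed prices (arithmetic only; actual billing is whatever ZenMux meters):

```python
# Listed rates for this provider (USD per 1M tokens).
INPUT_PER_M = 0.28
OUTPUT_PER_M = 1.11

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the rates above."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a 4,000-token prompt producing a 1,000-token completion.
print(f"${estimate_cost(4_000, 1_000):.5f}")  # $0.00223
```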
Model limits

Context: 128.00K tokens
Max output: 65.54K tokens
Supported Parameters

Supported: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, response_format, stop, tools, tool_choice
Not supported: seed, logit_bias, logprobs, top_logprobs, parallel_tool_calls
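As an illustration of how these parameters map onto an OpenAI-style call, a minimal sketch (the values are placeholders, not tuned recommendations):

```python
from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<ZenMux_API_KEY>")

completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Summarize FP8 microscaling in two sentences."}],
    temperature=0.7,             # placeholder sampling settings
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_completion_tokens=1024,  # must stay within the 65.54K output cap
    stop=["\n\n"],               # optional stop sequence
)
print(completion.choices[0].message.content)
```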
Model Protocol Compatibility

Supported: openai
Not supported: anthropic
Data policy

Prompt training: false
Prompt logging: zero retention
Moderation: responsibility of developer
Provider 2

Latency: -
Throughput: -
Uptime: 100.00%
Recent uptime (Oct 10, 2025, 3 PM): 100.00%
Price

Input: $0.56 / M tokens
Output: $1.68 / M tokens
Cache read: $0.11 / M tokens
Cache write (5 min): -
Cache write (1 h): $0.0024 / M tokens
Cache write: -
Web search: -
Model limits

Context: 128.00K tokens
Max output: 65.54K tokens
Supported Parameters

Supported: max_completion_tokens, temperature, top_p, frequency_penalty, presence_penalty, response_format, stop, tools, tool_choice
Not supported: seed, logit_bias, logprobs, top_logprobs, parallel_tool_calls
Model Protocol Compatibility

Supported: openai
Not supported: anthropic
Data policy

Prompt training: false
Prompt logging: zero retention
Moderation: responsibility of developer
Sample code and API for DeepSeek V3.1
ZenMux normalizes requests and responses across providers for you.
```python
from openai import OpenAI

# Point the standard OpenAI client at the ZenMux endpoint.
client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<ZenMux_API_KEY>",  # replace with your ZenMux API key
)

completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
)
print(completion.choices[0].message.content)
```
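The page lists tools and tool_choice as supported, so a tool-calling request should follow the standard OpenAI shape. A minimal sketch, assuming a hypothetical get_weather function (the schema below is illustrative, not a ZenMux-defined tool); since parallel_tool_calls is not supported here, expect at most one call per turn:

```python
import json

from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<ZenMux_API_KEY>")

# Hypothetical tool schema, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

message = completion.choices[0].message
if message.tool_calls:
    # The model requested a tool call; arguments arrive as a JSON string.
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```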