Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
Context: 128.00K
Input: $0.56/M tokens
Output: $2.23/M tokens
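To make the per-million-token pricing above concrete, here is a minimal cost-estimate sketch in Python; the token counts are hypothetical, and the rates are the Kimi K2 figures listed above.

```python
# Cost estimate for a single request at the Kimi K2 rates listed above.
# Token counts below are hypothetical; prices are USD per million tokens.
INPUT_PRICE_PER_M = 0.56    # $/M input tokens
OUTPUT_PRICE_PER_M = 2.23   # $/M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 12,000-token prompt with a 1,500-token completion.
print(f"${request_cost(12_000, 1_500):.6f}")  # -> $0.010065
```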
May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and comes with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
Fully open-source model.
Context: 64.00K
Input: $0.56/M tokens
Output: $2.23/M tokens
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference.
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.
Context: 128.00K
Input: $0.28/M tokens
Output: $1.11/M tokens
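The DeepSeek-V3.1 description above says thinking and non-thinking modes are selected via prompt templates. Below is a minimal sketch of what that could look like with the Hugging Face tokenizer; the repository id and the `thinking` template variable are assumptions based on the upstream release, not verified here.

```python
# Sketch: toggling DeepSeek-V3.1's thinking vs. non-thinking mode through the
# chat template. The repository id and the `thinking` template variable are
# assumptions based on the upstream release, not verified here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")  # assumed repo id

messages = [{"role": "user", "content": "Summarize the key idea of FP8 microscaling."}]

# Extra keyword arguments are forwarded to the Jinja chat template; whether the
# template actually reads a `thinking` flag is an assumption.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
plain_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)
```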
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks).
Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.
Context: 256.00K
Input: $0.28/M tokens
Output: $1.11/M tokens
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains.
The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.
Context: 256.00K
Input: $0.28/M tokens
Output: $2.78/M tokens
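Since the thinking variant emits its reasoning ahead of a closing </think> tag, as noted in the description above, downstream code typically needs to separate the trace from the final answer. A minimal sketch, assuming the tag appears verbatim in the completion text:

```python
# Sketch: splitting a Qwen3-235B-A22B-Thinking-2507 completion into its
# reasoning trace and final answer. Assumes the completion contains a closing
# </think> tag, as the description above indicates; real outputs may vary.
def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no </think> tag is found."""
    marker = "</think>"
    if marker in completion:
        reasoning, answer = completion.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", completion.strip()

# Hypothetical completion text, for illustration only.
text = "We try small cases first ...</think>The answer is 42."
reasoning, answer = split_reasoning(text)
print(answer)  # -> "The answer is 42."
```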
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).
Pricing for the Alibaba endpoints varies by context length: once a request exceeds 128K input tokens, the higher rate applies (see the sketch below).
Context: 256.00K
Input: $1.25/M tokens
Output: $5.01/M tokens
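A minimal sketch of the tier switch described above: the listed $1.25/$5.01 rates are used for the lower tier, while the above-128K figures are placeholders to be filled in from the provider's price sheet.

```python
# Sketch: the >128K input-token pricing switch described above for Qwen3-Coder
# on the Alibaba endpoints. The lower tier uses the listed rates; the higher
# tier values are placeholders, not figures taken from this page.
THRESHOLD_INPUT_TOKENS = 128_000

LOW_TIER = {"input": 1.25, "output": 5.01}   # listed $/M-token rates
HIGH_TIER = {"input": None, "output": None}  # placeholders: fill in the provider's >128K rates

def rates_for(input_tokens: int) -> dict:
    """Return the $/M-token rates that apply to a request of this input size."""
    return HIGH_TIER if input_tokens > THRESHOLD_INPUT_TOKENS else LOW_TIER

print(rates_for(90_000))   # -> {'input': 1.25, 'output': 5.01}
print(rates_for(200_000))  # -> the higher tier (placeholder values here)
```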
Ring-mini-2.0 is a Mixture-of-Experts (MoE) model built on the Ling 2.0 architecture and heavily optimized for high-throughput inference. It uses 16B total parameters with approximately 1.4B activated per token and is reported to deliver comprehensive reasoning performance comparable to sub-10B dense LLMs. The model shows strong results on logical reasoning, code generation, and mathematical tasks, supports a 128K context window, and reports generation speeds of 300+ tokens per second.
Context: 128.00K
Input: $0.07/M tokens
Output: $0.70/M tokens
Ling-mini-2.0 is an open-source Mixture-of-Experts (MoE) large language model designed to balance strong task performance with high inference efficiency. It has 16B total parameters, with approximately 1.4B activated per token (about 789M non-embedding). Trained on over 20T tokens and refined via multi-stage supervised fine-tuning and reinforcement learning, it is reported to deliver strong results in complex reasoning and instruction following while keeping computational costs low. According to the upstream release, it reaches top-tier performance among sub-10B dense LLMs and in some cases matches or surpasses larger MoE models.
Context: 128.00K
Input: $0.07/M tokens
Output: $0.28/M tokens
Ling-flash-2.0 is an open-source Mixture-of-Experts (MoE) language model developed under the Ling 2.0 architecture. It features 100 billion total parameters, with 6.1 billion activated during inference (4.8B non-embedding).
Trained on over 20 trillion tokens and refined with supervised fine-tuning and multi-stage reinforcement learning, the model demonstrates strong performance against dense models up to 40B parameters. It excels in complex reasoning, code generation, and frontend development.
Context: 128.00K
Input: $0.28/M tokens
Output: $2.80/M tokens
Ling-1T is a trillion-parameter sparse mixture-of-experts (MoE) model developed by inclusionAI, optimized for efficient and scalable reasoning. Featuring approximately 50 billion active parameters per token, it is pre-trained on over 20 trillion reasoning-dense tokens, supports a 128K context length, and utilizes an Evolutionary Chain-of-Thought (Evo-CoT) process to enhance its reasoning depth. The model achieves state-of-the-art performance across complex benchmarks, demonstrating strong capabilities in code generation, software development, and advanced mathematics.
In addition to its core reasoning skills, Ling-1T possesses specialized abilities in front-end code generation (combining semantic understanding with visual aesthetics) and exhibits emergent agentic capabilities, such as proficient tool use with minimal instruction tuning. Its primary use cases span software engineering, professional mathematics, complex logical reasoning, and agent-based workflows that demand a balance of high performance and efficiency.
Context: 128.00K
Input: $0.56/M tokens
Output: $2.24/M tokens