Ling-1T is a trillion-parameter sparse mixture-of-experts (MoE) model developed by inclusionAI, optimized for efficient and scalable reasoning. Featuring approximately 50 billion active parameters per token, it is pre-trained on over 20 trillion reasoning-dense tokens, supports a 128K context length, and utilizes an Evolutionary Chain-of-Thought (Evo-CoT) process to enhance its reasoning depth. The model achieves state-of-the-art performance across complex benchmarks, demonstrating strong capabilities in code generation, software development, and advanced mathematics. In addition to its core reasoning skills, Ling-1T possesses specialized abilities in front-end code generation—combining semantic understanding with visual aesthetics—and exhibits emergent agentic capabilities, such as proficient tool use with minimal instruction tuning. Its primary use cases span software engineering, professional mathematics, complex logical reasoning, and agent-based workflows that demand a balance of high performance and efficiency.
Context: 128.00K
Input: $0.56/M tokens
Output: $2.24/M tokens
GLM-4.6 is Zhipu AI's latest flagship model. It has 355 billion total parameters, with 32 billion active per forward pass. GLM-4.6 surpasses GLM-4.5 across all core capabilities, specifically:
Advanced coding: On both public benchmarks and real-world programming tasks, GLM-4.6's coding ability matches Claude Sonnet 4, making it the strongest coding model yet released in China.
Context length: The context window has grown from 128K to 200K tokens, allowing it to handle longer code and agentic tasks.
Reasoning: Reasoning capability is improved, and the model can now call tools during the reasoning process.
Search: Stronger tool calling and search-agent performance, so the model works better within agent frameworks.
Writing: Output aligns more closely with human preferences in style, readability, and role-playing scenarios.
Multilingual translation: Cross-lingual task handling is further strengthened.
Context: 200.00K
Input: $0.35/M tokens
Output: $1.54/M tokens
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking.
Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.
Context: 200.00K
Input: $3/M tokens
Output: $15/M tokens
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.
Context: 256.00K
Input: $1.2/M tokens
Output: $6/M tokens
The Qwen3-VL series effectively integrates thinking and non-thinking modes, achieving world-leading visual-agent performance on public benchmarks such as OSWorld. This version features comprehensive upgrades in visual coding, spatial perception, and multimodal reasoning, significantly enhancing visual perception and recognition, and supports understanding of ultra-long videos.
Context: 262.14K
Input: $0.2/M tokens
Output: $1.6/M tokens
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Read more about the model on xAI's [news post](http://x.ai/news/grok-4-fast). Reasoning can be enabled via the `enabled` field of the `reasoning` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)
Prompts and completions may be used by xAI or OpenRouter to improve future models.
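For illustration, a minimal sketch of enabling the reasoning flavor through OpenRouter's chat completions endpoint (the slug `x-ai/grok-4-fast` and the response indexing are assumptions to verify against the linked docs):

```python
import requests

# Hedged sketch: enable reasoning on Grok 4 Fast via the `enabled` field
# of the `reasoning` parameter. Slug and response shape are assumptions.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "x-ai/grok-4-fast",  # assumed slug
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "reasoning": {"enabled": True},  # omit or set False for the non-reasoning flavor
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```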
Context: 2.00M
Input: $0.2/M tokens
Output: $0.5/M tokens
Ling-flash-2.0 is an open-source Mixture-of-Experts (MoE) language model developed under the Ling 2.0 architecture. It features 100 billion total parameters, with 6.1 billion activated during inference (4.8B non-embedding).
Trained on over 20 trillion tokens and refined with supervised fine-tuning and multi-stage reinforcement learning, the model demonstrates strong performance against dense models up to 40B parameters. It excels in complex reasoning, code generation, and frontend development.
Context: 128.00K
Input: $0.28/M tokens
Output: $2.8/M tokens
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.
This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.
Context: 262.10K
Input: $0.6/M tokens
Output: $2.5/M tokens
Ling-mini-2.0 is an open-source Mixture-of-Experts (MoE) large language model designed to balance strong task performance with high inference efficiency. It has 16B total parameters, with approximately 1.4B activated per token (about 789M non-embedding). Trained on over 20T tokens and refined via multi-stage supervised fine-tuning and reinforcement learning, it is reported to deliver strong results in complex reasoning and instruction following while keeping computational costs low. According to the upstream release, it reaches top-tier performance among sub-10B dense LLMs and in some cases matches or surpasses larger MoE models.
Context: 128.00K
Input: $0.07/M tokens
Output: $0.28/M tokens
Ring-mini-2.0 is a Mixture-of-Experts (MoE) model oriented toward high-throughput inference and extensively optimized on the Ling 2.0 architecture. It uses 16B total parameters with approximately 1.4B activated per token and is reported to deliver comprehensive reasoning performance comparable to sub-10B dense LLMs. The model shows strong results on logical reasoning, code generation, and mathematical tasks, supports 128K context windows, and reports generation speeds of 300+ tokens per second.
Context: 128.00K
Input: $0.07/M tokens
Output: $0.7/M tokens
Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code Fast toward high-quality workflows.
Context: 256.00K
Input: $0.2/M tokens
Output: $1.5/M tokens
Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5, all at extremely economical token prices.
Context: 1.05M
Input: $0.075/M tokens
Output: $0.3/M tokens
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference.
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.
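For illustration, a structured tool-calling request in the standard OpenAI function-calling format might look like the sketch below; the slug `deepseek/deepseek-chat-v3.1` and the `get_weather` tool are hypothetical placeholders:

```python
import requests

# Hypothetical tool definition in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # placeholder tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-chat-v3.1",  # assumed slug
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
    },
)
# The model should respond with a structured `tool_calls` entry rather than prose.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```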
Context: 128.00K
Input: $0.28/M tokens
Output: $1.11/M tokens
Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
Context: 1.05M
Input: $0.15/M tokens
Output: $0.6/M tokens
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling.
Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation.
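As a hedged sketch of that configuration, assuming the `reasoning.max_tokens` field from OpenRouter's reasoning-tokens docs and the slug `google/gemini-2.5-flash`:

```python
import requests

# Sketch: cap the model's internal "thinking" budget with
# `reasoning.max_tokens`; the field name and slug are assumptions.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "google/gemini-2.5-flash",  # assumed slug
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "reasoning": {"max_tokens": 2048},  # budget for reasoning tokens
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```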
Context: 1.05M
Input: $0.3/M tokens
Output: $2.5/M tokens
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
Context: 1.05M
Input: $1.25/M tokens
Output: $10/M tokens
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.
Context: 1.05M
Input: $0.1/M tokens
Output: $0.4/M tokens
GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.
Context: 400.00K
Input: $1.25/M tokens
Output: $10/M tokens
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
Context: 128.00K
Input: $1.25/M tokens
Output: $10/M tokens
GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model.
Context: 400.00K
Input: $0.25/M tokens
Output: $2/M tokens
GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications.
Context: 400.00K
Input: $0.05/M tokens
Output: $0.4/M tokens
Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning.
Context: 200.00K
Input: $15/M tokens
Output: $75/M tokens
GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128K tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour via the `enabled` field of the `reasoning` parameter.
Context: 128.00K
Input: $0.35/M tokens
Output: $1.54/M tokens
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour via the `enabled` field of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
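A minimal sketch of toggling between the two modes (the `reasoning.enabled` field follows the linked docs; the slug `z-ai/glm-4.5-air` is an assumption), sent to the same chat completions endpoint as in the earlier examples:

```python
# Thinking mode: deliberate reasoning and tool use.
thinking = {
    "model": "z-ai/glm-4.5-air",  # assumed slug
    "messages": [{"role": "user", "content": "Plan a three-step refactor."}],
    "reasoning": {"enabled": True},
}

# Non-thinking mode: instant responses for real-time interaction.
instant = {**thinking, "reasoning": {"enabled": False}}
```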
Context: 128.00K
Input: $0.11/M tokens
Output: $0.56/M tokens
Powered by Qwen3, this is a powerful coding agent that excels at tool calling and environment interaction to achieve autonomous programming. It combines outstanding coding proficiency with versatile general-purpose abilities.
Context: 1000.00K
Input: $1/M tokens
Output: $5/M tokens
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks).
Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.
Context: 256.00K
Input: $0.28/M tokens
Output: $1.11/M tokens
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains.
The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.
Context: 256.00K
Input: $0.28/M tokens
Output: $2.78/M tokens
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).
Pricing for the Alibaba endpoints varies by context length: once a request exceeds 128K input tokens, the higher pricing tier applies.
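A small illustrative helper for that two-tier rule (the base-tier rates are taken from this listing; the above-128K rates are not stated here, so they stay caller-supplied placeholders):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      high_in_rate: float, high_out_rate: float) -> float:
    """Rates are in $ per million tokens; the high-tier rates are
    unspecified in this listing and must be supplied by the caller."""
    if input_tokens > 128_000:
        in_rate, out_rate = high_in_rate, high_out_rate  # >128K input tier
    else:
        in_rate, out_rate = 1.25, 5.01  # base tier from the listing
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```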
Context: 256.00K
Input: $1.25/M tokens
Output: $5.01/M tokens
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
Context: 128.00K
Input: $0.56/M tokens
Output: $2.23/M tokens
Grok 4 is xAI's latest reasoning model with a 256K context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, cannot be disabled, and the reasoning effort cannot be specified. Pricing increases once the total tokens in a given request exceed 128K. See more details in the [xAI docs](https://docs.x.ai/docs/models/grok-4-0709).
Context: 256.00K
Input: $3/M tokens
Output: $15/M tokens
The May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is fully open-source, with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
Context: 64.00K
Input: $0.56/M tokens
Output: $2.23/M tokens
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
For model details, please visit the DeepSeek-V3 repo for more information, or see the launch announcement.
Context: 128.00K
Input: $0.56/M tokens
Output: $1.68/M tokens
Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended, agentic workflows, handling thousands of task steps continuously for hours without degradation.
Context: 200.00K
Input: $15/M tokens
Output: $75/M tokens
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%), Sonnet 4 balances capability and computational efficiency, making it suitable for a broad range of applications from routine coding tasks to complex software development projects. Key enhancements include improved autonomous codebase navigation, reduced error rates in agent-driven workflows, and increased reliability in following intricate instructions. Sonnet 4 is optimized for practical everyday use, providing advanced reasoning capabilities while maintaining efficiency and responsiveness in diverse internal and external scenarios.
Context: 1000.00K
Input: $3/M tokens
Output: $15/M tokens
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains.
Despite its smaller size, o4-mini exhibits high accuracy in STEM tasks, visual problem solving (e.g., MathVista, MMMU), and code editing. It is especially well-suited for high-throughput scenarios where latency or cost is critical. Thanks to its efficient architecture and refined reinforcement learning training, o4-mini can chain tools, generate structured outputs, and solve multi-step tasks with minimal delay—often in under a minute.
Context: 200.00K
Input: $1.1/M tokens
Output: $4.4/M tokens
GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
Context: 1.05M
Input: $2/M tokens
Output: $8/M tokens
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.
Context: 1.05M
Input: $0.4/M tokens
Output: $1.6/M tokens
For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding – even higher than GPT‑4o mini. It’s ideal for tasks like classification or autocompletion.
Context: 1.05M
Input: $0.1/M tokens
Output: $0.4/M tokens
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks.
Context: 200.00K
Input: $3/M tokens
Output: $15/M tokens
Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.
This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.
Context: 200.00K
Input: $0.8/M tokens
Output: $4/M tokens
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
Context: 200.00K
Input: $3/M tokens
Output: $15/M tokens
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.
As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on common chat-preference [leaderboards](https://arena.lmsys.org/).
#multimodal
Context: 128.00K
Input: $0.15/M tokens
Output: $0.6/M tokens
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
#multimodal
Context: 128.00K
Input: $2.5/M tokens
Output: $10/M tokens