Large Language Model Rankings

LLM Capability Leaderboard

Compare leading language models by coding, writing, reasoning, math, and multimodal capability.

Snapshot updated 2026-07-04. Scores are AI Explorer ratings, normalized to a 0-100 scale and derived from public leaderboard signals.

Data source

Model data is organized from public leaderboards, provider information, and curated capability signals.

Ranking logic

Scores normalize public signals across coding, writing, reasoning, math, and multimodal capability.
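The page does not say how the normalization is done. As a minimal sketch only, assuming min-max scaling per signal followed by an equal-weight average, it might look like the code below. The function names, the scaling method, and the sample inputs are all hypothetical and are not AI Explorer's published method.

```python
from statistics import mean


def min_max_normalize(raw_scores: dict[str, float]) -> dict[str, float]:
    """Rescale one signal's raw values to 0-100 (min-max scaling, an assumed method)."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        # All models tie on this signal, so give every model the top of the scale.
        return {name: 100.0 for name in raw_scores}
    return {name: 100.0 * (value - lo) / (hi - lo) for name, value in raw_scores.items()}


def combine_signals(signals: list[dict[str, float]]) -> dict[str, float]:
    """Average the normalized signals per model, using only signals that cover it."""
    normalized = [min_max_normalize(signal) for signal in signals]
    models = set().union(*(n.keys() for n in normalized))
    return {m: round(mean(n[m] for n in normalized if m in n), 1) for m in models}


# Hypothetical raw values from two made-up signals on different scales.
signal_a = {"model_x": 10.0, "model_y": 20.0}
signal_b = {"model_x": 1.0, "model_y": 3.0, "model_z": 2.0}
print(combine_signals([signal_a, signal_b]))
```

Plain min-max scaling pins the weakest model at 0, which the published scores (75 to 98 in the Math view) do not show, so the real pipeline presumably uses a different reference range or weighting.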

Best for

Use the LLM page to compare model strengths before choosing a provider or model family.
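As an illustration of that kind of comparison, the sketch below shortlists models by license type and a minimum Math score. The names, providers, license labels, and Math scores are copied from this page; the `ModelEntry` type and `shortlist` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    provider: str
    license: str
    math: int  # normalized 0-100 Math score from this page


# A few rows from the Math view of the leaderboard.
ENTRIES = [
    ModelEntry("GPT-5.5", "OpenAI", "Proprietary", 98),
    ModelEntry("DeepSeek V4", "DeepSeek", "Open weights", 97),
    ModelEntry("Claude Opus 4.8", "Anthropic", "Proprietary", 94),
    ModelEntry("GLM-5.1", "Z.AI", "Open weights", 94),
    ModelEntry("Mixtral 8x22B", "Mistral AI", "Open weights", 75),
]


def shortlist(entries: list[ModelEntry], license: str, min_math: int) -> list[ModelEntry]:
    """Keep entries with the given license and at least min_math, best score first."""
    matches = [e for e in entries if e.license == license and e.math >= min_math]
    return sorted(matches, key=lambda e: e.math, reverse=True)


open_models = shortlist(ENTRIES, license="Open weights", min_math=90)
print([e.name for e in open_models])
```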

Math: 20 models


#1

GPT-5.5

OpenAI (Proprietary)

OpenAI's frontier work model for complex real-world tasks, agentic coding, research, data analysis, and cross-tool execution.

Agentic coding, Real-world work, Research workflows

Math: 98

Context: 1M+
Release: 2026-04

#2

Gemini 3.5 Pro

Google (Proprietary)

Google's latest Pro-family general model for complex reasoning, long-context work, multimodal tasks, and high-quality generation.

Complex reasoning, Long-context analysis, Multimodal reasoning

Math: 98

Context: 1M+
Release: 2026-05

#3

DeepSeek V4

DeepSeek (Open weights)

DeepSeek's next-generation V4-family model for long-context work, coding, math reasoning, and cost-efficient production APIs.

Long-context reasoning, Coding, Cost-efficient inference

Math: 97

Context: 1M
Release: 2026-04

#4

GPT-5.4

OpenAI (Proprietary)

Frontier general model for complex reasoning, coding, tool use, and professional writing.

Software engineering, Reasoning workflows, Structured writing

Math: 96

Context: 1M+
Release: 2026

#5

Gemini 3.1 Pro

Google (Proprietary)

General model with strong long-context, multimodal, and reasoning capability.

Long-context analysis, Multimodal reasoning, General task quality

Math: 95

Context: 1M+
Release: 2026

#6

DeepSeek R2

DeepSeek (Open weights)

Open-weight reasoning model focused on math, coding, and cost-efficient deployment.

Math reasoning, Open deployment, Coding tasks

Math: 95

Context: 128K+
Release: 2026

#7

Claude Opus 4.8

Anthropic (Proprietary)

Anthropic's latest Opus-class model with strong writing, code review, complex reasoning, and agentic workflow behavior.

Writing quality, Code review, Agent workflows

Math: 94

Context: 200K+
Release: 2026-05

#8

GLM-5.1

Z.AI (Open weights)

Z.AI's flagship model for long-horizon agent tasks, real-world engineering delivery, coding, and complex reasoning.

Agentic coding, Long-horizon tasks, Tool use

Math: 94

Context: 200K
Release: 2026-04

#9

Grok 4.1 Thinking

xAI (Proprietary)

Thinking model for complex Q&A and fresh-information workflows.

Reasoning mode, Conversational tasks, Fresh knowledge workflows

Math: 94

Context: 256K+
Release: 2026

#10

DeepSeek V4 Flash

DeepSeek (Open weights)

Low-latency DeepSeek V4 variant for high-throughput, cost-sensitive, and real-time product scenarios.

Fast inference, Low cost, Long context

Math: 94

Context: 1M
Release: 2026-04

#11

Gemini 3.5 Flash

Google (Proprietary)

Google's latest Flash-family model for low-latency, multimodal, long-context, and high-throughput applications.

Fast response, Multimodal workflows, Long-context analysis

Math: 93

Context: 1M+
Release: 2026-05

#12

GLM-5

Z.AI (Open weights)

Z.AI's GLM-5 foundation model for general reasoning, writing, coding, and agentic engineering workflows.

General reasoning, Agent workflows, Coding

Math: 93

Context: 200K+
Release: 2026

#13

GPT-5.3 Codex

OpenAI (Proprietary)

Coding-optimized model for repository-scale editing, testing, and debugging.

Repository-scale coding, Debugging, Tool use

Math: 92

Context: 1M+
Release: 2026

#14

Qwen 3 Max

Alibaba (Proprietary)

Balanced multilingual and coding model with broad API availability.

Multilingual work, Coding, Cost-sensitive apps

Math: 91

Context: 1M
Release: 2026

#15

GLM-4.6

Z.AI (Open weights)

Z.AI's GLM-4.6 improves real-world coding, long-context processing, reasoning, search, writing, and agentic applications.

Real-world coding, Long context, Agent workflows

Math: 88

Context: 200K
Release: 2025

#16

Mistral Large 3

Mistral AI (Proprietary)

European model for enterprise API, coding, and multilingual workloads.

Enterprise API, Multilingual work, Coding

Math: 86

Context: 256K+
Release: 2026

#17

Kimi K2

Moonshot AI (Proprietary)

General model with strong long-context and Chinese-language performance.

Long context, Chinese tasks, Research workflows

Math: 85

Context: 1M+
Release: 2026

#18

Llama 4 405B

Meta (Open weights)

Large open-weight model for self-hosted enterprise and research workflows.

Open ecosystem, Self-hosting, General generation

Math: 82

Context: 128K+
Release: 2026

#19

Yi Large

01.AI (Proprietary)

Stable model for Chinese-language, writing, and general-knowledge tasks.

Chinese writing, General knowledge, API apps

Math: 82

Context: 200K+
Release: 2026

#20

Mixtral 8x22B

Mistral AI (Open weights)

Open-weight MoE model for self-hosting and cost-sensitive inference.

Open weights, Self-hosting, Inference cost

Math: 75

Context: 64K+
Release: 2024

Leaderboard Sources

Anthropic Claude model docs
Artificial Analysis
Google Gemini model docs
Hugging Face Open LLM Leaderboard
LiveCodeBench
LMArena Chatbot Arena
SWE-bench