Large Language Model Rankings

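LLM Capability Leaderboard

A structured comparison of mainstream large language models across coding, writing, reasoning, math, and multimodal capabilities.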

Snapshot updated: 2026-07-04. Scores are normalized 0-100 ratings compiled by AI Explorer from public leaderboard signals.

Data sources

Model data is generated from public leaderboards, vendor information, and curated capability signals.

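Ranking logic

Scores normalize public capability signals for coding, writing, reasoning, math, and multimodal tasks onto a common scale.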
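
The page does not say how the normalization is computed. As a rough illustration only, the sketch below shows one common approach: linearly rescaling a raw benchmark value against fixed bounds and clamping the result to the 0-100 range. The function name `normalize` and the `floor` and `ceiling` parameters are hypothetical and are not taken from AI Explorer.

```python
def normalize(value: float, floor: float, ceiling: float) -> float:
    """Rescale a raw signal to 0-100 against fixed bounds (an assumed method).

    `floor` and `ceiling` are hypothetical bounds chosen by whoever compiles
    the leaderboard; the source page does not document them.
    """
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    scaled = (value - floor) / (ceiling - floor) * 100.0
    # Clamp so that out-of-range raw values still land inside 0-100.
    return max(0.0, min(100.0, scaled))
```

Under these assumptions, a raw value halfway between the two bounds maps to 50, and anything above the ceiling is capped at 100. Fixed bounds, unlike per-snapshot minimum and maximum values, keep scores comparable from one snapshot to the next.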

Who it is for

Readers who want to compare the relative strengths of different models before choosing a model vendor or model family.

Reasoning: 20 models

Categories: Overall, Coding, Writing, Reasoning, Math, Multimodal

GPT-5.5

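OpenAI | Proprietary

OpenAI's frontier general-purpose work model, aimed at complex real-world tasks, agentic coding, research, data analysis, and cross-tool execution.

Agentic coding, Real-world work, Research workflows

Reasoning: 99

Context: 1M+
Released: 2026-04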

Gemini 3.5 Pro

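Google | Proprietary

Google's latest Pro-series general-purpose model, focused on complex reasoning, long context, multimodal input, and high-quality generation tasks.

Complex reasoning, Long-context analysis, Multimodal reasoning

Reasoning: 99

Context: 1M+
Released: 2026-05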

GPT-5.4

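OpenAI | Proprietary

High-end general-purpose model suited to complex reasoning, coding, tool calling, and professional writing.

Software engineering, Reasoning workflows, Structured writing

Reasoning: 97

Context: 1M+
Released: 2026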

Claude Opus 4.8

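Anthropic | Proprietary

Anthropic's latest high-end Opus-series model, with standout performance in writing, code review, complex reasoning, and agentic workflows.

Writing quality, Code review, Agent workflows

Reasoning: 96

Context: 200K+
Released: 2026-05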

Gemini 3.1 Pro

Google | Proprietary

General-purpose model with standout long-context, multimodal, and reasoning capabilities.

Long-context analysis, Multimodal reasoning, General task quality

Reasoning: 96

Context: 1M+
Released: 2026

DeepSeek V4

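DeepSeek | Open weights

DeepSeek's next-generation V4-series model, aimed at long context, coding, mathematical reasoning, and cost-efficient production APIs.

Long-context reasoning, Coding, Cost-efficient inference

Reasoning: 96

Context: 1M
Released: 2026-04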

GLM-5.1

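Z.AI | Open weights

Z.AI's latest flagship model, aimed at long-horizon agentic tasks, real-world engineering delivery, coding, and complex reasoning.

Agentic coding, Long-horizon tasks, Tool use

Reasoning: 96

Context: 200K
Released: 2026-04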

Grok 4.1 Thinking

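xAI | Proprietary

Reasoning model suited to complex question answering and workflows that need up-to-date information.

Reasoning mode, Conversational tasks, Fresh knowledge workflows

Reasoning: 96

Context: 256K+
Released: 2026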

GLM-5

Z.AI | Open weights

Z.AI's GLM-5 base model, covering general reasoning, writing, coding, and agentic engineering workflows.

General reasoning, Agent workflows, Coding

Reasoning: 95

Context: 200K+
Released: 2026

#10

Gemini 3.5 Flash

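Google | Proprietary

Google's latest Flash-series model, aimed at low-latency, multimodal, long-context, and high-throughput applications.

Fast response, Multimodal workflows, Long-context analysis

Reasoning: 94

Context: 1M+
Released: 2026-05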

#11

GPT-5.3 Codex

OpenAI | Proprietary

Model optimized for repository-scale code editing, testing, and debugging.

Repository-scale coding, Debugging, Tool use

Reasoning: 94

Context: 1M+
Released: 2026

#12

DeepSeek V4 Flash

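DeepSeek | Open weights

Low-latency variant of DeepSeek V4, suited to high-throughput, cost-sensitive, and real-time product scenarios.

Fast inference, Low cost, Long context

Reasoning: 93

Context: 1M
Released: 2026-04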

#13

DeepSeek R2

DeepSeek | Open weights

Open-weights reasoning model focused on math, coding, and cost-efficient deployment.

Math reasoning, Open deployment, Coding tasks

Reasoning: 93

Context: 128K+
Released: 2026

#14

Qwen 3 Max

Alibaba | Proprietary

Balanced performance across multilingual, coding, and cost-sensitive applications.

Multilingual work, Coding, Cost-sensitive apps

Reasoning: 90

Context: 1M
Released: 2026

#15

GLM-4.6

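Z.AI | Open weights

Z.AI's GLM-4.6, with improvements in real-world coding, long context, reasoning, search, writing, and agentic applications.

Real-world coding, Long context, Agent workflows

Reasoning: 90

Context: 200K
Released: 2025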

#16

Kimi K2

Moonshot AI | Proprietary

General-purpose model with standout long-context and Chinese-language performance.

Long context, Chinese tasks, Research workflows

Reasoning: 87

Context: 1M+
Released: 2026

#17

Mistral Large 3

Mistral AI | Proprietary

European model suited to enterprise APIs, coding, and multilingual tasks.

Enterprise API, Multilingual work, Coding

Reasoning: 87

Context: 256K+
Released: 2026

#18

Llama 4 405B

Meta | Open weights

Large open-weights general-purpose model suited to self-hosted enterprise and research scenarios.

Open ecosystem, Self-hosting, General generation

Reasoning: 84

Context: 128K+
Released: 2026

#19

Yi Large

01.AI | Proprietary

Stable performance on Chinese-language, writing, and general-knowledge tasks.

Chinese writing, General knowledge, API apps

Reasoning: 83

Context: 200K+
Released: 2026

#20

Mixtral 8x22B

Mistral AI | Open weights

Open-weights MoE model suited to self-hosting and cost-sensitive inference.

Open weights, Self-hosting, Inference cost

Reasoning: 77

Context: 64K+
Released: 2024

Leaderboard sources

Anthropic Claude model docs
Artificial Analysis
Google Gemini model docs
Hugging Face Open LLM Leaderboard
LiveCodeBench
LMArena Chatbot Arena
SWE-bench