Model Horizon
© 2026 Model Horizon

GPQA Leaderboard

Graduate-Level Google-Proof Q&A

Expert-crafted questions in biology, physics, and chemistry that are difficult even for domain experts with internet access.

Models Tested: 25
Highest Score: 94.3%
Average: 75.8%
Spread: 50.4%
#    Model               Provider    Score
1    Gemini 3.1 Pro      Google      94.3%
2    Gemini 3 Pro        Google      91.9%
3    Claude Opus 4.6     Anthropic   91.3%
4    GPT-5.2             OpenAI      90.3%
5    GPT-5.1             OpenAI      88.1%
6    Claude Sonnet 4.6   Anthropic   88%
7    Grok 4.1            xAI         88%
8    Grok 4              xAI         87.5%
9    Claude Opus 4.5     Anthropic   87%
10   Claude Sonnet 4.5   Anthropic   83.4%
11   GPT-5.3 Codex       OpenAI      81%
12   DeepSeek R1         DeepSeek    81%
13   Gemini 3 Flash      Google      80%
14   DeepSeek V3.2       DeepSeek    79.9%
15   o3                  OpenAI      79.7%
16   Gemini 2.5 Pro      Google      74%
17   o4-mini             OpenAI      73.4%
18   Llama 4 Maverick    Meta        69.8%
19   GPT-4.1             OpenAI      66.3%
20   Gemini 2.5 Flash    Google      65.8%
21   Llama 4 Scout       Meta        57.2%
22   Claude Haiku 4.5    Anthropic   55%
23   GPT-4.1 mini        OpenAI      52.1%
24   Mistral Small 3.2   Mistral     46.1%
25   Mistral Large 3     Mistral     43.9%
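
The summary figures (models tested, highest score, average, spread) can be checked against the listed scores. The page does not define "Spread"; the sketch below assumes it means the highest score minus the lowest, and that the average is a plain arithmetic mean rounded to one decimal. The function name `summarize` is hypothetical and used only for illustration.

```python
from statistics import mean

# Scores (%) copied from the leaderboard, in rank order.
scores = [94.3, 91.9, 91.3, 90.3, 88.1, 88.0, 88.0, 87.5, 87.0, 83.4,
          81.0, 81.0, 80.0, 79.9, 79.7, 74.0, 73.4, 69.8, 66.3, 65.8,
          57.2, 55.0, 52.1, 46.1, 43.9]


def summarize(values):
    """Summarize a list of scores.

    Assumption: "spread" is max minus min, and "average" is the
    arithmetic mean; both are rounded to one decimal place.
    """
    return {
        "models_tested": len(values),
        "highest": max(values),
        "average": round(mean(values), 1),
        "spread": round(max(values) - min(values), 1),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the computed values agree with the figures shown on the page: 25 models, a highest score of 94.3, an average of 75.8, and a spread of 50.4.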