Model Horizon
© 2026 Model Horizon

AIME Leaderboard

American Invitational Mathematics Examination

Competition-level math problems from the AMC pipeline. Tests advanced problem-solving; every answer is an integer from 000 to 999, and no written proofs are required. A strong differentiator among frontier models.

Models Tested: 26
Highest Score: 96.7%
Average: 64.6%
Spread: 84.7%
#   Model               Provider   Score
1   o3                  OpenAI     96.7%
2   GPT-5.3 Codex       OpenAI     94%
3   o4-mini             OpenAI     93.4%
4   Gemini 3.1 Pro      Google     91.2%
5   GPT-5.2             OpenAI     88%
6   Gemini 3 Pro        Google     85%
7   Claude Opus 4.6     Anthropic  83.3%
8   GPT-5.1             OpenAI     82%
9   DeepSeek R1         DeepSeek   79.8%
10  Claude Sonnet 4.6   Anthropic  78%
11  Grok 4.1            xAI        78%
12  Gemini 2.5 Pro      Google     75%
13  Claude Opus 4.5     Anthropic  72%
14  Gemini 3 Flash      Google     72%
15  Grok 4              xAI        72%
16  Claude Sonnet 4.5   Anthropic  58%
17  Gemini 2.5 Flash    Google     58%
18  DeepSeek V3.2       DeepSeek   56%
19  GPT-4.1             OpenAI     52%
20  Llama 4 Maverick    Meta       48%
21  Mistral Large 3     Mistral    42%
22  Llama 4 Scout       Meta       35%
23  GPT-4.1 mini        OpenAI     32%
24  Claude Haiku 4.5    Anthropic  28%
25  Mistral Small 3.2   Mistral    18%
26  GPT-4.1 nano        OpenAI     12%
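
The summary figures near the top of the page can be checked against the scores listed above. The sketch below is not the site's own code: the `summarize` helper is hypothetical, and it assumes that "Spread" means the highest score minus the lowest and that "Average" is an unweighted mean rounded to one decimal place.

```python
# Scores (in percent) copied from the leaderboard rows, in rank order.
scores = [
    96.7, 94, 93.4, 91.2, 88, 85, 83.3, 82, 79.8, 78, 78, 75, 72,
    72, 72, 58, 58, 56, 52, 48, 42, 35, 32, 28, 18, 12,
]


def summarize(values):
    """Return model count, highest score, mean, and spread.

    Assumption: spread is max minus min, and the mean is unweighted.
    """
    highest = max(values)
    lowest = min(values)
    return {
        "models": len(values),
        "highest": highest,
        "average": round(sum(values) / len(values), 1),
        "spread": round(highest - lowest, 1),
    }


summary = summarize(scores)
print(summary)
```

Under those assumptions the computed values agree with the figures the page reports: 26 models, a highest score of 96.7, an average of 64.6, and a spread of 84.7.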