American Invitational Mathematics Examination
Competition-level math problems from the AIME, the invitational stage of the AMC competition series. Tests advanced multi-step problem-solving; every answer is a single integer from 000 to 999, so no proof is required. Strong differentiator among frontier models.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | o3 | OpenAI | 96.7% |
| 2 | GPT-5.3 Codex | OpenAI | 94% |
| 3 | o4-mini | OpenAI | 93.4% |
| 4 | Gemini 3.1 Pro | Google | 91.2% |
| 5 | GPT-5.2 | OpenAI | 88% |
| 6 | Gemini 3 Pro | Google | 85% |
| 7 | Claude Opus 4.6 | Anthropic | 83.3% |
| 8 | GPT-5.1 | OpenAI | 82% |
| 9 | DeepSeek R1 | DeepSeek | 79.8% |
| 10 | Claude Sonnet 4.6 | Anthropic | 78% |
| 11 | Grok 4.1 | xAI | 78% |
| 12 | Gemini 2.5 Pro | Google | 75% |
| 13 | Claude Opus 4.5 | Anthropic | 72% |
| 14 | Gemini 3 Flash | Google | 72% |
| 15 | Grok 4 | xAI | 72% |
| 16 | Claude Sonnet 4.5 | Anthropic | 58% |
| 17 | Gemini 2.5 Flash | Google | 58% |
| 18 | DeepSeek V3.2 | DeepSeek | 56% |
| 19 | GPT-4.1 | OpenAI | 52% |
| 20 | Llama 4 Maverick | Meta | 48% |
| 21 | Mistral Large 3 | Mistral | 42% |
| 22 | Llama 4 Scout | Meta | 35% |
| 23 | GPT-4.1 mini | OpenAI | 32% |
| 24 | Claude Haiku 4.5 | Anthropic | 28% |
| 25 | Mistral Small 3.2 | Mistral | 18% |
| 26 | GPT-4.1 nano | OpenAI | 12% |
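
The page does not state how responses are graded. Because every AIME answer is an integer from 0 to 999, one plausible approach is exact-match scoring on the final answer. The sketch below illustrates that idea only; the function names `parse_aime_answer` and `aime_accuracy` are hypothetical and are not taken from this leaderboard's evaluation harness.

```python
from typing import Optional, Sequence


def parse_aime_answer(text: str) -> Optional[int]:
    """Return the integer if `text` is a valid AIME answer (0 to 999), else None.

    Assumption: the model's final answer has already been extracted as a
    short string such as "042" or "7".
    """
    s = text.strip()
    # AIME answers are at most three ASCII digits; leading zeros are allowed.
    if not (s.isascii() and s.isdigit()) or len(s) > 3:
        return None
    return int(s)


def aime_accuracy(predictions: Sequence[str], answers: Sequence[int]) -> float:
    """Percentage of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one problem is required")
    # An unparseable prediction yields None, which never equals an int.
    correct = sum(parse_aime_answer(p) == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)
```

Under this scheme, malformed or out-of-range outputs simply count as wrong, which may be stricter or looser than whatever grader produced the scores above.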