Massive Multitask Language Understanding
Tests knowledge across 57 academic subjects spanning STEM, the humanities, and the social sciences, measuring the breadth of a model's world knowledge.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemini 3 Pro | Google | 93.1% |
| 2 | GPT-5.3 Codex | OpenAI | 93% |
| 3 | GPT-5.2 | OpenAI | 92.8% |
| 4 | Gemini 3.1 Pro | Google | 92.6% |
| 5 | Claude Opus 4.6 | Anthropic | 92.5% |
| 6 | GPT-5.1 | OpenAI | 91.5% |
| 7 | Claude Sonnet 4.6 | Anthropic | 91% |
| 8 | DeepSeek R1 | DeepSeek | 90.8% |
| 9 | Claude Opus 4.5 | Anthropic | 90.5% |
| 10 | Grok 4.1 | xAI | 90.2% |
| 11 | o3 | OpenAI | 89.4% |
| 12 | Gemini 2.5 Pro | Google | 89% |
| 13 | Claude Sonnet 4.5 | Anthropic | 88.7% |
| 14 | Gemini 3 Flash | Google | 88% |
| 15 | Grok 4 | xAI | 87.5% |
| 16 | o4-mini | OpenAI | 86.8% |
| 17 | GPT-4.1 | OpenAI | 86% |
| 18 | DeepSeek V3.2 | DeepSeek | 85.7% |
| 19 | Mistral Large 3 | Mistral | 84% |
| 20 | Gemini 2.5 Flash | Google | 83.5% |
| 21 | Llama 4 Maverick | Meta | 82.5% |
| 22 | Claude Haiku 4.5 | Anthropic | 82% |
| 23 | GPT-4.1 mini | OpenAI | 80.5% |
| 24 | Llama 4 Scout | Meta | 78% |
| 25 | Mistral Small 3.2 | Mistral | 72.7% |
| 26 | GPT-4.1 nano | OpenAI | 72% |
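
The page does not say how each score is aggregated. As a rough sketch, assuming a score is plain accuracy on MMLU's four-option multiple-choice questions, the snippet below computes two common variants: accuracy pooled over all questions and the unweighted mean of per-subject accuracies. The function name `mmlu_scores`, the record layout, and the toy data are hypothetical and are not taken from this leaderboard.

```python
from collections import defaultdict


def mmlu_scores(records):
    """Return (pooled_accuracy, per_subject_mean) as percentages.

    `records` is an iterable of (subject, predicted_choice, correct_choice)
    tuples. This layout is an assumption made for illustration only.
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in records:
        per_subject[subject][0] += int(predicted == correct)
        per_subject[subject][1] += 1

    if not per_subject:
        raise ValueError("no records to score")

    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())

    # Pooled accuracy: every question counts equally.
    pooled = 100.0 * total_correct / total
    # Per-subject mean: every subject counts equally, whatever its size.
    subject_mean = 100.0 * sum(c / n for c, n in per_subject.values()) / len(per_subject)
    return pooled, subject_mean


# Toy data: two subjects, six questions (made up for this example).
toy = [
    ("astronomy", "A", "A"),
    ("astronomy", "B", "C"),
    ("astronomy", "D", "D"),
    ("astronomy", "C", "C"),
    ("philosophy", "B", "B"),
    ("philosophy", "A", "D"),
]
pooled, subject_mean = mmlu_scores(toy)
print(f"pooled: {pooled:.1f}%  per-subject mean: {subject_mean:.1f}%")
```

The two numbers differ whenever subjects have unequal question counts, so scores from different sources are only comparable if they use the same aggregation and prompting setup.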