Graduate-Level Google-Proof Q&A
Expert-crafted questions in biology, physics, and chemistry that are difficult even for domain experts with internet access.
| # | Model | Developer | Score |
|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google | 94.3% |
| 2 | Gemini 3 Pro | Google | 91.9% |
| 3 | Claude Opus 4.6 | Anthropic | 91.3% |
| 4 | GPT-5.2 | OpenAI | 90.3% |
| 5 | GPT-5.1 | OpenAI | 88.1% |
| 6 | Claude Sonnet 4.6 | Anthropic | 88% |
| 7 | Grok 4.1 | xAI | 88% |
| 8 | Grok 4 | xAI | 87.5% |
| 9 | Claude Opus 4.5 | Anthropic | 87% |
| 10 | Claude Sonnet 4.5 | Anthropic | 83.4% |
| 11 | GPT-5.3 Codex | OpenAI | 81% |
| 12 | DeepSeek R1 | DeepSeek | 81% |
| 13 | Gemini 3 Flash | Google | 80% |
| 14 | DeepSeek V3.2 | DeepSeek | 79.9% |
| 15 | o3 | OpenAI | 79.7% |
| 16 | Gemini 2.5 Pro | Google | 74% |
| 17 | o4-mini | OpenAI | 73.4% |
| 18 | Llama 4 Maverick | Meta | 69.8% |
| 19 | GPT-4.1 | OpenAI | 66.3% |
| 20 | Gemini 2.5 Flash | Google | 65.8% |
| 21 | Llama 4 Scout | Meta | 57.2% |
| 22 | Claude Haiku 4.5 | Anthropic | 55% |
| 23 | GPT-4.1 mini | OpenAI | 52.1% |
| 24 | Mistral Small 3.2 | Mistral | 46.1% |
| 25 | Mistral Large 3 | Mistral | 43.9% |