Model Horizon
© 2026 Model Horizon

SimpleQA Leaderboard

SimpleQA Factuality Benchmark

SimpleQA consists of short-form factual questions, each with a single verifiable answer. It measures factual accuracy and resistance to hallucination. Low scores are common, even for frontier models.
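The page does not state exactly how its Score column is computed. As originally released, SimpleQA grades each answer as correct, incorrect, or not attempted, so an abstention is recorded separately from a wrong (hallucinated) answer. The sketch below assumes that three-way grading and shows how the usual summary metrics follow from the grade counts. The names `GradeCounts`, `overall_correct`, `correct_given_attempted` and `f_score` are hypothetical, and the example counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GradeCounts:
    """Per-model grade counts, assuming SimpleQA-style three-way grading (hypothetical type)."""
    correct: int
    incorrect: int
    not_attempted: int

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.not_attempted


def overall_correct(c: GradeCounts) -> float:
    """Fraction of all questions answered correctly."""
    return c.correct / c.total if c.total else 0.0


def correct_given_attempted(c: GradeCounts) -> float:
    """Fraction correct among the questions the model actually tried to answer."""
    attempted = c.correct + c.incorrect
    return c.correct / attempted if attempted else 0.0


def f_score(c: GradeCounts) -> float:
    """Harmonic mean of overall_correct and correct_given_attempted."""
    a, b = overall_correct(c), correct_given_attempted(c)
    return 2 * a * b / (a + b) if (a + b) else 0.0


# Hypothetical example: both models answer 40 of 100 questions correctly,
# but one abstains on 40 questions instead of guessing wrong.
guesser = GradeCounts(correct=40, incorrect=60, not_attempted=0)
abstainer = GradeCounts(correct=40, incorrect=20, not_attempted=40)

for name, counts in [("guesser", guesser), ("abstainer", abstainer)]:
    print(
        name,
        round(overall_correct(counts), 3),
        round(correct_given_attempted(counts), 3),
        round(f_score(counts), 3),
    )
```

Both models have the same overall-correct rate (0.4), but the one that abstains rather than guessing has a higher correct-given-attempted rate and F-score. This is the sense in which the benchmark rewards resistance to hallucination.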

Models tested: 26
Highest score: 79.6%
Average score: 37%
Spread (highest minus lowest score): 65.4 percentage points

| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google | 79.6% |
| 2 | GPT-5.3 Codex | OpenAI | 58% |
| 3 | GPT-5.2 | OpenAI | 52.5% |
| 4 | Gemini 3 Pro | Google | 49% |
| 5 | GPT-5.1 | OpenAI | 48% |
| 6 | o3 | OpenAI | 47.9% |
| 7 | Claude Opus 4.6 | Anthropic | 43.2% |
| 8 | GPT-4.1 | OpenAI | 42.8% |
| 9 | Gemini 2.5 Pro | Google | 41.5% |
| 10 | o4-mini | OpenAI | 40.3% |
| 11 | Claude Sonnet 4.6 | Anthropic | 39.5% |
| 12 | Grok 4.1 | xAI | 38% |
| 13 | Claude Opus 4.5 | Anthropic | 36% |
| 14 | Gemini 3 Flash | Google | 36% |
| 15 | Grok 4 | xAI | 34.2% |
| 16 | DeepSeek V3.2 | DeepSeek | 33% |
| 17 | DeepSeek R1 | DeepSeek | 31.4% |
| 18 | Claude Sonnet 4.5 | Anthropic | 30.8% |
| 19 | Mistral Large 3 | Mistral | 29% |
| 20 | Gemini 2.5 Flash | Google | 28.3% |
| 21 | Llama 4 Maverick | Meta | 27.5% |
| 22 | GPT-4.1 mini | OpenAI | 26.5% |
| 23 | Llama 4 Scout | Meta | 21% |
| 24 | Claude Haiku 4.5 | Anthropic | 19% |
| 25 | Mistral Small 3.2 | Mistral | 15.5% |
| 26 | GPT-4.1 nano | OpenAI | 14.2% |
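The summary figures at the top of the page can be reproduced from the Score column of the leaderboard. The sketch below recomputes them. It assumes that "Spread" means the range (highest minus lowest score), and it uses a hypothetical helper name, `summarize`.

```python
from statistics import mean

# Scores (%) copied from the leaderboard's Score column, in rank order.
SCORES = [
    79.6, 58.0, 52.5, 49.0, 48.0, 47.9, 43.2, 42.8, 41.5, 40.3,
    39.5, 38.0, 36.0, 36.0, 34.2, 33.0, 31.4, 30.8, 29.0, 28.3,
    27.5, 26.5, 21.0, 19.0, 15.5, 14.2,
]


def summarize(scores: list[float]) -> dict[str, float]:
    """Recompute the page's headline figures from the per-model scores (hypothetical helper)."""
    return {
        "models_tested": len(scores),
        "highest": max(scores),
        # Mean of all scores, rounded to one decimal place.
        "average": round(mean(scores), 1),
        # Assumption: "Spread" is the range, i.e. highest minus lowest score.
        "spread": round(max(scores) - min(scores), 1),
    }


print(summarize(SCORES))
# -> {'models_tested': 26, 'highest': 79.6, 'average': 37.0, 'spread': 65.4}
```

The recomputed average (about 37.0%) and range (65.4 points) match the page's "37%" and "65.4%" figures, which supports reading "Spread" as a range in percentage points.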