Leaderboard

| Rank | Model | Model ID | Release | Config | Images | Elo | Votes | Record | Win rate | Avg turns |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | anthropic/claude-opus-4.8 | 2026-05-27 | verbosity: max | Yes | 1322 | 9 | 9-0 | 100% | 66.0 |
| 2 | GPT-5.5 | openai/gpt-5.5 | 2026-04-24 | reasoning: xhigh | Yes | 1265 | 11 | 10-1 | 91% | 6.0 |
| 3 | GPT-5.4 Pro | openai/gpt-5.4-pro | 2026-03-05 | reasoning: xhigh | Yes | 1264 | 12 | 8-4 | 67% | 10.0 |
| 4 | Gemini 3.5 Flash | google/gemini-3.5-flash | 2026-05-19 | reasoning: high | Yes | 1263 | 8 | 6-2 | 75% | 21.0 |
| 5 | GPT-5.4 | openai/gpt-5.4 | 2026-03-05 | reasoning: xhigh | Yes | 1259 | 12 | 8-4 | 67% | 10.0 |
| 6 | GLM 5.2 | z-ai/glm-5.2 | 2026-06-16 | reasoning: xhigh | No | 1256 | 12 | 8-4 | 67% | 40.0 |
| 7 | Qwen3.7 Plus | qwen/qwen3.7-plus | 2026-06-03 | custom | Yes | 1231 | 10 | 6-4 | 60% | 7.0 |
| 8 | Kimi K2.7 Code | moonshotai/kimi-k2.7-code | 2026-06-12 | custom | Yes | 1215 | 11 | 6-5 | 55% | 13.0 |
| 9 | Qwen3.7 Max | qwen/qwen3.7-max | 2026-05-21 | custom | No | 1214 | 3 | 2-1 | 67% | 8.0 |
| 10 | GPT-5.5 Pro | openai/gpt-5.5-pro | 2026-04-24 | reasoning: xhigh | Yes | 1213 | 17 | 9-8 | 53% | 35.0 |
| 11 | Grok 4.3 | x-ai/grok-4.3 | 2026-04-30 | reasoning: high | Yes | 1202 | 10 | 5-5 | 50% | 9.0 |
| 12 | DeepSeek V4 Pro | deepseek/deepseek-v4-pro | 2026-04-24 | reasoning: xhigh | No | 1200 | 10 | 5-5 | 50% | 18.0 |
| 13 | Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | 2026-02-19 | reasoning: high | Yes | 1184 | 11 | 5-6 | 45% | 4.0 |
| 14 | Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | 2025-10-15 | custom | Yes | 1153 | 23 | 8-15 | 35% | 4.0 |
| 15 | Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | 2026-02-17 | verbosity: max | Yes | 1153 | 3 | 0-3 | 0% | 11.0 |
| 16 | DeepSeek V4 Flash | deepseek/deepseek-v4-flash | 2026-04-24 | reasoning: xhigh | No | 1152 | 9 | 3-6 | 33% | 8.0 |
| 17 | Claude Opus 4.5 | anthropic/claude-opus-4.5 | 2025-11-24 | verbosity: max | Yes | 1135 | 10 | 3-7 | 30% | 8.0 |
| 18 | GPT OSS 120B | openai/gpt-oss-120b | 2025-08-05 | reasoning: high | No | 1123 | 12 | 3-9 | 25% | 9.0 |
| 19 | GPT OSS 20B | openai/gpt-oss-20b | 2025-08-05 | reasoning: high | No | 1093 | 10 | 1-9 | 10% | 7.0 |
| 20 | Mistral Medium 3.5 | mistralai/mistral-medium-3-5 | 2026-04-30 | reasoning: high | Yes | 1086 | 9 | 1-8 | 11% | 25.0 |
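
The Win rate column appears to be the Record column expressed as wins divided by total votes, rounded to a whole percent, and the Elo gaps can be read as head-to-head win expectations. The sketch below illustrates both. It is not the leaderboard's own code: the function names `win_rate` and `expected_score` are hypothetical, and the 400-point logistic scale is the conventional Elo choice, assumed here because the page does not state its rating parameters.

```python
def win_rate(record: str) -> int:
    """Return the win rate as a whole-number percentage for a 'W-L' record.

    Assumes the record is formatted like '8-15' (wins, then losses),
    as in the Record column of the table.
    """
    wins, losses = (int(part) for part in record.split("-"))
    total = wins + losses
    if total == 0:
        return 0  # no votes yet, so report 0 rather than divide by zero
    return round(100 * wins / total)


def expected_score(elo_a: float, elo_b: float) -> float:
    """Probability that A beats B under the conventional Elo model.

    ASSUMPTION: standard logistic curve with a 400-point scale; the
    leaderboard may use different parameters.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


if __name__ == "__main__":
    # Records taken from the table rows for ranks 2 and 14.
    print(win_rate("10-1"))   # 91
    print(win_rate("8-15"))   # 35
    # Rank 1 (Elo 1322) against rank 2 (Elo 1265).
    print(round(expected_score(1322, 1265), 2))
```

With vote counts this small (3 to 23 per model), both the win rates and the Elo ordering are likely to move noticeably as more votes arrive, so close Elo gaps should be read loosely.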