Model | Source | Date | Average | RBench-T | RBench-T (zh) |
OpenAI o1 🥇 | Link | 2024-12-17 | 69.6 | 69.0 | 70.1 |
Gemini2.0-Flash-Thinking 🥈 | Link | 2025-01-21 | 68.0 | 68.4 | 67.5 |
Doubao1.5Pro 🥉 | Link | 2025-01-21 | 62.7 | 62.0 | 63.4 |
o1-Preview | Link | 2024-09-12 | 62.5 | 62.3 | 62.6 |
o1-mini | Link | 2024-09-12 | 62.0 | 64.0 | 59.9 |
Doubao-Pro | Link | 2024-12-15 | 60.8 | 60.7 | 60.8 |
DeepSeek-R1 | Link | 2025-01-22 | 60.3 | 61.2 | 59.3 |
DeepSeek-V3 | Link | 2024-12-26 | 58.1 | 59.6 | 56.6 |
Claude3.5-sonnet | Link | 2024-06-20 | 57.4 | 57.5 | 57.3 |
MiniMax-Text-01 | Link | 2025-01-14 | 53.7 | 53.8 | 53.6 |
Qwen2.5-72B | Link | 2024-09-19 | 52.9 | 53.7 | 52.0 |
GPT-4o | Link | 2024-11-20 | 52.6 | 53.6 | 51.6 |
GLM-Zero-Preview | Link | 2024-12-31 | 51.1 | 53.6 | 48.6 |
Qwen2.5-32B | Link | 2024-09-19 | 50.4 | 50.8 | 49.9 |
Qwen2.5-7B | Link | 2024-09-19 | 44.1 | 43.6 | 44.5 |
Llama-3.1-8B | Link | 2024-07-23 | 24.9 | 26.1 | 23.6 |
All values in the table are Top-1 accuracy, in percent (%).
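As a rough illustration of how scores like these can be computed, the sketch below calculates Top-1 accuracy as the percentage of questions where the model's single answer matches the reference. It also treats the Average column as the unweighted mean of the two benchmark columns. That is an assumption: the source does not state it, though the listed rows agree with it to within rounding. The function names `top1_accuracy` and `average_score` and the toy data are hypothetical and do not come from the benchmark's own evaluation code.

```python
def top1_accuracy(predictions, answers):
    """Return the percentage of questions whose single predicted answer
    exactly matches the reference answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def average_score(score_a, score_b):
    """Unweighted mean of two benchmark scores (assumed definition of 'Average')."""
    return (score_a + score_b) / 2


# Hypothetical toy data: four multiple-choice questions, three answered correctly.
preds = ["A", "C", "B", "D"]
refs = ["A", "B", "B", "D"]
print(top1_accuracy(preds, refs))  # 75.0

# Doubao1.5Pro row: 62.0 and 63.4 average to 62.7 (up to floating-point error).
print(round(average_score(62.0, 63.4), 1))
```

Published averages may be computed from unrounded per-benchmark scores, so recomputing them from the rounded table values can differ in the last digit.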