Data source: the official SWE-bench Bash-Only leaderboard (mini-SWE-agent v2.0.0, 500 instances, single attempt), retrieved February 2026. LMSYS Chatbot Arena and other leaderboards were temporarily unavailable due to network restrictions and are not included.

| Rank | Model | Vendor | SWE-bench (% resolved) | Type |
|---|---|---|---|---|
| 🥇 | Claude 4.5 Opus | Anthropic | 76.8% | Closed |
| 🥈 | Gemini 3 Flash | Google DeepMind | 75.8% | Closed |
| 🥉 | MiniMax M2.5 | MiniMax | 75.8% | Closed |
| 4 | Claude Opus 4.6 | Anthropic | 75.6% | Closed |
| 5 | Claude 4.5 Opus (medium) | Anthropic | 74.4% | Closed |
| 6 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed |
| 7 | GLM-5 | Z-AI | 72.8% | Closed |
| 8 | GPT-5.2 | OpenAI | 72.8% | Closed |
| 9 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed |
| 10 | Kimi K2.5 | Moonshot AI | 70.8% | Closed |
| 11 | DeepSeek V3.2 | DeepSeek | 70.0% | Open |
| 12 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed |
| 13 | Claude 4 Opus | Anthropic | 67.6% | Closed |
| 14 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed |
| 15 | GPT-5.1 | OpenAI | 66.0% | Closed |
| 16 | GPT-5 | OpenAI | 65.0% | Closed |
| 17 | Claude 4 Sonnet | Anthropic | 64.9% | Closed |
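
To make the percentages easier to interpret, the following minimal sketch converts a score into the number of resolved instances it implies. It assumes the 500-instance total from the data-source note applies to every row; the `scores` sample and the `resolved_count` helper are illustrative names, not part of any SWE-bench tooling.

```python
from fractions import Fraction

# Total taken from the data-source note; assumed to apply to every row.
TOTAL_INSTANCES = 500

# A few rows copied from the table (scores kept as strings for exact arithmetic).
scores = {
    "Claude 4.5 Opus": "76.8",
    "Gemini 3 Flash": "75.8",
    "MiniMax M2.5": "75.8",
    "DeepSeek V3.2": "70.0",
    "Claude 4 Sonnet": "64.9",
}


def resolved_count(percent: str, total: int = TOTAL_INSTANCES) -> Fraction:
    """Return the exact number of instances implied by a percentage string."""
    return Fraction(percent) * total / 100


for model, pct in scores.items():
    count = resolved_count(pct)
    # A non-integer count suggests the published percentage was rounded.
    note = "" if count.denominator == 1 else " (not a whole number; score is presumably rounded)"
    print(f"{model}: {pct}% of {TOTAL_INSTANCES} = {float(count):g}{note}")
```

Under this assumption one instance is worth 0.2 percentage points, so 76.8% corresponds to 384 resolved instances and the 1.0-point gap between first and second place is five instances. A score such as 64.9% maps to 324.5, which cannot be an exact count, so small differences between adjacent rows should be read with that rounding in mind.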