| | claude-sonnet-4.6 | gemini-2.5-flash | gpt-4.1 | qwen3-235b |
|---|---|---|---|---|
| claude-sonnet-4.6 | - | 93% diff 0.44 | 96% diff 0.40 | 97% diff 0.38 |
| gemini-2.5-flash | 93% diff 0.44 | - | 94% diff 0.36 | 94% diff 0.35 |
| gpt-4.1 | 96% diff 0.40 | 94% diff 0.36 | - | 99% diff 0.22 |
| qwen3-235b | 97% diff 0.38 | 94% diff 0.35 | 99% diff 0.22 | - |

| Prompt | Model | Category | claude-sonnet-4.6 | gemini-2.5-flash | gpt-4.1 | qwen3-235b | Spread |
|---|---|---|---|---|---|---|---|
| I01 | claude-opus-4.5 | instruction_following | 1/5 | 4/5 | 5/5 | 2/5 | 4 |
| I01 | claude-opus-4.6 | instruction_following | 1/5 | 5/5 | 2/5 | 2/5 | 4 |
| C01 | glm-4.7-flash | coding | 1/5 | 5/5 | 2/5 | 2/5 | 4 |
| I06 | glm-4.7-flash | instruction_following | 1/5 | 5/5 | 2/5 | 1/5 | 4 |
| I01 | gpt-4.1-mini | instruction_following | 1/5 | 5/5 | 3/5 | 2/5 | 4 |
| B08 | mistral-large-3 | behavioural | 1/5 | 5/5 | 2/5 | 2/5 | 4 |
| B08 | nova-micro | behavioural | 1/5 | 5/5 | 3/5 | 4/5 | 4 |
| I01 | claude-haiku-3 | instruction_following | 3/5 | 2/5 | 4/5 | 5/5 | 3 |
| R03 | claude-haiku-3 | reasoning | 5/5 | 2/5 | 5/5 | 5/5 | 3 |
| R10 | claude-opus-4.5 | reasoning | 2/5 | 5/5 | 3/5 | 2/5 | 3 |
| I01 | claude-opus-4 | instruction_following | 1/5 | 4/5 | 2/5 | 2/5 | 3 |
| C06 | claude-sonnet-3.7 | coding | 4/5 | 2/5 | 5/5 | 5/5 | 3 |
| I01 | claude-sonnet-3.7 | instruction_following | 1/5 | 4/5 | 4/5 | 2/5 | 3 |
| I06 | claude-sonnet-3.7 | instruction_following | 5/5 | 2/5 | 5/5 | 5/5 | 3 |
| I01 | claude-sonnet-4.5 | instruction_following | 1/5 | 3/5 | 4/5 | 2/5 | 3 |
| I05 | claude-sonnet-4.5 | instruction_following | 4/5 | 2/5 | 4/5 | 5/5 | 3 |
| I06 | claude-sonnet-4.5 | instruction_following | 5/5 | 2/5 | 5/5 | 5/5 | 3 |
| R05 | claude-sonnet-4.5 | reasoning | 4/5 | 2/5 | 5/5 | 5/5 | 3 |
| I05 | claude-sonnet-4.6 | instruction_following | - | 2/5 | 3/5 | 5/5 | 3 |
| I01 | claude-sonnet-4 | instruction_following | 2/5 | 5/5 | 4/5 | 2/5 | 3 |