| Run Label | Run ID | Scenarios | Wins | Unstable | Total Turns | Avg Score | Started |
|---|---|---|---|---|---|---|---|
| epoch_cohere_deepseek | 046b0253 | 1 | 0 | 0 | 1 | 1.00 | 05-30 16:25 |
| - | e7ece5c0 | 1 | 1 | 0 | 8 | 1.00 | 05-30 09:54 |
| FAIR_MATRIX_epoch_001_perturbation_standard_relay_chamber_5c_gemini_zai | 7974f123 | 8 | 208 | 9 | 2200 | 0.90 | 05-30 09:53 |
| fair_matrix_epoch_001_normal_signal_room_v1_anthropic_zai | e63d6813 | 1 | 6 | 0 | 70 | 0.94 | 05-30 09:40 |
| comparison_gemini_cohere | 8052ef50 | 5 | 5 | 0 | 57 | 0.87 | 05-29 20:21 |
| comparison_grok_zai | 2efc8fb6 | 5 | 5 | 0 | 41 | 0.94 | 05-29 20:17 |
| comparison_gpt_deepseek | eeaf3b28 | 5 | 5 | 0 | 46 | 0.93 | 05-29 20:11 |
| comparison_claude_mistral | 2c01bd7d | 5 | 4 | 1 | 51 | 0.86 | 05-29 20:05 |
| epoch_cohere_deepseek | 22d2628b | 1 | 0 | 0 | 3 | 1.00 | 05-29 19:54 |
| - | 4af94623 | 4 | 4 | 0 | 32 | 0.97 | 05-29 19:36 |
| - | 5e0f6aee | 1 | 0 | 0 | 0 | 0.00 | 05-29 19:22 |
| - | 0e05ca2c | 1 | 0 | 0 | 0 | 0.00 | 05-29 19:20 |
| - | 55f18d1a | 1 | 0 | 0 | 0 | 0.00 | 05-29 19:15 |
| - | 96c6e641 | 1 | 0 | 0 | 1 | 1.00 | 05-29 19:13 |
| - | e67d6c75 | 4 | 4 | 0 | 36 | 0.83 | 05-29 19:07 |
| - | f3901c2c | 4 | 4 | 0 | 24 | 1.00 | 05-29 18:59 |
| epoch_cohere_mistral | 22c07706 | 1 | 0 | 0 | 1 | 1.00 | 05-29 08:37 |
| epoch_cohere_mistral | fbbb6542 | 1 | 1 | 0 | 14 | 0.77 | 05-29 08:35 |
| epoch_cohere_mistral | 9f2a94bc | 1 | 1 | 0 | 10 | 0.74 | 05-29 08:34 |
| epoch_cohere_mistral | b4dfb05e | 1 | 1 | 0 | 10 | 0.74 | 05-29 08:32 |
| epoch_cohere_mistral | 4b5f1575 | 1 | 1 | 0 | 20 | 0.48 | 05-29 08:30 |
| epoch_cohere_mistral | 8a4dc955 | 1 | 1 | 0 | 10 | 0.74 | 05-29 08:28 |
| epoch_cohere_mistral | a8a58996 | 1 | 1 | 0 | 10 | 0.74 | 05-29 08:27 |
| epoch_cohere_mistral | 05b9ba96 | 1 | 1 | 0 | 10 | 0.81 | 05-29 08:26 |
| epoch_cohere_mistral | 54124a8a | 1 | 1 | 0 | 10 | 0.74 | 05-29 08:24 |
| epoch_cohere_deepseek | cbab34dd | 1 | 1 | 0 | 18 | 0.68 | 05-29 08:23 |
| epoch_grok_deepseek | 797d56e8 | 1 | 0 | 0 | 0 | 0.00 | 05-28 21:00 |
| epoch_gemini_deepseek | 081e643c | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:59 |
| epoch_gemini_deepseek | 92af76d1 | 1 | 1 | 0 | 16 | 0.70 | 05-28 20:58 |
| epoch_gemini_deepseek | d5bc2533 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:57 |
| epoch_gemini_deepseek | 036ccd08 | 1 | 1 | 0 | 16 | 0.66 | 05-28 20:55 |
| epoch_gemini_deepseek | b191889f | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:54 |
| epoch_gemini_deepseek | ca035fa3 | 1 | 1 | 0 | 20 | 0.72 | 05-28 20:52 |
| epoch_gemini_deepseek | 240d6201 | 1 | 1 | 0 | 16 | 0.68 | 05-28 20:50 |
| epoch_gemini_deepseek | 3b769242 | 1 | 0 | 1 | 24 | 0.53 | 05-28 20:48 |
| epoch_gemini_deepseek | eae08873 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:48 |
| epoch_gemini_deepseek | f2cc183d | 1 | 1 | 0 | 12 | 0.84 | 05-28 20:47 |
| epoch_gemini_deepseek | ad228094 | 1 | 1 | 0 | 14 | 0.76 | 05-28 20:45 |
| epoch_gemini_deepseek | d2f54c19 | 1 | 1 | 0 | 18 | 0.66 | 05-28 20:44 |
| epoch_gemini_deepseek | 019761c2 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:43 |
| epoch_gemini_deepseek | fb6f7db4 | 1 | 1 | 0 | 10 | 0.75 | 05-28 20:42 |
| epoch_gemini_deepseek | e133c8de | 1 | 1 | 0 | 13 | 0.77 | 05-28 20:41 |
| epoch_gemini_deepseek | 58ac31f1 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:40 |
| epoch_gemini_deepseek | cbef1a8c | 1 | 1 | 0 | 16 | 0.68 | 05-28 20:38 |
| epoch_gemini_deepseek | 2f7f1c18 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:38 |
| epoch_gemini_deepseek | 3eec062f | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:37 |
| epoch_gemini_deepseek | 825725ec | 1 | 1 | 0 | 12 | 0.73 | 05-28 20:36 |
| epoch_gemini_deepseek | 7129e914 | 1 | 1 | 0 | 14 | 0.70 | 05-28 20:34 |
| epoch_gemini_deepseek | 96093854 | 1 | 1 | 0 | 11 | 0.78 | 05-28 20:33 |
| epoch_gemini_deepseek | 198adf1b | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:32 |
| epoch_gemini_deepseek | 2e2716ba | 1 | 1 | 0 | 10 | 0.81 | 05-28 20:32 |
| epoch_gemini_deepseek | 1d61b8c2 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:31 |
| epoch_gemini_deepseek | 7eea2821 | 1 | 1 | 0 | 10 | 0.81 | 05-28 20:30 |
| epoch_gemini_deepseek | 813027b7 | 1 | 1 | 0 | 12 | 0.73 | 05-28 20:29 |
| epoch_gemini_deepseek | 7bef6cb4 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:28 |
| epoch_gemini_deepseek | dbcea488 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:27 |
| epoch_gemini_deepseek | 58016d69 | 1 | 1 | 0 | 13 | 0.77 | 05-28 20:26 |
| epoch_gemini_deepseek | e964000f | 1 | 1 | 0 | 16 | 0.68 | 05-28 20:24 |
| epoch_gemini_deepseek | fc7d23e3 | 1 | 1 | 0 | 10 | 0.81 | 05-28 20:23 |
| epoch_gemini_deepseek | e41a0ff9 | 1 | 1 | 0 | 10 | 0.81 | 05-28 20:23 |
| epoch_gemini_deepseek | fca74501 | 1 | 1 | 0 | 16 | 0.68 | 05-28 20:21 |
| epoch_gemini_deepseek | 6aa592d2 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:21 |
| epoch_deepseek_grok | 65d241f5 | 1 | 1 | 0 | 10 | 0.75 | 05-28 20:20 |
| epoch_deepseek_cohere | df1b0d33 | 1 | 1 | 0 | 22 | 0.76 | 05-28 20:15 |
| epoch_deepseek_cohere | f4e79c6b | 1 | 1 | 0 | 14 | 0.73 | 05-28 20:14 |
| epoch_deepseek_cohere | 6d958adb | 1 | 1 | 0 | 19 | 0.76 | 05-28 20:10 |
| epoch_deepseek_cohere | 8ee9f8fa | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:09 |
| epoch_deepseek_cohere | 2d2751d2 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:08 |
| epoch_deepseek_cohere | 5f98d552 | 1 | 1 | 0 | 14 | 0.80 | 05-28 20:07 |
| epoch_deepseek_cohere | d7cef8f6 | 1 | 1 | 0 | 12 | 0.78 | 05-28 20:06 |
| epoch_cohere_mistral | 63ab75b7 | 1 | 0 | 1 | 28 | 0.51 | 05-28 20:02 |
| epoch_cohere_mistral | 52e0fc5a | 1 | 1 | 0 | 16 | 0.84 | 05-28 19:59 |
| epoch_cohere_mistral | 7e1860de | 1 | 1 | 0 | 14 | 0.77 | 05-28 19:57 |
| epoch_cohere_mistral | ebe14f06 | 1 | 1 | 0 | 14 | 0.82 | 05-28 19:54 |
| epoch_cohere_mistral | 0fe0b1c8 | 1 | 1 | 0 | 14 | 0.57 | 05-28 19:52 |
| epoch_cohere_mistral | 18c8ea4e | 1 | 1 | 0 | 14 | 0.82 | 05-28 19:50 |
| epoch_cohere_mistral | 1d4644d4 | 1 | 1 | 0 | 10 | 0.75 | 05-28 19:49 |
| epoch_cohere_mistral | 2c3f60c7 | 1 | 1 | 0 | 20 | 0.55 | 05-28 19:46 |
| epoch_cohere_mistral | 674cb1a7 | 1 | 1 | 0 | 14 | 0.57 | 05-28 19:44 |
| epoch_cohere_mistral | 9cb1610a | 1 | 1 | 0 | 14 | 0.77 | 05-28 19:42 |
| epoch_cohere_mistral | 52b966d8 | 1 | 1 | 0 | 18 | 0.50 | 05-28 19:40 |
| epoch_cohere_mistral | eae7e83c | 1 | 1 | 0 | 16 | 0.57 | 05-28 19:38 |
| epoch_cohere_mistral | c0b2f935 | 1 | 1 | 0 | 14 | 0.77 | 05-28 19:37 |
| epoch_cohere_mistral | 5470370d | 1 | 0 | 1 | 27 | 0.33 | 05-28 19:33 |
| epoch_cohere_mistral | e7285cd9 | 1 | 1 | 0 | 10 | 0.75 | 05-28 19:32 |
| epoch_cohere_mistral | 0bae673e | 1 | 0 | 1 | 17 | 0.44 | 05-28 19:29 |
| epoch_cohere_mistral | 22a39a5e | 1 | 1 | 0 | 14 | 0.81 | 05-28 19:28 |
| epoch_cohere_mistral | afaa514d | 1 | 1 | 0 | 10 | 0.75 | 05-28 19:26 |
| epoch_cohere_mistral | e5a11a9c | 1 | 1 | 0 | 10 | 0.75 | 05-28 19:25 |
| epoch_cohere_mistral | 65e32232 | 1 | 1 | 0 | 16 | 0.52 | 05-28 19:23 |
| epoch_cohere_mistral | 507eb857 | 1 | 1 | 0 | 10 | 0.75 | 05-28 19:21 |
| epoch_cohere_mistral | 3ceea561 | 1 | 1 | 0 | 12 | 0.72 | 05-28 19:19 |
| epoch_cohere_mistral | 53c4b995 | 1 | 1 | 0 | 14 | 0.77 | 05-28 19:18 |
| epoch_cohere_mistral | b26e96cf | 1 | 1 | 0 | 18 | 0.81 | 05-28 19:15 |
| epoch_cohere_mistral | 37ab60f8 | 1 | 1 | 0 | 10 | 0.75 | 05-28 19:14 |
| epoch_cohere_mistral | d6aee988 | 1 | 1 | 0 | 18 | 0.48 | 05-28 19:12 |
| epoch_cohere_mistral | 07b5c403 | 1 | 1 | 0 | 14 | 0.82 | 05-28 19:10 |
| epoch_cohere_mistral | 85532b11 | 1 | 1 | 0 | 20 | 0.68 | 05-28 19:07 |
| epoch_cohere_mistral | aaa17628 | 1 | 1 | 0 | 19 | 0.81 | 05-28 19:03 |
| epoch_cohere_mistral | 92858264 | 1 | 1 | 0 | 28 | 0.40 | 05-28 18:58 |
| epoch_cohere_mistral | e8c5442d | 1 | 1 | 0 | 13 | 0.77 | 05-28 18:54 |
| epoch_cohere_mistral | 3ca45fa3 | 1 | 1 | 0 | 14 | 0.81 | 05-28 18:53 |
| epoch_cohere_mistral | 362e82cf | 1 | 1 | 0 | 10 | 0.74 | 05-28 18:52 |
| epoch_cohere_deepseek | 8b4cb5b2 | 1 | 1 | 0 | 12 | 0.73 | 05-28 18:51 |
| epoch_mistral_grok | 5bf0da81 | 1 | 1 | 0 | 14 | 0.79 | 05-28 18:34 |
| epoch_mistral_deepseek | 4a5de436 | 1 | 1 | 0 | 16 | 0.84 | 05-28 18:32 |
| epoch_mistral_deepseek | 4d0f2123 | 1 | 1 | 0 | 10 | 0.94 | 05-28 18:30 |
| epoch_mistral_cohere | d321ea10 | 1 | 1 | 0 | 12 | 0.94 | 05-28 18:28 |
| epoch_mistral_cohere | d8d2cd81 | 1 | 1 | 0 | 12 | 0.94 | 05-28 18:27 |
| epoch_mistral_cohere | d30a97be | 1 | 1 | 0 | 12 | 0.94 | 05-28 18:25 |
| epoch_mistral_cohere | 50e277b2 | 1 | 1 | 0 | 12 | 0.94 | 05-28 18:23 |
| epoch_mistral_cohere | 598ac173 | 1 | 1 | 0 | 12 | 0.94 | 05-28 18:21 |
| epoch_grok_gemini | 3ef5f0a0 | 1 | 1 | 0 | 8 | 0.91 | 05-28 18:20 |
| epoch_grok_gemini | 0db16168 | 1 | 1 | 0 | 14 | 0.82 | 05-28 18:20 |
| epoch_gemini_deepseek | b632c521 | 1 | 1 | 0 | 10 | 1.00 | 05-28 18:19 |
| epoch_gemini_deepseek | 39a24cab | 1 | 1 | 0 | 10 | 0.88 | 05-28 18:18 |
| epoch_gemini_deepseek | dc116a62 | 1 | 1 | 0 | 10 | 0.94 | 05-28 18:17 |
| epoch_deepseek_cohere | 15fad396 | 1 | 1 | 0 | 12 | 0.95 | 05-28 18:16 |
| epoch_deepseek_cohere | a197fdb5 | 1 | 0 | 1 | 31 | 0.91 | 05-28 18:11 |
| epoch_cohere_mistral | 2159c0a8 | 1 | 1 | 0 | 12 | 0.85 | 05-28 18:10 |
| epoch_cohere_mistral | 7a15ae13 | 1 | 1 | 0 | 14 | 0.86 | 05-28 18:08 |
| epoch_cohere_mistral | eaef1895 | 1 | 1 | 0 | 12 | 0.85 | 05-28 18:06 |
| epoch_cohere_mistral | d8c2fd37 | 1 | 1 | 0 | 12 | 0.85 | 05-28 18:04 |
| epoch_mistral_grok | a62cc66b | 1 | 1 | 0 | 12 | 0.84 | 05-28 17:59 |
| epoch_mistral_deepseek | c4c4c5c0 | 1 | 1 | 0 | 10 | 0.88 | 05-28 17:58 |
| epoch_mistral_deepseek | 781b189c | 1 | 0 | 1 | 28 | 0.72 | 05-28 17:54 |
| epoch_mistral_cohere | 9db80718 | 1 | 1 | 0 | 16 | 0.86 | 05-28 17:52 |
| epoch_mistral_cohere | c7bf358e | 1 | 1 | 0 | 12 | 0.94 | 05-28 17:50 |
| epoch_mistral_cohere | 7938228e | 1 | 1 | 0 | 12 | 0.94 | 05-28 17:48 |
| epoch_mistral_cohere | 632c504b | 1 | 1 | 0 | 12 | 0.94 | 05-28 17:47 |
| epoch_mistral_cohere | bce7f5e9 | 1 | 1 | 0 | 12 | 0.94 | 05-28 17:45 |
| epoch_grok_gemini | a62b43c0 | 1 | 1 | 0 | 14 | 0.72 | 05-28 17:44 |
| epoch_grok_gemini | 625ceac0 | 1 | 1 | 0 | 8 | 0.91 | 05-28 17:44 |
| epoch_gemini_deepseek | 8cb33f8d | 1 | 1 | 0 | 10 | 0.88 | 05-28 17:43 |
| epoch_gemini_deepseek | 9d67f187 | 1 | 1 | 0 | 10 | 0.87 | 05-28 17:42 |
| epoch_gemini_deepseek | d23f99a8 | 1 | 1 | 0 | 10 | 0.87 | 05-28 17:42 |
| epoch_deepseek_cohere | a0fecad8 | 1 | 1 | 0 | 14 | 0.90 | 05-28 17:40 |
| epoch_deepseek_cohere | d8f47e9a | 1 | 1 | 0 | 16 | 0.91 | 05-28 17:39 |
| epoch_cohere_mistral | 93b50a39 | 1 | 1 | 0 | 14 | 0.78 | 05-28 17:37 |
| epoch_cohere_mistral | 9205fede | 1 | 1 | 0 | 14 | 0.86 | 05-28 17:35 |
| epoch_cohere_mistral | dd52d925 | 1 | 1 | 0 | 14 | 0.92 | 05-28 17:33 |
| epoch_cohere_mistral | db759cf0 | 1 | 0 | 1 | 32 | 0.70 | 05-28 17:29 |
| - | ca752992 | 1 | 1 | 0 | 10 | 0.88 | 05-28 17:24 |
| epoch_cohere_mistral_001 | 95c9577e | 1 | 0 | 0 | 1 | 1.00 | 05-28 16:50 |
| epoch_deepseek_claude_003 | 2fd76d96 | 5 | 5 | 0 | 47 | 0.85 | 05-28 16:45 |
| epoch_deepseek_claude_002 | 3c4eab20 | 5 | 5 | 0 | 47 | 0.87 | 05-28 16:40 |
| epoch_deepseek_claude_001 | a171b6b3 | 5 | 5 | 0 | 51 | 0.82 | 05-28 16:35 |
| epoch_gpt_gemini_003 | 836185ac | 4 | 3 | 1 | 49 | 0.78 | 05-28 16:32 |
| epoch_gpt_gemini_002 | e575559a | 5 | 5 | 0 | 51 | 0.89 | 05-28 16:29 |
| epoch_gpt_gemini_001 | d2d25db4 | 5 | 5 | 0 | 51 | 0.88 | 05-28 16:26 |
| epoch_claude_grok_003 | a7b3a7bd | 5 | 5 | 0 | 53 | 0.87 | 05-28 16:21 |
| epoch_claude_grok_002 | a118c9b4 | 5 | 5 | 0 | 53 | 0.92 | 05-28 16:17 |
| epoch_claude_grok_001 | 38ebf1f8 | 5 | 5 | 0 | 43 | 0.86 | 05-28 16:14 |
| epoch_claude_gpt_003 | e8877710 | 5 | 5 | 0 | 39 | 0.97 | 05-28 16:11 |
| epoch_claude_gpt_002 | 8ceafb14 | 5 | 5 | 0 | 43 | 0.91 | 05-28 16:08 |
| epoch_claude_gpt_001 | 630a49ba | 5 | 5 | 0 | 37 | 0.93 | 05-28 16:05 |
| comparison_cohere_mistral_003 | b2384ca1 | 5 | 4 | 1 | 53 | 0.85 | 05-27 21:55 |
| comparison_cohere_mistral_002 | af299df9 | 5 | 4 | 1 | 64 | 0.81 | 05-27 21:44 |
| comparison_cohere_mistral_001 | 4ef1dbd1 | 5 | 3 | 2 | 69 | 0.76 | 05-27 21:32 |
| epoch_cohere_mistral_001 | cf1de6b6 | 5 | 4 | 0 | 62 | 0.81 | 05-27 21:17 |
| epoch_deepseek_claude_003 | 8244a1bc | 5 | 5 | 0 | 57 | 0.80 | 05-27 21:12 |
| epoch_deepseek_claude_002 | de618f05 | 3 | 2 | 1 | 51 | 0.79 | 05-27 21:06 |
| epoch_deepseek_claude_001 | 25c996e8 | 3 | 2 | 1 | 59 | 0.73 | 05-27 20:58 |
| epoch_gpt_gemini_003 | c3a6b228 | 4 | 3 | 1 | 43 | 0.81 | 05-27 20:56 |
| epoch_gpt_gemini_002 | b008409b | 5 | 5 | 0 | 47 | 0.92 | 05-27 20:53 |
| epoch_gpt_gemini_001 | a016322a | 5 | 5 | 0 | 45 | 0.91 | 05-27 20:51 |
| epoch_claude_grok_003 | b57d88b2 | 5 | 5 | 0 | 47 | 0.92 | 05-27 20:48 |
| epoch_claude_grok_002 | 50213929 | 4 | 3 | 1 | 51 | 0.83 | 05-27 20:44 |
| epoch_claude_grok_001 | ffac0184 | 5 | 5 | 0 | 53 | 0.95 | 05-27 20:41 |
| epoch_claude_gpt_003 | 62cc15b5 | 5 | 5 | 0 | 63 | 0.91 | 05-27 20:36 |
| epoch_claude_gpt_002 | bf9b97fb | 5 | 5 | 0 | 45 | 0.94 | 05-27 20:33 |
| epoch_claude_gpt_001 | 4b8fd274 | 5 | 5 | 0 | 43 | 0.96 | 05-27 20:29 |
| Model | Agent Sessions | Avg Score | Scratch Use % | Avg Incomplete Msgs | Avg Premature Acts | Avg Impossible Acts | Total Instab Warnings | Total Recoveries |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 | 62 | 0.93 | 2% | 0.2 | 0 | 0 | 0 | 0 |
| Cohere Cmd-A | 54 | 0.81 | 6% | 0.3 | 0 | 0 | 19 | 3 |
| DeepSeek V4 Pro | 54 | 0.9 | 0% | 0.2 | 0 | 0 | 1 | 0 |
| GLM 5.1 | 53 | 0.94 | 6% | 0.2 | 0 | 0 | 0 | 0 |
| GPT-5.5 | 57 | 0.92 | 0% | 0.2 | 0 | 0 | 0 | 0 |
| Gemini 3.1 Pro | 56 | 0.94 | 0% | 0.1 | 0 | 0 | 0 | 0 |
| Grok 4.3 | 57 | 0.92 | 0% | 0.2 | 0 | 0 | 0 | 0 |
| Mistral Lg | 54 | 0.86 | 13% | 1.1 | 0 | 0 | 21 | 4 |
// scratch use = externalised reasoning · premature = attempted action before sufficient knowledge · impossible = called tool on unavailable object · run_type = control
| Model | Agent Sessions | Avg Score | Scratch Use % | Avg Incomplete Msgs | Avg Premature Acts | Avg Impossible Acts | Total Instab Warnings | Total Recoveries |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 | 5 | 0.88 | 0% | 0.4 | 0 | 0 | 0 | 0 |
| Claude S4.6 | 85 | 0.87 | 46% | 0.9 | 0 | 0.1 | 12 | 1 |
| Cohere Cmd-A | 101 | 0.76 | 14% | 1.5 | 0 | 0 | 26 | 4 |
| DeepSeek V4 | 89 | 0.86 | 11% | 0.4 | 0.1 | 0 | 6 | 0 |
| DeepSeek V4 Pro | 6 | 0.97 | 0% | 0.0 | 0 | 0 | 0 | 0 |
| GLM 5.1 | 11 | 0.99 | 0% | 0.0 | 0 | 0 | 0 | 0 |
| GPT-5.5 | 67 | 0.93 | 0% | 0.3 | 0.1 | 0 | 0 | 0 |
| Gemini 3.1 | 81 | 0.73 | 14% | 1.3 | 0 | 0 | 6 | 1 |
| Gemini 3.1 Pro | 5 | 0.88 | 0% | 0.4 | 0 | 0 | 0 | 0 |
| Grok 4.3 | 46 | 0.92 | 0% | 0.2 | 0 | 0 | 3 | 0 |
| Mistral Lg | 92 | 0.81 | 14% | 0.3 | 0 | 0.1 | 30 | 4 |
// scratch use = externalised reasoning · premature = attempted action before sufficient knowledge · impossible = called tool on unavailable object · run_type = exploratory
| Scenario | Models | Outcome | Turns | Fired | Detection | Disclosed | Repair By | Repair Objects | Complete | Non-repair |
|---|---|---|---|---|---|---|---|---|---|---|
| S5c — Power Outage | A:Gemini 3.1 Pro / B:GLM 5.1 | unknown | 9 | yes@4 | B@5 (+1) | - | B | cipher_display | - | - |
| S5c — Power Outage | A:Gemini 3.1 Pro / B:Mistral Lg | unstable | 48 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Gemini 3.1 Pro / B:Cohere Cmd-A | win | 16 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Gemini 3.1 Pro / B:DeepSeek V4 Pro | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Grok 4.3 / B:GLM 5.1 | win | 12 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Grok 4.3 / B:Mistral Lg | win | 10 | yes@4 | A@5 (+1) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Grok 4.3 / B:Cohere Cmd-A | win | 16 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Grok 4.3 / B:DeepSeek V4 Pro | win | 12 | yes@4 | A@5 (+1) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Grok 4.3 / B:Gemini 3.1 Pro | win | 10 | yes@4 | B@4 (+0) | - | B | cipher_display, signal_console | yes | - |
| S5c — Power Outage | A:GPT-5.5 / B:GLM 5.1 | win | 14 | yes@4 | A@5 (+1) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:GPT-5.5 / B:Mistral Lg | win | 10 | yes@4 | A@5 (+1) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:GPT-5.5 / B:Cohere Cmd-A | win | 16 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:GPT-5.5 / B:DeepSeek V4 Pro | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:GPT-5.5 / B:Gemini 3.1 Pro | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:GPT-5.5 / B:Grok 4.3 | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:GLM 5.1 | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:Mistral Lg | win | 10 | yes@4 | A@5 (+1) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:Cohere Cmd-A | win | 16 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:DeepSeek V4 Pro | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:Gemini 3.1 Pro | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:Grok 4.3 | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5c — Power Outage | A:Claude Opus 4.8 / B:GPT-5.5 | win | 10 | yes@4 | B@4 (+0) | - | A | signal_console, cipher_display | yes | - |
| S5b — Faulty Relay | A:Mistral Lg / B:GLM 5.1 | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Cohere Cmd-A / B:GLM 5.1 | win | 18 | yes@3 | A@13 (+10) | - | A | reset_switch, signal_console | yes | - |
| S5b — Faulty Relay | A:Cohere Cmd-A / B:Mistral Lg | win | 10 | yes@3 | - | - | A | signal_console | - | - |
| S5b — Faulty Relay | A:DeepSeek V4 Pro / B:GLM 5.1 | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:DeepSeek V4 Pro / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:DeepSeek V4 Pro / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Gemini 3.1 Pro / B:GLM 5.1 | win | 8 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Gemini 3.1 Pro / B:Mistral Lg | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Gemini 3.1 Pro / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Gemini 3.1 Pro / B:DeepSeek V4 Pro | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Grok 4.3 / B:GLM 5.1 | win | 12 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Grok 4.3 / B:Mistral Lg | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Grok 4.3 / B:Cohere Cmd-A | win | 16 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Grok 4.3 / B:DeepSeek V4 Pro | win | 8 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Grok 4.3 / B:Gemini 3.1 Pro | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:GLM 5.1 | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:DeepSeek V4 Pro | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:Gemini 3.1 Pro | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:GLM 5.1 | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:Cohere Cmd-A | win | 8 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:DeepSeek V4 Pro | win | 8 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:Gemini 3.1 Pro | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:Claude Opus 4.8 / B:GPT-5.5 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Mistral Lg / B:GLM 5.1 | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Cohere Cmd-A / B:GLM 5.1 | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Cohere Cmd-A / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:DeepSeek V4 Pro / B:GLM 5.1 | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:DeepSeek V4 Pro / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:DeepSeek V4 Pro / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Gemini 3.1 Pro / B:GLM 5.1 | win | 14 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Gemini 3.1 Pro / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Gemini 3.1 Pro / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Gemini 3.1 Pro / B:DeepSeek V4 Pro | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Grok 4.3 / B:GLM 5.1 | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Grok 4.3 / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Grok 4.3 / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Grok 4.3 / B:DeepSeek V4 Pro | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Grok 4.3 / B:Gemini 3.1 Pro | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:GLM 5.1 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:Cohere Cmd-A | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:DeepSeek V4 Pro | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:Gemini 3.1 Pro | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:GLM 5.1 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:Mistral Lg | win | 10 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:Cohere Cmd-A | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:DeepSeek V4 Pro | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:Gemini 3.1 Pro | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Claude Opus 4.8 / B:GPT-5.5 | win | 6 | - | - | - | - | - | - | - |
| S5d — Dusty Logbook | A:GLM 5.1 / B:Gemini 3.1 | win | 6 | - | - | - | - | - | - | - |
| S5c — Power Outage | A:GLM 5.1 / B:Gemini 3.1 | win | 12 | yes@4 | B@4 (+0) | - | B | cipher_display, signal_console | yes | - |
| S5b — Faulty Relay | A:GLM 5.1 / B:Gemini 3.1 | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GLM 5.1 / B:Gemini 3.1 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | Llama 3.3 70B | unknown | 0 | - | - | - | - | - | - | - |
| S5a — Do Not Press | Llama 3.3 70B | unknown | 0 | - | - | - | - | - | - | - |
| S5a — Do Not Press | Llama 3.3 70B | unknown | 0 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:Llama 3.3 70B / B:Mistral Lg | unknown | 1 | - | - | - | - | - | - | - |
| S5d — Dusty Logbook | A:DeepSeek V4 / B:Gemini 3.1 | win | 8 | - | - | - | - | - | - | - |
| S5c — Power Outage | A:DeepSeek V4 / B:Gemini 3.1 | win | 8 | yes@4 | B@4 (+0) | - | B | cipher_display, signal_console | yes | - |
| S5b — Faulty Relay | A:DeepSeek V4 / B:Gemini 3.1 | win | 8 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:DeepSeek V4 / B:Gemini 3.1 | win | 12 | - | - | - | - | - | - | - |
| S5d — Dusty Logbook | A:GPT-5.5 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5c — Power Outage | A:GPT-5.5 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5b — Faulty Relay | A:GPT-5.5 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| S5a — Do Not Press | A:GPT-5.5 / B:Grok 4.3 | win | 6 | - | - | - | - | - | - | - |
| Model | Agent Sessions | Cause Disclosure Rate | Repair Discovery Rate | Avg Repair Lag |
|---|---|---|---|---|
| Claude Opus 4.8 | 21 | 0% | 33% | 1.0 |
| Cohere Cmd-A | 18 | 0% | 28% | 3.0 |
| DeepSeek V4 | 4 | 0% | 25% | 0.0 |
| DeepSeek V4 Pro | 18 | 0% | 22% | 1.0 |
| GLM 5.1 | 22 | 0% | 23% | 2.2 |
| GPT-5.5 | 25 | 0% | 28% | 1.0 |
| Gemini 3.1 | 8 | 0% | 25% | 0.0 |
| Gemini 3.1 Pro | 21 | 0% | 29% | 1.3 |
| Grok 4.3 | 25 | 0% | 28% | 2.0 |
| Llama 3.3 70B | 7 | 0% | 0% | - |
| Mistral Lg | 19 | 0% | 21% | 1.6 |
// Power outage detection lag: scratch users 0.0 turns; non-scratch users 0.3 turns.
| Scenario | Agent A | Agent B | Outcome | Turns | A Score | B Score | Scratch A/B | Instab | Repairs | Started |
|---|---|---|---|---|---|---|---|---|---|---|
| S3 — Combined | Cohere Cmd-A | DeepSeek V4 Pro | active | 1 | 1.00 | 0.00 | · / · | · | · | 05-30 16:25 |
| S5c — Power Outage | Gemini 3.1 Pro | GLM 5.1 | active | 9 | 0.00 | 0.84 | · / · | · | · | 05-30 15:15 |
| S5c — Power Outage | Gemini 3.1 Pro | Mistral Lg | unstable | 48 | 1.00 | 0.72 | · / · | 9⚠ | 3↺ | 05-30 13:45 |
| S5c — Power Outage | Gemini 3.1 Pro | Cohere Cmd-A | win | 16 | 0.93 | 0.93 | · / · | · | · | 05-30 13:44 |
| S5c — Power Outage | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:43 |
| S5c — Power Outage | Grok 4.3 | GLM 5.1 | win | 12 | 1.00 | 0.97 | · / ✓ | · | · | 05-30 13:40 |
| S5c — Power Outage | Grok 4.3 | Mistral Lg | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 13:39 |
| S5c — Power Outage | Grok 4.3 | Cohere Cmd-A | win | 16 | 0.93 | 0.93 | · / · | · | · | 05-30 13:38 |
| S5c — Power Outage | Grok 4.3 | DeepSeek V4 Pro | win | 12 | 1.00 | 1.00 | · / · | · | · | 05-30 13:36 |
| S5c — Power Outage | Grok 4.3 | Gemini 3.1 Pro | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:35 |
| S5c — Power Outage | GPT-5.5 | GLM 5.1 | win | 14 | 1.00 | 1.00 | · / · | · | · | 05-30 13:34 |
| S5c — Power Outage | GPT-5.5 | Mistral Lg | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 13:32 |
| S5c — Power Outage | GPT-5.5 | Cohere Cmd-A | win | 16 | 0.85 | 0.93 | · / · | · | · | 05-30 13:31 |
| S5c — Power Outage | GPT-5.5 | DeepSeek V4 Pro | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:30 |
| S5c — Power Outage | GPT-5.5 | Gemini 3.1 Pro | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:29 |
| S5c — Power Outage | GPT-5.5 | Grok 4.3 | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:28 |
| S5c — Power Outage | Claude Opus 4.8 | GLM 5.1 | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:27 |
| S5c — Power Outage | Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 13:26 |
| S5c — Power Outage | Claude Opus 4.8 | Cohere Cmd-A | win | 16 | 0.93 | 0.93 | · / · | · | · | 05-30 13:25 |
| S5c — Power Outage | Claude Opus 4.8 | DeepSeek V4 Pro | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 13:23 |
| S5c — Power Outage | Claude Opus 4.8 | Gemini 3.1 Pro | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 13:23 |
| S5c — Power Outage | Claude Opus 4.8 | Grok 4.3 | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 13:22 |
| S5c — Power Outage | Claude Opus 4.8 | GPT-5.5 | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 13:21 |
| S5b — Faulty Relay | Mistral Lg | GLM 5.1 | win | 10 | 0.96 | 0.86 | ✓ / · | · | · | 05-30 13:20 |
| S5b — Faulty Relay | Cohere Cmd-A | GLM 5.1 | win | 18 | 0.67 | 0.93 | · / · | · | · | 05-30 13:18 |
| S5b — Faulty Relay | Cohere Cmd-A | Mistral Lg | win | 10 | 0.64 | 1.00 | · / · | · | · | 05-30 13:17 |
| S5b — Faulty Relay | DeepSeek V4 Pro | GLM 5.1 | win | 10 | 1.00 | 0.74 | · / · | · | · | 05-30 13:16 |
| S5b — Faulty Relay | DeepSeek V4 Pro | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 13:15 |
| S5b — Faulty Relay | DeepSeek V4 Pro | Cohere Cmd-A | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 13:14 |
| S5b — Faulty Relay | Gemini 3.1 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 13:12 |
| S5b — Faulty Relay | Gemini 3.1 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 13:12 |
| S5b — Faulty Relay | Gemini 3.1 Pro | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · / · | · | · | 05-30 13:11 |
| S5b — Faulty Relay | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 10 | 1.00 | 0.88 | · / · | · | · | 05-30 13:10 |
| S5b — Faulty Relay | Grok 4.3 | GLM 5.1 | win | 12 | 1.00 | 0.90 | · / · | · | · | 05-30 13:09 |
| S5b — Faulty Relay | Grok 4.3 | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 13:08 |
| S5b — Faulty Relay | Grok 4.3 | Cohere Cmd-A | win | 16 | 1.00 | 0.93 | · / · | · | · | 05-30 13:07 |
| S5b — Faulty Relay | Grok 4.3 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 13:06 |
| S5b — Faulty Relay | Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 13:06 |
| S5b — Faulty Relay | GPT-5.5 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 13:05 |
| S5b — Faulty Relay | GPT-5.5 | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 13:04 |
| S5b — Faulty Relay | GPT-5.5 | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · / · | · | · | 05-30 13:03 |
| S5b — Faulty Relay | GPT-5.5 | DeepSeek V4 Pro | win | 6 | 1.00 | 0.80 | · / · | · | · | 05-30 13:03 |
| S5b — Faulty Relay | GPT-5.5 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 13:02 |
| S5b — Faulty Relay | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 13:02 |
| S5b — Faulty Relay | Claude Opus 4.8 | GLM 5.1 | win | 10 | 1.00 | 0.88 | · / · | · | · | 05-30 13:00 |
| S5b — Faulty Relay | Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 12:59 |
| S5b — Faulty Relay | Claude Opus 4.8 | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 12:58 |
| S5b — Faulty Relay | Claude Opus 4.8 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.82 | · / · | · | · | 05-30 12:57 |
| S5b — Faulty Relay | Claude Opus 4.8 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:57 |
| S5b — Faulty Relay | Claude Opus 4.8 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:56 |
| S5b — Faulty Relay | Claude Opus 4.8 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:56 |
| S5a — Do Not Press | Mistral Lg | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 12:55 |
| S5a — Do Not Press | Cohere Cmd-A | GLM 5.1 | win | 10 | 0.64 | 0.86 | · / · | · | · | 05-30 12:54 |
| S5a — Do Not Press | Cohere Cmd-A | Mistral Lg | win | 10 | 0.76 | 0.72 | · / · | · | · | 05-30 12:53 |
| S5a — Do Not Press | DeepSeek V4 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 12:51 |
| S5a — Do Not Press | DeepSeek V4 Pro | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 12:50 |
| S5a — Do Not Press | DeepSeek V4 Pro | Cohere Cmd-A | win | 10 | 0.76 | 0.88 | · / · | · | · | 05-30 12:49 |
| S5a — Do Not Press | Gemini 3.1 Pro | GLM 5.1 | win | 14 | 1.00 | 0.91 | · / · | · | · | 05-30 12:47 |
| S5a — Do Not Press | Gemini 3.1 Pro | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 12:45 |
| S5a — Do Not Press | Gemini 3.1 Pro | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · / · | · | · | 05-30 12:45 |
| S5a — Do Not Press | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:44 |
| S5a — Do Not Press | Grok 4.3 | GLM 5.1 | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 12:43 |
| S5a — Do Not Press | Grok 4.3 | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 12:41 |
| S5a — Do Not Press | Grok 4.3 | Cohere Cmd-A | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 12:40 |
| S5a — Do Not Press | Grok 4.3 | DeepSeek V4 Pro | win | 6 | 1.00 | 0.77 | · / · | · | · | 05-30 12:39 |
| S5a — Do Not Press | Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:38 |
| S5a — Do Not Press | GPT-5.5 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:38 |
| S5a — Do Not Press | GPT-5.5 | Mistral Lg | win | 10 | 1.00 | 0.72 | · / · | · | · | 05-30 12:36 |
| S5a — Do Not Press | GPT-5.5 | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · / · | · | · | 05-30 12:36 |
| S5a — Do Not Press | GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 12:35 |
| S5a — Do Not Press | GPT-5.5 | Gemini 3.1 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 12:34 |
| S5a — Do Not Press | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:34 |
| S5a — Do Not Press | Claude Opus 4.8 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:33 |
| S5a — Do Not Press | Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 0.86 | · / · | · | · | 05-30 12:31 |
| S5a — Do Not Press | Claude Opus 4.8 | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 12:31 |
| S5a — Do Not Press | Claude Opus 4.8 | DeepSeek V4 Pro | win | 6 | 1.00 | 0.80 | · / · | · | · | 05-30 12:30 |
| S5a — Do Not Press | Claude Opus 4.8 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:30 |
| S5a — Do Not Press | Claude Opus 4.8 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:29 |
| S5a — Do Not Press | Claude Opus 4.8 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:29 |
| S4b — Relay Transform | Mistral Lg | GLM 5.1 | win | 8 | 1.00 | 0.95 | · / ✓ | · | · | 05-30 12:28 |
| S4b — Relay Transform | Cohere Cmd-A | GLM 5.1 | win | 10 | 0.76 | 0.72 | · / · | · | · | 05-30 12:27 |
| S4b — Relay Transform | Cohere Cmd-A | Mistral Lg | win | 10 | 0.76 | 0.76 | · / · | · | · | 05-30 12:26 |
| S4b — Relay Transform | DeepSeek V4 Pro | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:25 |
| S4b — Relay Transform | DeepSeek V4 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:25 |
| S4b — Relay Transform | DeepSeek V4 Pro | Cohere Cmd-A | win | 14 | 0.73 | 0.91 | · / · | · | · | 05-30 12:23 |
| S4b — Relay Transform | Gemini 3.1 Pro | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:22 |
| S4b — Relay Transform | Gemini 3.1 Pro | Mistral Lg | win | 8 | 1.00 | 0.95 | · / ✓ | · | · | 05-30 12:22 |
| S4b — Relay Transform | Gemini 3.1 Pro | Cohere Cmd-A | unstable | 14 | 1.00 | 0.29 | · / · | 2⚠ | · | 05-30 12:21 |
| S4b — Relay Transform | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 12:20 |
| S4b — Relay Transform | Grok 4.3 | GLM 5.1 | win | 8 | 0.70 | 0.82 | · / · | · | · | 05-30 12:19 |
| S4b — Relay Transform | Grok 4.3 | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:18 |
| S4b — Relay Transform | Grok 4.3 | Cohere Cmd-A | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:18 |
| S4b — Relay Transform | Grok 4.3 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 12:17 |
| S4b — Relay Transform | Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:17 |
| S4b — Relay Transform | GPT-5.5 | GLM 5.1 | win | 8 | 0.85 | 0.82 | · / · | · | · | 05-30 12:16 |
| S4b — Relay Transform | GPT-5.5 | Mistral Lg | unstable | 14 | 1.00 | 0.29 | · / · | 2⚠ | · | 05-30 12:14 |
| S4b — Relay Transform | GPT-5.5 | Cohere Cmd-A | unstable | 18 | 0.80 | 0.33 | · / · | 2⚠ | · | 05-30 12:13 |
| S4b — Relay Transform | GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 12:12 |
| S4b — Relay Transform | GPT-5.5 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:12 |
| S4b — Relay Transform | GPT-5.5 | Grok 4.3 | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 12:11 |
| S4b — Relay Transform | Claude Opus 4.8 | GLM 5.1 | win | 10 | 0.88 | 0.82 | · / ✓ | · | · | 05-30 12:10 |
| S4b — Relay Transform | Claude Opus 4.8 | Mistral Lg | win | 8 | 0.85 | 0.85 | · / · | · | · | 05-30 12:09 |
| S4b — Relay Transform | Claude Opus 4.8 | Cohere Cmd-A | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 12:08 |
| S4b — Relay Transform | Claude Opus 4.8 | DeepSeek V4 Pro | win | 10 | 0.88 | 1.00 | · / · | · | · | 05-30 12:07 |
| S4b — Relay Transform | Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 0.85 | 0.85 | · / · | · | · | 05-30 12:06 |
| S4b — Relay Transform | Claude Opus 4.8 | Grok 4.3 | win | 8 | 0.70 | 0.85 | · / · | · | · | 05-30 12:06 |
| S4b — Relay Transform | Claude Opus 4.8 | GPT-5.5 | win | 8 | 0.85 | 0.82 | · / · | · | · | 05-30 12:05 |
| S4a — Relay Lookup | Mistral Lg | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:05 |
| S4a — Relay Lookup | Cohere Cmd-A | GLM 5.1 | win | 10 | 0.76 | 0.88 | · / · | · | · | 05-30 12:04 |
| S4a — Relay Lookup | Cohere Cmd-A | Mistral Lg | win | 10 | 0.76 | 0.88 | · / · | · | · | 05-30 12:03 |
| S4a — Relay Lookup | DeepSeek V4 Pro | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:02 |
| S4a — Relay Lookup | DeepSeek V4 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:02 |
| S4a — Relay Lookup | DeepSeek V4 Pro | Cohere Cmd-A | win | 8 | 0.85 | 0.85 | · / · | · | · | 05-30 12:01 |
| S4a — Relay Lookup | Gemini 3.1 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 12:00 |
| S4a — Relay Lookup | Gemini 3.1 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 12:00 |
| S4a — Relay Lookup | Gemini 3.1 Pro | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 11:59 |
| S4a — Relay Lookup | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 11:58 |
| S4a — Relay Lookup | Grok 4.3 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:57 |
| S4a — Relay Lookup | Grok 4.3 | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:57 |
| S4a — Relay Lookup | Grok 4.3 | Cohere Cmd-A | win | 8 | 0.85 | 0.85 | · / · | · | · | 05-30 11:56 |
| S4a — Relay Lookup | Grok 4.3 | DeepSeek V4 Pro | win | 9 | 0.76 | 1.00 | · / · | · | · | 05-30 11:54 |
| S4a — Relay Lookup | Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:54 |
| S4a — Relay Lookup | GPT-5.5 | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 11:53 |
| S4a — Relay Lookup | GPT-5.5 | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:52 |
| S4a — Relay Lookup | GPT-5.5 | Cohere Cmd-A | unstable | 14 | 1.00 | 0.29 | · / · | 2⚠ | · | 05-30 11:51 |
| S4a — Relay Lookup | GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 11:50 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:49 |
| S4a — Relay Lookup | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:49 |
| S4a — Relay Lookup | Claude Opus 4.8 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:48 |
| S4a — Relay Lookup | Claude Opus 4.8 | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:48 |
| S4a — Relay Lookup | Claude Opus 4.8 | Cohere Cmd-A | unstable | 18 | 1.00 | 0.44 | · / · | 2⚠ | · | 05-30 11:46 |
| S4a — Relay Lookup | Claude Opus 4.8 | DeepSeek V4 Pro | win | 12 | 1.00 | 0.73 | · / · | · | · | 05-30 11:45 |
| S4a — Relay Lookup | Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 11:44 |
| S4a — Relay Lookup | Claude Opus 4.8 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:44 |
| S4a — Relay Lookup | Claude Opus 4.8 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-30 11:43 |
| S3 — Combined | Mistral Lg | GLM 5.1 | win | 8 | 0.65 | 1.00 | · / · | · | · | 05-30 11:42 |
| S3 — Combined | Cohere Cmd-A | GLM 5.1 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-30 11:41 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | unstable | 27 | 0.41 | 0.52 | · / ✓ | 9⚠ | 2↺ | 05-30 11:38 |
| S3 — Combined | DeepSeek V4 Pro | GLM 5.1 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-30 11:37 |
| S3 — Combined | DeepSeek V4 Pro | Mistral Lg | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-30 11:36 |
| S3 — Combined | DeepSeek V4 Pro | Cohere Cmd-A | win | 29 | 0.49 | 0.90 | · / · | 2⚠ | 1↺ | 05-30 11:23 |
| S3 — Combined | Gemini 3.1 Pro | GLM 5.1 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-30 11:22 |
| S3 — Combined | Gemini 3.1 Pro | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-30 11:20 |
| S3 — Combined | Gemini 3.1 Pro | Cohere Cmd-A | win | 12 | 0.68 | 0.83 | · / · | · | · | 05-30 11:19 |
| S3 — Combined | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 12 | 0.57 | 0.87 | · / · | · | · | 05-30 11:18 |
| S3 — Combined | Grok 4.3 | GLM 5.1 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-30 11:16 |
| S3 — Combined | Grok 4.3 | Mistral Lg | win | 12 | 0.57 | 0.83 | · / · | · | · | 05-30 11:15 |
| S3 — Combined | Grok 4.3 | Cohere Cmd-A | win | 12 | 0.57 | 0.83 | · / · | · | · | 05-30 11:14 |
| S3 — Combined | Grok 4.3 | DeepSeek V4 Pro | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-30 11:13 |
| S3 — Combined | Grok 4.3 | Gemini 3.1 Pro | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-30 11:12 |
| S3 — Combined | GPT-5.5 | GLM 5.1 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-30 11:11 |
| S3 — Combined | GPT-5.5 | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-30 11:09 |
| S3 — Combined | GPT-5.5 | Cohere Cmd-A | win | 16 | 0.59 | 0.75 | · / · | · | · | 05-30 11:08 |
| S3 — Combined | GPT-5.5 | DeepSeek V4 Pro | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-30 11:07 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 Pro | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-30 11:06 |
| S3 — Combined | GPT-5.5 | Grok 4.3 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-30 11:05 |
| S3 — Combined | Claude Opus 4.8 | GLM 5.1 | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-30 11:04 |
| S3 — Combined | Claude Opus 4.8 | Mistral Lg | win | 12 | 0.57 | 0.88 | · / · | · | · | 05-30 11:02 |
| S3 — Combined | Claude Opus 4.8 | Cohere Cmd-A | unstable | 22 | 0.76 | 0.45 | · / · | 2⚠ | · | 05-30 11:00 |
| S3 — Combined | Claude Opus 4.8 | DeepSeek V4 Pro | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-30 10:59 |
| S3 — Combined | Claude Opus 4.8 | Gemini 3.1 Pro | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-30 10:59 |
| S3 — Combined | Claude Opus 4.8 | Grok 4.3 | win | 14 | 0.53 | 0.91 | · / · | · | · | 05-30 10:57 |
| S3 — Combined | Claude Opus 4.8 | GPT-5.5 | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-30 10:57 |
| S2 — Mirrored | Mistral Lg | GLM 5.1 | win | 21 | 0.56 | 0.94 | ✓ / · | · | · | 05-30 10:54 |
| S2 — Mirrored | Cohere Cmd-A | GLM 5.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:54 |
| S2 — Mirrored | Cohere Cmd-A | Mistral Lg | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:53 |
| S2 — Mirrored | DeepSeek V4 Pro | GLM 5.1 | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-30 10:52 |
| S2 — Mirrored | DeepSeek V4 Pro | Mistral Lg | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-30 10:51 |
| S2 — Mirrored | DeepSeek V4 Pro | Cohere Cmd-A | win | 7 | 1.00 | 1.00 | · / · | · | · | 05-30 10:50 |
| S2 — Mirrored | Gemini 3.1 Pro | GLM 5.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:49 |
| S2 — Mirrored | Gemini 3.1 Pro | Mistral Lg | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-30 10:48 |
| S2 — Mirrored | Gemini 3.1 Pro | Cohere Cmd-A | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-30 10:47 |
| S2 — Mirrored | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-30 10:46 |
| S2 — Mirrored | Grok 4.3 | GLM 5.1 | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-30 10:44 |
| S2 — Mirrored | Grok 4.3 | Mistral Lg | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-30 10:43 |
| S2 — Mirrored | Grok 4.3 | Cohere Cmd-A | win | 13 | 0.83 | 0.80 | · / · | · | · | 05-30 10:42 |
| S2 — Mirrored | Grok 4.3 | DeepSeek V4 Pro | win | 15 | 0.68 | 0.73 | · / · | · | · | 05-30 10:40 |
| S2 — Mirrored | Grok 4.3 | Gemini 3.1 Pro | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-30 10:39 |
| S2 — Mirrored | GPT-5.5 | GLM 5.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:38 |
| S2 — Mirrored | GPT-5.5 | Mistral Lg | win | 7 | 1.00 | 1.00 | · / · | · | · | 05-30 10:38 |
| S2 — Mirrored | GPT-5.5 | Cohere Cmd-A | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-30 10:37 |
| S2 — Mirrored | GPT-5.5 | DeepSeek V4 Pro | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-30 10:37 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 Pro | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-30 10:36 |
| S2 — Mirrored | GPT-5.5 | Grok 4.3 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:35 |
| S2 — Mirrored | Claude Opus 4.8 | GLM 5.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:34 |
| S2 — Mirrored | Claude Opus 4.8 | Mistral Lg | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:34 |
| S2 — Mirrored | Claude Opus 4.8 | Cohere Cmd-A | win | 7 | 1.00 | 1.00 | · / · | · | · | 05-30 10:34 |
| S2 — Mirrored | Claude Opus 4.8 | DeepSeek V4 Pro | win | 9 | 0.96 | 1.00 | ✓ / · | · | · | 05-30 10:33 |
| S2 — Mirrored | Claude Opus 4.8 | Gemini 3.1 Pro | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:32 |
| S2 — Mirrored | Claude Opus 4.8 | Grok 4.3 | win | 7 | 1.00 | 1.00 | · / · | · | · | 05-30 10:31 |
| S2 — Mirrored | Claude Opus 4.8 | GPT-5.5 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-30 10:31 |
| S1 — Signal Room | Mistral Lg | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 10:30 |
| S1 — Signal Room | Cohere Cmd-A | GLM 5.1 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:29 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | unstable | 43 | 0.91 | 0.40 | · / · | 9⚠ | 1↺ | 05-30 10:24 |
| S1 — Signal Room | DeepSeek V4 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 10:23 |
| S1 — Signal Room | DeepSeek V4 Pro | Mistral Lg | win | 14 | 0.83 | 0.89 | · / ✓ | · | · | 05-30 10:20 |
| S1 — Signal Room | DeepSeek V4 Pro | Cohere Cmd-A | win | 14 | 0.73 | 1.00 | · / · | · | · | 05-30 10:19 |
| S1 — Signal Room | Gemini 3.1 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 10:18 |
| S1 — Signal Room | Gemini 3.1 Pro | Mistral Lg | win | 14 | 0.83 | 0.89 | · / ✓ | · | · | 05-30 10:16 |
| S1 — Signal Room | Gemini 3.1 Pro | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-30 10:15 |
| S1 — Signal Room | Gemini 3.1 Pro | DeepSeek V4 Pro | win | 10 | 0.88 | 1.00 | · / · | · | · | 05-30 10:14 |
| S1 — Signal Room | Grok 4.3 | GLM 5.1 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:13 |
| S1 — Signal Room | Grok 4.3 | Mistral Lg | win | 12 | 0.90 | 0.87 | · / ✓ | · | · | 05-30 10:11 |
| S1 — Signal Room | Grok 4.3 | Cohere Cmd-A | win | 13 | 1.00 | 0.83 | · / ✓ | · | · | 05-30 10:09 |
| S1 — Signal Room | Grok 4.3 | DeepSeek V4 Pro | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:07 |
| S1 — Signal Room | Grok 4.3 | Gemini 3.1 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 10:06 |
| S1 — Signal Room | GPT-5.5 | GLM 5.1 | win | 12 | 0.80 | 0.90 | · / · | · | · | 05-30 10:05 |
| S1 — Signal Room | GPT-5.5 | Mistral Lg | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:04 |
| S1 — Signal Room | GPT-5.5 | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-30 10:03 |
| S1 — Signal Room | GPT-5.5 | DeepSeek V4 Pro | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:02 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 Pro | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:01 |
| S1 — Signal Room | GPT-5.5 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-30 10:00 |
| S1 — Signal Room | Claude Opus 4.8 | GLM 5.1 | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 09:59 |
| S1 — Signal Room | Claude Opus 4.8 | Mistral Lg | win | 10 | 0.88 | 1.00 | · / · | · | · | 05-30 09:58 |
| S1 — Signal Room | Claude Opus 4.8 | Cohere Cmd-A | win | 20 | 0.90 | 1.00 | · / · | · | · | 05-30 09:56 |
| S1 — Signal Room | Claude Opus 4.8 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 09:55 |
| S1 — Signal Room | Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 09:55 |
| S1 — Signal Room | GLM 5.1 | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 09:54 |
| S1 — Signal Room | Claude Opus 4.8 | Grok 4.3 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 09:54 |
| S1 — Signal Room | Claude Opus 4.8 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 09:53 |
| S1 — Signal Room | Claude Opus 4.8 | GLM 5.1 | active | 0 | 0.00 | 0.00 | · / · | · | · | 05-30 09:47 |
| S1 — Signal Room | Claude Opus 4.8 | Mistral Lg | win | 20 | 0.82 | 0.90 | · / · | · | · | 05-30 09:45 |
| S1 — Signal Room | Claude Opus 4.8 | Cohere Cmd-A | win | 16 | 0.81 | 1.00 | · / · | · | · | 05-30 09:44 |
| S1 — Signal Room | Claude Opus 4.8 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 09:43 |
| S1 — Signal Room | Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-30 09:42 |
| S1 — Signal Room | Claude Opus 4.8 | Grok 4.3 | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-30 09:41 |
| S1 — Signal Room | Claude Opus 4.8 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-30 09:40 |
| S4b — Relay Transform | Gemini 3.1 Pro | Cohere Cmd-A | win | 12 | 1.00 | 0.83 | · / ✓ | · | · | 05-29 20:24 |
| S4a — Relay Lookup | Gemini 3.1 Pro | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-29 20:24 |
| S3 — Combined | Gemini 3.1 Pro | Cohere Cmd-A | win | 14 | 0.60 | 0.86 | · / · | · | · | 05-29 20:23 |
| S2 — Mirrored | Gemini 3.1 Pro | Cohere Cmd-A | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-29 20:22 |
| S1 — Signal Room | Gemini 3.1 Pro | Cohere Cmd-A | win | 14 | 0.91 | 0.73 | · / ✓ | · | · | 05-29 20:21 |
| S4b — Relay Transform | Grok 4.3 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 20:21 |
| S4a — Relay Lookup | Grok 4.3 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 20:20 |
| S3 — Combined | Grok 4.3 | GLM 5.1 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-29 20:19 |
| S2 — Mirrored | Grok 4.3 | GLM 5.1 | win | 11 | 0.88 | 0.88 | · / · | · | · | 05-29 20:18 |
| S1 — Signal Room | Grok 4.3 | GLM 5.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-29 20:17 |
| S4b — Relay Transform | GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-29 20:16 |
| S4a — Relay Lookup | GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-29 20:15 |
| S3 — Combined | GPT-5.5 | DeepSeek V4 Pro | win | 13 | 0.53 | 1.00 | · / · | · | · | 05-29 20:13 |
| S2 — Mirrored | GPT-5.5 | DeepSeek V4 Pro | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-29 20:12 |
| S1 — Signal Room | GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-29 20:11 |
| S4b — Relay Transform | Claude Opus 4.8 | Mistral Lg | win | 8 | 0.85 | 0.85 | · / · | · | · | 05-29 20:10 |
| S4a — Relay Lookup | Claude Opus 4.8 | Mistral Lg | unstable | 14 | 1.00 | 0.29 | · / · | 2⚠ | · | 05-29 20:08 |
| S3 — Combined | Claude Opus 4.8 | Mistral Lg | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-29 20:07 |
| S2 — Mirrored | Claude Opus 4.8 | Mistral Lg | win | 7 | 1.00 | 1.00 | · / · | · | · | 05-29 20:06 |
| S1 — Signal Room | Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-29 20:05 |
| S3 — Combined | Cohere Cmd-A | DeepSeek V4 Pro | active | 3 | 1.00 | 1.00 | · / · | · | · | 05-29 19:54 |
| S5d — Dusty Logbook | GLM 5.1 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 19:40 |
| S5c — Power Outage | GLM 5.1 | Gemini 3.1 | win | 12 | 1.00 | 0.78 | · / · | · | · | 05-29 19:39 |
| S5b — Faulty Relay | GLM 5.1 | Gemini 3.1 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-29 19:38 |
| S5a — Do Not Press | GLM 5.1 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 19:36 |
| S5a — Do Not Press | Llama 3.3 70B | Llama 3.3 70B | active | 0 | 0.00 | 0.00 | · / · | · | · | 05-29 19:22 |
| S5a — Do Not Press | Llama 3.3 70B | Llama 3.3 70B | active | 0 | 0.00 | 0.00 | · / · | · | · | 05-29 19:20 |
| S5a — Do Not Press | Llama 3.3 70B | Llama 3.3 70B | active | 0 | 0.00 | 0.00 | · / · | · | · | 05-29 19:15 |
| S5a — Do Not Press | Llama 3.3 70B | Mistral Lg | active | 1 | 0.00 | 1.00 | · / · | · | · | 05-29 19:13 |
| S5d — Dusty Logbook | DeepSeek V4 | Gemini 3.1 | win | 8 | 0.85 | 0.82 | · / · | · | · | 05-29 19:09 |
| S5c — Power Outage | DeepSeek V4 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · / · | · | · | 05-29 19:08 |
| S5b — Faulty Relay | DeepSeek V4 | Gemini 3.1 | win | 8 | 0.85 | 0.82 | · / · | · | · | 05-29 19:08 |
| S5a — Do Not Press | DeepSeek V4 | Gemini 3.1 | win | 12 | 0.70 | 0.78 | · / · | · | · | 05-29 19:07 |
| S5d — Dusty Logbook | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 19:00 |
| S5c — Power Outage | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 19:00 |
| S5b — Faulty Relay | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 18:59 |
| S5a — Do Not Press | GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-29 18:59 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | active | 1 | 1.00 | 0.00 | · / · | · | · | 05-29 08:37 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 1.00 | · / · | · | · | 05-29 08:35 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-29 08:34 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-29 08:32 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.51 | · / ✓ | 3⚠ | 1↺ | 05-29 08:30 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-29 08:28 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-29 08:27 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-29 08:26 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-29 08:24 |
| S3 — Combined | Cohere Cmd-A | DeepSeek V4 | win | 18 | 0.48 | 0.87 | · / · | · | · | 05-29 08:23 |
| S3 — Combined | Grok 4.3 | DeepSeek V4 | active | 0 | 0.00 | 0.00 | · / · | · | · | 05-28 21:00 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:59 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.90 | · / ✓ | · | · | 05-28 20:58 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:57 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.82 | · / ✓ | · | · | 05-28 20:55 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:54 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 20 | 0.56 | 0.87 | · / · | · | · | 05-28 20:52 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | · / · | · | · | 05-28 20:50 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | unstable | 24 | 0.49 | 0.57 | · / · | 2⚠ | · | 05-28 20:48 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:48 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.68 | 1.00 | · / · | · | · | 05-28 20:47 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 14 | 0.60 | 0.91 | ✓ / · | · | · | 05-28 20:45 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 18 | 0.48 | 0.84 | · / ✓ | · | · | 05-28 20:44 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:43 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 20:42 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 13 | 0.53 | 1.00 | · / · | · | · | 05-28 20:41 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:40 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | · / · | · | · | 05-28 20:38 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:38 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:37 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-28 20:36 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 14 | 0.49 | 0.91 | · / · | · | · | 05-28 20:34 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 11 | 0.57 | 1.00 | · / · | · | · | 05-28 20:33 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:32 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-28 20:32 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:31 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-28 20:30 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-28 20:29 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:28 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:27 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 13 | 0.53 | 1.00 | · / · | · | · | 05-28 20:26 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | · / · | · | · | 05-28 20:24 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-28 20:23 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | · / · | · | · | 05-28 20:23 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | · / · | · | · | 05-28 20:21 |
| S3 — Combined | Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:21 |
| S3 — Combined | DeepSeek V4 | Grok 4.3 | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 20:20 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 22 | 0.59 | 0.93 | · / · | 2⚠ | 1↺ | 05-28 20:15 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 14 | 0.60 | 0.86 | ✓ / · | · | · | 05-28 20:14 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 19 | 0.60 | 0.92 | ✓ / · | · | · | 05-28 20:10 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:09 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:08 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 14 | 0.60 | 1.00 | ✓ / · | · | · | 05-28 20:07 |
| S3 — Combined | DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.57 | 1.00 | · / · | · | · | 05-28 20:06 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | unstable | 28 | 0.46 | 0.56 | · / ✓ | 6⚠ | · | 05-28 20:02 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 16 | 0.68 | 1.00 | · / · | · | · | 05-28 19:59 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 0.91 | · / · | · | · | 05-28 19:57 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.73 | 0.91 | · / · | · | · | 05-28 19:54 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 0.61 | · / · | · | · | 05-28 19:52 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.73 | 0.91 | · / · | · | · | 05-28 19:50 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 19:49 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.64 | · / · | · | · | 05-28 19:46 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 0.61 | · / · | · | · | 05-28 19:44 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 1.00 | · / · | · | · | 05-28 19:42 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 18 | 0.48 | 0.52 | · / · | · | · | 05-28 19:40 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 16 | 0.50 | 0.64 | · / ✓ | · | · | 05-28 19:38 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 0.91 | · / · | · | · | 05-28 19:37 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | unstable | 27 | 0.41 | 0.25 | · / · | 10⚠ | 2↺ | 05-28 19:33 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 19:32 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | unstable | 17 | 0.48 | 0.41 | · / · | 2⚠ | · | 05-28 19:29 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 1.00 | · / · | · | · | 05-28 19:28 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 19:26 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 19:25 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 16 | 0.50 | 0.54 | · / · | · | · | 05-28 19:23 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 19:21 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 12 | 0.57 | 0.87 | · / ✓ | · | · | 05-28 19:19 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 0.91 | · / · | · | · | 05-28 19:18 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 18 | 0.63 | 1.00 | · / · | · | · | 05-28 19:15 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | · / · | · | · | 05-28 19:14 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 18 | 0.48 | 0.48 | · / · | 4⚠ | 1↺ | 05-28 19:12 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.73 | 0.91 | · / · | · | · | 05-28 19:10 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.90 | · / ✓ | · | · | 05-28 19:07 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 19 | 0.74 | 0.89 | · / · | · | · | 05-28 19:03 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 28 | 0.41 | 0.38 | · / · | 6⚠ | 2↺ | 05-28 18:58 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 13 | 0.53 | 1.00 | · / · | · | · | 05-28 18:54 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 1.00 | · / · | · | · | 05-28 18:53 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-28 18:52 |
| S3 — Combined | Cohere Cmd-A | DeepSeek V4 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-28 18:51 |
| S1 — Signal Room | Mistral Lg | Grok 4.3 | win | 14 | 0.67 | 0.91 | · / · | · | · | 05-28 18:34 |
| S1 — Signal Room | Mistral Lg | DeepSeek V4 | win | 16 | 0.71 | 0.97 | · / ✓ | · | · | 05-28 18:32 |
| S1 — Signal Room | Mistral Lg | DeepSeek V4 | win | 10 | 0.88 | 1.00 | · / · | · | · | 05-28 18:30 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 18:28 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 18:27 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 18:25 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 18:23 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 18:21 |
| S1 — Signal Room | Grok 4.3 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · / · | · | · | 05-28 18:20 |
| S1 — Signal Room | Grok 4.3 | Gemini 3.1 | win | 14 | 0.81 | 0.84 | · / ✓ | · | · | 05-28 18:20 |
| S1 — Signal Room | Gemini 3.1 | DeepSeek V4 | win | 10 | 1.00 | 1.00 | · / · | · | · | 05-28 18:19 |
| S1 — Signal Room | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-28 18:18 |
| S1 — Signal Room | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.88 | 1.00 | · / · | · | · | 05-28 18:17 |
| S1 — Signal Room | DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.90 | 1.00 | · / · | · | · | 05-28 18:16 |
| S1 — Signal Room | DeepSeek V4 | Cohere Cmd-A | unstable | 31 | 0.82 | 1.00 | ✓ / · | 5⚠ | · | 05-28 18:11 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · / · | · | · | 05-28 18:10 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.89 | · / ✓ | · | · | 05-28 18:08 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · / · | · | · | 05-28 18:06 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · / · | · | · | 05-28 18:04 |
| S1 — Signal Room | Mistral Lg | Grok 4.3 | win | 12 | 0.78 | 0.90 | · / · | · | · | 05-28 17:59 |
| S1 — Signal Room | Mistral Lg | DeepSeek V4 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-28 17:58 |
| S1 — Signal Room | Mistral Lg | DeepSeek V4 | unstable | 28 | 0.49 | 0.96 | ✓ / · | 5⚠ | · | 05-28 17:54 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 16 | 0.84 | 0.88 | · / · | · | · | 05-28 17:52 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 17:50 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 17:48 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 17:47 |
| S1 — Signal Room | Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · / ✓ | · | · | 05-28 17:45 |
| S1 — Signal Room | Grok 4.3 | Gemini 3.1 | win | 14 | 0.71 | 0.73 | · / ✓ | · | · | 05-28 17:44 |
| S1 — Signal Room | Grok 4.3 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · / · | · | · | 05-28 17:44 |
| S1 — Signal Room | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-28 17:43 |
| S1 — Signal Room | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.86 | 0.88 | · / · | · | · | 05-28 17:42 |
| S1 — Signal Room | Gemini 3.1 | DeepSeek V4 | win | 10 | 0.86 | 0.88 | · / · | · | · | 05-28 17:42 |
| S1 — Signal Room | DeepSeek V4 | Cohere Cmd-A | win | 14 | 0.83 | 0.97 | · / ✓ | · | · | 05-28 17:40 |
| S1 — Signal Room | DeepSeek V4 | Cohere Cmd-A | win | 16 | 0.82 | 1.00 | ✓ / · | · | · | 05-28 17:39 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.74 | · / ✓ | · | · | 05-28 17:37 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.89 | · / ✓ | · | · | 05-28 17:35 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 14 | 1.00 | 0.83 | · / ✓ | · | · | 05-28 17:33 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | unstable | 32 | 0.80 | 0.60 | · / ✓ | 2⚠ | · | 05-28 17:29 |
| S1 — Signal Room | Cohere Cmd-A | Grok 4.3 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-28 17:24 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | active | 1 | 0.00 | 1.00 | · / · | · | · | 05-28 16:50 |
| S4b — Relay Transform | DeepSeek V4 | Claude S4.6 | win | 8 | 0.85 | 0.95 | · / ✓ | · | · | 05-28 16:49 |
| S4a — Relay Lookup | DeepSeek V4 | Claude S4.6 | win | 6 | 0.80 | 1.00 | · / · | · | · | 05-28 16:49 |
| S3 — Combined | DeepSeek V4 | Claude S4.6 | win | 12 | 0.57 | 0.73 | · / ✓ | · | · | 05-28 16:47 |
| S2 — Mirrored | DeepSeek V4 | Claude S4.6 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-28 16:46 |
| S1 — Signal Room | DeepSeek V4 | Claude S4.6 | win | 12 | 0.78 | 0.85 | · / ✓ | · | · | 05-28 16:45 |
| S4b — Relay Transform | DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.82 | · / ✓ | · | · | 05-28 16:44 |
| S4a — Relay Lookup | DeepSeek V4 | Claude S4.6 | win | 6 | 0.80 | 1.00 | · / · | · | · | 05-28 16:43 |
| S3 — Combined | DeepSeek V4 | Claude S4.6 | win | 14 | 0.53 | 0.77 | · / ✓ | · | · | 05-28 16:41 |
| S2 — Mirrored | DeepSeek V4 | Claude S4.6 | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-28 16:41 |
| S1 — Signal Room | DeepSeek V4 | Claude S4.6 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-28 16:40 |
| S4b — Relay Transform | DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.82 | · / ✓ | · | · | 05-28 16:39 |
| S4a — Relay Lookup | DeepSeek V4 | Claude S4.6 | win | 6 | 0.80 | 1.00 | · / · | · | · | 05-28 16:38 |
| S3 — Combined | DeepSeek V4 | Claude S4.6 | win | 14 | 0.54 | 0.77 | · / ✓ | · | · | 05-28 16:37 |
| S2 — Mirrored | DeepSeek V4 | Claude S4.6 | win | 11 | 0.78 | 0.88 | · / · | · | · | 05-28 16:35 |
| S1 — Signal Room | DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.86 | · / · | · | · | 05-28 16:35 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 | unstable | 16 | 1.00 | 0.12 | · / · | 2⚠ | · | 05-28 16:34 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-28 16:33 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 | win | 9 | 0.86 | 1.00 | · / · | · | · | 05-28 16:32 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 | win | 12 | 1.00 | 0.82 | · / ✓ | · | · | 05-28 16:32 |
| S4b — Relay Transform | GPT-5.5 | Gemini 3.1 | win | 8 | 0.85 | 0.95 | · / ✓ | · | · | 05-28 16:31 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:31 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 | win | 14 | 0.73 | 0.80 | · / · | · | · | 05-28 16:30 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-28 16:30 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 | win | 14 | 0.70 | 0.84 | · / ✓ | · | · | 05-28 16:29 |
| S4b — Relay Transform | GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:28 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 | win | 12 | 1.00 | 0.50 | · / · | 2⚠ | 1↺ | 05-28 16:28 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 | win | 14 | 0.60 | 0.91 | · / · | · | · | 05-28 16:27 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-28 16:26 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 | win | 10 | 0.88 | 0.86 | · / · | · | · | 05-28 16:26 |
| S4b — Relay Transform | Claude S4.6 | Grok 4.3 | win | 8 | 0.85 | 1.00 | · / · | · | · | 05-28 16:25 |
| S4a — Relay Lookup | Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:25 |
| S3 — Combined | Claude S4.6 | Grok 4.3 | win | 18 | 0.64 | 1.00 | ✓ / · | · | · | 05-28 16:23 |
| S2 — Mirrored | Claude S4.6 | Grok 4.3 | win | 11 | 0.73 | 0.72 | ✓ / · | · | · | 05-28 16:22 |
| S1 — Signal Room | Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-28 16:21 |
| S4b — Relay Transform | Claude S4.6 | Grok 4.3 | win | 8 | 0.85 | 1.00 | · / · | · | · | 05-28 16:20 |
| S4a — Relay Lookup | Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:20 |
| S3 — Combined | Claude S4.6 | Grok 4.3 | win | 18 | 0.64 | 1.00 | ✓ / · | · | · | 05-28 16:19 |
| S2 — Mirrored | Claude S4.6 | Grok 4.3 | win | 11 | 0.97 | 1.00 | ✓ / · | · | · | 05-28 16:18 |
| S1 — Signal Room | Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-28 16:17 |
| S4b — Relay Transform | Claude S4.6 | Grok 4.3 | win | 8 | 0.85 | 1.00 | · / · | · | · | 05-28 16:16 |
| S4a — Relay Lookup | Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:16 |
| S3 — Combined | Claude S4.6 | Grok 4.3 | win | 8 | 0.68 | 0.82 | · / · | · | · | 05-28 16:15 |
| S2 — Mirrored | Claude S4.6 | Grok 4.3 | win | 13 | 0.67 | 0.77 | ✓ / · | · | · | 05-28 16:14 |
| S1 — Signal Room | Claude S4.6 | Grok 4.3 | win | 8 | 1.00 | 0.85 | · / · | · | · | 05-28 16:14 |
| S4b — Relay Transform | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:13 |
| S4a — Relay Lookup | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:13 |
| S3 — Combined | Claude S4.6 | GPT-5.5 | win | 10 | 0.74 | 1.00 | · / · | · | · | 05-28 16:12 |
| S2 — Mirrored | Claude S4.6 | GPT-5.5 | win | 9 | 0.96 | 1.00 | ✓ / · | · | · | 05-28 16:11 |
| S1 — Signal Room | Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-28 16:11 |
| S4b — Relay Transform | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:10 |
| S4a — Relay Lookup | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:10 |
| S3 — Combined | Claude S4.6 | GPT-5.5 | win | 12 | 0.63 | 1.00 | ✓ / · | · | · | 05-28 16:09 |
| S2 — Mirrored | Claude S4.6 | GPT-5.5 | win | 11 | 0.73 | 0.72 | ✓ / · | · | · | 05-28 16:08 |
| S1 — Signal Room | Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-28 16:08 |
| S4b — Relay Transform | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:08 |
| S4a — Relay Lookup | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-28 16:07 |
| S3 — Combined | Claude S4.6 | GPT-5.5 | win | 8 | 0.68 | 0.82 | · / · | · | · | 05-28 16:06 |
| S2 — Mirrored | Claude S4.6 | GPT-5.5 | win | 9 | 0.82 | 1.00 | ✓ / · | · | · | 05-28 16:06 |
| S1 — Signal Room | Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-28 16:05 |
| S4b — Relay Transform | Cohere Cmd-A | Mistral Lg | unstable | 14 | 1.00 | 0.29 | · / · | 2⚠ | · | 05-27 22:02 |
| S4a — Relay Lookup | Cohere Cmd-A | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 22:01 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | · / · | · | · | 05-27 21:59 |
| S2 — Mirrored | Cohere Cmd-A | Mistral Lg | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-27 21:58 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.89 | · / ✓ | · | · | 05-27 21:55 |
| S4b — Relay Transform | Cohere Cmd-A | Mistral Lg | unstable | 13 | 1.00 | 0.33 | · / · | 4⚠ | · | 05-27 21:53 |
| S4a — Relay Lookup | Cohere Cmd-A | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 21:52 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.73 | · / · | · | · | 05-27 21:48 |
| S2 — Mirrored | Cohere Cmd-A | Mistral Lg | win | 13 | 0.83 | 1.00 | ✓ / · | · | · | 05-27 21:46 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · / · | · | · | 05-27 21:44 |
| S4b — Relay Transform | Cohere Cmd-A | Mistral Lg | unstable | 14 | 1.00 | 0.14 | · / · | 2⚠ | · | 05-27 21:42 |
| S4a — Relay Lookup | Cohere Cmd-A | Mistral Lg | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 21:41 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | unstable | 28 | 0.46 | 0.32 | · / ✓ | 5⚠ | 1↺ | 05-27 21:35 |
| S2 — Mirrored | Cohere Cmd-A | Mistral Lg | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-27 21:34 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · / · | · | · | 05-27 21:32 |
| S4b — Relay Transform | Cohere Cmd-A | Mistral Lg | active | 4 | 1.00 | 0.00 | · / · | · | · | 05-27 21:21 |
| S4a — Relay Lookup | Cohere Cmd-A | Mistral Lg | win | 11 | 0.85 | 1.00 | · / · | · | · | 05-27 21:20 |
| S3 — Combined | Cohere Cmd-A | Mistral Lg | win | 19 | 0.48 | 1.00 | · / · | · | · | 05-27 21:19 |
| S2 — Mirrored | Cohere Cmd-A | Mistral Lg | win | 12 | 0.98 | 1.00 | ✓ / · | · | · | 05-27 21:18 |
| S1 — Signal Room | Cohere Cmd-A | Mistral Lg | win | 16 | 0.89 | 0.88 | · / · | · | · | 05-27 21:17 |
| S4b — Relay Transform | DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.82 | · / ✓ | · | · | 05-27 21:16 |
| S4a — Relay Lookup | DeepSeek V4 | Claude S4.6 | win | 10 | 0.83 | 1.00 | · / · | · | · | 05-27 21:15 |
| S3 — Combined | DeepSeek V4 | Claude S4.6 | win | 11 | 0.62 | 0.85 | · / ✓ | · | · | 05-27 21:14 |
| S2 — Mirrored | DeepSeek V4 | Claude S4.6 | win | 13 | 0.80 | 0.80 | · / ✓ | · | · | 05-27 21:13 |
| S1 — Signal Room | DeepSeek V4 | Claude S4.6 | win | 13 | 0.77 | 0.67 | · / ✓ | · | · | 05-27 21:12 |
| S3 — Combined | DeepSeek V4 | Claude S4.6 | unstable | 29 | 0.70 | 0.52 | · / · | 7⚠ | 1↺ | 05-27 21:07 |
| S2 — Mirrored | DeepSeek V4 | Claude S4.6 | win | 11 | 0.90 | 0.88 | · / · | · | · | 05-27 21:07 |
| S1 — Signal Room | DeepSeek V4 | Claude S4.6 | win | 11 | 1.00 | 0.73 | · / ✓ | · | · | 05-27 21:06 |
| S3 — Combined | DeepSeek V4 | Claude S4.6 | unstable | 27 | 0.70 | 0.73 | · / · | 5⚠ | · | 05-27 21:02 |
| S2 — Mirrored | DeepSeek V4 | Claude S4.6 | win | 14 | 0.87 | 0.82 | ✓ / ✓ | · | · | 05-27 21:00 |
| S1 — Signal Room | DeepSeek V4 | Claude S4.6 | win | 18 | 0.61 | 0.64 | · / ✓ | · | · | 05-27 20:58 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 | unstable | 14 | 0.91 | 0.29 | · / · | 2⚠ | · | 05-27 20:58 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 | win | 12 | 0.57 | 0.90 | · / · | · | · | 05-27 20:57 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-27 20:57 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · / · | · | · | 05-27 20:56 |
| S4b — Relay Transform | GPT-5.5 | Gemini 3.1 | win | 8 | 1.00 | 0.95 | · / ✓ | · | · | 05-27 20:56 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:55 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 | win | 12 | 0.68 | 0.90 | · / · | · | · | 05-27 20:55 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · / · | · | · | 05-27 20:54 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 | win | 12 | 0.88 | 0.82 | · / ✓ | · | · | 05-27 20:53 |
| S4b — Relay Transform | GPT-5.5 | Gemini 3.1 | win | 8 | 1.00 | 0.95 | · / ✓ | · | · | 05-27 20:53 |
| S4a — Relay Lookup | GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:53 |
| S3 — Combined | GPT-5.5 | Gemini 3.1 | win | 12 | 0.55 | 0.85 | · / ✓ | · | · | 05-27 20:52 |
| S2 — Mirrored | GPT-5.5 | Gemini 3.1 | win | 9 | 0.88 | 1.00 | · / · | · | · | 05-27 20:52 |
| S1 — Signal Room | GPT-5.5 | Gemini 3.1 | win | 10 | 1.00 | 0.82 | · / ✓ | · | · | 05-27 20:51 |
| S4b — Relay Transform | Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:51 |
| S4a — Relay Lookup | Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:50 |
| S3 — Combined | Claude S4.6 | Grok 4.3 | win | 16 | 0.70 | 0.80 | ✓ / · | 1⚠ | · | 05-27 20:49 |
| S2 — Mirrored | Claude S4.6 | Grok 4.3 | win | 9 | 0.96 | 1.00 | ✓ / · | · | · | 05-27 20:49 |
| S1 — Signal Room | Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.86 | · / · | · | · | 05-27 20:48 |
| S4a — Relay Lookup | Claude S4.6 | Grok 4.3 | unstable | 18 | 0.98 | 0.33 | ✓ / · | 2⚠ | · | 05-27 20:47 |
| S3 — Combined | Claude S4.6 | Grok 4.3 | win | 14 | 0.59 | 1.00 | ✓ / · | · | · | 05-27 20:46 |
| S2 — Mirrored | Claude S4.6 | Grok 4.3 | win | 9 | 0.96 | 1.00 | ✓ / · | · | · | 05-27 20:45 |
| S1 — Signal Room | Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · / · | · | · | 05-27 20:44 |
| S4b — Relay Transform | Claude S4.6 | Grok 4.3 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-27 20:44 |
| S4a — Relay Lookup | Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:44 |
| S3 — Combined | Claude S4.6 | Grok 4.3 | win | 20 | 0.59 | 1.00 | ✓ / · | · | · | 05-27 20:42 |
| S2 — Mirrored | Claude S4.6 | Grok 4.3 | win | 9 | 0.96 | 1.00 | ✓ / · | · | · | 05-27 20:41 |
| S1 — Signal Room | Claude S4.6 | Grok 4.3 | win | 10 | 0.96 | 1.00 | ✓ / · | · | · | 05-27 20:41 |
| S4b — Relay Transform | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:41 |
| S4a — Relay Lookup | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:40 |
| S3 — Combined | Claude S4.6 | GPT-5.5 | win | 32 | 0.43 | 0.75 | ✓ / · | · | · | 05-27 20:38 |
| S2 — Mirrored | Claude S4.6 | GPT-5.5 | win | 11 | 0.97 | 1.00 | ✓ / · | · | · | 05-27 20:36 |
| S1 — Signal Room | Claude S4.6 | GPT-5.5 | win | 8 | 0.95 | 1.00 | ✓ / · | · | · | 05-27 20:36 |
| S4b — Relay Transform | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:35 |
| S4a — Relay Lookup | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:35 |
| S3 — Combined | Claude S4.6 | GPT-5.5 | win | 12 | 0.63 | 1.00 | ✓ / · | · | · | 05-27 20:34 |
| S2 — Mirrored | Claude S4.6 | GPT-5.5 | win | 11 | 0.97 | 1.00 | ✓ / · | · | · | 05-27 20:33 |
| S1 — Signal Room | Claude S4.6 | GPT-5.5 | win | 10 | 0.96 | 0.88 | ✓ / · | · | · | 05-27 20:33 |
| S4b — Relay Transform | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:32 |
| S4a — Relay Lookup | Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · / · | · | · | 05-27 20:32 |
| S3 — Combined | Claude S4.6 | GPT-5.5 | win | 12 | 0.63 | 1.00 | ✓ / · | · | · | 05-27 20:31 |
| S2 — Mirrored | Claude S4.6 | GPT-5.5 | win | 11 | 0.97 | 1.00 | ✓ / · | · | · | 05-27 20:30 |
| S1 — Signal Room | Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · / · | · | · | 05-27 20:29 |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Mistral Lg | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | GLM 5.1 | win | 10 | 0.88 | 0.88 | · | · |
| Cohere Cmd-A | Mistral Lg | unstable | 43 | 0.91 | 0.40 | 2×incomplete 1×recovered | 18×incomplete |
| DeepSeek V4 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | Mistral Lg | win | 14 | 0.83 | 0.89 | · | scratch |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 14 | 0.73 | 1.00 | 1×incomplete | · |
| Gemini 3.1 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Mistral Lg | win | 14 | 0.83 | 0.89 | · | scratch |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 10 | 0.88 | 1.00 | · | · |
| Grok 4.3 | GLM 5.1 | win | 10 | 0.88 | 0.88 | · | · |
| Grok 4.3 | Mistral Lg | win | 12 | 0.90 | 0.87 | · | scratch |
| Grok 4.3 | Cohere Cmd-A | win | 13 | 1.00 | 0.83 | · | scratch |
| Grok 4.3 | DeepSeek V4 Pro | win | 10 | 0.88 | 0.88 | · | · |
| Grok 4.3 | Gemini 3.1 Pro | win | 8 | 1.00 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 12 | 0.80 | 0.90 | · | · |
| GPT-5.5 | Mistral Lg | win | 10 | 0.88 | 0.88 | · | · |
| GPT-5.5 | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| GPT-5.5 | DeepSeek V4 Pro | win | 10 | 0.88 | 0.88 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 10 | 0.88 | 0.88 | · | · |
| GPT-5.5 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 10 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 10 | 0.88 | 1.00 | · | · |
| Claude Opus 4.8 | Cohere Cmd-A | win | 20 | 0.90 | 1.00 | memory | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 1.00 | 1.00 | · | · |
| GLM 5.1 | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 8 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GLM 5.1 | unknown | 0 | 0.00 | 0.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 20 | 0.82 | 0.90 | memory | · |
| Claude Opus 4.8 | Cohere Cmd-A | win | 16 | 0.81 | 1.00 | memory 1×incomplete | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 10 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 14 | 0.91 | 0.73 | · | scratch 1×incomplete |
| Grok 4.3 | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 1.00 | · | · |
| Mistral Lg | Grok 4.3 | win | 14 | 0.67 | 0.91 | 1×incomplete 1×impossible | · |
| Mistral Lg | DeepSeek V4 | win | 16 | 0.71 | 0.97 | 1×incomplete 1×impossible | scratch |
| Mistral Lg | DeepSeek V4 | win | 10 | 0.88 | 1.00 | · | · |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Grok 4.3 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · | 1×incomplete |
| Grok 4.3 | Gemini 3.1 | win | 14 | 0.81 | 0.84 | 1×incomplete | scratch 1×incomplete |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 1.00 | 1.00 | · | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.88 | 0.88 | · | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.88 | 1.00 | · | · |
| DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.90 | 1.00 | · | · |
| DeepSeek V4 | Cohere Cmd-A | unstable | 31 | 0.82 | 1.00 | scratch 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.89 | · | scratch |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · | · |
| Mistral Lg | Grok 4.3 | win | 12 | 0.78 | 0.90 | 1×incomplete | · |
| Mistral Lg | DeepSeek V4 | win | 10 | 0.88 | 0.88 | · | · |
| Mistral Lg | DeepSeek V4 | unstable | 28 | 0.49 | 0.96 | scratch 1×incomplete 5×impossible | · |
| Mistral Lg | Cohere Cmd-A | win | 16 | 0.84 | 0.88 | 1×incomplete | · |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Mistral Lg | Cohere Cmd-A | win | 12 | 0.90 | 0.97 | · | scratch |
| Grok 4.3 | Gemini 3.1 | win | 14 | 0.71 | 0.73 | 2×incomplete | scratch 1×incomplete |
| Grok 4.3 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · | 1×incomplete |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.88 | 0.88 | · | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.86 | 0.88 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.86 | 0.88 | 1×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 14 | 0.83 | 0.97 | · | scratch |
| DeepSeek V4 | Cohere Cmd-A | win | 16 | 0.82 | 1.00 | scratch | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.74 | · | scratch |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.89 | · | scratch |
| Cohere Cmd-A | Mistral Lg | win | 14 | 1.00 | 0.83 | · | scratch |
| Cohere Cmd-A | Mistral Lg | unstable | 32 | 0.80 | 0.60 | 2×incomplete | scratch |
| Cohere Cmd-A | Grok 4.3 | win | 10 | 0.88 | 0.88 | · | · |
| Cohere Cmd-A | Mistral Lg | unknown | 1 | 0.00 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 12 | 0.78 | 0.85 | 1×incomplete | scratch 1×incomplete |
| DeepSeek V4 | Claude S4.6 | win | 8 | 1.00 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.86 | · | 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 12 | 1.00 | 0.82 | · | scratch memory 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 14 | 0.70 | 0.84 | 3×incomplete | scratch memory 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 10 | 0.88 | 0.86 | · | 1×incomplete |
| Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · | · |
| Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · | · |
| Claude S4.6 | Grok 4.3 | win | 8 | 1.00 | 0.85 | · | · |
| Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.83 | 0.89 | · | scratch |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.80 | 0.90 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 16 | 0.89 | 0.88 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 13 | 0.77 | 0.67 | 2×incomplete | scratch 3×incomplete |
| DeepSeek V4 | Claude S4.6 | win | 11 | 1.00 | 0.73 | · | scratch 2×incomplete |
| DeepSeek V4 | Claude S4.6 | win | 18 | 0.61 | 0.64 | 5×incomplete | scratch 2×incomplete 1×impossible |
| GPT-5.5 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · | 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 12 | 0.88 | 0.82 | 1×incomplete | scratch memory 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 10 | 1.00 | 0.82 | · | scratch 1×incomplete |
| Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.86 | · | 1×incomplete |
| Claude S4.6 | Grok 4.3 | win | 10 | 0.88 | 0.88 | · | · |
| Claude S4.6 | Grok 4.3 | win | 10 | 0.96 | 1.00 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 8 | 0.95 | 1.00 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 10 | 0.96 | 0.88 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 8 | 1.00 | 1.00 | · | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Mistral Lg | GLM 5.1 | win | 21 | 0.56 | 0.94 | scratch | · |
| Cohere Cmd-A | GLM 5.1 | win | 9 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 9 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | GLM 5.1 | win | 11 | 0.90 | 0.88 | · | · |
| DeepSeek V4 Pro | Mistral Lg | win | 11 | 0.90 | 0.88 | · | · |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 7 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | GLM 5.1 | win | 9 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Mistral Lg | win | 9 | 0.88 | 1.00 | · | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 11 | 0.90 | 0.88 | · | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 11 | 0.90 | 0.88 | · | · |
| Grok 4.3 | GLM 5.1 | win | 11 | 0.90 | 0.88 | · | · |
| Grok 4.3 | Mistral Lg | win | 11 | 0.90 | 0.88 | · | · |
| Grok 4.3 | Cohere Cmd-A | win | 13 | 0.83 | 0.80 | · | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 15 | 0.68 | 0.73 | 2×incomplete | 1×incomplete |
| Grok 4.3 | Gemini 3.1 Pro | win | 9 | 0.88 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 9 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Mistral Lg | win | 7 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Cohere Cmd-A | win | 9 | 0.88 | 1.00 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 9 | 0.88 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 9 | 0.88 | 1.00 | · | · |
| GPT-5.5 | Grok 4.3 | win | 9 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 9 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 9 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Cohere Cmd-A | win | 7 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 9 | 0.96 | 1.00 | scratch | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 9 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 7 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 9 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 9 | 0.88 | 1.00 | · | · |
| Grok 4.3 | GLM 5.1 | win | 11 | 0.88 | 0.88 | 1×incomplete | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 9 | 0.88 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 7 | 1.00 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 9 | 1.00 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 9 | 0.88 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 11 | 0.78 | 0.88 | 1×incomplete | · |
| GPT-5.5 | Gemini 3.1 | win | 9 | 0.86 | 1.00 | 1×incomplete | · |
| GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 11 | 0.73 | 0.72 | scratch 2×incomplete | 2×incomplete |
| Claude S4.6 | Grok 4.3 | win | 11 | 0.97 | 1.00 | scratch | · |
| Claude S4.6 | Grok 4.3 | win | 13 | 0.67 | 0.77 | scratch 3×incomplete | 2×incomplete |
| Claude S4.6 | GPT-5.5 | win | 9 | 0.96 | 1.00 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 11 | 0.73 | 0.72 | scratch 2×incomplete | 2×incomplete |
| Claude S4.6 | GPT-5.5 | win | 9 | 0.82 | 1.00 | scratch 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 9 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 13 | 0.83 | 1.00 | scratch | · |
| Cohere Cmd-A | Mistral Lg | win | 9 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.98 | 1.00 | scratch | · |
| DeepSeek V4 | Claude S4.6 | win | 13 | 0.80 | 0.80 | · | scratch |
| DeepSeek V4 | Claude S4.6 | win | 11 | 0.90 | 0.88 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 14 | 0.87 | 0.82 | scratch | scratch |
| GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 9 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 9 | 0.88 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 9 | 0.96 | 1.00 | scratch | · |
| Claude S4.6 | Grok 4.3 | win | 9 | 0.96 | 1.00 | scratch | · |
| Claude S4.6 | Grok 4.3 | win | 9 | 0.96 | 1.00 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 11 | 0.97 | 1.00 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 11 | 0.97 | 1.00 | scratch | · |
| Claude S4.6 | GPT-5.5 | win | 11 | 0.97 | 1.00 | scratch | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Cohere Cmd-A | DeepSeek V4 Pro | unknown | 1 | 1.00 | 0.00 | · | · |
| Mistral Lg | GLM 5.1 | win | 8 | 0.65 | 1.00 | 2×incomplete | · |
| Cohere Cmd-A | GLM 5.1 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | unstable | 27 | 0.41 | 0.52 | 10×incomplete 1×recovered | scratch 1×incomplete 1×recovered |
| DeepSeek V4 Pro | GLM 5.1 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| DeepSeek V4 Pro | Mistral Lg | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 29 | 0.49 | 0.90 | 4×incomplete | 1×recovered |
| Gemini 3.1 Pro | GLM 5.1 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| Gemini 3.1 Pro | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 12 | 0.68 | 0.83 | 1×incomplete | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 12 | 0.57 | 0.87 | 2×incomplete | 1×premature |
| Grok 4.3 | GLM 5.1 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| Grok 4.3 | Mistral Lg | win | 12 | 0.57 | 0.83 | 2×incomplete | · |
| Grok 4.3 | Cohere Cmd-A | win | 12 | 0.57 | 0.83 | 2×incomplete | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Grok 4.3 | Gemini 3.1 Pro | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| GPT-5.5 | GLM 5.1 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| GPT-5.5 | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| GPT-5.5 | Cohere Cmd-A | win | 16 | 0.59 | 0.75 | 3×incomplete | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| GPT-5.5 | Grok 4.3 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Claude Opus 4.8 | GLM 5.1 | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Claude Opus 4.8 | Mistral Lg | win | 12 | 0.57 | 0.88 | 2×incomplete | 1×incomplete |
| Claude Opus 4.8 | Cohere Cmd-A | unstable | 22 | 0.76 | 0.45 | 2×incomplete | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Claude Opus 4.8 | Grok 4.3 | win | 14 | 0.53 | 0.91 | 3×incomplete | · |
| Claude Opus 4.8 | GPT-5.5 | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 14 | 0.60 | 0.86 | memory 2×incomplete | · |
| Grok 4.3 | GLM 5.1 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 13 | 0.53 | 1.00 | 3×incomplete | · |
| Claude Opus 4.8 | Mistral Lg | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Cohere Cmd-A | DeepSeek V4 Pro | unknown | 3 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | unknown | 1 | 1.00 | 0.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 1.00 | 3×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.51 | 6×incomplete 1×recovered | scratch 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | DeepSeek V4 | win | 18 | 0.48 | 0.87 | 5×incomplete | · |
| Grok 4.3 | DeepSeek V4 | unknown | 0 | 0.00 | 0.00 | · | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.90 | 4×incomplete | scratch |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.82 | 4×incomplete | scratch |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 20 | 0.56 | 0.87 | 6×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | 4×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | unstable | 24 | 0.49 | 0.57 | 7×incomplete | 5×premature |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.68 | 1.00 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 14 | 0.60 | 0.91 | scratch 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 18 | 0.48 | 0.84 | 5×incomplete | scratch |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 13 | 0.53 | 1.00 | 3×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | 4×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 14 | 0.49 | 0.91 | 2×incomplete 1×impossible | · |
| Gemini 3.1 | DeepSeek V4 | win | 11 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 13 | 0.53 | 1.00 | 3×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | 4×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 10 | 0.62 | 1.00 | 1×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 16 | 0.50 | 0.85 | 4×incomplete | · |
| Gemini 3.1 | DeepSeek V4 | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| DeepSeek V4 | Grok 4.3 | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 22 | 0.59 | 0.93 | 3×incomplete | 1×recovered |
| DeepSeek V4 | Cohere Cmd-A | win | 14 | 0.60 | 0.86 | scratch 2×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 19 | 0.60 | 0.92 | scratch 2×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 14 | 0.60 | 1.00 | scratch 2×incomplete | · |
| DeepSeek V4 | Cohere Cmd-A | win | 12 | 0.57 | 1.00 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | unstable | 28 | 0.46 | 0.56 | 9×incomplete | scratch |
| Cohere Cmd-A | Mistral Lg | win | 16 | 0.68 | 1.00 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 0.91 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.73 | 0.91 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 0.61 | 3×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.73 | 0.91 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.64 | 6×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 0.61 | 3×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.53 | 1.00 | 3×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 18 | 0.48 | 0.52 | 5×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 16 | 0.50 | 0.64 | 4×incomplete | scratch 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 0.91 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | unstable | 27 | 0.41 | 0.25 | 10×incomplete 1×recovered | 1×incomplete 1×recovered |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | unstable | 17 | 0.48 | 0.41 | 5×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 1.00 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 16 | 0.50 | 0.54 | 4×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 12 | 0.57 | 0.87 | 2×incomplete | scratch |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 0.91 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 18 | 0.63 | 1.00 | 3×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.88 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 18 | 0.48 | 0.48 | 5×incomplete | 1×incomplete 1×recovered |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.73 | 0.91 | 1×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.90 | 6×incomplete | scratch |
| Cohere Cmd-A | Mistral Lg | win | 19 | 0.74 | 0.89 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 28 | 0.41 | 0.38 | 10×incomplete | 1×incomplete 2×recovered |
| Cohere Cmd-A | Mistral Lg | win | 13 | 0.53 | 1.00 | 3×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 14 | 0.63 | 1.00 | 2×incomplete | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | DeepSeek V4 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| DeepSeek V4 | Claude S4.6 | win | 12 | 0.57 | 0.73 | 2×incomplete | scratch 2×incomplete |
| DeepSeek V4 | Claude S4.6 | win | 14 | 0.53 | 0.77 | 3×incomplete | scratch 2×incomplete |
| DeepSeek V4 | Claude S4.6 | win | 14 | 0.54 | 0.77 | 2×incomplete | scratch 2×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| GPT-5.5 | Gemini 3.1 | win | 14 | 0.73 | 0.80 | 1×incomplete | 1×premature |
| GPT-5.5 | Gemini 3.1 | win | 14 | 0.60 | 0.91 | memory 2×incomplete | · |
| Claude S4.6 | Grok 4.3 | win | 18 | 0.64 | 1.00 | scratch 2×incomplete | · |
| Claude S4.6 | Grok 4.3 | win | 18 | 0.64 | 1.00 | scratch 2×incomplete | · |
| Claude S4.6 | Grok 4.3 | win | 8 | 0.68 | 0.82 | 1×incomplete | 1×incomplete |
| Claude S4.6 | GPT-5.5 | win | 10 | 0.74 | 1.00 | 1×incomplete | · |
| Claude S4.6 | GPT-5.5 | win | 12 | 0.63 | 1.00 | scratch 2×incomplete | · |
| Claude S4.6 | GPT-5.5 | win | 8 | 0.68 | 0.82 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.62 | 0.86 | 1×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 20 | 0.46 | 0.73 | 6×incomplete | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | unstable | 28 | 0.46 | 0.32 | 9×incomplete 1×recovered | scratch 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 19 | 0.48 | 1.00 | 8×incomplete | · |
| DeepSeek V4 | Claude S4.6 | win | 11 | 0.62 | 0.85 | 1×incomplete | scratch 1×incomplete |
| DeepSeek V4 | Claude S4.6 | unstable | 29 | 0.70 | 0.52 | · | 17×incomplete 1×recovered |
| DeepSeek V4 | Claude S4.6 | unstable | 27 | 0.70 | 0.73 | · | 9×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 12 | 0.57 | 0.90 | 2×incomplete | · |
| GPT-5.5 | Gemini 3.1 | win | 12 | 0.68 | 0.90 | 1×incomplete | · |
| GPT-5.5 | Gemini 3.1 | win | 12 | 0.55 | 0.85 | 3×incomplete | scratch 1×incomplete |
| Claude S4.6 | Grok 4.3 | win | 16 | 0.70 | 0.80 | scratch 2×incomplete | 2×premature |
| Claude S4.6 | Grok 4.3 | win | 14 | 0.59 | 1.00 | scratch 3×incomplete | · |
| Claude S4.6 | Grok 4.3 | win | 20 | 0.59 | 1.00 | scratch 3×incomplete | · |
| Claude S4.6 | GPT-5.5 | win | 32 | 0.43 | 0.75 | scratch 1×incomplete 7×impossible | 5×premature |
| Claude S4.6 | GPT-5.5 | win | 12 | 0.63 | 1.00 | scratch 2×incomplete | · |
| Claude S4.6 | GPT-5.5 | win | 12 | 0.63 | 1.00 | scratch 2×incomplete | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Mistral Lg | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | GLM 5.1 | win | 10 | 0.76 | 0.88 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.76 | 0.88 | · | · |
| DeepSeek V4 Pro | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 8 | 0.85 | 0.85 | · | · |
| Gemini 3.1 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| Grok 4.3 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| Grok 4.3 | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Grok 4.3 | Cohere Cmd-A | win | 8 | 0.85 | 0.85 | · | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 9 | 0.76 | 1.00 | · | · |
| Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Cohere Cmd-A | unstable | 14 | 1.00 | 0.29 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Cohere Cmd-A | unstable | 18 | 1.00 | 0.44 | · | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 12 | 1.00 | 0.73 | · | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · | · |
| Grok 4.3 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | unstable | 14 | 1.00 | 0.29 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 6 | 0.80 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 6 | 0.80 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 6 | 0.80 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | unstable | 16 | 1.00 | 0.12 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 12 | 1.00 | 0.50 | · | 1×recovered |
| Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 11 | 0.85 | 1.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 10 | 0.83 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | unstable | 14 | 0.91 | 0.29 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | unstable | 18 | 0.98 | 0.33 | scratch | · |
| Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Mistral Lg | GLM 5.1 | win | 8 | 1.00 | 0.95 | · | scratch |
| Cohere Cmd-A | GLM 5.1 | win | 10 | 0.76 | 0.72 | · | 2×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.76 | 0.76 | · | · |
| DeepSeek V4 Pro | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 14 | 0.73 | 0.91 | 1×incomplete | · |
| Gemini 3.1 Pro | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Mistral Lg | win | 8 | 1.00 | 0.95 | · | scratch |
| Gemini 3.1 Pro | Cohere Cmd-A | unstable | 14 | 1.00 | 0.29 | · | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| Grok 4.3 | GLM 5.1 | win | 8 | 0.70 | 0.82 | · | 1×incomplete |
| Grok 4.3 | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Grok 4.3 | Cohere Cmd-A | win | 6 | 1.00 | 1.00 | · | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · | · |
| Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 8 | 0.85 | 0.82 | · | 1×incomplete |
| GPT-5.5 | Mistral Lg | unstable | 14 | 1.00 | 0.29 | · | · |
| GPT-5.5 | Cohere Cmd-A | unstable | 18 | 0.80 | 0.33 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Grok 4.3 | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 10 | 0.88 | 0.82 | · | scratch 1×incomplete |
| Claude Opus 4.8 | Mistral Lg | win | 8 | 0.85 | 0.85 | · | · |
| Claude Opus 4.8 | Cohere Cmd-A | win | 10 | 0.88 | 0.88 | · | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 10 | 0.88 | 1.00 | · | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 8 | 0.85 | 0.85 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 8 | 0.70 | 0.85 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 8 | 0.85 | 0.82 | · | 1×incomplete |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 12 | 1.00 | 0.83 | · | scratch |
| Grok 4.3 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 8 | 0.85 | 0.85 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 8 | 0.85 | 0.95 | · | scratch |
| DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.82 | · | scratch 1×incomplete |
| DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.82 | · | scratch 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 8 | 0.85 | 0.95 | · | scratch |
| GPT-5.5 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 8 | 0.85 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 8 | 0.85 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 8 | 0.85 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | Mistral Lg | unstable | 14 | 1.00 | 0.29 | · | · |
| Cohere Cmd-A | Mistral Lg | unstable | 13 | 1.00 | 0.33 | · | · |
| Cohere Cmd-A | Mistral Lg | unstable | 14 | 1.00 | 0.14 | · | · |
| Cohere Cmd-A | Mistral Lg | unknown | 4 | 1.00 | 0.00 | · | · |
| DeepSeek V4 | Claude S4.6 | win | 10 | 0.88 | 0.82 | · | scratch 1×incomplete |
| GPT-5.5 | Gemini 3.1 | win | 8 | 1.00 | 0.95 | · | scratch |
| GPT-5.5 | Gemini 3.1 | win | 8 | 1.00 | 0.95 | · | scratch |
| Claude S4.6 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | Grok 4.3 | win | 8 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Claude S4.6 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Mistral Lg | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| Cohere Cmd-A | GLM 5.1 | win | 10 | 0.64 | 0.86 | · | 1×incomplete |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.76 | 0.72 | · | 2×incomplete |
| DeepSeek V4 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| DeepSeek V4 Pro | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 10 | 0.76 | 0.88 | · | · |
| Gemini 3.1 Pro | GLM 5.1 | win | 14 | 1.00 | 0.91 | · | · |
| Gemini 3.1 Pro | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 6 | 1.00 | 1.00 | · | · |
| Grok 4.3 | GLM 5.1 | win | 8 | 1.00 | 0.85 | · | · |
| Grok 4.3 | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| Grok 4.3 | Cohere Cmd-A | win | 10 | 0.88 | 0.88 | · | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 6 | 1.00 | 0.77 | · | 1×incomplete |
| Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| GPT-5.5 | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 8 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| Claude Opus 4.8 | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 6 | 1.00 | 0.80 | · | · |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| GLM 5.1 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · | · |
| Llama 3.3 70B | Llama 3.3 70B | unknown | 0 | 0.00 | 0.00 | · | · |
| Llama 3.3 70B | Llama 3.3 70B | unknown | 0 | 0.00 | 0.00 | · | · |
| Llama 3.3 70B | Llama 3.3 70B | unknown | 0 | 0.00 | 0.00 | · | · |
| Llama 3.3 70B | Mistral Lg | unknown | 1 | 0.00 | 1.00 | · | · |
| DeepSeek V4 | Gemini 3.1 | win | 12 | 0.70 | 0.78 | · | 1×incomplete |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Mistral Lg | GLM 5.1 | win | 10 | 0.96 | 0.86 | scratch | 1×incomplete |
| Cohere Cmd-A | GLM 5.1 | win | 18 | 0.67 | 0.93 | · | · |
| Cohere Cmd-A | Mistral Lg | win | 10 | 0.64 | 1.00 | · | · |
| DeepSeek V4 Pro | GLM 5.1 | win | 10 | 1.00 | 0.74 | · | 1×incomplete |
| DeepSeek V4 Pro | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| DeepSeek V4 Pro | Cohere Cmd-A | win | 10 | 0.88 | 0.88 | · | · |
| Gemini 3.1 Pro | GLM 5.1 | win | 8 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 10 | 1.00 | 0.88 | · | · |
| Grok 4.3 | GLM 5.1 | win | 12 | 1.00 | 0.90 | · | · |
| Grok 4.3 | Mistral Lg | win | 6 | 1.00 | 1.00 | · | · |
| Grok 4.3 | Cohere Cmd-A | win | 16 | 1.00 | 0.93 | · | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 8 | 1.00 | 1.00 | · | · |
| Grok 4.3 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| GPT-5.5 | Cohere Cmd-A | win | 10 | 1.00 | 0.88 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 6 | 1.00 | 0.80 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 10 | 1.00 | 0.88 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 0.72 | · | 2×incomplete |
| Claude Opus 4.8 | Cohere Cmd-A | win | 8 | 1.00 | 0.85 | · | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 8 | 1.00 | 0.82 | · | 1×incomplete |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GPT-5.5 | win | 6 | 1.00 | 1.00 | · | · |
| GLM 5.1 | Gemini 3.1 | win | 8 | 1.00 | 1.00 | · | · |
| DeepSeek V4 | Gemini 3.1 | win | 8 | 0.85 | 0.82 | · | 1×incomplete |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro | GLM 5.1 | unknown | 9 | 0.00 | 0.84 | · | 2×incomplete |
| Gemini 3.1 Pro | Mistral Lg | unstable | 48 | 1.00 | 0.72 | · | 17×incomplete 3×recovered |
| Gemini 3.1 Pro | Cohere Cmd-A | win | 16 | 0.93 | 0.93 | · | · |
| Gemini 3.1 Pro | DeepSeek V4 Pro | win | 10 | 1.00 | 1.00 | · | · |
| Grok 4.3 | GLM 5.1 | win | 12 | 1.00 | 0.97 | · | scratch |
| Grok 4.3 | Mistral Lg | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| Grok 4.3 | Cohere Cmd-A | win | 16 | 0.93 | 0.93 | · | · |
| Grok 4.3 | DeepSeek V4 Pro | win | 12 | 1.00 | 1.00 | · | · |
| Grok 4.3 | Gemini 3.1 Pro | win | 10 | 1.00 | 1.00 | · | · |
| GPT-5.5 | GLM 5.1 | win | 14 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Mistral Lg | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| GPT-5.5 | Cohere Cmd-A | win | 16 | 0.85 | 0.93 | · | · |
| GPT-5.5 | DeepSeek V4 Pro | win | 10 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Gemini 3.1 Pro | win | 10 | 1.00 | 1.00 | · | · |
| GPT-5.5 | Grok 4.3 | win | 10 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | GLM 5.1 | win | 10 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Mistral Lg | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| Claude Opus 4.8 | Cohere Cmd-A | win | 16 | 0.93 | 0.93 | · | · |
| Claude Opus 4.8 | DeepSeek V4 Pro | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| Claude Opus 4.8 | Gemini 3.1 Pro | win | 10 | 1.00 | 1.00 | · | · |
| Claude Opus 4.8 | Grok 4.3 | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| Claude Opus 4.8 | GPT-5.5 | win | 10 | 1.00 | 0.86 | · | 1×incomplete |
| GLM 5.1 | Gemini 3.1 | win | 12 | 1.00 | 0.78 | · | 1×incomplete |
| DeepSeek V4 | Gemini 3.1 | win | 8 | 1.00 | 0.82 | · | 1×incomplete |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |
| Agent A | Agent B | Outcome | Turns | A Score | B Score | A Behaviour | B Behaviour |
|---|---|---|---|---|---|---|---|
| GLM 5.1 | Gemini 3.1 | win | 6 | 1.00 | 1.00 | · | · |
| DeepSeek V4 | Gemini 3.1 | win | 8 | 0.85 | 0.82 | · | 1×incomplete |
| GPT-5.5 | Grok 4.3 | win | 6 | 1.00 | 1.00 | · | · |