// synthetic data engine — research output

Agentic Collaboration
Behavioural Report

Generated 2026-05-30 22:11  ·  525 sessions across 9 scenarios
01 //

Overview

Total Sessions: 525
Win States Reached: 487
Win Rate: 93%
Avg Training Score: 0.86
Timeouts: 0
Instability Closes: 0
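
The win rate shown above follows from the two session counts. A minimal Python check, assuming the percentage is rounded to the nearest whole number (the helper name `win_rate_percent` is illustrative and not part of the engine):

```python
def win_rate_percent(wins: int, total_sessions: int) -> int:
    """Return the win rate as a whole-number percentage, rounded to nearest."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return round(100 * wins / total_sessions)


# Counts taken from the overview figures: 487 wins out of 525 sessions.
print(win_rate_percent(487, 525))  # 487 / 525 = 0.9276..., reported as 93%
```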
02 //

Run Summary

Run Label Run ID Scenarios Wins Unstable Total Turns Avg Score Started
epoch_cohere_deepseek 046b0253 1 0 0 1 1.00 05-30 16:25
- e7ece5c0 1 1 0 8 1.00 05-30 09:54
FAIR_MATRIX_epoch_001_perturbation_standard_relay_chamber_5c_gemini_zai 7974f123 8 208 9 2200 0.90 05-30 09:53
fair_matrix_epoch_001_normal_signal_room_v1_anthropic_zai e63d6813 1 6 0 70 0.94 05-30 09:40
comparison_gemini_cohere 8052ef50 5 5 0 57 0.87 05-29 20:21
comparison_grok_zai 2efc8fb6 5 5 0 41 0.94 05-29 20:17
comparison_gpt_deepseek eeaf3b28 5 5 0 46 0.93 05-29 20:11
comparison_claude_mistral 2c01bd7d 5 4 1 51 0.86 05-29 20:05
epoch_cohere_deepseek 22d2628b 1 0 0 3 1.00 05-29 19:54
- 4af94623 4 4 0 32 0.97 05-29 19:36
- 5e0f6aee 1 0 0 0 0.00 05-29 19:22
- 0e05ca2c 1 0 0 0 0.00 05-29 19:20
- 55f18d1a 1 0 0 0 0.00 05-29 19:15
- 96c6e641 1 0 0 1 1.00 05-29 19:13
- e67d6c75 4 4 0 36 0.83 05-29 19:07
- f3901c2c 4 4 0 24 1.00 05-29 18:59
epoch_cohere_mistral 22c07706 1 0 0 1 1.00 05-29 08:37
epoch_cohere_mistral fbbb6542 1 1 0 14 0.77 05-29 08:35
epoch_cohere_mistral 9f2a94bc 1 1 0 10 0.74 05-29 08:34
epoch_cohere_mistral b4dfb05e 1 1 0 10 0.74 05-29 08:32
epoch_cohere_mistral 4b5f1575 1 1 0 20 0.48 05-29 08:30
epoch_cohere_mistral 8a4dc955 1 1 0 10 0.74 05-29 08:28
epoch_cohere_mistral a8a58996 1 1 0 10 0.74 05-29 08:27
epoch_cohere_mistral 05b9ba96 1 1 0 10 0.81 05-29 08:26
epoch_cohere_mistral 54124a8a 1 1 0 10 0.74 05-29 08:24
epoch_cohere_deepseek cbab34dd 1 1 0 18 0.68 05-29 08:23
epoch_grok_deepseek 797d56e8 1 0 0 0 0.00 05-28 21:00
epoch_gemini_deepseek 081e643c 1 1 0 12 0.78 05-28 20:59
epoch_gemini_deepseek 92af76d1 1 1 0 16 0.70 05-28 20:58
epoch_gemini_deepseek d5bc2533 1 1 0 12 0.78 05-28 20:57
epoch_gemini_deepseek 036ccd08 1 1 0 16 0.66 05-28 20:55
epoch_gemini_deepseek b191889f 1 1 0 12 0.78 05-28 20:54
epoch_gemini_deepseek ca035fa3 1 1 0 20 0.72 05-28 20:52
epoch_gemini_deepseek 240d6201 1 1 0 16 0.68 05-28 20:50
epoch_gemini_deepseek 3b769242 1 0 1 24 0.53 05-28 20:48
epoch_gemini_deepseek eae08873 1 1 0 12 0.78 05-28 20:48
epoch_gemini_deepseek f2cc183d 1 1 0 12 0.84 05-28 20:47
epoch_gemini_deepseek ad228094 1 1 0 14 0.76 05-28 20:45
epoch_gemini_deepseek d2f54c19 1 1 0 18 0.66 05-28 20:44
epoch_gemini_deepseek 019761c2 1 1 0 12 0.78 05-28 20:43
epoch_gemini_deepseek fb6f7db4 1 1 0 10 0.75 05-28 20:42
epoch_gemini_deepseek e133c8de 1 1 0 13 0.77 05-28 20:41
epoch_gemini_deepseek 58ac31f1 1 1 0 12 0.78 05-28 20:40
epoch_gemini_deepseek cbef1a8c 1 1 0 16 0.68 05-28 20:38
epoch_gemini_deepseek 2f7f1c18 1 1 0 12 0.78 05-28 20:38
epoch_gemini_deepseek 3eec062f 1 1 0 12 0.78 05-28 20:37
epoch_gemini_deepseek 825725ec 1 1 0 12 0.73 05-28 20:36
epoch_gemini_deepseek 7129e914 1 1 0 14 0.70 05-28 20:34
epoch_gemini_deepseek 96093854 1 1 0 11 0.78 05-28 20:33
epoch_gemini_deepseek 198adf1b 1 1 0 12 0.78 05-28 20:32
epoch_gemini_deepseek 2e2716ba 1 1 0 10 0.81 05-28 20:32
epoch_gemini_deepseek 1d61b8c2 1 1 0 12 0.78 05-28 20:31
epoch_gemini_deepseek 7eea2821 1 1 0 10 0.81 05-28 20:30
epoch_gemini_deepseek 813027b7 1 1 0 12 0.73 05-28 20:29
epoch_gemini_deepseek 7bef6cb4 1 1 0 12 0.78 05-28 20:28
epoch_gemini_deepseek dbcea488 1 1 0 12 0.78 05-28 20:27
epoch_gemini_deepseek 58016d69 1 1 0 13 0.77 05-28 20:26
epoch_gemini_deepseek e964000f 1 1 0 16 0.68 05-28 20:24
epoch_gemini_deepseek fc7d23e3 1 1 0 10 0.81 05-28 20:23
epoch_gemini_deepseek e41a0ff9 1 1 0 10 0.81 05-28 20:23
epoch_gemini_deepseek fca74501 1 1 0 16 0.68 05-28 20:21
epoch_gemini_deepseek 6aa592d2 1 1 0 12 0.78 05-28 20:21
epoch_deepseek_grok 65d241f5 1 1 0 10 0.75 05-28 20:20
epoch_deepseek_cohere df1b0d33 1 1 0 22 0.76 05-28 20:15
epoch_deepseek_cohere f4e79c6b 1 1 0 14 0.73 05-28 20:14
epoch_deepseek_cohere 6d958adb 1 1 0 19 0.76 05-28 20:10
epoch_deepseek_cohere 8ee9f8fa 1 1 0 12 0.78 05-28 20:09
epoch_deepseek_cohere 2d2751d2 1 1 0 12 0.78 05-28 20:08
epoch_deepseek_cohere 5f98d552 1 1 0 14 0.80 05-28 20:07
epoch_deepseek_cohere d7cef8f6 1 1 0 12 0.78 05-28 20:06
epoch_cohere_mistral 63ab75b7 1 0 1 28 0.51 05-28 20:02
epoch_cohere_mistral 52e0fc5a 1 1 0 16 0.84 05-28 19:59
epoch_cohere_mistral 7e1860de 1 1 0 14 0.77 05-28 19:57
epoch_cohere_mistral ebe14f06 1 1 0 14 0.82 05-28 19:54
epoch_cohere_mistral 0fe0b1c8 1 1 0 14 0.57 05-28 19:52
epoch_cohere_mistral 18c8ea4e 1 1 0 14 0.82 05-28 19:50
epoch_cohere_mistral 1d4644d4 1 1 0 10 0.75 05-28 19:49
epoch_cohere_mistral 2c3f60c7 1 1 0 20 0.55 05-28 19:46
epoch_cohere_mistral 674cb1a7 1 1 0 14 0.57 05-28 19:44
epoch_cohere_mistral 9cb1610a 1 1 0 14 0.77 05-28 19:42
epoch_cohere_mistral 52b966d8 1 1 0 18 0.50 05-28 19:40
epoch_cohere_mistral eae7e83c 1 1 0 16 0.57 05-28 19:38
epoch_cohere_mistral c0b2f935 1 1 0 14 0.77 05-28 19:37
epoch_cohere_mistral 5470370d 1 0 1 27 0.33 05-28 19:33
epoch_cohere_mistral e7285cd9 1 1 0 10 0.75 05-28 19:32
epoch_cohere_mistral 0bae673e 1 0 1 17 0.44 05-28 19:29
epoch_cohere_mistral 22a39a5e 1 1 0 14 0.81 05-28 19:28
epoch_cohere_mistral afaa514d 1 1 0 10 0.75 05-28 19:26
epoch_cohere_mistral e5a11a9c 1 1 0 10 0.75 05-28 19:25
epoch_cohere_mistral 65e32232 1 1 0 16 0.52 05-28 19:23
epoch_cohere_mistral 507eb857 1 1 0 10 0.75 05-28 19:21
epoch_cohere_mistral 3ceea561 1 1 0 12 0.72 05-28 19:19
epoch_cohere_mistral 53c4b995 1 1 0 14 0.77 05-28 19:18
epoch_cohere_mistral b26e96cf 1 1 0 18 0.81 05-28 19:15
epoch_cohere_mistral 37ab60f8 1 1 0 10 0.75 05-28 19:14
epoch_cohere_mistral d6aee988 1 1 0 18 0.48 05-28 19:12
epoch_cohere_mistral 07b5c403 1 1 0 14 0.82 05-28 19:10
epoch_cohere_mistral 85532b11 1 1 0 20 0.68 05-28 19:07
epoch_cohere_mistral aaa17628 1 1 0 19 0.81 05-28 19:03
epoch_cohere_mistral 92858264 1 1 0 28 0.40 05-28 18:58
epoch_cohere_mistral e8c5442d 1 1 0 13 0.77 05-28 18:54
epoch_cohere_mistral 3ca45fa3 1 1 0 14 0.81 05-28 18:53
epoch_cohere_mistral 362e82cf 1 1 0 10 0.74 05-28 18:52
epoch_cohere_deepseek 8b4cb5b2 1 1 0 12 0.73 05-28 18:51
epoch_mistral_grok 5bf0da81 1 1 0 14 0.79 05-28 18:34
epoch_mistral_deepseek 4a5de436 1 1 0 16 0.84 05-28 18:32
epoch_mistral_deepseek 4d0f2123 1 1 0 10 0.94 05-28 18:30
epoch_mistral_cohere d321ea10 1 1 0 12 0.94 05-28 18:28
epoch_mistral_cohere d8d2cd81 1 1 0 12 0.94 05-28 18:27
epoch_mistral_cohere d30a97be 1 1 0 12 0.94 05-28 18:25
epoch_mistral_cohere 50e277b2 1 1 0 12 0.94 05-28 18:23
epoch_mistral_cohere 598ac173 1 1 0 12 0.94 05-28 18:21
epoch_grok_gemini 3ef5f0a0 1 1 0 8 0.91 05-28 18:20
epoch_grok_gemini 0db16168 1 1 0 14 0.82 05-28 18:20
epoch_gemini_deepseek b632c521 1 1 0 10 1.00 05-28 18:19
epoch_gemini_deepseek 39a24cab 1 1 0 10 0.88 05-28 18:18
epoch_gemini_deepseek dc116a62 1 1 0 10 0.94 05-28 18:17
epoch_deepseek_cohere 15fad396 1 1 0 12 0.95 05-28 18:16
epoch_deepseek_cohere a197fdb5 1 0 1 31 0.91 05-28 18:11
epoch_cohere_mistral 2159c0a8 1 1 0 12 0.85 05-28 18:10
epoch_cohere_mistral 7a15ae13 1 1 0 14 0.86 05-28 18:08
epoch_cohere_mistral eaef1895 1 1 0 12 0.85 05-28 18:06
epoch_cohere_mistral d8c2fd37 1 1 0 12 0.85 05-28 18:04
epoch_mistral_grok a62cc66b 1 1 0 12 0.84 05-28 17:59
epoch_mistral_deepseek c4c4c5c0 1 1 0 10 0.88 05-28 17:58
epoch_mistral_deepseek 781b189c 1 0 1 28 0.72 05-28 17:54
epoch_mistral_cohere 9db80718 1 1 0 16 0.86 05-28 17:52
epoch_mistral_cohere c7bf358e 1 1 0 12 0.94 05-28 17:50
epoch_mistral_cohere 7938228e 1 1 0 12 0.94 05-28 17:48
epoch_mistral_cohere 632c504b 1 1 0 12 0.94 05-28 17:47
epoch_mistral_cohere bce7f5e9 1 1 0 12 0.94 05-28 17:45
epoch_grok_gemini a62b43c0 1 1 0 14 0.72 05-28 17:44
epoch_grok_gemini 625ceac0 1 1 0 8 0.91 05-28 17:44
epoch_gemini_deepseek 8cb33f8d 1 1 0 10 0.88 05-28 17:43
epoch_gemini_deepseek 9d67f187 1 1 0 10 0.87 05-28 17:42
epoch_gemini_deepseek d23f99a8 1 1 0 10 0.87 05-28 17:42
epoch_deepseek_cohere a0fecad8 1 1 0 14 0.90 05-28 17:40
epoch_deepseek_cohere d8f47e9a 1 1 0 16 0.91 05-28 17:39
epoch_cohere_mistral 93b50a39 1 1 0 14 0.78 05-28 17:37
epoch_cohere_mistral 9205fede 1 1 0 14 0.86 05-28 17:35
epoch_cohere_mistral dd52d925 1 1 0 14 0.92 05-28 17:33
epoch_cohere_mistral db759cf0 1 0 1 32 0.70 05-28 17:29
- ca752992 1 1 0 10 0.88 05-28 17:24
epoch_cohere_mistral_001 95c9577e 1 0 0 1 1.00 05-28 16:50
epoch_deepseek_claude_003 2fd76d96 5 5 0 47 0.85 05-28 16:45
epoch_deepseek_claude_002 3c4eab20 5 5 0 47 0.87 05-28 16:40
epoch_deepseek_claude_001 a171b6b3 5 5 0 51 0.82 05-28 16:35
epoch_gpt_gemini_003 836185ac 4 3 1 49 0.78 05-28 16:32
epoch_gpt_gemini_002 e575559a 5 5 0 51 0.89 05-28 16:29
epoch_gpt_gemini_001 d2d25db4 5 5 0 51 0.88 05-28 16:26
epoch_claude_grok_003 a7b3a7bd 5 5 0 53 0.87 05-28 16:21
epoch_claude_grok_002 a118c9b4 5 5 0 53 0.92 05-28 16:17
epoch_claude_grok_001 38ebf1f8 5 5 0 43 0.86 05-28 16:14
epoch_claude_gpt_003 e8877710 5 5 0 39 0.97 05-28 16:11
epoch_claude_gpt_002 8ceafb14 5 5 0 43 0.91 05-28 16:08
epoch_claude_gpt_001 630a49ba 5 5 0 37 0.93 05-28 16:05
comparison_cohere_mistral_003 b2384ca1 5 4 1 53 0.85 05-27 21:55
comparison_cohere_mistral_002 af299df9 5 4 1 64 0.81 05-27 21:44
comparison_cohere_mistral_001 4ef1dbd1 5 3 2 69 0.76 05-27 21:32
epoch_cohere_mistral_001 cf1de6b6 5 4 0 62 0.81 05-27 21:17
epoch_deepseek_claude_003 8244a1bc 5 5 0 57 0.80 05-27 21:12
epoch_deepseek_claude_002 de618f05 3 2 1 51 0.79 05-27 21:06
epoch_deepseek_claude_001 25c996e8 3 2 1 59 0.73 05-27 20:58
epoch_gpt_gemini_003 c3a6b228 4 3 1 43 0.81 05-27 20:56
epoch_gpt_gemini_002 b008409b 5 5 0 47 0.92 05-27 20:53
epoch_gpt_gemini_001 a016322a 5 5 0 45 0.91 05-27 20:51
epoch_claude_grok_003 b57d88b2 5 5 0 47 0.92 05-27 20:48
epoch_claude_grok_002 50213929 4 3 1 51 0.83 05-27 20:44
epoch_claude_grok_001 ffac0184 5 5 0 53 0.95 05-27 20:41
epoch_claude_gpt_003 62cc15b5 5 5 0 63 0.91 05-27 20:36
epoch_claude_gpt_002 bf9b97fb 5 5 0 45 0.94 05-27 20:33
epoch_claude_gpt_001 4b8fd274 5 5 0 43 0.96 05-27 20:29
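
Each row of the run summary carries eight fields, with the start time split across a date token and a clock token. As a rough sketch of how such a row could be read back into a typed record in Python (the `RunRow` class and `parse_run_row` function are hypothetical names, and the whitespace-delimited layout is assumed from the rows as printed here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRow:
    """One line of the run summary table (hypothetical record type)."""
    label: str        # "-" when the run has no label
    run_id: str
    scenarios: int
    wins: int
    unstable: int
    total_turns: int
    avg_score: float
    started: str      # "MM-DD HH:MM", kept as text because no year is printed


def parse_run_row(line: str) -> RunRow:
    """Split a whitespace-delimited summary row into typed fields."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 tokens, got {len(parts)}: {line!r}")
    label, run_id, scenarios, wins, unstable, turns, score, day, clock = parts
    return RunRow(label, run_id, int(scenarios), int(wins), int(unstable),
                  int(turns), float(score), f"{day} {clock}")


row = parse_run_row("comparison_claude_mistral 2c01bd7d 5 4 1 51 0.86 05-29 20:05")
print(row.wins, row.unstable, row.avg_score)
```

Run labels contain no spaces in this table, which is what makes a plain `split()` sufficient; a label with spaces would need a split from the right instead.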
03 //

Model Progress

Claude Opus 4.8

67 agent sessions  ·  64 wins  ·  latest minus first: +0.00
First: 1.00  ·  Latest: 1.00  ·  Best: 1.00

Claude S4.6

85 agent sessions  ·  82 wins  ·  latest minus first: -0.05
First: 1.00  ·  Latest: 0.95  ·  Best: 1.00

Cohere Cmd-A

155 agent sessions  ·  135 wins  ·  latest minus first: +0.11
First: 0.89  ·  Latest: 1.00  ·  Best: 1.00

DeepSeek V4

89 agent sessions  ·  84 wins  ·  latest minus first: +0.24
First: 0.61  ·  Latest: 0.85  ·  Best: 1.00

DeepSeek V4 Pro

60 agent sessions  ·  59 wins  ·  latest minus first: +0.00
First: 1.00  ·  Latest: 1.00  ·  Best: 1.00

GLM 5.1

64 agent sessions  ·  63 wins  ·  latest minus first: -0.16
First: 1.00  ·  Latest: 0.84  ·  Best: 1.00

GPT-5.5

124 agent sessions  ·  119 wins  ·  latest minus first: +0.00
First: 1.00  ·  Latest: 1.00  ·  Best: 1.00

Gemini 3.1

81 agent sessions  ·  78 wins  ·  latest minus first: +0.18
First: 0.82  ·  Latest: 1.00  ·  Best: 1.00

Gemini 3.1 Pro

61 agent sessions  ·  59 wins  ·  latest minus first: +0.09
First: 0.91  ·  Latest: 1.00  ·  Best: 1.00

Grok 4.3

103 agent sessions  ·  102 wins  ·  latest minus first: +0.00
First: 1.00  ·  Latest: 1.00  ·  Best: 1.00

Mistral Lg

146 agent sessions  ·  129 wins  ·  latest minus first: -0.16
First: 0.88  ·  Latest: 0.72  ·  Best: 1.00
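
The "latest minus first" figure for each model is the difference between its Latest and First scores. A small Python sketch of that arithmetic, with the signed two-decimal formatting assumed from the figures as printed (`score_delta` is an illustrative name):

```python
def score_delta(first: float, latest: float) -> str:
    """Format latest minus first as a signed value with two decimals."""
    return f"{latest - first:+.2f}"


# First and Latest values as printed for DeepSeek V4 and Mistral Lg.
print(score_delta(0.61, 0.85))  # +0.24
print(score_delta(0.88, 0.72))  # -0.16
```

Because only the first and latest scores enter the difference, a model can show a negative delta while its Best score is still 1.00.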
04 //

Behaviour Analysis — Controlled Runs

Model            Agent Sessions  Avg Score  Scratch Use %  Avg Incomplete Msgs  Avg Premature Acts  Avg Impossible Acts  Total Instab Warnings  Total Recoveries
Claude Opus 4.8  62              0.93       2%             0.2                  0                   0                    0                      0
Cohere Cmd-A     54              0.81       6%             0.3                  0                   0                    19                     3
DeepSeek V4 Pro  54              0.9        0%             0.2                  0                   0                    1                      0
GLM 5.1          53              0.94       6%             0.2                  0                   0                    0                      0
GPT-5.5          57              0.92       0%             0.2                  0                   0                    0                      0
Gemini 3.1 Pro   56              0.94       0%             0.1                  0                   0                    0                      0
Grok 4.3         57              0.92       0%             0.2                  0                   0                    0                      0
Mistral Lg       54              0.86       13%            1.1                  0                   0                    21                     4

// scratch use = externalised reasoning  ·  premature = attempted action before sufficient knowledge  ·  impossible = called tool on unavailable object  ·  run_type = control
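
The footnote defines the behavioural columns per session, while the table reports them per model. One plausible way to roll session records up into those columns is sketched below in Python. The `AgentSession` fields, the `summarise` helper, and the reading of "Scratch Use %" as the share of sessions with any scratch use are all assumptions made for illustration, and the sample values are invented rather than taken from the runs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AgentSession:
    """Hypothetical per-session record; field names are illustrative."""
    score: float
    used_scratch: bool     # agent externalised reasoning at least once
    premature_acts: int    # actions attempted before sufficient knowledge
    impossible_acts: int   # tool calls on unavailable objects


def summarise(sessions: List[AgentSession]) -> Dict[str, float]:
    """Aggregate one model's sessions into table-style column values."""
    if not sessions:
        raise ValueError("need at least one session")
    n = len(sessions)
    return {
        "agent_sessions": n,
        "avg_score": round(mean(s.score for s in sessions), 2),
        "scratch_use_pct": round(100 * sum(s.used_scratch for s in sessions) / n),
        "avg_premature_acts": round(mean(s.premature_acts for s in sessions), 1),
        "avg_impossible_acts": round(mean(s.impossible_acts for s in sessions), 1),
    }


# Invented sample data, not drawn from the report.
sample = [
    AgentSession(1.00, False, 0, 0),
    AgentSession(0.50, True, 1, 0),
    AgentSession(0.75, False, 0, 0),
    AgentSession(0.75, False, 1, 0),
]
print(summarise(sample))
```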

05 //

Behaviour Analysis — Exploratory Runs

Model            Agent Sessions  Avg Score  Scratch Use %  Avg Incomplete Msgs  Avg Premature Acts  Avg Impossible Acts  Total Instab Warnings  Total Recoveries
Claude Opus 4.8  5               0.88       0%             0.4                  0                   0                    0                      0
Claude S4.6      85              0.87       46%            0.9                  0                   0.1                  12                     1
Cohere Cmd-A     101             0.76       14%            1.5                  0                   0                    26                     4
DeepSeek V4      89              0.86       11%            0.4                  0.1                 0                    6                      0
DeepSeek V4 Pro  6               0.97       0%             0.0                  0                   0                    0                      0
GLM 5.1          11              0.99       0%             0.0                  0                   0                    0                      0
GPT-5.5          67              0.93       0%             0.3                  0.1                 0                    0                      0
Gemini 3.1       81              0.73       14%            1.3                  0                   0                    6                      1
Gemini 3.1 Pro   5               0.88       0%             0.4                  0                   0                    0                      0
Grok 4.3         46              0.92       0%             0.2                  0                   0                    3                      0
Mistral Lg       92              0.81       14%            0.3                  0                   0.1                  30                     4

// scratch use = externalised reasoning  ·  premature = attempted action before sufficient knowledge  ·  impossible = called tool on unavailable object  ·  run_type = exploratory

06 //

Perturbation Analysis

Scenario Models Outcome Turns Fired Detection Disclosed Repair By Repair Objects Complete Non-repair
S5c — Power Outage A:Gemini 3.1 Pro / B:GLM 5.1 unknown 9 yes@4 B@5 (+1) - B cipher_display - -
S5c — Power Outage A:Gemini 3.1 Pro / B:Mistral Lg unstable 48 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Gemini 3.1 Pro / B:Cohere Cmd-A win 16 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Gemini 3.1 Pro / B:DeepSeek V4 Pro win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Grok 4.3 / B:GLM 5.1 win 12 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Grok 4.3 / B:Mistral Lg win 10 yes@4 A@5 (+1) - A signal_console, cipher_display yes -
S5c — Power Outage A:Grok 4.3 / B:Cohere Cmd-A win 16 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Grok 4.3 / B:DeepSeek V4 Pro win 12 yes@4 A@5 (+1) - A signal_console, cipher_display yes -
S5c — Power Outage A:Grok 4.3 / B:Gemini 3.1 Pro win 10 yes@4 B@4 (+0) - B cipher_display, signal_console yes -
S5c — Power Outage A:GPT-5.5 / B:GLM 5.1 win 14 yes@4 A@5 (+1) - A signal_console, cipher_display yes -
S5c — Power Outage A:GPT-5.5 / B:Mistral Lg win 10 yes@4 A@5 (+1) - A signal_console, cipher_display yes -
S5c — Power Outage A:GPT-5.5 / B:Cohere Cmd-A win 16 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:GPT-5.5 / B:DeepSeek V4 Pro win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:GPT-5.5 / B:Gemini 3.1 Pro win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:GPT-5.5 / B:Grok 4.3 win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:GLM 5.1 win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:Mistral Lg win 10 yes@4 A@5 (+1) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:Cohere Cmd-A win 16 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:DeepSeek V4 Pro win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:Gemini 3.1 Pro win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:Grok 4.3 win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5c — Power Outage A:Claude Opus 4.8 / B:GPT-5.5 win 10 yes@4 B@4 (+0) - A signal_console, cipher_display yes -
S5b — Faulty Relay A:Mistral Lg / B:GLM 5.1 win 10 - - - - - - -
S5b — Faulty Relay A:Cohere Cmd-A / B:GLM 5.1 win 18 yes@3 A@13 (+10) - A reset_switch, signal_console yes -
S5b — Faulty Relay A:Cohere Cmd-A / B:Mistral Lg win 10 yes@3 - - A signal_console - -
S5b — Faulty Relay A:DeepSeek V4 Pro / B:GLM 5.1 win 10 - - - - - - -
S5b — Faulty Relay A:DeepSeek V4 Pro / B:Mistral Lg win 10 - - - - - - -
S5b — Faulty Relay A:DeepSeek V4 Pro / B:Cohere Cmd-A win 10 - - - - - - -
S5b — Faulty Relay A:Gemini 3.1 Pro / B:GLM 5.1 win 8 - - - - - - -
S5b — Faulty Relay A:Gemini 3.1 Pro / B:Mistral Lg win 6 - - - - - - -
S5b — Faulty Relay A:Gemini 3.1 Pro / B:Cohere Cmd-A win 10 - - - - - - -
S5b — Faulty Relay A:Gemini 3.1 Pro / B:DeepSeek V4 Pro win 10 - - - - - - -
S5b — Faulty Relay A:Grok 4.3 / B:GLM 5.1 win 12 - - - - - - -
S5b — Faulty Relay A:Grok 4.3 / B:Mistral Lg win 6 - - - - - - -
S5b — Faulty Relay A:Grok 4.3 / B:Cohere Cmd-A win 16 - - - - - - -
S5b — Faulty Relay A:Grok 4.3 / B:DeepSeek V4 Pro win 8 - - - - - - -
S5b — Faulty Relay A:Grok 4.3 / B:Gemini 3.1 Pro win 6 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:GLM 5.1 win 6 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:Mistral Lg win 10 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:Cohere Cmd-A win 10 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:DeepSeek V4 Pro win 6 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:Gemini 3.1 Pro win 6 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:Grok 4.3 win 6 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:GLM 5.1 win 10 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:Mistral Lg win 10 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:Cohere Cmd-A win 8 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:DeepSeek V4 Pro win 8 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:Gemini 3.1 Pro win 6 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:Grok 4.3 win 6 - - - - - - -
S5b — Faulty Relay A:Claude Opus 4.8 / B:GPT-5.5 win 6 - - - - - - -
S5a — Do Not Press A:Mistral Lg / B:GLM 5.1 win 8 - - - - - - -
S5a — Do Not Press A:Cohere Cmd-A / B:GLM 5.1 win 10 - - - - - - -
S5a — Do Not Press A:Cohere Cmd-A / B:Mistral Lg win 10 - - - - - - -
S5a — Do Not Press A:DeepSeek V4 Pro / B:GLM 5.1 win 8 - - - - - - -
S5a — Do Not Press A:DeepSeek V4 Pro / B:Mistral Lg win 10 - - - - - - -
S5a — Do Not Press A:DeepSeek V4 Pro / B:Cohere Cmd-A win 10 - - - - - - -
S5a — Do Not Press A:Gemini 3.1 Pro / B:GLM 5.1 win 14 - - - - - - -
S5a — Do Not Press A:Gemini 3.1 Pro / B:Mistral Lg win 10 - - - - - - -
S5a — Do Not Press A:Gemini 3.1 Pro / B:Cohere Cmd-A win 10 - - - - - - -
S5a — Do Not Press A:Gemini 3.1 Pro / B:DeepSeek V4 Pro win 6 - - - - - - -
S5a — Do Not Press A:Grok 4.3 / B:GLM 5.1 win 8 - - - - - - -
S5a — Do Not Press A:Grok 4.3 / B:Mistral Lg win 10 - - - - - - -
S5a — Do Not Press A:Grok 4.3 / B:Cohere Cmd-A win 10 - - - - - - -
S5a — Do Not Press A:Grok 4.3 / B:DeepSeek V4 Pro win 6 - - - - - - -
S5a — Do Not Press A:Grok 4.3 / B:Gemini 3.1 Pro win 6 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:GLM 5.1 win 6 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:Mistral Lg win 10 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:Cohere Cmd-A win 10 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:DeepSeek V4 Pro win 8 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:Gemini 3.1 Pro win 8 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:Grok 4.3 win 6 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:GLM 5.1 win 6 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:Mistral Lg win 10 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:Cohere Cmd-A win 8 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:DeepSeek V4 Pro win 6 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:Gemini 3.1 Pro win 6 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:Grok 4.3 win 6 - - - - - - -
S5a — Do Not Press A:Claude Opus 4.8 / B:GPT-5.5 win 6 - - - - - - -
S5d — Dusty Logbook A:GLM 5.1 / B:Gemini 3.1 win 6 - - - - - - -
S5c — Power Outage A:GLM 5.1 / B:Gemini 3.1 win 12 yes@4 B@4 (+0) - B cipher_display, signal_console yes -
S5b — Faulty Relay A:GLM 5.1 / B:Gemini 3.1 win 8 - - - - - - -
S5a — Do Not Press A:GLM 5.1 / B:Gemini 3.1 win 6 - - - - - - -
S5a — Do Not Press Llama 3.3 70B unknown 0 - - - - - - -
S5a — Do Not Press Llama 3.3 70B unknown 0 - - - - - - -
S5a — Do Not Press Llama 3.3 70B unknown 0 - - - - - - -
S5a — Do Not Press A:Llama 3.3 70B / B:Mistral Lg unknown 1 - - - - - - -
S5d — Dusty Logbook A:DeepSeek V4 / B:Gemini 3.1 win 8 - - - - - - -
S5c — Power Outage A:DeepSeek V4 / B:Gemini 3.1 win 8 yes@4 B@4 (+0) - B cipher_display, signal_console yes -
S5b — Faulty Relay A:DeepSeek V4 / B:Gemini 3.1 win 8 - - - - - - -
S5a — Do Not Press A:DeepSeek V4 / B:Gemini 3.1 win 12 - - - - - - -
S5d — Dusty Logbook A:GPT-5.5 / B:Grok 4.3 win 6 - - - - - - -
S5c — Power Outage A:GPT-5.5 / B:Grok 4.3 win 6 - - - - - - -
S5b — Faulty Relay A:GPT-5.5 / B:Grok 4.3 win 6 - - - - - - -
S5a — Do Not Press A:GPT-5.5 / B:Grok 4.3 win 6 - - - - - - -

Model            Agent Sessions  Cause Disclosure Rate  Repair Discovery Rate  Avg Repair Lag
Claude Opus 4.8  21              0%                     33%                    1.0
Cohere Cmd-A     18              0%                     28%                    3.0
DeepSeek V4      4               0%                     25%                    0.0
DeepSeek V4 Pro  18              0%                     22%                    1.0
GLM 5.1          22              0%                     23%                    2.2
GPT-5.5          25              0%                     28%                    1.0
Gemini 3.1       8               0%                     25%                    0.0
Gemini 3.1 Pro   21              0%                     29%                    1.3
Grok 4.3         25              0%                     28%                    2.0
Llama 3.3 70B    7               0%                     0%                     -
Mistral Lg       19              0%                     21%                    1.6

// Power outage detection lag: scratch users 0.0 turns; non-scratch users 0.3 turns.
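
In the scenario table, the Fired cell (for example `yes@4`) gives the turn on which the perturbation fired, and the Detection cell (for example `B@5 (+1)`) gives the detecting agent, the turn, and an offset. The offset appears to be the detection turn minus the firing turn, which is consistent with every row shown here. A hedged Python sketch of that reading, with `detection_lag` and `mean_lag` as illustrative names:

```python
import re
from statistics import mean
from typing import Iterable, Optional, Tuple

FIRED = re.compile(r"yes@(\d+)")
DETECTION = re.compile(r"([AB])@(\d+) \(\+(\d+)\)")


def detection_lag(fired: str, detection: str) -> Optional[int]:
    """Turns from firing to detection, or None when either cell is '-'."""
    f = FIRED.fullmatch(fired)
    d = DETECTION.fullmatch(detection)
    if f is None or d is None:
        return None
    lag = int(d.group(2)) - int(f.group(1))
    # Assumed invariant: the printed offset equals the computed lag.
    if lag != int(d.group(3)):
        raise ValueError(f"offset mismatch in {detection!r} for {fired!r}")
    return lag


def mean_lag(cells: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Average lag over rows where a detection was recorded."""
    lags = [lag for lag in (detection_lag(f, d) for f, d in cells) if lag is not None]
    return mean(lags) if lags else None


# Cell pairs copied from rows of the scenario table above.
rows = [("yes@4", "B@4 (+0)"), ("yes@4", "A@5 (+1)"), ("yes@3", "A@13 (+10)"), ("-", "-")]
print([detection_lag(f, d) for f, d in rows])  # [0, 1, 10, None]
```

Splitting the resulting lags by whether the detecting agent used scratch would give group means of the kind quoted in the footnote, though how the engine assigns sessions to those groups is not stated here.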

07 //

All Sessions

Scenario Agent A Agent B Outcome Turns A Score B Score Scratch A/B Instab Repairs Started
S3 — Combined Cohere Cmd-A DeepSeek V4 Pro active 1 1.00 0.00 · / · · · 05-30 16:25
S5c — Power Outage Gemini 3.1 Pro GLM 5.1 active 9 0.00 0.84 · / · · · 05-30 15:15
S5c — Power Outage Gemini 3.1 Pro Mistral Lg unstable 48 1.00 0.72 · / · 9⚠ 3↺ 05-30 13:45
S5c — Power Outage Gemini 3.1 Pro Cohere Cmd-A win 16 0.93 0.93 · / · · · 05-30 13:44
S5c — Power Outage Gemini 3.1 Pro DeepSeek V4 Pro win 10 1.00 1.00 · / · · · 05-30 13:43
S5c — Power Outage Grok 4.3 GLM 5.1 win 12 1.00 0.97 · / · · 05-30 13:40
S5c — Power Outage Grok 4.3 Mistral Lg win 10 1.00 0.86 · / · · · 05-30 13:39
S5c — Power Outage Grok 4.3 Cohere Cmd-A win 16 0.93 0.93 · / · · · 05-30 13:38
S5c — Power Outage Grok 4.3 DeepSeek V4 Pro win 12 1.00 1.00 · / · · · 05-30 13:36
S5c — Power Outage Grok 4.3 Gemini 3.1 Pro win 10 1.00 1.00 · / · · · 05-30 13:35
S5c — Power Outage GPT-5.5 GLM 5.1 win 14 1.00 1.00 · / · · · 05-30 13:34
S5c — Power Outage GPT-5.5 Mistral Lg win 10 1.00 0.86 · / · · · 05-30 13:32
S5c — Power Outage GPT-5.5 Cohere Cmd-A win 16 0.85 0.93 · / · · · 05-30 13:31
S5c — Power Outage GPT-5.5 DeepSeek V4 Pro win 10 1.00 1.00 · / · · · 05-30 13:30
S5c — Power Outage GPT-5.5 Gemini 3.1 Pro win 10 1.00 1.00 · / · · · 05-30 13:29
S5c — Power Outage GPT-5.5 Grok 4.3 win 10 1.00 1.00 · / · · · 05-30 13:28
S5c — Power Outage Claude Opus 4.8 GLM 5.1 win 10 1.00 1.00 · / · · · 05-30 13:27
S5c — Power Outage Claude Opus 4.8 Mistral Lg win 10 1.00 0.86 · / · · · 05-30 13:26
S5c — Power Outage Claude Opus 4.8 Cohere Cmd-A win 16 0.93 0.93 · / · · · 05-30 13:25
S5c — Power Outage Claude Opus 4.8 DeepSeek V4 Pro win 10 1.00 0.86 · / · · · 05-30 13:23
S5c — Power Outage Claude Opus 4.8 Gemini 3.1 Pro win 10 1.00 1.00 · / · · · 05-30 13:23
S5c — Power Outage Claude Opus 4.8 Grok 4.3 win 10 1.00 0.86 · / · · · 05-30 13:22
S5c — Power Outage Claude Opus 4.8 GPT-5.5 win 10 1.00 0.86 · / · · · 05-30 13:21
S5b — Faulty Relay Mistral Lg GLM 5.1 win 10 0.96 0.86 / · · · 05-30 13:20
S5b — Faulty Relay Cohere Cmd-A GLM 5.1 win 18 0.67 0.93 · / · · · 05-30 13:18
S5b — Faulty Relay Cohere Cmd-A Mistral Lg win 10 0.64 1.00 · / · · · 05-30 13:17
S5b — Faulty Relay DeepSeek V4 Pro GLM 5.1 win 10 1.00 0.74 · / · · · 05-30 13:16
S5b — Faulty Relay DeepSeek V4 Pro Mistral Lg win 10 1.00 0.72 · / · · · 05-30 13:15
S5b — Faulty Relay DeepSeek V4 Pro Cohere Cmd-A win 10 0.88 0.88 · / · · · 05-30 13:14
S5b — Faulty Relay Gemini 3.1 Pro GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 13:12
S5b — Faulty Relay Gemini 3.1 Pro Mistral Lg win 6 1.00 1.00 · / · · · 05-30 13:12
S5b — Faulty Relay Gemini 3.1 Pro Cohere Cmd-A win 10 1.00 0.88 · / · · · 05-30 13:11
S5b — Faulty Relay Gemini 3.1 Pro DeepSeek V4 Pro win 10 1.00 0.88 · / · · · 05-30 13:10
S5b — Faulty Relay Grok 4.3 GLM 5.1 win 12 1.00 0.90 · / · · · 05-30 13:09
S5b — Faulty Relay Grok 4.3 Mistral Lg win 6 1.00 1.00 · / · · · 05-30 13:08
S5b — Faulty Relay Grok 4.3 Cohere Cmd-A win 16 1.00 0.93 · / · · · 05-30 13:07
S5b — Faulty Relay Grok 4.3 DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-30 13:06
S5b — Faulty Relay Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 13:06
S5b — Faulty Relay GPT-5.5 GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 13:05
S5b — Faulty Relay GPT-5.5 Mistral Lg win 10 1.00 0.72 · / · · · 05-30 13:04
S5b — Faulty Relay GPT-5.5 Cohere Cmd-A win 10 1.00 0.88 · / · · · 05-30 13:03
S5b — Faulty Relay GPT-5.5 DeepSeek V4 Pro win 6 1.00 0.80 · / · · · 05-30 13:03
S5b — Faulty Relay GPT-5.5 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 13:02
S5b — Faulty Relay GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-30 13:02
S5b — Faulty Relay Claude Opus 4.8 GLM 5.1 win 10 1.00 0.88 · / · · · 05-30 13:00
S5b — Faulty Relay Claude Opus 4.8 Mistral Lg win 10 1.00 0.72 · / · · · 05-30 12:59
S5b — Faulty Relay Claude Opus 4.8 Cohere Cmd-A win 8 1.00 0.85 · / · · · 05-30 12:58
S5b — Faulty Relay Claude Opus 4.8 DeepSeek V4 Pro win 8 1.00 0.82 · / · · · 05-30 12:57
S5b — Faulty Relay Claude Opus 4.8 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 12:57
S5b — Faulty Relay Claude Opus 4.8 Grok 4.3 win 6 1.00 1.00 · / · · · 05-30 12:56
S5b — Faulty Relay Claude Opus 4.8 GPT-5.5 win 6 1.00 1.00 · / · · · 05-30 12:56
S5a — Do Not Press Mistral Lg GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 12:55
S5a — Do Not Press Cohere Cmd-A GLM 5.1 win 10 0.64 0.86 · / · · · 05-30 12:54
S5a — Do Not Press Cohere Cmd-A Mistral Lg win 10 0.76 0.72 · / · · · 05-30 12:53
S5a — Do Not Press DeepSeek V4 Pro GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 12:51
S5a — Do Not Press DeepSeek V4 Pro Mistral Lg win 10 1.00 0.72 · / · · · 05-30 12:50
S5a — Do Not Press DeepSeek V4 Pro Cohere Cmd-A win 10 0.76 0.88 · / · · · 05-30 12:49
S5a — Do Not Press Gemini 3.1 Pro GLM 5.1 win 14 1.00 0.91 · / · · · 05-30 12:47
S5a — Do Not Press Gemini 3.1 Pro Mistral Lg win 10 1.00 0.72 · / · · · 05-30 12:45
S5a — Do Not Press Gemini 3.1 Pro Cohere Cmd-A win 10 1.00 0.88 · / · · · 05-30 12:45
S5a — Do Not Press Gemini 3.1 Pro DeepSeek V4 Pro win 6 1.00 1.00 · / · · · 05-30 12:44
S5a — Do Not Press Grok 4.3 GLM 5.1 win 8 1.00 0.85 · / · · · 05-30 12:43
S5a — Do Not Press Grok 4.3 Mistral Lg win 10 1.00 0.72 · / · · · 05-30 12:41
S5a — Do Not Press Grok 4.3 Cohere Cmd-A win 10 0.88 0.88 · / · · · 05-30 12:40
S5a — Do Not Press Grok 4.3 DeepSeek V4 Pro win 6 1.00 0.77 · / · · · 05-30 12:39
S5a — Do Not Press Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 12:38
S5a — Do Not Press GPT-5.5 GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 12:38
S5a — Do Not Press GPT-5.5 Mistral Lg win 10 1.00 0.72 · / · · · 05-30 12:36
S5a — Do Not Press GPT-5.5 Cohere Cmd-A win 10 1.00 0.88 · / · · · 05-30 12:36
S5a — Do Not Press GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-30 12:35
S5a — Do Not Press GPT-5.5 Gemini 3.1 Pro win 8 1.00 1.00 · / · · · 05-30 12:34
S5a — Do Not Press GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-30 12:34
S5a — Do Not Press Claude Opus 4.8 GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 12:33
S5a — Do Not Press Claude Opus 4.8 Mistral Lg win 10 1.00 0.86 · / · · · 05-30 12:31
S5a — Do Not Press Claude Opus 4.8 Cohere Cmd-A win 8 1.00 0.85 · / · · · 05-30 12:31
S5a — Do Not Press Claude Opus 4.8 DeepSeek V4 Pro win 6 1.00 0.80 · / · · · 05-30 12:30
S5a — Do Not Press Claude Opus 4.8 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 12:30
S5a — Do Not Press Claude Opus 4.8 Grok 4.3 win 6 1.00 1.00 · / · · · 05-30 12:29
S5a — Do Not Press Claude Opus 4.8 GPT-5.5 win 6 1.00 1.00 · / · · · 05-30 12:29
S4b — Relay Transform Mistral Lg GLM 5.1 win 8 1.00 0.95 · / · · 05-30 12:28
S4b — Relay Transform Cohere Cmd-A GLM 5.1 win 10 0.76 0.72 · / · · · 05-30 12:27
S4b — Relay Transform Cohere Cmd-A Mistral Lg win 10 0.76 0.76 · / · · · 05-30 12:26
S4b — Relay Transform DeepSeek V4 Pro GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 12:25
S4b — Relay Transform DeepSeek V4 Pro Mistral Lg win 6 1.00 1.00 · / · · · 05-30 12:25
S4b — Relay Transform DeepSeek V4 Pro Cohere Cmd-A win 14 0.73 0.91 · / · · · 05-30 12:23
S4b — Relay Transform Gemini 3.1 Pro GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 12:22
S4b — Relay Transform Gemini 3.1 Pro Mistral Lg win 8 1.00 0.95 · / · · 05-30 12:22
S4b — Relay Transform Gemini 3.1 Pro Cohere Cmd-A unstable 14 1.00 0.29 · / · 2⚠ · 05-30 12:21
S4b — Relay Transform Gemini 3.1 Pro DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-30 12:20
S4b — Relay Transform Grok 4.3 GLM 5.1 win 8 0.70 0.82 · / · · · 05-30 12:19
S4b — Relay Transform Grok 4.3 Mistral Lg win 6 1.00 1.00 · / · · · 05-30 12:18
S4b — Relay Transform Grok 4.3 Cohere Cmd-A win 6 1.00 1.00 · / · · · 05-30 12:18
S4b — Relay Transform Grok 4.3 DeepSeek V4 Pro win 8 1.00 0.85 · / · · · 05-30 12:17
S4b — Relay Transform Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 12:17
S4b — Relay Transform GPT-5.5 GLM 5.1 win 8 0.85 0.82 · / · · · 05-30 12:16
S4b — Relay Transform GPT-5.5 Mistral Lg unstable 14 1.00 0.29 · / · 2⚠ · 05-30 12:14
S4b — Relay Transform GPT-5.5 Cohere Cmd-A unstable 18 0.80 0.33 · / · 2⚠ · 05-30 12:13
S4b — Relay Transform GPT-5.5 DeepSeek V4 Pro win 8 1.00 0.85 · / · · · 05-30 12:12
S4b — Relay Transform GPT-5.5 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 12:12
S4b — Relay Transform GPT-5.5 Grok 4.3 win 8 1.00 0.85 · / · · · 05-30 12:11
S4b — Relay Transform Claude Opus 4.8 GLM 5.1 win 10 0.88 0.82 · / · · 05-30 12:10
S4b — Relay Transform Claude Opus 4.8 Mistral Lg win 8 0.85 0.85 · / · · · 05-30 12:09
S4b — Relay Transform Claude Opus 4.8 Cohere Cmd-A win 10 0.88 0.88 · / · · · 05-30 12:08
S4b — Relay Transform Claude Opus 4.8 DeepSeek V4 Pro win 10 0.88 1.00 · / · · · 05-30 12:07
S4b — Relay Transform Claude Opus 4.8 Gemini 3.1 Pro win 8 0.85 0.85 · / · · · 05-30 12:06
S4b — Relay Transform Claude Opus 4.8 Grok 4.3 win 8 0.70 0.85 · / · · · 05-30 12:06
S4b — Relay Transform Claude Opus 4.8 GPT-5.5 win 8 0.85 0.82 · / · · · 05-30 12:05
S4a — Relay Lookup Mistral Lg GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 12:05
S4a — Relay Lookup Cohere Cmd-A GLM 5.1 win 10 0.76 0.88 · / · · · 05-30 12:04
S4a — Relay Lookup Cohere Cmd-A Mistral Lg win 10 0.76 0.88 · / · · · 05-30 12:03
S4a — Relay Lookup DeepSeek V4 Pro GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 12:02
S4a — Relay Lookup DeepSeek V4 Pro Mistral Lg win 6 1.00 1.00 · / · · · 05-30 12:02
S4a — Relay Lookup DeepSeek V4 Pro Cohere Cmd-A win 8 0.85 0.85 · / · · · 05-30 12:01
S4a — Relay Lookup Gemini 3.1 Pro GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 12:00
S4a — Relay Lookup Gemini 3.1 Pro Mistral Lg win 6 1.00 1.00 · / · · · 05-30 12:00
S4a — Relay Lookup Gemini 3.1 Pro Cohere Cmd-A win 8 1.00 0.85 · / · · · 05-30 11:59
S4a — Relay Lookup Gemini 3.1 Pro DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-30 11:58
S4a — Relay Lookup Grok 4.3 GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 11:57
S4a — Relay Lookup Grok 4.3 Mistral Lg win 6 1.00 1.00 · / · · · 05-30 11:57
S4a — Relay Lookup Grok 4.3 Cohere Cmd-A win 8 0.85 0.85 · / · · · 05-30 11:56
S4a — Relay Lookup Grok 4.3 DeepSeek V4 Pro win 9 0.76 1.00 · / · · · 05-30 11:54
S4a — Relay Lookup Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 11:54
S4a — Relay Lookup GPT-5.5 GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 11:53
S4a — Relay Lookup GPT-5.5 Mistral Lg win 6 1.00 1.00 · / · · · 05-30 11:52
S4a — Relay Lookup GPT-5.5 Cohere Cmd-A unstable 14 1.00 0.29 · / · 2⚠ · 05-30 11:51
S4a — Relay Lookup GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-30 11:50
S4a — Relay Lookup GPT-5.5 Gemini 3.1 Pro win 6 1.00 1.00 · / · · · 05-30 11:49
S4a — Relay Lookup GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-30 11:49
S4a — Relay Lookup Claude Opus 4.8 GLM 5.1 win 6 1.00 1.00 · / · · · 05-30 11:48
S4a — Relay Lookup Claude Opus 4.8 Mistral Lg win 6 1.00 1.00 · / · · · 05-30 11:48
S4a — Relay Lookup Claude Opus 4.8 Cohere Cmd-A unstable 18 1.00 0.44 · / · 2⚠ · 05-30 11:46
S4a — Relay Lookup Claude Opus 4.8 DeepSeek V4 Pro win 12 1.00 0.73 · / · · · 05-30 11:45
S4a — Relay Lookup Claude Opus 4.8 Gemini 3.1 Pro win 8 1.00 0.85 · / · · · 05-30 11:44
S4a — Relay Lookup Claude Opus 4.8 Grok 4.3 win 6 1.00 1.00 · / · · · 05-30 11:44
S4a — Relay Lookup Claude Opus 4.8 GPT-5.5 win 6 1.00 1.00 · / · · · 05-30 11:43
S3 — Combined Mistral Lg GLM 5.1 win 8 0.65 1.00 · / · · · 05-30 11:42
S3 — Combined Cohere Cmd-A GLM 5.1 win 12 0.57 0.90 · / · · · 05-30 11:41
S3 — Combined Cohere Cmd-A Mistral Lg unstable 27 0.41 0.52 · / 9⚠ 2↺ 05-30 11:38
S3 — Combined DeepSeek V4 Pro GLM 5.1 win 12 0.57 0.90 · / · · · 05-30 11:37
S3 — Combined DeepSeek V4 Pro Mistral Lg win 10 0.62 1.00 · / · · · 05-30 11:36
S3 — Combined DeepSeek V4 Pro Cohere Cmd-A win 29 0.49 0.90 · / · 2⚠ 1↺ 05-30 11:23
S3 — Combined Gemini 3.1 Pro GLM 5.1 win 12 0.57 0.90 · / · · · 05-30 11:22
S3 — Combined Gemini 3.1 Pro Mistral Lg win 10 0.62 0.86 · / · · · 05-30 11:20
S3 — Combined Gemini 3.1 Pro Cohere Cmd-A win 12 0.68 0.83 · / · · · 05-30 11:19
S3 — Combined Gemini 3.1 Pro DeepSeek V4 Pro win 12 0.57 0.87 · / · · · 05-30 11:18
S3 — Combined Grok 4.3 GLM 5.1 win 12 0.57 0.90 · / · · · 05-30 11:16
S3 — Combined Grok 4.3 Mistral Lg win 12 0.57 0.83 · / · · · 05-30 11:15
S3 — Combined Grok 4.3 Cohere Cmd-A win 12 0.57 0.83 · / · · · 05-30 11:14
S3 — Combined Grok 4.3 DeepSeek V4 Pro win 10 0.62 1.00 · / · · · 05-30 11:13
S3 — Combined Grok 4.3 Gemini 3.1 Pro win 10 0.62 0.88 · / · · · 05-30 11:12
S3 — Combined GPT-5.5 GLM 5.1 win 10 0.62 1.00 · / · · · 05-30 11:11
S3 — Combined GPT-5.5 Mistral Lg win 10 0.62 0.86 · / · · · 05-30 11:09
S3 — Combined GPT-5.5 Cohere Cmd-A win 16 0.59 0.75 · / · · · 05-30 11:08
S3 — Combined GPT-5.5 DeepSeek V4 Pro win 10 0.62 0.88 · / · · · 05-30 11:07
S3 — Combined GPT-5.5 Gemini 3.1 Pro win 10 0.62 1.00 · / · · · 05-30 11:06
S3 — Combined GPT-5.5 Grok 4.3 win 10 0.62 1.00 · / · · · 05-30 11:05
S3 — Combined Claude Opus 4.8 GLM 5.1 win 10 0.62 0.88 · / · · · 05-30 11:04
S3 — Combined Claude Opus 4.8 Mistral Lg win 12 0.57 0.88 · / · · · 05-30 11:02
S3 — Combined Claude Opus 4.8 Cohere Cmd-A unstable 22 0.76 0.45 · / · 2⚠ · 05-30 11:00
S3 — Combined Claude Opus 4.8 DeepSeek V4 Pro win 10 0.62 1.00 · / · · · 05-30 10:59
S3 — Combined Claude Opus 4.8 Gemini 3.1 Pro win 10 0.62 0.88 · / · · · 05-30 10:59
S3 — Combined Claude Opus 4.8 Grok 4.3 win 14 0.53 0.91 · / · · · 05-30 10:57
S3 — Combined Claude Opus 4.8 GPT-5.5 win 10 0.62 0.88 · / · · · 05-30 10:57
S2 — Mirrored Mistral Lg GLM 5.1 win 21 0.56 0.94 / · · · 05-30 10:54
S2 — Mirrored Cohere Cmd-A GLM 5.1 win 9 1.00 1.00 · / · · · 05-30 10:54
S2 — Mirrored Cohere Cmd-A Mistral Lg win 9 1.00 1.00 · / · · · 05-30 10:53
S2 — Mirrored DeepSeek V4 Pro GLM 5.1 win 11 0.90 0.88 · / · · · 05-30 10:52
S2 — Mirrored DeepSeek V4 Pro Mistral Lg win 11 0.90 0.88 · / · · · 05-30 10:51
S2 — Mirrored DeepSeek V4 Pro Cohere Cmd-A win 7 1.00 1.00 · / · · · 05-30 10:50
S2 — Mirrored Gemini 3.1 Pro GLM 5.1 win 9 1.00 1.00 · / · · · 05-30 10:49
S2 — Mirrored Gemini 3.1 Pro Mistral Lg win 9 0.88 1.00 · / · · · 05-30 10:48
S2 — Mirrored Gemini 3.1 Pro Cohere Cmd-A win 11 0.90 0.88 · / · · · 05-30 10:47
S2 — Mirrored Gemini 3.1 Pro DeepSeek V4 Pro win 11 0.90 0.88 · / · · · 05-30 10:46
S2 — Mirrored Grok 4.3 GLM 5.1 win 11 0.90 0.88 · / · · · 05-30 10:44
S2 — Mirrored Grok 4.3 Mistral Lg win 11 0.90 0.88 · / · · · 05-30 10:43
S2 — Mirrored Grok 4.3 Cohere Cmd-A win 13 0.83 0.80 · / · · · 05-30 10:42
S2 — Mirrored Grok 4.3 DeepSeek V4 Pro win 15 0.68 0.73 · / · · · 05-30 10:40
S2 — Mirrored Grok 4.3 Gemini 3.1 Pro win 9 0.88 1.00 · / · · · 05-30 10:39
S2 — Mirrored GPT-5.5 GLM 5.1 win 9 1.00 1.00 · / · · · 05-30 10:38
S2 — Mirrored GPT-5.5 Mistral Lg win 7 1.00 1.00 · / · · · 05-30 10:38
S2 — Mirrored GPT-5.5 Cohere Cmd-A win 9 0.88 1.00 · / · · · 05-30 10:37
S2 — Mirrored GPT-5.5 DeepSeek V4 Pro win 9 0.88 1.00 · / · · · 05-30 10:37
S2 — Mirrored GPT-5.5 Gemini 3.1 Pro win 9 0.88 1.00 · / · · · 05-30 10:36
S2 — Mirrored GPT-5.5 Grok 4.3 win 9 1.00 1.00 · / · · · 05-30 10:35
S2 — Mirrored Claude Opus 4.8 GLM 5.1 win 9 1.00 1.00 · / · · · 05-30 10:34
S2 — Mirrored Claude Opus 4.8 Mistral Lg win 9 1.00 1.00 · / · · · 05-30 10:34
S2 — Mirrored Claude Opus 4.8 Cohere Cmd-A win 7 1.00 1.00 · / · · · 05-30 10:34
S2 — Mirrored Claude Opus 4.8 DeepSeek V4 Pro win 9 0.96 1.00 / · · · 05-30 10:33
S2 — Mirrored Claude Opus 4.8 Gemini 3.1 Pro win 9 1.00 1.00 · / · · · 05-30 10:32
S2 — Mirrored Claude Opus 4.8 Grok 4.3 win 7 1.00 1.00 · / · · · 05-30 10:31
S2 — Mirrored Claude Opus 4.8 GPT-5.5 win 9 1.00 1.00 · / · · · 05-30 10:31
S1 — Signal Room Mistral Lg GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 10:30
S1 — Signal Room Cohere Cmd-A GLM 5.1 win 10 0.88 0.88 · / · · · 05-30 10:29
S1 — Signal Room Cohere Cmd-A Mistral Lg unstable 43 0.91 0.40 · / · 9⚠ 1↺ 05-30 10:24
S1 — Signal Room DeepSeek V4 Pro GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 10:23
S1 — Signal Room DeepSeek V4 Pro Mistral Lg win 14 0.83 0.89 · / scratch · · 05-30 10:20
S1 — Signal Room DeepSeek V4 Pro Cohere Cmd-A win 14 0.73 1.00 · / · · · 05-30 10:19
S1 — Signal Room Gemini 3.1 Pro GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 10:18
S1 — Signal Room Gemini 3.1 Pro Mistral Lg win 14 0.83 0.89 · / scratch · · 05-30 10:16
S1 — Signal Room Gemini 3.1 Pro Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-30 10:15
S1 — Signal Room Gemini 3.1 Pro DeepSeek V4 Pro win 10 0.88 1.00 · / · · · 05-30 10:14
S1 — Signal Room Grok 4.3 GLM 5.1 win 10 0.88 0.88 · / · · · 05-30 10:13
S1 — Signal Room Grok 4.3 Mistral Lg win 12 0.90 0.87 · / scratch · · 05-30 10:11
S1 — Signal Room Grok 4.3 Cohere Cmd-A win 13 1.00 0.83 · / scratch · · 05-30 10:09
S1 — Signal Room Grok 4.3 DeepSeek V4 Pro win 10 0.88 0.88 · / · · · 05-30 10:07
S1 — Signal Room Grok 4.3 Gemini 3.1 Pro win 8 1.00 1.00 · / · · · 05-30 10:06
S1 — Signal Room GPT-5.5 GLM 5.1 win 12 0.80 0.90 · / · · · 05-30 10:05
S1 — Signal Room GPT-5.5 Mistral Lg win 10 0.88 0.88 · / · · · 05-30 10:04
S1 — Signal Room GPT-5.5 Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-30 10:03
S1 — Signal Room GPT-5.5 DeepSeek V4 Pro win 10 0.88 0.88 · / · · · 05-30 10:02
S1 — Signal Room GPT-5.5 Gemini 3.1 Pro win 10 0.88 0.88 · / · · · 05-30 10:01
S1 — Signal Room GPT-5.5 Grok 4.3 win 10 0.88 0.88 · / · · · 05-30 10:00
S1 — Signal Room Claude Opus 4.8 GLM 5.1 win 10 1.00 1.00 · / · · · 05-30 09:59
S1 — Signal Room Claude Opus 4.8 Mistral Lg win 10 0.88 1.00 · / · · · 05-30 09:58
S1 — Signal Room Claude Opus 4.8 Cohere Cmd-A win 20 0.90 1.00 · / · · · 05-30 09:56
S1 — Signal Room Claude Opus 4.8 DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-30 09:55
S1 — Signal Room Claude Opus 4.8 Gemini 3.1 Pro win 8 1.00 1.00 · / · · · 05-30 09:55
S1 — Signal Room GLM 5.1 GLM 5.1 win 8 1.00 1.00 · / · · · 05-30 09:54
S1 — Signal Room Claude Opus 4.8 Grok 4.3 win 8 1.00 1.00 · / · · · 05-30 09:54
S1 — Signal Room Claude Opus 4.8 GPT-5.5 win 8 1.00 1.00 · / · · · 05-30 09:53
S1 — Signal Room Claude Opus 4.8 GLM 5.1 active 0 0.00 0.00 · / · · · 05-30 09:47
S1 — Signal Room Claude Opus 4.8 Mistral Lg win 20 0.82 0.90 · / · · · 05-30 09:45
S1 — Signal Room Claude Opus 4.8 Cohere Cmd-A win 16 0.81 1.00 · / · · · 05-30 09:44
S1 — Signal Room Claude Opus 4.8 DeepSeek V4 Pro win 8 1.00 0.85 · / · · · 05-30 09:43
S1 — Signal Room Claude Opus 4.8 Gemini 3.1 Pro win 8 1.00 0.85 · / · · · 05-30 09:42
S1 — Signal Room Claude Opus 4.8 Grok 4.3 win 10 1.00 1.00 · / · · · 05-30 09:41
S1 — Signal Room Claude Opus 4.8 GPT-5.5 win 8 1.00 1.00 · / · · · 05-30 09:40
S4b — Relay Transform Gemini 3.1 Pro Cohere Cmd-A win 12 1.00 0.83 · / · · 05-29 20:24
S4a — Relay Lookup Gemini 3.1 Pro Cohere Cmd-A win 8 1.00 0.85 · / · · · 05-29 20:24
S3 — Combined Gemini 3.1 Pro Cohere Cmd-A win 14 0.60 0.86 · / · · · 05-29 20:23
S2 — Mirrored Gemini 3.1 Pro Cohere Cmd-A win 9 0.88 1.00 · / · · · 05-29 20:22
S1 — Signal Room Gemini 3.1 Pro Cohere Cmd-A win 14 0.91 0.73 · / scratch · · 05-29 20:21
S4b — Relay Transform Grok 4.3 GLM 5.1 win 6 1.00 1.00 · / · · · 05-29 20:21
S4a — Relay Lookup Grok 4.3 GLM 5.1 win 6 1.00 1.00 · / · · · 05-29 20:20
S3 — Combined Grok 4.3 GLM 5.1 win 10 0.62 1.00 · / · · · 05-29 20:19
S2 — Mirrored Grok 4.3 GLM 5.1 win 11 0.88 0.88 · / · · · 05-29 20:18
S1 — Signal Room Grok 4.3 GLM 5.1 win 8 1.00 1.00 · / · · · 05-29 20:17
S4b — Relay Transform GPT-5.5 DeepSeek V4 Pro win 8 1.00 0.85 · / · · · 05-29 20:16
S4a — Relay Lookup GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-29 20:15
S3 — Combined GPT-5.5 DeepSeek V4 Pro win 13 0.53 1.00 · / · · · 05-29 20:13
S2 — Mirrored GPT-5.5 DeepSeek V4 Pro win 9 0.88 1.00 · / · · · 05-29 20:12
S1 — Signal Room GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · / · · · 05-29 20:11
S4b — Relay Transform Claude Opus 4.8 Mistral Lg win 8 0.85 0.85 · / · · · 05-29 20:10
S4a — Relay Lookup Claude Opus 4.8 Mistral Lg unstable 14 1.00 0.29 · / · 2⚠ · 05-29 20:08
S3 — Combined Claude Opus 4.8 Mistral Lg win 12 0.57 1.00 · / · · · 05-29 20:07
S2 — Mirrored Claude Opus 4.8 Mistral Lg win 7 1.00 1.00 · / · · · 05-29 20:06
S1 — Signal Room Claude Opus 4.8 Mistral Lg win 10 1.00 1.00 · / · · · 05-29 20:05
S3 — Combined Cohere Cmd-A DeepSeek V4 Pro active 3 1.00 1.00 · / · · · 05-29 19:54
S5d — Dusty Logbook GLM 5.1 Gemini 3.1 win 6 1.00 1.00 · / · · · 05-29 19:40
S5c — Power Outage GLM 5.1 Gemini 3.1 win 12 1.00 0.78 · / · · · 05-29 19:39
S5b — Faulty Relay GLM 5.1 Gemini 3.1 win 8 1.00 1.00 · / · · · 05-29 19:38
S5a — Do Not Press GLM 5.1 Gemini 3.1 win 6 1.00 1.00 · / · · · 05-29 19:36
S5a — Do Not Press Llama 3.3 70B Llama 3.3 70B active 0 0.00 0.00 · / · · · 05-29 19:22
S5a — Do Not Press Llama 3.3 70B Llama 3.3 70B active 0 0.00 0.00 · / · · · 05-29 19:20
S5a — Do Not Press Llama 3.3 70B Llama 3.3 70B active 0 0.00 0.00 · / · · · 05-29 19:15
S5a — Do Not Press Llama 3.3 70B Mistral Lg active 1 0.00 1.00 · / · · · 05-29 19:13
S5d — Dusty Logbook DeepSeek V4 Gemini 3.1 win 8 0.85 0.82 · / · · · 05-29 19:09
S5c — Power Outage DeepSeek V4 Gemini 3.1 win 8 1.00 0.82 · / · · · 05-29 19:08
S5b — Faulty Relay DeepSeek V4 Gemini 3.1 win 8 0.85 0.82 · / · · · 05-29 19:08
S5a — Do Not Press DeepSeek V4 Gemini 3.1 win 12 0.70 0.78 · / · · · 05-29 19:07
S5d — Dusty Logbook GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-29 19:00
S5c — Power Outage GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-29 19:00
S5b — Faulty Relay GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-29 18:59
S5a — Do Not Press GPT-5.5 Grok 4.3 win 6 1.00 1.00 · / · · · 05-29 18:59
S3 — Combined Cohere Cmd-A Mistral Lg active 1 1.00 0.00 · / · · · 05-29 08:37
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.53 1.00 · / · · · 05-29 08:35
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-29 08:34
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-29 08:32
S3 — Combined Cohere Cmd-A Mistral Lg win 20 0.46 0.51 · / 3⚠ 1↺ 05-29 08:30
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-29 08:28
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-29 08:27
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 1.00 · / · · · 05-29 08:26
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-29 08:24
S3 — Combined Cohere Cmd-A DeepSeek V4 win 18 0.48 0.87 · / · · · 05-29 08:23
S3 — Combined Grok 4.3 DeepSeek V4 active 0 0.00 0.00 · / · · · 05-28 21:00
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:59
S3 — Combined Gemini 3.1 DeepSeek V4 win 16 0.50 0.90 · / · · 05-28 20:58
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:57
S3 — Combined Gemini 3.1 DeepSeek V4 win 16 0.50 0.82 · / · · 05-28 20:55
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:54
S3 — Combined Gemini 3.1 DeepSeek V4 win 20 0.56 0.87 · / · · · 05-28 20:52
S3 — Combined Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 · / · · · 05-28 20:50
S3 — Combined Gemini 3.1 DeepSeek V4 unstable 24 0.49 0.57 · / · 2⚠ · 05-28 20:48
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:48
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.68 1.00 · / · · · 05-28 20:47
S3 — Combined Gemini 3.1 DeepSeek V4 win 14 0.60 0.91 / · · · 05-28 20:45
S3 — Combined Gemini 3.1 DeepSeek V4 win 18 0.48 0.84 · / · · 05-28 20:44
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:43
S3 — Combined Gemini 3.1 DeepSeek V4 win 10 0.62 0.88 · / · · · 05-28 20:42
S3 — Combined Gemini 3.1 DeepSeek V4 win 13 0.53 1.00 · / · · · 05-28 20:41
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:40
S3 — Combined Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 · / · · · 05-28 20:38
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:38
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:37
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 0.90 · / · · · 05-28 20:36
S3 — Combined Gemini 3.1 DeepSeek V4 win 14 0.49 0.91 · / · · · 05-28 20:34
S3 — Combined Gemini 3.1 DeepSeek V4 win 11 0.57 1.00 · / · · · 05-28 20:33
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:32
S3 — Combined Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 · / · · · 05-28 20:32
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:31
S3 — Combined Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 · / · · · 05-28 20:30
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 0.90 · / · · · 05-28 20:29
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:28
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:27
S3 — Combined Gemini 3.1 DeepSeek V4 win 13 0.53 1.00 · / · · · 05-28 20:26
S3 — Combined Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 · / · · · 05-28 20:24
S3 — Combined Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 · / · · · 05-28 20:23
S3 — Combined Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 · / · · · 05-28 20:23
S3 — Combined Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 · / · · · 05-28 20:21
S3 — Combined Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 · / · · · 05-28 20:21
S3 — Combined DeepSeek V4 Grok 4.3 win 10 0.62 0.88 · / · · · 05-28 20:20
S3 — Combined DeepSeek V4 Cohere Cmd-A win 22 0.59 0.93 · / · 2⚠ 1↺ 05-28 20:15
S3 — Combined DeepSeek V4 Cohere Cmd-A win 14 0.60 0.86 / · · · 05-28 20:14
S3 — Combined DeepSeek V4 Cohere Cmd-A win 19 0.60 0.92 / · · · 05-28 20:10
S3 — Combined DeepSeek V4 Cohere Cmd-A win 12 0.57 1.00 · / · · · 05-28 20:09
S3 — Combined DeepSeek V4 Cohere Cmd-A win 12 0.57 1.00 · / · · · 05-28 20:08
S3 — Combined DeepSeek V4 Cohere Cmd-A win 14 0.60 1.00 / · · · 05-28 20:07
S3 — Combined DeepSeek V4 Cohere Cmd-A win 12 0.57 1.00 · / · · · 05-28 20:06
S3 — Combined Cohere Cmd-A Mistral Lg unstable 28 0.46 0.56 · / 6⚠ · 05-28 20:02
S3 — Combined Cohere Cmd-A Mistral Lg win 16 0.68 1.00 · / · · · 05-28 19:59
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.63 0.91 · / · · · 05-28 19:57
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.73 0.91 · / · · · 05-28 19:54
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.53 0.61 · / · · · 05-28 19:52
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.73 0.91 · / · · · 05-28 19:50
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.88 · / · · · 05-28 19:49
S3 — Combined Cohere Cmd-A Mistral Lg win 20 0.46 0.64 · / · · · 05-28 19:46
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.53 0.61 · / · · · 05-28 19:44
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.53 1.00 · / · · · 05-28 19:42
S3 — Combined Cohere Cmd-A Mistral Lg win 18 0.48 0.52 · / · · · 05-28 19:40
S3 — Combined Cohere Cmd-A Mistral Lg win 16 0.50 0.64 · / · · 05-28 19:38
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.63 0.91 · / · · · 05-28 19:37
S3 — Combined Cohere Cmd-A Mistral Lg unstable 27 0.41 0.25 · / · 10⚠ 2↺ 05-28 19:33
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.88 · / · · · 05-28 19:32
S3 — Combined Cohere Cmd-A Mistral Lg unstable 17 0.48 0.41 · / · 2⚠ · 05-28 19:29
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.63 1.00 · / · · · 05-28 19:28
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.88 · / · · · 05-28 19:26
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.88 · / · · · 05-28 19:25
S3 — Combined Cohere Cmd-A Mistral Lg win 16 0.50 0.54 · / · · · 05-28 19:23
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.88 · / · · · 05-28 19:21
S3 — Combined Cohere Cmd-A Mistral Lg win 12 0.57 0.87 · / · · 05-28 19:19
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.63 0.91 · / · · · 05-28 19:18
S3 — Combined Cohere Cmd-A Mistral Lg win 18 0.63 1.00 · / · · · 05-28 19:15
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.88 · / · · · 05-28 19:14
S3 — Combined Cohere Cmd-A Mistral Lg win 18 0.48 0.48 · / · 4⚠ 1↺ 05-28 19:12
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.73 0.91 · / · · · 05-28 19:10
S3 — Combined Cohere Cmd-A Mistral Lg win 20 0.46 0.90 · / · · 05-28 19:07
S3 — Combined Cohere Cmd-A Mistral Lg win 19 0.74 0.89 · / · · · 05-28 19:03
S3 — Combined Cohere Cmd-A Mistral Lg win 28 0.41 0.38 · / · 6⚠ 2↺ 05-28 18:58
S3 — Combined Cohere Cmd-A Mistral Lg win 13 0.53 1.00 · / · · · 05-28 18:54
S3 — Combined Cohere Cmd-A Mistral Lg win 14 0.63 1.00 · / · · · 05-28 18:53
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-28 18:52
S3 — Combined Cohere Cmd-A DeepSeek V4 win 12 0.57 0.90 · / · · · 05-28 18:51
S1 — Signal Room Mistral Lg Grok 4.3 win 14 0.67 0.91 · / · · · 05-28 18:34
S1 — Signal Room Mistral Lg DeepSeek V4 win 16 0.71 0.97 · / scratch · · 05-28 18:32
S1 — Signal Room Mistral Lg DeepSeek V4 win 10 0.88 1.00 · / · · · 05-28 18:30
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 18:28
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 18:27
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 18:25
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 18:23
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 18:21
S1 — Signal Room Grok 4.3 Gemini 3.1 win 8 1.00 0.82 · / · · · 05-28 18:20
S1 — Signal Room Grok 4.3 Gemini 3.1 win 14 0.81 0.84 · / scratch · · 05-28 18:20
S1 — Signal Room Gemini 3.1 DeepSeek V4 win 10 1.00 1.00 · / · · · 05-28 18:19
S1 — Signal Room Gemini 3.1 DeepSeek V4 win 10 0.88 0.88 · / · · · 05-28 18:18
S1 — Signal Room Gemini 3.1 DeepSeek V4 win 10 0.88 1.00 · / · · · 05-28 18:17
S1 — Signal Room DeepSeek V4 Cohere Cmd-A win 12 0.90 1.00 · / · · · 05-28 18:16
S1 — Signal Room DeepSeek V4 Cohere Cmd-A unstable 31 0.82 1.00 scratch / · 5⚠ · 05-28 18:11
S1 — Signal Room Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · / · · · 05-28 18:10
S1 — Signal Room Cohere Cmd-A Mistral Lg win 14 0.83 0.89 · / scratch · · 05-28 18:08
S1 — Signal Room Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · / · · · 05-28 18:06
S1 — Signal Room Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · / · · · 05-28 18:04
S1 — Signal Room Mistral Lg Grok 4.3 win 12 0.78 0.90 · / · · · 05-28 17:59
S1 — Signal Room Mistral Lg DeepSeek V4 win 10 0.88 0.88 · / · · · 05-28 17:58
S1 — Signal Room Mistral Lg DeepSeek V4 unstable 28 0.49 0.96 scratch / · 5⚠ · 05-28 17:54
S1 — Signal Room Mistral Lg Cohere Cmd-A win 16 0.84 0.88 · / · · · 05-28 17:52
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 17:50
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 17:48
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 17:47
S1 — Signal Room Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · / scratch · · 05-28 17:45
S1 — Signal Room Grok 4.3 Gemini 3.1 win 14 0.71 0.73 · / scratch · · 05-28 17:44
S1 — Signal Room Grok 4.3 Gemini 3.1 win 8 1.00 0.82 · / · · · 05-28 17:44
S1 — Signal Room Gemini 3.1 DeepSeek V4 win 10 0.88 0.88 · / · · · 05-28 17:43
S1 — Signal Room Gemini 3.1 DeepSeek V4 win 10 0.86 0.88 · / · · · 05-28 17:42
S1 — Signal Room Gemini 3.1 DeepSeek V4 win 10 0.86 0.88 · / · · · 05-28 17:42
S1 — Signal Room DeepSeek V4 Cohere Cmd-A win 14 0.83 0.97 · / · · 05-28 17:40
S1 — Signal Room DeepSeek V4 Cohere Cmd-A win 16 0.82 1.00 / · · · 05-28 17:39
S1 — Signal Room Cohere Cmd-A Mistral Lg win 14 0.83 0.74 · / · · 05-28 17:37
S1 — Signal Room Cohere Cmd-A Mistral Lg win 14 0.83 0.89 · / · · 05-28 17:35
S1 — Signal Room Cohere Cmd-A Mistral Lg win 14 1.00 0.83 · / · · 05-28 17:33
S1 — Signal Room Cohere Cmd-A Mistral Lg unstable 32 0.80 0.60 · / 2⚠ · 05-28 17:29
S1 — Signal Room Cohere Cmd-A Grok 4.3 win 10 0.88 0.88 · / · · · 05-28 17:24
S1 — Signal Room Cohere Cmd-A Mistral Lg active 1 0.00 1.00 · / · · · 05-28 16:50
S4b — Relay Transform DeepSeek V4 Claude S4.6 win 8 0.85 0.95 · / · · 05-28 16:49
S4a — Relay Lookup DeepSeek V4 Claude S4.6 win 6 0.80 1.00 · / · · · 05-28 16:49
S3 — Combined DeepSeek V4 Claude S4.6 win 12 0.57 0.73 · / · · 05-28 16:47
S2 — Mirrored DeepSeek V4 Claude S4.6 win 9 1.00 1.00 · / · · · 05-28 16:46
S1 — Signal Room DeepSeek V4 Claude S4.6 win 12 0.78 0.85 · / · · 05-28 16:45
S4b — Relay Transform DeepSeek V4 Claude S4.6 win 10 0.88 0.82 · / · · 05-28 16:44
S4a — Relay Lookup DeepSeek V4 Claude S4.6 win 6 0.80 1.00 · / · · · 05-28 16:43
S3 — Combined DeepSeek V4 Claude S4.6 win 14 0.53 0.77 · / · · 05-28 16:41
S2 — Mirrored DeepSeek V4 Claude S4.6 win 9 0.88 1.00 · / · · · 05-28 16:41
S1 — Signal Room DeepSeek V4 Claude S4.6 win 8 1.00 1.00 · / · · · 05-28 16:40
S4b — Relay Transform DeepSeek V4 Claude S4.6 win 10 0.88 0.82 · / · · 05-28 16:39
S4a — Relay Lookup DeepSeek V4 Claude S4.6 win 6 0.80 1.00 · / · · · 05-28 16:38
S3 — Combined DeepSeek V4 Claude S4.6 win 14 0.54 0.77 · / · · 05-28 16:37
S2 — Mirrored DeepSeek V4 Claude S4.6 win 11 0.78 0.88 · / · · · 05-28 16:35
S1 — Signal Room DeepSeek V4 Claude S4.6 win 10 0.88 0.86 · / · · · 05-28 16:35
S4a — Relay Lookup GPT-5.5 Gemini 3.1 unstable 16 1.00 0.12 · / · 2⚠ · 05-28 16:34
S3 — Combined GPT-5.5 Gemini 3.1 win 12 0.57 0.90 · / · · · 05-28 16:33
S2 — Mirrored GPT-5.5 Gemini 3.1 win 9 0.86 1.00 · / · · · 05-28 16:32
S1 — Signal Room GPT-5.5 Gemini 3.1 win 12 1.00 0.82 · / · · 05-28 16:32
S4b — Relay Transform GPT-5.5 Gemini 3.1 win 8 0.85 0.95 · / · · 05-28 16:31
S4a — Relay Lookup GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · / · · · 05-28 16:31
S3 — Combined GPT-5.5 Gemini 3.1 win 14 0.73 0.80 · / · · · 05-28 16:30
S2 — Mirrored GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · / · · · 05-28 16:30
S1 — Signal Room GPT-5.5 Gemini 3.1 win 14 0.70 0.84 · / · · 05-28 16:29
S4b — Relay Transform GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · / · · · 05-28 16:28
S4a — Relay Lookup GPT-5.5 Gemini 3.1 win 12 1.00 0.50 · / · 2⚠ 1↺ 05-28 16:28
S3 — Combined GPT-5.5 Gemini 3.1 win 14 0.60 0.91 · / · · · 05-28 16:27
S2 — Mirrored GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · / · · · 05-28 16:26
S1 — Signal Room GPT-5.5 Gemini 3.1 win 10 0.88 0.86 · / · · · 05-28 16:26
S4b — Relay Transform Claude S4.6 Grok 4.3 win 8 0.85 1.00 · / · · · 05-28 16:25
S4a — Relay Lookup Claude S4.6 Grok 4.3 win 6 1.00 1.00 · / · · · 05-28 16:25
S3 — Combined Claude S4.6 Grok 4.3 win 18 0.64 1.00 / · · · 05-28 16:23
S2 — Mirrored Claude S4.6 Grok 4.3 win 11 0.73 0.72 / · · · 05-28 16:22
S1 — Signal Room Claude S4.6 Grok 4.3 win 10 0.88 0.88 · / · · · 05-28 16:21
S4b — Relay Transform Claude S4.6 Grok 4.3 win 8 0.85 1.00 · / · · · 05-28 16:20
S4a — Relay Lookup Claude S4.6 Grok 4.3 win 6 1.00 1.00 · / · · · 05-28 16:20
S3 — Combined Claude S4.6 Grok 4.3 win 18 0.64 1.00 / · · · 05-28 16:19
S2 — Mirrored Claude S4.6 Grok 4.3 win 11 0.97 1.00 / · · · 05-28 16:18
S1 — Signal Room Claude S4.6 Grok 4.3 win 10 0.88 0.88 · / · · · 05-28 16:17
S4b — Relay Transform Claude S4.6 Grok 4.3 win 8 0.85 1.00 · / · · · 05-28 16:16
S4a — Relay Lookup Claude S4.6 Grok 4.3 win 6 1.00 1.00 · / · · · 05-28 16:16
S3 — Combined Claude S4.6 Grok 4.3 win 8 0.68 0.82 · / · · · 05-28 16:15
S2 — Mirrored Claude S4.6 Grok 4.3 win 13 0.67 0.77 / · · · 05-28 16:14
S1 — Signal Room Claude S4.6 Grok 4.3 win 8 1.00 0.85 · / · · · 05-28 16:14
S4b — Relay Transform Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-28 16:13
S4a — Relay Lookup Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-28 16:13
S3 — Combined Claude S4.6 GPT-5.5 win 10 0.74 1.00 · / · · · 05-28 16:12
S2 — Mirrored Claude S4.6 GPT-5.5 win 9 0.96 1.00 / · · · 05-28 16:11
S1 — Signal Room Claude S4.6 GPT-5.5 win 8 1.00 1.00 · / · · · 05-28 16:11
S4b — Relay Transform Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-28 16:10
S4a — Relay Lookup Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-28 16:10
S3 — Combined Claude S4.6 GPT-5.5 win 12 0.63 1.00 / · · · 05-28 16:09
S2 — Mirrored Claude S4.6 GPT-5.5 win 11 0.73 0.72 / · · · 05-28 16:08
S1 — Signal Room Claude S4.6 GPT-5.5 win 8 1.00 1.00 · / · · · 05-28 16:08
S4b — Relay Transform Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-28 16:08
S4a — Relay Lookup Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-28 16:07
S3 — Combined Claude S4.6 GPT-5.5 win 8 0.68 0.82 · / · · · 05-28 16:06
S2 — Mirrored Claude S4.6 GPT-5.5 win 9 0.82 1.00 / · · · 05-28 16:06
S1 — Signal Room Claude S4.6 GPT-5.5 win 8 1.00 1.00 · / · · · 05-28 16:05
S4b — Relay Transform Cohere Cmd-A Mistral Lg unstable 14 1.00 0.29 · / · 2⚠ · 05-27 22:02
S4a — Relay Lookup Cohere Cmd-A Mistral Lg win 6 1.00 1.00 · / · · · 05-27 22:01
S3 — Combined Cohere Cmd-A Mistral Lg win 10 0.62 0.86 · / · · · 05-27 21:59
S2 — Mirrored Cohere Cmd-A Mistral Lg win 9 1.00 1.00 · / · · · 05-27 21:58
S1 — Signal Room Cohere Cmd-A Mistral Lg win 14 0.83 0.89 · / · · 05-27 21:55
S4b — Relay Transform Cohere Cmd-A Mistral Lg unstable 13 1.00 0.33 · / · 4⚠ · 05-27 21:53
S4a — Relay Lookup Cohere Cmd-A Mistral Lg win 6 1.00 1.00 · / · · · 05-27 21:52
S3 — Combined Cohere Cmd-A Mistral Lg win 20 0.46 0.73 · / · · · 05-27 21:48
S2 — Mirrored Cohere Cmd-A Mistral Lg win 13 0.83 1.00 / · · · 05-27 21:46
S1 — Signal Room Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · / · · · 05-27 21:44
S4b — Relay Transform Cohere Cmd-A Mistral Lg unstable 14 1.00 0.14 · / · 2⚠ · 05-27 21:42
S4a — Relay Lookup Cohere Cmd-A Mistral Lg win 6 1.00 1.00 · / · · · 05-27 21:41
S3 — Combined Cohere Cmd-A Mistral Lg unstable 28 0.46 0.32 · / 5⚠ 1↺ 05-27 21:35
S2 — Mirrored Cohere Cmd-A Mistral Lg win 9 1.00 1.00 · / · · · 05-27 21:34
S1 — Signal Room Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · / · · · 05-27 21:32
S4b — Relay Transform Cohere Cmd-A Mistral Lg active 4 1.00 0.00 · / · · · 05-27 21:21
S4a — Relay Lookup Cohere Cmd-A Mistral Lg win 11 0.85 1.00 · / · · · 05-27 21:20
S3 — Combined Cohere Cmd-A Mistral Lg win 19 0.48 1.00 · / · · · 05-27 21:19
S2 — Mirrored Cohere Cmd-A Mistral Lg win 12 0.98 1.00 / · · · 05-27 21:18
S1 — Signal Room Cohere Cmd-A Mistral Lg win 16 0.89 0.88 · / · · · 05-27 21:17
S4b — Relay Transform DeepSeek V4 Claude S4.6 win 10 0.88 0.82 · / · · 05-27 21:16
S4a — Relay Lookup DeepSeek V4 Claude S4.6 win 10 0.83 1.00 · / · · · 05-27 21:15
S3 — Combined DeepSeek V4 Claude S4.6 win 11 0.62 0.85 · / · · 05-27 21:14
S2 — Mirrored DeepSeek V4 Claude S4.6 win 13 0.80 0.80 · / · · 05-27 21:13
S1 — Signal Room DeepSeek V4 Claude S4.6 win 13 0.77 0.67 · / · · 05-27 21:12
S3 — Combined DeepSeek V4 Claude S4.6 unstable 29 0.70 0.52 · / · 7⚠ 1↺ 05-27 21:07
S2 — Mirrored DeepSeek V4 Claude S4.6 win 11 0.90 0.88 · / · · · 05-27 21:07
S1 — Signal Room DeepSeek V4 Claude S4.6 win 11 1.00 0.73 · / · · 05-27 21:06
S3 — Combined DeepSeek V4 Claude S4.6 unstable 27 0.70 0.73 · / · 5⚠ · 05-27 21:02
S2 — Mirrored DeepSeek V4 Claude S4.6 win 14 0.87 0.82 / · · 05-27 21:00
S1 — Signal Room DeepSeek V4 Claude S4.6 win 18 0.61 0.64 · / · · 05-27 20:58
S4a — Relay Lookup GPT-5.5 Gemini 3.1 unstable 14 0.91 0.29 · / · 2⚠ · 05-27 20:58
S3 — Combined GPT-5.5 Gemini 3.1 win 12 0.57 0.90 · / · · · 05-27 20:57
S2 — Mirrored GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · / · · · 05-27 20:57
S1 — Signal Room GPT-5.5 Gemini 3.1 win 8 1.00 0.82 · / · · · 05-27 20:56
S4b — Relay Transform GPT-5.5 Gemini 3.1 win 8 1.00 0.95 · / · · 05-27 20:56
S4a — Relay Lookup GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · / · · · 05-27 20:55
S3 — Combined GPT-5.5 Gemini 3.1 win 12 0.68 0.90 · / · · · 05-27 20:55
S2 — Mirrored GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · / · · · 05-27 20:54
S1 — Signal Room GPT-5.5 Gemini 3.1 win 12 0.88 0.82 · / · · 05-27 20:53
S4b — Relay Transform GPT-5.5 Gemini 3.1 win 8 1.00 0.95 · / · · 05-27 20:53
S4a — Relay Lookup GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · / · · · 05-27 20:53
S3 — Combined GPT-5.5 Gemini 3.1 win 12 0.55 0.85 · / · · 05-27 20:52
S2 — Mirrored GPT-5.5 Gemini 3.1 win 9 0.88 1.00 · / · · · 05-27 20:52
S1 — Signal Room GPT-5.5 Gemini 3.1 win 10 1.00 0.82 · / · · 05-27 20:51
S4b — Relay Transform Claude S4.6 Grok 4.3 win 6 1.00 1.00 · / · · · 05-27 20:51
S4a — Relay Lookup Claude S4.6 Grok 4.3 win 6 1.00 1.00 · / · · · 05-27 20:50
S3 — Combined Claude S4.6 Grok 4.3 win 16 0.70 0.80 / · 1⚠ · 05-27 20:49
S2 — Mirrored Claude S4.6 Grok 4.3 win 9 0.96 1.00 / · · · 05-27 20:49
S1 — Signal Room Claude S4.6 Grok 4.3 win 10 0.88 0.86 · / · · · 05-27 20:48
S4a — Relay Lookup Claude S4.6 Grok 4.3 unstable 18 0.98 0.33 / · 2⚠ · 05-27 20:47
S3 — Combined Claude S4.6 Grok 4.3 win 14 0.59 1.00 / · · · 05-27 20:46
S2 — Mirrored Claude S4.6 Grok 4.3 win 9 0.96 1.00 / · · · 05-27 20:45
S1 — Signal Room Claude S4.6 Grok 4.3 win 10 0.88 0.88 · / · · · 05-27 20:44
S4b — Relay Transform Claude S4.6 Grok 4.3 win 8 1.00 1.00 · / · · · 05-27 20:44
S4a — Relay Lookup Claude S4.6 Grok 4.3 win 6 1.00 1.00 · / · · · 05-27 20:44
S3 — Combined Claude S4.6 Grok 4.3 win 20 0.59 1.00 / · · · 05-27 20:42
S2 — Mirrored Claude S4.6 Grok 4.3 win 9 0.96 1.00 / · · · 05-27 20:41
S1 — Signal Room Claude S4.6 Grok 4.3 win 10 0.96 1.00 / · · · 05-27 20:41
S4b — Relay Transform Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-27 20:41
S4a — Relay Lookup Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-27 20:40
S3 — Combined Claude S4.6 GPT-5.5 win 32 0.43 0.75 / · · · 05-27 20:38
S2 — Mirrored Claude S4.6 GPT-5.5 win 11 0.97 1.00 / · · · 05-27 20:36
S1 — Signal Room Claude S4.6 GPT-5.5 win 8 0.95 1.00 / · · · 05-27 20:36
S4b — Relay Transform Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-27 20:35
S4a — Relay Lookup Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-27 20:35
S3 — Combined Claude S4.6 GPT-5.5 win 12 0.63 1.00 / · · · 05-27 20:34
S2 — Mirrored Claude S4.6 GPT-5.5 win 11 0.97 1.00 / · · · 05-27 20:33
S1 — Signal Room Claude S4.6 GPT-5.5 win 10 0.96 0.88 / · · · 05-27 20:33
S4b — Relay Transform Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-27 20:32
S4a — Relay Lookup Claude S4.6 GPT-5.5 win 6 1.00 1.00 · / · · · 05-27 20:32
S3 — Combined Claude S4.6 GPT-5.5 win 12 0.63 1.00 / · · · 05-27 20:31
S2 — Mirrored Claude S4.6 GPT-5.5 win 11 0.97 1.00 / · · · 05-27 20:30
S1 — Signal Room Claude S4.6 GPT-5.5 win 8 1.00 1.00 · / · · · 05-27 20:29
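
The session rows above follow a regular enough shape to be read mechanically. The sketch below is one possible way to do that in Python, not the report generator's own code. It relies on several assumptions: the two decimals after the turn count are taken to be the A and B scores (matching the drill-down header), the trailing marker columns are kept as raw text because some rows carry fewer tokens than others, and the field names `warn_marks` and `loop_marks` are hypothetical labels for the `⚠` and `↺` counts, whose exact meaning the report does not state. Scenario title and agent names are left as one string, since the rows have no delimiter between them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed row shape: "<id> — <title + agents> <outcome> <turns> <A> <B> <markers> <MM-DD HH:MM>"
ROW_RE = re.compile(
    r"^(?P<scenario>S\d+[a-z]?) — (?P<rest>.+?) "
    r"(?P<outcome>win|unstable|active|unknown) "
    r"(?P<turns>\d+) (?P<score_a>\d\.\d{2}) (?P<score_b>\d\.\d{2}) "
    r"(?P<flags>.*) (?P<started>\d{2}-\d{2} \d{2}:\d{2})$"
)


@dataclass
class SessionRow:
    scenario: str          # e.g. "S4a"
    title_and_agents: str  # scenario title plus both agent names, unsplit
    outcome: str
    turns: int
    score_a: float
    score_b: float
    flags: str             # raw marker columns, kept verbatim
    warn_marks: int        # hypothetical name: the number before "⚠", 0 if absent
    loop_marks: int        # hypothetical name: the number before "↺", 0 if absent
    started: str           # "MM-DD HH:MM", no year in the source


def parse_row(line: str) -> Optional[SessionRow]:
    """Parse one session-log row; return None for lines that do not fit."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    flags = m["flags"]
    warn = re.search(r"(\d+)⚠", flags)
    loop = re.search(r"(\d+)↺", flags)
    return SessionRow(
        scenario=m["scenario"],
        title_and_agents=m["rest"],
        outcome=m["outcome"],
        turns=int(m["turns"]),
        score_a=float(m["score_a"]),
        score_b=float(m["score_b"]),
        flags=flags,
        warn_marks=int(warn.group(1)) if warn else 0,
        loop_marks=int(loop.group(1)) if loop else 0,
        started=m["started"],
    )


if __name__ == "__main__":
    sample = (
        "S4a — Relay Lookup GPT-5.5 Cohere Cmd-A unstable 14 "
        "1.00 0.29 · / · 2⚠ · 05-30 11:51"
    )
    print(parse_row(sample))
```

Splitting `title_and_agents` into its three parts would need a list of known agent names supplied by the caller; the rows alone do not mark where one name ends and the next begins.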
08 //

Scenario Drill-down

S1 — Signal Room
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Mistral Lg GLM 5.1 win 8 1.00 1.00 · ·
Cohere Cmd-A GLM 5.1 win 10 0.88 0.88 · ·
Cohere Cmd-A Mistral Lg unstable 43 0.91 0.40 2×incomplete 1×recovered 18×incomplete
DeepSeek V4 Pro GLM 5.1 win 8 1.00 1.00 · ·
DeepSeek V4 Pro Mistral Lg win 14 0.83 0.89 · scratch
DeepSeek V4 Pro Cohere Cmd-A win 14 0.73 1.00 1×incomplete ·
Gemini 3.1 Pro GLM 5.1 win 8 1.00 1.00 · ·
Gemini 3.1 Pro Mistral Lg win 14 0.83 0.89 · scratch
Gemini 3.1 Pro Cohere Cmd-A win 12 0.90 0.97 · scratch
Gemini 3.1 Pro DeepSeek V4 Pro win 10 0.88 1.00 · ·
Grok 4.3 GLM 5.1 win 10 0.88 0.88 · ·
Grok 4.3 Mistral Lg win 12 0.90 0.87 · scratch
Grok 4.3 Cohere Cmd-A win 13 1.00 0.83 · scratch
Grok 4.3 DeepSeek V4 Pro win 10 0.88 0.88 · ·
Grok 4.3 Gemini 3.1 Pro win 8 1.00 1.00 · ·
GPT-5.5 GLM 5.1 win 12 0.80 0.90 · ·
GPT-5.5 Mistral Lg win 10 0.88 0.88 · ·
GPT-5.5 Cohere Cmd-A win 12 0.90 0.97 · scratch
GPT-5.5 DeepSeek V4 Pro win 10 0.88 0.88 · ·
GPT-5.5 Gemini 3.1 Pro win 10 0.88 0.88 · ·
GPT-5.5 Grok 4.3 win 10 0.88 0.88 · ·
Claude Opus 4.8 GLM 5.1 win 10 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg win 10 0.88 1.00 · ·
Claude Opus 4.8 Cohere Cmd-A win 20 0.90 1.00 memory ·
Claude Opus 4.8 DeepSeek V4 Pro win 8 1.00 1.00 · ·
Claude Opus 4.8 Gemini 3.1 Pro win 8 1.00 1.00 · ·
GLM 5.1 GLM 5.1 win 8 1.00 1.00 · ·
Claude Opus 4.8 Grok 4.3 win 8 1.00 1.00 · ·
Claude Opus 4.8 GPT-5.5 win 8 1.00 1.00 · ·
Claude Opus 4.8 GLM 5.1 unknown 0 0.00 0.00 · ·
Claude Opus 4.8 Mistral Lg win 20 0.82 0.90 memory ·
Claude Opus 4.8 Cohere Cmd-A win 16 0.81 1.00 memory 1×incomplete ·
Claude Opus 4.8 DeepSeek V4 Pro win 8 1.00 0.85 · ·
Claude Opus 4.8 Gemini 3.1 Pro win 8 1.00 0.85 · ·
Claude Opus 4.8 Grok 4.3 win 10 1.00 1.00 · ·
Claude Opus 4.8 GPT-5.5 win 8 1.00 1.00 · ·
Gemini 3.1 Pro Cohere Cmd-A win 14 0.91 0.73 · scratch 1×incomplete
Grok 4.3 GLM 5.1 win 8 1.00 1.00 · ·
GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg win 10 1.00 1.00 · ·
Mistral Lg Grok 4.3 win 14 0.67 0.91 1×incomplete 1×impossible ·
Mistral Lg DeepSeek V4 win 16 0.71 0.97 1×incomplete 1×impossible scratch
Mistral Lg DeepSeek V4 win 10 0.88 1.00 · ·
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Grok 4.3 Gemini 3.1 win 8 1.00 0.82 · 1×incomplete
Grok 4.3 Gemini 3.1 win 14 0.81 0.84 1×incomplete scratch 1×incomplete
Gemini 3.1 DeepSeek V4 win 10 1.00 1.00 · ·
Gemini 3.1 DeepSeek V4 win 10 0.88 0.88 · ·
Gemini 3.1 DeepSeek V4 win 10 0.88 1.00 · ·
DeepSeek V4 Cohere Cmd-A win 12 0.90 1.00 · ·
DeepSeek V4 Cohere Cmd-A unstable 31 0.82 1.00 scratch 1×incomplete ·
Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · ·
Cohere Cmd-A Mistral Lg win 14 0.83 0.89 · scratch
Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · ·
Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · ·
Mistral Lg Grok 4.3 win 12 0.78 0.90 1×incomplete ·
Mistral Lg DeepSeek V4 win 10 0.88 0.88 · ·
Mistral Lg DeepSeek V4 unstable 28 0.49 0.96 scratch 1×incomplete 5×impossible ·
Mistral Lg Cohere Cmd-A win 16 0.84 0.88 1×incomplete ·
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Mistral Lg Cohere Cmd-A win 12 0.90 0.97 · scratch
Grok 4.3 Gemini 3.1 win 14 0.71 0.73 2×incomplete scratch 1×incomplete
Grok 4.3 Gemini 3.1 win 8 1.00 0.82 · 1×incomplete
Gemini 3.1 DeepSeek V4 win 10 0.88 0.88 · ·
Gemini 3.1 DeepSeek V4 win 10 0.86 0.88 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 10 0.86 0.88 1×incomplete ·
DeepSeek V4 Cohere Cmd-A win 14 0.83 0.97 · scratch
DeepSeek V4 Cohere Cmd-A win 16 0.82 1.00 scratch ·
Cohere Cmd-A Mistral Lg win 14 0.83 0.74 · scratch
Cohere Cmd-A Mistral Lg win 14 0.83 0.89 · scratch
Cohere Cmd-A Mistral Lg win 14 1.00 0.83 · scratch
Cohere Cmd-A Mistral Lg unstable 32 0.80 0.60 2×incomplete scratch
Cohere Cmd-A Grok 4.3 win 10 0.88 0.88 · ·
Cohere Cmd-A Mistral Lg unknown 1 0.00 1.00 · ·
DeepSeek V4 Claude S4.6 win 12 0.78 0.85 1×incomplete scratch 1×incomplete
DeepSeek V4 Claude S4.6 win 8 1.00 1.00 · ·
DeepSeek V4 Claude S4.6 win 10 0.88 0.86 · 1×incomplete
GPT-5.5 Gemini 3.1 win 12 1.00 0.82 · scratch memory 1×incomplete
GPT-5.5 Gemini 3.1 win 14 0.70 0.84 3×incomplete scratch memory 1×incomplete
GPT-5.5 Gemini 3.1 win 10 0.88 0.86 · 1×incomplete
Claude S4.6 Grok 4.3 win 10 0.88 0.88 · ·
Claude S4.6 Grok 4.3 win 10 0.88 0.88 · ·
Claude S4.6 Grok 4.3 win 8 1.00 0.85 · ·
Claude S4.6 GPT-5.5 win 8 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 8 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 8 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 14 0.83 0.89 · scratch
Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · ·
Cohere Cmd-A Mistral Lg win 12 0.80 0.90 · ·
Cohere Cmd-A Mistral Lg win 16 0.89 0.88 · ·
DeepSeek V4 Claude S4.6 win 13 0.77 0.67 2×incomplete scratch 3×incomplete
DeepSeek V4 Claude S4.6 win 11 1.00 0.73 · scratch 2×incomplete
DeepSeek V4 Claude S4.6 win 18 0.61 0.64 5×incomplete scratch 2×incomplete 1×impossible
GPT-5.5 Gemini 3.1 win 8 1.00 0.82 · 1×incomplete
GPT-5.5 Gemini 3.1 win 12 0.88 0.82 1×incomplete scratch memory 1×incomplete
GPT-5.5 Gemini 3.1 win 10 1.00 0.82 · scratch 1×incomplete
Claude S4.6 Grok 4.3 win 10 0.88 0.86 · 1×incomplete
Claude S4.6 Grok 4.3 win 10 0.88 0.88 · ·
Claude S4.6 Grok 4.3 win 10 0.96 1.00 scratch ·
Claude S4.6 GPT-5.5 win 8 0.95 1.00 scratch ·
Claude S4.6 GPT-5.5 win 10 0.96 0.88 scratch ·
Claude S4.6 GPT-5.5 win 8 1.00 1.00 · ·
S2 — Mirrored
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Mistral Lg GLM 5.1 win 21 0.56 0.94 scratch ·
Cohere Cmd-A GLM 5.1 win 9 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 9 1.00 1.00 · ·
DeepSeek V4 Pro GLM 5.1 win 11 0.90 0.88 · ·
DeepSeek V4 Pro Mistral Lg win 11 0.90 0.88 · ·
DeepSeek V4 Pro Cohere Cmd-A win 7 1.00 1.00 · ·
Gemini 3.1 Pro GLM 5.1 win 9 1.00 1.00 · ·
Gemini 3.1 Pro Mistral Lg win 9 0.88 1.00 · ·
Gemini 3.1 Pro Cohere Cmd-A win 11 0.90 0.88 · ·
Gemini 3.1 Pro DeepSeek V4 Pro win 11 0.90 0.88 · ·
Grok 4.3 GLM 5.1 win 11 0.90 0.88 · ·
Grok 4.3 Mistral Lg win 11 0.90 0.88 · ·
Grok 4.3 Cohere Cmd-A win 13 0.83 0.80 · ·
Grok 4.3 DeepSeek V4 Pro win 15 0.68 0.73 2×incomplete 1×incomplete
Grok 4.3 Gemini 3.1 Pro win 9 0.88 1.00 · ·
GPT-5.5 GLM 5.1 win 9 1.00 1.00 · ·
GPT-5.5 Mistral Lg win 7 1.00 1.00 · ·
GPT-5.5 Cohere Cmd-A win 9 0.88 1.00 · ·
GPT-5.5 DeepSeek V4 Pro win 9 0.88 1.00 · ·
GPT-5.5 Gemini 3.1 Pro win 9 0.88 1.00 · ·
GPT-5.5 Grok 4.3 win 9 1.00 1.00 · ·
Claude Opus 4.8 GLM 5.1 win 9 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg win 9 1.00 1.00 · ·
Claude Opus 4.8 Cohere Cmd-A win 7 1.00 1.00 · ·
Claude Opus 4.8 DeepSeek V4 Pro win 9 0.96 1.00 scratch ·
Claude Opus 4.8 Gemini 3.1 Pro win 9 1.00 1.00 · ·
Claude Opus 4.8 Grok 4.3 win 7 1.00 1.00 · ·
Claude Opus 4.8 GPT-5.5 win 9 1.00 1.00 · ·
Gemini 3.1 Pro Cohere Cmd-A win 9 0.88 1.00 · ·
Grok 4.3 GLM 5.1 win 11 0.88 0.88 1×incomplete ·
GPT-5.5 DeepSeek V4 Pro win 9 0.88 1.00 · ·
Claude Opus 4.8 Mistral Lg win 7 1.00 1.00 · ·
DeepSeek V4 Claude S4.6 win 9 1.00 1.00 · ·
DeepSeek V4 Claude S4.6 win 9 0.88 1.00 · ·
DeepSeek V4 Claude S4.6 win 11 0.78 0.88 1×incomplete ·
GPT-5.5 Gemini 3.1 win 9 0.86 1.00 1×incomplete ·
GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · ·
Claude S4.6 Grok 4.3 win 11 0.73 0.72 scratch 2×incomplete 2×incomplete
Claude S4.6 Grok 4.3 win 11 0.97 1.00 scratch ·
Claude S4.6 Grok 4.3 win 13 0.67 0.77 scratch 3×incomplete 2×incomplete
Claude S4.6 GPT-5.5 win 9 0.96 1.00 scratch ·
Claude S4.6 GPT-5.5 win 11 0.73 0.72 scratch 2×incomplete 2×incomplete
Claude S4.6 GPT-5.5 win 9 0.82 1.00 scratch 1×incomplete ·
Cohere Cmd-A Mistral Lg win 9 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 13 0.83 1.00 scratch ·
Cohere Cmd-A Mistral Lg win 9 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 12 0.98 1.00 scratch ·
DeepSeek V4 Claude S4.6 win 13 0.80 0.80 · scratch
DeepSeek V4 Claude S4.6 win 11 0.90 0.88 · ·
DeepSeek V4 Claude S4.6 win 14 0.87 0.82 scratch scratch
GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 win 9 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 win 9 0.88 1.00 · ·
Claude S4.6 Grok 4.3 win 9 0.96 1.00 scratch ·
Claude S4.6 Grok 4.3 win 9 0.96 1.00 scratch ·
Claude S4.6 Grok 4.3 win 9 0.96 1.00 scratch ·
Claude S4.6 GPT-5.5 win 11 0.97 1.00 scratch ·
Claude S4.6 GPT-5.5 win 11 0.97 1.00 scratch ·
Claude S4.6 GPT-5.5 win 11 0.97 1.00 scratch ·
S3 — Combined
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Cohere Cmd-A DeepSeek V4 Pro unknown 1 1.00 0.00 · ·
Mistral Lg GLM 5.1 win 8 0.65 1.00 2×incomplete ·
Cohere Cmd-A GLM 5.1 win 12 0.57 0.90 2×incomplete ·
Cohere Cmd-A Mistral Lg unstable 27 0.41 0.52 10×incomplete 1×recovered scratch 1×incomplete 1×recovered
DeepSeek V4 Pro GLM 5.1 win 12 0.57 0.90 2×incomplete ·
DeepSeek V4 Pro Mistral Lg win 10 0.62 1.00 1×incomplete ·
DeepSeek V4 Pro Cohere Cmd-A win 29 0.49 0.90 4×incomplete 1×recovered
Gemini 3.1 Pro GLM 5.1 win 12 0.57 0.90 2×incomplete ·
Gemini 3.1 Pro Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Gemini 3.1 Pro Cohere Cmd-A win 12 0.68 0.83 1×incomplete ·
Gemini 3.1 Pro DeepSeek V4 Pro win 12 0.57 0.87 2×incomplete 1×premature
Grok 4.3 GLM 5.1 win 12 0.57 0.90 2×incomplete ·
Grok 4.3 Mistral Lg win 12 0.57 0.83 2×incomplete ·
Grok 4.3 Cohere Cmd-A win 12 0.57 0.83 2×incomplete ·
Grok 4.3 DeepSeek V4 Pro win 10 0.62 1.00 1×incomplete ·
Grok 4.3 Gemini 3.1 Pro win 10 0.62 0.88 1×incomplete ·
GPT-5.5 GLM 5.1 win 10 0.62 1.00 1×incomplete ·
GPT-5.5 Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
GPT-5.5 Cohere Cmd-A win 16 0.59 0.75 3×incomplete ·
GPT-5.5 DeepSeek V4 Pro win 10 0.62 0.88 1×incomplete ·
GPT-5.5 Gemini 3.1 Pro win 10 0.62 1.00 1×incomplete ·
GPT-5.5 Grok 4.3 win 10 0.62 1.00 1×incomplete ·
Claude Opus 4.8 GLM 5.1 win 10 0.62 0.88 1×incomplete ·
Claude Opus 4.8 Mistral Lg win 12 0.57 0.88 2×incomplete 1×incomplete
Claude Opus 4.8 Cohere Cmd-A unstable 22 0.76 0.45 2×incomplete ·
Claude Opus 4.8 DeepSeek V4 Pro win 10 0.62 1.00 1×incomplete ·
Claude Opus 4.8 Gemini 3.1 Pro win 10 0.62 0.88 1×incomplete ·
Claude Opus 4.8 Grok 4.3 win 14 0.53 0.91 3×incomplete ·
Claude Opus 4.8 GPT-5.5 win 10 0.62 0.88 1×incomplete ·
Gemini 3.1 Pro Cohere Cmd-A win 14 0.60 0.86 memory 2×incomplete ·
Grok 4.3 GLM 5.1 win 10 0.62 1.00 1×incomplete ·
GPT-5.5 DeepSeek V4 Pro win 13 0.53 1.00 3×incomplete ·
Claude Opus 4.8 Mistral Lg win 12 0.57 1.00 2×incomplete ·
Cohere Cmd-A DeepSeek V4 Pro unknown 3 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg unknown 1 1.00 0.00 · ·
Cohere Cmd-A Mistral Lg win 14 0.53 1.00 3×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 20 0.46 0.51 6×incomplete 1×recovered scratch 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.62 1.00 1×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A DeepSeek V4 win 18 0.48 0.87 5×incomplete ·
Grok 4.3 DeepSeek V4 unknown 0 0.00 0.00 · ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 16 0.50 0.90 4×incomplete scratch
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 16 0.50 0.82 4×incomplete scratch
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 20 0.56 0.87 6×incomplete ·
Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 4×incomplete ·
Gemini 3.1 DeepSeek V4 unstable 24 0.49 0.57 7×incomplete 5×premature
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.68 1.00 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 14 0.60 0.91 scratch 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 18 0.48 0.84 5×incomplete scratch
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 10 0.62 0.88 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 13 0.53 1.00 3×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 4×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 0.90 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 14 0.49 0.91 2×incomplete 1×impossible ·
Gemini 3.1 DeepSeek V4 win 11 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 0.90 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
Gemini 3.1 DeepSeek V4 win 13 0.53 1.00 3×incomplete ·
Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 4×incomplete ·
Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 10 0.62 1.00 1×incomplete ·
Gemini 3.1 DeepSeek V4 win 16 0.50 0.85 4×incomplete ·
Gemini 3.1 DeepSeek V4 win 12 0.57 1.00 2×incomplete ·
DeepSeek V4 Grok 4.3 win 10 0.62 0.88 1×incomplete ·
DeepSeek V4 Cohere Cmd-A win 22 0.59 0.93 3×incomplete 1×recovered
DeepSeek V4 Cohere Cmd-A win 14 0.60 0.86 scratch 2×incomplete ·
DeepSeek V4 Cohere Cmd-A win 19 0.60 0.92 scratch 2×incomplete ·
DeepSeek V4 Cohere Cmd-A win 12 0.57 1.00 2×incomplete ·
DeepSeek V4 Cohere Cmd-A win 12 0.57 1.00 2×incomplete ·
DeepSeek V4 Cohere Cmd-A win 14 0.60 1.00 scratch 2×incomplete ·
DeepSeek V4 Cohere Cmd-A win 12 0.57 1.00 2×incomplete ·
Cohere Cmd-A Mistral Lg unstable 28 0.46 0.56 9×incomplete scratch
Cohere Cmd-A Mistral Lg win 16 0.68 1.00 2×incomplete ·
Cohere Cmd-A Mistral Lg win 14 0.63 0.91 2×incomplete ·
Cohere Cmd-A Mistral Lg win 14 0.73 0.91 1×incomplete ·
Cohere Cmd-A Mistral Lg win 14 0.53 0.61 3×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 14 0.73 0.91 1×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.88 1×incomplete ·
Cohere Cmd-A Mistral Lg win 20 0.46 0.64 6×incomplete ·
Cohere Cmd-A Mistral Lg win 14 0.53 0.61 3×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 14 0.53 1.00 3×incomplete ·
Cohere Cmd-A Mistral Lg win 18 0.48 0.52 5×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 16 0.50 0.64 4×incomplete scratch 1×incomplete
Cohere Cmd-A Mistral Lg win 14 0.63 0.91 2×incomplete ·
Cohere Cmd-A Mistral Lg unstable 27 0.41 0.25 10×incomplete 1×recovered 1×incomplete 1×recovered
Cohere Cmd-A Mistral Lg win 10 0.62 0.88 1×incomplete ·
Cohere Cmd-A Mistral Lg unstable 17 0.48 0.41 5×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 14 0.63 1.00 2×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.88 1×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.88 1×incomplete ·
Cohere Cmd-A Mistral Lg win 16 0.50 0.54 4×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.62 0.88 1×incomplete ·
Cohere Cmd-A Mistral Lg win 12 0.57 0.87 2×incomplete scratch
Cohere Cmd-A Mistral Lg win 14 0.63 0.91 2×incomplete ·
Cohere Cmd-A Mistral Lg win 18 0.63 1.00 3×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.88 1×incomplete ·
Cohere Cmd-A Mistral Lg win 18 0.48 0.48 5×incomplete 1×incomplete 1×recovered
Cohere Cmd-A Mistral Lg win 14 0.73 0.91 1×incomplete ·
Cohere Cmd-A Mistral Lg win 20 0.46 0.90 6×incomplete scratch
Cohere Cmd-A Mistral Lg win 19 0.74 0.89 2×incomplete ·
Cohere Cmd-A Mistral Lg win 28 0.41 0.38 10×incomplete 1×incomplete 2×recovered
Cohere Cmd-A Mistral Lg win 13 0.53 1.00 3×incomplete ·
Cohere Cmd-A Mistral Lg win 14 0.63 1.00 2×incomplete ·
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A DeepSeek V4 win 12 0.57 0.90 2×incomplete ·
DeepSeek V4 Claude S4.6 win 12 0.57 0.73 2×incomplete scratch 2×incomplete
DeepSeek V4 Claude S4.6 win 14 0.53 0.77 3×incomplete scratch 2×incomplete
DeepSeek V4 Claude S4.6 win 14 0.54 0.77 2×incomplete scratch 2×incomplete
GPT-5.5 Gemini 3.1 win 12 0.57 0.90 2×incomplete ·
GPT-5.5 Gemini 3.1 win 14 0.73 0.80 1×incomplete 1×premature
GPT-5.5 Gemini 3.1 win 14 0.60 0.91 memory 2×incomplete ·
Claude S4.6 Grok 4.3 win 18 0.64 1.00 scratch 2×incomplete ·
Claude S4.6 Grok 4.3 win 18 0.64 1.00 scratch 2×incomplete ·
Claude S4.6 Grok 4.3 win 8 0.68 0.82 1×incomplete 1×incomplete
Claude S4.6 GPT-5.5 win 10 0.74 1.00 1×incomplete ·
Claude S4.6 GPT-5.5 win 12 0.63 1.00 scratch 2×incomplete ·
Claude S4.6 GPT-5.5 win 8 0.68 0.82 1×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.62 0.86 1×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg win 20 0.46 0.73 6×incomplete 1×incomplete
Cohere Cmd-A Mistral Lg unstable 28 0.46 0.32 9×incomplete 1×recovered scratch 1×incomplete
Cohere Cmd-A Mistral Lg win 19 0.48 1.00 8×incomplete ·
DeepSeek V4 Claude S4.6 win 11 0.62 0.85 1×incomplete scratch 1×incomplete
DeepSeek V4 Claude S4.6 unstable 29 0.70 0.52 · 17×incomplete 1×recovered
DeepSeek V4 Claude S4.6 unstable 27 0.70 0.73 · 9×incomplete
GPT-5.5 Gemini 3.1 win 12 0.57 0.90 2×incomplete ·
GPT-5.5 Gemini 3.1 win 12 0.68 0.90 1×incomplete ·
GPT-5.5 Gemini 3.1 win 12 0.55 0.85 3×incomplete scratch 1×incomplete
Claude S4.6 Grok 4.3 win 16 0.70 0.80 scratch 2×incomplete 2×premature
Claude S4.6 Grok 4.3 win 14 0.59 1.00 scratch 3×incomplete ·
Claude S4.6 Grok 4.3 win 20 0.59 1.00 scratch 3×incomplete ·
Claude S4.6 GPT-5.5 win 32 0.43 0.75 scratch 1×incomplete 7×impossible 5×premature
Claude S4.6 GPT-5.5 win 12 0.63 1.00 scratch 2×incomplete ·
Claude S4.6 GPT-5.5 win 12 0.63 1.00 scratch 2×incomplete ·
S4a — Relay Lookup
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Mistral Lg GLM 5.1 win 6 1.00 1.00 · ·
Cohere Cmd-A GLM 5.1 win 10 0.76 0.88 · ·
Cohere Cmd-A Mistral Lg win 10 0.76 0.88 · ·
DeepSeek V4 Pro GLM 5.1 win 6 1.00 1.00 · ·
DeepSeek V4 Pro Mistral Lg win 6 1.00 1.00 · ·
DeepSeek V4 Pro Cohere Cmd-A win 8 0.85 0.85 · ·
Gemini 3.1 Pro GLM 5.1 win 8 1.00 1.00 · ·
Gemini 3.1 Pro Mistral Lg win 6 1.00 1.00 · ·
Gemini 3.1 Pro Cohere Cmd-A win 8 1.00 0.85 · ·
Gemini 3.1 Pro DeepSeek V4 Pro win 8 1.00 1.00 · ·
Grok 4.3 GLM 5.1 win 6 1.00 1.00 · ·
Grok 4.3 Mistral Lg win 6 1.00 1.00 · ·
Grok 4.3 Cohere Cmd-A win 8 0.85 0.85 · ·
Grok 4.3 DeepSeek V4 Pro win 9 0.76 1.00 · ·
Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 GLM 5.1 win 8 1.00 1.00 · ·
GPT-5.5 Mistral Lg win 6 1.00 1.00 · ·
GPT-5.5 Cohere Cmd-A unstable 14 1.00 0.29 · ·
GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
Claude Opus 4.8 GLM 5.1 win 6 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg win 6 1.00 1.00 · ·
Claude Opus 4.8 Cohere Cmd-A unstable 18 1.00 0.44 · ·
Claude Opus 4.8 DeepSeek V4 Pro win 12 1.00 0.73 · ·
Claude Opus 4.8 Gemini 3.1 Pro win 8 1.00 0.85 · ·
Claude Opus 4.8 Grok 4.3 win 6 1.00 1.00 · ·
Claude Opus 4.8 GPT-5.5 win 6 1.00 1.00 · ·
Gemini 3.1 Pro Cohere Cmd-A win 8 1.00 0.85 · ·
Grok 4.3 GLM 5.1 win 6 1.00 1.00 · ·
GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg unstable 14 1.00 0.29 · ·
DeepSeek V4 Claude S4.6 win 6 0.80 1.00 · ·
DeepSeek V4 Claude S4.6 win 6 0.80 1.00 · ·
DeepSeek V4 Claude S4.6 win 6 0.80 1.00 · ·
GPT-5.5 Gemini 3.1 unstable 16 1.00 0.12 · ·
GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 win 12 1.00 0.50 · 1×recovered
Claude S4.6 Grok 4.3 win 6 1.00 1.00 · ·
Claude S4.6 Grok 4.3 win 6 1.00 1.00 · ·
Claude S4.6 Grok 4.3 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 6 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 6 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 6 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg win 11 0.85 1.00 · ·
DeepSeek V4 Claude S4.6 win 10 0.83 1.00 · ·
GPT-5.5 Gemini 3.1 unstable 14 0.91 0.29 · ·
GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · ·
Claude S4.6 Grok 4.3 win 6 1.00 1.00 · ·
Claude S4.6 Grok 4.3 unstable 18 0.98 0.33 scratch ·
Claude S4.6 Grok 4.3 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
S4b — Relay Transform
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Mistral Lg GLM 5.1 win 8 1.00 0.95 · scratch
Cohere Cmd-A GLM 5.1 win 10 0.76 0.72 · 2×incomplete
Cohere Cmd-A Mistral Lg win 10 0.76 0.76 · ·
DeepSeek V4 Pro GLM 5.1 win 6 1.00 1.00 · ·
DeepSeek V4 Pro Mistral Lg win 6 1.00 1.00 · ·
DeepSeek V4 Pro Cohere Cmd-A win 14 0.73 0.91 1×incomplete ·
Gemini 3.1 Pro GLM 5.1 win 6 1.00 1.00 · ·
Gemini 3.1 Pro Mistral Lg win 8 1.00 0.95 · scratch
Gemini 3.1 Pro Cohere Cmd-A unstable 14 1.00 0.29 · ·
Gemini 3.1 Pro DeepSeek V4 Pro win 8 1.00 1.00 · ·
Grok 4.3 GLM 5.1 win 8 0.70 0.82 · 1×incomplete
Grok 4.3 Mistral Lg win 6 1.00 1.00 · ·
Grok 4.3 Cohere Cmd-A win 6 1.00 1.00 · ·
Grok 4.3 DeepSeek V4 Pro win 8 1.00 0.85 · ·
Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 GLM 5.1 win 8 0.85 0.82 · 1×incomplete
GPT-5.5 Mistral Lg unstable 14 1.00 0.29 · ·
GPT-5.5 Cohere Cmd-A unstable 18 0.80 0.33 · ·
GPT-5.5 DeepSeek V4 Pro win 8 1.00 0.85 · ·
GPT-5.5 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 Grok 4.3 win 8 1.00 0.85 · ·
Claude Opus 4.8 GLM 5.1 win 10 0.88 0.82 · scratch 1×incomplete
Claude Opus 4.8 Mistral Lg win 8 0.85 0.85 · ·
Claude Opus 4.8 Cohere Cmd-A win 10 0.88 0.88 · ·
Claude Opus 4.8 DeepSeek V4 Pro win 10 0.88 1.00 · ·
Claude Opus 4.8 Gemini 3.1 Pro win 8 0.85 0.85 · ·
Claude Opus 4.8 Grok 4.3 win 8 0.70 0.85 · ·
Claude Opus 4.8 GPT-5.5 win 8 0.85 0.82 · 1×incomplete
Gemini 3.1 Pro Cohere Cmd-A win 12 1.00 0.83 · scratch
Grok 4.3 GLM 5.1 win 6 1.00 1.00 · ·
GPT-5.5 DeepSeek V4 Pro win 8 1.00 0.85 · ·
Claude Opus 4.8 Mistral Lg win 8 0.85 0.85 · ·
DeepSeek V4 Claude S4.6 win 8 0.85 0.95 · scratch
DeepSeek V4 Claude S4.6 win 10 0.88 0.82 · scratch 1×incomplete
DeepSeek V4 Claude S4.6 win 10 0.88 0.82 · scratch 1×incomplete
GPT-5.5 Gemini 3.1 win 8 0.85 0.95 · scratch
GPT-5.5 Gemini 3.1 win 6 1.00 1.00 · ·
Claude S4.6 Grok 4.3 win 8 0.85 1.00 · ·
Claude S4.6 Grok 4.3 win 8 0.85 1.00 · ·
Claude S4.6 Grok 4.3 win 8 0.85 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Cohere Cmd-A Mistral Lg unstable 14 1.00 0.29 · ·
Cohere Cmd-A Mistral Lg unstable 13 1.00 0.33 · ·
Cohere Cmd-A Mistral Lg unstable 14 1.00 0.14 · ·
Cohere Cmd-A Mistral Lg unknown 4 1.00 0.00 · ·
DeepSeek V4 Claude S4.6 win 10 0.88 0.82 · scratch 1×incomplete
GPT-5.5 Gemini 3.1 win 8 1.00 0.95 · scratch
GPT-5.5 Gemini 3.1 win 8 1.00 0.95 · scratch
Claude S4.6 Grok 4.3 win 6 1.00 1.00 · ·
Claude S4.6 Grok 4.3 win 8 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
Claude S4.6 GPT-5.5 win 6 1.00 1.00 · ·
S5a — Do Not Press
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Mistral Lg GLM 5.1 win 8 1.00 1.00 · ·
Cohere Cmd-A GLM 5.1 win 10 0.64 0.86 · 1×incomplete
Cohere Cmd-A Mistral Lg win 10 0.76 0.72 · 2×incomplete
DeepSeek V4 Pro GLM 5.1 win 8 1.00 1.00 · ·
DeepSeek V4 Pro Mistral Lg win 10 1.00 0.72 · 2×incomplete
DeepSeek V4 Pro Cohere Cmd-A win 10 0.76 0.88 · ·
Gemini 3.1 Pro GLM 5.1 win 14 1.00 0.91 · ·
Gemini 3.1 Pro Mistral Lg win 10 1.00 0.72 · 2×incomplete
Gemini 3.1 Pro Cohere Cmd-A win 10 1.00 0.88 · ·
Gemini 3.1 Pro DeepSeek V4 Pro win 6 1.00 1.00 · ·
Grok 4.3 GLM 5.1 win 8 1.00 0.85 · ·
Grok 4.3 Mistral Lg win 10 1.00 0.72 · 2×incomplete
Grok 4.3 Cohere Cmd-A win 10 0.88 0.88 · ·
Grok 4.3 DeepSeek V4 Pro win 6 1.00 0.77 · 1×incomplete
Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 GLM 5.1 win 6 1.00 1.00 · ·
GPT-5.5 Mistral Lg win 10 1.00 0.72 · 2×incomplete
GPT-5.5 Cohere Cmd-A win 10 1.00 0.88 · ·
GPT-5.5 DeepSeek V4 Pro win 8 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 Pro win 8 1.00 1.00 · ·
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
Claude Opus 4.8 GLM 5.1 win 6 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg win 10 1.00 0.86 · 1×incomplete
Claude Opus 4.8 Cohere Cmd-A win 8 1.00 0.85 · ·
Claude Opus 4.8 DeepSeek V4 Pro win 6 1.00 0.80 · ·
Claude Opus 4.8 Gemini 3.1 Pro win 6 1.00 1.00 · ·
Claude Opus 4.8 Grok 4.3 win 6 1.00 1.00 · ·
Claude Opus 4.8 GPT-5.5 win 6 1.00 1.00 · ·
GLM 5.1 Gemini 3.1 win 6 1.00 1.00 · ·
Llama 3.3 70B Llama 3.3 70B unknown 0 0.00 0.00 · ·
Llama 3.3 70B Llama 3.3 70B unknown 0 0.00 0.00 · ·
Llama 3.3 70B Llama 3.3 70B unknown 0 0.00 0.00 · ·
Llama 3.3 70B Mistral Lg unknown 1 0.00 1.00 · ·
DeepSeek V4 Gemini 3.1 win 12 0.70 0.78 · 1×incomplete
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
S5b — Faulty Relay
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Mistral Lg GLM 5.1 win 10 0.96 0.86 scratch 1×incomplete
Cohere Cmd-A GLM 5.1 win 18 0.67 0.93 · ·
Cohere Cmd-A Mistral Lg win 10 0.64 1.00 · ·
DeepSeek V4 Pro GLM 5.1 win 10 1.00 0.74 · 1×incomplete
DeepSeek V4 Pro Mistral Lg win 10 1.00 0.72 · 2×incomplete
DeepSeek V4 Pro Cohere Cmd-A win 10 0.88 0.88 · ·
Gemini 3.1 Pro GLM 5.1 win 8 1.00 1.00 · ·
Gemini 3.1 Pro Mistral Lg win 6 1.00 1.00 · ·
Gemini 3.1 Pro Cohere Cmd-A win 10 1.00 0.88 · ·
Gemini 3.1 Pro DeepSeek V4 Pro win 10 1.00 0.88 · ·
Grok 4.3 GLM 5.1 win 12 1.00 0.90 · ·
Grok 4.3 Mistral Lg win 6 1.00 1.00 · ·
Grok 4.3 Cohere Cmd-A win 16 1.00 0.93 · ·
Grok 4.3 DeepSeek V4 Pro win 8 1.00 1.00 · ·
Grok 4.3 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 GLM 5.1 win 6 1.00 1.00 · ·
GPT-5.5 Mistral Lg win 10 1.00 0.72 · 2×incomplete
GPT-5.5 Cohere Cmd-A win 10 1.00 0.88 · ·
GPT-5.5 DeepSeek V4 Pro win 6 1.00 0.80 · ·
GPT-5.5 Gemini 3.1 Pro win 6 1.00 1.00 · ·
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
Claude Opus 4.8 GLM 5.1 win 10 1.00 0.88 · ·
Claude Opus 4.8 Mistral Lg win 10 1.00 0.72 · 2×incomplete
Claude Opus 4.8 Cohere Cmd-A win 8 1.00 0.85 · ·
Claude Opus 4.8 DeepSeek V4 Pro win 8 1.00 0.82 · 1×incomplete
Claude Opus 4.8 Gemini 3.1 Pro win 6 1.00 1.00 · ·
Claude Opus 4.8 Grok 4.3 win 6 1.00 1.00 · ·
Claude Opus 4.8 GPT-5.5 win 6 1.00 1.00 · ·
GLM 5.1 Gemini 3.1 win 8 1.00 1.00 · ·
DeepSeek V4 Gemini 3.1 win 8 0.85 0.82 · 1×incomplete
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
S5c — Power Outage
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
Gemini 3.1 Pro GLM 5.1 unknown 9 0.00 0.84 · 2×incomplete
Gemini 3.1 Pro Mistral Lg unstable 48 1.00 0.72 · 17×incomplete 3×recovered
Gemini 3.1 Pro Cohere Cmd-A win 16 0.93 0.93 · ·
Gemini 3.1 Pro DeepSeek V4 Pro win 10 1.00 1.00 · ·
Grok 4.3 GLM 5.1 win 12 1.00 0.97 · scratch
Grok 4.3 Mistral Lg win 10 1.00 0.86 · 1×incomplete
Grok 4.3 Cohere Cmd-A win 16 0.93 0.93 · ·
Grok 4.3 DeepSeek V4 Pro win 12 1.00 1.00 · ·
Grok 4.3 Gemini 3.1 Pro win 10 1.00 1.00 · ·
GPT-5.5 GLM 5.1 win 14 1.00 1.00 · ·
GPT-5.5 Mistral Lg win 10 1.00 0.86 · 1×incomplete
GPT-5.5 Cohere Cmd-A win 16 0.85 0.93 · ·
GPT-5.5 DeepSeek V4 Pro win 10 1.00 1.00 · ·
GPT-5.5 Gemini 3.1 Pro win 10 1.00 1.00 · ·
GPT-5.5 Grok 4.3 win 10 1.00 1.00 · ·
Claude Opus 4.8 GLM 5.1 win 10 1.00 1.00 · ·
Claude Opus 4.8 Mistral Lg win 10 1.00 0.86 · 1×incomplete
Claude Opus 4.8 Cohere Cmd-A win 16 0.93 0.93 · ·
Claude Opus 4.8 DeepSeek V4 Pro win 10 1.00 0.86 · 1×incomplete
Claude Opus 4.8 Gemini 3.1 Pro win 10 1.00 1.00 · ·
Claude Opus 4.8 Grok 4.3 win 10 1.00 0.86 · 1×incomplete
Claude Opus 4.8 GPT-5.5 win 10 1.00 0.86 · 1×incomplete
GLM 5.1 Gemini 3.1 win 12 1.00 0.78 · 1×incomplete
DeepSeek V4 Gemini 3.1 win 8 1.00 0.82 · 1×incomplete
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
S5d — Dusty Logbook
Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour
GLM 5.1 Gemini 3.1 win 6 1.00 1.00 · ·
DeepSeek V4 Gemini 3.1 win 8 0.85 0.82 · 1×incomplete
GPT-5.5 Grok 4.3 win 6 1.00 1.00 · ·
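
The drill-down rows above are whitespace-flattened, so agent labels and behaviour cells run together. Below is a minimal sketch of how a row might be split back into fields and tallied by outcome. It assumes the column order given in the table headers and the three outcome labels seen in the rows (win, unstable, unknown). The names `ROW_RE`, `Row`, `parse_row` and `outcome_counts` are hypothetical helpers, not part of the engine that produced this report. The two agent labels are kept as one string, and the two behaviour cells as another, because the flattened text does not mark where one ends and the next begins.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed row layout, read off the table headers:
#   <Agent A> <Agent B> <outcome> <turns> <A score> <B score> <behaviour cells>
ROW_RE = re.compile(
    r"^(?P<agents>.+?)\s+(?P<outcome>win|unstable|unknown)\s+"
    r"(?P<turns>\d+)\s+(?P<a_score>\d\.\d{2})\s+(?P<b_score>\d\.\d{2})\s+"
    r"(?P<behaviour>.+)$"
)


class Row(NamedTuple):
    agents: str      # "Agent A" and "Agent B" labels, left unsplit
    outcome: str     # win / unstable / unknown
    turns: int
    a_score: float
    b_score: float
    behaviour: str   # both behaviour cells, left unsplit


def parse_row(line: str) -> Optional[Row]:
    """Return a Row for a data line, or None for headers and titles."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return Row(
        agents=m["agents"],
        outcome=m["outcome"],
        turns=int(m["turns"]),
        a_score=float(m["a_score"]),
        b_score=float(m["b_score"]),
        behaviour=m["behaviour"],
    )


def outcome_counts(lines: Iterable[str]) -> Counter:
    """Count outcomes over every line that parses as a data row."""
    rows = (parse_row(line) for line in lines)
    return Counter(r.outcome for r in rows if r is not None)


if __name__ == "__main__":
    # Three rows copied from the tables above, plus one header line.
    sample = [
        "Agent A Agent B Outcome Turns A Score B Score A Behaviour B Behaviour",
        "GLM 5.1 Gemini 3.1 win 6 1.00 1.00 · ·",
        "DeepSeek V4 Gemini 3.1 win 8 0.85 0.82 · 1×incomplete",
        "Gemini 3.1 Pro Mistral Lg unstable 48 1.00 0.72 · 17×incomplete 3×recovered",
    ]
    print(outcome_counts(sample))
```

Header and scenario-title lines do not match the pattern and are skipped, so a whole table can be passed in unchanged. Splitting the agent labels would need a list of known model names, which this sketch deliberately leaves out.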