WeijianQi1999 commited on
Commit
0a1e9b6
·
1 Parent(s): d57548f
auto_o4-mini_Mind2Web-Online - Leaderboard_data.csv CHANGED
@@ -5,6 +5,7 @@ Browser Use,gpt-4o-2024-08-06,Browser Use,[OSU NLP](https://arxiv.org/abs/2504.0
5
  Claude Computer Use 3.5,Claude-3-5-sonnet-20241022,Anthropic,[OSU NLP](https://arxiv.org/abs/2504.01382),51.8,16.1,8.1,24,2025-5-11,True,,2024-10
6
  Agent-E,gpt-4o-2024-08-06,Emergence AI,[OSU NLP](https://arxiv.org/abs/2504.01382),51.8,23.1,6.8,27,2025-5-11,True,,2024-07
7
  Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,[OSU NLP](https://arxiv.org/abs/2504.01382),75.9,41.3,27,47.3,2025-5-11,True,,2025-02
 
8
  Eko-V2,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),95.0,76.0,70.0,78.0,2025-5-24,False,Unknown evaluation method,2025-05
9
  Eko-V1,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),-,-,-,31.0,2025-5-24,False,Unknown evaluation method,2025-05
10
  Seed1.5-VL,Seed1.5-VL,ByteDance,[ByteDance](https://arxiv.org/pdf/2505.07062),-,-,-,76.4,2025-5-11,False,Evaluated by WebJudge(GPT-4o),2025-05
 
5
  Claude Computer Use 3.5,Claude-3-5-sonnet-20241022,Anthropic,[OSU NLP](https://arxiv.org/abs/2504.01382),51.8,16.1,8.1,24,2025-5-11,True,,2024-10
6
  Agent-E,gpt-4o-2024-08-06,Emergence AI,[OSU NLP](https://arxiv.org/abs/2504.01382),51.8,23.1,6.8,27,2025-5-11,True,,2024-07
7
  Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,[OSU NLP](https://arxiv.org/abs/2504.01382),75.9,41.3,27,47.3,2025-5-11,True,,2025-02
8
+ ACT-1,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,[Enhans](https://www.enhans.ai/),53.7,39.2,24.3,39.5,2025-7-16,True,,2025-07
9
  Eko-V2,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),95.0,76.0,70.0,78.0,2025-5-24,False,Unknown evaluation method,2025-05
10
  Eko-V1,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),-,-,-,31.0,2025-5-24,False,Unknown evaluation method,2025-05
11
  Seed1.5-VL,Seed1.5-VL,ByteDance,[ByteDance](https://arxiv.org/pdf/2505.07062),-,-,-,76.4,2025-5-11,False,Evaluated by WebJudge(GPT-4o),2025-05
human_Mind2Web-Online - Leaderboard_data.csv CHANGED
@@ -4,4 +4,5 @@ SeeAct,gpt-4o-2024-08-06,OSU,OSU NLP,60.2,25.2,8.1,30.7,2025-3-22
4
  Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,55.4,26.6,8.1,30.0,2025-3-22
5
  Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,56.6,20.3,14.9,29.0,2025-3-22
6
  Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
7
- Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,90.4,49.0,32.4,56.3,2025-4-20
 
 
4
  Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,55.4,26.6,8.1,30.0,2025-3-22
5
  Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,56.6,20.3,14.9,29.0,2025-3-22
6
  Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
7
+ Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,90.4,49.0,32.4,56.3,2025-4-20
8
+ ACT-1,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,Enhans,65.1,46.2,23.0,45.7,2025-7-16