Benchmarks - a hppdqdq Collection

Benchmarks

updated Jan 13

Running on CPU Upgrade

216

MMLU-Pro Leaderboard

🥇

More advanced and challenging multi-task evaluation
Running

49

Stick To Your Role! Leaderboard

🎭

Benchmarking LLMs on the stability of simulated populations
Running

52

ZeroEval Leaderboard

📊

Embed and use ZeroEval for evaluation tasks
Running

26

Decentralized Arena Leaderboard

🥇

Display model leaderboard evaluations
Runtime error

423

Open Medical-LLM Leaderboard

🥇

Browse and submit LLM evaluations
Running

228

GPU Poor LLM Arena

🏆

Compact LLM Battle Arena: Frugal AI Face-Off!
Running

116

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark
Running on CPU Upgrade

13.4k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

409

TTS Spaces Arena

🤗

Blind vote on HF TTS models!