vincentkoc's Collections

Evaluation for Generative AI

updated May 20

Papers and resources on the evaluation of large language models and generative AI.

  • Humanity's Last Exam

    Paper • 2501.14249 • Published Jan 24 • 75

  • RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

    Paper • 2501.14492 • Published Jan 24 • 34

  • vincentkoc/tiny_qa_benchmark

    Viewer • Updated May 20 • 52 • 43 • 1

  • vincentkoc/tiny_qa_benchmark_pp

    Viewer • Updated May 20 • 662 • 356 • 1

  • Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation

    Paper • 2505.12058 • Published May 17 • 6

  • tinyBenchmarks: evaluating LLMs with fewer examples

    Paper • 2402.14992 • Published Feb 22, 2024 • 16