MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19 • 38
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions Paper • 2502.13791 • Published Feb 19 • 5
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 10
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 15
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 12