U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs Paper • 2412.03205 • Published Dec 4, 2024
Beemo: Benchmark of Expert-edited Machine-generated Outputs Paper • 2411.04032 • Published Nov 6, 2024
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection Paper • 2408.04284 • Published Aug 8, 2024
NEREL: A Russian Dataset with Nested Named Entities, Relations and Events Paper • 2108.13112 • Published Aug 30, 2021
Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models Paper • 2202.07791 • Published Feb 15, 2022
Acceptability Judgements via Examining the Topology of Attention Maps Paper • 2205.09630 • Published May 19, 2022
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory Paper • 2210.05769 • Published Oct 11, 2022
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian Paper • 2206.01583 • Published Jun 3, 2022
Artificial Text Detection via Examining the Topology of Attention Maps Paper • 2109.04825 • Published Sep 10, 2021
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark Paper • 2010.15925 • Published Oct 29, 2020
Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations Paper • 2109.14017 • Published Sep 28, 2021