---
license: mit
---


# Evaluating Text Creativity across Diverse Domains: A Dataset and a Large Language Model Evaluator


## 🔥 News
## 📍 Brief Intro

We introduce **CrEval**, the first LLM-based evaluator for pairwise creativity evaluation, which outperforms GPT-4o by 18.7% in human agreement, and **CreataSet**, a large-scale dataset of over **1M** creative instruction-response pairs spanning **87** domains. CrEval is a creativity evaluation model built on a pairwise comparison protocol, designed to advance automated evaluation of text creativity. CreataSet supports the meta-evaluation of pairwise comparison models for assessing text creativity, and it can also be used to train creative generation models. For more details, please refer to our [paper](https://arxiv.org/abs/2505.19236).

## 🤗 Quickstart

You can run the CrEval model with the inference methods provided by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). Please refer to our [GitHub repo](https://github.com/Aman-4-Real/CrEval) for more details. A minimal usage sketch follows.
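As a rough sketch (assuming the released checkpoint works with the Hugging Face `transformers` chat interface; the model path and prompt wording below are placeholders, not the official CrEval prompt template), pairwise creativity evaluation might look like this:

```python
# Minimal sketch of pairwise creativity comparison with a CrEval checkpoint.
# NOTE: the model path and prompt wording are illustrative assumptions;
# see the GitHub repo for the exact inference setup and prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/CrEval"  # placeholder: replace with the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

instruction = "Write a one-sentence slogan for a reusable water bottle."
response_a = "Sip sustainably, wander endlessly."
response_b = "Buy our water bottle, it is reusable."

# Hypothetical pairwise-comparison prompt: ask which response is more creative.
prompt = (
    f"Instruction: {instruction}\n"
    f"Response A: {response_a}\n"
    f"Response B: {response_b}\n"
    "Which response is more creative? Answer with A or B."
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=8)
verdict = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g., "A" or "B", depending on the model's judgment
```

The same pairwise prompt can be served through LLaMA-Factory's inference entry points instead of raw `transformers`; only the loading step changes, not the comparison protocol.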
> *We respect and uphold the usage terms of the original data providers. If you believe that any part of this dataset affects your legal rights or raises other concerns, please reach out to us. We will carefully review your request and respond without delay.*

Please cite our paper if you find our work useful.

```
@article{cao2025evaluating,
  title={Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator},
  author={Cao, Qian and Wang, Xiting and Yuan, Yuzhuo and Liu, Yahui and Luo, Fang and Song, Ruihua},
  journal={arXiv preprint arXiv:2505.19236},
  year={2025}
}
```

For any questions, please feel free to reach me at caoqian4real@ruc.edu.cn.