---
license: mit
---


# Evaluating Text Creativity across Diverse Domains: A Dataset and a Large Language Model Evaluator


## 🔥 News
## 📍 Brief Intro

We introduce **CrEval**, the first LLM-based evaluator for pairwise creativity evaluation, which outperforms GPT-4o by 18.7% in human agreement, and **CreataSet**, a large-scale dataset of over **1M** creative instruction-response pairs spanning **87** domains. CrEval is a creativity evaluation model built on a pairwise comparison protocol, designed to advance automated evaluation of text creativity. CreataSet supports the meta-evaluation of pairwise comparison models for assessing text creativity, and it can also be used to train creative generation models. For more details, please refer to our [paper](https://arxiv.org/abs/2505.19236).

## 🤗 Quickstart

You can run the CrEval model with the inference methods provided by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). Please refer to our [GitHub repo](https://github.com/Aman-4-Real/CrEval) for more details. A minimal usage sketch follows.
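As a rough sketch (assuming the released checkpoint works with the Hugging Face `transformers` chat interface; the model path and prompt wording below are placeholders, not the official CrEval prompt template), pairwise creativity evaluation might look like this:

```python
# Minimal sketch of pairwise creativity comparison with a CrEval checkpoint.
# NOTE: the model path and prompt wording are illustrative assumptions;
# see the GitHub repo for the exact inference setup and prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/CrEval"  # placeholder: replace with the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

instruction = "Write a one-sentence slogan for a reusable water bottle."
response_a = "Sip sustainably, wander endlessly."
response_b = "Buy our water bottle, it is reusable."

# Hypothetical pairwise-comparison prompt: ask which response is more creative.
prompt = (
    f"Instruction: {instruction}\n"
    f"Response A: {response_a}\n"
    f"Response B: {response_b}\n"
    "Which response is more creative? Answer with A or B."
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=8)
verdict = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g., "A" or "B", depending on the model's judgment
```

The same pairwise prompt can be served through LLaMA-Factory's inference entry points instead of raw `transformers`; only the loading step changes, not the comparison protocol.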
> *We respect and uphold the usage terms of the original data providers. If you believe that any part of this dataset affects your legal rights or raises other concerns, please reach out to us. We will carefully review your request and respond without delay.*

Please cite our paper if you find our work useful.

```
@article{cao2025evaluating,
  title={Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator},
  author={Cao, Qian and Wang, Xiting and Yuan, Yuzhuo and Liu, Yahui and Luo, Fang and Song, Ruihua},
  journal={arXiv preprint arXiv:2505.19236},
  year={2025}
}
```

For any questions, please feel free to reach me at caoqian4real@ruc.edu.cn.