
[EN] Guide for Input .jsonl Files

If you have five models to compare, upload five .jsonl files.

  • 💥All .jsonl files must have the same number of rows.
  • 💥The model_id field must be different for each file and unique within each file.
  • 💥Across files, the generated and model_id values must differ, while instruction and task must be identical.
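The constraints above can be checked with a short script before uploading. A minimal sketch (the file names passed to validate are placeholders for your own files):

```python
import json

def load_jsonl(path):
    """Read one .jsonl file into a list of dicts, one per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def validate(paths):
    """Check the cross-file constraints on input .jsonl files."""
    files = [load_jsonl(p) for p in paths]
    # All files must have the same number of rows.
    assert len({len(rows) for rows in files}) == 1, "row counts differ"
    seen_ids = set()
    for rows in files:
        ids = {row["model_id"] for row in rows}
        # One model_id per file...
        assert len(ids) == 1, "multiple model_id values in one file"
        # ...and it must differ from every other file's model_id.
        assert ids.isdisjoint(seen_ids), "model_id reused across files"
        seen_ids |= ids
    # instruction and task must match row-by-row across files.
    for rows in files[1:]:
        for a, b in zip(files[0], rows):
            assert a["instruction"] == b["instruction"], "instruction mismatch"
            assert a["task"] == b["task"], "task mismatch"

# validate(["model1.jsonl", "model2.jsonl"])  # placeholder file names
```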

Required .jsonl Fields

  • Reserved Fields (Mandatory)
    • model_id: The name of the model being evaluated (a short name is recommended).
    • instruction: The instruction given to the model. This corresponds to the test-set prompt (not the evaluation prompt).
    • generated: The response the model produced for the test-set instruction.
    • task: Groups rows so that overall results can be displayed per subset. Also useful when you want to use different evaluation prompts per row.
  • Additional
    • Depending on the evaluation prompt you use, you may add extra fields. Add them freely to your .jsonl files, avoiding the reserved keywords above.
      • Example: the translation_pair.yaml and translation_fortunecookie.yaml prompts read the source_lang and target_lang fields from the .jsonl.
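Putting the fields together, one row of a model's .jsonl can be produced like this (the model name and response values are placeholders):

```python
import json

# Required fields plus the extra fields read by the translation prompts.
row = {
    "model_id": "my-model",             # placeholder model name
    "task": "en-ko",                    # grouping key for result subsets
    "instruction": "Translate: hello",  # test-set prompt, not the judge prompt
    "generated": "안녕하세요",            # the model's own response
    "source_lang": "English",           # extra field used by translation_pair.yaml
    "target_lang": "Korean",
}
print(json.dumps(row, ensure_ascii=False))  # one line of the .jsonl file
```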

For example, when evaluating with the translation_pair prompt, each .jsonl file looks like this:

# model1.jsonl
{"model_id": "모델1", "task": "영한", "instruction": "어디로 가야하오", "generated": "Where should I go", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델1", "task": "한영", "instruction": "1+1?", "generated": "1+1?", "source_lang": "English", "target_lang": "Korean"}

# model2.jsonl - same `instruction` as model1.jsonl; `generated` and `model_id` differ!
{"model_id": "모델2", "task": "영한", "instruction": "어디로 가야하오", "generated": "글쎄다", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델2", "task": "한영", "instruction": "1+1?", "generated": "2", "source_lang": "English", "target_lang": "Korean"}
...

On the other hand, when evaluating with the llmbar prompt, fields like source_lang and target_lang are not used, so naturally you don't need to add them to your .jsonl.