
[EN] Guide for Input .jsonl Files

If you have five models to compare, upload five .jsonl files.

  • 💥All .jsonl files must have the same number of rows.
  • 💥The model_id field must be different for each file and unique within each file.
  • 💥Across files, the generated and model_id values must differ, while instruction and task must be identical.
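The constraints above can be checked with a short script before uploading. A minimal sketch (the file names passed to validate are placeholders for your own files):

```python
import json

def load_jsonl(path):
    """Read one .jsonl file into a list of dicts, one per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def validate(paths):
    """Check the cross-file constraints on input .jsonl files."""
    files = [load_jsonl(p) for p in paths]
    # All files must have the same number of rows.
    assert len({len(rows) for rows in files}) == 1, "row counts differ"
    seen_ids = set()
    for rows in files:
        ids = {row["model_id"] for row in rows}
        # One model_id per file...
        assert len(ids) == 1, "multiple model_id values in one file"
        # ...and it must differ from every other file's model_id.
        assert ids.isdisjoint(seen_ids), "model_id reused across files"
        seen_ids |= ids
    # instruction and task must match row-by-row across files.
    for rows in files[1:]:
        for a, b in zip(files[0], rows):
            assert a["instruction"] == b["instruction"], "instruction mismatch"
            assert a["task"] == b["task"], "task mismatch"

# validate(["model1.jsonl", "model2.jsonl"])  # placeholder file names
```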

Required .jsonl Fields

  • Reserved Fields (Mandatory)
    • model_id: The name of the model being evaluated (a short name is recommended).
    • instruction: The instruction given to the model. This corresponds to the test-set prompt (not the evaluation prompt).
    • generated: The response the model produced for the test-set instruction.
    • task: Groups rows so that overall results can be displayed per subset. Also useful when you want to use different evaluation prompts per row.
  • Additional
    • Depending on the evaluation prompt you use, you may add extra fields. Add them freely to your .jsonl files, avoiding the reserved keywords above.
      • Example: the translation_pair.yaml and translation_fortunecookie.yaml prompts read the source_lang and target_lang fields from the .jsonl.
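Putting the fields together, one row of a model's .jsonl can be produced like this (the model name and response values are placeholders):

```python
import json

# Required fields plus the extra fields read by the translation prompts.
row = {
    "model_id": "my-model",             # placeholder model name
    "task": "en-ko",                    # grouping key for result subsets
    "instruction": "Translate: hello",  # test-set prompt, not the judge prompt
    "generated": "안녕하세요",            # the model's own response
    "source_lang": "English",           # extra field used by translation_pair.yaml
    "target_lang": "Korean",
}
print(json.dumps(row, ensure_ascii=False))  # one line of the .jsonl file
```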

For example, when evaluating with the translation_pair prompt, each .jsonl file looks like this:

# model1.jsonl
{"model_id": "모델1", "task": "영한", "instruction": "어디로 가야하오", "generated": "Where should I go", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델1", "task": "한영", "instruction": "1+1?", "generated": "1+1?", "source_lang": "English", "target_lang": "Korean"}

# model2.jsonl - same `instruction` as model1.jsonl; `generated` and `model_id` differ!
{"model_id": "모델2", "task": "영한", "instruction": "어디로 가야하오", "generated": "글쎄다", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델2", "task": "한영", "instruction": "1+1?", "generated": "2", "source_lang": "English", "target_lang": "Korean"}
...

On the other hand, when evaluating with the llmbar prompt, fields like source_lang and target_lang are not used, so naturally you don't need to add them to your .jsonl.