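from dataclasses import dataclass
from enum import Enum

import pandas as pd


# Metadata for one leaderboard column: its display name, its datatype
# (e.g. "markdown", "number", "str", "bool") and its visibility flags.
@dataclass
class ColumnContent:
    name: str
    type: str
    displayed_by_default: bool
    hidden: bool = False
    never_hidden: bool = False
    dummy: bool = False


def fields(raw_class):
    # Collect the ColumnContent attributes declared on a column class,
    # skipping dunder entries such as __module__ and __doc__.
    return [v for k, v in raw_class.__dict__.items() if k[:2] != "__" and k[-2:] != "__"]


@dataclass(frozen=True)
class AutoEvalColumn:  # columns of the main (automatic evaluation) table
    model_type_symbol = ColumnContent("T", "str", True, never_hidden=True)
    model = ColumnContent("Model", "markdown", True, never_hidden=True)
    average = ColumnContent("Average ⬆️", "number", True)
    arc = ColumnContent("ARC", "number", True)
    hellaswag = ColumnContent("HellaSwag", "number", True)
    mmlu = ColumnContent("MMLU", "number", True)
    truthfulqa = ColumnContent("TruthfulQA", "number", True)
    winogrande = ColumnContent("Winogrande", "number", True)
    gsm8k = ColumnContent("GSM8K", "number", True)
    drop = ColumnContent("DROP", "number", True)
    model_type = ColumnContent("Type", "str", False)
    architecture = ColumnContent("Architecture", "str", False)
    weight_type = ColumnContent("Weight type", "str", False, True)
    precision = ColumnContent("Precision", "str", False)
    license = ColumnContent("Hub License", "str", False)
    params = ColumnContent("#Params (B)", "number", False)
    likes = ColumnContent("Hub ❤️", "number", False)
    still_on_hub = ColumnContent("Available on the hub", "bool", False)
    revision = ColumnContent("Model sha", "str", False, False)
    dummy = ColumnContent("model_name_for_query", "str", False, dummy=True)


@dataclass(frozen=True)
class EvalQueueColumn:  # columns of the evaluation-queue tables
    model = ColumnContent("model", "markdown", True)
    revision = ColumnContent("revision", "str", True)
    private = ColumnContent("private", "bool", True)
    precision = ColumnContent("precision", "str", True)
    # "Original" is passed positionally, so it is stored in displayed_by_default (a truthy value).
    weight_type = ColumnContent("weight_type", "str", "Original")
    status = ColumnContent("status", "str", True)


# Reference rows displayed alongside the evaluated models.
baseline_row = {
    AutoEvalColumn.model.name: "<p>Baseline</p>",
    AutoEvalColumn.revision.name: "N/A",
    AutoEvalColumn.precision.name: None,
    AutoEvalColumn.average.name: 31.0,
    AutoEvalColumn.arc.name: 25.0,
    AutoEvalColumn.hellaswag.name: 25.0,
    AutoEvalColumn.mmlu.name: 25.0,
    AutoEvalColumn.truthfulqa.name: 25.0,
    AutoEvalColumn.winogrande.name: 50.0,
    AutoEvalColumn.gsm8k.name: 0.21,
    AutoEvalColumn.drop.name: 0.47,
    AutoEvalColumn.dummy.name: "baseline",
    AutoEvalColumn.model_type.name: "",
}

human_baseline_row = {
    AutoEvalColumn.model.name: "<p>Human performance</p>",
    AutoEvalColumn.revision.name: "N/A",
    AutoEvalColumn.precision.name: None,
    AutoEvalColumn.average.name: 92.75,
    AutoEvalColumn.arc.name: 80.0,
    AutoEvalColumn.hellaswag.name: 95.0,
    AutoEvalColumn.mmlu.name: 89.8,
    AutoEvalColumn.truthfulqa.name: 94.0,
    AutoEvalColumn.winogrande.name: 94.0,
    AutoEvalColumn.gsm8k.name: 100,
    AutoEvalColumn.drop.name: 96.42,
    AutoEvalColumn.dummy.name: "human_baseline",
    AutoEvalColumn.model_type.name: "",
}


@dataclass
class ModelTypeDetails:
    name: str
    symbol: str  # emoji shown in the "T" column


class ModelType(Enum):
    PT = ModelTypeDetails(name="pretrained", symbol="🟢")
    FT = ModelTypeDetails(name="fine-tuned", symbol="🔶")
    IFT = ModelTypeDetails(name="instruction-tuned", symbol="⭕")
    RL = ModelTypeDetails(name="RL-tuned", symbol="🟦")
    Unknown = ModelTypeDetails(name="", symbol="?")

    def to_str(self, separator=" "):
        return f"{self.value.symbol}{separator}{self.value.name}"

    @staticmethod
    def from_str(type):
        # Accept either the type name or its emoji symbol.
        if "fine-tuned" in type or "🔶" in type:
            return ModelType.FT
        if "pretrained" in type or "🟢" in type:
            return ModelType.PT
        if "RL-tuned" in type or "🟦" in type:
            return ModelType.RL
        if "instruction-tuned" in type or "⭕" in type:
            return ModelType.IFT
        return ModelType.Unknown


@dataclass
class Task:
    benchmark: str
    metric: str
    col_name: str


# Benchmark name, metric read from the results, and the leaderboard column it fills.
class Tasks(Enum):
    arc = Task("arc:challenge", "acc_norm", AutoEvalColumn.arc.name)
    hellaswag = Task("hellaswag", "acc_norm", AutoEvalColumn.hellaswag.name)
    mmlu = Task("hendrycksTest", "acc", AutoEvalColumn.mmlu.name)
    truthfulqa = Task("truthfulqa:mc", "mc2", AutoEvalColumn.truthfulqa.name)
    winogrande = Task("winogrande", "acc", AutoEvalColumn.winogrande.name)
    gsm8k = Task("gsm8k", "acc", AutoEvalColumn.gsm8k.name)
    drop = Task("drop", "f1", AutoEvalColumn.drop.name)


# Column names and types
COLS = [c.name for c in fields(AutoEvalColumn) if not c.hidden]
TYPES = [c.type for c in fields(AutoEvalColumn) if not c.hidden]
COLS_LITE = [c.name for c in fields(AutoEvalColumn) if c.displayed_by_default and not c.hidden]
TYPES_LITE = [c.type for c in fields(AutoEvalColumn) if c.displayed_by_default and not c.hidden]

EVAL_COLS = [c.name for c in fields(EvalQueueColumn)]
EVAL_TYPES = [c.type for c in fields(EvalQueueColumn)]

BENCHMARK_COLS = [t.value.col_name for t in Tasks]

# Size buckets over the "#Params (B)" column; each interval is closed on the right.
NUMERIC_INTERVALS = {
    "?": pd.Interval(-1, 0, closed="right"),
    "~1.5": pd.Interval(0, 2, closed="right"),
    "~3": pd.Interval(2, 4, closed="right"),
    "~7": pd.Interval(4, 9, closed="right"),
    "~13": pd.Interval(9, 20, closed="right"),
    "~35": pd.Interval(20, 45, closed="right"),
    "~60": pd.Interval(45, 70, closed="right"),
    "70+": pd.Interval(70, 10000, closed="right"),
}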