o g @s<ddlZddlZddlZddlmZeddZddeDZWdn1s)wYeddZddeDZ Wdn1sEwYee d Z dZ d de DdZ d Zd d ZddZddZeZe9ejdd eeWdn1swYejddWdn1swYWdn1swYe\ejdd!ejdde Ddde DddddZWdn1swYejddejdde e e ddZWdn 1s wYWdn 1swYeejddejdde e e ddZWdn 1sCwYejdd!ejd d!d"iee e e de e e d#d$ZWdn 1sswYeRejddejd%de e e d&dZWdn 1swYejddejd'de e e d(dZWdn 1swYWdn 1swYWdn 1swYe d)Z!e!j"eegeeeeegd*ej#eegeeeeegd*Wdn 1swYe$dS)+N)SequenceMatcherzqwen_gsm8k_output.jsonlrcCg|]}t|qSjsonloads.0linerrr zphi4_gsm8k_output.jsonlcCrrrr rrr r r)zQwen/Qwen2.5-14Bzmicrosoft/phi-4cCg|]}|qSrrr model_namerrr r a This Space is inspired by [Luis Hunt's](https://www.linkedin.com/posts/louiswhunt_see-below-for-6882-pages-of-mmlu-and-gsm8k-activity-7281011488692047872-fWCE?utm_source=share&utm_medium=member_desktop) post. He highlights how current top performing models from major vendors are contaminated with benchmark data that is supposed to be used to assess their performance. This space aims to partially reproduce this work. I chose to look at the contamination of **Qwen/Qwen2.5-14B** and **microsoft/phi-4** by **GSM8K** dataset. For **Qwen/Qwen2.5-14B** I found **729** GSM8K examples that had a least a 0.9 text similarity ratio between generated and original. For **microsoft/phi-4** I found **172** GSM8K examples that had a least a 0.9 text similarity ratio between generated and original. cCstd||}d}g}|D]%\}}}||kr"||||df|||||df||}q||t|dkrK||||ddf|dd}|S)Nr)rget_matching_blocksappendlen)originaloutputmatcherlefthighlighted_sequence_jnrrr find_similar_chunks#s   r cCs>tt|}t|d|d}|d|d||d|dgSNrrpromptsimilarity_ratioseed)randomchoice models_datar )selected_model new_examplehighlighted_outputrrr next_example1sr+cCs<t|t}t|d|d}|d|d||d|dgSr!)r'starting_indexr )r(exampler*rrr change_model?s r.r)scalecCrrrrrrr r VrcCrrrrrrr r WrTModel)value interactivelabelPromptFr")r3r2r1OriginalrOutput1yellowr)r3 color_mapr1zSimilarity ratior#Seedr$zAnoter example)fninputsoutputs)%rr%gradiogrdifflibropenfile qwen_dict phi4_dictr'r,keysstarting_modeldescription_textr r+r.BlocksdemoRowColumnMarkdownDropdownr(Textboxr"rHighlightedTextr similarityr$Buttonnext_btnclickchangelaunchrrrr s              ;