PKarchive/data.pklFBZZZZZZZZZZZZZZ€}q(X expert_nameqXhigh_school_psychology_v2qXexpert_task_nameqXhigh_school_psychologyqX parent_nodeqNX expert_configq}q(Xmodify_modulesqX.*q X modify_layersq X)c_fc|c_proj|k_proj|v_proj|q_proj|out_projq X lora_rankq KX lora_alphaq K@X lora_dropoutqG?©™™™™™šXlora_init_b_randomq‰X__model_modifier__qXloraquXtraining_configq}q(X_updated_kwargsq}q(X filenamesqX{configs/wiki-mmlu/gpt2neo_1B_experts.json+configs/wiki-mmlu/gpt2neo_1B_dense.json+configs/wiki-mmlu/evol/train_experts.jsonqX cache_dirqX./cacheqXdatasetqXostapeno/adauni-v3-10k-flatqXcustom_tasks_splitsqNXdata_dirqX~/data/qX output_dirqX‰/mnt/output/projects/knowledge_modules/amlt-results/HD_2291960f-c184-4349-8206-6bd28ba08dbf_2/fine_tune_router_high_school_psychology_ai0q Xfinetune_task_nameq!Xhigh_school_psychologyq"Xexample_to_ids_pathq#NXembeddings_pathq$NXuse_task_descriptionsq%‰Xmax_num_instances_per_taskq&KdXnum_pos_examplesq'KX task_prefixq(NXexp_nameq)NX wandb_projectq*Xadauno_experts_gpt2neo_1Bq+X padding_sideq,Xrightq-Xtruncation_sideq.Xleftq/Xmax_input_lengthq0MXmax_output_lengthq1K@X num_beamsq2KXappend_another_bosq3‰X do_lowercaseq4‰X freeze_embedsq5‰Xuse_t0_templates_as_tasksq6‰Xuse_t0_few_shot_training_setq7‰Xinclude_template_typeq8Xzs_nooptq9Xinclude_task_sourceq:X P3,Flan2021q;Xremove_phi_eval_tasksq<‰Xcompute_strategyq=NX schedulerq>Xlinear_decay_with_warmupq?X checkpointq@NXcheckpoint_stepqANXbackbone_checkpointqBNXtrain_batch_sizeqCKXpredict_batch_sizeqDK X learning_rateqEG?3©*0U2aXwarmup_proportionqFG?®¸Që…¸Xtrainable_param_namesqGX .*lora_[ab].*qHXnon_trainable_param_namesqINX weight_decayqJGX adam_epsilonqKG>EyŽâ0Œ:X max_grad_normqLG?¹™™™™™šXgradient_accumulation_stepsqMKX optimizerqNXadamwqOXadafactor_scale_parameterqPˆXadafactor_warmup_initqQ‰Xadafactor_relative_stepqR‰Xnum_train_epochsqSKX warmup_stepsqTKX total_stepsqUM§Xnum_tasks_per_batchqVNX save_everyqWNX eval_everyqXK(XdebugqY‰XseedqZK*X subsample_devq[NXni_online_evalq\‰Xt0_online_evalq]‰Xearly_stop_on_zero_shotq^‰X ortho_lossq_GX task_lossq`GXl1_lossqaGXmi_lossqbGXmc_lossqcGX length_normqdGX unlikely_lossqeGXpoly_unlikely_lossqfGX finetune_typeqgNXfinetune_skip_esqh‰Xfinetune_use_last_checkpointqi‰XmodelqjXEleutherAI/gpt-neo-1.3BqkX model_familyqlXgptqmX precisionqnXbf16qoXmonitor_grad_alignment_onqpNXmodel_modifierqqXloraqrX adapter_typeqsNh KhG?©™™™™™šXlora_init_scaleqtG?„záG®{h K@X lora_warmupqu‰h‰Xn_skillsqvKXn_tasksqwNhh h h Xrouter_granularityqxX finegrainedqyXrouter_selectorqzXpoly_router_dirq{Xrouter_weight_decayq|NXrouter_learning_rateq}NXn_splitsq~KXrouter_selector_cluster_tempqG?ðXpoly_average_correctionq€‰Xpoly_use_shared_skillq‰Xmodule_logits_relaxed_bernoulliq‚ˆXmodule_logits_straight_throughqƒ‰Xmodule_logits_learning_rateq„G?¹™™™™™šXadapters_learning_rateq…NXadapters_weight_decayq†NXmodule_logits_dropoutq‡GXmodule_logits_l2_normqˆ‰X augment_mmluq‰‰Xsoft_prompt_lengthqŠK Xpatch_last_k_layersq‹JÿÿÿÿXsoft_prompt_mlp_dimqŒNXsoft_prompt_hidden_dimqNXsoft_prompt_learn_kvqމXprompt_placementqXprefixqXadd_routing_tokenq‘‰X load_in_8bitq’‰X tensorboardq“‰X hf_token_hubq”NX hf_lib_idq•NX hf_repo_idq–NXbaselineq—‰Xsparsityq˜GXsubsample_library_expertsq™KX ranker_top_kqšKX ranker_pathq›NX ranker_modelqœNhNXroutingqXsubjectqžXmmlu_test_splitqŸXtestq X load_moduleq¡NX module_graphq¢NXmicro_batch_sizeq£KXvalidation_portionq¤G?ž¸Që…¸Xsource_templateq¥NXaugment_few_shotq¦KXmoe_num_expertsq§KX moe_emb_dimq¨K€Xexpand_val_set_w_downstreamq©‰Xeval_mmlu_callbacks_everyqªKXeval_test_set_callback_everyq«KXeval_rougeL_callback_everyq¬KXtest_sets_callbacksq­]q®Xuse_custom_valid_callbackq¯‰Xmmlu_use_hard_promptq°NXeval_mmlu_few_shotq±ˆXeval_mmlu_flagq²‰X eval_metricq³XrougeLq´Xuse_vllmqµ‰Xreset_lrq¶‰X reset_optimq·‰Xpipeline_eval_tasksq¸Xpiqa,arc-easy,arc-challengeq¹Xsubsample_train_setqºG?àXactionq»Xrouteq¼Xinit_router_bestq½‰Xregularizer_factorq¾GXn_ng_iterationsq¿KXn_active_iterationsqÀKXnew_module_actionqÁXaddqÂX modules_dirqÃXamlt/qÄXfinetune_new_expertqʼnXexperiment_state_pathqÆNXevol_expert_routingqÇX fine_tuneqÈXskqÉJÿÿÿÿX retrieve_withqÊXnoneqËXdefault_new_task_retrieve_withqÌXrandomqÍXupload_lib_to_hubqΈX to_repo_idqÏX>ostapeno/indepexp_adauniNeo1B_high_school_psychology_sub05_3epqÐXreplace_to_repo_if_existsqшXtasks_to_carry_overqÒNXevolution_warmup_stepsqÓKXevol_n_eval_timesqÔK Xsubsample_ng_train_setqÕNXsimulate_normal_trainingqÖˆXforce_library_updateq׈Xhf_repo_id_baseqØNXhp_target_decorationqÙXqÚX hf_repo_queryqÛNXuse_only_modules_for_tasksq܉Xremove_tasks_from_libqÝ]qÞXretrieve_only_onceq߉Xmixed_evolver_scheduleqàX'fine_tune:none:2+sgd_full_ft:lora_sim:8qáXretriever_include_parentqâˆXlogging_prefixqãX!act_it_0/t_high_school_psychologyqäuhhhhhhhNhhhh h!h"h#Nh$Nh%‰h&Kdh'Kh(Nh)Nh*h+h,h-h.h/h0Mh1K@h2Kh3‰h4‰h5‰h6‰h7‰h8h9h:h;h<‰h=Nh>h?h@NhANhBNhCKhDK hEG?3©*0U2ahFG?®¸Që…¸hGhHhINhJGhKG>EyŽâ0Œ:hLG?¹™™™™™šhMKhNhOhPˆhQ‰hR‰hSKhTKhUM§hVNhWNhXK(hY‰hZK*h[Nh\‰h]‰h^‰h_Gh`GhaGhbGhcGhdGheGhfGhgNhh‰hi‰hjhkhlhmhnhohpNhqhrhsNh KhG?©™™™™™šhtG?„záG®{h K@hu‰h‰hvKhwNhh h h hxhyhzh{h|Nh}Nh~KhG?ðh€‰h‰h‚ˆhƒ‰h„G?¹™™™™™šh…Nh†Nh‡Ghˆ‰h‰‰hŠK h‹JÿÿÿÿhŒNhNhމhhh‘‰h’‰h“‰h”Nh•Nh–Nh—‰h˜Gh™KhšKh›NhœNhNhhžhŸh h¡Nh¢Nh£Kh¤G?ž¸Që…¸h¥Nh¦Kh§Kh¨K€h©‰hªKh«Kh¬Kh­h®h¯‰h°Nh±ˆh²‰h³h´hµ‰h¶‰h·‰h¸h¹hºG?àh»h¼h½‰h¾Gh¿KhÀKhÁhÂhÃhÄhʼnhÆNhÇhÈhÉJÿÿÿÿhÊhËhÌhÍhΈhÏhÐhшhÒNhÓKhÔK hÕNhÖˆh׈hØNhÙhÚhÛNh܉hÝhÞh߉hàháhâˆhãhäuXexpert_deletedqå‰u.PK2õQQPK0archive/byteorderFB,ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZlittlePK…=ãPK=archive/versionFB9ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ3 PKÑžgUPK2archive/.data/serialization_idFB.ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ0636457737946401051300000027857939901747PKÝq1I((PK2õQQarchive/data.pklPK…=ã¡archive/byteorderPKÑžgUarchive/versionPKÝq1I((’archive/.data/serialization_idPK,-8PK>PK8