PKarchive/data.pklFBZZZZZZZZZZZZZZ€}q(X expert_nameqXc5o25_5eqXexpert_task_nameqXJgem_dart_1_1_0,gem_common_gen_1_1_0,gem_web_nlg_en_1_1_0,gem_e2e_nlg_1_1_0qX parent_nodeqNX expert_configq}q(Xmodify_modulesqX.*q X modify_layersq Xq_proj|k_proj|v_proj|o_projq X lora_rankq KX lora_alphaq G?ðX lora_dropoutqG?©™™™™™šXlora_init_b_randomq‰X__model_modifier__qXloraquXtraining_configq}q(X_updated_kwargsq}q(X filenamesqX"configs/wiki-mmlu/stablm_flan.jsonqX cache_dirqX./cacheqXdatasetqXsordonia/flan-10k-flatqXcustom_tasks_splitsqNXdata_dirqX~/data/qX output_dirqX_/mnt/output/projects/knowledge_modules/amlt-results/HD_659152ab-50d5-46b3-a8bf-5e5200a145dc_20/q Xfinetune_task_nameq!hXexample_to_ids_pathq"NXembeddings_pathq#NXuse_task_descriptionsq$‰Xmax_num_instances_per_taskq%KdXnum_pos_examplesq&KX task_prefixq'NXexp_nameq(X6stablelm_jointc_lora_embed_randclustersc5o25_5e_3epochq)X wandb_projectq*Xadauno_experts_stablelmq+X padding_sideq,Xrightq-Xtruncation_sideq.Xleftq/Xmax_input_lengthq0MXmax_output_lengthq1K@X num_beamsq2KXappend_another_bosq3‰X do_lowercaseq4‰X freeze_embedsq5‰Xuse_t0_templates_as_tasksq6‰Xuse_t0_few_shot_training_setq7‰Xinclude_template_typeq8Xzs_nooptq9Xinclude_task_sourceq:XP3,Flan2021,CoTq;Xremove_phi_eval_tasksq<‰Xcompute_strategyq=NX schedulerq>Xlinear_decay_with_warmupq?X checkpointq@NXcheckpoint_stepqANXbackbone_checkpointqBNXtrain_batch_sizeqCKXpredict_batch_sizeqDK X learning_rateqEG?3©*0U2aXwarmup_proportionqFG?®¸Që…¸Xtrainable_param_namesqGX .*lora_[ab].*qHXnon_trainable_param_namesqINX weight_decayqJGX adam_epsilonqKG>EyŽâ0Œ:X max_grad_normqLG?¹™™™™™šXgradient_accumulation_stepsqMKX optimizerqNXadamwqOXadafactor_scale_parameterqPˆXadafactor_warmup_initqQ‰Xadafactor_relative_stepqR‰Xnum_train_epochsqSKX warmup_stepsqTJÿÿÿÿX total_stepsqUJÿÿÿÿXnum_tasks_per_batchqVNX save_everyqWNX eval_everyqXNXeval_every_n_epochqYNXdebugqZ‰Xseedq[K*Xsubsample_trainq\NX subsample_devq]NXsubsample_testq^NXni_online_evalq_‰Xt0_online_evalq`‰Xearly_stop_on_zero_shotqa‰X ortho_lossqbGX task_lossqcGXl1_lossqdGXmi_lossqeGXmc_lossqfGX length_normqgGX unlikely_lossqhGXpoly_unlikely_lossqiGX finetune_typeqjNXfinetune_skip_esqk‰Xfinetune_use_last_checkpointql‰XmodelqmXstabilityai/stablelm-3b-4e1tqnX model_familyqoXgptqpX precisionqqXbf16qrXmonitor_grad_alignment_onqsNXmodel_modifierqtXloraquX adapter_typeqvNh KhG?©™™™™™šXlora_init_scaleqwG?„záG®{h G?ðX lora_warmupqx‰h‰Xn_skillsqyKXn_tasksqzKhh h h Xrouter_granularityq{X finegrainedq|Xrouter_selectorq}NXrouter_weight_decayq~NXrouter_learning_rateqNXn_splitsq€KXrouter_selector_cluster_tempqG?ðXpoly_average_correctionq‚‰Xpoly_use_shared_skillqƒ‰Xskip_unseen_tokensq„ˆXmodule_logits_relaxed_bernoulliq…ˆXmodule_logits_straight_throughq†‰Xmodule_logits_learning_rateq‡G?¹™™™™™šXadapters_learning_rateqˆNXadapters_weight_decayq‰NXmodule_logits_dropoutqŠGXmodule_logits_l2_normq‹‰X augment_mmluqŒ‰Xsoft_prompt_lengthqK Xpatch_last_k_layersqŽJÿÿÿÿXsoft_prompt_mlp_dimqNXsoft_prompt_hidden_dimqNXsoft_prompt_learn_kvq‘‰Xprompt_placementq’Xprefixq“Xadd_routing_tokenq”‰X load_in_8bitq•‰X tensorboardq–‰X hf_token_hubq—X%hf_chcaIYlOmgtKvbBMceSgwbCRScrCNrxCBxq˜X hf_lib_idq™X-ostapeno/library-stablelm-25-experts-flan_3epqšX hf_repo_idq›NXdo_trainqœˆXbaselineq‰XsparsityqžGXsubsample_library_expertsqŸKX ranker_top_kq KX ranker_pathq¡NX ranker_modelq¢NhhXroutingq£Xsubjectq¤Xmmlu_test_splitq¥Xtestq¦X load_moduleq§NX module_graphq¨NXmicro_batch_sizeq©KXvalidation_portionqªG?ž¸Që…¸Xuse_instruct_templateq«‰Xsource_templateq¬NXaugment_few_shotq­KXmoe_num_expertsq®KX moe_emb_dimq¯K€X moe_rkhs_dimq°MX moe_ent_regq±GXmoe_ent_free_bitsq²GX moe_top_kq³JÿÿÿÿXexpand_val_set_w_downstreamq´‰Xeval_mmlu_callbacks_everyqµKXeval_test_set_callback_everyq¶KXeval_rougeL_callback_everyq·KXtest_sets_callbacksq¸]q¹Xuse_custom_valid_callbackqº‰Xmmlu_use_hard_promptq»NXeval_mmlu_few_shotq¼ˆXeval_mmlu_flagq½‰Xeval_rouge_flagq¾‰Xpipeline_eval_tasksq¿XallqÀX eval_metricqÁXlossqÂXuse_vllmqÉXreset_lrqĉX reset_optimqʼnX hf_repo_queryqÆNXskqÇKXfinetune_regimeqÈNX tasksets_pathqÉXtask_sets/stablm.jsonqÊXlibrary_to_expert_transformqËNXeval_before_trainingq̈Xremove_expertsqÍNXcreate_transfer_matrixqΈXrouge_every_opt_stepqÏKX es_metricqÐhÂXn_ng_iterationsqÑKX vocab_sizeqÒM€ÄuhhhhhhhNhhhh h!hh"Nh#Nh$‰h%Kdh&Kh'Nh(h)h*h+h,h-h.h/h0Mh1K@h2Kh3‰h4‰h5‰h6‰h7‰h8h9h:h;h<‰h=Nh>h?h@NhANhBNhCKhDK hEG?3©*0U2ahFG?®¸Që…¸hGhHhINhJGhKG>EyŽâ0Œ:hLG?¹™™™™™šhMKhNhOhPˆhQ‰hR‰hSKhTJÿÿÿÿhUJÿÿÿÿhVNhWNhXNhYNhZ‰h[K*h\Nh]Nh^Nh_‰h`‰ha‰hbGhcGhdGheGhfGhgGhhGhiGhjNhk‰hl‰hmhnhohphqhrhsNhthuhvNh KhG?©™™™™™šhwG?„záG®{h G?ðhx‰h‰hyKhzKhh h h h{h|h}Nh~NhNh€KhG?ðh‚‰hƒ‰h„ˆh…ˆh†‰h‡G?¹™™™™™šhˆNh‰NhŠGh‹‰hŒ‰hK hŽJÿÿÿÿhNhNh‘‰h’h“h”‰h•‰h–‰h—h˜h™hšh›Nhœˆh‰hžGhŸKh Kh¡Nh¢Nhhh£h¤h¥h¦h§Nh¨Nh©KhªG?ž¸Që…¸h«‰h¬Nh­Kh®Kh¯K€h°Mh±Gh²Gh³Jÿÿÿÿh´‰hµKh¶Kh·Kh¸h¹hº‰h»Nh¼ˆh½‰h¾‰h¿hÀhÁhÂhÉhĉhʼnhÆNhÇKhÈNhÉhÊhËNḧhÍNhΈhÏKhÐhÂhÑKhÒM€ÄuXexpert_deletedqÓ‰u.PK$v´>>PKCarchive/byteorderFB?ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZlittlePK…=ãPK=archive/versionFB9ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ3 PKÑžgUPK2archive/.data/serialization_idFB.ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ0636457737946401051300000023108881974876PK-˜ ?((PK$v´>>archive/data.pklPK…=ãŽarchive/byteorderPKÑžgUarchive/versionPK-˜ ?((’archive/.data/serialization_idPK,-8PK>PK8