o g@sDddlZddlmZmZmZmZmZmZddlmZddl Z ddl m Z m Z ddl Z ddZ ddZGd d d ZGd d d eZGd ddeZedkreddddZedeeeZedeededZedeeZeZeeeeksJeD]ZeeeeesJqeddSdS)N)PreTrainedModelPretrainedConfig AutoModelAutoModelForCausalLMOPTForCausalLMBitsAndBytesConfig)nn)OptionalListcCst|}dg|}d}td|D]0}|dkr0||||kr0||d}|dkr0||||ks||||kr<|d7}|||<q|SNr)lenrange)pattern pattern_len prefix_suffixjir,/home/wyb/yanbin/ART_v1.0/modeling_crello.pykmp_preprocesss   rcCst|}t|}t|}g}d}t|D]?}|dkr4||||kr4||d}|dkr4||||ks"||||kr@|d7}||krS|||d||d}q|Sr )r rrappend)textrtext_lenrrmatchesrrrrr kmp_searchs    rc@sDeZdZddZddZeddZddZd d Z d d Z d S) ModelWrappercCs ||_dSNmodel)selfrrrr__init__- zModelWrapper.__init__cCs t|j|Sr)getattrr)r namerrr __getattr__0s zModelWrapper.__getattr__cCs ||Srr)r pixel_valuesrrr__call__3s zModelWrapper.__call__cCdSrrr rrreval7zModelWrapper.evalcCr(rrr)rrrtrain:r+zModelWrapper.traincCs |jSr)r parametersr)rrrr->r"zModelWrapper.parametersN) __name__ __module__ __qualname__r!r%torchno_gradr'r*r,r-rrrrr,s  rcs|eZdZdddgddddddd d d dfd ed ededeededededededededededeeffdd Z Z S)CrelloModelConfigi}Tzfacebook/opt-6.7bZ captioningF g?z.*\.(q_proj|v_proj)old_vocab_size vocab_size pad_token_id ignore_ids freeze_lm opt_versiontaskuse_lora lora_alphalora_r lora_dropoutlora_target_modules hidden_size load_in_4bitc stjdi||dksJd|dksJd||_||_||_||_||_||_||_| |_ | |_ | |_ | |_ | |_ ||_||_dS)Nrzold_vocab_size must be positivezvocab_size must be positiver)superr!r8r9r:r<r=r>r?r@rArBrCrDrEr;)r r8r9r:r;r<r=r>r?r@rArBrCrDrEkwargs __class__rrr!Cs" zCrelloModelConfig.__init__) r.r/r0intr boolstrfloatr r! 
__classcell__rrrHrr3BsX   r3csReZdZeZdZd ddZdeffdd Zdfdd Zd e j fd d Z Z S) CrelloModelTNcCs|jdSr)lmgradient_checkpointing_enable)r Zgradient_checkpointing_kwargsrrrrQrsz)CrelloModel.gradient_checkpointing_enableconfigc s\t|d}|j|_||_|j}td|dd|vr*t||_|jj j }n8|j rHtdt |j d}t tjdd}d |i}tj}n td d}d}d}tjd d d tjd|_|jj j}|jj j|j _||_|jjr|jtd|jD]} d| _q~ntd|jd |jj _td|j|j|j|j|_td|jdS)NZ%hf_kBlXvHRGTBgcTNmLZPcnTZVfcVtXvjcXaSzUsing z for the language model.z facebook/optz would load_in_4bit)rE LOCAL_RANKrz wouldn't load_in_4bitzWYBar/LLM_For_Layout_PlanningzMeta-Llama-3-8BT) subfoldertrust_remote_code torch_dtypezFreezing the LM.Fz no freeze lm, so to train lmz.resize token embeddings to match the tokenizerz-after token embeddings to match the tokenizer)rFr!r:argsr=printrfrom_pretrainedrPrRword_embed_proj_dimrErrJosenvirongetr1bfloat16rrDr<r*r- requires_gradr,gradient_checkpointingr9Zresize_token_embeddingsZget_input_embeddingsinput_embeddings) r rRuse_auth_tokenr=r[quantization_config local_rank device_maprWparamrHrrr!usV         zCrelloModel.__init__cs(tj|d|jjr|jdSdS)N)mode)rFr,rXr<rPr*)r rhrHrrr,szCrelloModel.trainlabelsc Cs|jd}|}||}|djddd}|jjD]}d|||k<q g}|D]-}t|D]&\} } | |j fvrJd|| d<| | n| t |dkrY| | dq3q-t ||ksiJt ||f|j ||dd } | ||fS) Nrr4r7)dimg?ir T)Z inputs_embedsrioutput_hidden_states) shapedetachclonerbsummeanrRr; enumerater:rr rP) r ri batch_sizeZ full_labelsZ input_embsZinput_embs_normZ ignore_idpad_idxlabelktokenoutputrrrforwards,        zCrelloModel.forwardr)T) r.r/r0r3 config_classZsupports_gradient_checkpointingrQr!r,r1 LongTensorrxrNrrrHrrOns HrO__main__iYiXiW)r9Zimage_reg_tokenZimage_gt_tokenzconfig: z model1: testz model2: zall parameters are equal) r1 transformersrrrrrrrr\typingr r rrrr3rOr.rRrYZmodel1save_pretrainedrZZmodel2 state_dictZ state_dict1Z state_dict2setkeysruequalrrrrs<  ,w
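The identifiers that survive in this dump include two helpers, `kmp_preprocess(pattern)` and `kmp_search(text, pattern)`, with locals named `prefix_suffix`, `pattern_len`, `text_len`, and `matches`. Those names suggest a Knuth-Morris-Pratt subsequence search. The block below is a minimal sketch of the textbook algorithm under that assumption. It is not recovered source: the type hints, the empty-pattern guard, and the exact control flow are assumptions, since the compiled form does not preserve them legibly.

```python
from typing import List, Sequence


def kmp_preprocess(pattern: Sequence) -> List[int]:
    """Build the KMP failure table for `pattern`.

    prefix_suffix[i] is the length of the longest proper prefix of
    pattern[: i + 1] that is also a suffix of it.
    """
    pattern_len = len(pattern)
    prefix_suffix = [0] * pattern_len
    j = 0  # length of the currently matched prefix
    for i in range(1, pattern_len):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while j > 0 and pattern[i] != pattern[j]:
            j = prefix_suffix[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        prefix_suffix[i] = j
    return prefix_suffix


def kmp_search(text: Sequence, pattern: Sequence) -> List[int]:
    """Return the start index of every (possibly overlapping) match."""
    text_len = len(text)
    pattern_len = len(pattern)
    if pattern_len == 0:
        # Assumption: an empty pattern yields no matches.
        return []
    prefix_suffix = kmp_preprocess(pattern)
    matches: List[int] = []
    j = 0  # number of pattern items matched so far
    for i in range(text_len):
        while j > 0 and text[i] != pattern[j]:
            j = prefix_suffix[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == pattern_len:
            matches.append(i - pattern_len + 1)
            # Keep scanning so overlapping matches are also reported.
            j = prefix_suffix[j - 1]
    return matches
```

Because the functions only index and compare elements, they work on lists of token ids as well as on strings, for example `kmp_search(token_ids, marker_ids)` with hypothetical id lists.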