a b*@sddlZddlZddlmZddlmZmZddlZddlZddl Z ddl Z ddl m mZddl m Z ddlmZmZddlmZddlmZdd lmZdd lmZeeZGd d d e jZd dZdS)N) OrderedDict)TupleUnion)nn)DropPath trunc_normal_)build_image_encoder)build_text_encoder)build_tokenizer)DEFAULT_TEMPLATEScseZdZedfdd ZddZdgdfdd Zejj d d Z e d d Z ddZ ddZdddZdddZddZZS) UniCLModel)configcs t|dd|_t|j|_t|j|j|d|_|dd}t|jdr\|jj}nLt 4|t dd t j dd}Wdn1s0Ytt |||_|dd |_t|j|_tt |jj||_tt g|_t|jd d t|jd d dS) NMODEL TEXT_ENCODERVERBOSEDIM_PROJECTIONdim_outrlast_hidden_state IMAGE_ENCODERg{Gz?)std)super__init__conf_lang_encoderr tokenizerr text_encoderhasattrrtorchno_gradzerostype LongTensorsizer Parameteremptytext_projectionZconf_image_encoderr image_encoderimage_projectionones logit_scaler)selfrZdim_projectionr __class__$/datadrive/UniCL-Demo/model/model.pyrs0      " zUniCLModel.__init__cCsi}|D]t\}}|dr4||d|dd<q |drT||d|dd<q |dkrf||d<q |d krx||d <q |||<q |S) Nzvisual.image_encoder.ztext. lang_encoder.Zvision_projectionr(r&)items startswith)r+ model_dictZmodel_dict_updatedkvr.r.r/_convert_old_weights7s     zUniCLModel._convert_old_weightsTc stj|s"td|ddStj|dd}td|||}| fdd| D}i}i}| D]`\}}| dd |vp|d d k} | rx| d r|||<qx|rtd |d ||||<qx|j |d g||j|dddS)Nz=> Pretrained model (z!) is not a file, skip init weightcpu) map_locationz=> Loading pretrained model cs"i|]\}}|vr||qSr.)keys).0r7r8r6r.r/ Ps z.UniCLModel.from_pretrained...r*r0z=> init z from F)strict)ospathisfileloggerwarningrloadinfor9 state_dictr4splitr5r'Zfrom_state_dictload_state_dict) r+ pretrainedZpretrained_layersverboseZpretrained_dictZneed_init_state_dictZimage_encoder_state_dictr7r8Z need_initr.r?r/from_pretrainedGs0       zUniCLModel.from_pretrainedcCs^dh}t|jdr0|jD]}|d|qt|jdrZ|jD]}|d|qF|S)Nr*no_weight_decayr2r0)rrrQaddr')r+rQr7r.r.r/rQgs  zUniCLModel.no_weight_decaycCs|jjSN)r*dtyper+r.r.r/rTtszUniCLModel.dtypecstdd}g}tD]lfdd|D}j|ddddd}fd d |D}|}|jd d }||}||qtj |d d }|S) Nrcsg|]}|qSr.formatr>templateclssr.r/ |z3UniCLModel.get_imnet_embeddings.. max_lengthTMptpaddingZ truncationr^Zreturn_tensorscs,i|]$\}}|tjr$|n|qSr.next parametersis_cudacudar>keyvalrUr.r/r@r]z3UniCLModel.get_imnet_embeddings..rdim) ZIMAGENET_DEFAULT_TEMPLATESZIMAGENET_CLASSESrr4 encode_textmeannormappendrstack)r+ templatesclss_embeddingstxtstokensclss_embeddingimnet_text_embeddingsr.r[r+r/get_imnet_embeddingsxs      zUniCLModel.get_imnet_embeddingscstdd}g}|D]lfdd|D}j|ddddd}fd d |D}|}|jd d }||}||qtj|d d }|S) Nrcsg|]}|qSr.rVrXrZr.r/r\r]z2UniCLModel.get_text_embeddings..r^Tr_r`racs,i|]$\}}|tjr$|n|qSr.rcrhrUr.r/r@r]z2UniCLModel.get_text_embeddings..rrk) r rr4rmrnrorprrq)r+textsrrrsrtrurvrwr.rxr/get_text_embeddingss      zUniCLModel.get_text_embeddingscCs0|j|}||j}|r,||jddd}|S)NTrlkeepdim)r'forward_featuresr(ro)r+imageroxr.r.r/ encode_images   zUniCLModel.encode_imagecCs|jfi|}|d}|jddkrL|t|d|djddf}n|dddf}||j}|r|||jddd }|S) Nr TOKENIZERclipr input_idsr|rkTr})rrraranger#argmaxr&ro)r+textrorr.r.r/rms& zUniCLModel.encode_textcCs(||}||}|j}|||fSrS)rrmr*exp)r+rrZfeatures_imageZ features_textTr.r.r/forwards   zUniCLModel.forward)T)T)__name__ __module__ __qualname__dictrr9rPrjitignorerQpropertyrTryr{rrmr __classcell__r.r.r,r/r s    r cKst|}|dddkr|dd}ddlm}m}||rtF}t|d}||||t ||dd|dWdq1s0Yn|||dd|d|S) Nr PRETRAINEDr:r) is_valid_url download_filez base_model.ptZPRETRAINED_LAYERSr) r Z Utils.UtilsrrtempfileTemporaryDirectorypathlibPathrPstr)rkwargsmodelZpretrained_pathrrZtmp_pathZfile_local_pathr.r.r/build_unicl_models   >r) rr collectionsrtypingrrloggingrDnumpynprZtorch.nn.functionalr functionalFZtimm.models.layersrrr'r rr r rrr getLoggerrrGModuler rr.r.r.r/s$       )