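# /cluster/home2/cyx/elia/bert/multimodal_bert.py
from .modeling_bert import BertModel
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalBert(BertModel):
    """BERT encoder split into four stages at the layer indices in ``pwam_idx``.

    After each stage, a PWAM block lets the language features attend to the
    visual features of the matching backbone stage. The result is added back to
    the language features through a gate.
    """

    def __init__(self, config, embed_dim, pwam_idx=[3, 6, 9, 12], num_heads_fusion=[1, 1, 1, 1], fusion_drop=0.0):
        super().__init__(config)
        self.pwam_idx = pwam_idx
        self.num_heads_fusion = num_heads_fusion
        self.fusion_drop = fusion_drop

        # Visual channel width of each backbone stage: embed_dim, 2 * embed_dim, 4 * embed_dim, ...
        pwam_dims = [embed_dim * 2 ** i for i in range(len(pwam_idx))]
        self.pwams = nn.ModuleList()
        self.res_gates = nn.ModuleList()
        self.norms = nn.ModuleList()
        for i in range(0, len(pwam_idx)):
            dim = pwam_dims[i]
            # Arguments: dim, v_in_channels, l_in_channels, key_channels, value_channels
            # (768 is the BERT hidden size).
            fusion = PWAM(768, dim, 768, 768, 768,
                          num_heads=num_heads_fusion[i],
                          dropout=fusion_drop)
            self.pwams.append(fusion)

            # Both linear weights are zero-initialized, so the gate outputs 0 at
            # initialization. With both weights exactly zero, their gradients are
            # also zero, so plain gradient descent leaves the gate at 0.
            res_gate = nn.Sequential(
                nn.Linear(768, 768, bias=False),
                nn.ReLU(),
                nn.Linear(768, 768, bias=False),
                nn.Tanh(),
            )
            nn.init.zeros_(res_gate[0].weight)
            nn.init.zeros_(res_gate[2].weight)
            self.res_gates.append(res_gate)

            self.norms.append(nn.LayerNorm(768))

    def forward_stem(self, input_ids, attention_mask):
        # Token embeddings plus the additive attention mask used by every encoder layer.
        input_shape = input_ids.size()
        token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=input_ids.device)
        extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape, input_ids.device)
        embedding_output = self.embeddings(input_ids=input_ids, token_type_ids=token_type_ids)
        return embedding_output, extended_attention_mask

    def forward_stage1(self, hidden_states, attention_mask):
        # Encoder layers [0, pwam_idx[0]).
        for i in range(0, self.pwam_idx[0]):
            layer_module = self.encoder.layer[i]
            layer_outputs = layer_module(hidden_states, attention_mask)
            hidden_states = layer_outputs[0]
        return layer_outputs[0]

    def forward_stage2(self, hidden_states, attention_mask):
        # Encoder layers [pwam_idx[0], pwam_idx[1]).
        for i in range(self.pwam_idx[0], self.pwam_idx[1]):
            layer_module = self.encoder.layer[i]
            layer_outputs = layer_module(hidden_states, attention_mask)
            hidden_states = layer_outputs[0]
        return layer_outputs[0]

    def forward_stage3(self, hidden_states, attention_mask):
        # Encoder layers [pwam_idx[1], pwam_idx[2]).
        for i in range(self.pwam_idx[1], self.pwam_idx[2]):
            layer_module = self.encoder.layer[i]
            layer_outputs = layer_module(hidden_states, attention_mask)
            hidden_states = layer_outputs[0]
        return layer_outputs[0]

    def forward_stage4(self, hidden_states, attention_mask):
        # Encoder layers [pwam_idx[2], pwam_idx[3]).
        for i in range(self.pwam_idx[2], self.pwam_idx[3]):
            layer_module = self.encoder.layer[i]
            layer_outputs = layer_module(hidden_states, attention_mask)
            hidden_states = layer_outputs[0]
        return layer_outputs[0]

    # Each forward_pwamK fuses visual features x into language features l and
    # returns (normalized fused features, gated-residual-updated l).
    def forward_pwam1(self, x, l, l_mask):
        l_residual = self.pwams[0](x, l, l_mask)
        l = l + (self.res_gates[0](l_residual) * l_residual)
        return self.norms[0](l_residual), l

    def forward_pwam2(self, x, l, l_mask):
        l_residual = self.pwams[1](x, l, l_mask)
        l = l + (self.res_gates[1](l_residual) * l_residual)
        return self.norms[1](l_residual), l

    def forward_pwam3(self, x, l, l_mask):
        l_residual = self.pwams[2](x, l, l_mask)
        l = l + (self.res_gates[2](l_residual) * l_residual)
        return self.norms[2](l_residual), l

    def forward_pwam4(self, x, l, l_mask):
        l_residual = self.pwams[3](x, l, l_mask)
        l = l + (self.res_gates[3](l_residual) * l_residual)
        return self.norms[3](l_residual), l


class PWAM(nn.Module):
    def __init__(self, dim, v_in_channels, l_in_channels, key_channels, value_channels, num_heads=0, dropout=0.0):
        super(PWAM, self).__init__()
        # Despite its name, vis_project is applied to the language features l: (B, N_l, dim).
        self.vis_project = nn.Sequential(nn.Linear(dim, dim),
                                         nn.GELU(),
                                         nn.Dropout(dropout))

        self.image_lang_att = SpatialImageLanguageAttention(v_in_channels,
                                                            l_in_channels,
                                                            key_channels,
                                                            value_channels,
                                                            out_channels=value_channels,
                                                            num_heads=num_heads)

        self.project_mm = nn.Sequential(nn.Conv1d(value_channels, value_channels, 1, 1),
                                        nn.GELU(),
                                        nn.Dropout(dropout))

    def forward(self, x, l, l_mask):
        vis = self.vis_project(l)  # (B, N_l, dim)
        lang = self.image_lang_att(x, l, l_mask)  # (B, N_l, value_channels)
        lang = lang.permute(0, 2, 1)  # (B, value_channels, N_l)
        mm = torch.mul(vis.permute(0, 2, 1), lang)  # element-wise gating
        mm = self.project_mm(mm)  # (B, value_channels, N_l)
        mm = mm.permute(0, 2, 1)  # (B, N_l, value_channels)
        return mm


class SpatialImageLanguageAttention(nn.Module):
    def __init__(self, v_in_channels, l_in_channels, key_channels, value_channels, out_channels=None, num_heads=1):
        super(SpatialImageLanguageAttention, self).__init__()
        self.v_in_channels = v_in_channels
        self.l_in_channels = l_in_channels
        self.out_channels = out_channels
        self.key_channels = key_channels
        self.value_channels = value_channels
        self.num_heads = num_heads
        if self.out_channels is None:
            self.out_channels = self.value_channels

        # Queries: language features (B, l_in_channels, N_l). No normalization,
        # since a sentence contains padding positions.
        self.f_query = nn.Sequential(
            nn.Conv1d(self.l_in_channels, self.key_channels, kernel_size=1, stride=1),
        )

        # Keys: visual features (B, v_in_channels, HW).
        self.f_key = nn.Sequential(
            nn.Conv1d(self.v_in_channels, self.key_channels, kernel_size=1, stride=1),
            nn.InstanceNorm1d(self.key_channels),
        )

        # Values: visual features (B, v_in_channels, HW).
        self.f_value = nn.Sequential(
            nn.Conv1d(self.v_in_channels, self.value_channels, kernel_size=1, stride=1),
            nn.InstanceNorm1d(self.value_channels),
        )

        # Output projection.
        self.W = nn.Sequential(
            nn.Conv1d(self.value_channels, self.out_channels, kernel_size=1, stride=1),
            nn.InstanceNorm1d(self.out_channels),
        )

    def forward(self, x, l, l_mask):
        # Assumed shapes: x (B, HW, v_in_channels), l (B, N_l, l_in_channels), and
        # l_mask a 0/1 token mask that squeezes to (B, 1, N_l). The squeeze dimension
        # is an assumption chosen so these shapes line up; one bare local-to-local
        # assignment in the compiled module could not be recovered and is omitted.
        l_mask = l_mask.squeeze(1)
        B, HW = x.size(0), x.size(1)
        x = x.permute(0, 2, 1)  # (B, v_in_channels, HW)
        l = l.permute(0, 2, 1)  # (B, l_in_channels, N_l)

        query = self.f_query(l)  # (B, key_channels, N_l)
        query = query * l_mask  # zero the queries of padding tokens
        query = query.permute(0, 2, 1)  # (B, N_l, key_channels)
        key = self.f_key(x)  # (B, key_channels, HW)
        value = self.f_value(x)  # (B, value_channels, HW)
        n_l = query.size(1)

        key = key.reshape(B, self.num_heads, self.key_channels // self.num_heads, HW)
        value = value.reshape(B, self.num_heads, self.value_channels // self.num_heads, HW)
        # (B, num_heads, N_l, key_channels // num_heads)
        query = query.reshape(B, n_l, self.num_heads, self.key_channels // self.num_heads).permute(0, 2, 1, 3)
        l_mask = l_mask.unsqueeze(-1)  # (B, 1, N_l, 1)

        sim_map = torch.matmul(query, key)  # (B, num_heads, N_l, HW)
        sim_map = (self.key_channels ** -.5) * sim_map  # scaled dot product
        # Padding rows get a constant -1e4 shift across HW, which the softmax over HW
        # leaves unchanged; those rows come out uniform because their queries are zero.
        sim_map = sim_map + (1e4 * l_mask - 1e4)
        sim_map = F.softmax(sim_map, dim=-1)
        out = torch.matmul(sim_map, value.permute(0, 1, 3, 2))  # (B, num_heads, N_l, value_channels // num_heads)
        out = out.permute(0, 2, 1, 3).contiguous().reshape(B, n_l, self.value_channels)  # (B, N_l, value_channels)
        out = out.permute(0, 2, 1)  # (B, value_channels, N_l)
        out = self.W(out)  # (B, out_channels, N_l)
        out = out.permute(0, 2, 1)  # (B, N_l, out_channels)
        return out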