o HQg' @sddlZddlmZddlmZmZmZddlZddlm Z ddl m m Z ddl mZddlmZmZmZddlmZmZddejde jd e jd ejfd d ZGd dde jZGddde jZGddde jZdS)N)partial)ListTupleUnion) g_pathmgr) PatchEmbedwindow_partitionwindow_unpartition)DropPathMLPxpoolnormreturncCsD|dur|S|dddd}||}|dddd}|r ||}|S)Nr)permute)r r rrT/mnt/petrelfs/dingshuangrui/SAM2-Video-Predictor/sam2/modeling/backbones/hieradet.pydo_poolsrc sJeZdZ d dedededejffdd Zdejd ejfd d Z Z S) MultiScaleAttentionNdimdim_out num_headsq_poolcsFt||_||_||_||_t||d|_t|||_ dS)Nr) super__init__rrrrnnLinearqkvproj)selfrrrr __class__rrr(s zMultiScaleAttention.__init__r rc Cs|j\}}}}|||||d|jd}t|d\}}} |jrBt||||d|j}|jdd\}}|||||jd}t | dd| dd| dd}| dd}||||d}| |}|S)Nrrr) shaper reshapertorchunbindrrFscaled_dot_product_attention transposer!) r"r BHW_r qkvrrrforward8s      zMultiScaleAttention.forwardN) __name__ __module__ __qualname__intrModulerr(Tensorr4 __classcell__rrr#rr'srcs|eZdZddddejdfdededed ed ed eeje fd e eefd ejdeffdd Z de j de j fddZZS)MultiScaleBlockg@ LayerNormNrrrr mlp_ratio drop_path norm_layerq_stride act_layer window_sizec stt|trttt|dd}||_||_|||_ | |_ d||_ |_ |j r4tj ||dd|_ t||||j d|_|dkrFt|nt|_|||_t|t|||d|d|_||krkt|||_dSdS) Ngư>)epsF) kernel_sizestride ceil_mode)rrr>r) num_layers activation)rr isinstancestrrgetattrrrrnorm1rEr rC MaxPool2drattnr IdentityrAnorm2r r9mlprr!) r"rrrr@rArBrCrDrEr#rrrUs<    zMultiScaleBlock.__init__r rc Cs|}||}|j|jkrt|||j}|j}|dkr/|jd|jd}}t||\}}| |}|j r`|j|j d}|jdd\}}||||}||||}||||f}|jdkrnt |||||f}|| |}|| | ||}|S)Nrrrr)rOrrrr!r rEr&rrQrCr rArTrS) r"r shortcutrEr.r/Zpad_hwZpad_hZpad_wrrrr4s(    zMultiScaleBlock.forward)r6r7r8rGELUr9floatrr:rMrrr(r;r4r<rrr#rr=Ts6    1r=cseZdZdZ         d'dededededeeefdeedfdededeeefdeedfdeedfffdd Zdeeefdej fddZ d ej de ej fd!d"Z d#d$Z defd%d&ZZS)(Hieraz5 Reference: https://arxiv.org/abs/2306.00989 `rr>rrrrrr@r_r_ r\NT embed_dimrdrop_path_raterrCstages.dim_mulhead_mul!window_pos_embed_bkg_spatial_size window_specglobal_att_blocksc s^ttt| ksJ| _t}|_fddtdtdD_d|krs z"Hiera.__init__..rrr%cSsg|]}|dqS)rrrqr rrrrss)rgcSsg|]}|qSr)itemrtrrrrss)rrrrArCrEcsg|]}j|jqSr)blocksrrpr"rrrsrbcpu) map_locationz loading HieraF)strict)"rrlenrmrorCrange stage_endsZ q_pool_blocksreturn_interm_layersr patch_embedrnrlr Parameterr(zeros pos_embedpos_embed_windowlinspace ModuleListrvr9r=appendr channel_listropenloadlogginginfoload_state_dict)r"rgrrhrrCrirjrkrlrmrnZ weights_pathrdepthZdprZ cur_stagerrrrEblockfZchkptr#)r"rirrsl "*"         zHiera.__init__hwrcCsZ|\}}|j}tj|j||fdd}||ddt|j|jD}|dddd}|S) Nbicubic)sizemodecSsg|]\}}||qSrr)rqr yrrrrsrxz(Hiera._get_pos_embed..rrrr)rr* interpolatertilezipr&r)r"rhwZ window_embedrrrr_get_pos_embedszHiera._get_pos_embedr cCs~||}|||jdd}g}t|jD]$\}}||}||jdks/||jvr<|jr<|dddd}||q|S)Nrrr%rr) rrr& enumeratervrrrr)r"r outputsrrblkfeatsrrrr4s  z Hiera.forwardcCsx|}|ddkr|dS|ddkrdS|ddkr!dS|ddkr8t|dddddS|dS) NZrel_posr%rrrrrv.)get_num_layersfindr9split)r"Z layer_namerJrrr get_layer_id-s zHiera.get_layer_idcCs t|jSr5)r}rvrwrrrr<s zHiera.get_num_layers) rYrr>rrZr[r]r]r^r`rdNT)r6r7r8__doc__r9rWrrr(r;rrr4rrr<rrr#rrXsT      c rXr5)r functoolsrtypingrrrr(torch.nnrZtorch.nn.functional functionalr*iopath.common.file_iorZsam2.modeling.backbones.utilsrrr sam2.modeling.sam2_utilsr r r;r:rrr=rXrrrrs   $-U