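"""Some utilities for backbones, in particular for windowing"""

from typing import Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F


def window_partition(x, window_size):
    """
    Partition into non-overlapping windows with padding if needed.
    Args:
        x (tensor): input tokens with [B, H, W, C].
        window_size (int): window size.
    Returns:
        windows: windows after partition with [B * num_windows, window_size, window_size, C].
        (Hp, Wp): padded height and width before partition
    """
    B, H, W, C = x.shape

    # Pad H and W up to the next multiple of window_size (zero if already divisible).
    pad_h = (window_size - H % window_size) % window_size
    pad_w = (window_size - W % window_size) % window_size
    if pad_h > 0 or pad_w > 0:
        # F.pad starts from the last dim: C by (0, 0), W by (0, pad_w), H by (0, pad_h).
        x = F.pad(x, (0, 0, 0, pad_w, 0, pad_h))
    Hp, Wp = H + pad_h, W + pad_w

    # Split each spatial axis into (num_windows, window_size), then group the
    # two window-index axes together so each window becomes one batch entry.
    x = x.view(B, Hp // window_size, window_size, Wp // window_size, window_size, C)
    windows = (
        x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)
    )
    return windows, (Hp, Wp)


def window_unpartition(windows, window_size, pad_hw, hw):
    """
    Window unpartition into original sequences and remove padding.
    Args:
        windows (tensor): input tokens with [B * num_windows, window_size, window_size, C].
        window_size (int): window size.
        pad_hw (Tuple): padded height and width (Hp, Wp).
        hw (Tuple): original height and width (H, W) before padding.
    Returns:
        x: unpartitioned sequences with [B, H, W, C].
    """
    Hp, Wp = pad_hw
    H, W = hw
    # Each image contributes (Hp // window_size) * (Wp // window_size) windows.
    B = windows.shape[0] // (Hp * Wp // window_size // window_size)
    x = windows.view(
        B, Hp // window_size, Wp // window_size, window_size, window_size, -1
    )
    # Undo the permute from window_partition and merge back to [B, Hp, Wp, C].
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, Hp, Wp, -1)

    # Crop away the padding added by window_partition.
    if Hp > H or Wp > W:
        x = x[:, :H, :W, :]
    return x


class PatchEmbed(nn.Module):
    """
    Image to Patch Embedding.
    """

    def __init__(
        self,
        kernel_size: Tuple[int, ...] = (7, 7),
        stride: Tuple[int, ...] = (4, 4),
        padding: Tuple[int, ...] = (3, 3),
        in_chans: int = 3,
        embed_dim: int = 768,
    ):
        """
        Args:
            kernel_size (Tuple): kernel size of the projection layer.
            stride (Tuple): stride of the projection layer.
            padding (Tuple): padding size of the projection layer.
            in_chans (int): Number of input image channels.
            embed_dim (int): Patch embedding dimension.
        """
        super().__init__()
        self.proj = nn.Conv2d(
            in_chans, embed_dim, kernel_size=kernel_size, stride=stride, padding=padding
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)
        # B C H W -> B H W C
        x = x.permute(0, 2, 3, 1)
        return x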
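

# Example (illustrative sketch, not part of the SAM2 API; the helper name
# _example_pad_to_windows is hypothetical): the padding step of window_partition
# in isolation. F.pad reads its padding tuple starting from the LAST dimension,
# so on a channels-last [B, H, W, C] tensor the tuple (0, 0, 0, pad_w, 0, pad_h)
# leaves C unchanged, pads W on the right and pads H at the bottom. The outer
# "% window_size" keeps the padding at zero when a side is already a multiple of
# window_size. The dummy input (B=1, C=2, filled with ones) is an assumption.
def _example_pad_to_windows(h: int, w: int, window_size: int):
    import torch
    import torch.nn.functional as F

    pad_h = (window_size - h % window_size) % window_size
    pad_w = (window_size - w % window_size) % window_size
    x = torch.ones(1, h, w, 2)  # dummy [B, H, W, C] input
    x = F.pad(x, (0, 0, 0, pad_w, 0, pad_h))  # constant (zero) padding by default
    # Padding only adds zeros, so the sum equals h * w * C of the original input.
    return (pad_h, pad_w), tuple(x.shape), float(x.sum())


# Usage: a 14 x 10 input with window_size=8 needs 2 extra rows and 6 extra columns.
# _example_pad_to_windows(14, 10, 8) returns ((2, 6), (1, 16, 16, 2), 280.0)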
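

# Example (illustrative sketch; _example_window_roundtrip is a hypothetical
# helper, not part of SAM2): cross-checks the view/permute windowing used by
# window_partition against an independent construction with Tensor.unfold, and
# checks that the inverse permutation used by window_unpartition restores the
# input. It assumes H and W are already multiples of window_size, i.e. that the
# padding step has been applied, so no cropping is needed.
def _example_window_roundtrip(b: int, h: int, w: int, c: int, window_size: int):
    import torch

    ws = window_size
    x = torch.arange(b * h * w * c, dtype=torch.float32).reshape(b, h, w, c)

    # view/permute partition: [B * num_windows, ws, ws, C]
    via_view = (
        x.view(b, h // ws, ws, w // ws, ws, c)
        .permute(0, 1, 3, 2, 4, 5)
        .reshape(-1, ws, ws, c)
    )
    # unfold H, then W: [B, H // ws, W // ws, C, ws, ws], then move C last
    via_unfold = (
        x.unfold(1, ws, ws)
        .unfold(2, ws, ws)
        .permute(0, 1, 2, 4, 5, 3)
        .reshape(-1, ws, ws, c)
    )
    # inverse: regroup the windows per image and merge back to [B, H, W, C]
    restored = (
        via_view.reshape(b, h // ws, w // ws, ws, ws, c)
        .permute(0, 1, 3, 2, 4, 5)
        .reshape(b, h, w, c)
    )
    return tuple(via_view.shape), torch.equal(via_view, via_unfold), torch.equal(restored, x)


# Usage: two 16 x 24 images with window_size=8 give 2 * (2 * 3) = 12 windows, so
# _example_window_roundtrip(2, 16, 24, 3, 8) returns ((12, 8, 8, 3), True, True)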
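

# Example (illustrative sketch; _example_patch_embed_shape is a hypothetical
# helper, not part of SAM2): the token-grid size produced by PatchEmbed follows
# the standard Conv2d output formula
#     out = floor((in + 2 * padding - kernel) / stride) + 1.
# With kernel_size=(7, 7), stride=(4, 4), padding=(3, 3) (assumed here to be the
# PatchEmbed defaults) a side that is a multiple of 4 is downsampled exactly 4x,
# e.g. 1024 -> (1024 + 6 - 7) // 4 + 1 = 256. The sketch rebuilds the same
# projection with nn.Conv2d; embed_dim=32 is an arbitrary small choice.
def _example_patch_embed_shape(h: int, w: int, embed_dim: int = 32):
    import torch
    import torch.nn as nn

    k, s, p = 7, 4, 3
    expected = ((h + 2 * p - k) // s + 1, (w + 2 * p - k) // s + 1)
    proj = nn.Conv2d(3, embed_dim, kernel_size=k, stride=s, padding=p)
    with torch.no_grad():
        y = proj(torch.zeros(1, 3, h, w)).permute(0, 2, 3, 1)  # B C H W -> B H W C
    return expected, tuple(y.shape)


# Usage: a 64 x 64 RGB image becomes a 16 x 16 grid of embed_dim-sized tokens,
# so _example_patch_embed_shape(64, 64) returns ((16, 16), (1, 16, 16, 32))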