import math
import warnings

import torch
import torch.nn.functional as F
from einops import rearrange
from torch import nn
from torch.nn.functional import scaled_dot_product_attention
from timm.models.layers import DropPath  # stochastic depth (import path assumed)

from models.rope import apply_rotary_emb  # rotary-embedding helper (repo-local path assumed)

# flash-attn's fused MLP kernel is optional; fall back to the plain MLP when absent.
try:
    from flash_attn.ops.fused_dense import fused_mlp_func
except ImportError:
    fused_mlp_func = None

__all__ = ["FFN", "SwiGLUFFN", "RMSNorm", "AdaLNSelfCrossAttn", "AdaLNBeforeHead"]

# Prefer apex's fused RMSNorm; otherwise use the pure-PyTorch fallback below.
try:
    from apex.normalization import FusedRMSNorm as RMSNorm
except ImportError:
    warnings.warn("Cannot import apex FusedRMSNorm, falling back to a plain PyTorch RMSNorm")

    class RMSNorm(torch.nn.Module):
        def __init__(self, dim: int, eps: float = 1e-6):
            """
            Initialize the RMSNorm normalization layer.

            Args:
                dim (int): The dimension of the input tensor.
                eps (float, optional): A small value added to the denominator
                    for numerical stability. Default is 1e-6.

            Attributes:
                eps (float): A small value added to the denominator for numerical stability.
                weight (nn.Parameter): Learnable scaling parameter.

            """
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def _norm(self, x):
            """
            Apply the RMSNorm normalization to the input tensor.

            Args:
                x (torch.Tensor): The input tensor.

            Returns:
                torch.Tensor: The normalized tensor.

            """
            return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

        def forward(self, x):
            """
            Forward pass through the RMSNorm layer.

            Args:
                x (torch.Tensor): The input tensor.

            Returns:
                torch.Tensor: The output tensor after applying RMSNorm.

            """
            output = self._norm(x.float()).type_as(x)
            return output * self.weight


class FFN(nn.Module):
    def __init__(
        self,
        in_features,
        hidden_features=None,
        out_features=None,
        drop=0.0,
        fused_if_available=True,
    ):
        super().__init__()
        self.fused_mlp_func = fused_mlp_func if fused_if_available else None
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = nn.GELU(approximate="tanh")
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop, inplace=True) if drop > 0 else nn.Identity()

    def forward(self, x):
        if self.fused_mlp_func is not None:
            # fused path: fc1 + tanh-approximated GELU + fc2 in one kernel
            return self.drop(
                self.fused_mlp_func(
                    x=x,
                    weight1=self.fc1.weight,
                    weight2=self.fc2.weight,
                    bias1=self.fc1.bias,
                    bias2=self.fc2.bias,
                    activation="gelu_approx",
                    save_pre_act=self.training,
                    return_residual=False,
                    checkpoint_lvl=0,
                    heuristic=0,
                    process_group=None,
                )
            )
        else:
            return self.drop(self.fc2(self.act(self.fc1(x))))

    def extra_repr(self) -> str:
        return f"fused_mlp_func={self.fused_mlp_func is not None}"
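
# The gated variant below follows SwiGLU (Shazeer, 2020):
#   SwiGLUFFN(x) = down_proj(SiLU(gate_proj(x)) * up_proj(x))
# i.e. one linear branch is passed through SiLU and gates the other elementwise.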
class SwiGLUFFN(nn.Module):
    def __init__(
        self,
        dim: int,
        ff_mult: float = 4,
    ):
        """
        Initialize the FeedForward module.

        Args:
            dim (int): Input dimension.
            ff_mult (float, optional): Custom multiplier for hidden dimension. Defaults to 4.
        """
        super().__init__()
        hidden_dim = int(dim * ff_mult)

        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.fused_mlp_func = None  # no fused kernel for the gated MLP

        self._init()

    def _init(self):
        for module in self.modules():
            if isinstance(module, nn.Linear):
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)

    def _forward_silu_gating(self, x_gate: torch.Tensor, x_up: torch.Tensor):
        return F.silu(x_gate) * x_up

    def forward(self, x: torch.Tensor):
        return self.down_proj(
            self._forward_silu_gating(self.gate_proj(x), self.up_proj(x))
        )

    def extra_repr(self) -> str:
        return f"fused_mlp_func={self.fused_mlp_func is not None}"


class CrossAttention(nn.Module):
    def __init__(
        self,
        embed_dim: int,
        context_dim: int,
        num_heads: int,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
        qk_norm: bool = False,
    ):
        super().__init__()
        assert embed_dim % num_heads == 0
        assert attn_drop == 0.0  # attention dropout is not supported here
        self.num_heads, self.head_dim = num_heads, embed_dim // num_heads
        self.scale = 1 / math.sqrt(self.head_dim)

        # optional per-head query/key normalization
        self.q_norm = (
            nn.LayerNorm(self.head_dim, elementwise_affine=True)
            if qk_norm
            else nn.Identity()
        )
        self.k_norm = (
            nn.LayerNorm(self.head_dim, elementwise_affine=True)
            if qk_norm
            else nn.Identity()
        )

        self.to_q = nn.Linear(embed_dim, embed_dim, bias=True)
        self.to_kv = nn.Linear(context_dim, embed_dim * 2, bias=True)
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.proj_drop = (
            nn.Dropout(proj_drop, inplace=True) if proj_drop > 0 else nn.Identity()
        )

        # key/value cache, only used during inference
        self.caching, self.cached_k, self.cached_v = False, None, None

    def kv_caching(self, enable: bool):
        self.caching, self.cached_k, self.cached_v = enable, None, None

    def forward(self, x, context, context_attn_bias=None):
        B, L, C = x.shape
        context_B, context_L, context_C = context.shape

        q = self.to_q(x).view(B, L, self.num_heads, self.head_dim)
        q = self.q_norm(q).permute(0, 2, 1, 3)  # (B, num_heads, L, head_dim)

        if self.cached_k is None:
            kv = self.to_kv(context).view(
                context_B, context_L, 2, self.num_heads, self.head_dim
            )
            # each of k, v: (B, num_heads, context_L, head_dim)
            k, v = kv.permute(2, 0, 3, 1, 4).unbind(0)
            k = self.k_norm(k)
            if self.caching:
                self.cached_k, self.cached_v = k, v
        else:
            k, v = self.cached_k, self.cached_v

        if context_attn_bias is not None:
            # broadcast the per-token context bias over heads and queries
            context_attn_bias = rearrange(context_attn_bias, "b j -> b 1 1 j")

        out = scaled_dot_product_attention(
            query=q,
            key=k,
            value=v,
            attn_mask=context_attn_bias,
            dropout_p=0.0,
        )
        out = out.transpose(1, 2).reshape(B, L, C)
        return self.proj_drop(self.proj(out))
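
# Minimal shape-check sketch for the blocks above (illustrative only; sizes are
# arbitrary and the calls simply mirror the signatures defined in this file).
if __name__ == "__main__":
    x = torch.randn(2, 16, 64)        # image tokens: (batch, length, embed_dim)
    context = torch.randn(2, 77, 32)  # text tokens:  (batch, length, context_dim)

    # stay on the plain PyTorch FFN path so the check also runs on CPU
    mlp = nn.Sequential(FFN(in_features=64, fused_if_available=False), SwiGLUFFN(dim=64))
    attn = CrossAttention(embed_dim=64, context_dim=32, num_heads=4, qk_norm=True)

    y = attn(mlp(x), context, context_attn_bias=None)
    print(y.shape)  # torch.Size([2, 16, 64])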