o h2hf|@stddlZddlZddlmmZddlmZddlmmZ ddl m Z dgZ dZ GdddejZGdddejZGd d d ejZGd d d ejZGd ddejZGdddejZddZddZGdddejZGdddejZGdddejZGdddejZGdddejZGdd d ejZd!d"ZGd#d$d$ejZd+d(d)Z Gd*ddZ!dS),N) rearrange Wan2_2_VAEcs.eZdZdZfddZdfdd ZZS) CausalConv3dz Causal 3d convolusion. csPtj|i||jd|jd|jd|jdd|jddf|_d|_dS)Nrrrrr)super__init__padding_padding)selfargskwargs __class__4/home/ubuntu/wan22/wan2.2-main/wan/modules/vae2_2.pyr s  zCausalConv3d.__init__Ncslt|j}|dur*|jddkr*||j}tj||gdd}|d|jd8<t||}t |S)Nrrdim) listr todevicetorchcatshapeFpadrforward)r xcache_xr rrrr"s    zCausalConv3d.forwardN__name__ __module__ __qualname____doc__r r __classcell__rrrrrs  rcs&eZdZdfdd ZddZZS)RMS_normTFcsrt|s dnd}|r|g|Rn|f}||_|d|_tt||_|r4tt ||_ dSd|_ dS)N)rrr)rr?) rr channel_firstscalenn Parameterronesgammazerosbias)r rr+imagesr2broadcastable_dimsrrrrr /s   $zRMS_norm.__init__cCs*tj||jrdndd|j|j|jS)Nrr)r normalizer+r,r0r2r rrrrr9szRMS_norm.forwardTTFr#r$r%r rr'rrrrr(-s r(cseZdZfddZZS)Upsamplecst||S)zJ Fix bfloat16 support for nearest neighbor interpolation. )rrfloattype_asr7rrrr@szUpsample.forward)r#r$r%rr'rrrrr:>sr:cs<eZdZfddZddgfddZddZd d ZZS) Resamplec s|dvsJt||_||_|dkr)ttdddtj||ddd|_dS|d krLttdddtj||ddd|_t ||d d d d|_ dS|d krctt dtj||ddd|_dS|dkrtt dtj||ddd|_t ||d ddd|_ dSt |_dS)N)none upsample2d upsample3d downsample2d downsample3dr?)@rCz nearest-exact) scale_factormoderr r@r)rFrr)rrrrA)rrrr)rr)striderB)rrrr)rHr ) rr rrEr- Sequentialr:Conv2dresampler time_conv ZeroPad2dIdentity)r rrErrrr Is>          zResample.__init__Nrc Cs|\}}}}}|jdkr|dur|d} || dur)d|| <|dd7<n|ddddt dddddf} | jddkrs|| durs|| dkrstj|| dddddddddfd| j | gdd} | jddkr|| dur|| dkrtjt | | j | gdd} || dkr| |}n| ||| }| || <|dd7<| |d||||}t |dddddddddddf|dddddddddddffd}| |||d||}|jd}t|d }||}t|d |d }|jd krr|durr|d} || dur3||| <|dd7<|S|ddddddddddf} | t|| ddddddddddf|gd}| || <|dd7<|S) Nr@rReprrr5rrFb c t h w -> (b t) c h wz(b t) c h w -> b c t h wtrB)sizerECACHE_Tclonerrr unsqueezerr zeros_likerLreshapestackrrK) r r feat_cachefeat_idxbcrRhwidxr rrrrpsl  , ,   R      *4zResample.forwardc Cs|j}tj||\}}}}}t||}|} tj|| |j dddddddf<t ||_tj|j j dS)Nrr) weightdetachrUr-initzeros_rSreyedatar.r2) r conv conv_weightc1c2rRr^r_ one_matrix init_matrixrrr init_weights    zResample.init_weightc Cs|jj}tj||\}}}}}t |d|}||d|ddddddf<|||dddddddf<t ||_tj|j jdS)Nrr5r) rarfrbrUr-rcrdrSrrer.r2) r rgrhrirjrRr^r_rlrrr init_weight2s  zResample.init_weight2)r#r$r%r rrmrnr'rrrrr=Gs  '; r=cs.eZdZdfdd ZddgfddZZS) ResidualBlockr*cst||_||_tt|ddtt||dddt|ddtt |t||ddd|_ ||kr?t||d|_ dSt |_ dS)NFr3rFrrG) rr in_dimout_dimr-rIr(SiLUrDropoutresidualrNshortcut)r rqrrdropoutrrrr s"    zResidualBlock.__init__Nrc Cs||}|jD]k}t|tro|duro|d}|ddddt dddddf}|jddkr[||dur[tj||dddddddddf d |j |gdd}||||}|||<|dd7<q||}q||SNrrr5rr) rvru isinstancerrTrUrrrrVrr)r rrZr[r^layerr`r rrrrs&  ,, zResidualBlock.forward)r*r9rrrrrosrocs(eZdZdZfddZddZZS)AttentionBlockz3 Causal self-attention with a single head. csRt||_t||_t||dd|_t||d|_tj |jj dS)NrFr) rr rr(normr-rJto_qkvprojrcrdra)r rrrrr s  zAttentionBlock.__init__c Cs|}|\}}}}}t|d}||}||||d|ddddddjddd\}} } t || | }| dddd|||||}| |}t|d|d }||S) NrPrrFr5rrrz(b t) c h w-> b c t h wrQ) rSrr|r}rXpermute contiguouschunkrscaled_dot_product_attentionsqueezer~) r ridentityr\r]rRr^r_qkvrrrrs,   $ zAttentionBlock.forwardr"rrrrr{s  r{cCs\|dkr|S|dkrt|d||d}|S|dkr&t|d||d}|Std|j)Nrrz b c (h q) (w r) -> b (c r q) h wrrz$b c f (h q) (w r) -> b (c r q) f h wzInvalid input shape: )rr ValueErrorrr patch_sizerrrpatchifys    rcCsL|dkr|S|dkrt|d||d}|S|dkr$t|d||d}|S)Nrrz b (c r q) h w -> b c (h q) (w r)rrz$b (c r q) f h w -> b c f (h q) (w r))rrrrrr unpatchify+s  rcs6eZdZ dfdd ZdejdejfddZZS) AvgDown3Drcs`t||_||_||_||_|j|j|j|_||j|dks&J||j||_dSNr)rr in_channels out_channelsfactor_tfactor_sfactor group_sizer rrrrrrrr >s zAvgDown3D.__init__rreturnc Cs|j|jd|j|j}dddd|df}t||}|j\}}}}}|||||j|j||j|j||j|j}|dddddddd}||||j||j||j||j}|||j |j ||j||j||j}|j dd }|S) NrrrrFrrr) rrrrviewrrrrrrmean) r rpad_trBCTHWrrrrOs@   zAvgDown3D.forwardr)r#r$r%r rTensorrr'rrrrr<srcsBeZdZ d dedeffdd Zd dejdejfd d ZZS) DupUp3Drrrcs`t||_||_||_||_|j|j|j|_||j|dks&J||j||_dSr)rr rrrrrrepeatsrrrrr ts zDupUp3D.__init__Frrc Cs|j|jdd}||d|j|j|j|j|d|d|d}|ddddddd d}||d|j|d|j|d|j|d|j}|ri|dddd|jddddddf}|S) NrrrrrFrrrr) repeat_interleaverrrSrrrrr)r r first_chunkrrrrs, ,zDupUp3D.forwardrF) r#r$r%intr rrrr'rrrrrrs rcs2eZdZ  dfdd ZddgfddZZS) Down_ResidualBlockFc stt|||r dnd|rdndd|_g}t|D] }|t||||}q|r;|r0dnd} |t|| dtj ||_ dS)NrrrrrBrArE) rr r avg_shortcutrangeappendror=r-rI downsamples) r rqrrrwmulttemperal_downsample down_flagr_rErrrr s     zDown_ResidualBlock.__init__NrcCs.|}|jD]}||||}q|||Sr!)rUrr)r rrZr[x_copymodulerrrrs zDown_ResidualBlock.forwardFFr9rrrrrs rcs4eZdZ  dfdd ZddgdfddZZS) Up_ResidualBlockFc st|rt|||rdnd|rdndd|_nd|_g}t|D] }|t||||}q"|rA|r6dnd} |t|| dtj ||_ dS)Nrrrr@r?r) rr rrrrror=r-rI upsamples) r rqrrrwrtemperal_upsampleup_flagrrrErrrr s"      zUp_ResidualBlock.__init__NrcCsB|}|jD]}||||}q|jdur|||}||S|Sr!)rUrr)r rrZr[rx_mainr x_shortcutrrrrs   zUp_ResidualBlock.forwardrr9rrrrrs  rcsDeZdZddgddggddffdd Zd d gfd d ZZS) Encoder3drrrrrrr8r*c s2t|_||_||_||_||_||_fdddg|D}d} td|dddd|_ g} t t |dd |ddD])\} \} } | t |krQ|| nd }| t| | |||| t |dkd | d } qAtj| |_tt| | |t| t| | ||_tt| d d tt| |ddd|_dS)Ncg|]}|qSrr.0urrr z&Encoder3d.__init__..r? rrFrGr5F)rqrrrwrrrrCrp)rr rz_dimdim_multnum_res_blocks attn_scalesrrconv1 enumerateziplenrrr-rIrror{middler(rshead)r rrrrrrrwdimsr,rirqrr t_down_flagrrrr sL *      zEncoder3d.__init__Nrc Cs |dura|d}|ddddt dddddf}|jddkrL||durLtj||dddddddddfd|j|gdd}||||}|||<|dd7<n||}|j D]}|durv||||}qi||}qi|j D]}t |t r|dur||||}q~||}q~|j D]k}t |tr|dur|d}|ddddt dddddf}|jddkr||durtj||dddddddddfd|j|gdd}||||}|||<|dd7<q||}q|Srx)rTrUrrrrVrrrrrryrorr)r rrZr[r`r rzrrrr/sT,,      ,, zEncoder3d.forwardr9rrrrrs9rcsFeZdZddgddggddffdd Zd d gd fd d ZZS) Decoder3drrrrFTTr*c s`t|_||_||_||_||_||_fdd|dg|dddD}ddt|d} t ||dddd |_ t t |d|d|t|dt |d|d||_g} tt|dd|ddD]'\} \} } | t|kr||| nd }| t| | ||d|| t|dkd qlt j | |_t t| d d t t | d ddd |_dS)Ncrrrrrrrr}rz&Decoder3d.__init__..r5rrrrFrrGF)rqrrrwrrrrpr)rr rrrrrrrrrr-rIror{rrrrrrr(rsr)r rrrrrrrwrr,rrrqrr t_up_flagrrrr jsL & *   zDecoder3d.__init__NrFc Cs|dura|d}|ddddt dddddf}|jddkrL||durLtj||dddddddddfd|j|gdd}||||}|||<|dd7<n||}|j D]}t |t r{|dur{||||}qi||}qi|j D]}|dur|||||}q||}q|j D]m}t |tr|dur|d}|ddddt dddddf}|jddkr||durtj||dddddddddfd|j|gdd}||||}|||<|dd7<q||}q|Srx)rTrUrrrrVrrrrryrorrr)r rrZr[rr`r rzrrrrsT,,      ,, zDecoder3d.forwardr9rrrrrhs6rcCs(d}|D] }t|tr|d7}q|S)Nrr)modulesryr)modelcountmrrr count_conv3ds   rcspeZdZdddgddggddffdd Zd d gfd d ZddZddZddZdddZddZ Z S)WanVAE_rrr8r*c st||_||_||_||_||_||_|ddd|_t ||d||||j||_ t |d|dd|_ t ||d|_ t||||||j||_dS)Nr5rr)rr rrrrrrrrencoderrrconv2rdecoder) r rdec_dimrrrrrrwrrrr s8   zWanVAE_.__init__rrcCs |||}|||}||fSr!)encodedecode)r rr,mux_reconrrrr s  zWanVAE_.forwardc Cs\|t|dd}|jd}d|dd}t|D]S}dg|_|dkrA|j|ddddddddddf|j|jd}q|j|dddddd|ddd|ddddf|j|jd}t||gd}q| |j ddd\}} t |dtj r||d d|jddd|d d|jddd}n ||d|d}||S)NrrrrrrZr[r) clear_cacherrr _enc_conv_idxr _enc_feat_maprrrrryrrr) r rr,rRiter_routout_rlog_varrrrrs4   $8" zWanVAE_.encodec Cs4|t|dtjr'||dd|jddd|dd|jddd}n ||d|d}|jd}||}t|D]N}dg|_ |dkrh|j |dddd||dddddf|j |j dd}q?|j |dddd||dddddf|j |j d}t ||gd}q?t |dd}||S)NrrrT)rZr[rrr)rryrrrrrrr _conv_idxr _feat_maprr)r zr,rrrrrrrrr,s4"    (( zWanVAE_.decodecCs$td|}t|}|||S)Nr))rexp randn_like)r rrstdepsrrrreparameterizeIs  zWanVAE_.reparameterizeFcCs>||\}}|r |Std|dd}||t|S)Nr)g>g4@)rrrclampr)r imgs deterministicrrrrrrsampleNs zWanVAE_.samplecCsHt|j|_dg|_dg|j|_t|j|_dg|_dg|j|_dSr) rr _conv_numrrr _enc_conv_numrrr rrrrUs  zWanVAE_.clear_cacher) r#r$r%r rrrrrrr'rrrrrs * rrrcpuc Kst||gddggddd}|jd i|tdtd i|}Wdn1s.wYtd||jtj||dd d |S) Nrr)TTTr*)rrrrrrrwmetazloading ) map_locationT)assignr) dictupdaterrrlogginginfoload_state_dictload)pretrained_pathrrrrcfgrrrr _video_vae_s$ rc@s>eZdZdddgdgdejdfddZd d Zd d ZdS) r0rNrrcudac Csj||_||_tjgd||d}tjgd||d} |d| g|_t|||||dd||_ dS)N)0g_LͿg_Luga4g8gDioͿ'ѿg5;Nё?gI&†?g\C?gTR'g6?gX?gsA϶?gjt?gng333333?g\(\ǿg>W[̿g.1澿gBiޱgEJY?guV?g$~ʿg=U˿gc]F?gMʿg?ܵ|g[ Ac?g?߾?g g鷯?rgrh|gǺg!uqſggj+?gn4@?gfj+ҿg ʿg9vgx#?gŏ1w-!?g镲 ?g&†W?gHPsҿgY?g?߾g:MſgN@)dtyper)0g"~?g0*?gZӼ?g1?gqh?gV-?gͪ?g{/L ?g%u?g鷯?gec]?g镲 q?gx&1?g|?5^?g:M ?g$?gY8m?g_L?gE?gKY8?g7d?g3?goT?gQ?gW[?gZӼ?g9#J?gd]Fx?g{/L ?g.ryr TypeErrorampautocastrr r )r videoserrrrs  ( zWan2_2_VAE.encodec r)Nzzs should be a listrcs4g|]}j|djdddqS)rr5r)rrrVr,r;clamp_rrrrrrsz%Wan2_2_VAE.decode..r)r zsr!rrrrs  ( zWan2_2_VAE.decode)r#r$r%rr;r rrrrrrrxs  )Nrrr)"r rtorch.cuda.amprrtorch.nnr-torch.nn.functional functionalreinopsr__all__rTConv3drModuler(r:r=ror{rrrrrrrrrrrrrrrrs8   z-*6-(-tn