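import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.nn import Conv1d, ConvTranspose1d, AvgPool1d, Conv2d
from torch.nn.utils import weight_norm, remove_weight_norm, spectral_norm
from models.mossformer2_sr.utils import init_weights, get_padding
from models.mossformer2_sr.mossformer2 import MossFormer_MaskNet
from models.mossformer2_sr.snake import Snake1d
from typing import Optional, List, Union, Dict, Tuple
from models.mossformer2_sr.env import AttrDict
import numpy as np
from torchaudio.transforms import Spectrogram, Resample

# Negative slope of the leaky ReLU used by the period and scale discriminators.
LRELU_SLOPE = 0.1


class ResBlock1(torch.nn.Module):
    """HiFi-GAN type-1 residual block with Snake activations.

    Each of the three branches applies Snake -> dilated Conv1d -> Snake -> Conv1d
    and adds the result back to its input.
    """

    def __init__(self, h, channels, kernel_size=3, dilation=(1, 3, 5)):
        super(ResBlock1, self).__init__()
        self.h = h
        self.convs1 = nn.ModuleList([
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[0],
                               padding=get_padding(kernel_size, dilation[0]))),
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[1],
                               padding=get_padding(kernel_size, dilation[1]))),
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[2],
                               padding=get_padding(kernel_size, dilation[2]))),
        ])
        self.convs1.apply(init_weights)
        self.convs1_activates = nn.ModuleList([
            Snake1d(channels),
            Snake1d(channels),
            Snake1d(channels),
        ])

        self.convs2 = nn.ModuleList([
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
                               padding=get_padding(kernel_size, 1))),
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
                               padding=get_padding(kernel_size, 1))),
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=1,
                               padding=get_padding(kernel_size, 1))),
        ])
        self.convs2.apply(init_weights)
        self.convs2_activates = nn.ModuleList([
            Snake1d(channels),
            Snake1d(channels),
            Snake1d(channels),
        ])

    def forward(self, x):
        for c1, c2, act1, act2 in zip(self.convs1, self.convs2,
                                      self.convs1_activates, self.convs2_activates):
            xt = act1(x)
            xt = c1(xt)
            xt = act2(xt)
            xt = c2(xt)
            x = xt + x  # residual connection
        return x

    def remove_weight_norm(self):
        for l in self.convs1:
            remove_weight_norm(l)
        for l in self.convs2:
            remove_weight_norm(l)


class ResBlock2(torch.nn.Module):
    """HiFi-GAN type-2 residual block: two Snake -> dilated Conv1d branches."""

    def __init__(self, h, channels, kernel_size=3, dilation=(1, 3)):
        super(ResBlock2, self).__init__()
        self.h = h
        self.convs = nn.ModuleList([
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[0],
                               padding=get_padding(kernel_size, dilation[0]))),
            weight_norm(Conv1d(channels, channels, kernel_size, 1, dilation=dilation[1],
                               padding=get_padding(kernel_size, dilation[1]))),
        ])
        self.convs.apply(init_weights)
        self.convs_activates = nn.ModuleList([
            Snake1d(channels),
            Snake1d(channels),
        ])

    def forward(self, x):
        for c, act in zip(self.convs, self.convs_activates):
            xt = act(x)
            xt = c(xt)
            x = xt + x  # residual connection
        return x

    def remove_weight_norm(self):
        for l in self.convs:
            remove_weight_norm(l)


class Generator(torch.nn.Module):
    """HiFi-GAN style generator with Snake activations.

    conv_pre maps the 80 input channels to h.upsample_initial_channel. Each
    upsampling stage then applies Snake -> ConvTranspose1d and averages the outputs
    of num_kernels residual blocks. Snake -> conv_post -> tanh produce the waveform.
    """

    def __init__(self, h):
        super(Generator, self).__init__()
        self.h = h
        self.num_kernels = len(h.resblock_kernel_sizes)
        self.num_upsamples = len(h.upsample_rates)
        self.conv_pre = weight_norm(Conv1d(80, h.upsample_initial_channel, 7, 1, padding=3))
        resblock = ResBlock1 if h.resblock == '1' else ResBlock2

        self.ups = nn.ModuleList()
        self.snakes = nn.ModuleList()
        for i, (u, k) in enumerate(zip(h.upsample_rates, h.upsample_kernel_sizes)):
            self.snakes.append(Snake1d(h.upsample_initial_channel // (2 ** i)))
            self.ups.append(weight_norm(
                ConvTranspose1d(h.upsample_initial_channel // (2 ** i),
                                h.upsample_initial_channel // (2 ** (i + 1)),
                                k, u, padding=(k - u) // 2)))

        self.resblocks = nn.ModuleList()
        for i in range(len(self.ups)):
            ch = h.upsample_initial_channel // (2 ** (i + 1))
            for j, (k, d) in enumerate(zip(h.resblock_kernel_sizes, h.resblock_dilation_sizes)):
                self.resblocks.append(resblock(h, ch, k, d))

        self.snake_post = Snake1d(ch)
        self.conv_post = weight_norm(Conv1d(ch, 1, 7, 1, padding=3))
        self.ups.apply(init_weights)
        self.conv_post.apply(init_weights)

    def forward(self, x):
        x = self.conv_pre(x)
        for i in range(self.num_upsamples):
            x = self.snakes[i](x)
            x = self.ups[i](x)
            # Multi-receptive-field fusion: average the resblocks of this stage.
            xs = None
            for j in range(self.num_kernels):
                if xs is None:
                    xs = self.resblocks[i * self.num_kernels + j](x)
                else:
                    xs += self.resblocks[i * self.num_kernels + j](x)
            x = xs / self.num_kernels
        x = self.snake_post(x)
        x = self.conv_post(x)
        x = torch.tanh(x)
        return x

    def remove_weight_norm(self):
        for l in self.ups:
            remove_weight_norm(l)
        for l in self.resblocks:
            l.remove_weight_norm()
        remove_weight_norm(self.conv_pre)
        remove_weight_norm(self.conv_post)


class DiscriminatorP(torch.nn.Module):
    """Period discriminator: folds the waveform by `period` and applies 2-D convs."""

    def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=False):
        super(DiscriminatorP, self).__init__()
        self.period = period
        norm_f = spectral_norm if use_spectral_norm else weight_norm
        self.convs = nn.ModuleList([
            norm_f(Conv2d(1, 32, (kernel_size, 1), (stride, 1), padding=(get_padding(5, 1), 0))),
            norm_f(Conv2d(32, 128, (kernel_size, 1), (stride, 1), padding=(get_padding(5, 1), 0))),
            norm_f(Conv2d(128, 512, (kernel_size, 1), (stride, 1), padding=(get_padding(5, 1), 0))),
            norm_f(Conv2d(512, 1024, (kernel_size, 1), (stride, 1), padding=(get_padding(5, 1), 0))),
            norm_f(Conv2d(1024, 1024, (kernel_size, 1), 1, padding=(2, 0))),
        ])
        self.conv_post = norm_f(Conv2d(1024, 1, (3, 1), 1, padding=(1, 0)))

    def forward(self, x):
        fmap = []

        # 1d to 2d: reflect-pad to a multiple of the period, then fold.
        b, c, t = x.shape
        if t % self.period != 0:
            n_pad = self.period - (t % self.period)
            x = F.pad(x, (0, n_pad), "reflect")
            t = t + n_pad
        x = x.view(b, c, t // self.period, self.period)

        for l in self.convs:
            x = l(x)
            x = F.leaky_relu(x, LRELU_SLOPE)
            fmap.append(x)
        x = self.conv_post(x)
        fmap.append(x)
        x = torch.flatten(x, 1, -1)

        return x, fmap


class MultiPeriodDiscriminator(torch.nn.Module):
    def __init__(self):
        super(MultiPeriodDiscriminator, self).__init__()
        self.discriminators = nn.ModuleList([
            DiscriminatorP(2),
            DiscriminatorP(3),
            DiscriminatorP(5),
            DiscriminatorP(7),
            DiscriminatorP(11),
        ])

    def forward(self, y, y_hat):
        y_d_rs = []
        y_d_gs = []
        fmap_rs = []
        fmap_gs = []
        for i, d in enumerate(self.discriminators):
            y_d_r, fmap_r = d(y)
            y_d_g, fmap_g = d(y_hat)
            y_d_rs.append(y_d_r)
            fmap_rs.append(fmap_r)
            y_d_gs.append(y_d_g)
            fmap_gs.append(fmap_g)

        return y_d_rs, y_d_gs, fmap_rs, fmap_gs


class DiscriminatorS(torch.nn.Module):
    """Scale discriminator: grouped 1-D convolutions over the raw waveform."""

    def __init__(self, use_spectral_norm=False):
        super(DiscriminatorS, self).__init__()
        norm_f = spectral_norm if use_spectral_norm else weight_norm
        self.convs = nn.ModuleList([
            norm_f(Conv1d(1, 128, 15, 1, padding=7)),
            norm_f(Conv1d(128, 128, 41, 2, groups=4, padding=20)),
            norm_f(Conv1d(128, 256, 41, 2, groups=16, padding=20)),
            norm_f(Conv1d(256, 512, 41, 4, groups=16, padding=20)),
            norm_f(Conv1d(512, 1024, 41, 4, groups=16, padding=20)),
            norm_f(Conv1d(1024, 1024, 41, 1, groups=16, padding=20)),
            norm_f(Conv1d(1024, 1024, 5, 1, padding=2)),
        ])
        self.conv_post = norm_f(Conv1d(1024, 1, 3, 1, padding=1))

    def forward(self, x):
        fmap = []
        for l in self.convs:
            x = l(x)
            x = F.leaky_relu(x, LRELU_SLOPE)
            fmap.append(x)
        x = self.conv_post(x)
        fmap.append(x)
        x = torch.flatten(x, 1, -1)

        return x, fmap


class MultiScaleDiscriminator(torch.nn.Module):
    def __init__(self):
        super(MultiScaleDiscriminator, self).__init__()
        self.discriminators = nn.ModuleList([
            DiscriminatorS(use_spectral_norm=True),
            DiscriminatorS(),
            DiscriminatorS(),
        ])
        self.meanpools = nn.ModuleList([
            AvgPool1d(4, 2, padding=2),
            AvgPool1d(4, 2, padding=2),
        ])

    def forward(self, y, y_hat):
        y_d_rs = []
        y_d_gs = []
        fmap_rs = []
        fmap_gs = []
        for i, d in enumerate(self.discriminators):
            if i != 0:
                # Each later discriminator sees a 2x average-pooled signal.
                y = self.meanpools[i - 1](y)
                y_hat = self.meanpools[i - 1](y_hat)
            y_d_r, fmap_r = d(y)
            y_d_g, fmap_g = d(y_hat)
            y_d_rs.append(y_d_r)
            fmap_rs.append(fmap_r)
            y_d_gs.append(y_d_g)
            fmap_gs.append(fmap_g)

        return y_d_rs, y_d_gs, fmap_rs, fmap_gs


class DiscriminatorB(nn.Module):
    """Band-split STFT discriminator for a single window length."""

    def __init__(
        self,
        window_length: int,
        channels: int = 32,
        hop_factor: float = 0.25,
        bands: Tuple[Tuple[float, float], ...] = (
            (0.0, 0.1), (0.1, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0),
        ),
    ):
        super().__init__()
        self.window_length = window_length
        self.hop_factor = hop_factor
        self.spec_fn = Spectrogram(
            n_fft=window_length,
            hop_length=int(window_length * hop_factor),
            win_length=window_length,
            power=None,  # keep the complex STFT
        )
        n_fft = window_length // 2 + 1
        bands = [(int(b[0] * n_fft), int(b[1] * n_fft)) for b in bands]
        self.bands = bands
        convs = lambda: nn.ModuleList([
            weight_norm(nn.Conv2d(2, channels, (3, 9), (1, 1), padding=(1, 4))),
            weight_norm(nn.Conv2d(channels, channels, (3, 9), (1, 2), padding=(1, 4))),
            weight_norm(nn.Conv2d(channels, channels, (3, 9), (1, 2), padding=(1, 4))),
            weight_norm(nn.Conv2d(channels, channels, (3, 9), (1, 2), padding=(1, 4))),
            weight_norm(nn.Conv2d(channels, channels, (3, 3), (1, 1), padding=(1, 1))),
        ])
        self.band_convs = nn.ModuleList([convs() for _ in range(len(self.bands))])
        self.conv_post = weight_norm(nn.Conv2d(channels, 1, (3, 3), (1, 1), padding=(1, 1)))

    def spectrogram(self, x: torch.Tensor) -> List[torch.Tensor]:
        # Remove DC offset
        x = x - x.mean(dim=-1, keepdims=True)
        # Peak-normalize the volume of the input audio
        x = 0.8 * x / (x.abs().max(dim=-1, keepdim=True)[0] + 1e-9)
        x = self.spec_fn(x)
        x = torch.view_as_real(x)
        x = x.permute(0, 3, 2, 1)  # (batch, freq, time, 2) -> (batch, 2, time, freq)
        # Split the frequency axis into bands
        x_bands = [x[..., b[0]:b[1]] for b in self.bands]
        return x_bands

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, List[torch.Tensor]]:
        x_bands = self.spectrogram(x.squeeze(1))
        fmap = []
        x = []
        for band, stack in zip(x_bands, self.band_convs):
            for i, layer in enumerate(stack):
                band = layer(band)
                band = torch.nn.functional.leaky_relu(band, 0.1)
                if i > 0:
                    fmap.append(band)
            x.append(band)
        x = torch.cat(x, dim=-1)
        x = self.conv_post(x)
        fmap.append(x)

        return x, fmap


class MultiBandDiscriminator(nn.Module):
    def __init__(self, h):
        """
        Multi-band multi-scale STFT discriminator, with the architecture based on https://github.com/descriptinc/descript-audio-codec,
        and the modified code adapted from https://github.com/gemelo-ai/vocos.
        """
        super().__init__()
        # fft_sizes (list[int]): window lengths for the STFTs; defaults to [2048, 1024, 512] if not set in h.
        self.fft_sizes = h.get("mbd_fft_sizes", [2048, 1024, 512])
        self.discriminators = nn.ModuleList(
            [DiscriminatorB(window_length=w) for w in self.fft_sizes]
        )

    def forward(
        self, y: torch.Tensor, y_hat: torch.Tensor
    ) -> Tuple[
        List[torch.Tensor],
        List[torch.Tensor],
        List[List[torch.Tensor]],
        List[List[torch.Tensor]],
    ]:
        y_d_rs = []
        y_d_gs = []
        fmap_rs = []
        fmap_gs = []
        for d in self.discriminators:
            y_d_r, fmap_r = d(x=y)
            y_d_g, fmap_g = d(x=y_hat)
            y_d_rs.append(y_d_r)
            fmap_rs.append(fmap_r)
            y_d_gs.append(y_d_g)
            fmap_gs.append(fmap_g)

        return y_d_rs, y_d_gs, fmap_rs, fmap_gs


def feature_loss(fmap_r, fmap_g):
    """L1 feature-matching loss over all discriminator feature maps, scaled by 2."""
    loss = 0
    for dr, dg in zip(fmap_r, fmap_g):
        for rl, gl in zip(dr, dg):
            loss += torch.mean(torch.abs(rl - gl))

    return loss * 2


def discriminator_loss(disc_real_outputs, disc_generated_outputs):
    """Least-squares GAN loss: real outputs toward 1, generated outputs toward 0."""
    loss = 0
    r_losses = []
    g_losses = []
    for dr, dg in zip(disc_real_outputs, disc_generated_outputs):
        r_loss = torch.mean((1 - dr) ** 2)
        g_loss = torch.mean(dg ** 2)
        loss += (r_loss + g_loss)
        r_losses.append(r_loss.item())
        g_losses.append(g_loss.item())

    return loss, r_losses, g_losses


def generator_loss(disc_outputs):
    """Least-squares GAN loss for the generator: generated outputs toward 1."""
    loss = 0
    gen_losses = []
    for dg in disc_outputs:
        l = torch.mean((1 - dg) ** 2)
        gen_losses.append(l)
        loss += l

    return loss, gen_losses


class Mossformer(nn.Module):
    """Wrapper around MossFormer_MaskNet (80 input channels, 512 internal, 80 output)."""

    def __init__(self):
        super(Mossformer, self).__init__()
        self.mossformer = MossFormer_MaskNet(in_channels=80, out_channels=512, out_channels_final=80)

    def forward(self, input):
        out = self.mossformer(input)
        return out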
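
# Sketch: why ResBlock1 and ResBlock2 can add their residual without cropping.
# ASSUMPTION: get_padding() in models.mossformer2_sr.utils is not shown in this file;
# the sketch assumes the HiFi-GAN formula (kernel_size * dilation - dilation) // 2.
# _sketch_same_padding and _sketch_conv1d_out_len are hypothetical, torch-free helpers.
# The length formula is the one documented for torch.nn.Conv1d.
def _sketch_same_padding(kernel_size: int, dilation: int = 1) -> int:
    # Assumed get_padding(): half of the dilated kernel extent, rounded down.
    return (kernel_size * dilation - dilation) // 2


def _sketch_conv1d_out_len(length: int, kernel_size: int, stride: int = 1,
                           padding: int = 0, dilation: int = 1) -> int:
    # Output length of a 1-D convolution (torch.nn.Conv1d shape formula).
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# The time length is preserved whenever dilation * (kernel_size - 1) is even, which
# always holds for odd kernel sizes. For example, kernel 3 with dilation 1, 3 or 5
# keeps 100 samples at 100, so "x = xt + x" adds equal shapes. With kernel_size=4
# and dilation=1 the same formula gives 99.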
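
# Sketch: how Generator's upsampling stack sets the output length.
# Each stage is ConvTranspose1d(kernel_size=k, stride=u, padding=(k - u) // 2), and
# conv_pre, conv_post (kernel 7, padding 3) and the resblocks keep the length. When
# every (k - u) is even, each stage therefore multiplies the length by exactly u, and
# the total factor is the product of upsample_rates. The helpers below are
# hypothetical and torch-free, and use the length formula documented for
# torch.nn.ConvTranspose1d. The rates (8, 8, 2, 2) and kernels (16, 16, 4, 4) in the
# example are an assumed HiFi-GAN-style config, not values read from this model.
def _sketch_conv_transpose1d_out_len(length, kernel_size, stride, padding=0,
                                     dilation=1, output_padding=0):
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def _sketch_generator_out_len(n_frames, upsample_rates, upsample_kernel_sizes):
    length = n_frames
    for u, k in zip(upsample_rates, upsample_kernel_sizes):
        length = _sketch_conv_transpose1d_out_len(length, k, u, padding=(k - u) // 2)
    return length


# Usage: _sketch_generator_out_len(10, (8, 8, 2, 2), (16, 16, 4, 4)) returns 2560,
# that is 10 frames * (8 * 8 * 2 * 2) samples per frame.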
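
# Sketch: how DiscriminatorP folds a waveform by its period.
# forward() reflect-pads the time axis up to a multiple of `period` and then views
# (b, c, t) as (b, c, t // period, period). The (k, 1) Conv2d kernels therefore mix
# samples that are exactly `period` steps apart. This hypothetical, torch-free helper
# reproduces only the shape arithmetic, not the reflected sample values.
def _sketch_period_fold_shape(b, c, t, period):
    if t % period != 0:
        n_pad = period - (t % period)  # same amount as DiscriminatorP.forward pads
        t = t + n_pad
    return (b, c, t // period, period)


# For the periods used by MultiPeriodDiscriminator and t = 8000 samples the folded
# shapes are (1, 1, 4000, 2), (1, 1, 2667, 3), (1, 1, 1600, 5), (1, 1, 1143, 7)
# and (1, 1, 728, 11).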
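
# Sketch: which STFT bins each DiscriminatorB band covers.
# With power=None, torchaudio's Spectrogram returns the complex one-sided STFT with
# window_length // 2 + 1 frequency bins, and the hop is int(window_length * hop_factor).
# Each fractional band (lo, hi) becomes the slice [int(lo * n_bins), int(hi * n_bins))
# of the last (frequency) axis after the permute in spectrogram().
# _sketch_band_bins is a hypothetical, torch-free helper mirroring __init__ defaults.
def _sketch_band_bins(window_length, hop_factor=0.25,
                      bands=((0.0, 0.1), (0.1, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0))):
    n_bins = window_length // 2 + 1
    hop_length = int(window_length * hop_factor)
    return hop_length, [(int(lo * n_bins), int(hi * n_bins)) for lo, hi in bands]


# Usage: for the default MultiBandDiscriminator sizes 2048, 1024 and 512 the hops are
# 512, 256 and 128. The bands stay contiguous and together cover every bin, e.g.
# window 2048 gives [(0, 102), (102, 256), (256, 512), (512, 768), (768, 1025)].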
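
# Sketch: the least-squares GAN objectives of discriminator_loss and generator_loss,
# plus the scaled L1 feature-matching term of feature_loss, written on plain Python
# floats so the arithmetic can be checked by hand. These are hypothetical, torch-free
# stand-ins for a single discriminator. Each list plays the role of one output or
# feature tensor, and torch.mean becomes an average over the list. The real functions
# additionally sum these terms over every discriminator in the ensemble.
def _sketch_mean(values):
    return sum(values) / len(values)


def _sketch_discriminator_loss(real_scores, fake_scores):
    # Real outputs are pushed toward 1, generated outputs toward 0.
    r_loss = _sketch_mean([(1 - d) ** 2 for d in real_scores])
    g_loss = _sketch_mean([d ** 2 for d in fake_scores])
    return r_loss + g_loss


def _sketch_generator_loss(fake_scores):
    # The generator is rewarded when the discriminator outputs 1 on generated audio.
    return _sketch_mean([(1 - d) ** 2 for d in fake_scores])


def _sketch_feature_loss(real_fmaps, fake_fmaps):
    # Mean absolute difference per feature map, summed over maps, then scaled by 2.
    loss = 0.0
    for rl, gl in zip(real_fmaps, fake_fmaps):
        loss += _sketch_mean([abs(r - g) for r, g in zip(rl, gl)])
    return loss * 2


# Usage: _sketch_discriminator_loss([1.0, 0.5], [0.0, 0.5]) is 0.125 + 0.125 = 0.25,
# and _sketch_feature_loss([[1.0, 2.0]], [[1.5, 2.5]]) is 2 * 0.5 = 1.0.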