from functools import partial

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class Upsample1D(nn.Layer):
    """
    An upsampling layer with an optional convolution.

    Parameters:
        channels: channels in the inputs and outputs.
        use_conv: a bool determining if a convolution is applied.
        use_conv_transpose:
        out_channels:
    """

    def __init__(self, channels, use_conv=False, use_conv_transpose=False, out_channels=None, name="conv"):
        super().__init__()
        self.channels = channels
        self.out_channels = out_channels or channels
        self.use_conv = use_conv
        self.use_conv_transpose = use_conv_transpose
        self.name = name

        self.conv = None
        if use_conv_transpose:
            self.conv = nn.Conv1DTranspose(channels, self.out_channels, 4, 2, 1)
        elif use_conv:
            self.conv = nn.Conv1D(self.channels, self.out_channels, 3, padding=1)

    def forward(self, x):
        assert x.shape[1] == self.channels
        if self.use_conv_transpose:
            return self.conv(x)

        x = F.interpolate(x, scale_factor=2.0, mode="nearest")

        if self.use_conv:
            x = self.conv(x)

        return x


class Downsample1D(nn.Layer):
    """
    A downsampling layer with an optional convolution.

    Parameters:
        channels: channels in the inputs and outputs.
        use_conv: a bool determining if a convolution is applied.
        out_channels:
        padding:
    """

    def __init__(self, channels, use_conv=False, out_channels=None, padding=1, name="conv"):
        super().__init__()
        self.channels = channels
        self.out_channels = out_channels or channels
        self.use_conv = use_conv
        self.padding = padding
        stride = 2
        self.name = name

        if use_conv:
            self.conv = nn.Conv1D(self.channels, self.out_channels, 3, stride=stride, padding=padding)
        else:
            assert self.channels == self.out_channels
            self.conv = nn.AvgPool1D(kernel_size=stride, stride=stride)

    def forward(self, x):
        assert x.shape[1] == self.channels
        return self.conv(x)


class Upsample2D(nn.Layer):
    """
    An upsampling layer with an optional convolution.

    Parameters:
        channels: channels in the inputs and outputs.
        use_conv: a bool determining if a convolution is applied.
        use_conv_transpose:
        out_channels:
    """

    def __init__(self, channels, use_conv=False, use_conv_transpose=False, out_channels=None, name="conv"):
        super().__init__()
        self.channels = channels
        self.out_channels = out_channels or channels
        self.use_conv = use_conv
        self.use_conv_transpose = use_conv_transpose
        self.name = name

        conv = None
        if use_conv_transpose:
            conv = nn.Conv2DTranspose(channels, self.out_channels, 4, 2, 1)
        elif use_conv:
            conv = nn.Conv2D(self.channels, self.out_channels, 3, padding=1)

        if name == "conv":
            self.conv = conv
        else:
            self.Conv2d_0 = conv

    def forward(self, hidden_states, output_size=None):
        assert hidden_states.shape[1] == self.channels

        if self.use_conv_transpose:
            return self.conv(hidden_states)

        # nearest-neighbor interpolation is done in float32 when the input is bfloat16
        dtype = hidden_states.dtype
        if dtype == paddle.bfloat16:
            hidden_states = hidden_states.cast("float32")

        if output_size is None:
            hidden_states = F.interpolate(hidden_states, scale_factor=2.0, mode="nearest")
        else:
            hidden_states = F.interpolate(hidden_states, size=output_size, mode="nearest")

        if dtype == paddle.bfloat16:
            hidden_states = hidden_states.cast(dtype)

        if self.use_conv:
            if self.name == "conv":
                hidden_states = self.conv(hidden_states)
            else:
                hidden_states = self.Conv2d_0(hidden_states)

        return hidden_states


class Downsample2D(nn.Layer):
    """
    A downsampling layer with an optional convolution.

    Parameters:
        channels: channels in the inputs and outputs.
        use_conv: a bool determining if a convolution is applied.
        out_channels:
        padding:
    """

    def __init__(self, channels, use_conv=False, out_channels=None, padding=1, name="conv"):
        super().__init__()
        self.channels = channels
        self.out_channels = out_channels or channels
        self.use_conv = use_conv
        self.padding = padding
        stride = 2
        self.name = name

        if use_conv:
            conv = nn.Conv2D(self.channels, self.out_channels, 3, stride=stride, padding=padding)
        else:
            assert self.channels == self.out_channels
            conv = nn.AvgPool2D(kernel_size=stride, stride=stride)

        if name == "conv":
            self.Conv2d_0 = conv
            self.conv = conv
        elif name == "Conv2d_0":
            self.conv = conv
        else:
            self.conv = conv

    def forward(self, hidden_states):
        assert hidden_states.shape[1] == self.channels
        if self.use_conv and self.padding == 0:
            pad = (0, 1, 0, 1)
            hidden_states = F.pad(hidden_states, pad, mode="constant", value=0)

        assert hidden_states.shape[1] == self.channels
        hidden_states = self.conv(hidden_states)

        return hidden_states

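# Usage sketch (illustrative addition, not part of the original module): the plain
# resamplers above double or halve spatial resolution, with an optional 3x3 conv.
# The layer widths and shapes below are arbitrary example values.
def _example_resample_2d():
    x = paddle.randn([1, 32, 16, 16])       # NCHW feature map
    up = Upsample2D(32, use_conv=True)      # nearest-neighbor x2 followed by a 3x3 conv
    down = Downsample2D(32, use_conv=True)  # strided 3x3 conv (stride 2)
    y = up(x)                               # -> [1, 32, 32, 32]
    z = down(y)                             # -> [1, 32, 16, 16]
    return y.shape, z.shape
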
class FirUpsample2D(nn.Layer):
    def __init__(self, channels=None, out_channels=None, use_conv=False, fir_kernel=(1, 3, 3, 1)):
        super().__init__()
        out_channels = out_channels if out_channels else channels
        if use_conv:
            self.Conv2d_0 = nn.Conv2D(channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.use_conv = use_conv
        self.fir_kernel = fir_kernel
        self.out_channels = out_channels

    def _upsample_2d(self, hidden_states, weight=None, kernel=None, factor=2, gain=1):
        """Fused `upsample_2d()` followed by `Conv2d()`.

        Padding is performed only once at the beginning, not between the operations. The fused op is considerably more
        efficient than performing the same calculation using standard TensorFlow ops. It supports gradients of
        arbitrary order.

        Args:
            hidden_states: Input tensor of the shape `[N, C, H, W]` or `[N, H, W, C]`.
            weight: Weight tensor of the shape `[filterH, filterW, inChannels, outChannels]`. Grouped convolution can
                be performed by `inChannels = x.shape[0] // numGroups`.
            kernel: FIR filter of the shape `[firH, firW]` or `[firN]` (separable). The default is `[1] * factor`,
                which corresponds to nearest-neighbor upsampling.
            factor: Integer upsampling factor (default: 2).
            gain: Scaling factor for signal magnitude (default: 1.0).

        Returns:
            output: Tensor of the shape `[N, C, H * factor, W * factor]` or `[N, H * factor, W * factor, C]`, and same
            datatype as `hidden_states`.
        """
        assert isinstance(factor, int) and factor >= 1

        # Setup filter kernel.
        if kernel is None:
            kernel = [1] * factor

        kernel = paddle.to_tensor(kernel, dtype="float32")
        if kernel.ndim == 1:
            kernel = paddle.outer(kernel, kernel)
        kernel /= paddle.sum(kernel)

        kernel = kernel * (gain * (factor**2))

        if self.use_conv:
            convH = weight.shape[2]
            convW = weight.shape[3]
            inC = weight.shape[1]

            pad_value = (kernel.shape[0] - factor) - (convW - 1)

            stride = (factor, factor)
            # Determine data dimensions.
            output_shape = (
                (hidden_states.shape[2] - 1) * factor + convH,
                (hidden_states.shape[3] - 1) * factor + convW,
            )
            output_padding = (
                output_shape[0] - (hidden_states.shape[2] - 1) * stride[0] - convH,
                output_shape[1] - (hidden_states.shape[3] - 1) * stride[1] - convW,
            )
            assert output_padding[0] >= 0 and output_padding[1] >= 0
            num_groups = hidden_states.shape[1] // inC

            # Transpose weights.
            weight = paddle.reshape(weight, (num_groups, -1, inC, convH, convW))
            weight = paddle.flip(weight, axis=[3, 4]).transpose([0, 2, 1, 3, 4])
            weight = paddle.reshape(weight, (num_groups * inC, -1, convH, convW))

            inverse_conv = F.conv2d_transpose(
                hidden_states, weight, stride=stride, output_padding=output_padding, padding=0
            )

            output = upfirdn2d_native(
                inverse_conv,
                paddle.to_tensor(kernel),
                pad=((pad_value + 1) // 2 + factor - 1, pad_value // 2 + 1),
            )
        else:
            pad_value = kernel.shape[0] - factor
            output = upfirdn2d_native(
                hidden_states,
                paddle.to_tensor(kernel),
                up=factor,
                pad=((pad_value + 1) // 2 + factor - 1, pad_value // 2),
            )

        return output

    def forward(self, hidden_states):
        if self.use_conv:
            height = self._upsample_2d(hidden_states, self.Conv2d_0.weight, kernel=self.fir_kernel)
            height = height + self.Conv2d_0.bias.reshape([1, -1, 1, 1])
        else:
            height = self._upsample_2d(hidden_states, kernel=self.fir_kernel, factor=2)

        return height


class FirDownsample2D(nn.Layer):
    def __init__(self, channels=None, out_channels=None, use_conv=False, fir_kernel=(1, 3, 3, 1)):
        super().__init__()
        out_channels = out_channels if out_channels else channels
        if use_conv:
            self.Conv2d_0 = nn.Conv2D(channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.fir_kernel = fir_kernel
        self.use_conv = use_conv
        self.out_channels = out_channels

    def _downsample_2d(self, hidden_states, weight=None, kernel=None, factor=2, gain=1):
        """Fused `Conv2d()` followed by `downsample_2d()`.

        Padding is performed only once at the beginning, not between the operations. The fused op is considerably more
        efficient than performing the same calculation using standard TensorFlow ops. It supports gradients of
        arbitrary order.

        Args:
            hidden_states: Input tensor of the shape `[N, C, H, W]` or `[N, H, W, C]`.
            weight: Weight tensor of the shape `[filterH, filterW, inChannels, outChannels]`. Grouped convolution can
                be performed by `inChannels = x.shape[0] // numGroups`.
            kernel: FIR filter of the shape `[firH, firW]` or `[firN]` (separable). The default is `[1] * factor`,
                which corresponds to average pooling.
            factor: Integer downsampling factor (default: 2).
            gain: Scaling factor for signal magnitude (default: 1.0).

        Returns:
            output: Tensor of the shape `[N, C, H // factor, W // factor]` or `[N, H // factor, W // factor, C]`, and
            same datatype as `x`.
        """
        assert isinstance(factor, int) and factor >= 1
        if kernel is None:
            kernel = [1] * factor

        # setup kernel
        kernel = paddle.to_tensor(kernel, dtype="float32")
        if kernel.ndim == 1:
            kernel = paddle.outer(kernel, kernel)
        kernel /= paddle.sum(kernel)

        kernel = kernel * gain

        if self.use_conv:
            _, _, convH, convW = weight.shape
            pad_value = (kernel.shape[0] - factor) + (convW - 1)
            stride_value = [factor, factor]
            upfirdn_input = upfirdn2d_native(
                hidden_states,
                paddle.to_tensor(kernel),
                pad=((pad_value + 1) // 2, pad_value // 2),
            )
            output = F.conv2d(upfirdn_input, weight, stride=stride_value, padding=0)
        else:
            pad_value = kernel.shape[0] - factor
            output = upfirdn2d_native(
                hidden_states,
                paddle.to_tensor(kernel),
                down=factor,
                pad=((pad_value + 1) // 2, pad_value // 2),
            )

        return output

    def forward(self, hidden_states):
        if self.use_conv:
            downsample_input = self._downsample_2d(hidden_states, weight=self.Conv2d_0.weight, kernel=self.fir_kernel)
            hidden_states = downsample_input + self.Conv2d_0.bias.reshape([1, -1, 1, 1])
        else:
            hidden_states = self._downsample_2d(hidden_states, kernel=self.fir_kernel, factor=2)

        return hidden_states

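# Usage sketch (illustrative addition, not part of the original module): without
# `use_conv`, the FIR resamplers reduce to pure upfirdn filtering with the default
# (1, 3, 3, 1) separable kernel at factor 2. Shapes are arbitrary example values.
def _example_fir_resample():
    x = paddle.randn([1, 8, 16, 16])
    y = FirUpsample2D(channels=8, out_channels=8, use_conv=False)(x)    # -> [1, 8, 32, 32]
    z = FirDownsample2D(channels=8, out_channels=8, use_conv=False)(y)  # -> [1, 8, 16, 16]
    return y.shape, z.shape
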
class ResnetBlock2D(nn.Layer):
    def __init__(
        self,
        *,
        in_channels,
        out_channels=None,
        conv_shortcut=False,
        dropout=0.0,
        temb_channels=512,
        groups=32,
        groups_out=None,
        pre_norm=True,
        eps=1e-6,
        non_linearity="swish",
        time_embedding_norm="default",
        kernel=None,
        output_scale_factor=1.0,
        use_in_shortcut=None,
        up=False,
        down=False,
    ):
        super().__init__()
        self.pre_norm = pre_norm
        self.pre_norm = True
        self.in_channels = in_channels
        out_channels = in_channels if out_channels is None else out_channels
        self.out_channels = out_channels
        self.use_conv_shortcut = conv_shortcut
        self.time_embedding_norm = time_embedding_norm
        self.up = up
        self.down = down
        self.output_scale_factor = output_scale_factor

        if groups_out is None:
            groups_out = groups

        self.norm1 = nn.GroupNorm(num_groups=groups, num_channels=in_channels, epsilon=eps)
        self.conv1 = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1)

        if temb_channels is not None:
            if self.time_embedding_norm == "default":
                time_emb_proj_out_channels = out_channels
            elif self.time_embedding_norm == "scale_shift":
                time_emb_proj_out_channels = out_channels * 2
            else:
                raise ValueError(f"unknown time_embedding_norm : {self.time_embedding_norm} ")
            self.time_emb_proj = nn.Linear(temb_channels, time_emb_proj_out_channels)
        else:
            self.time_emb_proj = None

        self.norm2 = nn.GroupNorm(num_groups=groups_out, num_channels=out_channels, epsilon=eps)
        self.dropout = nn.Dropout(dropout)
        self.conv2 = nn.Conv2D(out_channels, out_channels, kernel_size=3, stride=1, padding=1)

        if non_linearity == "swish":
            self.nonlinearity = lambda x: F.silu(x)
        elif non_linearity == "mish":
            self.nonlinearity = Mish()
        elif non_linearity == "silu":
            self.nonlinearity = nn.Silu()

        self.upsample = self.downsample = None
        if self.up:
            if kernel == "fir":
                fir_kernel = (1, 3, 3, 1)
                self.upsample = lambda x: upsample_2d(x, kernel=fir_kernel)
            elif kernel == "sde_vp":
                self.upsample = partial(F.interpolate, scale_factor=2.0, mode="nearest")
            else:
                self.upsample = Upsample2D(in_channels, use_conv=False)
        elif self.down:
            if kernel == "fir":
                fir_kernel = (1, 3, 3, 1)
                self.downsample = lambda x: downsample_2d(x, kernel=fir_kernel)
            elif kernel == "sde_vp":
                self.downsample = partial(F.avg_pool2d, kernel_size=2, stride=2)
            else:
                self.downsample = Downsample2D(in_channels, use_conv=False, padding=1, name="op")

        self.use_in_shortcut = self.in_channels != self.out_channels if use_in_shortcut is None else use_in_shortcut

        self.conv_shortcut = None
        if self.use_in_shortcut:
            self.conv_shortcut = nn.Conv2D(in_channels, out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, input_tensor, temb):
        hidden_states = input_tensor

        hidden_states = self.norm1(hidden_states)
        hidden_states = self.nonlinearity(hidden_states)

        if self.upsample is not None:
            input_tensor = self.upsample(input_tensor)
            hidden_states = self.upsample(hidden_states)
        elif self.downsample is not None:
            input_tensor = self.downsample(input_tensor)
            hidden_states = self.downsample(hidden_states)

        hidden_states = self.conv1(hidden_states)

        if temb is not None:
            temb = self.time_emb_proj(self.nonlinearity(temb))[:, :, None, None]

        if temb is not None and self.time_embedding_norm == "default":
            hidden_states = hidden_states + temb

        hidden_states = self.norm2(hidden_states)

        if temb is not None and self.time_embedding_norm == "scale_shift":
            scale, shift = paddle.chunk(temb, 2, axis=1)
            hidden_states = hidden_states * (1 + scale) + shift

        hidden_states = self.nonlinearity(hidden_states)

        hidden_states = self.dropout(hidden_states)
        hidden_states = self.conv2(hidden_states)

        if self.conv_shortcut is not None:
            input_tensor = self.conv_shortcut(input_tensor)

        output_tensor = (input_tensor + hidden_states) / self.output_scale_factor

        return output_tensor


class Mish(nn.Layer):
    def forward(self, hidden_states):
        return hidden_states * paddle.tanh(F.softplus(hidden_states))


def rearrange_dims(tensor):
    if len(tensor.shape) == 2:
        return tensor[:, :, None]
    if len(tensor.shape) == 3:
        return tensor[:, :, None, :]
    elif len(tensor.shape) == 4:
        return tensor[:, :, 0, :]
    else:
        raise ValueError(f"`len(tensor)`: {len(tensor)} has to be 2, 3 or 4.")


class Conv1dBlock(nn.Layer):
    """
    Conv1d --> GroupNorm --> Mish
    """

    def __init__(self, inp_channels, out_channels, kernel_size, n_groups=8):
        super().__init__()
        self.conv1d = nn.Conv1D(inp_channels, out_channels, kernel_size, padding=kernel_size // 2)
        self.group_norm = nn.GroupNorm(n_groups, out_channels)
        self.mish = nn.Mish()

    def forward(self, x):
        x = self.conv1d(x)
        x = rearrange_dims(x)
        x = self.group_norm(x)
        x = rearrange_dims(x)
        x = self.mish(x)
        return x


class ResidualTemporalBlock1D(nn.Layer):
    def __init__(self, inp_channels, out_channels, embed_dim, kernel_size=5):
        super().__init__()
        self.conv_in = Conv1dBlock(inp_channels, out_channels, kernel_size)
        self.conv_out = Conv1dBlock(out_channels, out_channels, kernel_size)

        self.time_emb_act = nn.Mish()
        self.time_emb = nn.Linear(embed_dim, out_channels)

        self.residual_conv = (
            nn.Conv1D(inp_channels, out_channels, 1) if inp_channels != out_channels else nn.Identity()
        )

    def forward(self, x, t):
        """
        Args:
            x : [ batch_size x inp_channels x horizon ]
            t : [ batch_size x embed_dim ]

        returns:
            out : [ batch_size x out_channels x horizon ]
        """
        t = self.time_emb_act(t)
        t = self.time_emb(t)
        out = self.conv_in(x) + rearrange_dims(t)
        out = self.conv_out(out)
        return out + self.residual_conv(x)

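# Usage sketch (illustrative addition, not part of the original module): a single
# ResnetBlock2D step as it would be driven inside a UNet, with a dummy timestep
# embedding. The channel sizes and 128-dim embedding are arbitrary example values.
def _example_resnet_block():
    block = ResnetBlock2D(in_channels=32, out_channels=64, temb_channels=128)
    sample = paddle.randn([2, 32, 16, 16])  # NCHW feature map
    temb = paddle.randn([2, 128])           # per-sample time embedding
    return block(sample, temb).shape        # -> [2, 64, 16, 16]
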
def upsample_2d(hidden_states, kernel=None, factor=2, gain=1):
    r"""Upsample2D a batch of 2D images with the given filter.

    Accepts a batch of 2D images of the shape `[N, C, H, W]` or `[N, H, W, C]` and upsamples each image with the given
    filter. The filter is normalized so that if the input pixels are constant, they will be scaled by the specified
    `gain`. Pixels outside the image are assumed to be zero, and the filter is padded with zeros so that its shape is
    a multiple of the upsampling factor.

    Args:
        hidden_states: Input tensor of the shape `[N, C, H, W]` or `[N, H, W, C]`.
        kernel: FIR filter of the shape `[firH, firW]` or `[firN]` (separable). The default is `[1] * factor`, which
            corresponds to nearest-neighbor upsampling.
        factor: Integer upsampling factor (default: 2).
        gain: Scaling factor for signal magnitude (default: 1.0).

    Returns:
        output: Tensor of the shape `[N, C, H * factor, W * factor]`
    """
    assert isinstance(factor, int) and factor >= 1
    if kernel is None:
        kernel = [1] * factor

    kernel = paddle.to_tensor(kernel, dtype="float32")
    if kernel.ndim == 1:
        kernel = paddle.outer(kernel, kernel)
    kernel /= paddle.sum(kernel)

    kernel = kernel * (gain * (factor**2))
    pad_value = kernel.shape[0] - factor
    output = upfirdn2d_native(
        hidden_states,
        kernel,
        up=factor,
        pad=((pad_value + 1) // 2 + factor - 1, pad_value // 2),
    )
    return output


def downsample_2d(hidden_states, kernel=None, factor=2, gain=1):
    r"""Downsample2D a batch of 2D images with the given filter.

    Accepts a batch of 2D images of the shape `[N, C, H, W]` or `[N, H, W, C]` and downsamples each image with the
    given filter. The filter is normalized so that if the input pixels are constant, they will be scaled by the
    specified `gain`. Pixels outside the image are assumed to be zero, and the filter is padded with zeros so that its
    shape is a multiple of the downsampling factor.

    Args:
        hidden_states: Input tensor of the shape `[N, C, H, W]` or `[N, H, W, C]`.
        kernel: FIR filter of the shape `[firH, firW]` or `[firN]` (separable). The default is `[1] * factor`, which
            corresponds to average pooling.
        factor: Integer downsampling factor (default: 2).
        gain: Scaling factor for signal magnitude (default: 1.0).

    Returns:
        output: Tensor of the shape `[N, C, H // factor, W // factor]`
    """
    assert isinstance(factor, int) and factor >= 1
    if kernel is None:
        kernel = [1] * factor

    kernel = paddle.to_tensor(kernel, dtype="float32")
    if kernel.ndim == 1:
        kernel = paddle.outer(kernel, kernel)
    kernel /= paddle.sum(kernel)

    kernel = kernel * gain
    pad_value = kernel.shape[0] - factor
    output = upfirdn2d_native(hidden_states, kernel, down=factor, pad=((pad_value + 1) // 2, pad_value // 2))
    return output


def dummy_pad(tensor, up_x=0, up_y=0):
    # zero-pad axes 4 (width interleave) and 2 (height interleave) of the 6-D
    # tensor used by `upfirdn2d_native`, via concatenation
    if up_x > 0:
        tensor = paddle.concat(
            [
                tensor,
                paddle.zeros(
                    [tensor.shape[0], tensor.shape[1], tensor.shape[2], tensor.shape[3], up_x, tensor.shape[5]],
                    dtype=tensor.dtype,
                ),
            ],
            axis=4,
        )
    if up_y > 0:
        tensor = paddle.concat(
            [
                tensor,
                paddle.zeros(
                    [tensor.shape[0], tensor.shape[1], up_y, tensor.shape[3], tensor.shape[4], tensor.shape[5]],
                    dtype=tensor.dtype,
                ),
            ],
            axis=2,
        )
    return tensor


def upfirdn2d_native(tensor, kernel, up=1, down=1, pad=(0, 0)):
    up_x = up_y = up
    down_x = down_y = down
    pad_x0 = pad_y0 = pad[0]
    pad_x1 = pad_y1 = pad[1]

    _, channel, in_h, in_w = tensor.shape
    tensor = tensor.reshape([-1, in_h, in_w, 1])

    _, in_h, in_w, minor = tensor.shape
    kernel_h, kernel_w = kernel.shape

    # interleave zeros to upsample, then pad
    out = tensor.reshape([-1, in_h, 1, in_w, 1, minor])
    out = dummy_pad(out, up_x - 1, up_y - 1)
    out = out.reshape([-1, in_h * up_y, in_w * up_x, minor])

    out = out.unsqueeze(0)
    out = F.pad(
        out,
        [max(pad_x0, 0), max(pad_x1, 0), max(pad_y0, 0), max(pad_y1, 0), 0, 0],
        data_format="NDHWC",
    )
    out = out.squeeze(0)
    out = out[
        :,
        max(-pad_y0, 0) : out.shape[1] - max(-pad_y1, 0),
        max(-pad_x0, 0) : out.shape[2] - max(-pad_x1, 0),
        :,
    ]

    # filter with the (flipped) FIR kernel
    out = out.transpose([0, 3, 1, 2])
    out = out.reshape([-1, 1, in_h * up_y + pad_y0 + pad_y1, in_w * up_x + pad_x0 + pad_x1])
    w = paddle.flip(kernel, [0, 1]).reshape([1, 1, kernel_h, kernel_w])
    out = F.conv2d(out, w)
    out = out.reshape(
        [-1, minor, in_h * up_y + pad_y0 + pad_y1 - kernel_h + 1, in_w * up_x + pad_x0 + pad_x1 - kernel_w + 1]
    )
    out = out.transpose([0, 2, 3, 1])

    # downsample by striding
    out = out[:, ::down_y, ::down_x, :]

    out_h = (in_h * up_y + pad_y0 + pad_y1 - kernel_h) // down_y + 1
    out_w = (in_w * up_x + pad_x0 + pad_x1 - kernel_w) // down_x + 1

    return out.reshape([-1, channel, out_h, out_w])

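# Smoke-test sketch (illustrative addition, not part of the original module):
# round-trip a feature map through the functional FIR helpers and print the
# expected 2x spatial scaling. Executed only when resnet.py is run directly.
if __name__ == "__main__":
    sample = paddle.randn([1, 4, 8, 8])
    upsampled = upsample_2d(sample, kernel=(1, 3, 3, 1), factor=2)         # -> [1, 4, 16, 16]
    downsampled = downsample_2d(upsampled, kernel=(1, 3, 3, 1), factor=2)  # -> [1, 4, 8, 8]
    print(upsampled.shape, downsampled.shape)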