from dataclasses import dataclass
from typing import Optional, Tuple, Union

import numpy as np
import paddle
import paddle.nn as nn

from ..configuration_utils import ConfigMixin, register_to_config
from ..modeling_utils import ModelMixin
from ..utils import BaseOutput
from .unet_2d_blocks import UNetMidBlock2D, get_down_block, get_up_block


@dataclass
class DecoderOutput(BaseOutput):
    """
    Output of decoding method.

    Args:
        sample (`paddle.Tensor` of shape `(batch_size, num_channels, height, width)`):
            Decoded output sample of the model. Output of the last layer of the model.
    """

    sample: paddle.Tensor


@dataclass
class VQEncoderOutput(BaseOutput):
    """
    Output of VQModel encoding method.

    Args:
        latents (`paddle.Tensor` of shape `(batch_size, num_channels, height, width)`):
            Encoded output sample of the model. Output of the last layer of the model.
    """

    latents: paddle.Tensor


@dataclass
class AutoencoderKLOutput(BaseOutput):
    """
    Output of AutoencoderKL encoding method.

    Args:
        latent_dist (`DiagonalGaussianDistribution`):
            Encoded outputs of `Encoder` represented as the mean and logvar of `DiagonalGaussianDistribution`.
            `DiagonalGaussianDistribution` allows for sampling latents from the distribution.
    """

    latent_dist: "DiagonalGaussianDistribution"


class Encoder(nn.Layer):
    def __init__(
        self,
        in_channels=3,
        out_channels=3,
        down_block_types=("DownEncoderBlock2D",),
        block_out_channels=(64,),
        layers_per_block=2,
        norm_num_groups=32,
        act_fn="silu",
        double_z=True,
    ):
        super().__init__()
        self.layers_per_block = layers_per_block

        self.conv_in = nn.Conv2D(in_channels, block_out_channels[0], kernel_size=3, stride=1, padding=1)

        self.mid_block = None
        self.down_blocks = nn.LayerList([])

        # down
        output_channel = block_out_channels[0]
        for i, down_block_type in enumerate(down_block_types):
            input_channel = output_channel
            output_channel = block_out_channels[i]
            is_final_block = i == len(block_out_channels) - 1

            down_block = get_down_block(
                down_block_type,
                num_layers=self.layers_per_block,
                in_channels=input_channel,
                out_channels=output_channel,
                add_downsample=not is_final_block,
                resnet_eps=1e-6,
                downsample_padding=0,
                resnet_act_fn=act_fn,
                resnet_groups=norm_num_groups,
                attn_num_head_channels=None,
                temb_channels=None,
            )
            self.down_blocks.append(down_block)

        # mid
        self.mid_block = UNetMidBlock2D(
            in_channels=block_out_channels[-1],
            resnet_eps=1e-6,
            resnet_act_fn=act_fn,
            output_scale_factor=1,
            resnet_time_scale_shift="default",
            attn_num_head_channels=None,
            resnet_groups=norm_num_groups,
            temb_channels=None,
        )

        # out
        self.conv_norm_out = nn.GroupNorm(
            num_channels=block_out_channels[-1], num_groups=norm_num_groups, epsilon=1e-6
        )
        self.conv_act = nn.Silu()

        conv_out_channels = 2 * out_channels if double_z else out_channels
        self.conv_out = nn.Conv2D(block_out_channels[-1], conv_out_channels, 3, padding=1)

    def forward(self, x):
        sample = x
        sample = self.conv_in(sample)

        # down
        for down_block in self.down_blocks:
            sample = down_block(sample)

        # middle
        sample = self.mid_block(sample)

        # post-process
        sample = self.conv_norm_out(sample)
        sample = self.conv_act(sample)
        sample = self.conv_out(sample)

        return sample


class Decoder(nn.Layer):
    def __init__(
        self,
        in_channels=3,
        out_channels=3,
        up_block_types=("UpDecoderBlock2D",),
        block_out_channels=(64,),
        layers_per_block=2,
        norm_num_groups=32,
        act_fn="silu",
    ):
        super().__init__()
        self.layers_per_block = layers_per_block

        self.conv_in = nn.Conv2D(in_channels, block_out_channels[-1], kernel_size=3, stride=1, padding=1)

        self.mid_block = None
        self.up_blocks = nn.LayerList([])

        # mid
        self.mid_block = UNetMidBlock2D(
            in_channels=block_out_channels[-1],
            resnet_eps=1e-6,
            resnet_act_fn=act_fn,
            output_scale_factor=1,
            resnet_time_scale_shift="default",
            attn_num_head_channels=None,
            resnet_groups=norm_num_groups,
            temb_channels=None,
        )

        # up
        reversed_block_out_channels = list(reversed(block_out_channels))
        output_channel = reversed_block_out_channels[0]
        for i, up_block_type in enumerate(up_block_types):
            prev_output_channel = output_channel
            output_channel = reversed_block_out_channels[i]
            is_final_block = i == len(block_out_channels) - 1

            up_block = get_up_block(
                up_block_type,
                num_layers=self.layers_per_block + 1,
                in_channels=prev_output_channel,
                out_channels=output_channel,
                prev_output_channel=None,
                add_upsample=not is_final_block,
                resnet_eps=1e-6,
                resnet_act_fn=act_fn,
                resnet_groups=norm_num_groups,
                attn_num_head_channels=None,
                temb_channels=None,
            )
            self.up_blocks.append(up_block)
            prev_output_channel = output_channel

        # out
        self.conv_norm_out = nn.GroupNorm(num_channels=block_out_channels[0], num_groups=norm_num_groups, epsilon=1e-6)
        self.conv_act = nn.Silu()
        self.conv_out = nn.Conv2D(block_out_channels[0], out_channels, 3, padding=1)

    def forward(self, z):
        sample = z
        sample = self.conv_in(sample)

        # middle
        sample = self.mid_block(sample)

        # up
        for up_block in self.up_blocks:
            sample = up_block(sample)

        # post-process
        sample = self.conv_norm_out(sample)
        sample = self.conv_act(sample)
        sample = self.conv_out(sample)

        return sample


class VectorQuantizer(nn.Layer):
    """
    Improved version over VectorQuantizer, can be used as a drop-in replacement. Mostly avoids costly matrix
    multiplications and allows for post-hoc remapping of indices.
    """

    def __init__(
        self, n_e, vq_embed_dim, beta, remap=None, unknown_index="random", sane_index_shape=False, legacy=True
    ):
        super().__init__()
        self.n_e = n_e
        self.vq_embed_dim = vq_embed_dim
        self.beta = beta
        self.legacy = legacy

        self.embedding = nn.Embedding(
            self.n_e,
            self.vq_embed_dim,
            weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-1.0 / self.n_e, 1.0 / self.n_e)),
        )

        self.remap = remap
        if self.remap is not None:
            self.register_buffer("used", paddle.to_tensor(np.load(self.remap)))
            self.re_embed = self.used.shape[0]
            self.unknown_index = unknown_index  # "random", "extra" or an integer
            if self.unknown_index == "extra":
                self.unknown_index = self.re_embed
                self.re_embed = self.re_embed + 1
            print(
                f"Remapping {self.n_e} indices to {self.re_embed} indices. "
                f"Using {self.unknown_index} for unknown indices."
            )
        else:
            self.re_embed = n_e

        self.sane_index_shape = sane_index_shape

    def remap_to_used(self, inds):
        ishape = inds.shape
        assert len(ishape) > 1
        inds = inds.reshape([ishape[0], -1])
        used = self.used.cast(inds.dtype)
        match = (inds[:, :, None] == used[None, None, ...]).cast("int64")
        new = match.argmax(-1)
        unknown = match.sum(2) < 1
        if self.unknown_index == "random":
            new[unknown] = paddle.randint(0, self.re_embed, shape=new[unknown].shape)
        else:
            new[unknown] = self.unknown_index
        return new.reshape(ishape)

    def unmap_to_all(self, inds):
        ishape = inds.shape
        assert len(ishape) > 1
        inds = inds.reshape([ishape[0], -1])
        used = self.used.cast(inds.dtype)
        if self.re_embed > self.used.shape[0]:  # extra token
            inds[inds >= self.used.shape[0]] = 0  # simply set to zero
        back = paddle.take_along_axis(used[None, :].expand([inds.shape[0], -1]), inds, axis=1)
        return back.reshape(ishape)

    def forward(self, z):
        # reshape z -> (batch, height, width, channel) and flatten
        z = z.transpose([0, 2, 3, 1])
        z_flattened = z.reshape([-1, self.vq_embed_dim])

        # distances from z to embeddings e_j: (z - e)^2 = z^2 + e^2 - 2 e * z
        d = (
            paddle.sum(z_flattened**2, axis=1, keepdim=True)
            + paddle.sum(self.embedding.weight**2, axis=1)
            - 2 * paddle.matmul(z_flattened, self.embedding.weight, transpose_y=True)
        )

        min_encoding_indices = paddle.argmin(d, axis=1)
        z_q = self.embedding(min_encoding_indices).reshape(z.shape)
        perplexity = None
        min_encodings = None

        # compute loss for embedding
        if not self.legacy:
            loss = self.beta * paddle.mean((z_q.detach() - z) ** 2) + paddle.mean((z_q - z.detach()) ** 2)
        else:
            loss = paddle.mean((z_q.detach() - z) ** 2) + self.beta * paddle.mean((z_q - z.detach()) ** 2)

        # preserve gradients (straight-through estimator)
        z_q = z + (z_q - z).detach()

        # reshape back to match original input shape
        z_q = z_q.transpose([0, 3, 1, 2])

        if self.remap is not None:
            min_encoding_indices = min_encoding_indices.reshape([z.shape[0], -1])  # add batch axis
            min_encoding_indices = self.remap_to_used(min_encoding_indices)
            min_encoding_indices = min_encoding_indices.reshape([-1])  # flatten

        if self.sane_index_shape:
            min_encoding_indices = min_encoding_indices.reshape([z_q.shape[0], z_q.shape[2], z_q.shape[3]])

        return z_q, loss, (perplexity, min_encodings, min_encoding_indices)

    def get_codebook_entry(self, indices, shape):
        # shape specifying (batch, height, width, channel)
        if self.remap is not None:
            indices = indices.reshape([shape[0], -1])  # add batch axis
            indices = self.unmap_to_all(indices)
            indices = indices.reshape([-1])  # flatten again

        # get quantized latent vectors
        z_q = self.embedding(indices)

        if shape is not None:
            z_q = z_q.reshape(shape)
            # reshape back to match original input shape
            z_q = z_q.transpose([0, 3, 1, 2])

        return z_q


class DiagonalGaussianDistribution(object):
    def __init__(self, parameters, deterministic=False):
        self.parameters = parameters
        self.mean, self.logvar = paddle.chunk(parameters, 2, axis=1)
        self.logvar = paddle.clip(self.logvar, -30.0, 20.0)
        self.deterministic = deterministic
        self.std = paddle.exp(0.5 * self.logvar)
        self.var = paddle.exp(self.logvar)
        if self.deterministic:
            self.var = self.std = paddle.zeros_like(self.mean, dtype=self.parameters.dtype)

    def sample(self, generator: Optional[paddle.Generator] = None) -> paddle.Tensor:
        x = paddle.randn(self.mean.shape, generator=generator)
        x = x.cast(self.parameters.dtype)
        sample = self.mean + self.std * x
        return sample

    def kl(self, other=None):
        if self.deterministic:
            return paddle.to_tensor([0.0])
        if other is None:
            return 0.5 * paddle.sum(paddle.pow(self.mean, 2) + self.var - 1.0 - self.logvar, axis=[1, 2, 3])
        return 0.5 * paddle.sum(
            paddle.pow(self.mean - other.mean, 2) / other.var
            + self.var / other.var
            - 1.0
            - self.logvar
            + other.logvar,
            axis=[1, 2, 3],
        )

    def nll(self, sample, axis=[1, 2, 3]):
        if self.deterministic:
            return paddle.to_tensor([0.0])
        logtwopi = np.log(2.0 * np.pi)
        return 0.5 * paddle.sum(logtwopi + self.logvar + paddle.pow(sample - self.mean, 2) / self.var, axis=axis)

    def mode(self):
        return self.mean


class VQModel(ModelMixin, ConfigMixin):
    r"""VQ-VAE model from the paper Neural Discrete Representation Learning by Aaron van den Oord, Oriol Vinyals and
    Koray Kavukcuoglu.

    This model inherits from [`ModelMixin`]. Check the superclass documentation for the generic methods the library
    implements for all the models (such as downloading or saving, etc.)

    Parameters:
        in_channels (int, *optional*, defaults to 3): Number of channels in the input image.
        out_channels (int, *optional*, defaults to 3): Number of channels in the output.
        down_block_types (`Tuple[str]`, *optional*, defaults to `("DownEncoderBlock2D",)`):
            Tuple of downsample block types.
        up_block_types (`Tuple[str]`, *optional*, defaults to `("UpDecoderBlock2D",)`):
            Tuple of upsample block types.
        block_out_channels (`Tuple[int]`, *optional*, defaults to `(64,)`):
            Tuple of block output channels.
        act_fn (`str`, *optional*, defaults to `"silu"`): The activation function to use.
        latent_channels (`int`, *optional*, defaults to `3`): Number of channels in the latent space.
        sample_size (`int`, *optional*, defaults to `32`): TODO
        num_vq_embeddings (`int`, *optional*, defaults to `256`): Number of codebook vectors in the VQ-VAE.
        norm_num_groups (`int`, *optional*, defaults to `32`): Number of groups for the normalization layers.
        vq_embed_dim (`int`, *optional*): Hidden dim of codebook vectors in the VQ-VAE.
    """

    @register_to_config
    def __init__(
        self,
        in_channels: int = 3,
        out_channels: int = 3,
        down_block_types: Tuple[str] = ("DownEncoderBlock2D",),
        up_block_types: Tuple[str] = ("UpDecoderBlock2D",),
        block_out_channels: Tuple[int] = (64,),
        layers_per_block: int = 1,
        act_fn: str = "silu",
        latent_channels: int = 3,
        sample_size: int = 32,
        num_vq_embeddings: int = 256,
        norm_num_groups: int = 32,
        vq_embed_dim: Optional[int] = None,
    ):
        super().__init__()

        # pass init params to Encoder
        self.encoder = Encoder(
            in_channels=in_channels,
            out_channels=latent_channels,
            down_block_types=down_block_types,
            block_out_channels=block_out_channels,
            layers_per_block=layers_per_block,
            act_fn=act_fn,
            norm_num_groups=norm_num_groups,
            double_z=False,
        )

        vq_embed_dim = vq_embed_dim if vq_embed_dim is not None else latent_channels

        self.quant_conv = nn.Conv2D(latent_channels, vq_embed_dim, 1)
        self.quantize = VectorQuantizer(num_vq_embeddings, vq_embed_dim, beta=0.25, remap=None, sane_index_shape=False)
        self.post_quant_conv = nn.Conv2D(vq_embed_dim, latent_channels, 1)

        # pass init params to Decoder
        self.decoder = Decoder(
            in_channels=latent_channels,
            out_channels=out_channels,
            up_block_types=up_block_types,
            block_out_channels=block_out_channels,
            layers_per_block=layers_per_block,
            act_fn=act_fn,
            norm_num_groups=norm_num_groups,
        )

    def encode(self, x: paddle.Tensor, return_dict: bool = True) -> VQEncoderOutput:
        h = self.encoder(x)
        h = self.quant_conv(h)

        if not return_dict:
            return (h,)

        return VQEncoderOutput(latents=h)

    def decode(
        self, h: paddle.Tensor, force_not_quantize: bool = False, return_dict: bool = True
    ) -> Union[DecoderOutput, paddle.Tensor]:
        # also go through quantization layer
        if not force_not_quantize:
            quant, emb_loss, info = self.quantize(h)
        else:
            quant = h
        quant = self.post_quant_conv(quant)
        dec = self.decoder(quant)

        if not return_dict:
            return (dec,)

        return DecoderOutput(sample=dec)

    def forward(self, sample: paddle.Tensor, return_dict: bool = True) -> Union[DecoderOutput, paddle.Tensor]:
        r"""
        Args:
            sample (`paddle.Tensor`): Input sample.
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a [`DecoderOutput`] instead of a plain tuple.
        """
        x = sample
        h = self.encode(x).latents
        dec = self.decode(h).sample

        if not return_dict:
            return (dec,)

        return DecoderOutput(sample=dec)


class AutoencoderKL(ModelMixin, ConfigMixin):
    r"""Variational Autoencoder (VAE) model with KL loss. This model inherits from [`ModelMixin`]. Check the
    superclass documentation for the generic methods the library implements for all the models (such as downloading
    or saving, etc.)
    """

    @register_to_config
    def __init__(
        self,
        in_channels: int = 3,
        out_channels: int = 3,
        down_block_types: Tuple[str] = ("DownEncoderBlock2D",),
        up_block_types: Tuple[str] = ("UpDecoderBlock2D",),
        block_out_channels: Tuple[int] = (64,),
        layers_per_block: int = 1,
        act_fn: str = "silu",
        latent_channels: int = 4,
        norm_num_groups: int = 32,
        sample_size: int = 32,
    ):
        super().__init__()

        # pass init params to Encoder
        self.encoder = Encoder(
            in_channels=in_channels,
            out_channels=latent_channels,
            down_block_types=down_block_types,
            block_out_channels=block_out_channels,
            layers_per_block=layers_per_block,
            act_fn=act_fn,
            norm_num_groups=norm_num_groups,
            double_z=True,
        )

        # pass init params to Decoder
        self.decoder = Decoder(
            in_channels=latent_channels,
            out_channels=out_channels,
            up_block_types=up_block_types,
            block_out_channels=block_out_channels,
            layers_per_block=layers_per_block,
            norm_num_groups=norm_num_groups,
            act_fn=act_fn,
        )

        self.quant_conv = nn.Conv2D(2 * latent_channels, 2 * latent_channels, 1)
        self.post_quant_conv = nn.Conv2D(latent_channels, latent_channels, 1)

    def encode(self, x: paddle.Tensor, return_dict: bool = True) -> AutoencoderKLOutput:
        h = self.encoder(x)
        moments = self.quant_conv(h)
        posterior = DiagonalGaussianDistribution(moments)

        if not return_dict:
            return (posterior,)

        return AutoencoderKLOutput(latent_dist=posterior)

    def decode(self, z: paddle.Tensor, return_dict: bool = True) -> Union[DecoderOutput, paddle.Tensor]:
        z = self.post_quant_conv(z)
        dec = self.decoder(z)

        if not return_dict:
            return (dec,)

        return DecoderOutput(sample=dec)

    def forward(
        self,
        sample: paddle.Tensor,
        sample_posterior: bool = False,
        return_dict: bool = True,
        generator: Optional[paddle.Generator] = None,
    ) -> Union[DecoderOutput, paddle.Tensor]:
        r"""
        Args:
            sample (`paddle.Tensor`): Input sample.
            sample_posterior (`bool`, *optional*, defaults to `False`):
                Whether to sample from the posterior.
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a [`DecoderOutput`] instead of a plain tuple.
        """
        x = sample
        posterior = self.encode(x).latent_dist
        if sample_posterior:
            z = posterior.sample(generator=generator)
        else:
            z = posterior.mode()
        dec = self.decode(z).sample

        if not return_dict:
            return (dec,)

        return DecoderOutput(sample=dec)
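# The codebook lookup in `VectorQuantizer.forward` relies on the identity
# ||z - e||^2 = ||z||^2 + ||e||^2 - 2 <z, e> to compute all pairwise distances
# with a single matrix multiply instead of an explicit (N, K, D) broadcast.
# A minimal NumPy sketch of that lookup, for illustration only; the function
# name `codebook_lookup` and the toy tensors below are not part of this module:
#
# import numpy as np
#
# def codebook_lookup(z_flat, codebook):
#     """Nearest-codebook-entry search via ||z - e||^2 = ||z||^2 + ||e||^2 - 2 z @ e.T.
#
#     z_flat:   (N, D) flattened latents
#     codebook: (K, D) embedding table
#     returns:  (indices, quantized) where quantized[i] = codebook[indices[i]]
#     """
#     d = (
#         (z_flat**2).sum(axis=1, keepdims=True)  # (N, 1): ||z||^2 term
#         + (codebook**2).sum(axis=1)             # (K,):   ||e||^2 term
#         - 2.0 * z_flat @ codebook.T             # (N, K): cross term
#     )
#     indices = d.argmin(axis=1)
#     return indices, codebook[indices]
#
# # Each latent snaps to its nearest codebook row:
# codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
# z = np.array([[0.1, -0.1], [0.9, 1.2], [3.5, 4.1]])
# indices, z_q = codebook_lookup(z, codebook)
# print(indices.tolist())  # -> [0, 1, 2]
#
# In the real layer, `z_q` is then pushed back through the straight-through
# estimator `z + (z_q - z).detach()` so gradients flow to the encoder.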