U Od@sbdZddlZddlmZddlZddlZddlm Z ddl m Z m Z ddl mZmZddlmZddlZddlmZddlmZdd lmZdd lmZdd lmZmZmZmZmZm Z m!Z!m"Z"dd l#m$Z$dd l%m&Z&m'Z'ddl(m)Z)m*Z*ddl+m,Z,m-Z-m.Z.ddl/m0Z0ddddZ1d ddZ2ddZ3Gdddej4Z5Gddde5Z6Gdddej4Z7dS)!ap wild mixture of https://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e3164b3/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py https://github.com/openai/improved-diffusion/blob/e94489283bb876ac1477d5dd7709bbbd2d9902ce/improved_diffusion/gaussian_diffusion.py https://github.com/CompVis/taming-transformers -- merci N)LambdaLR) rearrangerepeat)contextmanager nullcontext)partial)tqdm) make_grid)rank_zero_only) ListConfig)log_txt_as_imgexistsdefaultismapisimage mean_flat count_paramsinstantiate_from_config)LitEma) normal_klDiagonalGaussianDistribution)IdentityFirstStage AutoencoderKL)make_beta_scheduleextract_into_tensor noise_like) DDIMSamplerc_concat c_crossattny)concat crossattnadmTcCs|S)zbOverwrite model.train with this function to make sure train/eval mode does not change anymore.)selfmoder#r#o/group/30042/chongmou/ft_local/Diffusion/iccv23/ft_local/hug_coadapter/T2I-Adapter/ldm/models/diffusion/ddpm.pydisabled_train$sr'cCs||tj|d|i|S)Ndevice)torchrand)r1r2shaper(r#r#r&uniform_on_device*sr.csveZdZddddgddddd d d dd d dddddddddddddddffdd ZdJddZedKddZee dfddZ ddZ ddZ ddZ d d!Zd"d#Zed$d%d&ZedLd'd(ZedMd)d*ZedNd,d-ZdOd.d/Zd0d1ZdPd2d3ZdQd4d5Zd6d7Zd8d9Zd:d;Zdd?Zd@dAZdBdCZedRdFdGZ dHdIZ!Z"S)SDDPMlinearl2NFzval/lossTimaged-C6?{Gz?Mb??epsc st|dkstd||_t|jjd|jdd|_||_| |_ | |_ | |_ | |_ ||_ t|||_t|jdd| |_|jrt|j|_tdtt|jd|dk |_|jr||_||_||_||_|dk r||_||_|rt|st|dk rD|j|||d |rD|js0ttd t|j|_|rhtd |js^t|j |j!||||||d ||_"||_#t$j%||j&fd |_'|j#rt(j)|j'dd|_'|pt*|_+|j+rt,j-.|_/dS)N)r<x0vz0currently only supporting "eps" and "x0" and "v"z : Running in z-prediction modeT)verbosezKeeping EMAs of .) ignore_keys only_model_Resetting ema to pure model weights. This is useful when restoring from an ema-only checkpoint.D +++++++++++ WARNING: RESETTING NUM_EMA UPDATES TO ZERO +++++++++++ ) given_betas beta_schedule timesteps linear_start linear_endcosine_s) fill_valuesize) requires_grad)0super__init__AssertionErrorparameterizationprint __class____name__cond_stage_model clip_denoised log_every_tfirst_stage_key image_sizechannelsuse_positional_encodingsDiffusionWrappermodelruse_emar model_emalenlistbuffers use_schedulerscheduler_config v_posteriororiginal_elbo_weightl_simple_weightmonitor make_it_fitr init_from_ckptreset_num_updatesregister_schedule loss_type learn_logvarr)full num_timestepslogvarnn Parameterdict ucg_trainingnprandom RandomStateZucg_prng) r$Z unet_configrGrFrm ckpt_pathrAZload_only_unetrhr^rXrYrZrWrVrHrIrJrErfrergconditioning_keyrQrdr[rnZ logvar_initriru reset_emareset_num_ema_updatesrSr#r&rO0sf!          z DDPM.__init__c Cst|r|}nt|||||d}d|}tj|dd} td| dd} |j\}t||_||_||_ | jd|jkst dt t j t jd} |d| ||d | | |d | | |d | t| |d | td| |d | td| |d| td| |d| td| dd|j|d| d| |j|} |d| | |d| tt| d|d| |t| d| |d| d| t|d| |jdkr|jdd|j| |d|j} nr|jdkrDdtt | dt | } nB|jdkr~t |jdd|j| |d|j} ntd| d| d<|jd| ddt |jrt dS) N)rHrIrJr;r)axisz+alphas have to be defined for each timestep)dtypebetasalphas_cumprodalphas_cumprod_prevsqrt_alphas_cumprodsqrt_one_minus_alphas_cumprodlog_one_minus_alphas_cumprodsqrt_recip_alphas_cumprodsqrt_recipm1_alphas_cumprodposterior_varianceposterior_log_variance_clippedg#B ;posterior_mean_coef1posterior_mean_coef2r<r=?@r>zmu not supported lvlb_weightsF) persistent)r rrvcumprodappendr-intrprHrIrPrr)tensorfloat32register_buffersqrtlogremaximumrQrrrTensor ones_likeNotImplementedErrorisnanrall)r$rErFrGrHrIrJralphasrrto_torchrrr#r#r&rlsb    $   zDDPM.register_schedulec cs||jr<|j|j|j|j|dk rtd|||=q>q6|jrztddt | | D}t t | | d|dD]\} } | |krq|| j} | j} t| t| kstt| d kr| d d| d dkst| | ks| } || }t| d krdt| jd D]}||| d | |<qDn t| d krpt| jd D]@}t| jd D]*}||| d || d f| ||f<qqt| d }t| jd D]}||| d d 7<qt| d }t| jd D]}||| d ||<q|dddf}t|jt| krh|d }qF| |} | || <q|s|j|d dn|jj|d d\}}td|dt|dt|dt|d krtd|t|d krtd|dS)Ncpu) map_location state_dictz Deleting key {} from state_dict.cSsg|] \}}|qSr#r#).0name_r#r#r& sz'DDPM.init_from_ckpt..z"Fitting old weights to new weightsdesctotalrrrrF)strictzRestored from z with z missing and z unexpected keyszMissing Keys: z Unexpected Keys: )r)loadrakeys startswithrRformatrir` itertoolschainnamed_parameters named_buffersrr-rPclonerangeoneszeros unsqueezeload_state_dictr])r$pathrArBsdrkikn_paramsrparamZ old_shape new_shape new_param old_paramijZ n_used_oldZ n_used_newmissing unexpectedr#r#r&rjsv      ,  $zDDPM.init_from_ckptcCsBt|j||j|}td|j||j}t|j||j}|||fS)a Get the distribution q(x_t | x_0). :param x_start: the [N x C x ...] tensor of noiseless inputs. :param t: the number of diffusion steps (minus 1). Here, 0 means one step. :return: A tuple (mean, variance, log_variance), all of x_start's shape. r;)rrr-rr)r$x_starttmeanvarianceZ log_variancer#r#r&q_mean_varianceszDDPM.q_mean_variancecCs(t|j||j|t|j||j|SNrrr-r)r$x_trnoiser#r#r&predict_start_from_noiseszDDPM.predict_start_from_noisecCs(t|j||j|t|j||j|Srrrr-rr$rrr>r#r#r&predict_start_from_z_and_v szDDPM.predict_start_from_z_and_vcCs(t|j||j|t|j||j|Srrrr#r#r&predict_eps_from_z_and_v(szDDPM.predict_eps_from_z_and_vcCsRt|j||j|t|j||j|}t|j||j}t|j||j}|||fSr)rrr-rrr)r$rrrZposterior_meanrrr#r#r& q_posterior.szDDPM.q_posteriorrVc Csf|||}|jdkr(|j|||d}n|jdkr6|}|rF|dd|j|||d\}}}|||fS)Nr<rrr=r;rrr)r]rQrclamp_r) r$xrrV model_outx_recon model_meanrposterior_log_variancer#r#r&p_mean_variance7s    zDDPM.p_mean_variancec Cs||j|jf^}}}|j|||d\}}} t|j||} d|dkj|fdt|jd} || d| | S)N)rrrVrrrr)r-r(rrfloatreshaper`exp) r$rrrV repeat_noisebrr(rmodel_log_variancer nonzero_maskr#r#r&p_sampleCs *z DDPM.p_samplec Cs|jj}|d}tj||d}|g}tttd|jd|jdD]N}|j|tj |f||tj d|j d}||j dks||jdkr@| |q@|r||fS|S)Nrr( Sampling trr(rrr)rr(r)randnrreversedrrprrolongrVrWr)r$r-return_intermediatesr(rimg intermediatesrr#r#r& p_sample_loopLs  zDDPM.p_sample_loopcCs"|j}|j}|j||||f|dS)N)r)rYrZr)r$ batch_sizerrYrZr#r#r&sample[s z DDPM.samplecs:t|fdd}t|j|jt|j|j|S)Ncs tSrr) randn_liker#rr#r&czDDPM.q_sample..)rrrr-r)r$rrrr#rr&q_samplebsz DDPM.q_samplecCs(t|j||j|t|j||j|Srr)r$rrrr#r#r&get_vgsz DDPM.get_vcCsf|jdkr$||}|rb|}n>|jdkrZ|rDtjj||}qbtjjj||dd}ntd|S)Nl1r2none) reductionzunknown loss type '{loss_type}')rmabsrr)rr functionalmse_lossr)r$predtargetrlossr#r#r&get_lossms    z DDPM.get_lossc s t|fdd}|j||d}|||}i}|jdkrB|}n<|jdkrR}n,|jdkrl|||}ntd|jd|j||d d jd d d gd}|jrdnd} | | d|i||j } |j ||} | | d| i| |j | }| | d|i||fS)Ncs tSrrr#rr#r&r}rzDDPM.p_losses..rrrr<r=r>zParameterization z not yet supportedFrrrr5dimtrainval /loss_simple /loss_vlb/loss) rrr]rQrrrrtrainingupdatergrrf) r$rrrx_noisyr loss_dictr r Z log_prefix loss_simpleloss_vlbr#rr&p_losses|s(    z DDPM.p_lossescOs6tjd|j|jdf|jd}|j||f||S)Nrr)r)randintrpr-r(rr)r$rargskwargsrr#r#r&forwards"z DDPM.forwardcCs ||}|Srr#)r$batchrrr#r#r& get_inputszDDPM.get_inputcCs"|||j}||\}}||fSrr$rX)r$r#rr rr#r#r& shared_steps zDDPM.shared_stepcCsl||\}}|j|ddddd|jd|jddddd|jrh|jdd}|jd|ddddd|S)NTprog_barloggeron_stepon_epoch global_stepFrlrZlr_abs)r&log_dictrr,rc optimizers param_groups)r$r# batch_idxr rr-r#r#r& training_steps  zDDPM.training_stepc sn||\}}|&||\}fddDW5QRX|j|ddddd|jddddddS)Ncsi|]}|d|qS)Z_emar#rkeyZ loss_dict_emar#r& sz(DDPM.validation_step..FTr')r&rr.)r$r#r1rZloss_dict_no_emar#r5r&validation_steps  zDDPM.validation_stepcOs|jr||jdSr)r^r_r])r$r r!r#r#r&on_train_batch_endszDDPM.on_train_batch_endcCs,t|}t|d}t|d}t||d}|S)Nn b c h w -> b n c h wb n c h w -> (b n) c h wnrow)r`rr )r$samplesn_imgs_per_row denoise_gridr#r#r&_get_rows_from_lists    zDDPM._get_rows_from_listrc sjt|||j}t|jd|}t|jd|}||jd|}|d<t}|d|} t|j D]j} | |j dks| |j dkrlt t | gd|d} | |j} t | } |j| | | d} || ql||d<|r*|d|j|d d \} }W5QRX| d <||d <|rftt|jddkrTSfd d|DSS)Nrinputsr1 -> brr diffusion_rowZPlottingT)rrr= denoise_rowcsi|]}||qSr#r#r3rr#r&r6sz#DDPM.log_images..)rtr$rXminr-tor(rarrprWrr)rrrrrr@rrrv intersect1dr)r$r#Nn_rowr return_keysr!rrErrrrr=rFr#rGr& log_imagess4     zDDPM.log_imagescCs:|j}t|j}|jr&||jg}tjj||d}|S)Nr-) learning_raterar]rrnrqr)optimAdamW)r$r-paramsoptr#r#r&configure_optimizerss  zDDPM.configure_optimizers)Nr1r0r7r8r9)N)TF)F)rF)N)T)N)rArTN)#rT __module__ __qualname__rOrlrrr)no_gradrarjrrrrrboolrrrrrrrrr"r$r&r2r7r8r@rNrU __classcell__r#r#r}r&r/.sX 8 >           %r/cseZdZdZdTfdd Zd d ZdUfdd ZddZddZdVddZ ddZ ddZ ddZ dd Z d!d"ZdWd$d%ZedXfd&d' ZedYd(d)Zed*d+Zd,d-Zd.d/Zd0d1ZdZd2d3Zd4d5Zd6d7Zd[d8d9Zd\ed:d;d<Zed]d>d?Zed^d@dAZed_dBdCZ ed`dEdFZ!edGdHZ"edadIdJZ#edbdNdOZ$dPdQZ%edRdSZ&Z'S)cLatentDiffusionz main classNr3FTr;c s|t|d|_| |_|j| dks$t|dkr8|r4dnd}|dkrDd}| dd} | dd}| d d}| d g}tj| d |i| ||_||_||_ zt |j j j d|_Wnd |_YnX| s| |_n|d t| ||||||_d|_d|_d|_| dk rT|| |d|_|rT|js@ttdt|j|_|rxtd|jsnt|jdS)NrrGr r!__is_unconditional__ryr{Fr|rArzr scale_factorTrCrD) rnum_timesteps_cond scale_by_stdrPpoprNrO concat_modecond_stage_trainablecond_stage_keyr`rSZddconfigZch_multZ num_downsr]rr)rinstantiate_first_stageinstantiate_cond_stagecond_stage_forwardrVZbbox_tokenizerZrestarted_from_ckptrjr^rRrr]r_rk)r$Zfirst_stage_configZcond_stage_configr^rcrbrarfrzr]r_r r!ryr{r|rAr}r#r&rOsR              zLatentDiffusion.__init__cCsRtj|jf|jdtjd|_ttd|jd|j}||jd|j<dS)Nr)rLrKrr)r)rorprcond_idsroundlinspacer^)r$idsr#r#r&make_cond_schedule;s z"LatentDiffusion.make_cond_scheduler1r0r7r8r9cs4t|||||||jdk|_|jr0|dS)Nr)rNrlr^shorten_cond_schedulerk)r$rErFrGrHrIrJr}r#r&rl@s z!LatentDiffusion.register_schedulecCs4t|}||_t|j_|jD] }d|_q$dS)NF)revalfirst_stage_modelr'rrrMr$configr]rr#r#r&rdIs  z'LatentDiffusion.instantiate_first_stagecCs|jsv|dkr td|j|_q|dkrDtd|jjdd|_qt|}||_t|j_ |j D] }d|_ qhn&|dkst |dkst t|}||_dS)N__is_first_stage__z%Using first stage also as cond stage.r\z Training z as an unconditional model.F) rbrRrnrUrSrTrrmr'rrrMrPror#r#r&rePs      z&LatentDiffusion.instantiate_cond_stagecCshg}t||dD] }||j||j|dqt|}t|}t|d}t|d}t ||d}|S)Nr)force_not_quantizer9r:r;) rrdecode_first_stagerIr(r`r)stackrr )r$r=rZforce_no_decoder_quantizationrFZzdr>r?r#r#r&_get_denoise_row_from_listes     z*LatentDiffusion._get_denoise_row_from_listcCsDt|tr|}n&t|tjr&|}ntdt|d|j|S)Nzencoder_posterior of type 'z' not yet implemented) isinstancerrr)rrtyper])r$encoder_posteriorzr#r#r&get_first_stage_encodingqs    z(LatentDiffusion.get_first_stage_encodingcCsv|jdkrNt|jdrBt|jjrB|j|}t|trL|}qr||}n$t|j|js`tt |j|j|}|S)Nencode) rfhasattrrUcallabler}rxrr%rPgetattr)r$cr#r#r&get_learned_conditioningzs     z(LatentDiffusion.get_learned_conditioningcCsVtd||ddd|d}td|d|d|dd}tj||gdd}|S)Nrrrr)r)arangeviewrcat)r$hwrrarrr#r#r&meshgrids  zLatentDiffusion.meshgridcCst|d|dgddd}||||}tj|dddd}tjd|dddd}tjtj||gddddd}|S)z :param h: height :param w: width :return: normalized distance to image border, wtith min distance = 0 at border and max dist = 0.5 at image center rrrT)rkeepdimsrr)r)rrrrHr)r$rrZlower_right_cornerrZ dist_left_upZdist_right_downZ edge_distr#r#r& delta_borders   zLatentDiffusion.delta_bordercCs|||}t||jd|jd}|d||ddd|||}|jdr|||}t||jd|jd}|dd|||}||}|S)NZclip_min_weightZclip_max_weightrZ tie_brakerZclip_min_tie_weightZclip_max_tie_weight)rr)clipZsplit_input_paramsrrrI)r$rrLyLxr( weightingZ L_weightingr#r#r& get_weightings &  zLatentDiffusion.get_weightingrcCs|j\}}}} ||d|dd} | |d|dd} |dkr|dkrt|dd|d} tjjf| } tjjfd|jddi| }||d|d| | |j|j }|| dd|| }| dd|d|d| | f}n|dkr|dkrt|dd|d} tjjf| } t|d||d|fdd|d||d|fd}tjjfd|jd||jd|fi|}||d||d|| | |j|j }|| dd||| |}| dd|d||d|| | f}n|dkr|dkrt|dd|d} tjjf| } t|d||d|fdd|d||d|fd}tjjfd|jd||jd|fi|}||d||d|| | |j|j }|| dd||| |}| dd|d||d|| | f}nt || ||fS)z :param x: img of size (bs, c, h, w) :return: n img crops of size (n, bs, c, kernel_size[0], kernel_size[1]) rr) kernel_sizedilationpaddingstride output_sizerNr5) r-rtr)rrUnfoldFoldrr(rIrrr)r$rrrufdfbsncrrrrZ fold_paramsunfoldfoldr normalizationZ fold_params2r#r#r&get_fold_unfoldsH $$.,,.,*zLatentDiffusion.get_fold_unfoldcst||}|dk r"|d|}||j}||} || } |jjdk r2|dkrb|j }||j kr|dkr~||} q|dkr|} qt|||j} n|} |j r|rt | t st | tr|| } q|| |j} n| } |dk r| d|} |jrZ||\} }t|jj}|| d| d|i} n(d} d} |jrZ||\} }| |d} | | g}|r|| }|||g|r|| |S)N)captionZcoordinates_bboxtxt class_labelclspos_xpos_y)rr)rNr$rIr(encode_first_stager|detachr]rzrcrXrbrxrtrarr[Zcompute_latent_shifts__conditioning_keys__ruextendr)r$r#rreturn_first_stage_outputsforce_c_encodeZcond_keyreturn_original_condrrrzr{xcrrrZckeyoutxrecr}r#r&r$sN            zLatentDiffusion.get_inputcCs`|rF|dkr&tj|dd}|jjj|dd}t|d }d|j |}|j |S)Nrr)r-zb h w c -> b c h wr;) rr)argmaxrrrnquantizeZget_codebook_entryr contiguousr]decode)r$r{Z predict_cidsrtr#r#r&rus z"LatentDiffusion.decode_first_stagecCs |j|Sr)rnr}r$rr#r#r&rsz"LatentDiffusion.encode_first_stagecKs$|||j\}}|||f|}|Srr%)r$r#r!rrr r#r#r&r&#szLatentDiffusion.shared_stepcCs|dkr&tjd|j|f|jd}nx|dkrdtj|f|jd}ttjd||j}|}n:|dkrtj|f|jd}d|d|j}|}nttj |d|jdd }|S) Nr1rrcosinercubicrr5)rHmax) r)rrpr(rr*cospirclamp)r$ schedulerrrr#r#r&get_time_with_schedule(s  z&LatentDiffusion.get_time_with_schedulecOsLd|kr,tjd|j|jdf|jd}n |d}|j|||f||S)Nrrr)r)rrpr-r(rr`r)r$rrr r!rr#r#r&r"8s$ zLatentDiffusion.forwardcKsjt|tr n,t|ts|g}|jjdkr,dnd}||i}|j||f||}t|trb|sb|dS|SdS)Nr rrr)rxrtrar]rztuple)r$rrcond return_idsr!r4rr#r#r& apply_model@s  zLatentDiffusion.apply_modelcCs(t|j||j||t|j||jSrr)r$rrZ pred_xstartr#r#r&_predict_eps_from_xstartQsz(LatentDiffusion._predict_eps_from_xstartcCsZ|jd}tj|jdg||jd}|||\}}}t||ddd}t|t dS)a; Get the prior KL term for the variational lower-bound, measured in bits-per-dim. This term can't be optimized, as it only depends on the encoder. :param x_start: the [N x C x ...] tensor of inputs. :return: a batch of [N] KL values (in bits), one per batch element. rrrr:)mean1Zlogvar1mean2Zlogvar2r) r-r)rrpr(rrrrvr)r$rrrZqt_meanrZqt_log_varianceZkl_priorr#r#r& _prior_bpdUs  zLatentDiffusion._prior_bpdc st|fdd}|j||d}|j|||f|}i}|jrBdnd} |jdkrV} n0|jdkrf|} n |jdkr|||} nt|j|| d d d d d g} | | d| i|j | |j } | t | | } |jr| | d| i| d|j ji|j| } |j|| d d jdd}|j||}| | d|i| |j|7} | | d| i| |fS)Ncs tSrrr#rr#r&rdrz*LatentDiffusion.p_losses..rrrr=r<r>Frrrr5rz /loss_gammarq)rrr5rrr)rrrrrQrrrrrrqrIr(r)rrndatargrrf)r$rrrrr!r model_outputrprefixr rZlogvar_tr rr#rr&rcs4   zLatentDiffusion.p_lossesrc Cs|} |j|| ||d} |dk rB|jdks,t|j|| |||f| } |rN| \} } |jdkrj|j||| d} n|jdkrz| } nt|r| dd|r|j| \} }\}}}|j | ||d\}}}|r|||| fS|r|||| fS|||fSdS)N)rr<rr=rr;r) rrQrP modify_scorerrrrnrr)r$rrrrVreturn_codebook_idsquantize_denoised return_x0score_correctorcorrector_kwargst_inrlogitsrrindicesrrrr#r#r&rs,     zLatentDiffusion.p_mean_variancer:c  Cs$|j|jf^} }}|j|||||||| | d }|rLtd|\}}}}n|r^|\}}}}n |\}}}t|j||| }| dkrtjjj|| d}d|dk j | fdt |jd}|r||d| ||j dd fS|r||d| ||fS||d| |SdS) N) rrrrVrrrrrzSupport dropped.r:)prrrrr)r-r(rDeprecationWarningrr)rrr dropoutrrr`rr)r$rrrrVrrrr temperature noise_dropoutrrrrr(outputsrrrr=rrr#r#r&rs.  *$zLatentDiffusion.p_samplec s"|s |j}|j}dk r<dk r$n|d}gt|}n |d}|dkrbtj||jd}n|}g}dk rttrfddDn(ttrfddDn d|dk rt||}|rt t t d|d|dn t t d|}t | t kr| g|} |D]}tj|f||jtjd }|jrr|jjd ksJt|j|j}|j|td |j|||j|d | || | | d \}}|dk r|dk st|||}||d||}||dks||dkr|||r|||r|||q||fS)NrrcsFi|]>}|t|ts(|dnttfdd|qS)Ncs |dSrr#rrr#r&rrzBLatentDiffusion.progressive_denoising...rxramapr3rrr#r&r6sz9LatentDiffusion.progressive_denoising..csg|]}|dqSrr#rrrr#r&rsz9LatentDiffusion.progressive_denoising..Progressive GenerationrrhybridrT)rVrrrrrrr;r)rWrprar)rr(rxrtrHrrrryrrorrlr]rzrPrgrIrrrrVr)r$rr-r?callbackr img_callbackmaskr=rrrrrx_Tstart_TrWrGrrriteratorrtstcZ x0_partialimg_origr#rr&progressive_denoisingsn   (        z%LatentDiffusion.progressive_denoisingcCs| s |j} |jj}|d}|dkr2tj||d}n|}|g}|dkrJ|j}| dk r\t|| }|rxttt d|d|dn tt d|}| dk r| dk st | j dd| j ddkst |D]}tj |f||tj d}|jr|jjdkst |j||j}|j||t|d }|j||||j|d }| dk rX|| |}|| d | |}|| dkst||d kr~|||r||| r| ||q|r||fS|S) Nrrrrrr5rrr)rVrr;r)rWrr(r)rrprHrrrrPr-rorrlr]rzrgrIrrrrVr)r$rr-rrr?rrGrrr=rrrWr(rrrrrrrrr#r#r&rsP        zLatentDiffusion.p_sample_looprc s| dkr|j|j|jf} dk rjttrBfddDn(ttr^fddDn d|j| ||||||| d S)NcsFi|]>}|t|ts(|dnttfdd|qS)Ncs |dSrr#rrr#r&r:rz3LatentDiffusion.sample...rr3rr#r&r69sz*LatentDiffusion.sample..csg|]}|dqSrr#rrr#r&r<sz*LatentDiffusion.sample..)rrr?rGrrr=)rZrYrxrtrar) r$rrrrr?rGrrr=r-r!r#rr&r1s$  (zLatentDiffusion.samplec Ksb|r>t|}|j|j|jf}|j||||fddi|\}} n|jf||dd|\}} || fS)Nr?FT)rrr)rrZrYr) r$rrddim ddim_stepsr!Z ddim_samplerr-r=rr#r#r& sample_logCs"   zLatentDiffusion.sample_logcCs|dk r`|}t|trt|}t|ts2t|tr>||}qt|drT||j}||}n.|jdkr|j j ||jd}||St dt|trt t |D]"}t||d|d|j||<qnt|d|d|j}|S)NrIrrtodoz1 ... -> b ...rD)rxr rartrr~rIr(rcrUget_unconditional_conditioningrrr`r)r$rZ null_labelrrrr#r#r&rQs$        "z.LatentDiffusion.get_unconditional_conditioningrAr2c- sD|r |jnt}|dk }t|j||jddd|d\}}}}}t|jd|}t|jd|}|d<|d<|jjdk rht |j dr|j |}|d<n|j dkrt |jd |jd f||j |jd d d }|d<np|j d kr>z8t |jd |jd f|d|jd d d }|d<Wntk r:YnXnt|rP|d<t|rh||d<| r:t}|d|}t|jD]v}||jdks||jdkrtt|gd|d}||j}t|}|j|||d}|||qt |}t!|d}t!|d}t"||jdd}|d<|r|d|j#|||||d\}}W5QRX||} | d<| r|$|}!|!d<|rt%|j&t'st%|j&t(s|d |j#|||||dd\}}W5QRX|||j} | d<| dkr~|)||}"|jjd kr4|"g|d!d"}"|d#<|j#|||||| |"d$\}#}$||#}%|%d%| d&<W5QRX| r|jd|jd |jd }&}'}(t*||'|(|j})d'|)dd|'d(d |'d(|(d(d |(d(f<|)dddd)f})|d**|j#||||||d||)d+\}}$W5QRX|||j} | d,<|)d-<d|)})|d.*|j#||||||d||)d+\}}$W5QRX|||j} | d/<| r|d0&|j+||j,|j-|j-f|d1\}*}+W5QRX|j$|+d2d3},|,d4<|r@t./t0|jddkr.Sfd5d6|DSS)7NT)rrrrrrBreconstructionr conditioning)rrrr5)rLrZ human_labelZoriginal_conditioningrrCrDrr9r:r;rEZSampling)rrrretar=rFzPlotting Quantized Denoised)rrrrrrZsamples_x0_quantizedr; crossattn-admc_adm)rrz&Sampling with classifier-free guidance)rrrrrunconditional_guidance_scaleunconditional_conditioningZsamples_cfg_scale_z.2fr:r.zPlotting Inpaint)rrrrrr=rZsamples_inpaintingrzPlotting OutpaintZsamples_outpaintingzPlotting Progressives)r-rrrsZprogressive_rowcsi|]}||qSr#r#r3rGr#r&r6sz.LatentDiffusion.log_images..)1rrrtr$rXrHr-r]rzr~rUrrcr KeyErrorrrto_rgbrarrprWrr)rrIr(rrrrrurvrr rrwrxrnrrrrrrZrYrvrJr)-r$r#rKrLrrddim_etarMrinpaintZplot_denoise_rowsZplot_progressive_rowsZplot_diffusion_rowsrZunconditional_guidance_labelZ use_ema_scoper!rZuse_ddimr{rrrrrEZz_startrrZz_noisyZdiffusion_gridr=Z z_denoise_row x_samplesr?ucZ samples_cfgrZ x_samples_cfgrrrrrZ progressivesZprog_rowr#rGr&rNjs     .  ,                     "2        zLatentDiffusion.log_imagescCs|j}t|j}|jr>t|jjd|t|j}|j rXtd| |j t j j||d}|jrd|jks|tt|j}tdt||jdddd g}|g|fS|S) Nz%: Also optimizing conditioner params!z!Diffusion model optimizing logvarrOr z Setting up LambdaLR scheduler...) lr_lambdastepr)rinterval frequency)rPrar]rrbrRrSrTrUrnrrqr)rQrRrcrdrPrrschedule)r$r-rSrTrr#r#r&rUs(    z$LatentDiffusion.configure_optimizerscCsj|}t|ds0td|jddd||_tjj ||jd}d|| | | d}|S)Ncolorizer5r)weightrr;) rr~r)rr-rIrrrr conv2drHrrr#r#r&rs  $zLatentDiffusion.to_rgb)Nr3FTNNr;F)Nr1r0r7r8r9)rrF)rr)FFNFN)FF)F)N)FFFNN) FFFFFr;r:NN)TNFNNNr;r:NNNNNN) FNTNNFNNNNN) rFNTNFNNN)N)rArTrr:NTTFTTr;NT)(rTrVrW__doc__rOrkrlrdrerwr|rrrrrr)rXr$rurr&rr"rrrrrYrrrrrrrrNrUrrZr#r#r}r&r[s8     43    %  7 2    r[cs.eZdZfddZdeedddZZS)r\cs,tt||_||_|jdks(tdS)N)Nr r!rr" hybrid-admr)rNrOrdiffusion_modelrzrP)r$Zdiff_model_configrzr}r#r&rOs  zDiffusionWrapper.__init__N)rrc Ks|jdkr|j||f|}nr|jdkrPtj|g|dd}|j||f|}n@|jdkrt|d} |j||fd| i|}n|jdkrtj|g|dd}t|d} |j||fd| i|}n|jdkr|dk sttj|g|dd}t|d} |j||f| |d|}nv|jd kr\|dk s4tt|d} |j||f| |d|}n4|jd kr|d } |j||fd | i|}nt|S) Nr rrr!rrr)rrrr"rr)rzr r)rrPr) r$rrrrrr!rrccr#r#r&r"s4            zDiffusionWrapper.forward)NNN)rTrVrWrOrar"rZr#r#r}r&r\s r\)T)8rr)torch.nnrrnumpyrvpytorch_lightningplZtorch.optim.lr_schedulerreinopsrr contextlibrr functoolsrrrZtorchvision.utilsr 'pytorch_lightning.utilities.distributedr omegaconfr ldm.utilr r rrrrrrZldm.modules.emarZ'ldm.modules.distributions.distributionsrrZldm.models.autoencoderrr!ldm.modules.diffusionmodules.utilrrrldm.models.diffusion.ddimrrr'r.LightningModuler/r[r\r#r#r#r&sJ       (   U