"""
This code started out as a PyTorch port of Ho et al's diffusion models:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py

Docstrings have been added, as well as DDIM sampling and a new collection of
beta schedules.
"""

import enum
import math

import numpy as np
import torch as th

from .basic_ops import mean_flat
from .losses import normal_kl, nll_l1, discretized_gaussian_log_likelihood


def get_named_beta_schedule(schedule_name, num_diffusion_timesteps):
    """
    Get a pre-defined beta schedule for the given name.

    The beta schedule library consists of beta schedules which remain similar
    in the limit of num_diffusion_timesteps.
    Beta schedules may be added, but should not be removed or changed once
    they are committed to maintain backwards compatibility.
    """
    if schedule_name == "linear":
        # Linear schedule from Ho et al, extended to work for any number of
        # diffusion steps.
        scale = 1000 / num_diffusion_timesteps
        beta_start = scale * 0.0001
        beta_end = scale * 0.02
        return np.linspace(
            beta_start, beta_end, num_diffusion_timesteps, dtype=np.float64
        )
    elif schedule_name == "cosine":
        return betas_for_alpha_bar(
            num_diffusion_timesteps,
            lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2,
        )
    else:
        raise NotImplementedError(f"unknown beta schedule: {schedule_name}")


def betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):
    """
    Create a beta schedule that discretizes the given alpha_t_bar function,
    which defines the cumulative product of (1-beta) over time from t = [0,1].

    :param num_diffusion_timesteps: the number of betas to produce.
    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and
                      produces the cumulative product of (1-beta) up to that
                      part of the diffusion process.
    :param max_beta: the maximum beta to use; use values lower than 1 to
                     prevent singularities.
    """
    betas = []
    for i in range(num_diffusion_timesteps):
        t1 = i / num_diffusion_timesteps
        t2 = (i + 1) / num_diffusion_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)


class ModelMeanType(enum.Enum):
    """
    Which type of output the model predicts.
    """

    PREVIOUS_X = enum.auto()  # the model predicts x_{t-1}
    START_X = enum.auto()  # the model predicts x_0
    EPSILON = enum.auto()  # the model predicts epsilon


class ModelVarType(enum.Enum):
    """
    What is used as the model's output variance.

    The LEARNED_RANGE option has been added to allow the model to predict
    values between FIXED_SMALL and FIXED_LARGE, making its job easier.
    """

    LEARNED = enum.auto()
    FIXED_SMALL = enum.auto()
    FIXED_LARGE = enum.auto()
    LEARNED_RANGE = enum.auto()


class LossType(enum.Enum):
    MSE = enum.auto()  # use raw MSE loss (and KL when learning variances)
    RESCALED_MSE = enum.auto()  # like MSE, but with RESCALED_KL when learning variances
    KL = enum.auto()  # use the variational lower-bound
    RESCALED_KL = enum.auto()  # like KL, but rescale to estimate the full VLB

    def is_vb(self):
        return self == LossType.KL or self == LossType.RESCALED_KL
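# A minimal usage sketch for the schedule helpers above; the step count is
# illustrative, not mandated by this module:
#
#     betas = get_named_beta_schedule("cosine", 1000)
#     assert betas.shape == (1000,)
#     assert (betas > 0).all() and (betas <= 0.999).all()
#
# The "linear" schedule rescales Ho et al.'s (1e-4, 0.02) endpoints by
# 1000 / num_diffusion_timesteps so that schedules with different step counts
# remain comparable in the continuous-time limit.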
class GaussianDiffusion:
    """
    Utilities for training and sampling diffusion models.

    Ported directly from here, and then adapted over time to further
    experimentation.
    https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py#L42

    :param betas: a 1-D numpy array of betas for each diffusion timestep,
                  starting at T and going to 1.
    :param model_mean_type: a ModelMeanType determining what the model outputs.
    :param model_var_type: a ModelVarType determining how variance is output.
    :param loss_type: a LossType determining the loss function to use.
    :param rescale_timesteps: if True, pass floating point timesteps into the
                              model so that they are always scaled like in the
                              original paper (0 to 1000).
    """

    def __init__(
        self,
        *,
        betas,
        model_mean_type,
        model_var_type,
        loss_type,
        rescale_timesteps=False,
    ):
        self.model_mean_type = model_mean_type
        self.model_var_type = model_var_type
        self.loss_type = loss_type
        self.rescale_timesteps = rescale_timesteps

        # Use float64 for accuracy.
        betas = np.array(betas, dtype=np.float64)
        self.betas = betas
        assert len(betas.shape) == 1, "betas must be 1-D"
        assert (betas > 0).all() and (betas <= 1).all()

        self.num_timesteps = int(betas.shape[0])

        alphas = 1.0 - betas
        self.alphas_cumprod = np.cumprod(alphas, axis=0)
        self.alphas_cumprod_prev = np.append(1.0, self.alphas_cumprod[:-1])
        self.alphas_cumprod_next = np.append(self.alphas_cumprod[1:], 0.0)
        assert self.alphas_cumprod_prev.shape == (self.num_timesteps,)

        # calculations for diffusion q(x_t | x_{t-1}) and others
        self.sqrt_alphas_cumprod = np.sqrt(self.alphas_cumprod)
        self.sqrt_one_minus_alphas_cumprod = np.sqrt(1.0 - self.alphas_cumprod)
        self.log_one_minus_alphas_cumprod = np.log(1.0 - self.alphas_cumprod)
        self.sqrt_recip_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod)
        self.sqrt_recipm1_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod - 1)

        # calculations for posterior q(x_{t-1} | x_t, x_0)
        self.posterior_variance = (
            betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
        )
        # log calculation clipped because the posterior variance is 0 at the
        # beginning of the diffusion chain.
        self.posterior_log_variance_clipped = np.log(
            np.append(self.posterior_variance[1], self.posterior_variance[1:])
        )
        self.posterior_mean_coef1 = (
            betas * np.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
        )
        self.posterior_mean_coef2 = (
            (1.0 - self.alphas_cumprod_prev)
            * np.sqrt(alphas)
            / (1.0 - self.alphas_cumprod)
        )

    def q_mean_variance(self, x_start, t):
        """
        Get the distribution q(x_t | x_0).

        :param x_start: the [N x C x ...] tensor of noiseless inputs.
        :param t: the number of diffusion steps (minus 1). Here, 0 means one step.
        :return: A tuple (mean, variance, log_variance), all of x_start's shape.
        """
        mean = _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
        variance = _extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)
        log_variance = _extract_into_tensor(
            self.log_one_minus_alphas_cumprod, t, x_start.shape
        )
        return mean, variance, log_variance

    def q_sample(self, x_start, t, noise=None):
        """
        Diffuse the data for a given number of diffusion steps.

        In other words, sample from q(x_t | x_0).

        :param x_start: the initial data batch.
        :param t: the number of diffusion steps (minus 1). Here, 0 means one step.
        :param noise: if specified, the split-out normal noise.
        :return: A noisy version of x_start.
        """
        if noise is None:
            noise = th.randn_like(x_start)
        assert noise.shape == x_start.shape
        return (
            _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
            + _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape)
            * noise
        )

    def q_posterior_mean_variance(self, x_start, x_t, t):
        """
        Compute the mean and variance of the diffusion posterior:

            q(x_{t-1} | x_t, x_0)
        """
        assert x_start.shape == x_t.shape
        posterior_mean = (
            _extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start
            + _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t
        )
        posterior_variance = _extract_into_tensor(self.posterior_variance, t, x_t.shape)
        posterior_log_variance_clipped = _extract_into_tensor(
            self.posterior_log_variance_clipped, t, x_t.shape
        )
        assert (
            posterior_mean.shape[0]
            == posterior_variance.shape[0]
            == posterior_log_variance_clipped.shape[0]
            == x_start.shape[0]
        )
        return posterior_mean, posterior_variance, posterior_log_variance_clipped

    def p_mean_variance(
        self, model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None
    ):
        """
        Apply the model to get p(x_{t-1} | x_t), as well as a prediction of
        the initial x, x_0.

        :param model: the model, which takes a signal and a batch of timesteps
                      as input.
        :param x: the [N x C x ...] tensor at time t.
        :param t: a 1-D Tensor of timesteps.
        :param clip_denoised: if True, clip the denoised signal into [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample. Applies before
            clip_denoised.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict with the following keys:
                 - 'mean': the model mean output.
                 - 'variance': the model variance output.
                 - 'log_variance': the log of 'variance'.
                 - 'pred_xstart': the prediction for x_0.
        """
        if model_kwargs is None:
            model_kwargs = {}

        B, C = x.shape[:2]
        assert t.shape == (B,)
        model_output = model(x, self._scale_timesteps(t), **model_kwargs)

        if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:
            assert model_output.shape == (B, C * 2, *x.shape[2:])
            model_output, model_var_values = th.split(model_output, C, dim=1)
            if self.model_var_type == ModelVarType.LEARNED:
                model_log_variance = model_var_values
                model_variance = th.exp(model_log_variance)
            else:
                min_log = _extract_into_tensor(
                    self.posterior_log_variance_clipped, t, x.shape
                )
                max_log = _extract_into_tensor(np.log(self.betas), t, x.shape)
                # The model_var_values is [-1, 1] for [min_var, max_var].
                frac = (model_var_values + 1) / 2
                model_log_variance = frac * max_log + (1 - frac) * min_log
                model_variance = th.exp(model_log_variance)
        else:
            model_variance, model_log_variance = {
                # for fixedlarge, we set the initial (log-)variance like so
                # to get a better decoder log likelihood.
                ModelVarType.FIXED_LARGE: (
                    np.append(self.posterior_variance[1], self.betas[1:]),
                    np.log(np.append(self.posterior_variance[1], self.betas[1:])),
                ),
                ModelVarType.FIXED_SMALL: (
                    self.posterior_variance,
                    self.posterior_log_variance_clipped,
                ),
            }[self.model_var_type]
            model_variance = _extract_into_tensor(model_variance, t, x.shape)
            model_log_variance = _extract_into_tensor(model_log_variance, t, x.shape)

        def process_xstart(x):
            if denoised_fn is not None:
                x = denoised_fn(x)
            if clip_denoised:
                return x.clamp(-1, 1)
            return x

        if self.model_mean_type == ModelMeanType.PREVIOUS_X:
            pred_xstart = process_xstart(
                self._predict_xstart_from_xprev(x_t=x, t=t, xprev=model_output)
            )
            model_mean = model_output
        elif self.model_mean_type in [ModelMeanType.START_X, ModelMeanType.EPSILON]:
            if self.model_mean_type == ModelMeanType.START_X:
                pred_xstart = process_xstart(model_output)
            else:
                pred_xstart = process_xstart(
                    self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output)
                )
            model_mean, _, _ = self.q_posterior_mean_variance(
                x_start=pred_xstart, x_t=x, t=t
            )
        else:
            raise NotImplementedError(self.model_mean_type)

        assert model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape
        return {
            "mean": model_mean,
            "variance": model_variance,
            "log_variance": model_log_variance,
            "pred_xstart": pred_xstart,
        }

    def _predict_xstart_from_eps(self, x_t, t, eps):
        assert x_t.shape == eps.shape
        return (
            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t
            - _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * eps
        )

    def _predict_xstart_from_xprev(self, x_t, t, xprev):
        assert x_t.shape == xprev.shape
        return (  # (xprev - coef2 * x_t) / coef1
            _extract_into_tensor(1.0 / self.posterior_mean_coef1, t, x_t.shape) * xprev
            - _extract_into_tensor(
                self.posterior_mean_coef2 / self.posterior_mean_coef1, t, x_t.shape
            )
            * x_t
        )

    def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
        return (
            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t
            - pred_xstart
        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)

    def _scale_timesteps(self, t):
        if self.rescale_timesteps:
            return t.float() * (1000.0 / self.num_timesteps)
        return t
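    # Forward-process sketch (the `diffusion` instance and tensor shapes below
    # are hypothetical, shown only for orientation):
    #
    #     x_start = th.randn(4, 3, 64, 64)                  # clean batch
    #     t = th.randint(0, diffusion.num_timesteps, (4,))
    #     x_t = diffusion.q_sample(x_start, t)              # noised batch
    #
    # q_sample implements x_t = sqrt(alpha_bar_t) * x_0
    #                          + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I).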
    def p_sample(
        self, model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None
    ):
        """
        Sample x_{t-1} from the model at the given timestep.

        :param model: the model to sample from.
        :param x: the current tensor at x_{t-1}.
        :param t: the value of t, starting at 0 for the first diffusion step.
        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict containing the following keys:
                 - 'sample': a random sample from the model.
                 - 'pred_xstart': a prediction of x_0.
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        noise = th.randn_like(x)
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}

    def p_sample_loop(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
    ):
        """
        Generate samples from the model.

        :param model: the model module.
        :param shape: the shape of the samples, (N, C, H, W).
        :param noise: if specified, the noise from the encoder to sample.
                      Should be of the same shape as `shape`.
        :param clip_denoised: if True, clip x_start predictions to [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :param device: if specified, the device to create the samples on.
                       If not specified, use a model parameter's device.
        :param progress: if True, show a tqdm progress bar.
        :return: a non-differentiable batch of samples.
        """
        final = None
        for sample in self.p_sample_loop_progressive(
            model,
            shape,
            noise=noise,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
            device=device,
            progress=progress,
        ):
            final = sample
        return final["sample"]

    def p_sample_loop_progressive(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
    ):
        """
        Generate samples from the model and yield intermediate samples from
        each timestep of diffusion.

        Arguments are the same as p_sample_loop().
        Returns a generator over dicts, where each dict is the return value of
        p_sample().
        """
        if device is None:
            device = next(model.parameters()).device
        assert isinstance(shape, (tuple, list))
        if noise is not None:
            img = noise
        else:
            img = th.randn(*shape, device=device)
        indices = list(range(self.num_timesteps))[::-1]

        if progress:
            # Lazy import so that we don't depend on tqdm.
            from tqdm.auto import tqdm

            indices = tqdm(indices)

        for i in indices:
            t = th.tensor([i] * shape[0], device=device)
            with th.no_grad():
                out = self.p_sample(
                    model,
                    img,
                    t,
                    clip_denoised=clip_denoised,
                    denoised_fn=denoised_fn,
                    model_kwargs=model_kwargs,
                )
                yield out
                img = out["sample"]

    def ddim_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        eta=0.0,
    ):
        """
        Sample x_{t-1} from the model using DDIM.

        Same usage as p_sample().
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        # Usually our model outputs epsilon, but we re-derive it
        # in case we used x_start or x_prev prediction.
        eps = self._predict_eps_from_xstart(x, t, out["pred_xstart"])
        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)
        alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)
        sigma = (
            eta
            * th.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar))
            * th.sqrt(1 - alpha_bar / alpha_bar_prev)
        )
        # Equation 12.
        noise = th.randn_like(x)
        mean_pred = (
            out["pred_xstart"] * th.sqrt(alpha_bar_prev)
            + th.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps
        )
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        sample = mean_pred + nonzero_mask * sigma * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}

    def ddim_reverse_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        eta=0.0,
    ):
        """
        Sample x_{t+1} from the model using DDIM reverse ODE.
        """
        assert eta == 0.0, "Reverse ODE only for deterministic path"
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        # Usually our model outputs epsilon, but we re-derive it
        # in case we used x_start or x_prev prediction.
        eps = (
            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x.shape) * x
            - out["pred_xstart"]
        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x.shape)
        alpha_bar_next = _extract_into_tensor(self.alphas_cumprod_next, t, x.shape)
        # Equation 12. reversed
        mean_pred = (
            out["pred_xstart"] * th.sqrt(alpha_bar_next)
            + th.sqrt(1 - alpha_bar_next) * eps
        )
        return {"sample": mean_pred, "pred_xstart": out["pred_xstart"]}

    def ddim_sample_loop(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
        eta=0.0,
    ):
        """
        Generate samples from the model using DDIM.

        Same usage as p_sample_loop().
        """
        final = None
        for sample in self.ddim_sample_loop_progressive(
            model,
            shape,
            noise=noise,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
            device=device,
            progress=progress,
            eta=eta,
        ):
            final = sample
        return final["sample"]

    def ddim_sample_loop_progressive(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
        eta=0.0,
    ):
        """
        Use DDIM to sample from the model and yield intermediate samples from
        each timestep of DDIM.

        Same usage as p_sample_loop_progressive().
        """
        if device is None:
            device = next(model.parameters()).device
        assert isinstance(shape, (tuple, list))
        if noise is not None:
            img = noise
        else:
            img = th.randn(*shape, device=device)
        indices = list(range(self.num_timesteps))[::-1]

        if progress:
            # Lazy import so that we don't depend on tqdm.
            from tqdm.auto import tqdm

            indices = tqdm(indices)

        for i in indices:
            t = th.tensor([i] * shape[0], device=device)
            with th.no_grad():
                out = self.ddim_sample(
                    model,
                    img,
                    t,
                    clip_denoised=clip_denoised,
                    denoised_fn=denoised_fn,
                    model_kwargs=model_kwargs,
                    eta=eta,
                )
                yield out
                img = out["sample"]
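    # Sampling sketch (assumes a trained `model` with forward signature
    # model(x, t, **model_kwargs); the names below are illustrative):
    #
    #     samples = diffusion.p_sample_loop(model, (4, 3, 64, 64), progress=True)
    #     ddim_samples = diffusion.ddim_sample_loop(model, (4, 3, 64, 64), eta=0.0)
    #
    # eta=0.0 follows the deterministic DDIM ODE; eta=1.0 matches the variance
    # of ancestral sampling.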
    def _vb_terms_bpd(
        self, model, x_start, x_t, t, clip_denoised=True, model_kwargs=None
    ):
        """
        Get a term for the variational lower-bound.

        The resulting units are bits (rather than nats, as one might expect).
        This allows for comparison to other papers.

        :return: a dict with the following keys:
                 - 'output': a shape [N] tensor of NLLs or KLs.
                 - 'pred_xstart': the x_0 predictions.
        """
        true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(
            x_start=x_start, x_t=x_t, t=t
        )
        out = self.p_mean_variance(
            model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs
        )
        kl = normal_kl(
            true_mean, true_log_variance_clipped, out["mean"], out["log_variance"]
        )
        kl = mean_flat(kl) / np.log(2.0)

        decoder_nll = -discretized_gaussian_log_likelihood(
            x_start, means=out["mean"], log_scales=0.5 * out["log_variance"]
        )
        assert decoder_nll.shape == x_start.shape
        decoder_nll = mean_flat(decoder_nll) / np.log(2.0)

        # At the first timestep return the decoder NLL,
        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))
        output = th.where((t == 0), decoder_nll, kl)
        return {"output": output, "pred_xstart": out["pred_xstart"]}

    def training_losses(self, model, x_start, t, model_kwargs=None, noise=None):
        """
        Compute training losses for a single timestep.

        :param model: the model to evaluate loss on.
        :param x_start: the [N x C x ...] tensor of inputs.
        :param t: a batch of timestep indices.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :param noise: if specified, the specific Gaussian noise to try to remove.
        :return: a dict with the key "loss" containing a tensor of shape [N].
                 Some mean or variance settings may also have other keys.
        """
        if model_kwargs is None:
            model_kwargs = {}
        if noise is None:
            noise = th.randn_like(x_start)
        x_t = self.q_sample(x_start, t, noise=noise)

        terms = {}

        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:
            terms["loss"] = self._vb_terms_bpd(
                model=model,
                x_start=x_start,
                x_t=x_t,
                t=t,
                clip_denoised=False,
                model_kwargs=model_kwargs,
            )["output"]
            if self.loss_type == LossType.RESCALED_KL:
                terms["loss"] *= self.num_timesteps
        elif self.loss_type == LossType.MSE or self.loss_type == LossType.RESCALED_MSE:
            model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)

            if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:
                B, C = x_t.shape[:2]
                assert model_output.shape == (B, C * 2, *x_t.shape[2:])
                model_output, model_var_values = th.split(model_output, C, dim=1)
                # Learn the variance using the variational bound, but don't let
                # it affect our mean prediction.
                frozen_out = th.cat([model_output.detach(), model_var_values], dim=1)
                terms["vb"] = self._vb_terms_bpd(
                    model=lambda *args, r=frozen_out: r,
                    x_start=x_start,
                    x_t=x_t,
                    t=t,
                    clip_denoised=False,
                )["output"]
                if self.loss_type == LossType.RESCALED_MSE:
                    # Divide by 1000 for equivalence with initial implementation.
                    # Without a factor of 1/1000, the VB term hurts the MSE term.
                    terms["vb"] *= self.num_timesteps / 1000.0

            target = {
                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(
                    x_start=x_start, x_t=x_t, t=t
                )[0],
                ModelMeanType.START_X: x_start,
                ModelMeanType.EPSILON: noise,
            }[self.model_mean_type]
            assert model_output.shape == target.shape == x_start.shape
            terms["mse"] = mean_flat((target - model_output) ** 2)
            if "vb" in terms:
                terms["loss"] = terms["mse"] + terms["vb"]
            else:
                terms["loss"] = terms["mse"]
        else:
            raise NotImplementedError(self.loss_type)

        return terms

    def _prior_bpd(self, x_start):
        """
        Get the prior KL term for the variational lower-bound, measured in
        bits-per-dim.

        This term can't be optimized, as it only depends on the encoder.

        :param x_start: the [N x C x ...] tensor of inputs.
        :return: a batch of [N] KL values (in bits), one per batch element.
        """
        batch_size = x_start.shape[0]
        t = th.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)
        qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)
        kl_prior = normal_kl(
            mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0
        )
        return mean_flat(kl_prior) / np.log(2.0)

    def calc_bpd_loop(self, model, x_start, clip_denoised=True, model_kwargs=None):
        """
        Compute the entire variational lower-bound, measured in bits-per-dim,
        as well as other related quantities.

        :param model: the model to evaluate loss on.
        :param x_start: the [N x C x ...] tensor of inputs.
        :param clip_denoised: if True, clip denoised samples.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict containing the following keys:
                 - total_bpd: the total variational lower-bound, per batch element.
                 - prior_bpd: the prior term in the lower-bound.
                 - vb: an [N x T] tensor of terms in the lower-bound.
                 - xstart_mse: an [N x T] tensor of x_0 MSEs for each timestep.
                 - mse: an [N x T] tensor of epsilon MSEs for each timestep.
        """
        device = x_start.device
        batch_size = x_start.shape[0]

        vb = []
        xstart_mse = []
        mse = []
        for t in list(range(self.num_timesteps))[::-1]:
            t_batch = th.tensor([t] * batch_size, device=device)
            noise = th.randn_like(x_start)
            x_t = self.q_sample(x_start=x_start, t=t_batch, noise=noise)
            # Calculate VLB term at the current timestep
            with th.no_grad():
                out = self._vb_terms_bpd(
                    model,
                    x_start=x_start,
                    x_t=x_t,
                    t=t_batch,
                    clip_denoised=clip_denoised,
                    model_kwargs=model_kwargs,
                )
            vb.append(out["output"])
            xstart_mse.append(mean_flat((out["pred_xstart"] - x_start) ** 2))
            eps = self._predict_eps_from_xstart(x_t, t_batch, out["pred_xstart"])
            mse.append(mean_flat((eps - noise) ** 2))

        vb = th.stack(vb, dim=1)
        xstart_mse = th.stack(xstart_mse, dim=1)
        mse = th.stack(mse, dim=1)

        prior_bpd = self._prior_bpd(x_start)
        total_bpd = vb.sum(dim=1) + prior_bpd
        return {
            "total_bpd": total_bpd,
            "prior_bpd": prior_bpd,
            "vb": vb,
            "xstart_mse": xstart_mse,
            "mse": mse,
        }


def _extract_into_tensor(arr, timesteps, broadcast_shape):
    """
    Extract values from a 1-D numpy array for a batch of indices.

    :param arr: the 1-D numpy array.
    :param timesteps: a tensor of indices into the array to extract.
    :param broadcast_shape: a larger shape of K dimensions with the batch
                            dimension equal to the length of timesteps.
    :return: a tensor of shape [batch_size, 1, ...] where the shape has K dims.
    """
    res = th.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
    while len(res.shape) < len(broadcast_shape):
        res = res[..., None]
    return res.expand(broadcast_shape)
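# Broadcasting sketch for _extract_into_tensor; the shapes below are
# illustrative:
#
#     arr = np.linspace(0.9999, 0.98, 1000)      # e.g. alphas_cumprod
#     timesteps = th.tensor([0, 10, 500])
#     out = _extract_into_tensor(arr, timesteps, (3, 3, 64, 64))
#     # out.shape == (3, 3, 64, 64); out[i] equals arr[timesteps[i]]
#     # broadcast over all trailing dimensions.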
class GaussianDiffusionMine(GaussianDiffusion):
    """
    Utilities for training and sampling diffusion models.

    Identical to GaussianDiffusion except for the decoder (t == 0) terms:
    the variational bound uses an L1-based NLL (nll_l1) instead of the
    discretized Gaussian likelihood, and training_losses handles t == 0
    separately. All other machinery is inherited unchanged, so this class is
    written as a subclass overriding only those two methods.
    """

    def _vb_terms_bpd(
        self, model, x_start, x_t, t, clip_denoised=True, model_kwargs=None
    ):
        """
        Get a term for the variational lower-bound.

        The resulting units are bits (rather than nats, as one might expect).
        This allows for comparison to other papers.

        :return: a dict with the following keys:
                 - 'output': a shape [N] tensor of NLLs or KLs.
                 - 'pred_xstart': the x_0 predictions.
        """
        true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(
            x_start=x_start, x_t=x_t, t=t
        )
        out = self.p_mean_variance(
            model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs
        )
        kl = normal_kl(
            true_mean, true_log_variance_clipped, out["mean"], out["log_variance"]
        )
        kl = mean_flat(kl) / np.log(2.0)

        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:
            # Assumption: nll_l1(mean, target) returns a per-element negative
            # log-likelihood of the target under a Laplace-style decoder
            # centered at the predicted mean; only the call site, not the
            # exact signature, is certain here.
            decoder_nll = mean_flat(nll_l1(out["mean"], x_start)) / np.log(2.0)
        else:
            decoder_nll = th.zeros_like(kl)

        # At the first timestep return the decoder NLL,
        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))
        output = th.where((t == 0), decoder_nll, kl)
        return {"output": output, "pred_xstart": out["pred_xstart"]}

    def training_losses(self, model, x_start, t, model_kwargs=None, noise=None):
        """
        Compute training losses for a single timestep.

        Same interface as GaussianDiffusion.training_losses, but the MSE-style
        term is split into a t == 0 term computed directly against the
        predicted x_0 (terms_t0) and the usual target-space term for all
        other timesteps (terms_others).
        """
        if model_kwargs is None:
            model_kwargs = {}
        if noise is None:
            noise = th.randn_like(x_start)
        x_t = self.q_sample(x_start, t, noise=noise)

        terms = {}

        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:
            terms["loss"] = self._vb_terms_bpd(
                model=model,
                x_start=x_start,
                x_t=x_t,
                t=t,
                clip_denoised=False,
                model_kwargs=model_kwargs,
            )["output"]
            if self.loss_type == LossType.RESCALED_KL:
                terms["loss"] *= self.num_timesteps
        elif self.loss_type == LossType.MSE or self.loss_type == LossType.RESCALED_MSE:
            model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)

            if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:
                B, C = x_t.shape[:2]
                assert model_output.shape == (B, C * 2, *x_t.shape[2:])
                model_output, model_var_values = th.split(model_output, C, dim=1)
                # Learn the variance using the variational bound, but don't let
                # it affect our mean prediction.
                frozen_out = th.cat([model_output.detach(), model_var_values], dim=1)
                terms["vb"] = self._vb_terms_bpd(
                    model=lambda *args, r=frozen_out: r,
                    x_start=x_start,
                    x_t=x_t,
                    t=t,
                    clip_denoised=False,
                )["output"]
                if self.loss_type == LossType.RESCALED_MSE:
                    # Divide by 1000 for equivalence with initial implementation.
                    # Without a factor of 1/1000, the VB term hurts the MSE term.
                    terms["vb"] *= self.num_timesteps / 1000.0

            target = {
                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(
                    x_start=x_start, x_t=x_t, t=t
                )[0],
                ModelMeanType.START_X: x_start,
                ModelMeanType.EPSILON: noise,
            }[self.model_mean_type]
            assert model_output.shape == target.shape == x_start.shape
            terms_others = mean_flat((target - model_output) ** 2)

            # Recover the implied x_0 prediction so that the t == 0 term can
            # be computed in data space.
            if self.model_mean_type == ModelMeanType.PREVIOUS_X:
                pred_xstart = self._predict_xstart_from_xprev(
                    x_t=x_t, t=t, xprev=model_output
                )
            elif self.model_mean_type == ModelMeanType.START_X:
                pred_xstart = model_output
            else:  # ModelMeanType.EPSILON
                pred_xstart = self._predict_xstart_from_eps(
                    x_t=x_t, t=t, eps=model_output
                )
            # Assumption: the t == 0 term is an L1 penalty in x_0 space,
            # consistent with this class's nll_l1 decoder; the exact form is
            # reconstructed, not certain.
            terms_t0 = mean_flat(th.abs(x_start - pred_xstart))

            terms["mse"] = th.where((t == 0), terms_t0, terms_others)
            if "vb" in terms:
                terms["loss"] = terms["mse"] + terms["vb"]
            else:
                terms["loss"] = terms["mse"]
        else:
            raise NotImplementedError(self.loss_type)

        return terms
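if __name__ == "__main__":
    # Minimal smoke test, runnable as `python -m models.gaussian_diffusion`
    # from the repository root (assuming `models` is a package). The toy model
    # below is illustrative only: it predicts eps = 0 everywhere plus a
    # variance-interpolation value of -1, i.e. the FIXED_SMALL end of the
    # LEARNED_RANGE parameterization.
    import torch.nn as nn

    class _ToyEpsModel(nn.Module):
        def __init__(self):
            super().__init__()
            # A dummy parameter so p_sample_loop can infer a device.
            self.dummy = nn.Parameter(th.zeros(1))

        def forward(self, x, t, **kwargs):
            # Channels 0..C-1: predicted epsilon; channels C..2C-1: variance
            # interpolation values in [-1, 1].
            return th.cat([th.zeros_like(x), -th.ones_like(x)], dim=1)

    diffusion = GaussianDiffusion(
        betas=get_named_beta_schedule("linear", 50),
        model_mean_type=ModelMeanType.EPSILON,
        model_var_type=ModelVarType.LEARNED_RANGE,
        loss_type=LossType.MSE,
    )
    sample = diffusion.p_sample_loop(_ToyEpsModel(), (2, 3, 8, 8))
    print("sample shape:", tuple(sample.shape))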