"""
This code started out as a PyTorch port of Ho et al's diffusion models:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py

Docstrings have been added, as well as DDIM sampling and a new collection of
beta schedules.
"""

import enum
import math

import numpy as np
import torch as th

from .basic_ops import mean_flat
from .losses import normal_kl, discretized_gaussian_log_likelihood


def get_named_beta_schedule(schedule_name, num_diffusion_timesteps):
    """
    Get a pre-defined beta schedule for the given name.

    The beta schedule library consists of beta schedules which remain similar
    in the limit of num_diffusion_timesteps.
    Beta schedules may be added, but should not be removed or changed once
    they are committed to maintain backwards compatibility.
    """
    if schedule_name == "linear":
        # Linear schedule from Ho et al, extended to work for any number of
        # diffusion steps.
        scale = 1000 / num_diffusion_timesteps
        beta_start = scale * 0.0001
        beta_end = scale * 0.02
        return np.linspace(
            beta_start, beta_end, num_diffusion_timesteps, dtype=np.float64
        )
    elif schedule_name == "cosine":
        return betas_for_alpha_bar(
            num_diffusion_timesteps,
            lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2,
        )
    else:
        raise NotImplementedError(f"unknown beta schedule: {schedule_name}")


def betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):
    """
    Create a beta schedule that discretizes the given alpha_t_bar function,
    which defines the cumulative product of (1-beta) over time from t = [0,1].

    :param num_diffusion_timesteps: the number of betas to produce.
    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and
                      produces the cumulative product of (1-beta) up to that
                      part of the diffusion process.
    :param max_beta: the maximum beta to use; use values lower than 1 to
                     prevent singularities.
    """
    betas = []
    for i in range(num_diffusion_timesteps):
        t1 = i / num_diffusion_timesteps
        t2 = (i + 1) / num_diffusion_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)
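
# A minimal usage sketch (editorial illustration, not part of the original
# port): both named schedules return a 1-D float64 array whose entries lie
# in (0, 1], suitable for passing straight into GaussianDiffusion below.
#
#     betas = get_named_beta_schedule("cosine", 1000)
#     assert betas.shape == (1000,)
#     assert (0 < betas).all() and (betas <= 1).all()
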
class ModelMeanType(enum.Enum):
    """
    Which type of output the model predicts.
    """

    PREVIOUS_X = enum.auto()  # the model predicts x_{t-1}
    START_X = enum.auto()  # the model predicts x_0
    EPSILON = enum.auto()  # the model predicts epsilon


class ModelVarType(enum.Enum):
    """
    What is used as the model's output variance.

    The LEARNED_RANGE option has been added to allow the model to predict
    values between FIXED_SMALL and FIXED_LARGE, making its job easier.
    """

    LEARNED = enum.auto()
    FIXED_SMALL = enum.auto()
    FIXED_LARGE = enum.auto()
    LEARNED_RANGE = enum.auto()


class LossType(enum.Enum):
    MSE = enum.auto()  # use raw MSE loss (and KL when learning variances)
    RESCALED_MSE = enum.auto()  # use raw MSE loss (with RESCALED_KL when learning variances)
    KL = enum.auto()  # use the variational lower-bound
    RESCALED_KL = enum.auto()  # like KL, but rescale to estimate the full VLB

    def is_vb(self):
        return self == LossType.KL or self == LossType.RESCALED_KL


class GaussianDiffusion:
    """
    Utilities for training and sampling diffusion models.

    Ported directly from here, and then adapted over time to further
    experimentation.
    https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/diffusion_utils_2.py#L42

    :param betas: a 1-D numpy array of betas for each diffusion timestep,
                  starting at T and going to 1.
    :param model_mean_type: a ModelMeanType determining what the model
                            outputs.
    :param model_var_type: a ModelVarType determining how variance is output.
    :param loss_type: a LossType determining the loss function to use.
    :param rescale_timesteps: if True, pass floating point timesteps into the
                              model so that they are always scaled like in the
                              original paper (0 to 1000).
    """

    def __init__(
        self,
        *,
        betas,
        model_mean_type,
        model_var_type,
        loss_type,
        rescale_timesteps=False,
    ):
        self.model_mean_type = model_mean_type
        self.model_var_type = model_var_type
        self.loss_type = loss_type
        self.rescale_timesteps = rescale_timesteps

        # Use float64 for accuracy.
        betas = np.array(betas, dtype=np.float64)
        self.betas = betas
        assert len(betas.shape) == 1, "betas must be 1-D"
        assert (betas > 0).all() and (betas <= 1).all()

        self.num_timesteps = int(betas.shape[0])

        alphas = 1.0 - betas
        self.alphas_cumprod = np.cumprod(alphas, axis=0)
        self.alphas_cumprod_prev = np.append(1.0, self.alphas_cumprod[:-1])
        self.alphas_cumprod_next = np.append(self.alphas_cumprod[1:], 0.0)
        assert self.alphas_cumprod_prev.shape == (self.num_timesteps,)

        # calculations for diffusion q(x_t | x_{t-1}) and others
        self.sqrt_alphas_cumprod = np.sqrt(self.alphas_cumprod)
        self.sqrt_one_minus_alphas_cumprod = np.sqrt(1.0 - self.alphas_cumprod)
        self.log_one_minus_alphas_cumprod = np.log(1.0 - self.alphas_cumprod)
        self.sqrt_recip_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod)
        self.sqrt_recipm1_alphas_cumprod = np.sqrt(1.0 / self.alphas_cumprod - 1)

        # calculations for posterior q(x_{t-1} | x_t, x_0)
        self.posterior_variance = (
            betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
        )
        # log calculation clipped because the posterior variance is 0 at the
        # beginning of the diffusion chain.
        self.posterior_log_variance_clipped = np.log(
            np.append(self.posterior_variance[1], self.posterior_variance[1:])
        )
        self.posterior_mean_coef1 = (
            betas * np.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
        )
        self.posterior_mean_coef2 = (
            (1.0 - self.alphas_cumprod_prev)
            * np.sqrt(alphas)
            / (1.0 - self.alphas_cumprod)
        )

    def q_mean_variance(self, x_start, t):
        """
        Get the distribution q(x_t | x_0).

        :param x_start: the [N x C x ...] tensor of noiseless inputs.
        :param t: the number of diffusion steps (minus 1). Here, 0 means one
                  step.
        :return: A tuple (mean, variance, log_variance), all of x_start's
                 shape.
        """
        mean = (
            _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
        )
        variance = _extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)
        log_variance = _extract_into_tensor(
            self.log_one_minus_alphas_cumprod, t, x_start.shape
        )
        return mean, variance, log_variance

    def q_sample(self, x_start, t, noise=None):
        """
        Diffuse the data for a given number of diffusion steps.

        In other words, sample from q(x_t | x_0).

        :param x_start: the initial data batch.
        :param t: the number of diffusion steps (minus 1). Here, 0 means one
                  step.
        :param noise: if specified, the split-out normal noise.
        :return: A noisy version of x_start.
        """
        if noise is None:
            noise = th.randn_like(x_start)
        assert noise.shape == x_start.shape
        return (
            _extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
            + _extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape)
            * noise
        )

    def q_posterior_mean_variance(self, x_start, x_t, t):
        """
        Compute the mean and variance of the diffusion posterior:

            q(x_{t-1} | x_t, x_0)
        """
        assert x_start.shape == x_t.shape
        posterior_mean = (
            _extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start
            + _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t
        )
        posterior_variance = _extract_into_tensor(self.posterior_variance, t, x_t.shape)
        posterior_log_variance_clipped = _extract_into_tensor(
            self.posterior_log_variance_clipped, t, x_t.shape
        )
        assert (
            posterior_mean.shape[0]
            == posterior_variance.shape[0]
            == posterior_log_variance_clipped.shape[0]
            == x_start.shape[0]
        )
        return posterior_mean, posterior_variance, posterior_log_variance_clipped
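
    # Illustrative note (editorial addition, not in the original port):
    # q_sample above implements the closed form
    #     x_t = sqrt(alphas_cumprod[t]) * x_0 + sqrt(1 - alphas_cumprod[t]) * eps,
    # so a typical training step perturbs x_start at a uniformly drawn t:
    #
    #     t = th.randint(0, diffusion.num_timesteps, (x_start.shape[0],))
    #     x_t = diffusion.q_sample(x_start, t)
    #
    # where `diffusion` is a hypothetical GaussianDiffusion instance.
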
    def p_mean_variance(
        self, model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None
    ):
        """
        Apply the model to get p(x_{t-1} | x_t), as well as a prediction of
        the initial x, x_0.

        :param model: the model, which takes a signal and a batch of timesteps
                      as input.
        :param x: the [N x C x ...] tensor at time t.
        :param t: a 1-D Tensor of timesteps.
        :param clip_denoised: if True, clip the denoised signal into [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample. Applies before
            clip_denoised.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict with the following keys:
                 - 'mean': the model mean output.
                 - 'variance': the model variance output.
                 - 'log_variance': the log of 'variance'.
                 - 'pred_xstart': the prediction for x_0.
        """
        if model_kwargs is None:
            model_kwargs = {}

        B, C = x.shape[:2]
        assert t.shape == (B,)
        model_output = model(x, self._scale_timesteps(t), **model_kwargs)

        if self.model_var_type in [ModelVarType.LEARNED, ModelVarType.LEARNED_RANGE]:
            assert model_output.shape == (B, C * 2, *x.shape[2:])
            model_output, model_var_values = th.split(model_output, C, dim=1)
            if self.model_var_type == ModelVarType.LEARNED:
                model_log_variance = model_var_values
                model_variance = th.exp(model_log_variance)
            else:
                min_log = _extract_into_tensor(
                    self.posterior_log_variance_clipped, t, x.shape
                )
                max_log = _extract_into_tensor(np.log(self.betas), t, x.shape)
                # The model_var_values is [-1, 1] for [min_var, max_var].
                frac = (model_var_values + 1) / 2
                model_log_variance = frac * max_log + (1 - frac) * min_log
                model_variance = th.exp(model_log_variance)
        else:
            model_variance, model_log_variance = {
                # for fixedlarge, we set the initial (log-)variance like so
                # to get a better decoder log likelihood.
                ModelVarType.FIXED_LARGE: (
                    np.append(self.posterior_variance[1], self.betas[1:]),
                    np.log(np.append(self.posterior_variance[1], self.betas[1:])),
                ),
                ModelVarType.FIXED_SMALL: (
                    self.posterior_variance,
                    self.posterior_log_variance_clipped,
                ),
            }[self.model_var_type]
            model_variance = _extract_into_tensor(model_variance, t, x.shape)
            model_log_variance = _extract_into_tensor(model_log_variance, t, x.shape)

        def process_xstart(x):
            if denoised_fn is not None:
                x = denoised_fn(x)
            if clip_denoised:
                return x.clamp(-1, 1)
            return x

        if self.model_mean_type == ModelMeanType.PREVIOUS_X:
            pred_xstart = process_xstart(
                self._predict_xstart_from_xprev(x_t=x, t=t, xprev=model_output)
            )
            model_mean = model_output
        elif self.model_mean_type in [ModelMeanType.START_X, ModelMeanType.EPSILON]:
            if self.model_mean_type == ModelMeanType.START_X:
                pred_xstart = process_xstart(model_output)
            else:
                pred_xstart = process_xstart(
                    self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output)
                )
            model_mean, _, _ = self.q_posterior_mean_variance(
                x_start=pred_xstart, x_t=x, t=t
            )
        else:
            raise NotImplementedError(self.model_mean_type)

        assert (
            model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape
        )
        return {
            "mean": model_mean,
            "variance": model_variance,
            "log_variance": model_log_variance,
            "pred_xstart": pred_xstart,
        }

    def _predict_xstart_from_eps(self, x_t, t, eps):
        assert x_t.shape == eps.shape
        return (
            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t
            - _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)
            * eps
        )

    def _predict_xstart_from_xprev(self, x_t, t, xprev):
        assert x_t.shape == xprev.shape
        return (  # (xprev - coef2 * x_t) / coef1
            _extract_into_tensor(1.0 / self.posterior_mean_coef1, t, x_t.shape) * xprev
            - _extract_into_tensor(
                self.posterior_mean_coef2 / self.posterior_mean_coef1, t, x_t.shape
            )
            * x_t
        )

    def _predict_eps_from_xstart(self, x_t, t, pred_xstart):
        return (
            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t
            - pred_xstart
        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)

    def _scale_timesteps(self, t):
        if self.rescale_timesteps:
            return t.float() * (1000.0 / self.num_timesteps)
        return t

    def p_sample(
        self, model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None
    ):
        """
        Sample x_{t-1} from the model at the given timestep.

        :param model: the model to sample from.
        :param x: the current tensor at x_t.
        :param t: the value of t, starting at 0 for the first diffusion step.
        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict containing the following keys:
                 - 'sample': a random sample from the model.
                 - 'pred_xstart': a prediction of x_0.
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        noise = th.randn_like(x)
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}

    def p_sample_loop(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
    ):
        """
        Generate samples from the model.

        :param model: the model module.
        :param shape: the shape of the samples, (N, C, H, W).
        :param noise: if specified, the noise from the encoder to sample.
                      Should be of the same shape as `shape`.
        :param clip_denoised: if True, clip x_start predictions to [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :param device: if specified, the device to create the samples on.
                       If not specified, use a model parameter's device.
        :param progress: if True, show a tqdm progress bar.
        :return: a non-differentiable batch of samples.
        """
        final = None
        for sample in self.p_sample_loop_progressive(
            model,
            shape,
            noise=noise,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
            device=device,
            progress=progress,
        ):
            final = sample
        return final["sample"]

    def p_sample_loop_progressive(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
    ):
        """
        Generate samples from the model and yield intermediate samples from
        each timestep of diffusion.

        Arguments are the same as p_sample_loop().
        Returns a generator over dicts, where each dict is the return value of
        p_sample().
        """
        if device is None:
            device = next(model.parameters()).device
        assert isinstance(shape, (tuple, list))
        if noise is not None:
            img = noise
        else:
            img = th.randn(*shape, device=device)
        indices = list(range(self.num_timesteps))[::-1]

        if progress:
            # Lazy import so that we don't depend on tqdm.
            from tqdm.auto import tqdm

            indices = tqdm(indices)

        for i in indices:
            t = th.tensor([i] * shape[0], device=device)
            with th.no_grad():
                out = self.p_sample(
                    model,
                    img,
                    t,
                    clip_denoised=clip_denoised,
                    denoised_fn=denoised_fn,
                    model_kwargs=model_kwargs,
                )
                yield out
                img = out["sample"]

    def ddim_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        eta=0.0,
    ):
        """
        Sample x_{t-1} from the model using DDIM.

        Same usage as p_sample().
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        # Usually our model outputs epsilon, but we re-derive it
        # in case we used x_start or x_prev prediction.
        eps = self._predict_eps_from_xstart(x, t, out["pred_xstart"])
        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)
        alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)
        sigma = (
            eta
            * th.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar))
            * th.sqrt(1 - alpha_bar / alpha_bar_prev)
        )
        # Equation 12 of the DDIM paper.
        noise = th.randn_like(x)
        mean_pred = (
            out["pred_xstart"] * th.sqrt(alpha_bar_prev)
            + th.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps
        )
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        sample = mean_pred + nonzero_mask * sigma * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}

    def ddim_reverse_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        eta=0.0,
    ):
        """
        Sample x_{t+1} from the model using DDIM reverse ODE.
        """
        assert eta == 0.0, "Reverse ODE only for deterministic path"
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        # Usually our model outputs epsilon, but we re-derive it
        # in case we used x_start or x_prev prediction.
        eps = (
            _extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x.shape) * x
            - out["pred_xstart"]
        ) / _extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x.shape)
        alpha_bar_next = _extract_into_tensor(self.alphas_cumprod_next, t, x.shape)
        # Equation 12 of the DDIM paper, reversed.
        mean_pred = (
            out["pred_xstart"] * th.sqrt(alpha_bar_next)
            + th.sqrt(1 - alpha_bar_next) * eps
        )
        return {"sample": mean_pred, "pred_xstart": out["pred_xstart"]}

    def ddim_sample_loop(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
        eta=0.0,
    ):
        """
        Generate samples from the model using DDIM.

        Same usage as p_sample_loop().
        """
        final = None
        for sample in self.ddim_sample_loop_progressive(
            model,
            shape,
            noise=noise,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
            device=device,
            progress=progress,
            eta=eta,
        ):
            final = sample
        return final["sample"]

    def ddim_sample_loop_progressive(
        self,
        model,
        shape,
        noise=None,
        clip_denoised=True,
        denoised_fn=None,
        model_kwargs=None,
        device=None,
        progress=False,
        eta=0.0,
    ):
        """
        Use DDIM to sample from the model and yield intermediate samples from
        each timestep of DDIM.

        Same usage as p_sample_loop_progressive().
        """
        if device is None:
            device = next(model.parameters()).device
        assert isinstance(shape, (tuple, list))
        if noise is not None:
            img = noise
        else:
            img = th.randn(*shape, device=device)
        indices = list(range(self.num_timesteps))[::-1]

        if progress:
            # Lazy import so that we don't depend on tqdm.
            from tqdm.auto import tqdm

            indices = tqdm(indices)

        for i in indices:
            t = th.tensor([i] * shape[0], device=device)
            with th.no_grad():
                out = self.ddim_sample(
                    model,
                    img,
                    t,
                    clip_denoised=clip_denoised,
                    denoised_fn=denoised_fn,
                    model_kwargs=model_kwargs,
                    eta=eta,
                )
                yield out
                img = out["sample"]
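
    # Illustrative note (editorial addition, not in the original port): with
    # eta=0.0 the DDIM update in ddim_sample is deterministic (sigma collapses
    # to zero), so the same starting noise always maps to the same sample:
    #
    #     sample = diffusion.ddim_sample_loop(model, (N, C, H, W), eta=0.0)
    #
    # where `diffusion`, `model`, and the shape are hypothetical stand-ins.
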
    def _vb_terms_bpd(
        self, model, x_start, x_t, t, clip_denoised=True, model_kwargs=None
    ):
        """
        Get a term for the variational lower-bound.

        The resulting units are bits (rather than nats, as one might expect).
        This allows for comparison to other papers.

        :return: a dict with the following keys:
                 - 'output': a shape [N] tensor of NLLs or KLs.
                 - 'pred_xstart': the x_0 predictions.
        """
        true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(
            x_start=x_start, x_t=x_t, t=t
        )
        out = self.p_mean_variance(
            model, x_t, t, clip_denoised=clip_denoised, model_kwargs=model_kwargs
        )
        kl = normal_kl(
            true_mean, true_log_variance_clipped, out["mean"], out["log_variance"]
        )
        kl = mean_flat(kl) / np.log(2.0)

        decoder_nll = -discretized_gaussian_log_likelihood(
            x_start, means=out["mean"], log_scales=0.5 * out["log_variance"]
        )
        assert decoder_nll.shape == x_start.shape
        decoder_nll = mean_flat(decoder_nll) / np.log(2.0)

        # At the first timestep return the decoder NLL,
        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))
        output = th.where((t == 0), decoder_nll, kl)
        return {"output": output, "pred_xstart": out["pred_xstart"]}

    def training_losses(self, model, x_start, t, model_kwargs=None, noise=None):
        """
        Compute training losses for a single timestep.

        :param model: the model to evaluate loss on.
        :param x_start: the [N x C x ...] tensor of inputs.
        :param t: a batch of timestep indices.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :param noise: if specified, the specific Gaussian noise to try to
            remove.
        :return: a dict with the key "loss" containing a tensor of shape [N].
                 Some mean or variance settings may also have other keys.
        """
        if model_kwargs is None:
            model_kwargs = {}
        if noise is None:
            noise = th.randn_like(x_start)
        x_t = self.q_sample(x_start, t, noise=noise)

        terms = {}

        if self.loss_type == LossType.KL or self.loss_type == LossType.RESCALED_KL:
            terms["loss"] = self._vb_terms_bpd(
                model=model,
                x_start=x_start,
                x_t=x_t,
                t=t,
                clip_denoised=False,
                model_kwargs=model_kwargs,
            )["output"]
            if self.loss_type == LossType.RESCALED_KL:
                terms["loss"] *= self.num_timesteps
        elif self.loss_type == LossType.MSE or self.loss_type == LossType.RESCALED_MSE:
            model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)

            if self.model_var_type in [
                ModelVarType.LEARNED,
                ModelVarType.LEARNED_RANGE,
            ]:
                B, C = x_t.shape[:2]
                assert model_output.shape == (B, C * 2, *x_t.shape[2:])
                model_output, model_var_values = th.split(model_output, C, dim=1)
                # Learn the variance using the variational bound, but don't
                # let it affect our mean prediction.
                frozen_out = th.cat([model_output.detach(), model_var_values], dim=1)
                terms["vb"] = self._vb_terms_bpd(
                    model=lambda *args, r=frozen_out: r,
                    x_start=x_start,
                    x_t=x_t,
                    t=t,
                    clip_denoised=False,
                )["output"]
                if self.loss_type == LossType.RESCALED_MSE:
                    # Divide by 1000 for equivalence with the initial
                    # implementation. Without a factor of 1/1000, the VB term
                    # hurts the MSE term.
                    terms["vb"] *= self.num_timesteps / 1000.0

            target = {
                ModelMeanType.PREVIOUS_X: self.q_posterior_mean_variance(
                    x_start=x_start, x_t=x_t, t=t
                )[0],
                ModelMeanType.START_X: x_start,
                ModelMeanType.EPSILON: noise,
            }[self.model_mean_type]
            assert model_output.shape == target.shape == x_start.shape
            terms["mse"] = mean_flat((target - model_output) ** 2)
            if "vb" in terms:
                terms["loss"] = terms["mse"] + terms["vb"]
            else:
                terms["loss"] = terms["mse"]
        else:
            raise NotImplementedError(self.loss_type)

        return terms

    def _prior_bpd(self, x_start):
        """
        Get the prior KL term for the variational lower-bound, measured in
        bits-per-dim.

        This term can't be optimized, as it only depends on the encoder.

        :param x_start: the [N x C x ...] tensor of inputs.
        :return: a batch of [N] KL values (in bits), one per batch element.
        """
        batch_size = x_start.shape[0]
        t = th.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)
        qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)
        kl_prior = normal_kl(
            mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0
        )
        return mean_flat(kl_prior) / np.log(2.0)

    def calc_bpd_loop(self, model, x_start, clip_denoised=True, model_kwargs=None):
        """
        Compute the entire variational lower-bound, measured in bits-per-dim,
        as well as other related quantities.

        :param model: the model to evaluate loss on.
        :param x_start: the [N x C x ...] tensor of inputs.
        :param clip_denoised: if True, clip denoised samples.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.

        :return: a dict containing the following keys:
                 - total_bpd: the total variational lower-bound, per batch
                   element.
                 - prior_bpd: the prior term in the lower-bound.
                 - vb: an [N x T] tensor of terms in the lower-bound.
                 - xstart_mse: an [N x T] tensor of x_0 MSEs for each
                   timestep.
                 - mse: an [N x T] tensor of epsilon MSEs for each timestep.
        """
        device = x_start.device
        batch_size = x_start.shape[0]

        vb = []
        xstart_mse = []
        mse = []
        for t in list(range(self.num_timesteps))[::-1]:
            t_batch = th.tensor([t] * batch_size, device=device)
            noise = th.randn_like(x_start)
            x_t = self.q_sample(x_start=x_start, t=t_batch, noise=noise)
            # Calculate VLB term at the current timestep
            with th.no_grad():
                out = self._vb_terms_bpd(
                    model,
                    x_start=x_start,
                    x_t=x_t,
                    t=t_batch,
                    clip_denoised=clip_denoised,
                    model_kwargs=model_kwargs,
                )
            vb.append(out["output"])
            xstart_mse.append(mean_flat((out["pred_xstart"] - x_start) ** 2))
            eps = self._predict_eps_from_xstart(x_t, t_batch, out["pred_xstart"])
            mse.append(mean_flat((eps - noise) ** 2))

        vb = th.stack(vb, dim=1)
        xstart_mse = th.stack(xstart_mse, dim=1)
        mse = th.stack(mse, dim=1)

        prior_bpd = self._prior_bpd(x_start)
        total_bpd = vb.sum(dim=1) + prior_bpd
        return {
            "total_bpd": total_bpd,
            "prior_bpd": prior_bpd,
            "vb": vb,
            "xstart_mse": xstart_mse,
            "mse": mse,
        }


def _extract_into_tensor(arr, timesteps, broadcast_shape):
    """
    Extract values from a 1-D numpy array for a batch of indices.

    :param arr: the 1-D numpy array.
    :param timesteps: a tensor of indices into the array to extract.
    :param broadcast_shape: a larger shape of K dimensions with the batch
                            dimension equal to the length of timesteps.
    :return: a tensor of shape [batch_size, 1, ...] where the shape has K
             dims.
    """
    res = th.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
    while len(res.shape) < len(broadcast_shape):
        res = res[..., None]
    return res.expand(broadcast_shape)
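
if __name__ == "__main__":
    # A minimal smoke test (editorial addition, not part of the original
    # port). The zero-predicting "model" below is a toy stand-in for a real
    # UNet, and because this module uses relative imports it must be run as
    # part of its package, e.g. `python -m models.gaussian_diffusion_ori`
    # (the package name is an assumption based on this file's path).
    betas = get_named_beta_schedule("linear", 50)
    diffusion = GaussianDiffusion(
        betas=betas,
        model_mean_type=ModelMeanType.EPSILON,
        model_var_type=ModelVarType.FIXED_SMALL,
        loss_type=LossType.MSE,
    )
    model = lambda x, t: th.zeros_like(x)  # toy: always predicts zero noise
    x_start = th.randn(2, 3, 8, 8)
    t = th.randint(0, diffusion.num_timesteps, (2,))
    losses = diffusion.training_losses(model, x_start, t)
    print("loss:", losses["loss"].mean().item())
    # Pass device explicitly; the toy lambda has no .parameters().
    sample = diffusion.p_sample_loop(model, (2, 3, 8, 8), device=th.device("cpu"))
    print("sample shape:", tuple(sample.shape))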