a W]dz@sjddlZddlZddlmZddlZddlZddlZddlm Z ddlm Z m Z d^dd Z d_d d Z d`d dZdaddZdbddZdcddZGdddeZddddZdedd Zd!d"Zd#d$Zd%d&Zd'd(Zed)d dddfd*d+Zd,d-Zed.Zed/d0dddd1dd2ed3d0ded.d.d4dd2ed5d0d6ed.d7d4dd2ed8d0ddd9d4d d2ed:d0ddd;d4d d2edd0ddd?d4d d2ed@d0dddAd4d d2edBd0ded.dCdDd d2dE ZdfdGdHZ dgdIdJZ!dhdKdLZ"didMdNZ#GdOdPdPej$j%Z&GdQdRdRej$j%Z'djdSdTZ(dkdUdVZ)GdWdXdXej$j%Z*dYdZZ+e,d[d\d]Z-dS)lN)Any)repeat)conv2dconv_transpose2d-C6?{Gz?Mb?c Cs|dkr*tj|d|d|tjdd}n|dkrtj|dtjd|||}|d|tjd}t|d|}||d}d|dd|dd}tj |dd d }nP|d krtj|||tjd}n2|d krtj|||tjdd}nt d |d| S)Nlinearg?dtypecosinerg+?)a_mina_maxZ sqrt_linearsqrtz schedule 'z ' unknown.) torchlinspacefloat64arangetonppicospowclip ValueErrornumpy) devicescheduleZ n_timestepZ linear_startZ linear_endZcosine_sbetas timestepsalphasr$BD:\my_projects\my_pycharmprojects\lama-cleaner-demo\model\utils.pymake_beta_schedule s " r&TcCs||}t|dg||dd}|td|d|d||}|r~td|d|td|d||||fS)Nrrrz'Selected alphas for ddim sampler: a_t: z ; a_(t-1): z&For the chosen value of eta, which is zB, this results in the following sigma_t schedule for ddim sampler )rasarraytolistrprint)Z alphacumsddim_timestepsetaverboser#Z alphas_prevsigmasr$r$r%make_ddim_sampling_parameters$s$& r.cCs|dkr(||}tttd||}n<|dkrTtdt|d|dt}ntd|d|d}|r~t d ||S) Nuniformrquadg?r z/There is no ddim discretization method called ""rz%Selected timesteps for ddim sampler: ) rr'listrangerrastypeintNotImplementedErrorr))Zddim_discr_methodZnum_ddim_timestepsZnum_ddpm_timestepsr,cr*Z steps_outr$r$r%make_ddim_timesteps2s$r8Fcs,fdd}fdd}|r&|S|S)Ncs<tjdgddRdjdgdtdRS)Nrrr)r)rrandnrlenr$rshaper$r%Dznoise_like..cstjdS)Nr9)rr:r$r<r$r%r>Er?r$)r=rrZ repeat_noisenoiser$r<r% noise_likeCsrA'c Cs|d}tt| tjd|tjd|j|d}|dddf|d}tjt |t |gdd}|drtj|t |ddddfgdd}|S) aX Create sinusoidal timestep embeddings. :param timesteps: a 1-D Tensor of N indices, one per batch element. These may be fractional. :param dim: the dimension of the output. :param max_period: controls the minimum frequency of the embeddings. :return: an [N x dim] Tensor of positional embeddings. r r)startendr r9Nrdimr) rexpmathlogrfloat32rfloatcatrsin zeros_like) rr"rF max_periodZ repeat_onlyhalfZfreqsargs embeddingr$r$r%timestep_embeddingIs  (rSr:0yE>cCs||j|dd|S)NT)rFkeepdim)squaremeanrsqrt)xrFepsr$r$r%normalize_2nd_momentbsr[c@sBeZdZdZeedddZeeddddZeddd d ZdS) EasyDictzWConvenience class that behaves like a dict but allows access with the attribute syntax.)namereturncCs*z ||WSty$t|Yn0dSN)KeyErrorAttributeErrorselfr]r$r$r% __getattr__is  zEasyDict.__getattr__N)r]valuer^cCs |||<dSr_r$)rcr]rer$r$r% __setattr__oszEasyDict.__setattr__cCs ||=dSr_r$rbr$r$r% __delattr__rszEasyDict.__delattr__) __name__ __module__ __qualname____doc__strrrdrfrgr$r$r$r%r\fsr\r cs0t|tjsJ|dus$|dks$Jt|}t|dur:|n|j}t|durP|n|j}t|durf|nd}|durt|tjr|jdksJdkr|jksnJ|jd|jksJ|| fddt |jD}t|}|j ||d}t|}|dkr||}|dkr,| | |}|S)zQSlow reference implementation of `bias_act()` using standard TensorFlow ops. Nrrrcsg|]}|krdndqS)rrr$.0irEr$r% r?z!_bias_act_ref..)alpha) isinstancerTensoractivation_funcsrK def_alphadef_gainndimr=reshaper3funcclamp)rYbrFactrqgainrzspecr$rEr% _bias_act_refvs&"  rrefc Cs2t|tjsJ|dvsJt|||||||dS)aFused bias and activation function. Adds bias `b` to activation tensor `x`, evaluates activation function `act`, and scales the result by `gain`. Each of the steps is optional. In most cases, the fused op is considerably more efficient than performing the same calculation using standard PyTorch ops. It supports first and second order gradients, but not third order gradients. Args: x: Input activation tensor. Can be of any shape. b: Bias vector, or `None` to disable. Must be a 1D tensor of the same type as `x`. The shape must be known, and it must match the dimension of `x` corresponding to `dim`. dim: The dimension in `x` corresponding to the elements of `b`. The value of `dim` is ignored if `b` is not specified. act: Name of the activation function to evaluate, or `"linear"` to disable. Can be e.g. `"relu"`, `"lrelu"`, `"tanh"`, `"sigmoid"`, `"swish"`, etc. See `activation_funcs` for a full list. `None` is not allowed. alpha: Shape parameter for the activation function, or `None` to use the default. gain: Scaling factor for the output tensor, or `None` to use default. See `activation_funcs` for the default scaling of each activation function. If unsure, consider specifying 1. clamp: Clamp the output values to `[-clamp, +clamp]`, or `None` to disable the clamping (default). impl: Name of the implementation to use. Can be `"ref"` or `"cuda"` (default). Returns: Tensor of the same shape and datatype as `x`. )rcuda)rYr{rFr|rqr}rz)rrrrsr)rYr{rFr|rqr}rzimplr$r$r%bias_acts rcCsf|dur dSt|tjr"|jdvs&J|jd}|jd}t|}t|}|dkrZ|dks^J||fS)N)rrrr rrr)rrrrsrwr=r5)ffwfhr$r$r%_get_filter_sizes  rcCsdd|jD}|S)NcSsg|] }t|qSr$)r5)rnszr$r$r%rpr?z%_get_weight_shape..)r=)wr=r$r$r%_get_weight_shapesrcCs^t|tr||g}t|ttfs$Jtdd|Ds:J|\}}|dkrR|dksVJ||fS)Ncss|]}t|tVqdSr_rrr5rnrYr$r$r% r?z!_parse_scaling..r)rrr5r2tupleall)scalingsxsyr$r$r%_parse_scalings rcCsrt|tr||g}t|ttfs$Jtdd|Ds:Jt|dkrZ|\}}||||g}|\}}}}||||fS)Ncss|]}t|tVqdSr_rrr$r$r%rr?z!_parse_padding..r )rrr5r2rrr;)paddingZpadxZpadypadx0padx1pady0pady1r$r$r%_parse_paddings    rcpucCs|dur d}tj|tjd}|jdvs*J|dks:J|jdkrN|tj}|durl|jdkoj|dk}|jdkr|s||}|j|rdndksJ|r||}|r| t t |j}|||jd}|j |d}|S) aConvenience function to setup 2D FIR filter for `upfirdn2d()`. Args: f: Torch tensor, numpy array, or python list of the shape `[filter_height, filter_width]` (non-separable), `[filter_taps]` (separable), `[]` (impulse), or `None` (identity). device: Result device (default: cpu). normalize: Normalize the filter so that it retains the magnitude for constant input signal (DC)? (default: True). flip_filter: Flip the filter? (default: False). gain: Overall scaling factor for signal magnitude (default: 1). separable: Return a separable filter? (default: select automatically). Returns: Float32 tensor of the shape `[filter_height, filter_width]` (non-separable) or `[filter_taps]` (separable). Nrr )rrr rr r9) r as_tensorrJrwnumelrnewaxisgersumflipr2r3r)rr normalize flip_filterr} separabler$r$r% setup_filters&     rcsfdd}|S)Ncs t|tjjr|Stt|Sr_)rr collectionsabcIterablerrrYnr$r%parsesz_ntuple..parser$)rrr$rr%_ntuples rr cKs|Sr_r$rY_r$r$r%r>r?r>)ryrurvZcuda_idxrZ has_2nd_gradcKstjj|Sr_)rnn functionalrelurr$r$r%r>r?ycKstjj||Sr_)rrr leaky_relu)rYrqrr$r$r%r> r?g?cKs t|Sr_)rtanhrr$r$r%r>"r?cKs t|Sr_rsigmoidrr$r$r%r>$r?cKstjj|Sr_)rrrelurr$r$r%r>&r?cKstjj|Sr_)rrrselurr$r$r%r>(r?cKstjj|Sr_)rrrsoftplusrr$r$r%r>*r?rcKst||Sr_rrr$r$r%r>,r? rY) r rZlrelurrrrrswishrc Cst|||||||dS)aPad, upsample, filter, and downsample a batch of 2D images. Performs the following sequence of operations for each channel: 1. Upsample the image by inserting N-1 zeros after each pixel (`up`). 2. Pad the image with the specified number of zeros on each side (`padding`). Negative padding corresponds to cropping the image. 3. Convolve the image with the specified 2D FIR filter (`f`), shrinking it so that the footprint of all output pixels lies within the input image. 4. Downsample the image by keeping every Nth pixel (`down`). This sequence of operations bears close resemblance to scipy.signal.upfirdn(). The fused op is considerably more efficient than performing the same calculation using standard PyTorch ops. It supports gradients of arbitrary order. Args: x: Float32/float64/float16 input tensor of the shape `[batch_size, num_channels, in_height, in_width]`. f: Float32 FIR filter of the shape `[filter_height, filter_width]` (non-separable), `[filter_taps]` (separable), or `None` (identity). up: Integer upsampling factor. Can be a single int or a list/tuple `[x, y]` (default: 1). down: Integer downsampling factor. Can be a single int or a list/tuple `[x, y]` (default: 1). padding: Padding with respect to the upsampled image. Can be a single number or a list/tuple `[x, y]` or `[x_before, x_after, y_before, y_after]` (default: 0). flip_filter: False = convolution, True = correlation (default: False). gain: Overall scaling factor for signal magnitude (default: 1). impl: Implementation to use. Can be `'ref'` or `'cuda'` (default: `'cuda'`). Returns: Tensor of the shape `[batch_size, num_channels, out_height, out_width]`. )updownrrr})_upfirdn2d_ref)rYrrrrrr}rr$r$r% upfirdn2d1s*rc CsRt|tjr|jdksJ|dur:tjddgtj|jd}t|tjrP|jdvsTJ|jtjkrf|jrjJ|j \}}} } ||} } ||} }|d|d|d|df\}}}}| ||| d| dg}tj j |d| dddd| dg}| ||| | | | g}tj j |t|dt|dt|dt|dg}|ddddt| d|j dt| dt| d|j dt| df}|||jd}||j}|s|tt|j}|tjtjf|dgdg|j}|jdkrt|||d }n(t||d|d }t||d|d }|dddddd|dd| f}|S) zOSlow reference implementation of `upfirdn2d()` using standard PyTorch ops. rNr)r rrrr r)inputweightgroups)rrrrsrwonesrJrr requires_gradr=rxrrpadmaxrrr2r3rrrr unsqueeze)rYrrrrrr} batch_size num_channelsZ in_heightZin_widthupxupydownxdownyrrrrr$r$r%r^s2  $$0T & $rc Cst|\}}||||f\} } } } t|\} }| | |dd| | |d| ||dd| ||dg}t|||||||dS)aEDownsample a batch of 2D images using the given 2D FIR filter. By default, the result is padded so that its shape is a fraction of the input. User-specified padding is applied on top of that, with negative values indicating cropping. Pixels outside the image are assumed to be zero. Args: x: Float32/float64/float16 input tensor of the shape `[batch_size, num_channels, in_height, in_width]`. f: Float32 FIR filter of the shape `[filter_height, filter_width]` (non-separable), `[filter_taps]` (separable), or `None` (identity). down: Integer downsampling factor. Can be a single int or a list/tuple `[x, y]` (default: 1). padding: Padding with respect to the input. Can be a single number or a list/tuple `[x, y]` or `[x_before, x_after, y_before, y_after]` (default: 0). flip_filter: False = convolution, True = correlation (default: False). gain: Overall scaling factor for signal magnitude (default: 1). impl: Implementation to use. Can be `'ref'` or `'cuda'` (default: `'cuda'`). Returns: Tensor of the shape `[batch_size, num_channels, out_height, out_width]`. rr )rrrr}r)rrr)rYrrrrr}rrrrrrrrrpr$r$r% downsample2ds  rc Cst|\}}t|\} } } } t|\} }| | |dd| | |d| ||dd| ||dg}t|||||||||dS)aBUpsample a batch of 2D images using the given 2D FIR filter. By default, the result is padded so that its shape is a multiple of the input. User-specified padding is applied on top of that, with negative values indicating cropping. Pixels outside the image are assumed to be zero. Args: x: Float32/float64/float16 input tensor of the shape `[batch_size, num_channels, in_height, in_width]`. f: Float32 FIR filter of the shape `[filter_height, filter_width]` (non-separable), `[filter_taps]` (separable), or `None` (identity). up: Integer upsampling factor. Can be a single int or a list/tuple `[x, y]` (default: 1). padding: Padding with respect to the output. Can be a single number or a list/tuple `[x, y]` or `[x_before, x_after, y_before, y_after]` (default: 0). flip_filter: False = convolution, True = correlation (default: False). gain: Overall scaling factor for signal magnitude (default: 1). impl: Implementation to use. Can be `'ref'` or `'cuda'` (default: `'cuda'`). Returns: Tensor of the shape `[batch_size, num_channels, out_height, out_width]`. rr )rrrr}r)rrrr)rYrrrrr}rrrrrrrrrrr$r$r% upsample2ds  rcs&eZdZdfdd ZddZZS)MinibatchStdLayerrcst||_||_dSr_)super__init__ group_sizer)rcrr __class__r$r%rs zMinibatchStdLayer.__init__c Cs|j\}}}}|jdur2tt|jt|n|}|j}||}||d||||} | | jdd} | jdd} | d } | jgdd} | d|dd} | |d||} tj || gdd}|S)NrrrErT)r rrr) r=rrminrrrxrWrVrrrL) rcrYNCHWGFr7rr$r$r%forwards( zMinibatchStdLayer.forward)rrhrirjrr __classcell__r$r$rr%rsrcs&eZdZd fdd ZddZZS) FullyConnectedLayerTr rrcslttjt||g||_|rFtjt|gt |nd|_ ||_ |t ||_ ||_dSr_)rrrr Parameterr:rfullrrJbias activationr weight_gain bias_gain)rc in_features out_featuresrrZ lr_multiplierZ bias_initrr$r%rs  &zFullyConnectedLayer.__init__cs|j|j}|j}|dur.|jdkr.||j}|jdkrr|durr||fddtj D}n&|t ||jj dd}|S)Nrr cs"g|]}|jdkrdndqS)rr)rwrmrr$r%rpr?z/FullyConnectedLayer.forward..)r|rF) rrrrrmatmultrxr3rwr)rcrYrr{outr$rr%rs  $zFullyConnectedLayer.forward)Tr rrrr$r$rr%rs rc Cs.t|\}}} } |s"|ddg}| dkr| dkr|dkr|dddgdfvr|s|ddkrt||dkr|dkr|dkr|j} |dd|| d|dg}|| d|| d| dg}n*|jtj d }|jtj d }t |||d }|jtj d S|rt nt } | |||||d S) zTWrapper for the underlying `conv2d()` and `conv_transpose2d()` implementations. r rrr)rr@rr memory_format)r)striderr) rrrrr=squeezerxrrcontiguous_formatr channels_lastr) rYrrrr transpose flip_weight out_channelsin_channels_per_groupkhkwin_shapeopr$r$r%_conv2d_wrappers8"$ rc  Cs,t|tjr|jdksJt|tjr<|jdkr<|j|jks@J|dusnt|tjrj|jdvrj|jtjksnJt|tr|dksJt|tr|dksJt|\} } } } t|\} }||||f\}}}}|dkr|| |dd7}|| |d7}|||dd7}|||d7}|dkrn|| |dd7}|| |d7}|||dd7}|||d7}| dkr| dkr|dkr|dkrt |||||||g|d}t ||||d}|S| dkr | dkr |dkr |dkr t ||||d}t |||||||g|d|d}|S|dkrb|dkrbt ||||||g|d }t |||||d }|S|dkr~|dkr| d d}n:| || || | | }| dd}| || | || | }|| d8}|| |8}|| d8}|| |8}t t| | d }t t| | d }t |||||g|d | d }t ||||||||||g|d|d}|dkrzt ||||d}|S|dkr|dkr||kr||kr|d kr|d krt ||||g||dSt ||dkr|nd|||||g|d|d}t ||||d}|dkr(t ||||d}|S)a2D convolution with optional up/downsampling. Padding is performed only once at the beginning, not between the operations. Args: x: Input tensor of shape `[batch_size, in_channels, in_height, in_width]`. w: Weight tensor of shape `[out_channels, in_channels//groups, kernel_height, kernel_width]`. f: Low-pass filter for up/downsampling. Must be prepared beforehand by calling setup_filter(). None = identity (default). up: Integer upsampling factor (default: 1). down: Integer downsampling factor (default: 1). padding: Padding with respect to the upsampled image. Can be a single number or a list/tuple `[x, y]` or `[x_before, x_after, y_before, y_after]` (default: 0). groups: Split input channels into N groups (default: 1). flip_weight: False = convolution, True = correlation (default: True). flip_filter: False = convolution, True = correlation (default: False). Returns: Tensor of the shape `[batch_size, num_channels, out_height, out_width]`. rNrrr )rYrrrr)rYrrr)rYrrrr}r)rYrrr)rYrrrrrT)rYrrrrrr)rYrrr}r)rYrrr)rYrrrr)rrrrsrwr rJr5rrrrrrxrr)rYrrrrrrrrrrrrrrZpx0px1Zpy0Zpy1ZpxtZpytr$r$r%conv2d_resample3sz&.   ((        & (& rcs<eZdZddddgddddffdd Zd d d ZZS) Conv2dLayerTr r)rrrrNFc st||_||_||_|dt|| |_|d|_dt ||d|_ t |j |_| rjtjntj} t||||gj| d} |rt|gnd}| rtj| |_|durtj|nd|_n(|d| |dur|d|nd|_dS)Nresample_filterr rrrr)rrrrrregister_bufferr conv_clamprrrrrtrvact_gainrrrr:rzerosrrrr)rc in_channelsr kernel_sizerrrrrr rZ trainablerrrr$r%rs&    zConv2dLayer.__init__cCsd|j|j}t|||j|j|j|jd}|j|}|jdurF|j|nd}t ||j |j ||d}|S)N)rYrrrrr)r|r}rz) rrrrrrrr r rrr)rcrYr}rr Z act_clamprr$r$r%rs  zConv2dLayer.forward)rrr$r$rr%rs$rcCs"tjrtjtjdSr_)rr is_available empty_cache ipc_collectr$r$r$r%torch_gcs  rseedcCs0t|tj|t|tj|dSr_)randomrrr manual_seedrmanual_seed_allrr$r$r%set_seeds   r)rrr)T)T)F)rBF)rrT)Nrr NNN)Nrr NNNr)rrrFrr)rrrFr)r rFrr)r rFrr)rrrFT)NrrrrTF).rHrtypingrrrrr itertoolsrrrr&r.r8rArSr[dictr\rrrrrrrrrZ to_2tuplerrtrrrrrModulerrrrrrr5rr$r$r$r%sx         #  /   - / ( (!  f0