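# /proj/MR_dataset/benson/CosyVoice-main-aug-19/cosyvoice/flow/flow_matching.py
import torch
import torch.nn.functional as F
from matcha.models.components.flow_matching import BASECFM


class ConditionalCFM(BASECFM):
    def __init__(self, in_channels, cfm_params, n_spks=1, spk_emb_dim=64, estimator: torch.nn.Module = None):
        super().__init__(
            n_feats=in_channels,
            cfm_params=cfm_params,
            n_spks=n_spks,
            spk_emb_dim=spk_emb_dim,
        )
        self.t_scheduler = cfm_params.t_scheduler
        self.training_cfg_rate = cfm_params.training_cfg_rate
        self.inference_cfg_rate = cfm_params.inference_cfg_rate
        # Estimator input width including the speaker embedding; computed but not used further.
        in_channels = in_channels + (spk_emb_dim if n_spks > 0 else 0)
        # Velocity network called as estimator(x, mask, mu, t, spks, cond).
        self.estimator = estimator

    @torch.inference_mode()
    def forward(self, mu, mask, n_timesteps, temperature=1.0, spks=None, cond=None):
        """Generate a mel-spectrogram by integrating the flow ODE from Gaussian noise.

        Args:
            mu (torch.Tensor): output of encoder
                shape: (batch_size, n_feats, mel_timesteps)
            mask (torch.Tensor): output_mask
                shape: (batch_size, 1, mel_timesteps)
            n_timesteps (int): number of Euler steps
            temperature (float, optional): scale of the initial noise. Defaults to 1.0.
            spks (torch.Tensor, optional): speaker embeddings. Defaults to None.
                shape: (batch_size, spk_emb_dim)
            cond (torch.Tensor, optional): extra conditioning forwarded to the estimator. Defaults to None.

        Returns:
            sample: generated mel-spectrogram
                shape: (batch_size, n_feats, mel_timesteps)
        """
        z = torch.randn_like(mu) * temperature
        t_span = torch.linspace(0, 1, n_timesteps + 1, device=mu.device)
        if self.t_scheduler == 'cosine':
            # Warp t -> 1 - cos(t * pi / 2): smaller steps near t = 0, larger near t = 1.
            t_span = 1 - torch.cos(t_span * 0.5 * torch.pi)
        return self.solve_euler(z, t_span=t_span, mu=mu, mask=mask, spks=spks, cond=cond)

    def solve_euler(self, x, t_span, mu, mask, spks, cond):
        """Fixed-step Euler solver for the flow ODE.

        Args:
            x (torch.Tensor): random noise
            t_span (torch.Tensor): n_timesteps interpolated
                shape: (n_timesteps + 1,)
            mu (torch.Tensor): output of encoder
                shape: (batch_size, n_feats, mel_timesteps)
            mask (torch.Tensor): output_mask
                shape: (batch_size, 1, mel_timesteps)
            spks (torch.Tensor or None): speaker embeddings
                shape: (batch_size, spk_emb_dim)
            cond (torch.Tensor or None): extra conditioning forwarded to the estimator

        Returns:
            sample: state at the last time point
                shape: (batch_size, n_feats, mel_timesteps)
        """
        t, dt = t_span[0], t_span[1] - t_span[0]

        sol = []
        for step in range(1, len(t_span)):
            dphi_dt = self.estimator(x, mask, mu, t, spks, cond)
            # Classifier-free guidance: query the estimator again with all conditioning
            # zeroed, then extrapolate away from that unconditional prediction.
            if self.inference_cfg_rate > 0:
                cfg_dphi_dt = self.estimator(
                    x, mask,
                    torch.zeros_like(mu), t,
                    torch.zeros_like(spks) if spks is not None else None,
                    torch.zeros_like(cond) if cond is not None else None,
                )
                dphi_dt = (1.0 + self.inference_cfg_rate) * dphi_dt - self.inference_cfg_rate * cfg_dphi_dt
            x = x + dt * dphi_dt
            t = t + dt
            sol.append(x)
            if step < len(t_span) - 1:
                # t_span may be non-uniform (cosine schedule), so recompute the step size.
                dt = t_span[step + 1] - t

        return sol[-1]

    def compute_loss(self, x1, mask, mu, spks=None, cond=None):
        """Computes the conditional flow matching loss.

        Args:
            x1 (torch.Tensor): Target
                shape: (batch_size, n_feats, mel_timesteps)
            mask (torch.Tensor): target mask
                shape: (batch_size, 1, mel_timesteps)
            mu (torch.Tensor): output of encoder
                shape: (batch_size, n_feats, mel_timesteps)
            spks (torch.Tensor, optional): speaker embeddings. Defaults to None.
                shape: (batch_size, spk_emb_dim)
            cond (torch.Tensor, optional): extra conditioning forwarded to the estimator. Defaults to None.

        Returns:
            loss: conditional flow matching loss
            y: sample on the conditional flow path
                shape: (batch_size, n_feats, mel_timesteps)
        """
        b = mu.shape[0]

        # One random time per utterance, warped by the same schedule as at inference.
        t = torch.rand([b, 1, 1], device=mu.device, dtype=mu.dtype)
        if self.t_scheduler == 'cosine':
            t = 1 - torch.cos(t * 0.5 * torch.pi)
        # Sample noise x0 ~ N(0, I).
        z = torch.randn_like(x1)

        # Point on the conditional path from z (t = 0) to x1 (t = 1) and its target velocity.
        y = (1 - (1 - self.sigma_min) * t) * z + t * x1
        u = x1 - (1 - self.sigma_min) * z

        # Drop all conditioning for a random subset of the batch so the same estimator
        # also learns the unconditional field used for guidance at inference.
        if self.training_cfg_rate > 0:
            cfg_mask = torch.rand(b, device=x1.device) > self.training_cfg_rate
            mu = mu * cfg_mask.view(-1, 1, 1)
            if spks is not None:
                spks = spks * cfg_mask.view(-1, 1)
            if cond is not None:
                cond = cond * cfg_mask.view(-1, 1, 1)

        pred = self.estimator(y, mask, mu, t.squeeze(), spks, cond)
        # Masked squared error, normalised by the number of valid (frame, feature) entries.
        loss = F.mse_loss(pred * mask, u * mask, reduction="sum") / (torch.sum(mask) * u.shape[1])
        return loss, y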
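

# Illustrative sketch, not part of CosyVoice or Matcha-TTS: a scalar, pure-Python
# mirror of the sampling and training math in ConditionalCFM, so the cosine
# schedule, the non-uniform Euler loop and the classifier-free-guidance mix can be
# checked without torch or a trained estimator. Assumptions: every `_sketch_*` name
# is a hypothetical helper, values are plain floats instead of tensors, and the
# "estimator" is any callable velocity(x, t).
import math


def _sketch_t_span(n_timesteps, scheduler="cosine"):
    """Hypothetical helper: n_timesteps + 1 times in [0, 1], as built in forward()."""
    ts = [i / n_timesteps for i in range(n_timesteps + 1)]
    if scheduler == "cosine":
        # Same warp as forward(): t -> 1 - cos(t * pi / 2).
        ts = [1 - math.cos(t * 0.5 * math.pi) for t in ts]
    return ts


def _sketch_solve_euler(x, t_span, velocity, cfg_rate=0.0, uncond_velocity=None):
    """Hypothetical helper mirroring solve_euler() on scalars.

    velocity(x, t) stands in for the conditional estimator call and
    uncond_velocity(x, t) for the call with mu/spks/cond zeroed.
    """
    t, dt = t_span[0], t_span[1] - t_span[0]
    for step in range(1, len(t_span)):
        v = velocity(x, t)
        if cfg_rate > 0:
            # Classifier-free guidance: (1 + w) * v_cond - w * v_uncond.
            v = (1.0 + cfg_rate) * v - cfg_rate * uncond_velocity(x, t)
        x = x + dt * v
        t = t + dt
        if step < len(t_span) - 1:
            # Non-uniform grid: the next step size comes from t_span.
            dt = t_span[step + 1] - t
    return x


def _sketch_cfm_pair(z, x1, t, sigma_min=0.0):
    """Hypothetical helper: the (y, u) pair built in compute_loss(), on scalars."""
    y = (1 - (1 - sigma_min) * t) * z + t * x1
    u = x1 - (1 - sigma_min) * z  # equals dy/dt, constant along the path
    return y, u


# Toy check: with sigma_min = 0 the exact conditional velocity toward a fixed
# target x1 is (x1 - x) / (1 - t). Starting from the noise value z, each Euler step
# stays on the straight line y(t), where that velocity equals the constant
# u = x1 - z, so
#   _sketch_solve_euler(-1.0, _sketch_t_span(10), lambda x, t: (3.0 - x) / (1.0 - t))
# lands on 3.0 up to floating-point rounding, even on the non-uniform cosine grid.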