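from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import torch
import torch.nn as nn
import numpy as np
from joblib import Parallel, delayed
from pesq import pesq
import os
import sys
import librosa
import torchaudio

MAX_WAV_VALUE = 32768.0
EPS = 1e-6


def read_and_config_file(input_path, decode=0):
    """Reads input paths from a file or directory and configures them for processing.

    Args:
        input_path (str): Path to the input directory or list file.
        decode (int): 1 to collect input paths for decoding, 0 to read a
            training list file (default is 0).

    Returns:
        list: In decode mode, a list of input paths. For a directory these are
        all ``.wav`` files it contains, or all ``.flac`` files if there are no
        ``.wav`` files. For a list file these are the first column of each
        line. Otherwise, a list of dicts with ``inputs`` and ``labels`` keys,
        plus a float ``duration`` key when a line has a third column.
    """
    processed_list = []

    if decode:
        if os.path.isdir(input_path):
            processed_list = librosa.util.find_files(input_path, ext="wav")
            if len(processed_list) == 0:
                processed_list = librosa.util.find_files(input_path, ext="flac")
        else:
            with open(input_path) as fid:
                for line in fid:
                    path_s = line.strip().split()
                    if path_s:  # skip blank lines
                        processed_list.append(path_s[0])
        return processed_list

    with open(input_path) as fid:
        for line in fid:
            tmp_paths = line.strip().split()
            if len(tmp_paths) == 3:
                sample = {'inputs': tmp_paths[0], 'labels': tmp_paths[1],
                          'duration': float(tmp_paths[2])}
            elif len(tmp_paths) == 2:
                sample = {'inputs': tmp_paths[0], 'labels': tmp_paths[1]}
            else:
                # Skip blank or malformed lines. Otherwise the previous sample is
                # appended again, or a NameError is raised on the first line.
                continue
            processed_list.append(sample)
    return processed_list


def load_checkpoint(checkpoint_path, use_cuda):
    """Loads the model checkpoint from the specified path.

    Args:
        checkpoint_path (str): Path to the checkpoint file.
        use_cuda (bool): If True, tensors are restored to the devices they were
            saved from. If False, all tensors are mapped to the CPU.

    Returns:
        dict: The loaded checkpoint containing model parameters.
    """
    if use_cuda:
        checkpoint = torch.load(checkpoint_path)
    else:
        checkpoint = torch.load(checkpoint_path, map_location=lambda storage, loc: storage)
    return checkpoint


def get_learning_rate(optimizer):
    """Retrieves the current learning rate from the optimizer.

    Args:
        optimizer (torch.optim.Optimizer): The optimizer instance.

    Returns:
        float: The learning rate of the first parameter group.
    """
    return optimizer.param_groups[0]["lr"]


def reload_for_eval(model, checkpoint_dir, use_cuda):
    """Reloads a model for evaluation from the specified checkpoint directory.

    The checkpoint file name is read from ``last_best_checkpoint`` in
    ``checkpoint_dir``, falling back to ``last_checkpoint``. A parameter is
    copied only when a checkpoint entry with the same shape is found under its
    own name, with any ``module.`` prefix removed, or with ``module.``
    prepended.

    Args:
        model (nn.Module): The model to be reloaded.
        checkpoint_dir (str): Directory containing checkpoints.
        use_cuda (bool): Unused. The checkpoint is always loaded onto the CPU
            first.

    Returns:
        None
    """
    print('Reloading from: {}'.format(checkpoint_dir))
    best_name = os.path.join(checkpoint_dir, 'last_best_checkpoint')
    ckpt_name = os.path.join(checkpoint_dir, 'last_checkpoint')
    if os.path.isfile(best_name):
        name = best_name
    elif os.path.isfile(ckpt_name):
        name = ckpt_name
    else:
        print('Warning: No existing checkpoint or best_model found!')
        return

    with open(name, 'r') as f:
        model_name = f.readline().strip()
    checkpoint_path = os.path.join(checkpoint_dir, model_name)
    print('Checkpoint path: {}'.format(checkpoint_path))
    checkpoint = torch.load(checkpoint_path, map_location=lambda storage, loc: storage)

    # Accept both full training checkpoints and bare state dicts.
    if 'model' in checkpoint:
        pretrained_model = checkpoint['model']
    else:
        pretrained_model = checkpoint

    state = model.state_dict()
    for key in state.keys():
        stripped = key.replace('module.', '')
        prefixed = 'module.' + key
        if key in pretrained_model and state[key].shape == pretrained_model[key].shape:
            state[key] = pretrained_model[key]
        elif stripped in pretrained_model and state[key].shape == pretrained_model[stripped].shape:
            state[key] = pretrained_model[stripped]
        elif prefixed in pretrained_model and state[key].shape == pretrained_model[prefixed].shape:
            state[key] = pretrained_model[prefixed]
    model.load_state_dict(state)
    print('=> Reloaded well-trained model {} for decoding.'.format(model_name))


def reload_model(model, optimizer, checkpoint_dir, use_cuda=True, strict=True):
    """Reloads the model and optimizer state from a checkpoint.

    The checkpoint file name is read from the ``checkpoint`` file in
    ``checkpoint_dir``.

    Args:
        model (nn.Module): The model to be reloaded.
        optimizer (torch.optim.Optimizer): The optimizer to be reloaded.
        checkpoint_dir (str): Directory containing checkpoints.
        use_cuda (bool): Flag indicating whether to use CUDA.
        strict (bool): If True, requires keys in state_dict to match exactly.

    Returns:
        tuple: ``(epoch, step)`` stored in the checkpoint, or ``(0, 0)`` when no
        checkpoint exists.
    """
    ckpt_name = os.path.join(checkpoint_dir, 'checkpoint')
    if os.path.isfile(ckpt_name):
        with open(ckpt_name, 'r') as f:
            model_name = f.readline().strip()
        checkpoint_path = os.path.join(checkpoint_dir, model_name)
        checkpoint = load_checkpoint(checkpoint_path, use_cuda)
        model.load_state_dict(checkpoint['model'], strict=strict)
        optimizer.load_state_dict(checkpoint['optimizer'])
        epoch = checkpoint['epoch']
        step = checkpoint['step']
        print('=> Reloaded previous model and optimizer.')
    else:
        print('[!] Checkpoint directory is empty. Train a new model ...')
        epoch = 0
        step = 0
    return epoch, step


def save_checkpoint(model, optimizer, epoch, step, checkpoint_dir, mode='checkpoint'):
    """Saves the model and optimizer state to a checkpoint file.

    Args:
        model (nn.Module): The model to be saved.
        optimizer (torch.optim.Optimizer): The optimizer to be saved.
        epoch (int): Current epoch number.
        step (int): Current training step number.
        checkpoint_dir (str): Directory to save the checkpoint.
        mode (str): Name of the pointer file in ``checkpoint_dir`` that records
            the latest checkpoint file name (default is 'checkpoint').

    Returns:
        None
    """
    checkpoint_path = os.path.join(checkpoint_dir, 'model.ckpt-{}-{}.pt'.format(epoch, step))
    torch.save({'model': model.state_dict(),
                'optimizer': optimizer.state_dict(),
                'epoch': epoch,
                'step': step}, checkpoint_path)
    # Record the file name so reload_model() can locate the latest checkpoint.
    with open(os.path.join(checkpoint_dir, mode), 'w') as f:
        f.write('model.ckpt-{}-{}.pt'.format(epoch, step))
    print('=> Saved checkpoint:', checkpoint_path)


def setup_lr(opt, lr):
    """Sets the learning rate for all parameter groups in the optimizer.

    Args:
        opt (torch.optim.Optimizer): The optimizer whose learning rate is set.
        lr (float): The new learning rate.

    Returns:
        None
    """
    for param_group in opt.param_groups:
        param_group['lr'] = lr


def pesq_loss(clean, noisy, sr=16000):
    """Calculates the wide-band PESQ (Perceptual Evaluation of Speech Quality) score.

    Args:
        clean (ndarray): The clean reference signal.
        noisy (ndarray): The degraded signal.
        sr (int): Sample rate of the audio signals (default is 16000 Hz).

    Returns:
        float: The PESQ score, or -1 if the computation fails.
    """
    try:
        pesq_score = pesq(sr, clean, noisy, 'wb')
    except Exception:
        pesq_score = -1
    return pesq_score


def batch_pesq(clean, noisy, device='cuda'):
    """Computes the PESQ scores for batches of clean and noisy audio signals in parallel.

    Args:
        clean (list of ndarray): List of clean audio signals.
        noisy (list of ndarray): List of noisy audio signals.
        device (str): Device for the returned tensor (default is 'cuda').

    Returns:
        torch.FloatTensor: Scores normalized as ``(score - 1) / 3.5``, or None
        if any score could not be computed (-1).
    """
    pesq_score = Parallel(n_jobs=-1)(delayed(pesq_loss)(c, n) for c, n in zip(clean, noisy))
    pesq_score = np.array(pesq_score)
    if -1 in pesq_score:
        return None
    pesq_score = (pesq_score - 1) / 3.5
    return torch.FloatTensor(pesq_score).to(device)


def power_compress(x):
    """Compresses the magnitude of a complex spectrogram with exponent 0.3, keeping the phase.

    Args:
        x (torch.Tensor): Real tensor whose last dimension (size 2) holds the
            real and imaginary components.

    Returns:
        torch.Tensor: Compressed real and imaginary parts stacked along dim 1.
    """
    real = x[..., 0]
    imag = x[..., 1]
    spec = torch.complex(real, imag)
    mag = torch.abs(spec)
    phase = torch.angle(spec)
    mag = mag ** 0.3
    real_compress = mag * torch.cos(phase)
    imag_compress = mag * torch.sin(phase)
    return torch.stack([real_compress, imag_compress], 1)


def power_uncompress(real, imag):
    """Undoes power_compress() by raising the magnitude to 1 / 0.3, keeping the phase.

    Args:
        real (torch.Tensor): Compressed real component.
        imag (torch.Tensor): Compressed imaginary component.

    Returns:
        torch.Tensor: Uncompressed real and imaginary parts stacked along the
        last dimension.
    """
    spec = torch.complex(real, imag)
    mag = torch.abs(spec)
    phase = torch.angle(spec)
    mag = mag ** (1.0 / 0.3)
    real_uncompress = mag * torch.cos(phase)
    imag_uncompress = mag * torch.sin(phase)
    return torch.stack([real_uncompress, imag_uncompress], -1)


def stft(x, args, center=False):
    """Computes the Short-Time Fourier Transform (STFT) of an audio signal.

    Args:
        x (torch.Tensor): Input audio signal.
        args (Namespace): Configuration providing ``win_type`` ('hamming' or
            'hanning'), ``win_len``, ``win_inc`` and ``fft_len``.
        center (bool): Whether to center the window.

    Returns:
        torch.Tensor: Real tensor with a trailing dimension of size 2 (real,
        imaginary), or None if ``win_type`` is not supported.
    """
    win_type = args.win_type
    win_len = args.win_len
    win_inc = args.win_inc
    fft_len = args.fft_len

    if win_type == 'hamming':
        window = torch.hamming_window(win_len, periodic=False).to(x.device)
    elif win_type == 'hanning':
        window = torch.hann_window(win_len, periodic=False).to(x.device)
    else:
        print(f"In STFT, {win_type} is not supported!")
        return
    return torch.stft(x, fft_len, win_inc, win_len, center=center, window=window,
                      return_complex=False)


def istft(x, args, slen=None, center=False, normalized=False, onsided=None, return_complex=False):
    """Computes the inverse Short-Time Fourier Transform (ISTFT) of a spectrogram.

    Args:
        x (torch.Tensor): Input spectrogram, either complex-valued or real with
            a trailing dimension of size 2 holding the real and imaginary parts.
        args (Namespace): Configuration providing ``win_type`` ('hamming' or
            'hanning'), ``win_len``, ``win_inc`` and ``fft_len``.
        slen (int, optional): Length of the output signal in samples.
        center (bool): Whether the window was centered in the forward STFT.
        normalized (bool): Whether the forward STFT was normalized.
        onsided (bool, optional): Whether the spectrogram is one-sided. Passed
            to ``torch.istft`` as ``onesided``.
        return_complex (bool): If True, returns a complex-valued output.

    Returns:
        torch.Tensor: The reconstructed audio signal, or None if ``win_type``
        is not supported.
    """
    win_type = args.win_type
    win_len = args.win_len
    win_inc = args.win_inc
    fft_len = args.fft_len

    if win_type == 'hamming':
        window = torch.hamming_window(win_len, periodic=False).to(x.device)
    elif win_type == 'hanning':
        window = torch.hann_window(win_len, periodic=False).to(x.device)
    else:
        print(f"In ISTFT, {win_type} is not supported!")
        return

    try:
        output = torch.istft(x, n_fft=fft_len, hop_length=win_inc, win_length=win_len,
                             window=window, center=center, normalized=normalized,
                             onesided=onsided, length=slen, return_complex=return_complex)
    except Exception:
        # Newer PyTorch releases reject real (..., 2) input; retry on a complex view.
        x_complex = torch.view_as_complex(x)
        output = torch.istft(x_complex, n_fft=fft_len, hop_length=win_inc, win_length=win_len,
                             window=window, center=center, normalized=normalized,
                             onesided=onsided, length=slen, return_complex=return_complex)
    return output


def compute_fbank(audio_in, args):
    """Computes Kaldi-compatible filter bank features from an audio signal.

    Args:
        audio_in (torch.Tensor): Input audio signal.
        args (Namespace): Configuration providing ``win_len`` and ``win_inc``
            (in samples), ``sampling_rate``, ``num_mels`` and ``win_type``.

    Returns:
        torch.Tensor: Computed filter bank features.
    """
    # Kaldi fbank expects frame length and shift in milliseconds.
    frame_length = args.win_len / args.sampling_rate * 1000
    frame_shift = args.win_inc / args.sampling_rate * 1000
    return torchaudio.compliance.kaldi.fbank(audio_in, dither=1.0, frame_length=frame_length,
                                             frame_shift=frame_shift, num_mel_bins=args.num_mels,
                                             sample_frequency=args.sampling_rate,
                                             window_type=args.win_type)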
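

# Illustrative sketch, not part of the original module: a scalar analogue of
# power_compress() / power_uncompress() that uses only the standard-library
# `cmath` module. It assumes a single complex STFT bin held in a Python
# complex number. The helper name _example_power_law_roundtrip is hypothetical.
# Compression keeps the phase and raises the magnitude to 0.3. Uncompression
# applies the exponent 1 / 0.3, so the round trip restores the original bin up
# to floating-point error.
import cmath


def _example_power_law_roundtrip(z, exponent=0.3):
    """Return (compressed, restored) for one complex value ``z`` (hypothetical helper)."""
    mag, phase = cmath.polar(z)                      # |z| and angle(z)
    compressed = cmath.rect(mag ** exponent, phase)  # |z| ** 0.3, same phase
    c_mag, c_phase = cmath.polar(compressed)
    restored = cmath.rect(c_mag ** (1.0 / exponent), c_phase)  # undo the compression
    return compressed, restored


# For example, _example_power_law_roundtrip(3 + 4j) yields a bin of magnitude
# 5 ** 0.3 with the phase of 3 + 4j, and a restored value close to 3 + 4j.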
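

# Illustrative sketch, not part of the original module: the key-matching order
# used by reload_for_eval(), reduced to plain strings so it runs without torch.
# The helper _example_match_checkpoint_keys is hypothetical. It omits the
# tensor-shape comparison that reload_for_eval() also performs. For each model
# key it tries three candidates in order:
#   1. the key unchanged,
#   2. the key with any "module." prefix (as added by nn.DataParallel) removed,
#   3. the key with "module." prepended.
def _example_match_checkpoint_keys(model_keys, checkpoint_keys):
    """Map each model key to the checkpoint key it would be loaded from (hypothetical helper)."""
    checkpoint_keys = set(checkpoint_keys)
    mapping = {}
    for key in model_keys:
        stripped = key.replace("module.", "")
        prefixed = "module." + key
        if key in checkpoint_keys:
            mapping[key] = key
        elif stripped in checkpoint_keys:
            mapping[key] = stripped
        elif prefixed in checkpoint_keys:
            mapping[key] = prefixed
        # Keys with no match are left out and keep the model's current values.
    return mapping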
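

# Illustrative sketch, not part of the original module: the post-processing
# step of batch_pesq() applied to plain Python floats instead of a NumPy array.
# The helper name _example_normalize_pesq is hypothetical. A batch that contains
# the -1 failure marker returned by pesq_loss() is rejected. Otherwise each
# score is mapped with (score - 1) / 3.5, so 1.0 becomes 0.0 and 4.5 becomes 1.0.
def _example_normalize_pesq(scores):
    """Return normalized scores, or None if any score is -1 (hypothetical helper)."""
    if -1 in scores:
        return None
    return [(score - 1) / 3.5 for score in scores]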
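

# Illustrative sketch, not part of the original module: compute_fbank() hands
# the window length and hop to Kaldi-style fbank in milliseconds. It converts
# them from samples with samples / sampling_rate * 1000. The helper name and
# the example values (400- and 100-sample windows at 16 kHz) are assumptions
# chosen for illustration.
def _example_samples_to_ms(num_samples, sampling_rate):
    """Convert a length in samples to milliseconds (hypothetical helper)."""
    return num_samples / sampling_rate * 1000


# With win_len=400 and win_inc=100 at 16 kHz, the frame is about 25 ms long
# and the shift is about 6.25 ms.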