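from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import torch
import torch.nn as nn
import numpy as np
from joblib import Parallel, delayed
from pesq import pesq
import os
import sys
import librosa
import torchaudio

# Maximum absolute value of 16-bit PCM audio, used to normalize waveforms.
MAX_WAV_VALUE = 32768.0
# Small constant for numerical stability.
EPS = 1e-6


def read_and_config_file(input_path, decode=0):
    """Reads input paths from a file or directory and configures them for processing.

    In decode mode, ``input_path`` may be a directory, which is scanned for
    ``.wav`` files (falling back to ``.flac`` if none are found), or a list file
    whose first whitespace-separated column on each line is an audio path.
    Otherwise, ``input_path`` must be a list file whose lines contain
    ``<input_path> <label_path>`` with an optional third ``<duration>`` column.

    Args:
        input_path (str): Path to the input directory or list file.
        decode (int): Flag indicating whether to collect paths for decoding
            (1) or to read input/label pairs (0).

    Returns:
        list: In decode mode, a list of audio paths; otherwise, a list of
        dictionaries with the keys ``'inputs'``, ``'labels'`` and, when the
        third column is present, ``'duration'``.
    """
    processed_list = []

    if decode:
        if os.path.isdir(input_path):
            # Collect .wav files; fall back to .flac if the directory has none.
            processed_list = librosa.util.find_files(input_path, ext="wav")
            if len(processed_list) == 0:
                processed_list = librosa.util.find_files(input_path, ext="flac")
        else:
            # Read one audio path per line (first column) from a list file.
            with open(input_path) as fid:
                for line in fid:
                    path_s = line.strip().split()
                    if not path_s:
                        continue  # Skip blank lines.
                    processed_list.append(path_s[0])
        return processed_list

    # Read input/label pairs (and an optional duration) from a list file.
    with open(input_path) as fid:
        for line in fid:
            tmp_paths = line.strip().split()
            if len(tmp_paths) == 3:
                sample = {'inputs': tmp_paths[0], 'labels': tmp_paths[1],
                          'duration': float(tmp_paths[2])}
            elif len(tmp_paths) == 2:
                sample = {'inputs': tmp_paths[0], 'labels': tmp_paths[1]}
            else:
                # Skip blank or malformed lines instead of reusing the previous sample.
                continue
            processed_list.append(sample)
    return processed_list


def load_checkpoint(checkpoint_path, use_cuda):
    """Loads a model checkpoint from the specified path.

    Args:
        checkpoint_path (str): Path to the checkpoint file.
        use_cuda (bool): If True, tensors are restored to the devices they were
            saved from; if False, all tensors are mapped to CPU memory.

    Returns:
        dict: The loaded checkpoint containing model parameters.
    """
    if use_cuda:
        checkpoint = torch.load(checkpoint_path)
    else:
        checkpoint = torch.load(checkpoint_path, map_location=lambda storage, loc: storage)
    return checkpoint


def get_learning_rate(optimizer):
    """Retrieves the current learning rate from the optimizer.

    Args:
        optimizer (torch.optim.Optimizer): The optimizer instance.

    Returns:
        float: The learning rate of the first parameter group.
    """
    return optimizer.param_groups[0]["lr"]


def reload_for_eval(model, checkpoint_dir, use_cuda):
    """Reloads a model for evaluation from the specified checkpoint directory.

    The directory must contain a pointer file, ``last_best_checkpoint``
    (preferred) or ``last_checkpoint``, whose first line names the checkpoint
    file to load. A parameter is copied only when its shape matches; keys are
    matched as-is, without a ``module.`` prefix, or with one (the prefix added
    by ``nn.DataParallel``). Unmatched parameters keep their current values.

    Args:
        model (nn.Module): The model to be reloaded.
        checkpoint_dir (str): Directory containing checkpoints.
        use_cuda (bool): Flag indicating whether to use CUDA. Not used here;
            the checkpoint is always loaded into CPU memory first.

    Returns:
        None
    """
    print('Reloading from: {}'.format(checkpoint_dir))
    best_name = os.path.join(checkpoint_dir, 'last_best_checkpoint')
    ckpt_name = os.path.join(checkpoint_dir, 'last_checkpoint')
    if os.path.isfile(best_name):
        name = best_name
    elif os.path.isfile(ckpt_name):
        name = ckpt_name
    else:
        print('Warning: No existing checkpoint or best_model found!')
        return

    # The pointer file stores the checkpoint file name on its first line.
    with open(name, 'r') as f:
        model_name = f.readline().strip()

    checkpoint_path = os.path.join(checkpoint_dir, model_name)
    print('Checkpoint path: {}'.format(checkpoint_path))
    checkpoint = torch.load(checkpoint_path, map_location=lambda storage, loc: storage)
    if 'model' in checkpoint:
        pretrained_model = checkpoint['model']
    else:
        pretrained_model = checkpoint

    state = model.state_dict()
    for key in state.keys():
        if key in pretrained_model and state[key].shape == pretrained_model[key].shape:
            state[key] = pretrained_model[key]
        elif key.replace('module.', '') in pretrained_model and \
                state[key].shape == pretrained_model[key.replace('module.', '')].shape:
            state[key] = pretrained_model[key.replace('module.', '')]
        elif 'module.' + key in pretrained_model and \
                state[key].shape == pretrained_model['module.' + key].shape:
            state[key] = pretrained_model['module.' + key]
    model.load_state_dict(state)
    print('=> Reloaded well-trained model {} for decoding.'.format(model_name))


def reload_model(model, optimizer, checkpoint_dir, use_cuda=True, strict=True):
    """Reloads the model and optimizer state from a checkpoint.

    The checkpoint file is located through a pointer file named ``checkpoint``
    in ``checkpoint_dir`` (as written by ``save_checkpoint`` with its default mode).

    Args:
        model (nn.Module): The model to be reloaded.
        optimizer (torch.optim.Optimizer): The optimizer to be reloaded.
        checkpoint_dir (str): Directory containing checkpoints.
        use_cuda (bool): Flag indicating whether to use CUDA.
        strict (bool): If True, the keys in the checkpoint's state_dict must
            exactly match the model's keys.

    Returns:
        tuple: The restored (epoch, step), or (0, 0) if no pointer file exists.
    """
    ckpt_name = os.path.join(checkpoint_dir, 'checkpoint')
    if os.path.isfile(ckpt_name):
        with open(ckpt_name, 'r') as f:
            model_name = f.readline().strip()
        checkpoint_path = os.path.join(checkpoint_dir, model_name)
        checkpoint = load_checkpoint(checkpoint_path, use_cuda)
        model.load_state_dict(checkpoint['model'], strict=strict)
        optimizer.load_state_dict(checkpoint['optimizer'])
        epoch = checkpoint['epoch']
        step = checkpoint['step']
        print('=> Reloaded previous model and optimizer.')
    else:
        print('[!] Checkpoint directory is empty. Train a new model ...')
        epoch = 0
        step = 0
    return epoch, step


def save_checkpoint(model, optimizer, epoch, step, checkpoint_dir, mode='checkpoint'):
    """Saves the model and optimizer state to a checkpoint file.

    The state is written to ``model.ckpt-<epoch>-<step>.pt``, and that file name
    is recorded in a pointer file named ``mode`` inside ``checkpoint_dir``.

    Args:
        model (nn.Module): The model to be saved.
        optimizer (torch.optim.Optimizer): The optimizer to be saved.
        epoch (int): Current epoch number.
        step (int): Current training step number.
        checkpoint_dir (str): Directory to save the checkpoint.
        mode (str): Name of the pointer file that records the latest
            checkpoint file name (default: 'checkpoint').

    Returns:
        None
    """
    checkpoint_path = os.path.join(
        checkpoint_dir, 'model.ckpt-{}-{}.pt'.format(epoch, step))
    torch.save({'model': model.state_dict(),
                'optimizer': optimizer.state_dict(),
                'epoch': epoch,
                'step': step}, checkpoint_path)

    with open(os.path.join(checkpoint_dir, mode), 'w') as f:
        f.write('model.ckpt-{}-{}.pt'.format(epoch, step))
    print("=> Saved checkpoint:", checkpoint_path)


def setup_lr(opt, lr):
    """Sets the learning rate for all parameter groups in the optimizer.

    Args:
        opt (torch.optim.Optimizer): The optimizer whose learning rate is updated.
        lr (float): The new learning rate.

    Returns:
        None
    """
    for param_group in opt.param_groups:
        param_group['lr'] = lr


def pesq_loss(clean, noisy, sr=16000):
    """Calculates the wideband PESQ (Perceptual Evaluation of Speech Quality)
    score between clean and noisy signals.

    Args:
        clean (ndarray): The clean reference signal.
        noisy (ndarray): The degraded (noisy or enhanced) signal.
        sr (int): Sample rate of the audio signals (default: 16000 Hz).

    Returns:
        float: The PESQ score, or -1 if the computation fails (for example,
        when no utterance is detected or the sample rate is unsupported).
    """
    try:
        pesq_score = pesq(sr, clean, noisy, 'wb')
    except Exception:
        pesq_score = -1
    return pesq_score


def batch_pesq(clean, noisy):
    """Computes PESQ scores for batches of clean and noisy audio signals in parallel.

    Args:
        clean (list of ndarray): List of clean audio signals.
        noisy (list of ndarray): List of noisy audio signals.

    Returns:
        torch.FloatTensor: Scores rescaled with (score - 1) / 3.5 and moved to
        the 'cuda' device, or None if any score is -1.
    """
    pesq_score = Parallel(n_jobs=-1)(delayed(pesq_loss)(c, n) for c, n in zip(clean, noisy))
    pesq_score = np.array(pesq_score)
    if -1 in pesq_score:
        return None
    pesq_score = (pesq_score - 1) / 3.5
    return torch.FloatTensor(pesq_score).to('cuda')


def power_compress(x):
    """Compresses the magnitude of a complex spectrogram with a power law.

    The magnitude is raised to the power 0.3 while the phase is preserved.

    Args:
        x (torch.Tensor): Spectrogram with the real and imaginary parts stacked
            in the last dimension, i.e. shape (..., 2).

    Returns:
        torch.Tensor: Compressed real and imaginary parts, stacked along dimension 1.
    """
    real = x[..., 0]
    imag = x[..., 1]
    spec = torch.complex(real, imag)
    mag = torch.abs(spec)
    phase = torch.angle(spec)
    mag = mag ** 0.3
    real_compress = mag * torch.cos(phase)
    imag_compress = mag * torch.sin(phase)
    return torch.stack([real_compress, imag_compress], 1)


def power_uncompress(real, imag):
    """Inverts ``power_compress`` by raising the magnitude to the power 1/0.3.

    Args:
        real (torch.Tensor): Compressed real component.
        imag (torch.Tensor): Compressed imaginary component.

    Returns:
        torch.Tensor: Uncompressed real and imaginary parts, stacked along the
        last dimension.
    """
    spec = torch.complex(real, imag)
    mag = torch.abs(spec)
    phase = torch.angle(spec)
    mag = mag ** (1. / 0.3)
    real_uncompress = mag * torch.cos(phase)
    imag_uncompress = mag * torch.sin(phase)
    return torch.stack([real_uncompress, imag_uncompress], -1)


def stft(x, args, center=False):
    """Computes the Short-Time Fourier Transform (STFT) of an audio signal.

    Args:
        x (torch.Tensor): Input audio signal.
        args (Namespace): Configuration with ``win_type`` ('hamming' or
            'hanning'), ``win_len``, ``win_inc`` and ``fft_len`` (in samples).
        center (bool): Whether to center the window.

    Returns:
        torch.Tensor: The STFT as a real tensor with the real and imaginary
        parts in the last dimension, or None if ``win_type`` is unsupported.
    """
    win_type = args.win_type
    win_len = args.win_len
    win_inc = args.win_inc
    fft_len = args.fft_len

    if win_type == 'hamming':
        window = torch.hamming_window(win_len, periodic=False).to(x.device)
    elif win_type == 'hanning':
        window = torch.hann_window(win_len, periodic=False).to(x.device)
    else:
        print(f"In STFT, {win_type} is not supported!")
        return
    return torch.stft(x, fft_len, win_inc, win_len, center=center,
                      window=window, return_complex=False)


def istft(x, args, slen=None, center=False, normalized=False, onsided=None, return_complex=False):
    """Computes the inverse Short-Time Fourier Transform (ISTFT) of a spectrogram.

    Args:
        x (torch.Tensor): Input spectrogram, either complex or real with the
            real and imaginary parts in a trailing dimension of size 2.
        args (Namespace): Configuration with ``win_type``, ``win_len``,
            ``win_inc`` and ``fft_len``.
        slen (int, optional): Length of the output signal.
        center (bool): Whether the window was centered.
        normalized (bool): Whether the STFT was normalized.
        onsided (bool, optional): If True, treats the input as a one-sided transform.
        return_complex (bool): Ignored; ``torch.istft`` is always called with
            ``return_complex=False``.

    Returns:
        torch.Tensor: The reconstructed audio signal, or None if ``win_type``
        is unsupported.
    """
    win_type = args.win_type
    win_len = args.win_len
    win_inc = args.win_inc
    fft_len = args.fft_len

    if win_type == 'hamming':
        window = torch.hamming_window(win_len, periodic=False).to(x.device)
    elif win_type == 'hanning':
        window = torch.hann_window(win_len, periodic=False).to(x.device)
    else:
        print(f"In ISTFT, {win_type} is not supported!")
        return

    try:
        # Try the input as given (older PyTorch releases accept a real tensor
        # with a trailing dimension of size 2).
        output = torch.istft(x, n_fft=fft_len, hop_length=win_inc, win_length=win_len,
                             window=window, center=center, normalized=normalized,
                             onesided=onsided, length=slen, return_complex=False)
    except Exception:
        # Newer PyTorch releases require a complex tensor: convert and retry.
        x_complex = torch.view_as_complex(x)
        output = torch.istft(x_complex, n_fft=fft_len, hop_length=win_inc, win_length=win_len,
                             window=window, center=center, normalized=normalized,
                             onesided=onsided, length=slen, return_complex=False)
    return output


def compute_fbank(audio_in, args):
    """Computes Kaldi-compatible log mel filter bank features from an audio signal.

    Args:
        audio_in (torch.Tensor): Input audio signal.
        args (Namespace): Configuration with ``win_len`` and ``win_inc`` (in
            samples), ``sampling_rate``, ``num_mels`` and ``win_type``.

    Returns:
        torch.Tensor: Filter bank features of shape (num_frames, num_mels).
    """
    # Kaldi expects frame length and shift in milliseconds.
    frame_length = args.win_len / args.sampling_rate * 1000
    frame_shift = args.win_inc / args.sampling_rate * 1000
    return torchaudio.compliance.kaldi.fbank(audio_in, dither=1.0, frame_length=frame_length,
                                             frame_shift=frame_shift, num_mel_bins=args.num_mels,
                                             sample_frequency=args.sampling_rate,
                                             window_type=args.win_type)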
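
# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module): the list-file format
# that read_and_config_file(input_path, decode=0) expects. Each line holds
# "<input_path> <label_path>" and an optional "<duration>" column. The helper
# _sketch_parse_list_lines is hypothetical. It applies the same column rules to
# in-memory lines, so the format can be checked without touching the disk.
# ---------------------------------------------------------------------------
def _sketch_parse_list_lines(lines):
    """Hypothetical helper: parse list-file lines into sample dictionaries."""
    samples = []
    for line in lines:
        cols = line.strip().split()
        if len(cols) == 3:
            # Input path, label path and duration.
            samples.append({'inputs': cols[0], 'labels': cols[1],
                            'duration': float(cols[2])})
        elif len(cols) == 2:
            samples.append({'inputs': cols[0], 'labels': cols[1]})
        # Any other column count (such as a blank line) is skipped.
    return samples

# Example (the paths are hypothetical):
#   _sketch_parse_list_lines(["noisy/a.wav clean/a.wav 3.5", "noisy/b.wav clean/b.wav"])
#   -> [{'inputs': 'noisy/a.wav', 'labels': 'clean/a.wav', 'duration': 3.5},
#       {'inputs': 'noisy/b.wav', 'labels': 'clean/b.wav'}]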
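
# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module): the pointer-file
# convention shared by save_checkpoint and reload_model. save_checkpoint stores
# the weights in "model.ckpt-<epoch>-<step>.pt" and writes only that file name
# into a small text file called `mode` ("checkpoint" by default). reload_model
# reads the first line back to find the file to load. reload_for_eval follows
# the same scheme with pointer files named last_best_checkpoint or
# last_checkpoint. _sketch_pointer_roundtrip is a hypothetical helper that
# reproduces only the file-name bookkeeping; no tensors are saved.
# ---------------------------------------------------------------------------
import os
import tempfile


def _sketch_pointer_roundtrip(epoch, step, mode='checkpoint'):
    """Hypothetical helper: write and re-read a checkpoint pointer file."""
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        ckpt_file = 'model.ckpt-{}-{}.pt'.format(epoch, step)
        # What save_checkpoint writes next to the checkpoint itself.
        with open(os.path.join(checkpoint_dir, mode), 'w') as f:
            f.write(ckpt_file)
        # What reload_model reads back to locate the checkpoint.
        with open(os.path.join(checkpoint_dir, mode), 'r') as f:
            model_name = f.readline().strip()
    return model_name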
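
# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module): the key-matching rule
# in reload_for_eval. A model parameter is filled from the checkpoint in this
# order: the exact key, the key with "module." removed, and then the key with
# "module." prepended. In every case the shapes must agree. This lets a model
# wrapped in nn.DataParallel load an unwrapped checkpoint and vice versa.
# _sketch_match_keys is hypothetical. It uses plain shape tuples instead of
# tensors and returns which checkpoint key each model key would be copied from.
# ---------------------------------------------------------------------------
def _sketch_match_keys(model_shapes, ckpt_shapes):
    """Hypothetical helper: map model keys to checkpoint keys like reload_for_eval."""
    matched = {}
    for key, shape in model_shapes.items():
        # Same candidate order as reload_for_eval; the first match with an
        # identical shape wins, and unmatched keys are left out.
        for candidate in (key, key.replace('module.', ''), 'module.' + key):
            if candidate in ckpt_shapes and ckpt_shapes[candidate] == shape:
                matched[key] = candidate
                break
    return matched

# Example: 'fc.bias' is not copied because its shape differs.
#   _sketch_match_keys({'conv.weight': (4, 1, 3), 'fc.bias': (10,)},
#                      {'module.conv.weight': (4, 1, 3), 'module.fc.bias': (5,)})
#   -> {'conv.weight': 'module.conv.weight'}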
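
# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module): the power-law
# compression used by power_compress (magnitude ** 0.3, phase unchanged) and
# its inverse in power_uncompress (magnitude ** (1 / 0.3)). This sketch uses
# NumPy complex numbers instead of torch tensors, so it runs without PyTorch.
# It ignores the tensor layouts (stacking on dim 1 versus the last dim).
# _sketch_power_law_roundtrip is a hypothetical helper.
# ---------------------------------------------------------------------------
import numpy as np


def _sketch_power_law_roundtrip(real, imag, alpha=0.3):
    """Hypothetical helper: compress and then restore a complex spectrum."""
    spec = np.asarray(real, dtype=np.float64) + 1j * np.asarray(imag, dtype=np.float64)
    mag, phase = np.abs(spec), np.angle(spec)
    # Compression: |X| ** alpha with the phase kept as-is.
    compressed = mag ** alpha * np.exp(1j * phase)
    # Expansion: raise the compressed magnitude to 1 / alpha.
    restored = np.abs(compressed) ** (1.0 / alpha) * np.exp(1j * np.angle(compressed))
    return compressed, restored

# The compression shrinks the dynamic range. A magnitude of 1000 becomes
# 1000 ** 0.3 (about 7.94), while small magnitudes shrink much less.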
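
# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module): the rescaling step in
# batch_pesq. pesq_loss reports failures as -1, and batch_pesq returns None
# if any such failure is present. Otherwise it maps scores with
# (score - 1) / 3.5, which sends 1.0 to 0 and 4.5 to 1. The sketch reproduces
# only that arithmetic with NumPy, without the pesq package or a GPU.
# _sketch_normalize_pesq is a hypothetical helper.
# ---------------------------------------------------------------------------
import numpy as np


def _sketch_normalize_pesq(scores):
    """Hypothetical helper: rescale PESQ scores the way batch_pesq does."""
    scores = np.asarray(scores, dtype=np.float64)
    # Any failed score (-1) invalidates the whole batch.
    if np.any(scores == -1):
        return None
    # Linear map of the interval [1, 4.5] onto [0, 1].
    return (scores - 1) / 3.5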