U ggZ,@s2ddlZddlZddlmZGdddejZdS)Ncs@eZdZdZfddZddZddZdd Zd d ZZ S) network_wrappera/ A wrapper class for loading different neural network models for tasks such as speech enhancement (SE), speech separation (SS), and target speaker extraction (TSE). It manages argument parsing, model configuration loading, and model instantiation based on the task and model name. cs$tt|d|_d|_d|_dS)z\ Initializes the network wrapper without any predefined model or arguments. N)superr__init__args config_path model_name)self __class__[/mnt/nas/mit_sg/shengkui.zhao/speech_codec/clear_speech_local/clearvoice/network_wrapper.pyr sznetwork_wrapper.__init__cCsdd|jd|_td}|jddtjd|jdtdd d |jd d td dd|jddtdd|jddtdd|jdddtdd|jddtddd|jdtd d!|jd"d#td$d%d|jd&d'td(d)d|jd*d+tdd,d|jd-d.td/d0d|jd1d2td3d4d|jd5d6td7d8d|jd9d:td(d;d|jdd?d|d|jg|_ d@S)Az Loads the arguments for the speech enhancement task using a YAML config file. Sets the configuration path and parses all the required parameters such as input/output paths, model settings, and FFT parameters. config/inference/.yamlSettings--configConfig file path)helpaction--mode inferenceModes: train or inferencetypedefaultr--checkpoint-dircheckpoint_dircheckpoints/FRCRN_SE_16KCheckpoint directorydestrrr --input-path input_pathzPath for noisy audio inputrrr --output-dir output_dirz#Directory for enhanced audio output --use-cudause_cudaEnable CUDA (1=True, 0=False)rrrr --num-gpunum_gpuNumber of GPUs to use --networkz2Select SE models: FRCRN_SE_16K, MossFormer2_SE_48Krr--sampling-rate sampling_rate> Sampling rate--one-time-decode-lengthone_time_decode_length<(Max segment length for one-pass decoding--decode-window decode_windowDecoding chunk sizez --window-lenZwin_lenizWindow length for framingz --window-incZwin_incdzWindow shift for framingz --fft-lenZfft_lenz!FFT length for feature extractionz --num-melsZnum_melszNumber of mel-spectrogram binsz --window-typeZwin_typeZhammingzWindow type: hamming or hanningN rr yamlargparseArgumentParser add_argumentActionConfigFilestrint parse_argsrrZparserr r r load_args_ses& znetwork_wrapper.load_args_secCshd|jd|_td}|jd|jdtjd|jdtdd d |jd d td dd|jddtdd|jddtdd|jdddtdd|jddtddd|jdtd d!|jd"d#td$d%d|jd&d'td(d)d|jd*d+td,d-d|jd.d/tdd0d|jd1d2td3d4d|jd5d6td7d8d|jd9d:td7d;d|jdd?d|d|jg|_ d@S)Aa  Loads the arguments for the speech separation task using a YAML config file. This method sets parameters such as input/output paths, model configurations, and encoder/decoder settings for the MossFormer2-based speech separation model. r rrrrrrrrrrrrrrrrr r!Path for mixed audio inputr"r#r$$Directory for separated audio outputr%r&r'r(r)r*r+r,r-z$Select SS models: MossFormer2_SS_16Kr.r/r0r1r2z --num-spksZnum_spkszNumber of speakers to separater3r4r5r6r7r8r9z--encoder_kernel-sizeZencoder_kernel_sizezKernel size for Conv1D encoderz--encoder-embedding-dimZencoder_embedding_dimr;z Embedding dimension from encoderz--mossformer-squence-dimZmossformer_sequence_dimz!Sequence dimension for MossFormerz--num-mossformer_layerZnum_mossformer_layerzNumber of MossFormer layersNr<rDr r r load_args_ss8s& znetwork_wrapper.load_args_sscCs$d|jd|_td}|jd|jdtjd|jdtdd d |jd d td dd|jddtdd|jddtdd|jdddtdd|jddtddd|jdtd d!|jd"d#td$d%d|jd&td'd!|jd(td)d!|jd*d+td,d-d|jd.d/tdd0d| d|jg|_ d1S)2z Loads the arguments for the target speaker extraction (TSE) task using a YAML config file. Parameters include input/output paths, CUDA configurations, and decoding parameters. r rrrrrFrrrrrrz%checkpoint_dir/AV_MossFormer2_TSE_16Krrr r!rGr"r#r$rHr%r&r'r(r)r*r+r,r-z