B ggZ,@s2ddlZddlZddlmZGdddejZdS)Ncs@eZdZdZfddZddZddZdd Zd d ZZ S) network_wrappera/ A wrapper class for loading different neural network models for tasks such as speech enhancement (SE), speech separation (SS), and target speaker extraction (TSE). It manages argument parsing, model configuration loading, and model instantiation based on the task and model name. cs$tt|d|_d|_d|_dS)z\ Initializes the network wrapper without any predefined model or arguments. N)superr__init__args config_path model_name)self) __class__[/mnt/nas/mit_sg/shengkui.zhao/speech_codec/clear_speech_local/clearvoice/network_wrapper.pyr sznetwork_wrapper.__init__cCsdd|jd|_td}|jddtjd|jdtdd d |jd d td dd|jddtdd|jddtdd|jdddtdd|jddtddd|jdtd d!|jd"d#td$d%d|jd&d'td(d)d|jd*d+tdd,d|jd-d.td/d0d|jd1d2td3d4d|jd5d6td7d8d|jd9d:td(d;d|jdd?d|d|jg|_ d@S)Az Loads the arguments for the speech enhancement task using a YAML config file. Sets the configuration path and parses all the required parameters such as input/output paths, model settings, and FFT parameters. zconfig/inference/z.yamlSettingsz--configzConfig file path)helpactionz--mode inferencezModes: train or inference)typedefaultr z--checkpoint-dircheckpoint_dirzcheckpoints/FRCRN_SE_16KzCheckpoint directory)destrrr z --input-path input_pathzPath for noisy audio input)rrr z --output-dir output_dirz#Directory for enhanced audio outputz --use-cudause_cudazEnable CUDA (1=True, 0=False))rrrr z --num-gpunum_gpuzNumber of GPUs to usez --networkz2Select SE models: FRCRN_SE_16K, MossFormer2_SE_48K)rr z--sampling-rate sampling_ratei>z Sampling ratez--one-time-decode-lengthone_time_decode_length<z(Max segment length for one-pass decodingz--decode-window decode_windowzDecoding chunk sizez --window-lenZwin_lenizWindow length for framingz --window-incZwin_incdzWindow shift for framingz --fft-lenZfft_leniz!FFT length for feature extractionz --num-melsZnum_melszNumber of mel-spectrogram binsz --window-typeZwin_typeZhammingzWindow type: hamming or hanningN) rr yamlargparseArgumentParser add_argumentActionConfigFilestrint parse_argsr)rparserr r r load_args_ses& znetwork_wrapper.load_args_secCshd|jd|_td}|jd|jdtjd|jdtdd d |jd d td dd|jddtdd|jddtdd|jdddtdd|jddtddd|jdtd d!|jd"d#td$d%d|jd&d'td(d)d|jd*d+td,d-d|jd.d/tdd0d|jd1d2td3d4d|jd5d6td7d8d|jd9d:td7d;d|jdd?d|d|jg|_ d@S)Aa  Loads the arguments for the speech separation task using a YAML config file. This method sets parameters such as input/output paths, model configurations, and encoder/decoder settings for the MossFormer2-based speech separation model. zconfig/inference/z.yamlr z--configzConfig file path)rr rz--moderzModes: train or inference)rrr z--checkpoint-dirrzcheckpoints/FRCRN_SE_16KzCheckpoint directory)rrrr z --input-pathrzPath for mixed audio input)rrr z --output-dirrz$Directory for separated audio outputz --use-cudarrzEnable CUDA (1=True, 0=False))rrrr z --num-gpurzNumber of GPUs to usez --networkz$Select SS models: MossFormer2_SS_16K)rr z--sampling-rateri>z Sampling ratez --num-spksZnum_spkszNumber of speakers to separatez--one-time-decode-lengthrrz(Max segment length for one-pass decodingz--decode-windowrzDecoding chunk sizez--encoder_kernel-sizeZencoder_kernel_sizezKernel size for Conv1D encoderz--encoder-embedding-dimZencoder_embedding_dimiz Embedding dimension from encoderz--mossformer-squence-dimZmossformer_sequence_dimz!Sequence dimension for MossFormerz--num-mossformer_layerZnum_mossformer_layerzNumber of MossFormer layersN) rrrrr r!r"r#r$r)rr%r r r load_args_ss8s& znetwork_wrapper.load_args_sscCs$d|jd|_td}|jd|jdtjd|jdtdd d |jd d td dd|jddtdd|jddtdd|jdddtdd|jddtddd|jdtd d!|jd"d#td$d%d|jd&td'd!|jd(td)d!|jd*d+td,d-d|jd.d/tdd0d| d|jg|_ d1S)2z Loads the arguments for the target speaker extraction (TSE) task using a YAML config file. Parameters include input/output paths, CUDA configurations, and decoding parameters. zconfig/inference/z.yamlr z--configzConfig file path)rr rz--moderzModes: train or inference)rrr z--checkpoint-dirrz%checkpoint_dir/AV_MossFormer2_TSE_16KzCheckpoint directory)rrrr z --input-pathrzPath for mixed audio input)rrr z --output-dirrz$Directory for separated audio outputz --use-cudarrzEnable CUDA (1=True, 0=False))rrrr z --num-gpurzNumber of GPUs to usez --networkz