U /¨Âf%#ã@sæddlmZddlZddlZddlZddlZddlmZddl m m Z ddl Z ddl Z ddlZddlZzddlZdZWn8ek r²edƒddlmZddlmZdZYnXddlmZmZmZmZmZmZGd d „d ƒZ dS) é)ÚpartialN)ÚCallableTz5failed to import ttsfrd, use WeTextProcessing instead)Ú NormalizerF)Úcontains_chineseÚ replace_blankÚreplace_corner_markÚremove_bracketÚspell_out_numberÚsplit_paragraphc @sreZdZdeeeeeeedœdd„Zdd„Zd d „Zd d „Z d d„Z ddd„Z dd„Z dd„Z dd„Zdd„ZdS)ÚCosyVoiceFrontEndÚFÚall)Ú get_tokenizerÚfeat_extractorÚcampplus_modelÚspeech_tokenizer_modelÚspk2infoÚinstructÚallowed_specialc Cs4|ƒ|_||_t tj ¡r dnd¡|_t ¡}tjj |_ d|_ tj ||dgd|_ tj ||tj ¡rjdndgd|_tj |¡r”tj||jd|_||_||_t ¡|_t|_|jrt ¡|_tj tj t¡¡} |j  d !| ¡¡d ksôt"d ƒ‚|j #d ¡|j $d ¡|j %d¡nt&d d d |_'t(ƒ|_)dS)NÚcudaÚcpuéZCPUExecutionProvider)Z sess_optionsZ providersZCUDAExecutionProvider)Ú map_locationz4{}/../../pretrained_models/CosyVoice-ttsfrd/resourceTz$failed to initialize ttsfrd resourceZpinyinF)Z remove_erhuaZ full_to_half)*Ú tokenizerrÚtorchÚdevicerÚ is_availableÚ onnxruntimeZSessionOptionsZGraphOptimizationLevelZORT_ENABLE_ALLZgraph_optimization_levelZintra_op_num_threadsZInferenceSessionÚcampplus_sessionÚspeech_tokenizer_sessionÚosÚpathÚexistsÚloadrrrÚinflectÚengineÚinflect_parserÚ use_ttsfrdÚttsfrdZTtsFrontendEngineÚfrdÚdirnameÚabspathÚ__file__Ú initializeÚformatÚAssertionErrorZ set_lang_typeZenable_pinyin_mixZset_breakmodel_indexÚ ZhNormalizerÚ zh_tn_modelÚ EnNormalizerÚ en_tn_model) ÚselfrrrrrrrÚoptionÚROOT_DIR©r7úG/proj/MR_dataset/benson/CosyVoice-main-aug-19/cosyvoice/cli/frontend.pyÚ__init__&s. "     zCosyVoiceFrontEnd.__init__cCsT|jj||jd}tj|gtjd |j¡}tj|jdgtjd |j¡}||fS)N©r©Údtyper) rÚencoderrÚtensorÚint32ÚtorÚshape)r4ÚtextZ text_tokenZtext_token_lenr7r7r8Ú_extract_text_tokenGs z%CosyVoiceFrontEnd._extract_text_tokenc Csªtj|dd}|j d|j ¡dj| ¡ ¡ ¡|j ¡djt j |j dgt j di¡d  ¡ ¡}tj|gtj d |j¡}tj|j dgtj d |j¡}||fS)Né€)Zn_melsrrér;)ÚwhisperZlog_mel_spectrogramrÚrunÚ get_inputsÚnameÚdetachrÚnumpyÚnpÚarrayrAr?ÚflattenÚtolistrr>r@r)r4ÚspeechÚfeatÚ speech_tokenÚspeech_token_lenr7r7r8Ú_extract_speech_tokenMs$ÿÿ  z'CosyVoiceFrontEnd._extract_speech_tokencCsvtj|dddd}||jddd}|j d|j ¡dj|jdd ¡  ¡i¡d  ¡  ¡}t   |g¡ |j¡}|S)NéPré€>)Z num_mel_binsÚditherZsample_frequencyT)ÚdimÚkeepdim©rX)ÚkaldiZfbankÚmeanrrGrHrIÚ unsqueezerrKrNrOrr>r@r)r4rPrQÚ embeddingr7r7r8Ú_extract_spk_embeddingUsý:z(CosyVoiceFrontEnd._extract_spk_embeddingcCsV| |¡jdd dd¡ |j¡}|jdd}tj|jdgtj d |j¡}||fS)NrrZrr;) rÚsqueezeÚ transposer@rr]rr>rAr?)r4rPÚ speech_featÚspeech_feat_lenr7r7r8Ú_extract_speech_feat_s"  z&CosyVoiceFrontEnd._extract_speech_featTc Cs| ¡}t|ƒr¬|jr&|j |d¡}n |j |¡}| dd¡}t|ƒ}t |ƒ}| dd¡}| dd¡}t |ƒ}t   dd |¡}d d „t |t|jj|jd d dddddDƒ}n\|jrÂ|j |d¡}n |j |¡}t||jƒ}dd „t |t|jj|jd ddddddDƒ}|dkr|S|S)NÚinputÚ r Ú.uã€z - u,u[,,]+$u。cSsg|]}|‘qSr7r7©Ú.0Úir7r7r8Ú ssz4CosyVoiceFrontEnd.text_normalize..r:ÚzhrUé<éF)Z token_max_nZ token_min_nZ merge_lenZ comma_splitcSsg|]}|‘qSr7r7rhr7r7r8rk|sÚen)Ústriprr'r)Zget_frd_extra_infor1Ú normalizeÚreplacerrrÚreÚsubr rrr=rr3r r&)r4rBÚsplitÚtextsr7r7r8Útext_normalizees:    þ   þ  z CosyVoiceFrontEnd.text_normalizecCs.| |¡\}}|j|d}||||dœ}|S)Nr^)rBÚtext_lenÚ llm_embeddingÚflow_embedding)rCr)r4Útts_textÚspk_idÚtts_text_tokenÚtts_text_token_lenr^Ú model_inputr7r7r8Ú frontend_sftƒszCosyVoiceFrontEnd.frontend_sftc Csx| |¡\}}| |¡\}}tjjddd|ƒ}| |¡\} } | |¡\} } | |¡} ||||| | | | | | | | dœ }|S)NrVi"V)Ú orig_freqÚnew_freq) rBrxÚ prompt_textÚprompt_text_lenÚllm_prompt_speech_tokenÚllm_prompt_speech_token_lenZflow_prompt_speech_tokenZflow_prompt_speech_token_lenZprompt_speech_featZprompt_speech_feat_lenryrz)rCÚ torchaudioÚ transformsZResamplerdrTr_)r4r{rƒÚprompt_speech_16kr}r~Zprompt_text_tokenZprompt_text_token_lenZprompt_speech_22050rbrcrRrSr^rr7r7r8Úfrontend_zero_shot‰s& ûz$CosyVoiceFrontEnd.frontend_zero_shotcCs*| |d|¡}|d=|d=|d=|d=|S)Nr rƒr„r…r†)rŠ)r4r{r‰rr7r7r8Úfrontend_cross_lingual˜s z(CosyVoiceFrontEnd.frontend_cross_lingualcCs8| ||¡}|d=| |d¡\}}||d<||d<|S)Nryz rƒr„)r€rC)r4r{r|Ú instruct_textrZinstruct_text_tokenZinstruct_text_token_lenr7r7r8Úfrontend_instruct¡s  z#CosyVoiceFrontEnd.frontend_instructN)r Fr )T)Ú__name__Ú __module__Ú __qualname__rÚstrÚboolr9rCrTr_rdrwr€rŠr‹rr7r7r7r8r $s*ùù !   r )!Ú functoolsrrrrKrLrFÚtypingrZtorchaudio.compliance.kaldiZ compliancer[r‡r rsr$r(r'Ú ImportErrorÚprintZtn.chinese.normalizerrr0Ztn.english.normalizerr2Zcosyvoice.utils.frontend_utilsrrrrr r r r7r7r7r8Ús(