o ž‚©e ã@s‚ddlZddlZddlmZmZmZdd„ZddgZdd„Zd d „Z d d „Z ddd„Z ddd„Z ddd„Z ddd„Zddd„ZdS)éN)Ú flatten_listÚ have_emojiÚ have_langidcCsddl}| d¡dS)NrÚpunkt)ÚnltkÚdownload)r©rú#/h2ogpt/src/tts_sentence_parsing.pyÚ setup_nltksr Ú sentence_listÚindexcCstgdd}|S)Nr)r r )Údict)Úsentence_staterrr Úinit_sentence_states rcCs$g}tD] }| ||¡qt|ƒS©N)Ú sentence_keysÚappendÚtuple)rÚretsÚkeyrrr Ú unpack_statesrcGs>ttƒD]\}}t||tƒr||||<q||||<q|Sr)Ú enumeraterÚ isinstanceÚlist)rÚargsÚkeyirrrr Ú pack_states réúcCsàt d|¡}g}g}d}|D]U}|dkrq| ¡r6|dkr*| d |¡¡g}d}q| |¡|t|ƒ7}q|t|ƒ|krX|rP| d |¡¡|g}t|ƒ}q| |¡d}q| |¡|t|ƒ7}q|rn| d |¡¡|S)a* Splits a sentence by spaces into smaller sentences, each with a maximum length of n characters, while preserving whitespace characters like new lines. # 250 due to [!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio. z(\s+)rÚÚ )ÚreÚsplitÚisspacerÚjoinÚlen)ÚsentenceÚnÚwordsÚ sentencesÚcurrent_sentenceÚcurrent_lengthÚwordrrr Úsplit_sentences(s4     r,Fécs„ddl}| ||d…¡}t‡fdd„|Dƒƒ}dd„|Dƒ}|r3|dkr3|d|…|d|d<|S|dkr@| |d|…¡|S)Nrcsg|]}t|ˆƒ‘qSr)r,©Ú.0Úx©Ú max_lengthrr Ú `sz"_get_sentences..cSsg|]}| ¡r|‘qSr)Ústripr.rrr r3bs)rÚ sent_tokenizerr)ÚresponseÚverboseÚ min_startr2rr(rr1r Ú_get_sentencesZs ýr9c CsÞt|ƒ\}}t||d…|dkrdnd|d}t|ƒdkrJ||d… |d¡}||t|dƒ7}| |d¡t|d|d}|t|||ƒdfS|rftd |¡|d}| d |¡¡|t|||ƒdfSdt|||ƒdfS) Nrr-)r8r7é)r7FÚ T)rr9r$r rÚclean_sentencerr#) r6rÚis_finalr7r r r(Ú index_deltaÚcleaned_sentencerrr Ú get_sentencels " r@cs8|dus t|ƒdkr|rtdƒdStjdd|tjd}tjdd|tjd}tjdd|tjd}| dd¡}| d d ¡}| d d ¡}| d d ¡}| d d¡}| dd¡}| dd¡}| dd¡}| dd¡}| dd¡}tr~ddl‰d ‡fdd„|Dƒ¡}t dd|¡}t dd|¡}|  ¡}|  d¡s¤|  d¡s¤|  d¡s¤|  d¡rª|dd…}|  d ¡s¾|  d!¡s¾|  d"¡s¾|  d#¡rÄ|d$d…}|d%krÊd&}|d'krÐd(}|d)krÖd*}|d+krÜd,}|d-krâd.}|d/krèd0}|d1krîd2}|d3krôd4}|d5krúd6}|d7krd8}t|ƒdkr|rtd9ƒdS|rtd:|ƒ|S);Nrzempty sentencerz ```.*?```)Úflagsz`.*?`z\(.*?\)z```z...r;ú(ú)zDr. zDoctor z w/ z with zH2O.aizaych two oh ae eye.zH2O.AIzh2o.aicsg|] }ˆ |¡s|‘qSr)Úis_emojir.©Úemojirr r3Ÿsz"clean_sentence..z (\d+)\.(\d+)z \1 dot \2u([^-]|\w)(\.|\。|\?|\!)z\1\2z. z? z! z, r:Ú.Ú?ú!ú,éz1.ÚOnez2.ÚTwoz3.ÚThreez4.ÚFourz5.ÚFivez6.ÚSixz7.ÚSevenz8.ÚEightz9.ÚNinez10.ÚTenzEMPTY SENTENCE after processingzSentence for speech: %s) r$Úprintr ÚsubÚDOTALLÚreplacerrFr#r4Ú startswith)r%r7rrEr r<ƒsl          ( (   r<cCsŒtsdSddl}t|ƒdkr<| |¡d ¡}|dkrd}||vr,td|›dƒd}n|}|r:td|›d |›ƒ|Sd}|rDtd ƒ|S) NÚenrr-Úzhzzh-cnz+Detected a language not supported by xtts :z, switching to english for nowz&Language: Predicted sentence language:z , using language for xtts:zPLanguage: Prompt is short or autodetect language disabled using english for xtts)rÚlangidr$Úclassifyr4rV)ÚpromptÚsupported_languagesr7r]Úlanguage_predictedÚlanguagerrr Údetect_languageÎs$ ürc)r)Fr-r)FF)F)Útextwrapr Ú src.utilsrrrr rrrrr,r9r@r<rcrrrr Ús 2  K