- VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
  Paper • 2307.16430 • Published • 4
- Zyphra/Zonos-v0.1-transformer
  Text-to-Speech • Updated • 31.3k • 413
- IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
  Paper • 2502.05512 • Published • 2
- Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
  Paper • 2502.11946 • Published • 3
Collections
Collections including paper arxiv:2410.06885
- Movie Gen: A Cast of Media Foundation Models
  Paper • 2410.13720 • Published • 99
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  Paper • 2410.06885 • Published • 47
- Flow Matching for Generative Modeling
  Paper • 2210.02747 • Published • 3
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 13

- MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
  Paper • 2502.18924 • Published • 15
- MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
  Paper • 2409.00750 • Published • 4
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  Paper • 2410.06885 • Published • 47
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
  Paper • 2409.10058 • Published • 2

- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 66
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 37
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 17
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 64