Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement Paper • 2409.09642 • Published Sep 15, 2024 • 1
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement Paper • 2409.09642 • Published Sep 15, 2024 • 1
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24, 2024 • 22
view article Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 351
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 281