UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published 11 days ago • 35
OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation Paper • 2508.19209 • Published 11 days ago • 38
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models Paper • 2410.10733 • Published Oct 14, 2024 • 8
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13 • 8
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published 23 days ago • 141
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 150
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 93
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 112
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11 • 99
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2 • 61
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model Paper • 2505.17561 • Published May 23 • 31
Scaling Image and Video Generation via Test-Time Evolutionary Search Paper • 2505.17618 • Published May 23 • 42
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 32
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation Paper • 2501.09503 • Published Jan 16 • 14