Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures Paper • 1608.06037 • Published Aug 22, 2016 • 1
CIFAR10 to Compare Visual Recognition Performance between Deep Neural Networks and Humans Paper • 1811.07270 • Published Nov 18, 2018 • 1
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 249
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 112
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 522
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 98
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Paper • 2503.07703 • Published Mar 10 • 36
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Paper • 2503.00865 • Published Mar 2 • 65
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 70
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 154
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22, 2024 • 21