Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning Paper • 2505.13426 • Published May 19
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024