WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Benchmarking Vision Language Models for Cultural Understanding
VisMin (visual minimal-change) is a controlled benchmark, released together with models fine-tuned on the VisMin training set, e.g. VisMin-CLIP and VisMin-Idefics2.
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Official artifacts for the paper "The Promise of RL for Autoregressive Image Editing" (EARL).