OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Paper • 2402.17553 • Published Feb 27, 2024 • 26
Grounding Language Models to Images for Multimodal Inputs and Outputs Paper • 2301.13823 • Published Jan 31, 2023 • 2
Generating Images with Multimodal Language Models Paper • 2305.17216 • Published May 26, 2023 • 7