Xiaojian Ma

jeasinema

http://jeasinema.github.io

authored a paper 6 months ago

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Paper • 2503.16365 • Published Mar 20, 2025 • 41

authored a paper 10 months ago

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Paper • 2410.17856 • Published Oct 23, 2024 • 52

authored 3 papers about 1 year ago

Task-oriented Sequential Grounding in 3D Scenes

Paper • 2408.04034 • Published Aug 7, 2024 • 8

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Paper • 2407.05282 • Published Jul 7, 2024 • 15

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Paper • 2407.00114 • Published Jun 27, 2024 • 13

authored a paper over 1 year ago

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18, 2024 • 13

authored 2 papers almost 2 years ago

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

Paper • 2311.05997 • Published Nov 10, 2023 • 37

MindAgent: Emergent Gaming Interaction

Paper • 2309.09971 • Published Sep 18, 2023 • 13