Generate a video from an image with a text prompt
Demo for multimodal understanding and generation
Generate synchronized video from video and audio