Generate audio for a video using captions and descriptions
Convert Hugging Face models to MLX format and upload them