Multimodal
Video Generation
Synthesizing video clips from text, a starting image, or prior frames.
Definition
Video generation is the synthesis of video sequences conditioned on a text prompt, a starting image, or preceding video frames. The central difficulty is keeping content temporally consistent and motion realistic across frames while holding compute cost to an acceptable level. The field is advancing quickly: several major labs offer video generation models, and output quality, clip length, motion coherence, and controllability continue to improve. Text-to-video methods, which condition on a text prompt, form a closely overlapping subset.
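To make "temporal consistency" concrete, the sketch below computes a deliberately crude proxy: the mean absolute per-pixel change between consecutive frames, averaged over a clip. This is an illustrative assumption, not a standard benchmark metric. The function names `mean_abs_frame_diff` and `temporal_jitter` are hypothetical, and frames are assumed to be same-sized grayscale grids with intensities in [0, 1]. Because it also penalizes legitimate motion, more careful measures typically compensate for intended movement, for example by warping one frame onto the next with optical flow before comparing.

```python
from typing import List

# Hypothetical frame representation: rows of grayscale intensities in [0, 1].
Frame = List[List[float]]


def mean_abs_frame_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel change between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def temporal_jitter(frames: List[Frame]) -> float:
    """Mean frame-to-frame change across a clip (crude proxy, not a standard metric).

    Higher values suggest flicker or abrupt changes, but this proxy cannot
    distinguish them from genuine, intended motion.
    """
    if len(frames) < 2:
        return 0.0
    diffs = [
        mean_abs_frame_diff(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    static_clip = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]
    flicker_clip = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]],
    ]
    print(temporal_jitter(static_clip))   # 0.0
    print(temporal_jitter(flicker_clip))  # 1.0
```

A perfectly static clip scores 0.0, while a clip that flips between black and white every frame scores 1.0; real generated clips fall in between, and a low score alone does not imply realistic motion.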