Multimodal

Video Generation

Synthesizing video clips from text, a starting image, or prior frames.

Definition

Video generation synthesizes video sequences from text prompts, a starting image, or prior video context. The central difficulty is keeping content temporally consistent and motion realistic across frames at an acceptable compute cost. The field is advancing quickly: several major labs offer video generation models, and quality, clip length, motion coherence, and controllability continue to improve. Text-to-video is the text-conditioned case of video generation, so the two terms overlap closely.
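To make "temporal consistency" concrete, here is a minimal sketch of one crude way to quantify frame-to-frame change. It is an illustrative proxy only, not a standard benchmark metric. The function names (`mean_abs_frame_diff`, `temporal_instability`) and the frame representation (grayscale frames as nested lists of intensities in [0, 1]) are assumptions made for this example.

```python
from typing import Sequence

# Assumed representation: a grayscale frame is a grid (rows of pixel
# intensities in [0, 1]); a clip is a sequence of equally sized frames.
Frame = Sequence[Sequence[float]]


def mean_abs_frame_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def temporal_instability(frames: Sequence[Frame]) -> float:
    """Average change between consecutive frames (hypothetical proxy).

    Lower values mean less frame-to-frame change.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = [
        mean_abs_frame_diff(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]
    return sum(diffs) / len(diffs)


# A clip that never changes versus one that flips between black and white.
steady = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(4)]
flicker = [[[float(t % 2)] * 2 for _ in range(2)] for t in range(4)]

print(temporal_instability(steady))   # 0.0
print(temporal_instability(flicker))  # 1.0
```

A low score alone does not indicate a good video: a completely static clip scores 0.0 while containing no motion at all. That tension is why consistency and realistic motion have to be judged together.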