Multimodal

3D Generation

Creating three-dimensional shapes from text, a single image, or multiple views.

Definition

3D generation produces three-dimensional content, such as polygon meshes, point clouds, or other representations that encode an object's shape and depth, from text prompts, a single image, or multiple views of an object. Some methods reuse the knowledge of a pretrained 2D image generator, for example by optimizing a 3D shape until its rendered views look plausible to that generator. Others are trained directly on 3D data. 3D generation extends image generation into three dimensions and is used in games, film, augmented and virtual reality, and robot simulation.
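A triangle mesh is one common output format. It is a list of 3D vertex positions plus triangles that index into that list. The minimal sketch below writes such a mesh as Wavefront OBJ text. The function name `mesh_to_obj` and the tetrahedron are hypothetical illustrations, not part of any particular 3D generation system.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def mesh_to_obj(vertices: List[Vertex], faces: List[Face]) -> str:
    """Serialize a triangle mesh to Wavefront OBJ text.

    Faces use 0-based vertex indices here. OBJ expects 1-based
    indices, so each index is shifted by one on output.
    """
    # One "v x y z" record per vertex position
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    for a, b, c in faces:
        # Reject faces that point at vertices that do not exist
        for i in (a, b, c):
            if not 0 <= i < len(vertices):
                raise IndexError(f"face index {i} out of range")
        # One "f i j k" record per triangle, using 1-based indices
        lines.append(f"f {a + 1} {b + 1} {c + 1}")
    return "\n".join(lines) + "\n"


# Hypothetical example shape: a tetrahedron (4 vertices, 4 triangles)
tetra_vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

obj_text = mesh_to_obj(tetra_vertices, tetra_faces)
print(obj_text)
```

The resulting text can be opened in most 3D tools. A real generator would output many thousands of vertices, and often extra data such as normals or textures.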