Multimodal
3D Generation
Creating three-dimensional shapes from text, a single image, or multiple views.
Definition
3D generation produces three-dimensional content, such as meshes, point clouds, or other representations that store shape and depth, from text prompts, a single image, or multiple views. Some methods reuse what a pretrained 2D image generator already knows about how objects look, while others learn directly from 3D data. It extends image generation into three dimensions and is used in games, film, augmented and virtual reality, and robot simulation.
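To make "formats that store shape" concrete, here is a minimal sketch of one common output representation, a triangle mesh, written in plain Python. The tetrahedron data and the helper names (`triangle_area`, `surface_area`, `to_obj`) are illustrative assumptions, not part of any particular 3D generation system. A real generator would produce far larger meshes, usually with textures.

```python
import math

# Hypothetical example data: a small tetrahedron stored as a triangle mesh.
# A mesh is a list of 3D points (vertices) plus triangles (faces) that
# refer to those points by index.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]


def triangle_area(a, b, c):
    """Area of one triangle: half the magnitude of the edge cross product."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def surface_area(vertices, faces):
    """Total surface area of the mesh, summed over all faces."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)


def to_obj(vertices, faces):
    """Serialize the mesh as Wavefront OBJ text (OBJ face indices are 1-based)."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(f"surface area: {surface_area(vertices, faces):.3f}")
    print(to_obj(vertices, faces))
```

The OBJ text returned by `to_obj` can be saved to a `.obj` file and opened in most 3D tools, which is one way generated shapes reach game engines or simulators.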