Multimodal
Text-to-Image
Generating a picture from a written description, usually with a diffusion model.
Definition
Text-to-image generation produces a picture from a written prompt. Most current systems use a diffusion model, which starts from random noise and removes that noise step by step until an image emerges, with each step steered by the meaning of the prompt. It is one of the most popular uses of generative AI and is the technique behind systems such as DALL-E and Midjourney, as well as open models like Stable Diffusion. The same approach also supports editing tasks such as inpainting and image-to-image transformation.
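To make the "start from noise, clean it step by step" idea concrete, here is a minimal toy sketch in plain Python. It is not how any production system is implemented: the 4-pixel "images", the `TOY_TARGETS` lookup (a stand-in for a text encoder and a trained network), the cosine noise schedule, and the function names are all assumptions made for illustration. The update rule is a deterministic, DDIM-style step.

```python
import math
import random

# Hypothetical stand-in for a text encoder plus trained network: each prompt
# maps to the tiny 4-pixel "image" that the toy denoiser steers toward.
TOY_TARGETS = {
    "bright": [1.0, 1.0, 1.0, 1.0],
    "dark": [-1.0, -1.0, -1.0, -1.0],
}


def alpha_bar(t, steps):
    """Share of signal kept at step t (1.0 at t=0, close to 0 at t=steps)."""
    return math.cos(0.5 * math.pi * t / (steps + 1)) ** 2


def predict_noise(x, prompt, ab):
    """Toy noise predictor: an oracle that already knows the prompt's target.

    A real model would be a neural network that learned this from data.
    """
    target = TOY_TARGETS[prompt]
    return [
        (xi - math.sqrt(ab) * ti) / math.sqrt(1.0 - ab)
        for xi, ti in zip(x, target)
    ]


def generate(prompt, steps=50, seed=0):
    """Run the reverse (denoising) loop from pure noise down to an image."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per pixel.
    x = [rng.gauss(0.0, 1.0) for _ in TOY_TARGETS[prompt]]
    for t in range(steps, 0, -1):
        ab = alpha_bar(t, steps)
        ab_prev = alpha_bar(t - 1, steps)
        eps = predict_noise(x, prompt, ab)
        # Estimate the clean image implied by the predicted noise...
        x0 = [
            (xi - math.sqrt(1.0 - ab) * ei) / math.sqrt(ab)
            for xi, ei in zip(x, eps)
        ]
        # ...then re-noise it to the lower noise level of step t-1.
        x = [
            math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
            for x0i, ei in zip(x0, eps)
        ]
    return x
```

Calling `generate("bright")` returns four values very close to 1.0, and `generate("dark")` returns values very close to -1.0, whatever the seed. The prompt only enters through the noise predictor, which is the sense in which the prompt steers each denoising step.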