Multimodal

Document AI

Systems that read, parse, and reason over documents like forms and invoices.

Definition

Document AI covers systems that read, parse, and reason over documents such as forms, contracts, invoices, and reports. These systems combine text extraction, often via optical character recognition (OCR), with layout understanding to capture tables, fields, and structure, and they increasingly rely on vision-language models. Their output feeds automation and retrieval pipelines and underpins tasks such as invoice processing and contract review.
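To make the pairing of text extraction and layout understanding concrete, here is a minimal sketch in Python. It assumes an OCR step has already produced words with page coordinates; the `Word` type, the sample invoice values, and the "Label: value" rule are all hypothetical and chosen only for illustration. Real systems use far richer layout models.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with the position of its top-left corner (hypothetical schema)."""
    text: str
    x: float  # left edge
    y: float  # top edge


def group_into_lines(words, y_tolerance=5.0):
    """Cluster words with similar top edges into lines, each sorted left to right."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # Compare against the first word of the current line.
        if lines and abs(lines[-1][0].y - word.y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w.x) for line in lines]


def extract_fields(words):
    """Treat each 'Label: value' line as a key-value field (a simplifying assumption)."""
    fields = {}
    for line in group_into_lines(words):
        text = " ".join(w.text for w in line)
        if ":" in text:
            key, value = text.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


# Made-up OCR output for a tiny invoice, deliberately out of reading order.
ocr_words = [
    Word("1,250.00", 120, 61),
    Word("Invoice", 10, 10),
    Word("Total:", 10, 60),
    Word("No:", 70, 11),
    Word("INV-0042", 110, 10),
]

fields = extract_fields(ocr_words)
print(fields)
```

The coordinates do the layout work here: words are restored to reading order by position before any field is parsed, which is the step that plain text extraction alone would miss.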