Multimodal
Object Detection
Finding and labeling objects in an image with bounding boxes.
Definition
Object detection locates the objects in an image, drawing a bounding box around each one and assigning it a class label. It combines two tasks: classification (what the object is) and localization (where it sits in the image). Well-known method families include R-CNN, YOLO, and DETR. It is foundational to robotics, self-driving cars, and surveillance, and to grounding words in the right region of an image within multimodal models.
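To make the "label plus box" idea concrete, here is a minimal sketch of how a detector's output might be represented. The `Detection` class, its field names, the corner-based `(x_min, y_min, x_max, y_max)` pixel convention, and the sample values are all illustrative assumptions, not the output format of any particular R-CNN, YOLO, or DETR implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a class label plus a bounding box.

    Hypothetical structure for illustration. Coordinates are assumed to be
    pixel positions of the box's top-left and bottom-right corners.
    """
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self):
        # Midpoint of the box: a simple answer to "where does it sit?"
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def area(self):
        # Width times height, clamped at zero for degenerate boxes.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def find(detections, label):
    """Return every detection whose class label matches `label`."""
    return [d for d in detections if d.label == label]


# Made-up example: two objects found in one image.
detections = [
    Detection("dog", 40, 60, 200, 220),
    Detection("bicycle", 150, 30, 400, 300),
]

dog = find(detections, "dog")[0]
print(dog.label, dog.center(), dog.area())
```

Real detectors typically also attach a confidence score to each box, and some use a center-plus-size box format instead of corners; both are omitted here to keep the sketch small.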