Fusing object detection and region appearance for image-text alignment

Luca Del Pero, Philip Lee, James Magahern, Emily Hartley, Kobus Barnard, Ping Wang, Atul Kanaujia, Niels Haering

2011 (modified: 05 Nov 2022), ACM Multimedia 2011, Readers: Everyone

Abstract: We present a method for automatically aligning words to image regions that integrates specific object classifiers (e.g., "car" detectors) with weak models based on appearance features. Previous strategies have largely focused on the latter and thus have not exploited progress in object category recognition. We therefore augment region labeling with object detection, which simplifies the problem by reliably identifying a subset of the labels, thereby reducing correspondence ambiguity overall. Comprehensive testing on the SAIAPR TC dataset shows that principled integration of object detection improves region labeling.
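
The idea that confident detections fix a subset of labels and so reduce ambiguity for the rest can be illustrated with a toy sketch. This is not the paper's model: the function name `align_words_to_regions`, the confidence `threshold`, the greedy one-word-per-region assignment, and all scores below are assumptions made purely for illustration.

```python
def align_words_to_regions(appearance, detections, threshold=0.8):
    """Toy word-to-region alignment (hypothetical, not the authors' method).

    appearance: {region: {word: score}} from a weak appearance model
    detections: {region: (word, confidence)} from object detectors
    """
    labels = {}
    # Step 1: accept detector labels that clear the (assumed) threshold.
    for region, (word, confidence) in detections.items():
        if confidence >= threshold and region in appearance:
            labels[region] = word
    used = set(labels.values())

    # Step 2: assign the remaining words to the remaining regions,
    # greedily by appearance score, one word per region.
    candidates = sorted(
        (
            (score, region, word)
            for region, scores in appearance.items()
            if region not in labels
            for word, score in scores.items()
            if word not in used
        ),
        reverse=True,
    )
    for score, region, word in candidates:
        if region not in labels and word not in used:
            labels[region] = word
            used.add(word)
    return labels


# Made-up scores: appearance alone slightly prefers "car" for r2 over r1.
appearance = {
    "r1": {"car": 0.40, "road": 0.35, "sky": 0.25},
    "r2": {"car": 0.45, "road": 0.30, "sky": 0.25},
    "r3": {"car": 0.20, "road": 0.30, "sky": 0.50},
}

without_detector = align_words_to_regions(appearance, {})
with_detector = align_words_to_regions(appearance, {"r1": ("car", 0.95)})
```

In this made-up example, a confident "car" detection on `r1` removes "car" from the candidates for the other regions, so `r2` is matched to "road" instead of being mislabeled from appearance scores alone.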
