Know What You See: Grounded localization of product components

Manan Soni; Abinesh Kanagarajan; Shyam Mohan

Published: 18 Apr 2026, Last Modified: 22 Apr 2026, Venue: ACL 2026 Industry Track Poster, License: CC BY 4.0

Keywords: component localization, component detection, object detection, open world object detection

TL;DR: We introduce Know What You See (KWYS), which uses textual knowledge to build a hierarchical component taxonomy that guides open-vocabulary detection, significantly improving product component localization while reducing hallucinations.

Abstract: Many real-world decisions about products (e.g., how they function, how they should be used) depend on their components rather than on the object as a whole. Accurately identifying product components has applications such as automated defect detection, visual spare-parts search, and verified assembly. However, existing object detectors treat components as isolated objects, ignoring their inherent structure. We propose Know What You See (KWYS), which localizes components by grounding them in a textual knowledge base (e.g., manuals or web descriptions). KWYS converts raw text into a hierarchical component taxonomy, which then guides an open-vocabulary object detector through a hierarchical verification algorithm. We evaluate on 1,000 product images across 5 diverse categories, improving component localization accuracy by 11% and reducing component hallucinations by 25%.
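The abstract does not spell out the hierarchical verification algorithm, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes that a detection is kept when its label appears in the taxonomy and, for non-root components, when an already verified detection of its parent component spatially contains it. The names `Detection`, `containment`, `verify`, the `parent_of` mapping, and the `min_containment` threshold are all hypothetical, as is the bicycle example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    label: str
    box: Box
    score: float


def containment(inner: Box, outer: Box) -> float:
    """Fraction of inner's area that lies inside outer."""
    ix1, iy1 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix2, iy2 = min(inner[2], outer[2]), min(inner[3], outer[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0


def depth(label: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Number of edges from label up to the taxonomy root."""
    d = 0
    while parent_of[label] is not None:
        label = parent_of[label]
        d += 1
    return d


def verify(
    detections: List[Detection],
    parent_of: Dict[str, Optional[str]],
    min_containment: float = 0.8,  # assumed threshold, not from the paper
) -> List[Detection]:
    """Keep detections that are consistent with the taxonomy.

    Labels missing from the taxonomy are dropped. Parents are processed
    before children, so a child survives only if a verified parent
    detection contains it.
    """
    known = [d for d in detections if d.label in parent_of]
    kept: List[Detection] = []
    for det in sorted(known, key=lambda d: depth(d.label, parent_of)):
        parent = parent_of[det.label]
        if parent is None or any(
            k.label == parent
            and containment(det.box, k.box) >= min_containment
            for k in kept
        ):
            kept.append(det)
    return kept


# Hypothetical taxonomy: child label -> parent label (None marks the root).
taxonomy = {"bicycle": None, "wheel": "bicycle", "spoke": "wheel", "saddle": "bicycle"}

raw = [
    Detection("spoke", (20, 60, 30, 80), 0.55),
    Detection("bicycle", (0, 0, 100, 100), 0.95),
    Detection("wheel", (10, 50, 40, 90), 0.80),
    Detection("saddle", (200, 200, 220, 210), 0.60),  # lies outside the bicycle box
    Detection("propeller", (5, 5, 20, 20), 0.70),     # label not in the taxonomy
]

verified_labels = [d.label for d in verify(raw, taxonomy)]
```

In this toy input, the `saddle` box falls outside the `bicycle` box and `propeller` is absent from the taxonomy, so both are rejected. This is the kind of filtering that could reduce component hallucinations, under the assumptions stated above.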

Submission Type: Emerging

Copyright Form: pdf

Submission Number: 244
