Abstract: We study basic-level categories for describing visual concepts, and empirically observe context-dependent basic-level names across thousands of concepts. We propose methods for predicting basic-level names using a series of classification and ranking tasks, producing the first large-scale catalogue of basic-level names for hundreds of thousands of images depicting thousands of visual concepts. We also demonstrate the usefulness of our method on a picture-to-word task, where modeling both visual and language context yields a strong improvement over recent work by Ordonez et al. Our study suggests that a model for naming visual concepts is an important component of any automatic image/video captioning and visual storytelling system.