Abstract: Attribute-based representations help machine learning models perform tasks based on human-understandable concepts, allowing closer human-machine collaboration. However, learning attributes that accurately reflect the content of an image is not always straightforward, as per-image ground-truth attributes are often not available. We propose applying the Multiple Instance Learning (MIL) paradigm to attribute learning (AMIL) using only class-level labels. We allow the model to under-predict the positive attributes, which may be missing in a particular image due to occlusions or unfavorable pose, but not to over-predict the negative ones, which are almost certainly not present. We evaluate AMIL in the zero-shot learning (ZSL) setting, where training and test classes are disjoint, and show that this formulation also allows the model to profit from knowledge about the semantic relatedness of attributes. In addition, we apply the MIL assumption to ZSL classification and propose MIL-DAP, an attribute-based zero-shot classification method based on Direct Attribute Prediction (DAP), to evaluate attribute prediction methods when no image-level data is available for evaluation. Experiments on CUB-200-2011, SUN Attributes and AwA2 show improvements in attribute detection, attribute-based zero-shot classification and weakly supervised part localization.
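The asymmetry described in the abstract (tolerating under-predicted positive attributes while penalizing over-predicted negative ones) can be illustrated with a standard MIL-style loss. The sketch below is a hypothetical illustration, not the AMIL loss from the paper. It assumes a bag of instances sharing one class-level binary attribute vector; whether an instance is an image of a class or a region of an image is left open. A positive attribute is scored only by its strongest instance, while a negative attribute is penalized on every instance. The function name `mil_attribute_loss` and its signature are hypothetical.

```python
import numpy as np


def mil_attribute_loss(probs: np.ndarray, class_attrs: np.ndarray, eps: float = 1e-7) -> float:
    """Hypothetical asymmetric MIL-style attribute loss for a single bag.

    probs:       (n_instances, n_attrs) predicted attribute probabilities in [0, 1].
    class_attrs: (n_attrs,) binary class-level labels (1 = attribute present in the class).
    """
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    pos = class_attrs.astype(bool)
    neg = ~pos

    # Positive attributes: only the strongest instance has to fire, so
    # instances that miss the attribute (occlusion, pose) are not penalized.
    pos_loss = -np.log(probs[:, pos].max(axis=0)).sum()

    # Negative attributes: every instance is penalized for predicting them.
    neg_loss = -np.log(1.0 - probs[:, neg]).mean(axis=0).sum()

    # Average over attributes.
    return float((pos_loss + neg_loss) / class_attrs.size)


labels = np.array([1, 0])  # attribute 0 is positive, attribute 1 is negative
a = np.array([[0.9, 0.1], [0.9, 0.1]])  # both instances behave well
b = np.array([[0.9, 0.1], [0.1, 0.1]])  # second instance under-predicts the positive
c = np.array([[0.9, 0.1], [0.9, 0.8]])  # second instance over-predicts the negative

print(mil_attribute_loss(a, labels), mil_attribute_loss(b, labels), mil_attribute_loss(c, labels))
```

In this toy setting, `b` receives the same loss as `a`, because the missing positive in one instance is absorbed by the max over the bag, whereas `c` is penalized more heavily for the spurious negative.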
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We would like to thank the reviewers and the editor for their positive response and for offering us this opportunity. We have put our best effort into improving the paper according to the comments. Most importantly, we have changed the title of the paper to “Attribute prediction as multiple instance learning” and edited the text at multiple locations to clarify that our contribution aims at improving attribute prediction, and that the proposed MIL-DAP ZSL method is not intended to improve ZSL classification itself, but rather serves as a means to better evaluate the results on the attribute prediction task. We have added further ZSL references and clarified why we are not able to use them in the context of our contribution (e.g. generative ZSL methods are not designed for attribute prediction). We have expanded the ablations section (Section 4.4) to discuss the effects of applying only the negative or only the positive losses. We have corrected the typos kindly pointed out by the reviewers, and have carefully reread and further edited the manuscript for readability according to the reviewers’ comments. In the revised manuscript, the main changes and additions are highlighted in blue.
Assigned Action Editor: ~Kui_Jia1
Submission Number: 77