Learning Multi-Instance Enriched Image Representations via Non-Greedy Ratio Maximization of the L1-norm Distances
Abstract: Multi-instance learning (MIL) has demonstrated its usefulness in many real-world image applications in recent
years. However, two critical challenges prevent one from effectively using MIL in practice. First, existing MIL methods
routinely model the predictive targets using the instances of
input images, but rarely utilize an input image as a whole.
As a result, the useful information conveyed by the holistic
representation of an input image can be lost.
Second, the number of instances varies across the input images in a data set, which makes it infeasible to apply traditional learning models that accept only single-vector inputs. To
tackle these two challenges, in this paper we propose a novel image representation learning method that can integrate
the local patches (the instances) of an input image (the bag)
and its holistic representation into one single-vector representation. Our new method first learns a projection to preserve both global and local consistencies of the instances of
an input image. It then projects the holistic representation
of the same image into the learned subspace for information
enrichment. Taking into account the content and characterization variations in natural scenes and photos, we develop
an objective that maximizes the ratio of the summations of
a number of ℓ1-norm distances, which is difficult to solve
in general. To solve our objective, we derive a new efficient
non-greedy iterative algorithm and rigorously prove its convergence. Promising results in extensive experiments have
demonstrated improved performances of our new method
that validate its effectiveness.
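As an illustration only (the exact criterion is given in the paper, not in this abstract), a ratio of sums of ℓ1-norm distances over a projection matrix W can be written schematically in LaTeX as

\max_{\mathbf{W}^\top\mathbf{W}=\mathbf{I}}\;
\frac{\sum_{i}\left\lVert \mathbf{W}^\top(\mathbf{a}_i-\mathbf{b}_i)\right\rVert_1}
     {\sum_{j}\left\lVert \mathbf{W}^\top(\mathbf{c}_j-\mathbf{d}_j)\right\rVert_1},

where the vector pairs (a_i, b_i) and (c_j, d_j) are placeholders for the distances to be enlarged and shrunk, respectively; the actual terms, which encode the global and local consistencies described above, are assumptions here and are specified in the full paper.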