Abstract: Learning group representations is a common problem in tasks where the basic unit is a group, set, or sequence.
The computer vision community tries to tackle it by aggregating the elements of a group according to an indicator that is either defined by humans, such as the quality or saliency of an element, or generated by a black box, such as an attention score or the output of an RNN.
This article provides a more fundamental and explainable view.
We claim that the most significant indicator of whether the group representation can benefit from an element is not its quality or an inexplicable score, but its \textit{discriminability}.
Our key insight is to explicitly define \textit{discriminability} using embedded class centroids on a proxy set,
and to show that the discriminability distribution \textit{w.r.t.} the element space can be distilled by a lightweight auxiliary distillation network.
We call this process \textit{discriminability distillation learning} (DDL).
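To make the centroid-based idea concrete, the sketch below scores each element of a labeled proxy set by how much closer its embedding lies to its own class centroid than to the nearest other-class centroid. The margin form (own-class cosine similarity minus the largest other-class cosine similarity) and the helper names `cosine`, `class_centroids`, and `discriminability` are assumptions for illustration; the paper's exact definition may differ. Scores like these would act as the targets that the lightweight distillation network learns to predict.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_centroids(
    embeddings: Sequence[Vector], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Mean embedding of each class in a (hypothetical) labeled proxy set."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def discriminability(
    embedding: Vector, label: int, centroids: Dict[int, List[float]]
) -> float:
    """Assumed margin form: own-centroid similarity minus closest other-centroid similarity.

    Requires at least two classes in `centroids`.
    """
    own = cosine(embedding, centroids[label])
    nearest_other = max(
        cosine(embedding, c) for k, c in centroids.items() if k != label
    )
    return own - nearest_other
```

A positive score means the element sits closer to its own class than to any other; a score near zero marks an ambiguous element, such as `[0.5, 0.5]` when the two class centroids are mirror images of each other.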
We show that the proposed DDL can be flexibly plugged into many group-based recognition tasks without influencing the training procedure of the original tasks. Comprehensive experiments on set-to-set face recognition and action recognition validate the advantage of DDL in both accuracy and efficiency, and DDL pushes forward the state-of-the-art results on these tasks by an impressive margin.
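Once every element of a group carries a discriminability score, the group representation can be formed so that highly discriminable elements dominate. The sketch below is one hypothetical pooling scheme, not necessarily the one DDL uses: elements whose scores are at or below `keep_threshold` are dropped, and the remaining features are averaged with weights proportional to their scores. The function name `aggregate_group` and the `keep_threshold` parameter are assumptions for illustration.

```python
from typing import List, Sequence


def aggregate_group(
    features: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_threshold: float = 0.0,
) -> List[float]:
    """Hypothetical discriminability-weighted pooling of a group's element features."""
    if not features or len(features) != len(scores):
        raise ValueError("need a non-empty group with one score per feature")
    if keep_threshold < 0:
        # A non-negative threshold guarantees all kept weights are positive.
        raise ValueError("keep_threshold must be non-negative")
    # Drop elements whose scores mark them as non-discriminative.
    kept = [(f, s) for f, s in zip(features, scores) if s > keep_threshold]
    if not kept:
        # Fall back to the single highest-scoring element.
        best = max(range(len(scores)), key=lambda i: scores[i])
        return list(features[best])
    total = sum(s for _, s in kept)
    dim = len(kept[0][0])
    # Weighted average: each kept feature is weighted by score / total.
    return [sum(f[d] * s for f, s in kept) / total for d in range(dim)]
```

In this sketch the scores would be predicted by the distillation network at test time, where labels are unavailable; the threshold also lets low-scoring elements be skipped entirely.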
Code: https://www.dropbox.com/sh/j4gx4d8qebawl1i/AAAuKPircw50mbKHE03svpBda?dl=0
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2008.10850/code)