Abstract: Sorting the values of an array is a fundamental routine in statistics and machine learning, one that is used to compute rank-based statistics, cumulative distribution functions (CDFs) and quantiles, or to select closest neighbors and preferred responses. Sorting algorithms, however, carry out sequences of operations that are intrinsically combinatorial or recursive, making them difficult to backpropagate through. We propose in this paper a framework that builds upon optimal transport (OT) theory to provide approximate yet differentiable sorting operations. To do so, we leverage the fact that sorting can be seen as a particular instance of the OT problem on the real line between the values stored in the array of interest and a family of predefined sorted values, notably 1,...,n if the input array has n elements. Building on this link between OT and sorting, we also propose generalized CDFs and quantile operators by varying the number of bins m to which the input sequence is compared. Because this amounts to using the so-called Kantorovich formulation of OT, we call these quantities split-sorts, CDFs and quantiles. We recover differentiable algorithms by approximating that OT problem using an entropic regularization, solved using a few Sinkhorn iterations. We demonstrate the usefulness of these operators in various learning settings, notably to minimize median errors and to define a new type of activation function for neural networks: one that, instead of normalizing inputs pointwise using their mean and variance, transforms them so that their quantiles match those of a predefined (or alternatively learned) quantile profile.
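The link between sorting and entropy-regularized OT described in the abstract can be illustrated with a minimal sketch. It is not the code from the linked repository: the function name `soft_sort_and_rank`, the min-max rescaling of the inputs, the squared-distance cost, the evenly spaced targets in [0, 1] with m = n, and the default values of `epsilon` and `n_iters` are all assumptions made for illustration.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def soft_sort_and_rank(x, epsilon=0.01, n_iters=200):
    """Return (soft-sorted values, soft ranks in 1..n) for a list of floats.

    Hypothetical helper: sorting is posed as OT between the inputs and n
    presorted targets, regularized by entropy and solved with Sinkhorn.
    """
    n = len(x)
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0                    # guard against constant input
    z = [(v - lo) / span for v in x]           # assumption: min-max rescaling
    y = [j / max(n - 1, 1) for j in range(n)]  # predefined sorted targets
    log_a = log_b = math.log(1.0 / n)          # uniform weights on both sides
    cost = [[(zi - yj) ** 2 for yj in y] for zi in z]

    # Log-domain Sinkhorn updates on the dual potentials f and g.
    f = [0.0] * n
    g = [0.0] * n
    for _ in range(n_iters):
        f = [-epsilon * _logsumexp(
                [log_b + (g[j] - cost[i][j]) / epsilon for j in range(n)])
             for i in range(n)]
        g = [-epsilon * _logsumexp(
                [log_a + (f[i] - cost[i][j]) / epsilon for i in range(n)])
             for j in range(n)]

    # Regularized transport plan; each column sums to 1/n after the g update.
    plan = [[math.exp(log_a + log_b + (f[i] + g[j] - cost[i][j]) / epsilon)
             for j in range(n)] for i in range(n)]

    # Soft sort: target j receives a weighted average of the input values.
    soft_sorted = [n * sum(plan[i][j] * x[i] for i in range(n))
                   for j in range(n)]
    # Soft rank: input i receives a weighted average of target positions.
    soft_ranks = [n * sum(plan[i][j] * (j + 1) for j in range(n))
                  for i in range(n)]
    return soft_sorted, soft_ranks


values, ranks = soft_sort_and_rank([0.3, -1.2, 2.5, 0.9])
```

With a small `epsilon` the plan is close to a scaled permutation matrix, so `values` and `ranks` stay close to the exact sorted array and ranks. A larger `epsilon` gives smoother outputs that drift further from the exact ones. Every step is built from sums, exponentials and logarithms, so an automatic differentiation framework could backpropagate through the same computation.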
CMT Num: 3730
Code Link: https://github.com/google-research/google-research/tree/master/soft_sort