Stochastic Optimization of Sorting Networks via Continuous Relaxations

Aditya Grover; Eric Wang; Aaron Zweig; Stefano Ermon

Published: 21 Dec 2018, Last Modified: 22 Jun 2025. ICLR 2019 Conference Blind Submission. Readers: Everyone

Abstract: Sorting input objects is an important step in many machine learning pipelines. However, the sorting operator is non-differentiable with respect to its inputs, which prohibits end-to-end gradient-based optimization. In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct argmax. This relaxation permits straight-through optimization of any computational graph that involves a sorting operation. Further, we use this relaxation to enable gradient-based stochastic optimization over the combinatorially large space of permutations by deriving a reparameterized gradient estimator for the Plackett-Luce family of distributions over permutations. We demonstrate the usefulness of our framework on three tasks that require learning semantic orderings of high-dimensional objects, including a fully differentiable, parameterized extension of the k-nearest neighbors algorithm.
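The abstract does not state the relaxation formula itself. As a rough illustration, the sketch below follows the form given in the paper as we recall it: row i of the relaxed matrix is a softmax over ((n + 1 - 2i) s - A_s 1) / τ, where A_s holds the pairwise absolute differences of the scores s and τ is a temperature. The function names (`softmax`, `neuralsort_relaxation`) and the pure-Python style are assumptions made for readability; this is not the code from the authors' repository.

```python
import math

def softmax(logits, tau):
    # Numerically stable softmax with temperature tau.
    scaled = [x / tau for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def neuralsort_relaxation(s, tau=1.0):
    """Return an n x n row-stochastic matrix relaxing the descending sort of s.

    Hypothetical helper for illustration; row i (1-indexed) is
    softmax(((n + 1 - 2i) * s - A_s 1) / tau), with A_s[j][k] = |s_j - s_k|.
    """
    n = len(s)
    # b[j] = sum_k |s_j - s_k|, i.e. the j-th entry of A_s 1.
    b = [sum(abs(sj - sk) for sk in s) for sj in s]
    rows = []
    for i in range(1, n + 1):
        logits = [(n + 1 - 2 * i) * s[j] - b[j] for j in range(n)]
        rows.append(softmax(logits, tau))
    return rows

if __name__ == "__main__":
    P = neuralsort_relaxation([2.0, 5.0, 3.0], tau=0.1)
    # Each row's argmax points at the input index holding the i-th largest score.
    print([max(range(len(r)), key=r.__getitem__) for r in P])  # [1, 2, 0]
```

Every row sums to one, and for distinct scores each row has a different argmax, which matches the "unimodal row-stochastic" description above. Lowering `tau` pushes the rows toward the one-hot rows of the exact permutation matrix.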

Keywords: continuous relaxations, sorting, permutation, stochastic computation graphs, Plackett-Luce

TL;DR: We provide a continuous relaxation to the sorting operator, enabling end-to-end, gradient-based stochastic optimization.

Code: [ermongroup/neuralsort](https://github.com/ermongroup/neuralsort)

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/stochastic-optimization-of-sorting-networks/code)
