TL;DR: We give an algorithm for discrepancy minimization that runs in input-sparsity time.
Abstract: A recent work by [Larsen, SODA 2023] introduced a faster combinatorial alternative to Bansal's SDP algorithm for finding a coloring $x \in \\{-1, 1\\}^n$ that approximately minimizes the discrepancy $\mathrm{disc}(A, x) := \\| A x \\|_{\infty}$ of a real-valued $m \times n$ matrix $A$. Larsen's algorithm runs in $\widetilde{O}(mn^2)$ time compared to Bansal's $\widetilde{O}(mn^{4.5})$-time algorithm, with a slightly weaker logarithmic approximation ratio in terms of the hereditary discrepancy of $A$ [Bansal, FOCS 2010]. We present a combinatorial $\widetilde{O}(\mathrm{nnz}(A) + n^3)$-time algorithm with the same approximation guarantee as Larsen's, optimal for tall matrices where $m = \mathrm{poly}(n)$. Using a more intricate analysis and fast matrix multiplication, we further achieve a runtime of $\widetilde{O}(\mathrm{nnz}(A) + n^{2.53})$, breaking the cubic barrier for square matrices and surpassing the limitations of linear-programming approaches [Eldan and Singh, RS\&A 2018]. Our algorithm relies on two key ideas: (i) a new sketching technique for finding a projection matrix with a short $\ell_2$-basis using implicit leverage-score sampling, and (ii) a data structure for efficiently implementing the iterative Edge-Walk partial-coloring algorithm [Lovett and Meka, SICOMP 2015], together with an alternative analysis that enables ``lazy'' batch updates with low-rank corrections. Our results nearly close the computational gap between real-valued and binary matrices, for which input-sparsity time coloring was recently obtained by [Jain, Sah and Sawhney, SODA 2023].
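As a concrete illustration of the objective defined in the abstract, the following minimal sketch evaluates $\mathrm{disc}(A, x) = \\| A x \\|_{\infty}$ for a given coloring. It is not the paper's algorithm: it only computes the quantity being minimized, and does not search for a good coloring. The coordinate-list input format, the `Entry` alias, and the function name `discrepancy` are assumptions made for this example. Evaluating the objective this way touches each nonzero entry once, so it takes $O(\mathrm{nnz}(A) + m + n)$ time.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical sparse format for this sketch: one (row, column, value) triple
# per nonzero entry of A.
Entry = Tuple[int, int, float]


def discrepancy(entries: Iterable[Entry], m: int, x: Sequence[int]) -> float:
    """Return ||A x||_inf for an m-row matrix A given by its nonzero entries."""
    if any(sign not in (-1, 1) for sign in x):
        raise ValueError("x must be a coloring with entries in {-1, +1}")
    row_sums = [0.0] * m
    for i, j, value in entries:
        # Each nonzero contributes value * x[j] to the signed sum of row i.
        row_sums[i] += value * x[j]
    # The discrepancy is the largest absolute row sum.
    return max((abs(s) for s in row_sums), default=0.0)


# A small 3 x 4 real-valued matrix with six nonzero entries.
A = [
    (0, 0, 1.0), (0, 1, 1.0),
    (1, 1, 0.5), (1, 2, -2.0),
    (2, 0, 1.0), (2, 3, 1.0),
]

all_ones = [1, 1, 1, 1]
mixed = [1, -1, -1, -1]

print(discrepancy(A, 3, all_ones))  # 2.0
print(discrepancy(A, 3, mixed))     # 1.5
```

In this toy instance the mixed coloring cancels rows 0 and 2 exactly, leaving row 1 as the only source of imbalance.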
Lay Summary: When you split a collection of items into two groups (say, red and blue), you often want every predefined subset to be as evenly colored as possible. Mathematicians call the maximum imbalance across all subsets the discrepancy of the coloring. Discrepancy minimization is vital in areas ranging from computational geometry to data privacy, yet the best general-purpose algorithms were far too slow for today’s large, sparse data sets, sometimes taking days to finish. We introduce a new, purely combinatorial algorithm for balancing real-valued matrices whose runtime scales with the sparsity of the input data. These advances let practitioners generate low-discrepancy colorings for million-row problems in minutes instead of hours, unlocking faster solutions in optimization, randomized algorithms, and data analysis.
Primary Area: Optimization->Discrete and Combinatorial Optimization
Keywords: combinatorial optimization, algorithmic discrepancy theory, sketching, input-sparsity time
Submission Number: 816