The Phase Transition Phenomenon of Shuffled Regression

22 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Keywords: message passing, phase transition, permuted linear regression
Abstract: This paper studies the phase transition phenomenon in the shuffled (permuted) regression problem, which has applications in databases, privacy, and data analysis. In the permuted regression task $\mathbf{Y} = \mathbf{\Pi}^{\natural}\mathbf{X}\mathbf{B}^{\natural}$, the goal is to recover the permutation matrix $\mathbf{\Pi}^{\natural}$ as well as the coefficient matrix $\mathbf{B}^{\natural}$. Prior studies have observed empirically that the recovery of $\mathbf{\Pi}^{\natural}$ exhibits a phase transition: the error rate drops to zero rapidly once the parameters reach certain thresholds. In this study, we aim to precisely identify the locations of the phase transition points by leveraging techniques from message passing (MP).
In our analysis, we first transform the permutation recovery problem into a probabilistic graphical model. We then use analytical tools rooted in the MP algorithm to derive an equation that tracks the convergence of the MP algorithm. By linking this equation to the branching random walk process, we characterize the impact of the signal-to-noise ratio (SNR) on permutation recovery. Depending on whether the signal is given or not, we separately investigate the oracle case and the non-oracle case. The bottleneck in identifying the phase transition regimes lies in deriving closed-form formulas for the corresponding critical points, and such precise expressions can be obtained only in rare scenarios. To tackle this challenge, we propose a Gaussian approximation method, which yields closed-form formulas in almost all scenarios. In the oracle case, our method predicts the phase transition SNR fairly accurately. In the non-oracle case, our proposed algorithm predicts the maximum allowed number of permuted rows and uncovers its dependency on the number of samples.
Numerical experiments suggest that the observed phase transition points are well aligned with our theoretical predictions. We hope that this exploration will motivate the use of message passing algorithms (and related techniques) as a new and effective tool for studying permuted regression problems. For example, identifying the phase transition locations for sparse permuted recovery~\citep{zhang2021sparse} would be a highly interesting (and challenging) application. In this paper, we also briefly illustrate the use of the message passing algorithm for partial correspondence recovery, an important problem that deserves its own thorough study.
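To make the oracle setting concrete, the following is a minimal simulation sketch, not the message passing analysis of the paper. It assumes an additive Gaussian noise model $\mathbf{Y} = \mathbf{\Pi}^{\natural}\mathbf{X}\mathbf{B}^{\natural} + \sigma\mathbf{W}$, Gaussian $\mathbf{X}$ and $\mathbf{B}^{\natural}$, and the SNR convention $\|\mathbf{B}^{\natural}\|_F^2 / (m\sigma^2)$; none of these are stated in the abstract. With $\mathbf{B}^{\natural}$ known, the permutation is estimated by solving a linear assignment problem on squared row distances. The function name `mismatch_rate` and all default sizes are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def mismatch_rate(n=50, p=20, m=5, snr=100.0, seed=0):
    """Fraction of rows matched incorrectly in a simulated oracle case.

    Assumed model (not from the abstract): Y = Pi X B + sigma * W with
    i.i.d. standard normal X, B, W, and snr = ||B||_F^2 / (m * sigma^2).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # design matrix
    B = rng.standard_normal((p, m))   # coefficients, known in the oracle case
    perm = rng.permutation(n)         # row i of Y comes from row perm[i] of X B
    Z = X @ B                         # unpermuted, noise-free responses

    # Noise level implied by the assumed SNR convention.
    sigma = np.sqrt(np.sum(B ** 2) / (m * snr))
    Y = Z[perm] + sigma * rng.standard_normal((n, m))

    # cost[i, j] = ||Y_i - Z_j||^2; the minimum-cost assignment is the
    # least-squares estimate of the permutation when B is known.
    cost = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    _, est = linear_sum_assignment(cost)

    return float(np.mean(est != perm))


if __name__ == "__main__":
    # Sweep the SNR to look for the sharp drop in the error rate.
    for snr in [0.1, 1.0, 10.0, 100.0, 1000.0]:
        print(snr, mismatch_rate(snr=snr))
```

Sweeping `snr` over a finer grid, and averaging over several seeds, is one way to locate the empirical transition point for comparison with a theoretical prediction.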
Submission Number: 4607