Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Published: 04 Apr 2023, Last Modified: 04 Apr 2023. Accepted by TMLR.
Abstract: Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify the discrepancy between the two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when densities are well-separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that the state-of-the-art density ratio estimators do perform poorly on well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities $\{m_k\}_{k=1}^K$ and trains a multi-class logistic regression to classify the samples from $p, q$ and $\{m_k\}_{k=1}^K$ into $K+2$ classes. We show that if these auxiliary densities are constructed such that they overlap with $p$ and $q$, then a multi-class logistic regression allows for estimating $\log p/q$ on the domain of any of the $K+2$ distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning.
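The multi-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 1-D toy problem with $p = \mathcal{N}(-2, 1)$, $q = \mathcal{N}(2, 1)$, a single auxiliary density $m = \mathcal{N}(0, 2^2)$ (so $K = 1$), quadratic features, and scikit-learn's `LogisticRegression` as the classifier; all of these choices are assumptions made for illustration. With equal sample sizes per class, the class priors cancel and $\log p(x)/q(x)$ is estimated by the difference between the logits of the $p$ class and the $q$ class. For this toy pair the true log ratio is $-4x$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000  # samples per class; equal sizes make the class priors cancel

# Toy setup (assumed for illustration): well-separated p and q,
# plus one auxiliary density m that overlaps with both.
x_p = rng.normal(-2.0, 1.0, n)  # p = N(-2, 1)
x_q = rng.normal(2.0, 1.0, n)   # q = N(2, 1)
x_m = rng.normal(0.0, 2.0, n)   # m = N(0, 2^2)


def features(x):
    """Quadratic features, sufficient for log-densities of 1-D Gaussians."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, x ** 2])


X = features(np.concatenate([x_p, x_q, x_m]))
y = np.repeat([0, 1, 2], n)  # class 0 = p, class 1 = q, class 2 = m

# Multinomial logistic regression over K + 2 = 3 classes.
# A large C makes the default L2 penalty negligible.
clf = LogisticRegression(C=1e4, max_iter=5000).fit(X, y)


def log_ratio(x):
    """Estimate log p(x)/q(x) as the difference of the p and q logits."""
    logits = clf.decision_function(features(x))  # shape (len(x), 3)
    return logits[:, 0] - logits[:, 1]


x_test = np.array([-1.0, 0.0, 1.0])
estimate = log_ratio(x_test)
truth = -4.0 * x_test  # analytic log p/q for this toy pair
print(np.column_stack([x_test, estimate, truth]))
```

Because the classifier also sees samples from $m$, it is fitted in the region between $p$ and $q$, where a binary classifier trained on the two well-separated sample sets alone would have almost no data.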
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Added URL to the code repo
- Updated the manuscript to incorporate the changes made during the review process. Details are listed below:

In response to the follow-up from reviewer 58JZ:
- Updated the discussion section to further clarify the impact of the auxiliary distribution on the estimator.

In response to reviewer jg6m's requested changes, we have made the following changes:
- Updated the abstract and discussion section to clarify the finite-sample performance claim.
- Updated the introduction to highlight the practical relevance of addressing the density chasm.
- Updated the key contributions to highlight that MDRE 'resolves' the distribution-shift issue by construction.
- Updated the first sentence of section 3.3 to clarify that it is an example of Remark 3.5.

In response to reviewer 58JZ's requested changes, we have made the following changes:
- Updated section 4.3 to clarify in which experiments parameter-sharing is allowed.
- Updated the sign in eq 6.
- Updated the labels in Figure 1.

In response to reviewer 775Q's requested changes, we have made the following changes:
- Updated remark 3.3 on page 6
- Added new appendix A with the consistency proof.
Assigned Action Editor: ~Lijun_Zhang1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 585