Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression
Abstract: Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify the discrepancy between the two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when densities are well-separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that the state-of-the-art density ratio estimators do perform poorly on well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities $\{m_k\}_{k=1}^K$ and trains a multi-class logistic regression to classify the samples from $p, q$ and $\{m_k\}_{k=1}^K$ into $K+2$ classes. We show that if these auxiliary densities are constructed such that they overlap with $p$ and $q$, then a multi-class logistic regression allows for estimating $\log p/q$ on the domain of any of the $K+2$ distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning.
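The idea described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation, which is available at the code URL listed on this page. Everything specific here is an assumption made for illustration: a 1-D toy problem with $p = \mathcal{N}(-2, 1)$, $q = \mathcal{N}(2, 1)$, a single auxiliary density $m = \mathcal{N}(0, 1)$ (so $K = 1$), linear per-class scores, and plain gradient descent. With equal sample sizes per class, the difference of the $p$ and $q$ class scores estimates $\log p/q$, which for this toy setup equals $-4x$.

```python
import math
import random

rng = random.Random(0)
N = 300  # samples per class; equal sizes so class priors cancel in the ratio

# Toy 1-D setup (assumed for illustration): class 0 = p, 1 = q, 2 = auxiliary m.
MEANS = [-2.0, 2.0, 0.0]
xs = [rng.gauss(mu, 1.0) for mu in MEANS for _ in range(N)]
ys = [c for c in range(len(MEANS)) for _ in range(N)]


def logits(W, x):
    # One linear score h_c(x) = b_c + w_c * x per class.
    return [b + w * x for b, w in W]


def softmax(z):
    # Numerically stable softmax over a list of scores.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fit(xs, ys, n_classes, lr=0.5, epochs=200):
    # Full-batch gradient descent on the multinomial cross-entropy loss.
    W = [[0.0, 0.0] for _ in range(n_classes)]
    n = len(xs)
    for _ in range(epochs):
        grad = [[0.0, 0.0] for _ in range(n_classes)]
        for x, y in zip(xs, ys):
            probs = softmax(logits(W, x))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad[c][0] += err
                grad[c][1] += err * x
        for c in range(n_classes):
            W[c][0] -= lr * grad[c][0] / n
            W[c][1] -= lr * grad[c][1] / n
    return W


W = fit(xs, ys, n_classes=len(MEANS))


def log_ratio(x):
    # Estimated log p(x)/q(x): difference of the p-class and q-class scores.
    h = logits(W, x)
    return h[0] - h[1]


for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  estimated log p/q = {log_ratio(x):+.3f}  (true: {-4 * x:+.1f})")
```

Linear scores suffice in this sketch only because all three toy densities are Gaussians with equal variance, which makes every log-ratio linear in $x$. Other choices of densities would need a richer model, and unequal class sizes would add a log-prior correction to the score difference.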
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Added URL to the code repo
- Updated the manuscript to incorporate the changes made during the review process. Details are listed below:
In response to the follow-up from reviewer 58JZ:
- Updated the discussion section to further clarify the impact of the auxiliary distribution on the estimator.
In response to reviewer jg6m's requested changes, we have made the following changes:
- Updated the abstract and discussion section to clarify the finite-sample performance claim.
- Updated the introduction to highlight the practical relevance of addressing the density-chasm problem.
- Updated the key contributions to highlight that MDRE 'resolves' the distribution-shift issue by construction.
- Updated the first sentence of Section 3.3 to clarify that it is an example of Remark 3.5.
In response to reviewer 58JZ's requested changes, we have made the following changes:
- Updated Section 4.3 to clarify in which experiments parameter-sharing is allowed.
- Updated the sign in Eq. 6.
- Updated the labels in Figure 1.
In response to reviewer 775Q's requested changes, we have made the following changes:
- Updated Remark 3.3 on page 6.
- Added new Appendix A with the consistency proof.
Code: https://blackswhan.com/mdre/
Assigned Action Editor: ~Lijun_Zhang1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 585