Semi-supervised regression with skewed data via adversarially forcing the distribution of predicted values

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Semi-supervised learning, Adversarial, regression
Abstract: Advances in scientific fields such as drug discovery and materials design are accompanied by numerous trials and errors, yet generally only representative experimental results are reported. Because of this reporting bias, the distribution of labeled result data can deviate from the true distribution, and a regression model built on such skewed data can be erroneous. In this work, we propose a new approach to improve the accuracy of regression models trained on a skewed dataset. The method forces the regression outputs to follow the true distribution; this forcing regularizes the regression results while preserving the information in the training data. We assume that enough unlabeled data following the true distribution are available, and that the true distribution can be roughly estimated from domain knowledge or a few samples. While training a neural network as the regression model, an adversarial network is used to force the distribution of predicted values to follow the estimated ‘true’ distribution. We evaluated the proposed approach on four real-world datasets (pLogP, Diamond, House, Elevators). On all four datasets, the proposed approach reduced the root mean squared error of the regression by approximately 55 to 75 percent compared to regression models without adjustment of the distribution.
One-sentence Summary: We propose a new approach that improves regression models trained on a skewed dataset by using a semi-supervised learning framework with an adversarial network that forces the distribution of the predicted values to follow the true distribution.
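
Implementation sketch (not part of the submission): the abstract describes the training procedure only at a high level, so the following is a minimal PyTorch sketch of one way such adversarial distribution forcing could be realized, assuming the estimated ‘true’ label distribution is available as a samplable torch.distributions object. All names here (Regressor, Discriminator, train_step, true_dist, lambda_adv) are illustrative assumptions, not taken from the authors' code.

    # Hypothetical sketch of adversarial distribution matching for
    # semi-supervised regression with skewed labeled data.
    import torch
    import torch.nn as nn

    class Regressor(nn.Module):
        """Simple MLP regressor f(x) -> y_hat."""
        def __init__(self, in_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x):
            return self.net(x).squeeze(-1)

    class Discriminator(nn.Module):
        """Distinguishes predicted values from samples of the estimated true label distribution."""
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(1, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, y):
            return self.net(y.unsqueeze(-1)).squeeze(-1)  # logits

    def train_step(reg, disc, opt_reg, opt_disc,
                   x_lab, y_lab, x_unlab, true_dist, lambda_adv=1.0):
        bce = nn.BCEWithLogitsLoss()

        # Discriminator update: "real" = samples from the estimated true
        # distribution, "fake" = regressor outputs on unlabeled inputs.
        with torch.no_grad():
            y_fake = reg(x_unlab)
        y_real = true_dist.sample((x_unlab.size(0),))
        d_loss = bce(disc(y_real), torch.ones_like(y_real)) + \
                 bce(disc(y_fake), torch.zeros_like(y_fake))
        opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

        # Regressor update: supervised MSE on labeled data plus an adversarial
        # term pushing predictions on unlabeled data toward the true distribution.
        sup_loss = nn.functional.mse_loss(reg(x_lab), y_lab)
        y_pred_unlab = reg(x_unlab)
        adv_loss = bce(disc(y_pred_unlab), torch.ones_like(y_pred_unlab))
        r_loss = sup_loss + lambda_adv * adv_loss
        opt_reg.zero_grad(); r_loss.backward(); opt_reg.step()
        return sup_loss.item(), adv_loss.item(), d_loss.item()

In this sketch, true_dist could be, for example, torch.distributions.Normal with parameters roughly estimated from domain knowledge or a few labeled samples, and lambda_adv balances the supervised loss against the adversarial distribution-matching term.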
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=RiO5shw9eC