Contrastive estimation reveals topic posterior information to linear models

Christopher Tosh; Akshay Krishnamurthy; Daniel Hsu

28 Sept 2020 (modified: 05 May 2023). ICLR 2021 Conference Blind Submission. Readers: Everyone

Keywords: contrastive learning, self-supervised learning, representation learning, theory

Abstract: Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data. In the context of document classification under topic modeling assumptions, we prove that contrastive learning is capable of recovering a representation of documents that reveals their underlying topic posterior information to linear models. We apply this procedure in a semi-supervised setup and demonstrate empirically that linear classifiers with these representations perform well in document classification tasks with very few training examples.

One-sentence Summary: This paper demonstrates that contrastive learning on text data produces representations that are linearly related to underlying topic structure.

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=SuIbwkZkkE
