Clustering Entity Specific Embeddings Towards a Prescribed Distribution

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Representation learning, emotion recognition, long-tail partial-label learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A better way to extract entity-specific embeddings and cluster them towards a prescribed distribution
Abstract: The transformer architecture is now ubiquitous in deep learning, having advanced the state of the art (SOTA) across a variety of disciplines. When employed with a bidirectional attention mask, a special [CLS] token is often added to the sequence being processed, which, once the sequence is processed, serves as a summary of the sequence as a whole. While directly useful in many applications, the processed [CLS] embedding loses utility when asked to perform an entity-specific task on a multi-entity sequence: when processing a multi-speaker dialogue, for example, the [CLS] embedding describes the entire dialogue, not a particular utterance. Existing approaches to this problem often involve either redundant computation or non-trivial post-processing outside of the transformer. We propose a general, efficient method for deriving entity-specific embeddings \textit{completely within} the transformer architecture, and demonstrate how the approach yields SOTA results in the domains of natural language processing (NLP) and sports analytics (SA), an exciting and relatively unexplored problem space. Furthermore, we propose a novel approach for deep clustering towards a prescribed distribution in the absence of labels; previous approaches to distribution-aware clustering require ground-truth labels, which are not always available. In addition to uncovering interesting signal in the domain of sport, we show how our distribution-aware clustering method yields a new cluster-based SOTA on the task of long-tail partial-label learning (LT-PLL). Code will be made available upon publication.
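As an illustrative aside (the abstract does not detail the clustering mechanism): one plausible way to realize label-free clustering towards a prescribed distribution is Sinkhorn-Knopp normalization with non-uniform target marginals, a standard device in optimal-transport-based deep clustering. The sketch below is a hypothetical illustration under that assumption, not necessarily the authors' method; the function name sinkhorn_prescribed and all parameters are invented for this example.

    import numpy as np

    def sinkhorn_prescribed(logits, target_dist, n_iters=50, eps=0.05):
        """Soft-assign N points to K clusters so that total cluster mass
        matches a prescribed distribution, via Sinkhorn-Knopp iteration."""
        # Seed an N x K transport plan from point-to-centroid similarities.
        Q = np.exp((logits - logits.max()) / eps)  # subtract max for stability
        Q /= Q.sum()
        N, _ = Q.shape
        r = np.full(N, 1.0 / N)                   # each point carries equal mass
        c = np.asarray(target_dist, dtype=float)  # prescribed marginals, sums to 1
        for _ in range(n_iters):
            Q *= (r / Q.sum(axis=1))[:, None]     # match row marginals to r
            Q *= (c / Q.sum(axis=0))[None, :]     # match column marginals to c
        return Q * N  # each row now sums to ~1: a soft cluster assignment

    # Hypothetical usage: cosine similarities to centroids, long-tail prior.
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(1000, 64))  # stand-in for entity-specific embeddings
    C = rng.normal(size=(10, 64))    # stand-in for cluster centroids
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    prior = 2.0 ** -np.arange(10)
    prior /= prior.sum()             # prescribed long-tail cluster distribution
    A = sinkhorn_prescribed(Z @ C.T, prior)
    # Columns of A sum to ~1000 * prior: cluster sizes follow the prescription.

In an actual training loop, the resulting soft assignments would typically serve as pseudo-labels for the embedding network, with the long-tail prior standing in for the missing ground-truth label distribution.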
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6373