MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

24 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: decoder-only LLMs, representation learning, text infilling, generation, unified model
TL;DR: We propose MAGNET, a method that adapts a pretrained LLM for representation learning and infilling tasks while preserving its original text generation capabilities.
Abstract:

While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, these unidirectional and bidirectional models are typically trained independently with distinct objectives (generation or representation learning, respectively), thereby missing the opportunity for one objective to enhance the other. In this work, we introduce MAGNET, an adaptation of decoder-only LLMs that enhances their capabilities in generating robust representations and infilling missing text spans, while retaining their original text generation capabilities. MAGNET employs three self-supervised training objectives and introduces an attention mechanism that combines bidirectional and causal attention, enabling unified training across all objectives. We show that LLMs adapted using MAGNET can outperform state-of-the-art text encoders on token-level and sentence-level representation learning tasks. We also demonstrate that MAGNET enhances the base LLM's ability to generate contextually appropriate text infillings by enabling it to take future context into consideration. Lastly, we show that, unlike other bidirectional language models for representation learning, the LLMs adapted using MAGNET can still perform open-ended text generation.
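
To make the hybrid attention idea concrete, below is a minimal illustrative sketch (not the paper's actual implementation, whose exact scheme is not specified in this abstract) of one plausible way to combine causal and bidirectional attention in a single mask: tokens inside designated spans (e.g., spans to be encoded or infilled) attend to each other in both directions, while all other positions keep standard left-to-right causal attention. The function name hybrid_attention_mask and the span-based interface are assumptions made for illustration only.

import torch

def hybrid_attention_mask(seq_len: int, bidir_spans: list[tuple[int, int]]) -> torch.Tensor:
    """Return a boolean mask of shape (seq_len, seq_len); True means attention is allowed."""
    # Start from a standard causal mask: each token sees itself and earlier positions.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Open up full bidirectional attention within each designated span (end is exclusive).
    for start, end in bidir_spans:
        mask[start:end, start:end] = True
    return mask

# Example: an 8-token sequence where positions 2..4 form a bidirectional span.
mask = hybrid_attention_mask(8, bidir_spans=[(2, 5)])
print(mask.int())

Such a mask can be passed to a decoder-only transformer in place of the usual causal mask, which is one way a single model could be trained jointly on generation, representation-learning, and infilling objectives.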

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3348