While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, these unidirectional and bidirectional models are typically trained independently with distinct objectives (generation or representation learning), thereby missing the opportunity for one objective to enhance the other. In this work, we introduce MAGNET, an adaptation of decoder-only LLMs that enhances their capabilities in generating robust representations and infilling missing text spans, while retaining their original text generation capabilities. MAGNET employs three self-supervised training objectives and introduces an attention mechanism that combines bidirectional and causal attention, enabling unified training across all objectives. We show that LLMs adapted using MAGNET can outperform state-of-the-art text encoders on token-level and sentence-level representation learning tasks. We also demonstrate that MAGNET enhances the base LLM's ability to generate contextually appropriate text infillings by enabling it to take future context into consideration. Lastly, we show that, unlike other bidirectional language models for representation learning, the LLMs adapted using MAGNET can still perform open-ended text generation.
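To make the idea of combining bidirectional and causal attention concrete, the snippet below is a minimal, illustrative sketch of one way such a mixed attention mask could be constructed, assuming a prefix-style split in which an initial span attends bidirectionally while the remainder attends causally. The function name `mixed_attention_mask` and the parameter `bidir_len` are hypothetical and are not taken from the paper; MAGNET's actual mechanism is defined by the authors.

```python
import torch

def mixed_attention_mask(seq_len: int, bidir_len: int) -> torch.Tensor:
    """Hypothetical mask: the first `bidir_len` tokens attend bidirectionally
    to each other, while the remaining tokens attend causally (to themselves
    and all earlier positions).

    Returns a boolean mask of shape (seq_len, seq_len) where True marks the
    positions a query token is allowed to attend to.
    """
    # Causal base: each position attends to itself and everything before it.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Bidirectional block: prefix tokens may also attend to later prefix tokens.
    mask[:bidir_len, :bidir_len] = True
    return mask

# Example: 6 tokens, first 3 bidirectional, last 3 causal.
print(mixed_attention_mask(6, 3).int())
```

Such a mask can be passed to a standard attention implementation in place of the usual causal mask; the example is only meant to show how both attention patterns can coexist within a single forward pass.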