Abstract: Recent advances in language models (LMs) have demonstrated remarkable adaptability across diverse tasks, excelling in both discriminative and generative domains with impressive multitasking capabilities. Attention has recently shifted towards non-autoregressive diffusion-based LMs, which leverage denoising generation for sequence-to-sequence modeling. However, the extent to which current diffusion-based LMs can handle multitasking remains unclear. In this study, we introduce a novel framework for designing a diffusion model for multi-task language modeling. Inspired by latent image diffusion models, our approach uses a general transformer-based diffusion model with pretrained encoders, facilitating multi-task learning through adaptable input embedding encoders. We define a diffusion loss in the trainable decoder's latent space, which interacts with any encoder via a cross-attention mechanism. This framework yields a flexible non-autoregressive LM that can handle potentially noisy data by leveraging robust instruction embeddings from the encoders, thereby enabling instruction tuning. We demonstrate the efficacy of our model in both single-task and multi-task setups, showing that it produces high-quality outputs by effectively utilizing and merging task information in the continuous latent space.
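The abstract's core mechanism, a diffusion loss defined in the decoder's latent space and conditioned on encoder embeddings through cross-attention, can be sketched roughly as below. This is a toy NumPy illustration under stated assumptions: the single cross-attention layer stands in for the paper's transformer denoiser, and all function and variable names (e.g. `latent_diffusion_loss`, `alpha_bar`) are hypothetical, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(q, k, v):
    # Scaled dot-product attention: noisy decoder latents (queries)
    # attend to frozen encoder embeddings (keys/values).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def latent_diffusion_loss(z0, enc, alpha_bar, w_q, w_k, w_v, w_out):
    # Forward process: corrupt clean latents z0 at noise level alpha_bar.
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps
    # Toy "denoiser": one cross-attention layer conditioned on the encoder.
    h = cross_attention(z_t @ w_q, enc @ w_k, enc @ w_v)
    eps_hat = h @ w_out
    # Simplified diffusion objective: MSE between true and predicted noise.
    return np.mean((eps_hat - eps) ** 2)

d = 8
z0 = rng.standard_normal((4, d))   # clean decoder latents, 4 positions
enc = rng.standard_normal((6, d))  # pretrained-encoder embeddings, 6 tokens
weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
loss = latent_diffusion_loss(z0, enc, 0.5, *weights)
```

Because the loss lives in a continuous latent space rather than over discrete tokens, any encoder producing embeddings of the right width could in principle be swapped in as the conditioning source.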
Paper Type: short
Research Area: Machine Learning for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English