Mark My Words: Repurposing LLMs for Specialized Domains via Ability Tokens

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM Adaptation, Specialized Domains
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a markup-style language extension to adapt pretrained general LMs to specialized domains.
Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language understanding and generation. However, their capabilities wane in highly specialized domains, such as biomedical sciences, which are sparsely represented in the pretraining corpus. In this work, we explore how to repurpose general LMs as specialized task solvers. We introduce a novel and systematic framework for adding markup-style language extensions (which we term *"ability tokens"*) to pretrained LMs. These tokens are learned embeddings appended to the LM's embedding matrix, preserving the pretrained weights and the model's original capabilities. We introduce two types of ability tokens: *domain markers*, which delimit and aid in the processing of specialized inputs (e.g., molecular formulas), and *functional tokens*, which guide the model on how to leverage these inputs to solve specific tasks (e.g., predicting molecule properties). During inference, these tokens are inserted into the input text to wrap specialized information and provide problem context. Experimental results show that (i) our markup extensions significantly boost performance in various specialized domains, such as protein and molecular property prediction, matching or outperforming expert models specifically tailored to these tasks, and (ii) we can learn the ability tokens separately and combine them in a modular fashion, achieving zero-shot generalization to unseen tasks. Overall, our framework offers a promising method to enhance LMs with domain-specific knowledge while maintaining their general capabilities.
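The abstract describes ability tokens as new rows appended to a frozen LM's embedding matrix, with the special tokens inserted around specialized spans at inference time. The sketch below is not the authors' code; it is a minimal illustration of that general idea using Hugging Face Transformers, where the model choice ("gpt2"), the token names (`<mol>`, `</mol>`, `<predict_property>`), and the gradient-masking trick are all illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): trainable "ability token"
# embeddings appended to a frozen pretrained LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any pretrained general LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze all pretrained weights so the original capabilities are preserved.
for p in model.parameters():
    p.requires_grad_(False)

# Hypothetical ability tokens: a domain marker pair for molecules and a
# functional token for property prediction.
ability_tokens = ["<mol>", "</mol>", "<predict_property>"]
tokenizer.add_tokens(ability_tokens, special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

# Train only the newly appended embedding rows: re-enable gradients on the
# enlarged embedding matrix and zero out gradients for the original rows.
embeddings = model.get_input_embeddings()
embeddings.weight.requires_grad_(True)
num_new = len(ability_tokens)
orig_rows = embeddings.weight.shape[0] - num_new

def zero_old_rows_grad(grad):
    # Keep pretrained embedding rows fixed; update only ability-token rows.
    grad = grad.clone()
    grad[:orig_rows] = 0
    return grad

embeddings.weight.register_hook(zero_old_rows_grad)

# At inference, the tokens wrap specialized content and give task context.
prompt = "<predict_property> Is this molecule soluble? <mol> CCO </mol>"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```

Because the new rows are the only trainable parameters, tokens learned for different domains or tasks can in principle be trained separately and later combined in one prompt, which is the modular, zero-shot composition behavior the abstract claims.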
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5697