Adapting Chat Language Models Using Only Target Unlabeled Language Data

TMLR Paper4876 Authors

16 May 2025 (modified: 10 Aug 2025) | Under review for TMLR | CC BY 4.0
Abstract: Vocabulary expansion (VE) is the de facto approach to language adaptation of large language models (LLMs): it adds new tokens and continues pre-training on target data. While this is effective for base models trained on unlabeled data, it poses challenges for chat models trained to follow instructions through labeled conversation data. Directly adapting the latter with VE on target unlabeled data may result in forgetting chat abilities. Target chat data would be ideal, but it is often unavailable or costly to create for low-resource languages, and machine-translated alternatives are not always effective. To address this issue, previous work proposed using a base and chat model from the same family. This method first adapts the base LLM with VE on target unlabeled data and then converts it to a chat model by adding a chat vector (CV) derived from the weight difference between the source base and chat models. We propose ElChat, a new language adaptation method for chat LLMs that adapts a chat model directly on target unlabeled data, without a base model. It elicits chat abilities by injecting information from the source chat model. ElChat offers more robust and competitive target language and safety performance while achieving superior English, chat, and instruction-following abilities compared to CV.
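The chat vector baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `chat_vector` and `add_chat_vector`, the toy parameter names, and the representation of weights as plain lists of floats are all assumptions made for illustration. It also assumes that parameters whose size changed during vocabulary expansion (for example, embeddings) are simply left as adapted, which the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat weights.
Weights = Dict[str, List[float]]


def chat_vector(source_base: Weights, source_chat: Weights) -> Weights:
    """Per-parameter difference: source chat weights minus source base weights."""
    return {
        name: [c - b for c, b in zip(source_chat[name], source_base[name])]
        for name in source_base
        if name in source_chat and len(source_chat[name]) == len(source_base[name])
    }


def add_chat_vector(adapted_base: Weights, cv: Weights) -> Weights:
    """Add the chat vector to a base model adapted on target unlabeled data."""
    merged: Weights = {}
    for name, values in adapted_base.items():
        delta = cv.get(name)
        if delta is not None and len(delta) == len(values):
            merged[name] = [v + d for v, d in zip(values, delta)]
        else:
            # Assumption: size changed (e.g. expanded vocabulary), so keep
            # the adapted weights untouched.
            merged[name] = list(values)
    return merged


# Toy example with made-up numbers.
source_base = {"layer.w": [1.0, 2.0], "embed": [0.5, 0.5]}
source_chat = {"layer.w": [1.5, 1.0], "embed": [0.5, 1.0]}
adapted_base = {"layer.w": [2.0, 2.0], "embed": [0.25, 0.5, 0.75]}

cv = chat_vector(source_base, source_chat)
adapted_chat = add_chat_vector(adapted_base, cv)
```

In this toy case `layer.w` receives the chat offset, while `embed` is kept as adapted because its size no longer matches the source model.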
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have made the following key changes to address reviewer feedback. All revisions are highlighted in blue in the manuscript.
* **Clarified Chat vs. Base Model Distinction**: We have clarified the distinction between chat and base models in Figure 1's caption and the second paragraph on page 1. (Addressing Reviewer nw3R)
* **Clarified the Importance of Vocabulary Expansion for Recent LLMs**: In the first paragraph of Section 1, we now acknowledge that recent LLMs have large vocabularies but still suffer from overfragmentation in underrepresented languages, and we illustrate this with an example. (Addressing Reviewer nw3R)
* **Stated ElChat's Novelty**: We now explicitly state in the second paragraph of Section 3 that ElChat's novelty lies in its strategic combination of existing techniques rather than in its individual components. (Addressing Reviewers nw3R, f5vZ, and BFxi)
* **Expanded SLERP Rationale**: We have provided a more detailed explanation for the primary use of SLERP over linear merging in Footnote 5 on page 5. (Addressing Reviewer BFxi)
* **Enhanced Methodological Description**: The methodology section has been substantially expanded for improved clarity, including a new conceptual figure (Figure 2) illustrating ElChat's processes and a step-by-step description of each process. (Addressing Reviewers f5vZ and nw3R)
* **Clarified Evaluation Data Translation**: We have explicitly mentioned that the TruthfulQA, ToxiGen, and ImplicitHate evaluations use data translated into the target language (following Cahyawijaya et al., 2024) in the first paragraph on page 7. (Addressing Reviewer f5vZ)
* **Conducted Additional Experiments**: Per suggestions, we have incorporated the results from the following new experiments:
  * **Qwen3 14B**: We have included analysis for Amharic, Bengali, and Telugu using Qwen3 14B, with results in Appendix C.3. This generally confirms ElChat's efficacy, though target language improvements are less pronounced where Qwen3 officially supports the language (i.e., Bengali and Telugu). (Addressing Reviewer f5vZ)
  * **BBH**: Results for all models and languages on BBH are now included in Figure 4 and Tables 11, 12, and 13. The trends align with other English-centric benchmarks. (Addressing Reviewer BFxi)
  * **AlpacaEval v2.0**: Results for selected languages (Amharic, Bengali, and Telugu) are available in Table 10, with a brief mention in the third paragraph of "Chat and Instruction-following" on page 9. (Addressing Reviewer BFxi)
* **Refined ElChat vs. Chat Vector Discussion**: We have clarified when Chat Vector outperforms ElChat and vice versa for target language tasks in the third paragraph of Section 5.2. (Addressing Reviewer BFxi)
* **Expanded Ethical Considerations**: We have expanded the discussion of the risks associated with deploying adapted chat LMs in the Ethical Considerations section. (Addressing Reviewer BFxi)
Assigned Action Editor: ~Ruoyu_Sun1
Submission Number: 4876