Expanding the Web, Smaller Is Better: A Comprehensive Study in Post-training

27 Sept 2024 (modified: 03 Dec 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Post-training, Continual Learning, Large Language Models
Abstract: General-purpose large language models (GLLMs) like GPT-4 and LLaMA have demonstrated exceptional performance across a wide range of tasks. However, they often fall short in domain- or task-specific applications, where deeper, specialized knowledge is essential, while maintaining general knowledge remains crucial for handling broader, unseen tasks. Post-training is widely applied to specialize LLMs and typically consists of multiple stages, including Domain-Adaptive Pre-Training (DAPT) and Supervised Fine-Tuning (SFT). In this work, we conduct a comprehensive study of three key aspects of post-training, taking finance as the target domain: (1) the distinct roles of DAPT and SFT in post-training, (2) strategies to mitigate knowledge forgetting across stages, and (3) evaluation methods that capture both general and domain-specific capabilities. Our results show that DAPT and SFT require distinct training objectives, that joint training of DAPT and SFT is essential for retaining stage-specific knowledge and encouraging knowledge transfer across stages, and that replay mechanisms are critical for preventing forgetting. Evaluation should encompass general, seen, and unseen tasks for a complete assessment. Based on these insights, we developed a Joint-and-Replay post-training recipe and built LLaMA3-8B-Fin, a smaller yet more powerful state-of-the-art financial LLM trained through post-training. Despite its smaller size, LLaMA3-8B-Fin surpasses larger models such as GPT-4o and LLaMA3.1-70B on both seen and unseen financial tasks while retaining general knowledge, demonstrating that a well-structured post-training recipe can “expand the web” of capabilities in smaller LLMs, enabling them to outperform much larger models.
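Illustrative sketch (not the authors' released code): the joint-and-replay idea described in the abstract, i.e., interleaving DAPT, SFT, and general-data replay in a single training stream rather than running the stages sequentially, could look roughly like the following. The base model name, mixing weights, and dataset iterators (`dapt_batches`, `sft_batches`, `replay_batches`) are hypothetical placeholders, and SFT batches are assumed to mask prompt tokens with -100 labels.

```python
# Illustrative sketch only: joint DAPT + SFT training with general-data replay.
# Assumes a Hugging Face causal LM and three iterators of pre-tokenized batches
# (all names below are assumptions, not the paper's actual configuration).
import random
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder mixing weights for the joint stream: domain text (DAPT),
# instruction data (SFT), and replayed general-domain text.
MIX = {"dapt": 0.45, "sft": 0.45, "replay": 0.10}

def next_batch(dapt_batches, sft_batches, replay_batches):
    """Sample the next batch from one of the three sources according to MIX."""
    source = random.choices(list(MIX), weights=list(MIX.values()), k=1)[0]
    iterator = {"dapt": dapt_batches, "sft": sft_batches, "replay": replay_batches}[source]
    return next(iterator)

def train_step(batch):
    """One joint-training step: the usual next-token loss; SFT batches carry
    -100 labels on prompt tokens so only response tokens are supervised."""
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["labels"],
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

The sketch only conveys the structural idea of mixing the stages; the paper studies the actual objectives per stage, mixing proportions, and replay source empirically.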
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11414