LLM Merging Competition Technical Report for NeurIPS 2024: Efficiently Building Large Language Models through Merging

Published: 12 Dec 2024 · Last Modified: 12 Dec 2024 · LMC 2024 Oral · License: CC BY 4.0
Keywords: Model merging, Multi-task, Large language model
TL;DR: We experimented with a range of base models and merging strategies, ultimately choosing Llama3-8B-Instruct and its variants as our base models, merged using the DARE-TIES strategy.
Abstract: We present our solution for the LLM Merging Competition: Building LLMs Efficiently through Merging at NeurIPS 2024. We experimented with a range of base models and merging strategies, ultimately choosing Llama3-8B-Instruct and its variants as our base models, merged using the DARE-TIES strategy. To further improve inference-time performance, we incorporated few-shot enhancement and chain-of-thought prompting techniques. We secured 1st place on the released public dataset with a score of 0.83, and achieved a score of 0.41 in the Finals.
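The DARE-TIES strategy named in the abstract combines DARE's random dropping and rescaling of task-vector (delta) parameters with TIES-style sign election before merging. Below is a minimal, hedged sketch of that recipe for a single parameter tensor; the function name, drop rate, and per-model weights are illustrative assumptions, not the authors' actual configuration or pipeline.

```python
# Illustrative sketch of a DARE-TIES merge over one parameter tensor.
# Hyperparameters (drop_rate, weights) are placeholders, not the report's settings.
import torch

def dare_ties_merge(base, finetuned_list, drop_rate=0.5, weights=None):
    """Merge fine-tuned variants into the base via DARE + TIES sign election.

    base:           parameter tensor from the base model (e.g. Llama3-8B-Instruct)
    finetuned_list: the same parameter from each fine-tuned variant
    drop_rate:      DARE probability of dropping a delta element
    weights:        optional per-model merge weights
    """
    if weights is None:
        weights = [1.0] * len(finetuned_list)

    deltas = []
    for ft, w in zip(finetuned_list, weights):
        delta = ft - base                                   # task vector
        keep = (torch.rand_like(delta) >= drop_rate).float()
        delta = delta * keep / (1.0 - drop_rate)            # DARE: drop and rescale
        deltas.append(w * delta)

    stacked = torch.stack(deltas)                           # [num_models, ...]
    elected_sign = torch.sign(stacked.sum(dim=0))           # TIES: elect majority sign
    agrees = (torch.sign(stacked) == elected_sign).float()  # keep only agreeing deltas
    merged_delta = (stacked * agrees).sum(dim=0) / agrees.sum(dim=0).clamp(min=1.0)

    return base + merged_delta
```

In practice this merge would be applied parameter-by-parameter across the checkpoints; the sketch only conveys the drop-rescale and sign-consensus steps that define DARE-TIES.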
Submission Number: 2