Mixture-of-Experts in Prompt Optimization

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM, Prompt Engineering, Prompt Optimization, Mixture-of-Experts
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Large Language Models (LLMs) exhibit strong generalization power in adapting to novel tasks when prompted with language instructions and in-context demos. Since this ability depends sensitively on the quality of the prompt, various methods have been explored to automate the instruction design process. While these methods demonstrate promising results, they restrict the output space of the search problem to a single demo-free instruction. This simplification significantly limits their performance, as one demo-free instruction may not cover the entire problem space of a complex target task. To alleviate this issue, we adopt the Mixture-of-Experts paradigm and divide the problem space into homogeneous regions, each governed by a specialized expert. To further improve the coverage of each expert, we expand its prompt to contain both an instruction and several demos. A two-phase process constructs the specialized expert for each region: (1) demo assignment: inspired by the theoretical connection between in-context learning and kernel regression, we group demos into clusters based on their semantic similarity and assign a cluster to each expert; (2) instruction assignment: a region-based joint search optimizes an instruction complementary to the demo cluster of each expert, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), outperforms prior art by up to 43% on benchmark NLP tasks.
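
To make the two-phase construction concrete, below is a minimal sketch of how demo assignment, instruction assignment, and query routing could be wired together. It is not the paper's implementation: the embedding model ("all-MiniLM-L6-v2"), the use of k-means for semantic clustering, centroid-based routing, and the `score_prompt` evaluation callback (e.g., validation accuracy of an LLM given a candidate prompt) are all illustrative assumptions.

```python
# Hypothetical sketch of a Mixture-of-Prompts-style pipeline.
# Assumed components (not from the paper): sentence-transformers embeddings,
# k-means clustering, nearest-centroid routing, and a user-supplied
# score_prompt(instruction, demos) callback that evaluates a prompt.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def assign_demos(demos, num_experts):
    """Phase 1: group demos into semantically similar clusters, one per expert."""
    emb = embedder.encode(demos, normalize_embeddings=True)
    km = KMeans(n_clusters=num_experts, n_init=10, random_state=0).fit(emb)
    clusters = [[] for _ in range(num_experts)]
    for demo, label in zip(demos, km.labels_):
        clusters[label].append(demo)
    return clusters, km.cluster_centers_

def assign_instructions(clusters, candidate_instructions, score_prompt):
    """Phase 2: for each expert, pick the candidate instruction that scores best
    when paired with that expert's demo cluster (a region-based joint search)."""
    experts = []
    for demos in clusters:
        best = max(candidate_instructions,
                   key=lambda inst: score_prompt(inst, demos))
        experts.append({"instruction": best, "demos": demos})
    return experts

def route(query, centers):
    """At inference, send the query to the expert whose demo-cluster centroid
    is nearest in embedding space."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    return int(np.argmin(np.linalg.norm(centers - q, axis=1)))
```

In this sketch, the routed expert's instruction and demos would simply be concatenated into the final prompt for the query; how MoP evaluates candidate instructions and searches over them is described in the paper itself.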
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7119