Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
Venue: EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: Language Modeling and Analysis of Language Models
Submission Track 2: Theme Track: Large Language Models and the Future of NLP
Keywords: large-language-model, open-source, self-refinement, ranking-metric, cost-analysis
TL;DR: Evaluation of domain-agnostic, untargeted self-refinement on open-source LLMs, and a cost-sensitive ranking of refined SoTA open-source models.
Abstract: The dominance of proprietary LLMs has led to restricted access and raised information privacy concerns. SoTA open-source alternatives are crucial for information-sensitive and high-volume applications but often lag behind in performance. To address this gap, we propose (1) a generalized variant of iterative self-critique and self-refinement devoid of external influence, and (2) a novel ranking metric, the Performance, Refinement, and Inference Cost Score (PeRFICS), to find the optimal model for a given task considering refined performance and cost. Our experiments show that SoTA open-source models of varying sizes from 7B to 65B improve, on average, 8.2% over their baseline performance. Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks on the Vicuna benchmark. Vicuna-13B goes a step further and outperforms ChatGPT post-refinement. This work has profound implications for resource-constrained and information-sensitive environments seeking to leverage LLMs without incurring prohibitive costs or compromising on performance and privacy. The domain-agnostic self-refinement process, coupled with our novel ranking metric, facilitates informed decision-making in model selection, thereby reducing costs and democratizing access to high-performing language models, as evidenced by three case studies on personal computing, gaming, and enterprise solutions.
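The abstract describes two components: an externally-uninfluenced self-critique/self-refinement loop and the PeRFICS ranking metric. The sketch below illustrates both in Python under stated assumptions: the `generate` callable, the prompt wording, the `DONE` stopping rule, and the score's functional form and weights are all hypothetical illustrations, not the paper's exact formulation.

```python
# Minimal sketch of domain-agnostic iterative self-critique and self-refinement.
# `generate` is an assumed wrapper around any open-source LLM (e.g. a locally
# served Vicuna checkpoint); prompts and the stopping rule are illustrative.

def self_refine(task: str, generate, max_rounds: int = 3) -> str:
    """Iteratively critique and rewrite an answer using only the model itself."""
    answer = generate(f"Task: {task}\nAnswer:")
    for _ in range(max_rounds):
        # Untargeted self-critique: no external feedback or domain-specific rubric.
        critique = generate(
            f"Task: {task}\nAnswer: {answer}\n"
            "Critique this answer. Reply DONE if no improvement is needed."
        )
        if "DONE" in critique:
            break
        # Self-refinement: revise the answer conditioned only on the self-critique.
        answer = generate(
            f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer to address the critique:"
        )
    return answer


def perfics_score(refined_perf: float, refinement_gain: float,
                  inference_cost: float, alpha: float = 1.0) -> float:
    """Hypothetical cost-sensitive score in the spirit of PeRFICS: reward
    refined performance and the gain from refinement, penalize inference
    cost. The paper defines the actual functional form and weighting."""
    return (refined_perf + refinement_gain) / (alpha * inference_cost)
```

A score of this shape ranks a cheap model with strong post-refinement gains (e.g. a 7B model) above a costlier model with similar refined performance, which is the trade-off the metric is designed to surface.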
Submission Number: 4895