DMORL: Distributed Multi-Objective Reinforcement Learning Framework for Fine-Tuning Large Language Models in Counsellor Reflection Generation

ACL ARR 2025 February Submission 4499 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Recent advances in reinforcement learning (RL) fine-tuning methods for large language models (LLMs) show promise for multi-objective tasks but still face significant challenges, including complex objective balancing, low training efficiency, poor scalability, and limited explainability. Leveraging distributed learning principles, we introduce a distributed multi-objective RL fine-tuning (DMORL) framework that simultaneously trains multiple models, each with an individual objective, and then optimizes their aggregation. Our method aggregates the last hidden states of the local models to shape the final generation, supported by a hierarchical grid search algorithm that selects the optimal weight combination stepwise. This approach optimizes the aggregation weights while significantly reducing the complexity of the weight-selection process. We evaluate DMORL on a counsellor reflection generation task, using text-classification LLMs to score responses and provide reward signals for RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of DMORL over existing baselines: significantly lower and more stable training cost (17,529±1,650 data points and 6,573±147.43 seconds), improved scalability and explainability, and comparable performance across multiple objectives.
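The abstract describes two mechanisms: a weighted aggregation of the last hidden states of per-objective local models, and a coarse-to-fine (hierarchical) grid search over the aggregation weights. Below is a minimal Python sketch of how such components might look; the function names (aggregate_hidden_states, hierarchical_grid_search), the score_fn callback, and the step sizes in levels are illustrative assumptions, not details taken from the paper.

```python
import itertools
import torch

def aggregate_hidden_states(hidden_states, weights):
    """Weighted sum of the last hidden states from per-objective local models.

    hidden_states: list of tensors, each of shape (batch, seq_len, hidden_dim),
                   one per locally fine-tuned model (assumed shapes).
    weights:       list of floats, one per model, assumed to sum to 1.
    """
    w = torch.tensor(weights, dtype=hidden_states[0].dtype)
    stacked = torch.stack(hidden_states, dim=0)      # (num_models, B, T, H)
    return torch.einsum("m,mbth->bth", w, stacked)   # (B, T, H)

def hierarchical_grid_search(score_fn, num_models, levels=(0.25, 0.05)):
    """Stepwise (coarse-to-fine) search over weight combinations.

    score_fn: hypothetical callable taking a weight tuple and returning a
              scalar score (e.g., aggregate reward on a validation set).
    levels:   successively finer step sizes for the grid (assumed values).
    """
    best_w, best_s = None, float("-inf")
    center = tuple(1.0 / num_models for _ in range(num_models))
    for step in levels:
        offsets = (-step, 0.0, step)
        # enumerate candidate weights around the current best, renormalized to sum to 1
        for delta in itertools.product(offsets, repeat=num_models):
            w = [max(c + d, 0.0) for c, d in zip(center, delta)]
            total = sum(w)
            if total == 0:
                continue
            w = tuple(x / total for x in w)
            s = score_fn(w)
            if s > best_s:
                best_w, best_s = w, s
        center = best_w  # refine the grid around the best combination so far
    return best_w, best_s
```

Searching coarsely first and then refining around the best candidate is one way the stepwise selection could reduce the number of weight combinations evaluated compared with a single fine-grained grid; the actual algorithm and step schedule are specified in the paper itself.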
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: multi-task learning, generative models, reinforcement learning, optimization methods
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 4499