Scalable Evaluation of Online Facilitation Strategies via Synthetic Simulation of Discussions

ACL ARR 2025 May Submission 3971 Authors

19 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Large-scale evaluations of facilitation strategies for online discussions are scarce because of the significant costs of human involvement. An effective solution is to run initial pilot experiments as synthetic discussion simulations using LLMs. We propose a simple, generalizable, LLM-driven methodology to prototype the development of LLM facilitators and to produce high-quality synthetic data without human involvement. We use our methodology to test whether current facilitation strategies can improve the performance of LLM facilitators. We find that, while LLM facilitators significantly improve synthetic discussions, there is no evidence that applying the more elaborate facilitation strategies proposed in modern Social Science research leads to further improvements in discussion quality compared to more basic approaches. Additionally, we find that small LLMs (such as Mistral Nemo 12B) can perform comparably to larger models (such as LLaMa 70B), and that special instructions must be used for instruction-tuned models to induce toxicity in synthetic discussions. An ablation study confirms that each component of our methodology contributes substantially to high-quality data. We release an open-source framework XXX (pip install xxx), which implements our methodology. We also release a large, publicly available dataset of LLM-generated and LLM-annotated discussions produced with multiple open-source LLMs.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: corpus creation, language resources, automatic creation and evaluation of language resources, NLP datasets, automatic evaluation, metrics, reproducibility, statistical testing for evaluation

Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: English

Submission Number: 3971
