LLMs Cannot (Yet) Match the Specificity and Simplicity of Online Communities in Long Form Question Answering
Abstract: Retail investing is on the rise, and users depend more than ever on rapidly growing finance-related online communities to educate themselves. However, recent years have positioned Large Language Models (LLMs) as powerful question-answering (QA) tools, shifting users away from interacting in online communities and towards discourse with AI-driven chat interfaces.
These AI-driven tools are currently limited by the availability of labelled data and the challenge of capturing financial knowledge.
Therefore, in this work, we curate a QA preference dataset, SocialFinanceQA, for fine-tuning and aligning LLMs, extracted from more than 7.4 million submissions and 82 million comments posted between 2008 and 2022 in Reddit's 15 largest finance communities.
Additionally, we propose SocialQA-Eval, a novel and generally applicable framework for evaluating generated QA responses.
We evaluate various LLMs fine-tuned on this dataset using traditional metrics, LLM-based evaluation, and human annotation. Our results demonstrate the value of high-quality Reddit data, with even state-of-the-art LLMs improving by producing simpler and more specific responses.
We contribute to the field with our domain-specific dataset, a range of detailed experiments with a comprehensive and generalizable evaluation, and our SocialQA-Eval framework.
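For illustration, the following is a minimal sketch of how a QA preference dataset could be derived from community threads, assuming each pair contrasts the highest- and lowest-scored answers to a submission. The class names, fields, and the min_score_gap threshold are hypothetical and do not reflect the paper's actual curation pipeline:

```python
# Hypothetical sketch: turning Reddit-style threads into QA preference pairs.
# Field names (title, selftext, score) loosely follow Reddit dump conventions;
# the actual filtering and quality criteria used for SocialFinanceQA are not shown.
from dataclasses import dataclass


@dataclass
class Comment:
    body: str
    score: int


@dataclass
class Submission:
    title: str
    selftext: str
    comments: list


def to_preference_pair(post: Submission, min_score_gap: int = 10):
    """Pair the highest- and lowest-scored answers as chosen/rejected."""
    if len(post.comments) < 2:
        return None
    ranked = sorted(post.comments, key=lambda c: c.score, reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    if chosen.score - rejected.score < min_score_gap:
        return None  # skip threads without a clear community preference
    question = f"{post.title}\n\n{post.selftext}".strip()
    return {"prompt": question, "chosen": chosen.body, "rejected": rejected.body}


# Toy usage example:
post = Submission(
    title="Should I pay off my student loans or invest?",
    selftext="I have $10k saved and loans at 4% interest.",
    comments=[
        Comment("Compare the loan rate to your expected after-tax returns...", 250),
        Comment("Just put it all into options.", -40),
    ],
)
print(to_preference_pair(post))
```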
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: human evaluation; automatic evaluation; domain adaptation; sociolinguistics; NLP datasets; evaluation methodologies
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 221