Keywords: LLM Fairness, LLM Alignment, Finance, Uncertainty
Abstract: Large language models (LLMs) are increasingly deployed in financial contexts, raising critical concerns about reliability, alignment, and susceptibility to adversarial manipulation. While prior finance-related benchmarks assess LLMs' capabilities in sentiment analysis (SA), question answering (QA), and named entity recognition (NER), they are often restricted to short contexts and therefore fail to test whether LLMs can weigh positive against negative evidence and make decisions over long financial documents, which mirrors the actual investment decision setting. We introduce Fin-Herding (financial herding under long and uncertain financial context), a benchmark for evaluating LLM investment decision-making under uncertainty and potentially biased human opinions. Fin-Herding comprises 8,868 long firm-specific analyst reports spanning a wide range of industries; each report covers both the negative and positive aspects of a firm, as analyzed by sophisticated analysts, and carries an investment rating (Bullish/Neutral/Bearish). We prompt LLMs with each report under three conditions: without the analyst rating, with the true analyst rating, and with a fake rating, and collect the LLM-generated investment rating in each case. We compare the LLM ratings against both the analyst ratings and a quasi-true label derived from real-time stock returns. Our experimental results reveal a significant increase, ranging from 5\% to 10\% across models, in the herding score (which captures the extent to which LLMs follow analyst ratings) when analyst ratings are presented in context. Removing human opinions encourages LLMs to reason independently, whether their conclusions turn out right or wrong. We believe that Fin-Herding can advance future research on automated investment trading with LLM agents.
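To make the evaluation concrete, below is a minimal Python sketch of how a herding score along these lines could be computed. The abstract does not spell out the exact definition used in Fin-Herding; here we assume the score is the agreement rate between LLM and analyst ratings, and that the reported 5-10\% increase corresponds to the difference in that rate between the with-rating and without-rating conditions. All names (`agreement_rate`, `herding_shift`, the toy data) are illustrative, not the benchmark's API.

```python
from typing import Sequence

RATINGS = ("Bullish", "Neutral", "Bearish")

def agreement_rate(llm_ratings: Sequence[str],
                   analyst_ratings: Sequence[str]) -> float:
    """Fraction of reports where the LLM rating matches the analyst rating."""
    assert len(llm_ratings) == len(analyst_ratings) > 0
    matches = sum(l == a for l, a in zip(llm_ratings, analyst_ratings))
    return matches / len(llm_ratings)

def herding_shift(ratings_no_hint: Sequence[str],
                  ratings_with_hint: Sequence[str],
                  analyst_ratings: Sequence[str]) -> float:
    """Assumed herding measure: how much agreement with the analyst rises
    once the analyst rating is shown in context. The 5-10% range reported
    in the abstract would correspond to this difference."""
    return (agreement_rate(ratings_with_hint, analyst_ratings)
            - agreement_rate(ratings_no_hint, analyst_ratings))

# Toy usage with hypothetical model outputs for three reports:
analyst   = ["Bullish", "Bearish", "Neutral"]
no_hint   = ["Neutral", "Bearish", "Bullish"]   # LLM rates the report alone
with_hint = ["Bullish", "Bearish", "Bullish"]   # LLM also sees the analyst rating
print(f"herding shift: {herding_shift(no_hint, with_hint, analyst):+.2f}")
```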
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8252