Forecasting Credit Ratings: A Case Study where Traditional Methods Outperform Generative LLMs

Felix Drinkall, Janet B. Pierrehumbert, Stefan Zohren

Published: 2025, Last Modified: 26 Jan 2026, COLING Workshops 2025, CC BY-SA 4.0

Abstract: Large Language Models (LLMs) have been shown to perform well for many downstream tasks. Transfer learning can enable LLMs to acquire skills that were not targeted during pre-training. In financial contexts, LLMs can sometimes beat well-established benchmarks. This paper investigates how well LLMs perform at forecasting corporate credit ratings. We show that while LLMs are very good at encoding textual information, traditional methods are still very competitive when it comes to encoding numeric and multimodal data. For our task, current LLMs perform worse than a more traditional XGBoost architecture that combines fundamental and macroeconomic data with high-density text-based embedding features. We investigate the degree to which the text encoding methodology affects performance and interpretability.

External IDs: dblp:conf/coling/DrinkallPZ25