CCLUPE: Benchmark for Credit Context Log Understanding and Prediction Evaluation

ACL ARR 2026 January Submission8102 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: credit analysis, dataset, benchmark
Abstract: While Large Language Models (LLMs) have demonstrated transformative potential in credit risk assessment, existing evaluation frameworks primarily focus on general financial NLP tasks, failing to capture the specialized reasoning required by professionals. To bridge this gap, we introduce the Credit Context Log Understanding and Prediction Evaluation (CCLUPE) benchmark. CCLUPE addresses the unique challenges of the Chinese credit market, where assessment relies heavily on synthesizing nuanced transaction logs and inferring latent financial behaviors. Unlike previous benchmarks, CCLUPE specifically targets Expenditure and Spending Pattern Recognition, evaluating the ability of LLMs to integrate heterogeneous inputs combining textual descriptions with time-series transactional data to perform causal inference and multi-stage reasoning. The dataset encompasses over 4,000 high-quality samples across personal and micro-enterprise client profiles, featuring 7 major log types and 16 subtypes. We ensure data integrity through a rigorous validation mechanism involving over 20 professional annotators. Furthermore, we introduce Log-Score, a robust evaluation metric that incorporates log-misunderstanding penalties and multi-dimensional capability assessment. Extensive experiments demonstrate that even state-of-the-art (SOTA) models exhibit unsatisfactory performance on these high-stakes tasks. CCLUPE serves as a rigorous testbed for the next generation of financial LLMs, ensuring their robustness for deployment in complex real-world credit scenarios. Our dataset and evaluation protocol are available at \url{https://anonymous.4open.science/r/CCLUPE-6C34}
Paper Type: Long
Research Area: Financial Applications and Time Series
Research Area Keywords: Credit Analysis
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 8102