Reinforcement Learning for Debt Pricing: A Case Study in Financial Services

Bruno Brandão; Luana Guedes Barros Martins; Bryan Lincoln Marques de Oliveira; Luckeciano Carvalho Melo; Murilo Lopes da Luz; Eduardo Garcia; Marcos vinicius da Silva; Renato Gnecco Avelar; Arlindo Rodrigues Galvão Filho; Anderson Da Silva Soares; Telma Woerle de Lima Soares

Published: 13 Jun 2025, Last Modified: 28 Jun 2025

Venue: RL4RS 2025

License: CC BY 4.0

Keywords: reinforcement learning, debt recovery, offline RL, lifetime value modeling, multi-armed bandits, financial applications, discount policy optimization

TL;DR: Offline RL with LTV-based rewards and bandit orchestration at a large financial institution improved collection values.

Abstract: Traditional static discount policies in debt recovery often fail to adapt to diverse debtor behaviors and evolving market dynamics. This research developed, evaluated, and deployed a comprehensive reinforcement learning (RL) system for optimizing discount policies to maximize recovered debt and minimize negotiation costs within a large financial institution. Our methodology encompassed developing sophisticated lifetime value (LTV) models as dynamic reward functions, implementing multi-armed bandit (MAB) meta-policies for autonomous policy evaluation and selection, and exploring diverse RL approaches including Imitation Learning and Offline RL. Key findings demonstrate superior performance of RL-driven discount policies, achieving lower average discounts and higher collection values in production compared to established baselines. The LTV models proved crucial for handling delayed feedback, while MAB meta-policies effectively orchestrated policy deployment in live operational settings. This work demonstrates the practical viability of applying advanced RL techniques to real-world financial challenges.
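The abstract does not say which bandit algorithm drives the MAB meta-policy, so the following is only a minimal sketch of the general idea, assuming a UCB1 bandit whose arms are candidate discount policies. The policy names, the reward probabilities, and the use of a binary reward as a stand-in for a normalized LTV estimate are all hypothetical and are not taken from the paper.

```python
import math
import random


class UCB1MetaPolicy:
    """UCB1 bandit whose arms are candidate discount policies (illustrative only)."""

    def __init__(self, policy_names):
        self.names = list(policy_names)
        self.counts = {name: 0 for name in self.names}   # times each policy was chosen
        self.means = {name: 0.0 for name in self.names}  # running mean reward per policy

    def select(self):
        # Try every policy once before applying the UCB rule.
        for name in self.names:
            if self.counts[name] == 0:
                return name
        total = sum(self.counts.values())

        def ucb(name):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[name])
            return self.means[name] + bonus

        return max(self.names, key=ucb)

    def update(self, name, reward):
        # Incremental mean update; reward is assumed to lie in [0, 1].
        self.counts[name] += 1
        self.means[name] += (reward - self.means[name]) / self.counts[name]


def simulate(true_means, rounds=5000, seed=0):
    """Run the meta-policy against synthetic Bernoulli rewards.

    true_means maps a hypothetical policy name to a made-up success
    probability; in a real system the reward would come from an LTV model.
    """
    rng = random.Random(seed)
    bandit = UCB1MetaPolicy(true_means)
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    # Synthetic values chosen purely for illustration.
    result = simulate({"policy_a": 0.2, "policy_b": 0.5, "policy_c": 0.8})
    for name in result.names:
        print(name, result.counts[name], round(result.means[name], 3))
```

In this sketch the meta-policy shifts traffic toward whichever candidate has the highest observed reward while still sampling the others occasionally, which is one common way to evaluate and select policies online.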

Submission Number: 18
