Multimodal Banking Dataset: Understanding Client Needs through Event Sequences

Dzhambulat Mollaev; Alexander Kostin; Postnova Maria; Ivan Karpukhin; Ivan A Kireev; Gleb Gennadjevich Gusev; Andrey Savchenko

26 Sept 2024 (modified: 06 Jul 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: multimodal, multi-temporal, event sequence

Abstract: Financial organizations collect a huge amount of data about clients that typically has a temporal (sequential) structure and comes from multiple sources (modalities). However, despite the urgent practical need, the development of deep learning techniques suitable for such data is limited by the absence of large open-source multi-source real-world datasets of event sequences. To fill this gap, which is mainly caused by security concerns, we present the industrial-scale publicly available multimodal banking dataset, MBD, that contains more than 2M corporate clients with several data sources: 950M bank transactions, 1B geo position events, 5M embeddings of dialogues with technical support, and monthly aggregated purchases of four of the bank's products. All entries are properly anonymized from real proprietary bank data. Moreover, we introduce a novel multimodal benchmark incorporating our MBD and two open-source financial datasets. We provide numerical results demonstrating the superiority of fusion baselines over single-modal techniques for each task. Finally, we show that our anonymization techniques preserve all significant information for the introduced downstream tasks.

Primary Area: datasets and benchmarks

Submission Number: 6922
