FinDA: A New Dataset for Query-focused and Trustworthy Document Analysis Generation

18 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Natural Language Processing; Text Generation; Document Processing
Submission Guidelines: I certify that this submission complies with the submission instructions as described on
Abstract: Financial documents such as company earnings reports are crucial for informed decision-making or targeted information-seeking. Generating tailored and trustworthy analyses from these reports could provide immense value to individuals and financial professionals. Such tailored analyses often delve into the intricate details and narratives of financial data and draw on information from multiple modalities, including tables and text, offering contextual insights rather than merely extracting surface-level facts. However, existing document question answering and summarization datasets and methods typically focus on generic factoid-type information from text-only documents. In contrast, generating query-tailored analyses over financial documents is more challenging, as it requires models to perform expert-like reasoning over long documents that contain both tables and text. We therefore present FinDA, an expert-curated dataset of xxx query-analysis pairs over xxx company earnings reports across various industries. We investigate a set of popular large language models, along with various prompting techniques for long-form text generation, on FinDA. We also develop a multi-agent collaboration pipeline for this new task and thoroughly investigate its potential. Experimental results demonstrate that ...
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1314