Abstract: This paper introduces FIRE (\textbf{FI}nancial \textbf{R}elation \textbf{E}xtraction), a sentence-level dataset of named entities and relations within the financial sector. Comprising \nInstance instances, the dataset covers \nEntType named entity types and \nRelType relation types. The text was collected from public financial reports and financial news articles, and captures a wide range of financial information about a business, including, but not limited to, corporate structure, business model, revenue streams, and market activities such as acquisitions. The full dataset was labeled by a single annotator to minimize labeling noise. Detailed annotation guidelines are provided, as well as an open-source, web-based text labeling tool aimed at streamlining annotation. The labeling time for each sentence was recorded during the labeling process. We show how this feature, along with curriculum learning techniques, can be used to improve a model's performance. The FIRE dataset is designed to serve as a resource for training and evaluating machine learning algorithms for financial information extraction, and as a resource for financial analysts who need to extract critical information from financial documents automatically and efficiently. The dataset and the code to reproduce our experimental results are available at \url{https://github.com/blinded_for_review}. The repository for the labeling tool can be found at \url{https://github.com/blinded_for_review}.
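The abstract does not say how labeling time enters the curriculum, so the following is only a minimal sketch of one common option: treat labeling time as a proxy for difficulty, sort sentences from fastest to slowest to label, and train on progressively larger easy-to-hard subsets. The `Example` class, its `labeling_seconds` field, and the `curriculum_stages` function are hypothetical names introduced for illustration and are not taken from the FIRE code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One labeled sentence (hypothetical structure, not the FIRE schema)."""
    text: str
    labeling_seconds: float  # assumed difficulty proxy: time spent annotating


def curriculum_stages(examples: List[Example], n_stages: int) -> List[List[Example]]:
    """Return cumulative training subsets ordered from easy to hard.

    Stage k holds the easiest k/n_stages share of the data (rounded up),
    so the last stage always contains every example.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    # Assumption: sentences that took less time to label are easier.
    ordered = sorted(examples, key=lambda ex: ex.labeling_seconds)
    stages = []
    for k in range(1, n_stages + 1):
        # Ceiling division keeps early stages non-empty for small datasets.
        cutoff = (len(ordered) * k + n_stages - 1) // n_stages
        stages.append(ordered[:cutoff])
    return stages


if __name__ == "__main__":
    data = [
        Example("Sentence A", 30.0),
        Example("Sentence B", 5.0),
        Example("Sentence C", 12.0),
        Example("Sentence D", 60.0),
    ]
    for i, stage in enumerate(curriculum_stages(data, n_stages=2), start=1):
        print(i, [ex.text for ex in stage])
```

A training loop would then run one or more epochs on each stage in turn. Other designs, such as weighting the loss by labeling time instead of staging the data, fit the same description equally well.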
Paper Type: long
Research Area: Information Extraction
Contribution Types: Reproduction study, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.