MTNER: Multiple Tender Named Entities Recognition and Classification from Unstructured Tender Documents
Abstract: The business sector generates vast amounts of data every day in the form of contracts, reports, and tenders, which contain valuable information that has to be extracted. Despite advances in AI for information extraction, extraction from tenders remains underexplored. This study focuses on extracting and classifying named entities from unstructured tender documents to improve tender management. We developed a document segmentation approach that applies several text extraction tools to the tables in tender PDFs. A custom text analyzer normalizes the text, identifies keywords, and segments each document into header, body, and footer. Discarding the body text and combining the header and footer reduces text complexity. The challenges posed by unstructured PDF tables are addressed with rules and regular expressions that extract and classify tender named entities. The method improves the usability and analysis of tender documents by accurately identifying and categorizing tender entities.
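The segmentation and rule-based extraction steps described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been extracted from the PDF, and it is not the authors' implementation: the section keywords, the entity labels (`TENDER_NO`, `CLOSING_DATE`, `EMAIL`), the regular expressions, and the sample tender text are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical section markers; the paper does not list its keywords.
BODY_MARKER = re.compile(r"^(scope of work|description)\b", re.IGNORECASE)
FOOTER_MARKER = re.compile(r"^(closing date|bid security|contact)\b", re.IGNORECASE)

# Hypothetical entity rules, one regular expression per entity class.
ENTITY_PATTERNS = {
    "TENDER_NO": re.compile(
        r"tender\s*no\.?\s*[:\-]?\s*([A-Z0-9/\-]+)", re.IGNORECASE),
    "CLOSING_DATE": re.compile(
        r"closing\s*date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{4})", re.IGNORECASE),
    "EMAIL": re.compile(r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
}


def normalize(line):
    """Collapse runs of whitespace and trim the line."""
    return re.sub(r"\s+", " ", line).strip()


def segment(text):
    """Split normalized lines into header, body, and footer by keyword."""
    lines = [normalize(raw) for raw in text.splitlines()]
    lines = [line for line in lines if line]
    parts = {"header": [], "body": [], "footer": []}
    section = "header"
    for line in lines:
        # Sections only advance forward: header -> body -> footer.
        if section == "header" and BODY_MARKER.match(line):
            section = "body"
        elif section == "body" and FOOTER_MARKER.match(line):
            section = "footer"
        parts[section].append(line)
    return parts


def extract_entities(text):
    """Drop the body, join header and footer, then apply the entity rules."""
    parts = segment(text)
    reduced = "\n".join(parts["header"] + parts["footer"])
    entities = {}
    for label, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(reduced)
        if match:
            entities[label] = match.group(1)
    return entities


# Invented sample text standing in for text extracted from a tender PDF.
SAMPLE = """
Tender No:   TN-2024/017
Procuring Agency:  Example Works Department
Scope of Work
Supply and installation of   office furniture.
Earlier reference tender no. XX-0000 is superseded.
Closing Date: 15/03/2024
Contact: bids@example.org
"""

if __name__ == "__main__":
    print(extract_entities(SAMPLE))
```

In this sketch the body contains a second, superseded tender number. Because the body is discarded before the rules run, that number is never a candidate match, which is one way that dropping the body can reduce the text the rules have to handle.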
External IDs: dblp:conf/icuimc/Abbas0KS25