Abstract: Complex knowledge extraction is a significant research topic. Previous methods could extract only the overall relations among multiple entities and required predefined entity types, which makes them inapplicable in industrial domains such as Information and Communication Technology (ICT), where many entities are undefined. To address these challenges, this paper introduces a novel representation for complex knowledge, the attributed triple, which consists of a primary triple coupled with auxiliary attribute-value pairs. This representation enables a fine-grained depiction of both the overall inter-entity relations and the pairwise relations between entities. Based on this representation, we propose an attributed triple extraction model. The model employs multi-label classifiers to identify all potential relations and then extracts the relevant entities with sequence taggers based on the BIO scheme, where B stands for Begin, I for Inside, and O for Outside. This process enables the extraction of primary triples and auxiliary pairs without predefined entity types. Subsequently, a linear combination discriminator assesses the semantic feasibility of candidate combinations formed by concatenating primary triples and auxiliary pairs. Furthermore, contrastive learning is adopted to strengthen the model's representation ability when training data are insufficient. To better evaluate the model's performance in extracting attributed triples, we construct a high-quality dataset from an ICT corpus. Our model demonstrates substantial and consistent superiority over baselines across various metrics. Our code is publicly available at https://github.com/sid0527/Attributed-Triple-Extraction
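The abstract does not describe the model's internals, so the following is only a rough sketch, under stated assumptions, of the data flow it outlines; it is not the authors' implementation (see the repository above for that). It assumes an attributed triple can be held as a primary (head, relation, tail) triple plus a dictionary of attribute-value pairs, and that the taggers emit untyped B/I/O tags, which is what allows spans to be decoded without predefined entity types. All names here (`AttributedTriple`, `decode_bio`, `span_text`, `combine`, `is_feasible`) and the example ICT sentence are hypothetical, and `is_feasible` is a placeholder for the trained linear combination discriminator, not a model of it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
Pair = Tuple[str, str]         # (attribute, value)


@dataclass
class AttributedTriple:
    """Hypothetical container: a primary triple plus auxiliary attribute-value pairs."""
    head: str
    relation: str
    tail: str
    attributes: Dict[str, str] = field(default_factory=dict)


def decode_bio(tags: List[str]) -> List[Tuple[int, int]]:
    """Turn untyped B/I/O tags into (start, end) token spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # B always opens a new span; a stray I with no open span is treated like B
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            # O closes the currently open span
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def span_text(tokens: List[str], span: Tuple[int, int]) -> str:
    """Join the tokens covered by a decoded span."""
    return " ".join(tokens[span[0]:span[1]])


def combine(primaries: List[Triple], pairs: List[Pair],
            is_feasible: Callable[[Triple, Pair], bool]) -> List[AttributedTriple]:
    """Form every (primary triple, auxiliary pair) candidate; keep pairs the filter accepts.

    `is_feasible` is a hypothetical stand-in for the learned discriminator.
    """
    results = []
    for triple in primaries:
        attrs = {attr: value for attr, value in pairs if is_feasible(triple, (attr, value))}
        results.append(AttributedTriple(*triple, attributes=attrs))
    return results


# Hypothetical ICT sentence with hand-written tags standing in for tagger output
tokens = ["R1", "connects", "to", "S2", "via", "port", "GE0/1"]
head = span_text(tokens, decode_bio(["B", "O", "O", "O", "O", "O", "O"])[0])
tail = span_text(tokens, decode_bio(["O", "O", "O", "B", "O", "O", "O"])[0])
value = span_text(tokens, decode_bio(["O", "O", "O", "O", "O", "O", "B"])[0])
result = combine([(head, "connects_to", tail)], [("port", value)], lambda t, p: True)
print(result[0])
# AttributedTriple(head='R1', relation='connects_to', tail='S2', attributes={'port': 'GE0/1'})
```

Keeping the tags untyped (plain B/I/O rather than B-PER, I-ORG, and so on) mirrors the abstract's point that entities are extracted without a predefined type inventory; in the described model, the relation classifiers, not entity types, determine what each tagger looks for.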