Abstract: Relation triplet extraction (RTE) aims to extract all potential triplets of subject and object entities together with their corresponding relations from a sentence, and is an essential step in information extraction and knowledge graph construction. Most studies in the literature focus on text only, neglecting the fact that other modalities may provide useful information about entities and relations (e.g., the image accompanying a post on Twitter). In this paper, we therefore introduce a multi-modal relation triplet extraction (MRTE) task that considers both the text and image modalities. To handle multi-modality and extract all potential triplets, we propose a novel multi-modal approach based on a unified machine reading comprehension (MRC) framework with multiple queries, named MUMRC. Specifically, we design a unified framework that performs multi-modal entity extraction and multi-modal relation classification over all potential subject-object entity pairs, recasting both sub-tasks as MRC problems. In this unified MRC model, we incorporate multiple questions into the original text and leverage visual information to enhance the textual semantics. Extensive experiments on the MNRE dataset demonstrate that the proposed approach significantly outperforms state-of-the-art baselines for textual RTE and multi-modal information extraction.
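To make the MRC reformulation concrete, the following is a minimal sketch (not from the paper; all names, entity types, and question templates are hypothetical) of how the two sub-tasks could be turned into (question, context) pairs: one set of questions extracts entity spans, and a second set classifies the relation for each candidate subject-object pair. In the actual MUMRC model, visual features from the accompanying image would additionally be fused with the textual encoding, which this text-only sketch omits.

```python
# Illustrative sketch only: recasting entity extraction and relation
# classification as MRC-style (question, context) pairs.
from typing import List, Tuple

# Hypothetical entity-type inventory; the real task defines its own types.
ENTITY_TYPES = ["person", "organization", "location", "misc"]


def entity_queries(sentence: str) -> List[Tuple[str, str]]:
    """Stage 1: one question per entity type; an MRC model would extract
    answer spans from the sentence for each question."""
    return [(f"Which spans in the text are {t} entities?", sentence)
            for t in ENTITY_TYPES]


def relation_queries(sentence: str,
                     entities: List[str]) -> List[Tuple[str, str]]:
    """Stage 2: one question per candidate subject-object pair; an MRC model
    would answer with a relation label (or 'no relation')."""
    queries = []
    for subj in entities:
        for obj in entities:
            if subj != obj:
                q = f"What is the relation between {subj} and {obj}?"
                queries.append((q, sentence))
    return queries


if __name__ == "__main__":
    text = "Elon Musk founded SpaceX in California."
    ents = ["Elon Musk", "SpaceX", "California"]  # assumed stage-1 output
    for q, _ in entity_queries(text):
        print(q)
    for q, _ in relation_queries(text, ents)[:3]:
        print(q)
```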