Content- and Topology-Aware Representation Learning for Scientific Multi-Literature

Published: 07 Oct 2023, Last Modified: 01 Dec 2023EMNLP 2023 MainEveryoneRevisionsBibTeX
Submission Type: Regular Long Paper
Submission Track: NLP Applications
Submission Track 2: Machine Learning for NLP
Keywords: Representation Learning, Graph-Text Joint Learning, Wasserstein Distance
TL;DR: This work focuses on learning both semantic and relatedness representation for scientific multi-literature and proposes a novel learning frame which learns content and citation information jointly.
Abstract: Representation learning forms an essential building block in the development of natural language processing architectures. To date, mainstream approaches focus on learning textual information at the sentence- or document-level, unfortunately, overlooking the inter-document connections. This omission decreases the potency of downstream applications, particularly in multi-document settings. To address this issue, embeddings equipped with latent semantic and rich relatedness information are needed. In this paper, we propose SMRC$^{2}$, which extends representation learning to the multi-document level. Our model jointly learns latent semantic information from content and rich relatedness information from topological networks. Unlike previous studies, our work takes multi-document as input and integrates both semantic and relatedness information using a shared space via language model and graph structure. Our extensive experiments confirm the superiority and effectiveness of our approach. To encourage further research in scientific multi-literature representation learning, we will release our code and a new dataset from the biomedical domain.
Submission Number: 669
Loading