Unity in Diversity: Learning Distributed Heterogeneous Sentence Representation for Extractive Summarization
Abstract: Automated multi-document extractive text summarization is
a widely studied research problem in the field of natural language understanding. Such extractive mechanisms compute
in some form the worthiness of a sentence to be included
in the summary. While conventional approaches rely on
human-crafted document-independent features to generate a
summary, we develop a novel data-driven summarization system
called HNet, which exploits the various semantic and compositional aspects latent in a sentence to capture document-independent
features. The network learns sentence representations such that salient sentences lie closer together in the vector
space than non-salient sentences. This semantic and compositional feature vector is then concatenated with the document-dependent features for sentence ranking. Experiments on the
DUC benchmark datasets (DUC-2001, DUC-2002 and DUC-2004) indicate that our model achieves a significant performance
gain of around 1.5-2 points in ROUGE score over the state-of-the-art baselines.
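
The ranking step described above (concatenating a learned sentence vector with document-dependent features, then scoring) can be illustrated with a minimal sketch. This is not the HNet implementation: the linear scorer, the function name `rank_sentences`, the weights, and the toy feature values are all assumptions made for illustration, and the abstract does not specify how the concatenated vector is scored.

```python
from typing import List, Sequence


def rank_sentences(
    learned: Sequence[Sequence[float]],
    doc_dependent: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> List[int]:
    """Return sentence indices ordered from highest to lowest score.

    learned:        one document-independent vector per sentence
                    (stand-in for the learned representation).
    doc_dependent:  one document-dependent feature vector per sentence
                    (hypothetical, e.g. relative position in the document).
    weights, bias:  parameters of an assumed linear scorer.
    """
    if len(learned) != len(doc_dependent):
        raise ValueError("need one feature vector of each kind per sentence")

    scores = []
    for emb, dep in zip(learned, doc_dependent):
        # Concatenate the two feature groups into a single vector.
        features = list(emb) + list(dep)
        if len(features) != len(weights):
            raise ValueError("weights must match the concatenated length")
        # Assumed linear scoring of the concatenated vector.
        scores.append(sum(w * x for w, x in zip(weights, features)) + bias)

    # Highest-scoring sentences come first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: three sentences, 2-d learned vectors, 1-d document feature.
learned_vectors = [[0.9, 0.1], [0.2, 0.3], [0.5, 0.5]]
document_features = [[0.0], [1.0], [0.5]]
toy_weights = [1.0, 0.5, -0.2]

ranking = rank_sentences(learned_vectors, document_features, toy_weights)
```

An extractive summary would then take sentences in the order given by `ranking` until a length budget is reached; the budget and any redundancy filtering are outside what the abstract describes.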