Taking Advantage of Out-of-Corpus Information for Citation Network Clustering

Steven H Lee, Taesun Moon, Hal Daume III

May 08, 2013 (modified: May 08, 2013) · ICML 2013 PeerReview submission · Readers: everyone
  • Decision: oral
  • Abstract: We explore several popular clustering and graph partitioning algorithms as methods for generating clusters of related scientific documents, and we suggest a simple graph augmentation technique for taking advantage of external information. We show that hallucinating nodes for scientific documents that are cited but not present in the original data set improves the performance of clustering algorithms.
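
The augmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's data format or which algorithms it uses, so everything here is an assumption made for illustration: the names `build_graph` and `components`, the toy citation data, the treatment of citations as undirected edges, and the use of connected components as a stand-in for a real clustering or graph partitioning algorithm.

```python
def build_graph(citations, corpus, hallucinate=False):
    """Build an undirected citation graph as an adjacency map.

    citations: dict mapping an in-corpus document ID to the IDs it cites.
    corpus: set of document IDs present in the data set.
    hallucinate: if True, cited documents outside the corpus are kept as
        extra nodes; if False, edges to them are dropped (the baseline).
    """
    graph = {doc: set() for doc in corpus}
    for citing, cited_docs in citations.items():
        for cited in cited_docs:
            if cited not in corpus and not hallucinate:
                continue  # baseline: ignore out-of-corpus citations
            graph.setdefault(citing, set()).add(cited)
            graph.setdefault(cited, set()).add(citing)
    return graph


def components(graph):
    """Connected components, used here only as a stand-in for clustering."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(graph[node] - seen)
        result.append(comp)
    return result


# Hypothetical toy data: A and B both cite X, which is not in the corpus.
corpus = {"A", "B", "C"}
citations = {"A": ["X"], "B": ["X"], "C": ["A"]}

baseline = build_graph(citations, corpus)
augmented = build_graph(citations, corpus, hallucinate=True)

print(len(components(baseline)))   # B is isolated without the node for X
print(len(components(augmented)))  # X links A and B into one group
```

In this toy case, documents A and B never cite each other, so the baseline graph carries no evidence that they are related. Keeping the out-of-corpus document X as a node gives them a shared neighbor, which is the kind of extra structure a clustering algorithm could exploit.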