Hybrid Document Indexing with Spectral Embedding

2007 (modified: 12 Nov 2022), HLT-NAACL (Short Papers) 2007
Abstract: Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity information based on a large-scale statistical analysis. Clustering experiments showed improvements over the traditional tf-idf representation and over spectral methods based solely on the document collection.