- Keywords: text mining, iterative algorithm, semantic annotation
- TL;DR: Iterative enrichment of documents with annotations of related documents from the same composition of documents.
- Abstract: A reference library can be described as a corpus with an individual composition of documents, such as related research work, documents by favorite authors, or the proceedings of a conference. The documents in the corpus may change over time: new documents extend the corpus while others are removed. A subset of the documents may contain meaningful annotations describing their content, while other documents contain only a few, non-meaningful annotations. Enriching documents with meaningful annotations benefits the performance of applications such as semantic search, content aggregation, automated relationship discovery, query answering, and information retrieval. However, enriching a document with meaningful annotations is non-trivial. Available (semi-)automatic annotation tools ignore the individual composition of documents in a corpus and annotate documents with generic named-entity-related data. In this paper, we present an unsupervised, corpus-driven annotation enrichment approach that considers the composition of documents in a corpus and uses an EM-like algorithm to iteratively enrich weakly annotated documents with meaningful annotations of related documents from the same corpus.
- Archival status: Archival
- Subject areas: Machine Learning, Question Answering, Information Integration
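
The abstract gives no algorithmic detail, so the following is only a minimal sketch of what iterative, corpus-driven annotation enrichment could look like, not the authors' method. Everything in it is an assumption made for illustration: the function names `jaccard` and `enrich`, the representation of documents as word sets, Jaccard similarity as the relatedness measure, and the parameters `min_annotations`, `threshold`, and `max_iters`. The loop is only loosely EM-like: each pass estimates how related the documents are, then updates the annotations of weakly annotated documents, and repeats until nothing changes.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (assumed relatedness measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrich(texts, annotations, min_annotations=2, threshold=0.2, max_iters=10):
    """Iteratively copy annotations from related documents (illustrative only).

    texts:       dict mapping document id -> set of words
    annotations: dict mapping document id -> set of annotation labels
    A document counts as weakly annotated (assumption) while it has fewer
    than `min_annotations` labels.
    """
    # Work on a copy so the caller's annotations are left untouched.
    ann = {doc: set(labels) for doc, labels in annotations.items()}

    for _ in range(max_iters):
        changed = False
        for doc in sorted(texts):
            if len(ann[doc]) >= min_annotations:
                continue  # already sufficiently annotated

            # Score each candidate label by the summed similarity of the
            # other documents in the corpus that carry it.
            scores = {}
            for other in texts:
                if other == doc:
                    continue
                sim = jaccard(texts[doc], texts[other])
                for label in ann[other] - ann[doc]:
                    scores[label] = scores.get(label, 0.0) + sim

            if scores:
                # Sorting first makes tie-breaking deterministic.
                best = max(sorted(scores), key=scores.get)
                if scores[best] >= threshold:
                    ann[doc].add(best)
                    changed = True

        if not changed:
            break  # no document gained an annotation in this pass

    return ann
```

Because candidate labels come only from other documents in the same corpus, the result depends on the corpus composition rather than on a generic entity vocabulary, which is the property the abstract emphasizes. Labels added in one pass can be propagated further in later passes.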