Abstract: Automatically detecting controversy on the Web is a useful capability for a search engine to help users review web content with a more balanced and critical view. The current state-of-the art approach is to find K-Nearest-Neighbors in Wikipedia to the document query, and to aggregate their controversy scores that are automatically computed from the Wikipedia edit-history features. In this paper, we discover two major weakness in the prior work and propose modifications. First, the generated single query from document to find KNN Wikipages easily becomes ambiguous. Thus, we propose to generate multiple queries from smaller but more topically coherent paragraph of the document. Second, the automatically computed controversy scores of Wikipedia articles that depend on "edit war" features have a drawback that without an edit history, there can be no edit wars. To infer more reliable controversy scores for articles with little edit history, we smooth the original score from the scores of the neighbors with more established edit history. We show that the modified framework is improved by up to 5% for binary controversy classification in a publicly available dataset.
0 Replies
Loading