A High-Precision Health-relatedness Score for Phrases to Mine Cause-Effect Statements from the Web

Anonymous

17 Dec 2021 (modified: 05 May 2023), ACL ARR 2021 December Blind Submission, Readers: Everyone
Abstract: The measurement of the health-relatedness of a phrase is important when mining the web at scale for health information, e.g., when building a search engine or when carrying out health-sociological analyses. We propose a new termhood scoring scheme that allows for the prediction of the health-relatedness of phrases at high precision. An evaluation on several corpora of cause-effect statements (heuristically and professionally labeled) yields about 60% recall at over 90% precision, outperforming state-of-the-art vocabulary-based approaches and performing on par with BERT while being less resource-demanding. A new resource of over 4 million health-related cause-effect statements is compiled, such as "Studies show that stress induces insomnia.", which explicitly connect symptoms ('stress') as claimed causes for conditions ('insomnia'). It consists of over 4 million sentences from more than 2 million unique web pages and 234,000 unique websites.
Paper Type: long
Consent To Share Data: yes