Abstract: Many patients readily share experiences about their medical conditions and treatments on online social media, which makes these platforms a potentially valuable source of information on adverse drug reactions (ADRs). In this work, the detection of mentions of ADRs in Reddit posts is approached as a multi-label classification problem. A dataset of 537 annotated posts was created by supplementing a publicly available dataset with freshly collected and annotated posts. The labels were mapped to the Medical Dictionary for Regulatory Activities (MedDRA), and their distribution within each MedDRA level guided the creation of 12 data subsets. On each data subset, we applied 4 different multi-label learning methods: Binary Relevance (BR), Classifier Chains (CC), Label Powerset (LP) and random k-labelsets (RAkEL), each combined with 4 different base classifiers: Decision Trees (DT), Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM). The best F-scores were obtained with DT on the data subset based on the 20 most frequent labels at the MedDRA Preferred Term (PT) level. The best Hamming loss was obtained on the data subset based on all labels at the PT level. The type of multi-label learning method did not appear to influence performance significantly. Our results show a promising direction in the use of multi-label classification of ADRs from social media posts for pharmacovigilance purposes.
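To make the terminology above concrete, the following is a minimal sketch, not the authors' implementation, of how Binary Relevance and Label Powerset reshape a multi-label target, and how Hamming loss and a micro-averaged F-score are computed. The label names and the toy data are hypothetical and chosen only for illustration; the abstract does not state which F-score averaging was used, so micro-averaging here is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical ADR labels; rows below are posts, columns follow this order.
LABELS = ["nausea", "headache", "insomnia"]


def binary_relevance_targets(y: List[List[int]]) -> Dict[str, List[int]]:
    """Binary Relevance: one independent binary target column per label."""
    return {name: [row[j] for row in y] for j, name in enumerate(LABELS)}


def label_powerset_targets(y: List[List[int]]) -> List[int]:
    """Label Powerset: each distinct label combination becomes one class id."""
    class_ids: Dict[Tuple[int, ...], int] = {}
    targets = []
    for row in y:
        key = tuple(row)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (post, label) cells that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """F-score computed from counts pooled over all labels (micro-averaging)."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy annotations and predictions for five posts (illustrative values only).
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]

br = binary_relevance_targets(y_true)
lp = label_powerset_targets(y_true)
loss = hamming_loss(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)

if __name__ == "__main__":
    print("BR targets:", br)
    print("LP targets:", lp)
    print("Hamming loss:", loss)
    print("Micro F1:", round(f1, 3))
```

Lower Hamming loss is better, whereas higher F-score is better. Because most cells in a large, sparse label set are zero, the two metrics can favor different data subsets, which is consistent with the best F-scores and the best Hamming loss being reported on different subsets.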