Abstract: We present a machine learning model to predict pharmacogenetic (PGx) guidelines using a novel dataset assembled from the PharmGKB knowledge base, incorporating chemical structure, genetic information, and recommendation annotations. To facilitate modeling, free-text recommendations were processed with NLP guided by domain expertise and grouped into three categories: Standard Dose, Adjusted Dose, and Alternate Drug. We compared several models: Multi-Layer Perceptron, K Nearest Neighbors, Random Forest, Logistic Regression, Linear SVC, and XGBoost. XGBoost performed best, combining predictive power with explainability; it achieved an accuracy of 89.14% and F1 scores of 0.85-0.90, with precision of 0.83-0.97 and recall of 0.82-0.97. Such models can accelerate the development of PGx guidelines, enhancing personalized medicine in clinical settings.
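To make the reported metrics concrete, the following is a minimal sketch of how accuracy and per-category precision, recall, and F1 can be computed for a three-class problem of this kind. The function name `per_class_metrics` and the toy labels are hypothetical and purely illustrative; they are not the study's data, and this is not the authors' evaluation code.

```python
from collections import Counter

# The three recommendation categories named in the abstract.
LABELS = ["Standard Dose", "Adjusted Dose", "Alternate Drug"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return overall accuracy and per-class precision, recall, and F1.

    Hypothetical helper for illustration only.
    """
    # Count each (true label, predicted label) pair; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    report = {}
    for c in labels:
        tp = pairs[(c, c)]                              # correctly predicted as c
        predicted = sum(pairs[(t, c)] for t in labels)  # everything predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # everything truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Toy labels (made up, NOT results from the paper).
y_true = ["Standard Dose"] * 4 + ["Adjusted Dose"] * 3 + ["Alternate Drug"] * 3
y_pred = (
    ["Standard Dose"] * 3 + ["Adjusted Dose"]      # one Standard Dose missed
    + ["Adjusted Dose"] * 2 + ["Alternate Drug"]   # one Adjusted Dose missed
    + ["Alternate Drug"] * 3                       # all Alternate Drug correct
)

accuracy, report = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for label, scores in report.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

Reporting a range for each metric, as the abstract does, is consistent with computing the scores separately for each category rather than as a single averaged value.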