A Comparative Study of Tag SNP Selection Using Clustering

Sujay Saha, Riddhiman Dasgupta, Anirban Ghose, Koustav Mullick, Kashi Nath Dey

Published: 01 Jan 2014, Last Modified: 27 Jan 2025, Venue: ICAA 2014, License: CC BY-SA 4.0

Abstract: The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), pose particular challenges for both biomedical researchers and automated algorithms. SNPs are confirmed as a major factor in human genome polymorphism and are suitable genetic markers for disease characteristics, so they hold much promise as a basis for genome-wide disease-gene association. Determining the relationship between disease complexity and SNPs, however, requires genotyping large SNP data sets, which is very expensive and labor-intensive. In this paper, we present two novel approaches to the problem of tag SNP selection: one clusters the SNPs using self-organizing maps (SOM), and the other uses fuzzy c-means clustering. Both methods are shown to select a better set of tag SNPs than Haploview Tagger, one that captures the remaining SNPs more efficiently, and thus meet the goal of tag SNP selection more fully.
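To make the clustering idea concrete, the following is a minimal sketch of tag SNP selection with fuzzy c-means, written in plain Python. It is not the authors' implementation. The function names `fuzzy_c_means` and `select_tag_snps`, the toy haplotype matrix, the fuzzifier `m = 2`, and the rule of taking the SNP with the highest membership in each cluster as its tag are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-9, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, U), where U[i][k] is the membership of point i
    in cluster k and every row of U sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(n_iter):
        # Update centers as membership-weighted means of the points.
        centers = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][a] for i in range(n)) / w_sum
                 for a in range(dim)]
            )
        # Update memberships from inverse distances to the centers.
        new_U = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [dist ** -power for dist in dists]
            inv_sum = sum(inv)
            new_U.append([v / inv_sum for v in inv])
        shift = max(abs(new_U[i][k] - U[i][k])
                    for i in range(n) for k in range(n_clusters))
        U = new_U
        if shift < tol:
            break
    return centers, U


def select_tag_snps(haplotypes, n_clusters, seed=0):
    """Cluster SNP columns of a 0/1 haplotype matrix and pick one tag each.

    Assumption: the tag of a cluster is the member SNP with the highest
    membership in that cluster (ties go to the lowest SNP index).
    """
    n_snps = len(haplotypes[0])
    # Each SNP is represented by its column of alleles across haplotypes.
    snps = [[float(h[j]) for h in haplotypes] for j in range(n_snps)]
    _, U = fuzzy_c_means(snps, n_clusters, seed=seed)
    labels = [max(range(n_clusters), key=lambda k: row[k]) for row in U]
    tags = []
    for k in range(n_clusters):
        members = [j for j in range(n_snps) if labels[j] == k]
        if members:  # a cluster may end up empty after hard assignment
            tags.append(max(members, key=lambda j: U[j][k]))
    return sorted(tags), labels


# Made-up toy data: rows are haplotypes, columns are SNPs.
# SNPs 0-2 share one allele pattern and SNPs 3-4 share another.
haplotypes = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
tags, labels = select_tag_snps(haplotypes, n_clusters=2)
print("tag SNPs:", tags, "cluster labels:", labels)
```

In this toy example the perfectly correlated SNPs fall into the same cluster, so one tag per cluster is enough to recover the others. Real data would call for a linkage-disequilibrium-aware distance and a principled choice of the number of clusters, neither of which this sketch attempts.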