Abstract: In the information age, DNA sequence data and the genomic variations of individuals are prominent examples of big data. When thousands of individuals are analyzed, the data set grows large enough to require big data processing technologies. To support studies in bioinformatics, specifically on genomic variants and population genetics, we have implemented B3SafirBiyo, a framework built on recent big data technologies: web-based user interfaces, the Spark engine, and machine learning libraries. We demonstrate the efficiency of basic filtering and querying operations on large variant files. We also present the performance of population clustering on the 1000 Genomes dataset.
External IDs: dblp:conf/bigdataconf/DongelT17