The Poisson Margin Test for Normalization-Free Significance Analysis of NGS Data

Adam Kowalczyk, Justin Bedo, Thomas Conway, Bryan Beresford-Smith

17 Sept 2020, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: The current methods for the determination of the statistical significance of peaks and regions in next generation sequencing (NGS) data require an explicit normalization step to compensate for (global or local) imbalances in the sizes of sequenced and mapped libraries. There are no canonical methods for performing such compensations; hence, a number of different procedures serving this goal in different ways can be found in the literature. Unfortunately, the normalization has a significant impact on the final results. Different methods yield very different numbers of detected “significant peaks” even in the simplest scenario of ChIP-Seq experiments that compare the enrichment in a single sample relative to a matching control. This becomes an even more acute issue in the more general case of the comparison of multiple samples, where a number of arbitrary design choices will be required in the data analysis stage, each option resulting in possibly (significantly) different outcomes. In this article, we investigate a principled statistical procedure that eliminates the need for a normalization step. We outline its basic properties, in particular the scaling upon depth of sequencing. For the sake of illustration and comparison, we report the results of re-analyzing a ChIP-Seq experiment for transcription factor binding site detection. In order to quantify the differences between outcomes, we use a novel method based on the accuracy of in silico prediction by support vector machine (SVM) models trained on part of the genome and tested on the remainder. See Kowalczyk et al. (2009) for supplementary material.
