Abstract: We introduce a new private regression setting, \textit{Private Regression in Multiple Outcomes} (PRIMO), motivated by the common situation in which a data analyst wants to perform a set of $l$ regressions while preserving privacy, where the features $X$ are shared across all $l$ regressions and each regression $i \in [l]$ has a different vector of outcomes $y_i$. Naively applying existing private linear regression techniques $l$ times leads to a $\sqrt{l}$ multiplicative increase in error over the standard linear regression setting. We apply a variety of techniques, including sufficient statistics perturbation (SSP) and geometric projection-based methods, to develop scalable algorithms that outperform this baseline across a range of parameter regimes. In particular, we obtain \textit{no dependence on $l$} in the asymptotic error when $l$ is sufficiently large. We apply our algorithms to the task of private genomic risk prediction for multiple phenotypes using data from the 1000 Genomes project and the Database of Genotypes and Phenotypes (dbGaP). Empirically, we find that even for values of $l$ far smaller than the theory would predict, our projection-based method improves accuracy relative to the variant that does not use the projection.
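To make the shared-feature structure concrete, the following is a minimal sketch of the general idea behind sufficient statistics perturbation with $l$ outcomes. It is not the paper's algorithm. The function name `ssp_multi_outcome`, the noise parameters `noise_cov` and `noise_xy`, and the `ridge` term are all assumptions made for illustration, and the noise scales are not calibrated to any privacy guarantee. The sketch only shows that $X^\top X$ is perturbed once and reused for every outcome, while $X^\top Y$ is perturbed as a single $d \times l$ matrix.

```python
import numpy as np

def ssp_multi_outcome(X, Y, noise_cov, noise_xy, ridge=1e-3, seed=0):
    """Noisy least squares for all l outcomes from perturbed sufficient statistics.

    X: (n, d) features shared by every regression.
    Y: (n, l) outcomes, one column per regression.
    noise_cov, noise_xy: Gaussian noise standard deviations. These are
    hypothetical inputs and are NOT calibrated here to an (epsilon, delta) budget.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    l = Y.shape[1]

    # Symmetric Gaussian noise for the d x d second-moment matrix X^T X.
    # This statistic is shared, so it is perturbed only once.
    E = rng.normal(scale=noise_cov, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T
    noisy_cov = X.T @ X + E

    # Independent Gaussian noise for the d x l cross-moment matrix X^T Y.
    noisy_xy = X.T @ Y + rng.normal(scale=noise_xy, size=(d, l))

    # The ridge term keeps the noisy system well conditioned.
    # A single solve returns a (d, l) matrix with one coefficient column per outcome.
    return np.linalg.solve(noisy_cov + ridge * np.eye(d), noisy_xy)
```

With both noise scales and the ridge term set to zero, the function reduces to ordinary least squares on each column of `Y`, which is a convenient sanity check before any calibrated noise is added.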
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Audra_McMillan1
Submission Number: 3270