- TL;DR: Fast online and minibatch algorithms for missing value imputation on mixed data using a Gaussian copula
- Keywords: mixed data, ordinal data, Gaussian copula, missing values, imputation, online
- Abstract: Many data science algorithms require complete observations, which makes missing value imputation an important step in many data processing pipelines. Imputation is also of independent interest for applications such as recommender systems. To address real-world big data problems, imputation algorithms must handle mixed data containing ordinal, boolean, and continuous variables, and they must be highly scalable. In this work we develop a semi-parametric online algorithm for missing value imputation on mixed data using a Gaussian copula. The online algorithm is an order of magnitude faster than its offline counterpart, with similar accuracy. It can also improve on the offline method's accuracy by adapting to a changing data distribution.
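
To make the core idea concrete, here is a minimal sketch of Gaussian copula imputation. It is not the paper's algorithm: it is offline, handles only two continuous columns, and assumes the first column is fully observed. All function names (`to_latent`, `from_latent`, `impute_pair`) are hypothetical and chosen for illustration. Each column is mapped to normal scores through its empirical CDF, a latent correlation is estimated from complete rows, and a missing entry is filled with the empirical quantile of its conditional latent mean.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the latent scale


def to_latent(values):
    """Map observed values to normal scores; None (missing) stays None."""
    obs = [v for v in values if v is not None]
    n = len(obs)

    def score(v):
        # rank / (n + 1) keeps the empirical CDF strictly inside (0, 1)
        rank = sum(1 for o in obs if o <= v)
        return _STD_NORMAL.inv_cdf(rank / (n + 1))

    return [None if v is None else score(v) for v in values]


def from_latent(z, observed):
    """Map a latent score back to the data scale via the empirical quantile."""
    obs = sorted(observed)
    p = _STD_NORMAL.cdf(z)
    idx = round(p * (len(obs) + 1)) - 1  # nearest rank, 0-based
    return obs[min(len(obs) - 1, max(0, idx))]


def impute_pair(x, y):
    """Impute None entries of y from a fully observed x (assumption)."""
    zx, zy = to_latent(x), to_latent(y)
    pairs = [(a, b) for a, b in zip(zx, zy) if b is not None]
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    saa = sum((a - mean_a) ** 2 for a, _ in pairs)
    sbb = sum((b - mean_b) ** 2 for _, b in pairs)
    rho = sab / math.sqrt(saa * sbb)  # latent correlation from complete rows

    y_obs = [v for v in y if v is not None]
    # For a standard bivariate normal, E[z_y | z_x] = rho * z_x
    filled = [from_latent(rho * a, y_obs) if v is None else v
              for a, v in zip(zx, y)]
    return filled, rho


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 20, 30, None, 50, 60, 70, 80]
filled, rho = impute_pair(x, y)
```

Because the inverse marginal is an empirical quantile, an imputed value is always one of the observed values of that column. An online variant in the spirit of the abstract would update the marginal estimates and the latent correlation incrementally as new rows or minibatches arrive, rather than recomputing them from the full data.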