Unsupervised Functional Dependency Discovery for Data Preparation

20 Mar 2019, 20:37 (edited 02 Jul 2019) | ICLR 2019 Workshop LLD Blind Submission | Readers: Everyone
  • Keywords: Functional Dependencies, Sparse Regression, Structure Learning, L1-regularization, Weak Supervision
  • Abstract: We study the problem of functional dependency (FD) discovery to impose domain knowledge for downstream data preparation tasks. We introduce a framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that our methods scale to large data instances with millions of tuples and hundreds of attributes while recovering true FDs across a diverse array of synthetic datasets, even in the presence of noisy data. Overall, our methods show an average 2× improvement in F1 over state-of-the-art FD discovery methods. Our system also achieves a higher F1 score on a downstream data repairing task than manually defined FDs do.
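
For readers unfamiliar with the term, a functional dependency X → Y holds when any two tuples that agree on the attributes X also agree on Y. The sketch below only illustrates that definition on noisy data. It is not the sparse-regression framework the abstract describes, and the function name `fd_violation_rate` and the toy `rows` table are hypothetical, introduced here for illustration.

```python
from collections import Counter, defaultdict


def fd_violation_rate(rows, lhs, rhs):
    """Fraction of rows that would have to change for lhs -> rhs to hold exactly.

    rows: list of dicts (one per tuple); lhs: list of attribute names;
    rhs: a single attribute name. Hypothetical helper, for illustration only.
    """
    if not rows:
        return 0.0
    # Group rows by their left-hand-side values and count right-hand-side values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # Within each group, keep the majority rhs value; the rest are violations.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


# Toy table (assumed data): zip -> city holds except for one noisy cell.
rows = [
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chcago"},  # typo: violates zip -> city
    {"zip": "10001", "city": "New York"},
]

print(fd_violation_rate(rows, ["zip"], "city"))  # 0.25
print(fd_violation_rate(rows, ["city"], "zip"))  # 0.0
```

A rate of 0 means the dependency holds exactly; a small positive rate suggests a dependency that holds up to noise, which is the setting the abstract targets.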