Abstract: Functional Dependencies (FDs) define attribute relationships based on syntactic equality, so when used in data cleaning they erroneously flag syntactically different but semantically equivalent values as errors. We motivate the need to include context in data cleaning in order to account for the subjective nature of data quality. We enhance dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships, such as synonyms and is-a hierarchies, defined by an ontology. We study the data and ontology repair problem for a set of OFDs, and propose an algorithm that finds the ontological interpretation of the data requiring the minimum number of repairs.
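To make the contrast between syntactic and semantic checking concrete, the following minimal Python sketch compares a plain FD check with a synonym-aware check on a toy table. It is an illustration only, not the algorithm proposed in the paper: the table, the attribute names, the `senses` mapping, and the functions `fd_violations` and `synonym_ofd_violations` are all hypothetical, and the synonym check assumes that a group of values is consistent when every value in it shares at least one ontology sense.

```python
from collections import defaultdict


def group_rhs_values(rows, lhs, rhs):
    """Map each LHS value to the set of RHS values that co-occur with it."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    return groups


def fd_violations(rows, lhs, rhs):
    """LHS values whose rows disagree syntactically on the RHS attribute."""
    groups = group_rhs_values(rows, lhs, rhs)
    return {key for key, values in groups.items() if len(values) > 1}


def synonym_ofd_violations(rows, lhs, rhs, senses):
    """LHS values whose RHS values share no common sense (assumed semantics)."""
    violations = set()
    for key, values in group_rhs_values(rows, lhs, rhs).items():
        # A value missing from the toy ontology is only a synonym of itself.
        sense_sets = [senses.get(v, {("literal", v)}) for v in values]
        if not set.intersection(*sense_sets):
            violations.add(key)
    return violations


# Hypothetical data: country_code -> country should hold.
rows = [
    {"country_code": "US", "country": "United States"},
    {"country_code": "US", "country": "USA"},
    {"country_code": "US", "country": "America"},
    {"country_code": "IN", "country": "India"},
    {"country_code": "IN", "country": "Bharat"},
    {"country_code": "CA", "country": "Canada"},
    {"country_code": "CA", "country": "Mexico"},
]

# Hypothetical ontology: each value maps to the senses (concepts) it can denote.
senses = {
    "United States": {"c_usa"},
    "USA": {"c_usa"},
    "America": {"c_usa", "c_americas"},
    "India": {"c_india"},
    "Bharat": {"c_india"},
    "Canada": {"c_canada"},
    "Mexico": {"c_mexico"},
}

print(sorted(fd_violations(rows, "country_code", "country")))
# ['CA', 'IN', 'US']
print(sorted(synonym_ofd_violations(rows, "country_code", "country", senses)))
# ['CA']
```

Under these assumptions, the syntactic check reports all three country codes as errors, while the synonym-aware check reports only `CA`, whose values share no sense. Note that "America" is ambiguous in the toy ontology; choosing the sense under which the data needs the fewest changes mirrors, in a very simplified form, the interpretation-selection problem the abstract describes.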