Abstract: Quality assessment of data models can be a challenging task due to its subjective nature. For the schemaless, heterogeneous and diverse group of databases falling under the NoSQL umbrella, quality is generally operation- and performance-oriented, and no quality assessment framework exists. As a first step in shaping our understanding of NoSQL database model quality, this paper investigates the perceived usability of quality evaluation frameworks adapted from Entity Relationship (ER) modeling to the context of NoSQL databases. A first evaluation is performed on the three most widely used ER quality frameworks, in which they are assessed for usefulness, ease of use, and suitability in the context of NoSQL databases. Based on the results of this assessment, a second evaluation is performed on the best-scoring framework. This evaluation consists of a real-world application of the framework to assess the quality of NoSQL database models. This paper merges targeted crowdsourcing, Stack Overflow data mining, and white-box classification to gain insights into the concept of NoSQL database model quality, its characterizing features, and the trade-offs it involves. This work constitutes the first investigation of an ER-defined quality framework applied to NoSQL, conducted on a sample of diverse NoSQL schemas with both industrial and academic participants. A decision tree is used to describe the heuristics of data model assessment, and an analysis is performed to identify inter-annotator disagreement, quality criterion importance, and quality trade-offs. In the absence of prior work on NoSQL data model quality assessment, this paper aims to lay the groundwork and present preliminary insights into quality characterization in the context of NoSQL, as well as highlight current gaps, limitations, and potential improvements.