SchemaDB: A Dataset for Structures in Relational Data

Published: 01 Jan 2022, Last Modified: 28 Sept 2024 | Venue: AusDM 2022 | License: CC BY-SA 4.0
Abstract: In this paper we introduce the SchemaDB dataset: a collection of relational database schemas in both SQL and graph formats. Databases are not commonly shared publicly for reasons of privacy and security, so the schemas of these databases are often unavailable for study. Consequently, our understanding of database structures in the wild is limited, and most publicly available schema examples belong to common development frameworks or are derived from textbooks or engine benchmarks. SchemaDB contains 2,500 relational schemas collected from public code repositories and standardised to MySQL syntax. We describe our gathering and transformation methodology, provide summary statistics and a structural analysis, and discuss potential downstream research tasks in several domains.
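The abstract does not say how the graph format is derived from the SQL. One common convention, assumed here purely for illustration, treats each table as a node and each `FOREIGN KEY` clause as a directed edge from the referencing table to the referenced table. The sketch below uses a hypothetical `schema_to_graph` helper built on regular expressions rather than a real SQL parser, so it does not handle semicolons inside string literals or comments, schema-qualified names, or column-level inline `REFERENCES` clauses.

```python
import re

# Hypothetical helper, not part of SchemaDB: a simplified regex-based reader for
# MySQL-style CREATE TABLE statements. It is NOT a full SQL parser.
TABLE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?", re.IGNORECASE
)
# Matches table-level FOREIGN KEY clauses (single or composite columns)
# and captures only the name of the referenced table.
FK_RE = re.compile(
    r"FOREIGN\s+KEY\s*\([^)]*\)\s*REFERENCES\s+`?(\w+)`?", re.IGNORECASE
)


def schema_to_graph(sql: str) -> dict[str, set[str]]:
    """Map each table to the set of tables it references via foreign keys."""
    graph: dict[str, set[str]] = {}
    # Assumption: every statement ends with ';' and no ';' appears elsewhere.
    for stmt in sql.split(";"):
        match = TABLE_RE.search(stmt)
        if not match:
            continue  # skip non-CREATE TABLE statements and empty fragments
        table = match.group(1)
        graph.setdefault(table, set())
        for ref in FK_RE.finditer(stmt):
            parent = ref.group(1)
            graph[table].add(parent)         # edge: child -> parent
            graph.setdefault(parent, set())  # ensure the referenced table is a node
    return graph


SAMPLE = """
CREATE TABLE `users` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `email` VARCHAR(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE TABLE `orders` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `user_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `fk_orders_users` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB;
"""

print(schema_to_graph(SAMPLE))  # {'users': set(), 'orders': {'users'}}
```

Storing edges as child-to-parent adjacency sets keeps per-table measures such as out-degree (how many tables a table depends on) cheap to compute; any richer edge attributes, such as the participating columns, would need a different structure.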
