CADRE: A Cloud-Based Data Service for Big Bibliographic Data

Xiaoran Yan, Guangchen Ruan, Dimitar Nikolov, Matthew Hutchinson, Chathuri Peli Kankanamalage, Benjamin Serrette, James R. McCombs, Alan Walsh, Esen Tuna, Valentin Pentchev

Published: 2021, Last Modified: 26 Oct 2023, CIKM 2021

Abstract: Large bibliographic data sets hold the promise of revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities. Providing high-quality data services for large network datasets such as the Microsoft Academic Graph, which contains more than two billion citation links, poses significant difficulties for universities. Data systems based on the property graph model are capable of delivering efficient graph query services for large networks. However, real-life queries often combine multiple types of data models. To satisfy the needs of different user groups, we developed and deployed a cloud-based data system consisting of scalable graph and text-indexed query engines. For non-expert users, the property graph model also presents a technological barrier. To alleviate the steep learning curve, we designed an intuitive graphical user interface for query-building. For advanced users, a scalable notebook service in our platform provides a more flexible computing environment in which the query results can be further analyzed. These systems form the data backbone of the Collaborative Archive and Data Research Environment (CADRE), which provides efficient and high-quality bibliographic data services to eleven large public universities in North America.
