VIVO-Studio: towards a software tool that facilitates and standardizes the process of data transformation from institutional sources to VIVO for multidisciplinary teams

May 13, 2021 · VIVO 2021 Conference Submission
  • Keywords: VIVO encapsulation, Development Tool, Ontology Editing, Data Transformation Tool, Eclipse-Based Application
  • TL;DR: A critical factor in the success of VIVO data integration is the quality of collaboration between the professionals involved. We have developed VIVO-Studio, a software tool that facilitates the tasks of the data transformation process.
  • Abstract: An essential step in producing an institutional VIVO instance is provisioning it with relevant, coherent, up-to-date, and valid data. This requires complex data transformation processes that must be developed and implemented within the institution by several specialists from various fields of expertise. In most cases, the data transformation process can be divided into the following tasks: selecting the data to be extracted; extracting data from various institutional data sources; converting tabular data into knowledge graphs; editing ontologies and vocabularies in RDF/S; mapping data from a source vocabulary to the VIVO vocabulary; ingesting the data into a local VIVO instance; and, finally, verifying and validating the data managed by VIVO. The process involves the collaboration of several professionals who, on the one hand, design the data transformation rules specific to each institution and, on the other hand, implement these rules in various software modules to automate the loading of data into VIVO. The success of institutional data integration in VIVO depends critically on three factors: the quality of the collaboration between the professionals involved, the simplicity of carrying out recurring tasks (e.g., converting a dataset from JSON to Turtle notation, or executing SPARQL queries against a graph), and access to encapsulated, preconfigured services (e.g., a local, pre-installed VIVO instance). To facilitate this collaboration, we have developed VIVO-Studio, a software tool with scalable, adaptive, and incremental features that facilitates and standardizes the tasks required in the data transformation process. VIVO-Studio is intended for computer scientists at all levels, as well as for ontologists responsible for data quality, whether they are librarians, researchers, data scientists, or database administrators.
VIVO-Studio encapsulates a set of software tools, such as a Tomcat server, access to the Java APIs of Apache Jena and the OWL API, an Apache Fuseki server, Apache Kafka services, and many others. During our presentation, we will describe the various contexts and use cases for VIVO-Studio. We will then present its component architecture and its principal functionalities, illustrated with screen captures that serve as a demonstration guide.
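To make one of the recurring tasks named above more concrete (converting a JSON dataset to Turtle while mapping source fields onto VIVO-compatible vocabulary), here is a minimal sketch using only the Python standard library. It is not VIVO-Studio's implementation, which the abstract describes as relying on tools such as Apache Jena and the OWL API. The base URI `http://data.example.org/individual/`, the input keys (`id`, `name`, `overview`), and the `FIELD_MAP` mapping are hypothetical assumptions chosen for illustration.

```python
import json

# Hypothetical base URI for the institution's individuals (an assumption).
BASE = "http://data.example.org/individual/"

# Namespace prefixes written at the top of the Turtle output.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vivo": "http://vivoweb.org/ontology/core#",
}

# Illustrative (hypothetical) mapping from source JSON keys to predicates.
FIELD_MAP = {
    "name": "rdfs:label",
    "overview": "vivo:overview",
}


def escape_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_turtle(record: dict) -> str:
    """Render one source record as a Turtle block typed as foaf:Person.

    Assumes record["id"] is already safe to append to an IRI.
    """
    subject = f"<{BASE}{record['id']}>"
    statements = [f"{subject} a foaf:Person"]
    for key, predicate in FIELD_MAP.items():
        # Skip fields that are missing or empty in the source record.
        if record.get(key):
            literal = escape_literal(str(record[key]))
            statements.append(f'    {predicate} "{literal}"')
    # Predicate-object pairs share the subject, separated by " ;".
    return " ;\n".join(statements) + " .\n"


def json_to_turtle(text: str) -> str:
    """Convert a JSON array of records into a Turtle document."""
    records = json.loads(text)
    header = "".join(f"@prefix {p}: <{iri}> .\n" for p, iri in PREFIXES.items())
    body = "\n".join(record_to_turtle(r) for r in records)
    return header + "\n" + body


if __name__ == "__main__":
    print(json_to_turtle('[{"id": "p001", "name": "Ada Lovelace"}]'))
```

Keeping the vocabulary mapping in a separate table (`FIELD_MAP`) reflects the division of labor the abstract describes: ontologists can adjust the mapping without touching the conversion logic. A production pipeline would delegate IRI validation and serialization to an RDF library such as Apache Jena rather than building strings by hand.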
