Enriching OpenCitations Meta with Subject Areas

28 Jul 2023 (modified: 01 Aug 2023), InvestinOpen 2023 OI Fund Submission
Funding Area: Critical shared infrastructure / Infraestructura compartida critica
Problem Statement: Current research assessment is flawed, and initiatives such as CoARA (https://coara.eu/) and DORA (https://sfdora.org/) seek to reform it. OpenCitations is an independent, community-led, not-for-profit Open Science infrastructure with the mission to harvest, archive and openly disseminate accurate and comprehensive bibliographic metadata describing the world's academic publications and the scholarly citations that link them, so that our users (researchers, policy makers, libraries, etc.) may adopt and use these as trustworthy components of their day-to-day work. In particular, we strive to enable their use for the creation of transparent metrics that will support informed judgements by academic peers and facilitate the reproducibility of research assessment exercises. With OI Fund support, OpenCitations will expand the metadata holdings within its new bibliographic metadata database, OpenCitations Meta, by including the subject areas of journal articles and other publications, using terms from OCLC’s FAST, and by aligning its top-level terms to WoS Research Areas, Scopus Subject Categories, Dimensions, and SCImago subject terms. Such disciplinary information will allow researchers and policy makers to focus on publication subject areas, to measure the flow of ideas from one discipline to another, and to develop new open metrics that can measure the multi-disciplinarity of publications from individual research institutes, funding streams, or researchers.
Proposed Activities: Extending OpenCitations metadata with subject areas requires the following five activities, which will result in the publication of specific descriptions, datasets and software resources under appropriate open licenses (CC-BY, CC0, ISC).
1. A taxonomy for subject areas. Many subject area classifications exist, but few are appropriate or available as Linked Open Data to maximise their reuse. We will use the Topical subset of FAST (https://www.oclc.org/en/fast.html), a respected hierarchical English-language vocabulary derived from the US Library of Congress Subject Headings. Developed by OCLC and used, for example, by Harvard University Library and the British Library, it is available as Linked Open Data under an ODC-By License, and is thus interoperable with existing OpenCitations metadata. We will develop new software to pull appropriate terms from FAST into OpenCitations Meta during ingestion of new bibliographic resources.
2. Taxonomy alignment. FAST contains 1.7 million terms, of which we will use only those from the upper levels of the Topical hierarchy. To facilitate the reuse of FAST terms within OpenCitations Meta, we will create alignments that interlink top-level FAST subject categories with other existing simple classifications, for example the WoS Research Areas, Scopus Subject Categories, Dimensions and SCImago subject terms. This will allow the OpenCitations subject categories to be used as a bridge that makes these other subject classifications interoperable. The alignments will be formalised in SKOS, compliant with the FAIR principles, and will be published within the SPAR Ontologies (currently managed by OpenCitations) to guarantee their long-term sustainability.
3. Extension of the OpenCitations Data Model (OCDM). The OCDM, used in OpenCitations to store and expose all its bibliographic and citation data, will be extended to permit FAST subject areas to be associated with the bibliographic resources that the OCDM describes. Together with the OCDM, all the software libraries developed to handle OCDM-compliant data will be extended so that other applications can align with the new extension, thus permitting the easy association of subject areas via software code.
4. Extension of the OpenCitations Meta ingestion process. The ingestion process of OpenCitations Meta is the means whereby new bibliographic metadata is included within OpenCitations. It will be critical to extend the codebase for this ingestion process so that the FAST subject areas associated with bibliographic resources can be provided as additional input.
5. Enriching OpenCitations. Using all the resources from steps 1-4, and the external sources that provide the actual subject categories for individual ingested publications, we will prepare appropriate input tables for use by the OpenCitations Meta ingestion process, so that subject area information can be added to the collection. The software that prepares these tables will be run systematically every time OpenCitations Meta is updated.
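As an illustration of the taxonomy alignment in step 2, the following minimal Python sketch shows how alignment rows could be serialised as SKOS mapping statements and how a FAST term could then act as a bridge between two other schemes. It is not the project's actual implementation: all the example.org URIs, the ALIGNMENTS table and the function names are hypothetical placeholders, and only the SKOS namespace and its mapping properties (closeMatch, broadMatch) are real.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical alignment rows: (FAST term URI, SKOS mapping property, external category URI).
# The URIs are placeholders, not real FAST, WoS or Scopus identifiers.
ALIGNMENTS = [
    ("https://example.org/fast/chemistry", "closeMatch", "https://example.org/wos/chemistry"),
    ("https://example.org/fast/chemistry", "closeMatch", "https://example.org/scopus/chemistry"),
    ("https://example.org/fast/history", "broadMatch", "https://example.org/scopus/arts-and-humanities"),
]


def to_ntriples(alignments):
    """Serialise alignment rows as N-Triples lines using SKOS mapping properties."""
    return [f"<{s}> <{SKOS}{p}> <{o}> ." for s, p, o in alignments]


def crosswalk(alignments, source_uri):
    """Return categories of other schemes that share a FAST term with source_uri."""
    by_fast = defaultdict(set)
    for fast, _prop, external in alignments:
        by_fast[fast].add(external)
    targets = set()
    for externals in by_fast.values():
        if source_uri in externals:
            targets |= externals - {source_uri}
    return sorted(targets)


lines = to_ntriples(ALIGNMENTS)
bridged = crosswalk(ALIGNMENTS, "https://example.org/wos/chemistry")
```

In this sketch the crosswalk deliberately ignores the strength of each mapping (close versus broad match); a real crosswalk would probably need to take that distinction into account.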
Openness: For OpenCitations, 'open' is its founding value and prime mission. All OpenCitations services and data are free and open and will always remain so: data under a CC0 License using Semantic Web (Linked Data) technologies, publications Open Access under CC-BY, and software open source under an ISC license. The enrichment of OpenCitations Meta with subject areas will be celebrated not only as a development of OpenCitations' technical infrastructure, but also as an added value for the scholarly community. The OI Fund's support for OpenCitations will be announced on OpenCitations' social media platforms, and on the OpenCitations blog with a post presenting the planned activities. Additionally, all the phases of the work will be described step-by-step on OpenCitations’ public roadmap. Being selected for the OI Fund will enable OpenCitations to collaborate with IOI to enhance the existing open science infrastructure and to raise global awareness of the importance of the open provision of bibliographic data for a more equitable research environment. IOI will be presented as a key OpenCitations partner during community-centred talks at international conferences, including the Workshop on Open Citations and Open Scholarly Metadata (October 2023), organized by OpenCitations. This IOI-funded development will also be listed among other OpenCitations partner projects, with a complete description of the funded activities and outcomes, in a dedicated section on OpenCitations’ website.
Challenges: The work we propose presents several challenges. The first is the alignment of the adopted FAST taxonomy with other existing ones (step 2). An accurate alignment is crucial for the interoperability of the subject areas we will define with those in other classification schemes, since it enables the crosswalk from one scheme to another. However, defining this type of crosswalk is difficult, because seemingly identical concepts in the various schemes may carry different semantic nuances that are not easy to align. The second challenge concerns the modification of the existing OpenCitations codebase to add the handling of subject areas (steps 3 and 4). This challenge is purely one of implementation: every modification of the code must guarantee that the current process continues to work properly and that the code remains backward-compatible with the previous version. The last technical challenge relates to the ingestion of new data containing subject area information into OpenCitations Meta (step 5). Here, a careful analysis of the source data is needed so that possible ambiguities and inconsistencies in the subject areas specified for bibliographic resources in the original source can be handled appropriately. The final challenge will be to find an individual with an appropriate skill set to undertake the work described.
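To make the step 5 challenge concrete, the sketch below shows one possible way to separate source subject labels that map cleanly to a single FAST term from those that are ambiguous or unmatched and therefore need manual review. It is only an illustration under stated assumptions: the FAST_INDEX lookup table, the "fast:..." identifiers and the function names are hypothetical, and the actual analysis of the source data may work quite differently.

```python
def normalise(label):
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(label.lower().split())


def reconcile(source_labels, fast_index):
    """Split source labels into matched, ambiguous and unmatched groups.

    fast_index maps a normalised label to the list of candidate FAST term ids.
    """
    matched, ambiguous, unmatched = {}, {}, []
    for label in source_labels:
        candidates = fast_index.get(normalise(label), [])
        if len(candidates) == 1:
            matched[label] = candidates[0]
        elif candidates:
            ambiguous[label] = list(candidates)
        else:
            unmatched.append(label)
    return matched, ambiguous, unmatched


# Hypothetical lookup table; the identifiers are placeholders, not real FAST ids.
FAST_INDEX = {
    "chemistry": ["fast:chem"],
    "mercury": ["fast:mercury-element", "fast:mercury-planet"],
}

matched, ambiguous, unmatched = reconcile(["  Chemistry ", "Mercury", "Alchemy"], FAST_INDEX)
```

Keeping the ambiguous and unmatched labels in separate groups, rather than guessing, means that no uncertain subject area is written into the collection without review.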
Neglectedness: This is the first time we have applied for funding for this specific project. In the past, however, OpenCitations has applied for funding to extend its technical infrastructure more generally. In particular, we applied successfully to calls for proposals by the Wellcome Trust, the Wikimedia Foundation, and the Sloan Foundation. In addition, we have participated in projects funded by the European Union (OpenAIRE Nexus, RISIS2, GraspOS), and we were selected for community support in the second SCOSS funding cycle, which enables us to receive donations and membership fees from various organisations that we use for salaries and day-to-day maintenance of the OpenCitations computational infrastructure.
Success: We will measure the success of the work along the following dimensions: (a) views and downloads of the new taxonomy alignment; (b) number of different classification schemes aligned; (c) number of subject area statements (i.e. <bibliographic resource> <has subject area> <X>) added to OpenCitations; (d) number of publications (reports, scientific papers, datasets, software, other research outcomes, etc.) published on this work.
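As a rough illustration of how dimension (c) could be computed, the following Python sketch counts distinct <bibliographic resource> <has subject area> <X> statements in a list of triples. The predicate URI, the "br:" and "fast:" identifiers and the function names are hypothetical, since the OCDM extension that will define the actual property does not exist yet.

```python
# Hypothetical predicate; the real property will be defined by the OCDM extension.
HAS_SUBJECT = "https://example.org/ocdm/hasSubjectArea"

# Toy triples (subject, predicate, object) with placeholder identifiers.
triples = [
    ("br:1", HAS_SUBJECT, "fast:chem"),
    ("br:1", HAS_SUBJECT, "fast:phys"),
    ("br:2", HAS_SUBJECT, "fast:chem"),
    ("br:2", "https://example.org/other", "x"),
    ("br:1", HAS_SUBJECT, "fast:chem"),  # duplicate statement, counted once
]


def count_subject_statements(triples, predicate=HAS_SUBJECT):
    """Count distinct (resource, subject area) pairs asserted with the predicate."""
    return len({(s, o) for s, p, o in triples if p == predicate})


def count_resources_with_subjects(triples, predicate=HAS_SUBJECT):
    """Count distinct bibliographic resources that have at least one subject area."""
    return len({s for s, p, _o in triples if p == predicate})


statements = count_subject_statements(triples)
resources = count_resources_with_subjects(triples)
```

Counting distinct pairs rather than raw rows avoids inflating the metric when the same statement is ingested more than once.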
Total Budget: 24,382.67 USD
Budget File: pdf
Affiliations: Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna (the legal entity in charge of managing OpenCitations)
LMIE Carveout: No, it does not fit in this category.
Team Skills: The OpenCitations team is a group of young and highly motivated researchers appointed at the Research Centre for Open Scholarly Metadata (University of Bologna), under the guidance of OpenCitations’ Directors, Prof. Silvio Peroni and Prof. David Shotton. Through weekly meetings, the team coordinates with the Directors in maintaining the OpenCitations data services, developing new functionalities, undertaking administrative and secretarial tasks, and maintaining and developing the technical infrastructure. The varied educational backgrounds of the team members - ranging from the Humanities to Computer Science - together with a common interest in the Digital Humanities and in Open Science, make OpenCitations a research infrastructure in which the relationships and mutual enrichment among its workers count as much as the sharing of expertise with its community partners. The OpenCitations community involves numerous partnerships, including R&D projects, funding consortia and community networks. OpenCitations is an active member of the SCOSS Family network and the POSI-posse group. The support received from the OI Fund will be used to appoint a new developer to work on the enrichment of OpenCitations Meta with subject areas. This person will be fully involved in OpenCitations activities and its community network, participating in the regular team meetings to report on the progress of the work, receive collective feedback and share knowledge.
Submission Number: 56