Breaking the pattern

Yuri Gallo; Matteo De Toffoli

Published: 05 Feb 2025, Last Modified: 05 Feb 2025, WD&R differentformat, CC BY 4.0

Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.

Authors Biographies: Yuri Gallo is a librarian at the University of Milan. He holds a master's degree in Historical Sciences and is a graduate of the School of Archival, Paleographic and Diplomatic Studies of the State Archives of Milan. His main interests are archives, special collections and the heritage of academic libraries. His duties include the acquisition, management and promotion of special collections and support for university projects. He has attended advanced courses in library and information science, including the summer schools “Linked data for cultural heritage” at Alma Mater Studiorum - University of Bologna and “Linked data for digital humanities” at the University of Oxford. A member of AIB and Wikimedia Italia, he sits on the AIB National Commission for Special Libraries, Archives and Author Libraries and on the Wikimedia University Commission. He has published journal articles and contributions to edited volumes, curated exhibitions, presented papers at conferences and lectured in academic and professional settings. Matteo De Toffoli holds a master's degree in Philosophical Sciences and a PhD in Political Sciences. His main research interests revolve around the public use of the concepts of post-truth, fake news, conspiracy theories and populism, with particular attention to their repercussions on democratic politics. Since 2023, he has worked as a librarian at the Philosophy Library of the University of Milan, where he is involved in acquisitions, stack management, weeding and collection review.

Keywords: Stack management, Data management, Academic libraries, Wikidata

TL;DR: Using Wikidata to reorganize the thematic sections of an 80,000-volume library

Abstract: The Philosophy Library of the University of Milan holds about 80,000 volumes. Most of the collection is on open shelves and divided into two main parts: “History of Philosophy” gathers all publications up to the end of the 19th century and is arranged chronologically; “Contemporary Philosophy” comprises texts by authors from the 20th century onwards and is divided according to both thematic and linguistic criteria. Over the years, the lack of updates to the thematic sectors has led to a growing imbalance in the distribution of volumes within the Contemporary Philosophy section: volumes belonging to emerging areas of research, or to fields not covered by the current subdivision, have mostly been assigned to the linguistic sectors, which have consequently become extremely large. As a result, content affinities between neighboring volumes are lost and user orientation suffers. To revise this scheme, we decided to increase the number and variety of thematic sectors. In doing so, we could prioritize the allocation of new volumes to these sectors, make it easier to relocate volumes currently shelved in the linguistic sectors, and minimize the variance within each sector while maximizing the variance between sectors. The main problem we faced was reducing the amount of work and arbitrariness involved in identifying the volumes to be moved into the new thematic sectors. We therefore selected a test sector and designed a workflow capable of automating at least part of the selection work. To decide which texts should be moved to the test sector, we compared the database the library uses to assign shelfmarks (which contains a list of authors and their assigned sectors) with lists drawn from two qualified disciplinary sources. A Wikidata dataset was used to maximize matches between the lists and to reduce noise; it was retrieved with a SPARQL query based on the Date of birth (P569), Field of work (P101), and Occupation (P106) properties.
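A query of the kind described above might be assembled as follows. This is a minimal sketch, not the authors' actual query: the occupation and field-of-work item IDs, the birth-year cutoff, and the choice to require both P106 and P101 to match are all hypothetical parameters. Only the property and item IDs named in the comments (P31, Q5, P569, P106, P101) are standard Wikidata identifiers. The helper `parse_results` is likewise hypothetical; it reads the standard SPARQL JSON results format.

```python
import re

# Standard Wikidata identifiers used below:
# P31 = instance of, Q5 = human, P569 = date of birth,
# P106 = occupation, P101 = field of work.
_QID = re.compile(r"Q[1-9]\d*")


def build_candidate_query(occupations, fields, born_after):
    """Return a SPARQL query for people holding one of the given occupations
    AND working in one of the given fields, born after `born_after`.

    Hypothetical sketch: the QID lists, the cutoff year and the AND-combination
    of P106 and P101 are illustrative assumptions, not the authors' query.
    """
    # Reject anything that is not a plain item ID, so no stray text
    # can end up inside the query string.
    for qid in [*occupations, *fields]:
        if not _QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    occ_values = " ".join(f"wd:{q}" for q in occupations)
    field_values = " ".join(f"wd:{q}" for q in fields)
    return (
        "SELECT DISTINCT ?person ?personLabel ?birth WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        "          wdt:P569 ?birth ;\n"
        "          wdt:P106 ?occupation ;\n"
        "          wdt:P101 ?field .\n"
        f"  VALUES ?occupation {{ {occ_values} }}\n"
        f"  VALUES ?field {{ {field_values} }}\n"
        f"  FILTER(YEAR(?birth) > {int(born_after)})\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }\n'
        "}"
    )


def parse_results(result_json):
    """Turn a SPARQL JSON result into (QID, label, birth_year) tuples,
    with duplicates removed, sorted by birth year and then label."""
    rows = set()
    for b in result_json["results"]["bindings"]:
        qid = b["person"]["value"].rsplit("/", 1)[-1]  # entity IRI -> QID
        label = b["personLabel"]["value"]
        year = int(b["birth"]["value"][:4])  # e.g. "1901-02-03T00:00:00Z"
        rows.add((qid, label, year))
    return sorted(rows, key=lambda r: (r[2], r[1]))
```

The query string could be sent to the Wikidata Query Service endpoint (https://query.wikidata.org/sparql) with JSON output requested. Keeping query construction separate from the network call makes the selection criteria easy to inspect and adjust from one sector to the next.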
Wikidata was also used to identify only the entries in our local database that correspond to persons, and to obtain data in a format suitable for subsequent analysis. After normalizing the data to make them comparable, we cross-referenced the four lists to obtain a matrix recording, for each author, presence or absence in each list. All entries that did not appear in any of the three external lists (the two qualified sources and the Wikidata dataset) were eliminated, leaving a preliminary list of approximately 400 authors who were candidates for the new sector. Each of these names was then reviewed individually to determine whether it belonged to the test sector or to a different one. The list was sorted chronologically by the authors' date of birth and used to plan the transfer of the volumes to the new shelves. The use of Wikidata thus enabled us to reduce the data to be analyzed from an initial dataset of over 20,000 entries to just 400, with a high degree of precision and recall.
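The normalization, cross-referencing and filtering steps can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the name-normalization rule (reordering "Surname, Forename", stripping accents, lowercasing) and the helpers `normalize_name`, `presence_matrix` and `candidate_list` are hypothetical, and birth years are assumed to be already available, for example from the Wikidata dataset.

```python
import re
import unicodedata


def normalize_name(name):
    """Hypothetical normalization rule: 'Surname, Forename' -> 'forename surname',
    accents stripped, whitespace collapsed, lowercased."""
    if "," in name:
        surname, _, forename = name.partition(",")
        name = f"{forename} {surname}"
    # Decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def presence_matrix(local_names, external_lists):
    """Map each normalized local author to {source: present?} for every
    external list (e.g. two disciplinary sources plus a Wikidata dataset)."""
    external_sets = {
        source: {normalize_name(n) for n in names}
        for source, names in external_lists.items()
    }
    matrix = {}
    for name in local_names:
        key = normalize_name(name)
        matrix[key] = {src: key in found for src, found in external_sets.items()}
    return matrix


def candidate_list(matrix, birth_years):
    """Drop authors absent from every external list, then sort the rest
    chronologically; authors with no known birth year go last."""
    kept = [name for name, row in matrix.items() if any(row.values())]
    return sorted(kept, key=lambda n: (birth_years.get(n, float("inf")), n))
```

The output is only a shortlist. As the abstract notes, each remaining name still needs an individual review before its volumes are assigned to the test sector or to another one.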

Format: Paper (20-minute presentation)

Submission Number: 27
