UniDive Tools and Methods for Measuring Linguistic Diversity in NLP Datasets

Published: 27 May 2026, Last Modified: 27 May 2026, Venue: UniDive 2026, License: CC BY-SA 4.0
Keywords: Linguistic diversity, NLP benchmarks, measurements, software, linguistic typology, meta-linguistic categories, in-text categories
Working Group: WG4: Quantifying and promoting diversity
Abstract: A prominent objective of UniDive WG4 has been to develop tools and methods for measuring linguistic diversity at different levels. As a result, WG4 activities have produced several Python libraries, each addressing a different aspect of this measurement. In this proposed presentation, we report on work in progress that demonstrates the use of these libraries for measuring linguistic diversity in NLP datasets at scale.
WG4 Tasks: Task 4.5: Measuring diversity of NLP benchmarks
Tracks For Type Of Contribution: Work in progress
Submission Number: 39