Data Hazards: An open-source vocabulary of ethical hazards for data-intensive projects

Natalie Zelenka, Nina H. Di Cara, Euan Bennet, Phil Clatworthy, Huw Day, Ismael Kherroubi Garcia, Susana Roman Garcia, Vanessa Aisyahsari Hanschke, Emma Siân Kuwertz

Published: 01 Mar 2025, Last Modified: 17 Oct 2025
Venue: Journal of Responsible Technology
License: CC BY-SA 4.0

Abstract: Understanding the potential for downstream harms from data-intensive technologies requires strong collaboration across disciplines and with the public. Having shared vocabularies of concerns reduces the communication barriers inherent in this work. The Data Hazards project [url] contains an open-source, controlled vocabulary of 11 hazards associated with data science work, presented as ‘labels’. Each label has (i) an icon, (ii) a description, (iii) examples, and, crucially, (iv) suggested safety precautions. A reflective discussion format and resources have also been developed. These have been created over three years with feedback from interdisciplinary contributors, and their use has been evaluated by participants (N=47). The labels include concerns that are often out of scope for ethics committees, such as environmental impact. The resources can be used as a structure for interdisciplinary harms discovery work, for communicating hazards, for collecting public input, or in educational settings. Future versions of the project will develop through feedback from open-source contributions, methodological research and outreach.

External IDs: doi:10.1016/j.jrt.2025.100110