A Dataset for Efforts Towards Achieving the Sustainable Development Goal of Safe Working Environments
Keywords: Occupational Health and Safety, Labour Inspections, Machine Learning, Checklists, Long-Tailed Classification
TL;DR: Labour Inspection Checklists Dataset
Abstract: Among the United Nations' 17 Sustainable Development Goals (SDGs), we highlight SDG 8 on Decent Work and Economic Growth. Specifically, we consider how to achieve subgoal 8.8, "protect labour rights and promote safe working environments for all workers [...]", given that poor health, safety and environment (HSE) conditions are a widespread problem in workplaces. In the EU alone, more than 4000 deaths are estimated to occur each year due to poor working conditions. To address this problem and achieve SDG 8, governmental agencies conduct labour inspections, so it is essential that these inspections are carried out efficiently. Current research suggests that machine learning (ML) can improve labour inspections, for instance by selecting organisations for inspection more effectively. However, research in this area is very limited, partly due to a lack of publicly available data. We therefore introduce a new dataset, the Labour Inspection Checklists Dataset (LICD), which we have made publicly available. LICD consists of 63634 instances, each of which is an inspection conducted by the Norwegian Labour Inspection Authority. LICD has 577 features and labels. The dataset offers several ML research opportunities, and we discuss two demonstration experiments. The first addresses the problem of selecting a relevant checklist for inspecting a given target organisation. The second concerns the problem of predicting HSE violations, given a specific checklist and a target organisation. Our experimental results, while promising, suggest that achieving good ML classification performance is difficult for both problems. This motivates future research to improve ML performance, inspire other data analysis efforts, and ultimately help achieve SDG 8.
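The two demonstration tasks can be framed as supervised learning problems over inspection records. The following is a minimal, hypothetical sketch in plain Python, assuming a simplified record: the `Inspection` class and its fields (`industry`, `n_employees`, `checklist_id`, `violation`) are invented for illustration and do not reflect LICD's actual columns, and the majority-class baseline is a generic reference point, not the method used in the experiments.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Inspection:
    # Hypothetical, simplified inspection record; LICD's real columns differ.
    industry: str      # organisation feature (hypothetical)
    n_employees: int   # organisation feature (hypothetical)
    checklist_id: str  # label in task 1, input feature in task 2
    violation: bool    # label in task 2 (was an HSE violation found?)


def checklist_selection_example(r: Inspection):
    """Task 1: organisation features -> checklist to inspect with."""
    return (r.industry, r.n_employees), r.checklist_id


def violation_prediction_example(r: Inspection):
    """Task 2: organisation features + checklist -> violation or not."""
    return (r.industry, r.n_employees, r.checklist_id), r.violation


def majority_baseline(labels):
    """Return the most frequent label, a simple reference classifier."""
    return Counter(labels).most_common(1)[0][0]


# Toy records made up for illustration only; not taken from LICD.
records = [
    Inspection("construction", 12, "C1", True),
    Inspection("construction", 40, "C1", False),
    Inspection("retail", 5, "C2", False),
    Inspection("health", 150, "C3", False),
]

# Split each task into feature tuples (X) and labels (y).
X1, y1 = zip(*(checklist_selection_example(r) for r in records))
X2, y2 = zip(*(violation_prediction_example(r) for r in records))

print(majority_baseline(y1))  # prints C1
print(majority_baseline(y2))  # prints False
```

On this toy data the violation baseline always predicts "no violation", which is why a majority-class reference is a useful sanity check when judging classifiers on skewed labels.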
Supplementary Material: zip
Dataset Url: The dataset is published at the following URL: https://doi.org/10.18710/7U6TZP. The code for the demonstration experiments described in the paper is also included in the supplementary material. The code must be opened in Jupyter Notebook.
License: The dataset is released under a CC0 license.
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes