Beyond Protected Attributes: Disciplined Detection of Systematic Deviations in Data

Published: 21 Nov 2022, Last Modified: 05 May 2023 · TSRML 2022
Keywords: Trustworthy AI, Subgroup discovery, Fairness
TL;DR: We show the importance of Automatic Stratification (AutoStrat) for finding systematic deviations beyond protected attributes when analysing data at the subgroup level.
Abstract: Finding systematic deviations of an outcome of interest in data and models is an important goal of trustworthy and socially responsible AI. To understand systematic deviations at the subgroup level, it is important to look beyond \emph{predefined} groups and consider all possible subgroups for analysis. Of course, this exhaustive enumeration is not feasible, and there needs to be a balance of exploratory and confirmatory analysis in socially responsible AI. In this paper we compare recently proposed methods for detecting systematic deviations in an outcome of interest at the subgroup level across three socially relevant data sets. Furthermore, we show the importance of searching all possible subgroups for systematic deviations by comparing patterns detected using only protected attributes against patterns detected using the entire search space. One interesting pattern found in the OULAD dataset is that while having a high course load and not being from the highest socio-economic decile of UK regions makes students 2.3 times more likely to fail or withdraw from courses, being from Ireland or Wales mitigates this risk by 37%. This pattern might have been missed had we focused our analysis only on the protected groups of gender and disability. Python code for all methods, including the most recently proposed "AutoStrat", is available in open-source code repositories.
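The "all possible subgroups" search the abstract describes can be illustrated with a small brute-force sketch. This is not the AutoStrat algorithm or any method evaluated in the paper, just a minimal, hypothetical example of scoring every conjunction of attribute=value conditions by how far its outcome rate deviates from the overall rate; the toy records below are invented, not drawn from OULAD.

```python
from itertools import combinations, product

# Hypothetical toy records: (attribute dict, binary adverse outcome).
records = [
    ({"gender": "F", "load": "high", "region": "A"}, 1),
    ({"gender": "F", "load": "high", "region": "A"}, 1),
    ({"gender": "M", "load": "high", "region": "A"}, 1),
    ({"gender": "M", "load": "low",  "region": "B"}, 0),
    ({"gender": "F", "load": "low",  "region": "B"}, 0),
    ({"gender": "M", "load": "low",  "region": "A"}, 0),
]

def subgroup_scan(records, attrs, max_order=2):
    """Exhaustively score every conjunction of up to `max_order`
    attribute=value conditions by the ratio of its outcome rate to
    the overall rate; return the highest-scoring subgroup."""
    overall = sum(y for _, y in records) / len(records)
    best, best_ratio = None, 0.0
    for k in range(1, max_order + 1):
        for attr_combo in combinations(attrs, k):
            # Candidate values observed in the data for each attribute.
            values = [sorted({r[a] for r, _ in records}) for a in attr_combo]
            for vals in product(*values):
                cond = dict(zip(attr_combo, vals))
                members = [y for r, y in records
                           if all(r[a] == v for a, v in cond.items())]
                if not members:
                    continue
                ratio = (sum(members) / len(members)) / overall
                if ratio > best_ratio:
                    best, best_ratio = cond, ratio
    return best, best_ratio

best, ratio = subgroup_scan(records, ["gender", "load", "region"])
# In this toy data the most anomalous subgroup is load=high, whose
# members all have an adverse outcome (rate 1.0 vs. overall 0.5).
```

Restricting `attrs` to protected attributes only (here, `["gender"]`) would miss this subgroup, which is the abstract's point; the combinatorial growth of the search space is also visible here, motivating the disciplined search strategies the paper compares.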