Improving Fairness in AI-Powered Recruitment: An Interpretable Resume Screening System

Published: 15 Mar 2026, Last Modified: 20 Mar 2026, Oral, CC BY 4.0
Keywords: resume classification, algorithmic bias, fairness in machine learning, explainable AI, integrated gradients
TL;DR: This paper analyzes bias in neural resume classification systems and introduces a fairness-aware resume screening model guided by explainable feature attributions.
Abstract: Modern automated resume screening systems are typically based on neural text classification models that encode a resume as a feature representation $x \in \mathcal{X}$ and predict a discrete label corresponding to candidate category, suitability level, or job role. Such models commonly produce class logits $\mathbf{z}_\theta(x)$ parameterized by $\theta$, which are converted into class probabilities via the softmax function: $$ p_\theta(y=c \mid x) = \frac{\exp(z_{\theta,c}(x))}{\sum_{k=1}^{C_0} \exp(z_{\theta,k}(x))}, $$ where $C_0$ denotes the number of target classes in the baseline classification system. These models are typically trained using cross-entropy loss and deployed as the first stage of automated candidate filtering. Despite their effectiveness, resume classifiers may encode implicit bias through correlations between predictions and non-job-related or proxy textual features. To study this effect, we analyze feature influence using Integrated Gradients, which assign an attribution score to each input feature: $$ IG_i(x) = (x_i - x_i') \int_0^1 \frac{\partial f_\theta(x' + \alpha(x - x'))}{\partial x_i} d\alpha, $$ where $x_i$ denotes the $i$-th input feature, $x'$ is a baseline representation, and $f_\theta$ is a scalar model output such as a class logit or score. This analysis reveals systematic dependencies on features that should be irrelevant to candidate evaluation. Building on these observations, we evaluate multiple debiasing techniques and propose an interpretability-guided approach to bias mitigation. The model is trained by minimizing a composite objective $$ L = L_{cls} + \lambda R_{attr}, $$ where $L_{cls}$ denotes classification loss and $R_{attr}$ penalizes model reliance on proxy features identified through attribution analysis. This formulation allows explainable analysis to guide the development of fairer resume screening models.
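The pipeline described in the abstract can be sketched end to end: softmax over logits, a midpoint-rule approximation of the Integrated Gradients integral, and an attribution penalty $R_{attr}$ over flagged proxy features. The following is a minimal illustration, not the paper's implementation; the linear scorer, the feature vector, the choice of $R_{attr}$ as an L1 mass over proxy indices, and the names (`grad_f`, `proxy_idx`, `lam`) are all hypothetical. A linear model is used because its gradient is constant, so the IG approximation is exact and easy to check against the completeness axiom.

```python
import numpy as np

def softmax(z):
    """Convert class logits z into probabilities (numerically stable)."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def integrated_gradients(grad_f, x, baseline, steps=64):
    """Midpoint-rule approximation of
    IG_i(x) = (x_i - x'_i) * \int_0^1 df/dx_i (x' + a(x - x')) da."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

# Toy linear scorer f(x) = w . x; its gradient is the constant w,
# so IG reduces exactly to (x - x') * w here.
w = np.array([0.8, -0.3, 1.5, 2.0])
f = lambda x: float(w @ x)
grad_f = lambda x: w

x = np.array([1.0, 0.5, 2.0, 1.0])   # hypothetical resume feature vector
baseline = np.zeros_like(x)          # zero baseline x'
attr = integrated_gradients(grad_f, x, baseline)

# Hypothetical attribution penalty: L1 mass of IG on flagged proxy features.
proxy_idx = [3]                      # index of a feature flagged as a proxy
R_attr = np.abs(attr[proxy_idx]).sum()
lam = 0.1                            # the lambda weight from L = L_cls + lambda * R_attr
```

The completeness axiom of Integrated Gradients, $\sum_i IG_i(x) = f(x) - f(x')$, provides a built-in sanity check: with the linear scorer above, `attr.sum()` equals `f(x) - f(baseline)` up to floating-point error.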
Submission Number: 143