TL;DR: This position paper argues that AI research must move beyond rigid racial taxonomies, which reinforce essentialist views and overlook the complexity of racial discrimination.
Abstract: This position paper critiques the reliance on rigid racial taxonomies in machine learning, exposing their U.S.-centric nature and lack of global applicability—particularly in Europe, where race categories are not commonly used. These classifications oversimplify racial identity, erasing the experiences of mixed-race individuals and reinforcing outdated essentialist views that contradict the social construction of race. We suggest research agendas in machine learning that move beyond categorical variables to better address discrimination and social inequality.
Lay Summary: Many machine learning tools and bias-detection methods still rely on fixed race groups—White, Black, Asian—borrowed from U.S. census labels. This position paper argues that such broad categories flatten complex identities, erase mixed-race experiences, and treat race as if it were a biological fact, which risks embedding stereotypes into everything from hiring software to image processing.
Instead of using those rigid labels, the paper recommends dropping categorical race variables and focusing on the real traits that drive discrimination in each context—skin tone, facial features, spoken language, nationality, and other locally relevant characteristics. Because the attributes that matter vary by setting, it calls for a participatory process: working directly with affected communities and domain experts to choose the right mix of traits for each application.
Shifting away from simplistic race labels toward flexible, multi-dimensional assessments allows discrimination to be detected and mitigated more accurately. This move promises models that are both more equitable and more attuned to the rich diversity of human identities.
Verify Author Names: My co-authors have confirmed that their names are spelled correctly both on OpenReview and in the camera-ready PDF. (If needed, please update ‘Preferred Name’ in OpenReview to match the PDF.)
No Additional Revisions: I understand that after the May 29 deadline, the camera-ready submission cannot be revised before the conference. I have verified with all authors that they approve of this version.
Pdf Appendices: My camera-ready PDF file contains both the main text (not exceeding the page limits) and all appendices that I wish to include. I understand that any other supplementary material (e.g., separate files previously uploaded to OpenReview) will not be visible in the PMLR proceedings.
Latest Style File: I have compiled the camera ready paper with the latest ICML2025 style files <https://media.icml.cc/Conferences/ICML2025/Styles/icml2025.zip> and the compiled PDF includes an unnumbered Impact Statement section.
Paper Verification Code: Y2ViN
Permissions Form: pdf
Primary Area: Social, Ethical, and Environmental Impacts
Keywords: Race categories, Algorithmic Fairness, Face Analysis, Datasets, Interdisciplinary Research
Flagged For Ethics Review: true
Submission Number: 169