The Limitations of Data, Machine Learning and Us

Ricardo Baeza-Yates

Published: 2024, Last Modified: 06 Feb 2025, SIGMOD Conference Companion 2024, CC BY-SA 4.0

Abstract: Machine learning (ML), particularly deep learning, is being used everywhere. However, it is not always applied well, and it can raise ethical and/or scientific issues. In this keynote we first take a deep dive into the limitations of supervised ML and of data, its key input. We cover small data, datification, bias, and evaluating success instead of harm, among other limitations. The second part is about how we ourselves use ML, including different types of social limitations and human incompetence, such as cognitive biases, pseudoscience, and unethical applications. These limitations have harmful consequences such as discrimination, misinformation, and mental health issues, to mention just a few. In the final part we discuss regulation of the use of AI and responsible principles that can mitigate the problems outlined above.