Data Distribution Insights into Trustworthy Machine Learning

Published: 14 May 2025 · Last Modified: 22 May 2025 · PKU B.S. Thesis · CC BY 4.0
Abstract: In recent years, machine learning (ML) has achieved milestone advances across a wide variety of applications. However, it also faces significant trustworthiness issues that raise concerns about its deployment in real-world scenarios. These issues span transparency, robustness, and safety, and can lead to unreliable, misleading, or harmful outcomes in practice. For example, the complex nature of deep neural networks (DNNs) makes their decisions difficult to interpret, and adversarial examples or jailbreaking attacks can cause ML models to produce inaccurate or toxic outputs. This thesis provides new insights into these research problems from the data distribution perspective. Complementary to existing trustworthy ML research that emphasizes optimization or model design, it explores how data distributions can be used to understand and defend ML models, an approach whose value grows as model sizes increase and traditional methods encounter scalability bottlenecks. By analyzing and exploiting the characteristics of specific data distributions, this thesis proposes distribution-motivated theories and algorithms for mechanism interpretation, robust generalization, and alignment inspection across different types of ML models.