Fundamental Limits of Data Utility: A Case Study for Data-Driven Identity Authentication

Published: 01 Jan 2021, Last Modified: 17 Apr 2025, IEEE Trans. Comput. Soc. Syst. 2021, CC BY-SA 4.0
Abstract: Big data can provide valuable insights into business activities and reveal potential benefits. Advances in machine learning and deep learning make it easier to achieve strong performance in a wide range of domains, from city planning and marketing analysis to credit evaluation and identity theft detection. However, great effort is still required to select efficient learning algorithms and precise model parameters, which are deemed confidential in the light of experience. There is also a fundamental gap between impracticable business requirements and the value that the available data can actually deliver. The data holder or data service provider may not have a clear understanding of data interference. Solving these two problems depends on the ability to predict data utility in advance, which raises a fundamental question: to what degree is data utility predictable? In this work, we present a preliminary analytical framework for information-theoretic bounds on data utility and use current state-of-the-art and representative algorithms to obtain achievable lower bounds on a real-world data set. The gap between the theoretical upper bounds and the achievable lower bounds indicates that the performance of these algorithms can still be improved.