Abstract: This paper is concerned with coded machine learning: protecting machine learning algorithms from noise in test data through an informed channel coding approach. Unlike in traditional data storage, we do not seek to ensure that all test data is correctly read from storage and used as a noiseless input to the algorithm. Rather, we seek to protect the data in a way that minimizes the effect of noise on the algorithm's output (i.e., minimizes a loss relative to the hypothetical noiseless output). We focus on the case where the collected test data, derived from low-power sensors and devices, is inherently noisy. We show that a smart replication strategy effectively reduces the impact of noise on the output of linear regression algorithms. We consider two scenarios. In the first, the regression model is fixed, and we must allocate a fixed budget of redundancy to our replication scheme so as to minimize the loss on the output due to noisy test data. Analyzing this case builds the understanding needed for the second, more novel case, in which we jointly learn an optimized model and protect it. We illustrate the advantages of our approach with practical experiments.
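To make the fixed-model scenario concrete, here is a minimal sketch (an illustration under simplifying assumptions, not the paper's exact scheme): a linear model y = w·x reads each test feature through an additive-noise channel with variance σ². Replicating feature i a total of r_i times and averaging the copies scales its noise variance by 1/r_i, so the expected squared error on the output is Σᵢ wᵢ² σ² / rᵢ. Under a total replication budget R = Σᵢ rᵢ, the Lagrange-multiplier solution allocates rᵢ proportional to |wᵢ|, which by Cauchy–Schwarz never does worse than uniform replication.

```python
import numpy as np

def expected_loss(w, r, sigma2=1.0):
    """Expected squared output error when feature i is replicated r[i] times
    and its noisy copies are averaged: sigma^2 * sum_i w_i^2 / r_i."""
    return sigma2 * np.sum(w**2 / r)

rng = np.random.default_rng(0)
w = rng.normal(size=5)   # an illustrative fixed regression model
R = 50.0                 # total replication budget (treated as divisible)

# Naive allocation: replicate every feature equally.
uniform = np.full_like(w, R / w.size)
# "Smart" allocation: replicate in proportion to the weight magnitude.
weighted = R * np.abs(w) / np.abs(w).sum()

print(expected_loss(w, uniform), expected_loss(w, weighted))
```

The gap between the two losses grows with the spread of the weight magnitudes: when a few features dominate the model, protecting them heavily matters most. In practice rᵢ must be an integer, so a rounding step would follow this continuous relaxation.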