Abstract: Two important recent trends are the proliferation of learning algorithms and the massive increase in data stored on unreliable storage media. These trends affect each other: noisy data can degrade the results produced by learning algorithms. Although traditional tools exist to improve the reliability of data storage devices, they operate at a different abstraction level and therefore ignore the application that consumes the data, leading to an inefficient use of resources. In this paper we propose taking the operation of learning algorithms into account when deciding how best to protect data. Specifically, we examine several learning algorithms that operate on data stored on noisy media and protected by error-correcting codes with a limited redundancy budget; we develop a principled way to allocate resources so that the harm to the output of the learning algorithm is minimized.