Abstract: As machine learning algorithms continue to improve, collecting
training data becomes increasingly valuable. At the same time,
increased focus on data collection may introduce compounding privacy concerns. Accessibility projects in particular may put vulnerable populations at risk, as disability status is sensitive, and collecting
data from small populations limits anonymity. To help address privacy concerns while maintaining algorithmic performance on machine learning tasks, we propose privacy-enhancing distortions of
training datasets. We explore this idea through the lens of sign language video collection, which is crucial for advancing sign language
recognition and translation. We present a web study exploring signers’ concerns in contributing to video corpora and their attitudes
about using filters, and a computer vision experiment exploring sign
language recognition performance with filtered data. Our results
suggest that privacy concerns may exist in contributing to sign language corpora, that filters (especially expressive avatars and blurred
faces) may impact willingness to participate, and that training on
more filtered data may boost recognition accuracy in some cases.