Abstract: While NLP models significantly impact our lives, there are rising concerns about privacy invasion. Although federated learning enhances privacy, attackers may recover private training data by exploiting model parameters and gradients. Protecting against such embedding-based attacks therefore remains an open challenge. We propose Subword Embedding from Bytes (SEB), which encodes subwords into byte sequences with neural networks, making the underlying text harder to retrieve in attacks. Importantly, our method requires less memory, with a vocabulary of only $256$ bytes, while remaining efficient by keeping the input length unchanged. Thus, our solution outperforms conventional approaches by preserving privacy without sacrificing efficiency or accuracy. Our experiments show that SEB effectively prevents embedding-based attacks from recovering original sentences in federated learning. Meanwhile, we verify that SEB achieves comparable, and sometimes better, accuracy than standard subword embedding methods on machine translation, sentiment analysis, and language modeling, with lower time and space complexity.
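To make the core idea concrete, below is a minimal, hypothetical sketch of embedding a subword from its raw bytes: each subword is mapped to its UTF-8 byte sequence, each byte is looked up in a 256-entry embedding table, and a small network aggregates the byte vectors into one subword embedding. The aggregation network (a GRU here) and all names are illustrative assumptions; the paper's actual SEB architecture may differ.

```python
import torch
import torch.nn as nn

class ByteSubwordEmbedding(nn.Module):
    """Illustrative sketch: embed a subword from its raw UTF-8 bytes,
    using a 256-entry byte vocabulary instead of a large subword table.
    The GRU aggregator is an assumption, not the paper's exact design."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.byte_embed = nn.Embedding(256, embed_dim)   # vocabulary of only 256 bytes
        self.aggregator = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, subword: str) -> torch.Tensor:
        # Convert the subword into its byte sequence (values 0-255).
        byte_ids = torch.tensor(list(subword.encode("utf-8"))).unsqueeze(0)
        byte_vecs = self.byte_embed(byte_ids)             # (1, n_bytes, embed_dim)
        _, last_hidden = self.aggregator(byte_vecs)       # summarize bytes into one vector
        return last_hidden.squeeze(0).squeeze(0)          # (embed_dim,) subword embedding

if __name__ == "__main__":
    model = ByteSubwordEmbedding(embed_dim=64)
    vec = model("##ing")   # hypothetical subword token
    print(vec.shape)       # torch.Size([64])
```

Because only the 256-entry byte table and the aggregator weights are stored, the embedding memory no longer grows with the subword vocabulary, which is consistent with the memory savings claimed in the abstract.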
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English, German