The Impact of an Adversary in a Language Model

Published: 01 Jan 2018 (SSCI 2018). Last Modified: 22 Jun 2023.
Abstract: Neural networks have been quite successful at complex classification tasks, and they are able to learn from large volumes of data. Unfortunately, not all available data sources are secure, and an adversary in the environment may maliciously poison a training dataset so that the neural network generalizes poorly. It is therefore important to examine how a neural network's susceptibility to such attacks depends on its free parameters (e.g., gradient thresholds, hidden layer size) and on the amount of adversarial data available. In this work, we study the impact of an adversary on Long Short-Term Memory (LSTM) language models and their configurations. We experimented with the Penn Tree Bank (PTB) dataset and adversarial text sampled from works of a different era. Our results show that there are several effective ways to poison such an LSTM language model. Furthermore, based on our experiments, we suggest steps that can be taken to reduce the impact of such attacks.
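
A minimal sketch of the kind of poisoning setup the abstract describes: an LSTM language model trained on a token stream into which an adversary has injected a fraction of out-of-distribution text, with clean validation perplexity used to measure the damage. This is not the authors' code; the hyperparameters (hidden size, poison fraction, clipping threshold) and the toy random data standing in for PTB and the adversarial corpus are illustrative assumptions.

```python
# Hypothetical sketch of data poisoning for an LSTM language model.
# Toy random streams stand in for the tokenized PTB corpus and the
# adversarial text; all hyperparameters below are assumed, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, SEQ_LEN, POISON_FRACTION = 1000, 64, 128, 35, 0.1

def make_stream(n_tokens, generator):
    """Stand-in for a tokenized corpus (PTB or adversarial text would be loaded here)."""
    return torch.randint(0, VOCAB_SIZE, (n_tokens,), generator=generator)

g = torch.Generator().manual_seed(1)
clean = make_stream(20_000, g)                               # clean training tokens
poison = make_stream(int(len(clean) * POISON_FRACTION), g)   # adversary-controlled tokens
train_stream = torch.cat([clean, poison])                    # poisoned training corpus
valid_stream = make_stream(4_000, g)                         # held-out clean data

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)

def batches(stream, seq_len=SEQ_LEN):
    """Yield (input, target) next-token pairs of shape (1, seq_len) from a token stream."""
    for i in range(0, len(stream) - seq_len - 1, seq_len):
        yield stream[i:i + seq_len].unsqueeze(0), stream[i + 1:i + seq_len + 1].unsqueeze(0)

model = LSTMLanguageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):
    model.train()
    for x, y in batches(train_stream):
        optimizer.zero_grad()
        loss = loss_fn(model(x).reshape(-1, VOCAB_SIZE), y.reshape(-1))
        loss.backward()
        # The gradient clipping threshold is one of the "free parameters" that
        # influences how strongly poisoned batches perturb the weights.
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.25)
        optimizer.step()

    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(x).reshape(-1, VOCAB_SIZE), y.reshape(-1))
                  for x, y in batches(valid_stream)]
    ppl = torch.exp(torch.stack(losses).mean())
    print(f"epoch {epoch}: clean validation perplexity = {ppl:.1f}")
```

Comparing the clean-validation perplexity of this run against an identical run with `POISON_FRACTION = 0` gives one simple way to quantify the poisoning impact under a given configuration.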