Feature Noising for Log-Linear Structured Prediction

Sida I. Wang, Mengqiu Wang, Stefan Wager, Percy Liang, Christopher D. Manning

2013 (modified: 16 Jul 2019) | EMNLP 2013 | Readers: Everyone

Abstract: NLP models have many sparse features, and regularization is key to balancing overfitting against underfitting. A recently repopularized form of regularization is to generate fake training data by repeatedly adding noise to real data. We reinterpret this noising as an explicit regularizer, and approximate it with a second-order formula that can be used during training without actually generating fake data. We show how to apply this method to structured prediction using multinomial logistic regression and linear-chain CRFs. We tackle the key challenge of developing a dynamic program to compute the gradient of the regularizer efficiently. The regularizer is a sum over inputs, so we can estimate it more accurately via a semi-supervised or transductive extension. Applied to text classification and NER, our method provides a >1% absolute performance gain over the use of standard L2 regularization.
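
To make the idea of replacing sampled noise with a second-order formula concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the simplest special case, binary logistic regression with dropout noise, where each feature is zeroed with probability `delta` and the survivors are rescaled by `1 / (1 - delta)` so that the noised input is unbiased. Under that assumption the noising penalty for one example is E[A(θ·x̃)] − A(θ·x) with A(s) = log(1 + e^s), and its second-order approximation is ½ · p(1 − p) · Var(θ·x̃). The function names and the toy input are hypothetical, chosen only for illustration. The multinomial and linear-chain CRF cases discussed in the abstract are not covered here.

```python
import math
import random


def log_partition(s):
    """A(s) = log(1 + exp(s)), computed in a numerically stable way."""
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))


def quadratic_noising_penalty(theta, X, delta):
    """Second-order approximation of the dropout-noising penalty.

    Assumption: binary logistic regression, dropout rate `delta`,
    surviving features rescaled by 1 / (1 - delta).
    """
    scale = delta / (1.0 - delta)  # variance factor of rescaled dropout
    total = 0.0
    for x in X:
        s = sum(t * v for t, v in zip(theta, x))
        p = math.exp(s - log_partition(s))  # sigmoid(s), stable form
        var_s = scale * sum((t * v) ** 2 for t, v in zip(theta, x))
        total += 0.5 * p * (1.0 - p) * var_s
    return total


def sampled_noising_penalty(theta, X, delta, n_samples, rng):
    """Monte Carlo estimate of the same penalty using explicit fake data."""
    total = 0.0
    for x in X:
        clean = log_partition(sum(t * v for t, v in zip(theta, x)))
        for _ in range(n_samples):
            # Keep each feature with probability 1 - delta, then rescale.
            s_noisy = sum(
                t * v / (1.0 - delta)
                for t, v in zip(theta, x)
                if rng.random() >= delta
            )
            total += log_partition(s_noisy) - clean
    return total / n_samples


if __name__ == "__main__":
    theta = [1.0, -1.0]   # hypothetical weights
    X = [[1.0, 1.0]]      # hypothetical single training input
    print(quadratic_noising_penalty(theta, X, 0.5))
    print(sampled_noising_penalty(theta, X, 0.5, 20000, random.Random(0)))
```

Note that neither function uses labels, which is consistent with the abstract's remark that the regularizer is a sum over inputs and can therefore also be evaluated on unlabeled data. The sampled version needs many noisy copies per example, while the quadratic version needs a single pass.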
