- Abstract: There are many real-world applications for informal language understanding tasks. However, because informal language understanding tasks suffer more from data noise than formal ones, there is a large performance gap between formal and informal language understanding tasks. Recent pre-trained models that improved performance on formal language understanding tasks have not brought comparable gains on informal language. Although formal and informal tasks are similar in purpose, their language models differ significantly from each other. We propose a data annealing transfer learning procedure to bridge the performance gap on informal natural language understanding tasks. In the data annealing procedure, the training set initially contains mainly formal text data; we then gradually increase the proportion of informal text data during training. We validate the data annealing procedure on three natural language understanding tasks (named entity recognition (NER), part-of-speech (POS) tagging, and chunking) with two popular neural network models, LSTM and BERT. When BERT is fine-tuned with our learning procedure, it outperforms all state-of-the-art models on the three informal tasks.
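The abstract describes the mixing idea but not the exact schedule, so the following is only a minimal Python sketch under stated assumptions. The informal share of each epoch's sample starts at a hypothetical `start` value, and the formal share shrinks geometrically by a hypothetical `decay` factor per epoch. The names `informal_ratio` and `annealed_epoch`, the geometric schedule, and sampling with replacement are illustrative choices, not the procedure's exact definition.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def informal_ratio(epoch: int, start: float = 0.1, decay: float = 0.5) -> float:
    """Hypothetical schedule: fraction of an epoch's sample drawn from informal data.

    The formal share starts at (1 - start) and is multiplied by `decay` each
    epoch, so the informal share rises from `start` toward 1.0.
    """
    formal_share = (1.0 - start) * (decay ** epoch)
    return 1.0 - formal_share


def annealed_epoch(
    formal: Sequence[T],
    informal: Sequence[T],
    epoch: int,
    size: int,
    rng: random.Random,
) -> List[T]:
    """Build one epoch's training sample mixing formal and informal examples.

    Assumption: examples are drawn with replacement from each pool, and the
    informal count is the schedule's ratio times `size`, rounded.
    """
    n_informal = round(size * informal_ratio(epoch))
    n_formal = size - n_informal
    sample = rng.choices(informal, k=n_informal) + rng.choices(formal, k=n_formal)
    rng.shuffle(sample)  # interleave the two sources within the epoch
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    formal = [f"formal-{i}" for i in range(1000)]
    informal = [f"informal-{i}" for i in range(200)]
    for epoch in range(4):
        batch = annealed_epoch(formal, informal, epoch, size=100, rng=rng)
        n_inf = sum(s.startswith("informal") for s in batch)
        print(f"epoch {epoch}: {n_inf} informal / {len(batch)} total")
```

In this sketch, early epochs are dominated by the larger, cleaner formal pool, while later epochs are almost entirely informal. A real implementation would tune the schedule and feed each sample to the LSTM or BERT trainer.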