Multi-Stage Pre-Training for Math-Understanding: $\mu^2$(AL)BERT

Anonymous

16 Jan 2022 (modified: 05 May 2023), ACL ARR 2022 January Blind Submission
Abstract: Understanding mathematics requires comprehending not only natural language but also mathematical notation. For mathematical language modeling, current pre-training methods for transformer-based language models, originally developed for natural language, need to be adapted. In this work, we propose a multi-stage pre-training scheme covering both natural language and mathematical notation, applied to ALBERT and BERT, resulting in two models that can be fine-tuned for downstream tasks: $\mu^2$ALBERT and $\mu^2$BERT. We show that both models outperform the current state-of-the-art model on Answer Ranking. Furthermore, a structural probing classifier is applied to test whether operator trees can be reconstructed from the models' contextualized embeddings.
Paper Type: long
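
The abstract's structural probing experiment can be pictured with a distance-based probe in the spirit of Hewitt & Manning (2019), adapted to operator-tree distances between formula tokens. The sketch below is illustrative only: the class name `OperatorTreeProbe`, the probe rank, and the training details are assumptions, not the authors' implementation.

```python
# Minimal sketch of a distance-based structural probe over contextualized
# embeddings, here aimed at operator-tree distances (illustrative assumptions,
# not the paper's actual code).
import torch
import torch.nn as nn


class OperatorTreeProbe(nn.Module):
    """Learns a linear map B so that ||B(h_i - h_j)||^2 approximates the
    distance between tokens i and j in the formula's operator tree."""

    def __init__(self, hidden_dim: int, probe_rank: int = 64):
        super().__init__()
        self.proj = nn.Parameter(torch.randn(hidden_dim, probe_rank) * 0.01)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (seq_len, hidden_dim) contextualized vectors from a frozen encoder
        transformed = embeddings @ self.proj                   # (seq_len, rank)
        diffs = transformed.unsqueeze(1) - transformed.unsqueeze(0)
        return (diffs ** 2).sum(-1)                            # (seq_len, seq_len) squared distances


def probe_loss(predicted: torch.Tensor, gold_tree_dist: torch.Tensor) -> torch.Tensor:
    # L1 loss between predicted squared distances and gold operator-tree
    # distances, averaged over all token pairs of the formula.
    return torch.abs(predicted - gold_tree_dist).mean()


if __name__ == "__main__":
    # Usage sketch: embeddings would come from a frozen µ²(AL)BERT-style encoder,
    # gold_tree_dist from the formula's parsed operator tree; both are placeholders here.
    probe = OperatorTreeProbe(hidden_dim=768)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

    embeddings = torch.randn(12, 768)
    gold_tree_dist = torch.randint(0, 6, (12, 12)).float()

    loss = probe_loss(probe(embeddings), gold_tree_dist)
    loss.backward()
    optimizer.step()
```

If the learned squared distances recover the gold tree distances well, the operator-tree structure is, under this reading, linearly decodable from the contextualized embeddings.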