Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
Abstract: Physical measurements constitute a large portion of numbers in academic papers, engineering reports, and web tables. Current benchmarks fall short of properly evaluating the numeracy of pretrained language models on measurements, hindering research on developing new methods and applying them to numerical tasks. To that end, we introduce a novel task, Masked Measurement Prediction (MMP), where a model learns to reconstruct a number together with its associated unit given masked text. MMP is useful both for training new numerically informed models and for evaluating the numeracy of existing systems. To address this task, we introduce a new Generative Masked Measurement (GeMM) model that jointly learns to predict numbers along with their units. We perform fine-grained analyses comparing our model with various ablations and baselines. Using linear probing of traditional pretrained transformer models (RoBERTa), we show that they significantly underperform jointly trained number-unit models, highlighting the difficulty of this new task and the benefits of our proposed pretraining approach. We hope this framework accelerates progress towards building more robust numerical reasoning systems.
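To make the task setup concrete, the sketch below illustrates how a masked-measurement instance and a joint number-unit objective might look. It rests on assumptions of our own, not details from the paper: a toy regex measurement extractor, an illustrative unit inventory, log-scaling of the number target, and a simplified discriminative head (the paper's GeMM model is generative, and its architecture is not specified here). All names are hypothetical.

```python
import re
import torch
import torch.nn as nn

MASK = "[MASK]"  # placeholder; the real mask token depends on the tokenizer
# Illustrative unit inventory; the paper's actual unit vocabulary is not shown here.
UNITS = ["meter", "kilogram", "second", "volt"]
# Toy extractor: a number followed by one of the units above. Real MMP data
# would need a far more robust measurement detector.
UNIT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(meters?|kilograms?|seconds?|volts?)")

def make_mmp_example(text: str):
    """Mask one measurement span; the target is its (number, unit) pair."""
    m = UNIT_RE.search(text)
    if m is None:
        return None
    number = float(m.group(1))
    unit = m.group(2).rstrip("s")  # normalize plurals, e.g. "meters" -> "meter"
    masked = text[: m.start()] + MASK + text[m.end():]
    return masked, number, unit

class JointNumberUnitHead(nn.Module):
    """Simplified joint head: from the masked position's encoding, classify
    the unit and regress the log-scaled number. This is a discriminative
    stand-in for joint number-unit prediction, not GeMM itself."""
    def __init__(self, hidden: int, n_units: int):
        super().__init__()
        self.unit_clf = nn.Linear(hidden, n_units)
        self.num_reg = nn.Linear(hidden, 1)

    def forward(self, h_mask: torch.Tensor):
        return self.unit_clf(h_mask), self.num_reg(h_mask).squeeze(-1)

# Usage with a random stand-in for an encoder's [MASK] representation.
masked, number, unit = make_mmp_example("The bridge spans 120 meters across the river.")
head = JointNumberUnitHead(hidden=768, n_units=len(UNITS))
unit_logits, log_num_pred = head(torch.randn(1, 768))
loss = (nn.functional.cross_entropy(unit_logits, torch.tensor([UNITS.index(unit)]))
        + nn.functional.mse_loss(log_num_pred, torch.log(torch.tensor([number]))))
```

Regressing in log space is one common way to handle the heavy-tailed scale of real-world measurements; whether the paper adopts this choice is an assumption here.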