Language Model for Earth Science: Exploring Potential Downstream Applications as well as Current Challenges
Abstract: The use of deep learning techniques to build transformer language models such as SciBERT and GPT-3 has transformed the natural language technology (NLT) landscape. These new NLTs are being used in speech-to-text conversion (and vice versa), automated text classification, sentiment analysis, topic modeling, text summarization, and cognitive assistants. While Earth science has no shortage of unstructured data, such as journal and conference papers, few efforts have focused on harnessing NLTs for knowledge extraction and for supporting the scientific process. This paper surveys the use of language models across different sciences. BERT-E, a new Earth science-specific language model, is presented. BERT-E is generated using transfer learning: a language model already trained on general science text (SciBERT) is fine-tuned using abstracts and full text extracted from various Earth science-related articles. A downstream keyword classification application is used for evaluation, and BERT-E shows improved performance. The need to develop a robust set of benchmarks for evaluating language models such as BERT-E is discussed. Finally, example applications are presented to inspire additional ideas for applications using domain-specific language models.
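The paper's actual training setup is not reproduced here. As a minimal sketch of the transfer-learning step the abstract describes, assuming the Hugging Face transformers and datasets libraries, the public SciBERT checkpoint, and a hypothetical local corpus file earth_science_corpus.txt, domain-adaptive pretraining might look like the following:

```python
# Illustrative sketch (not the authors' code): continued masked-language-model
# pretraining of SciBERT on Earth science text, the general approach behind a
# domain-adapted model like BERT-E. The corpus path is hypothetical.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the general-science SciBERT checkpoint.
checkpoint = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical corpus of Earth science abstracts/full text, one document per line.
dataset = load_dataset("text", data_files={"train": "earth_science_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens: the standard BERT MLM pretraining objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-e-sketch",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # yields a domain-adapted checkpoint in the spirit of BERT-E
```

For the downstream evaluation the abstract mentions, the resulting checkpoint could then be loaded with AutoModelForSequenceClassification and fine-tuned on labeled Earth science keyword data.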