- Keywords: representation learning, word2vec, universal sentence encoder, bert, geological embeddings, analogs
- Abstract: Geology lies at the foundation of the oil and gas industry, and a good understanding of the geology in each newly drilled well can make or break an exploration project with a price tag in the millions of dollars. Over the past decades, each drilled well has been extensively analyzed, with its geology and petrophysical properties interpreted by experts and rigorously documented. Although this creates a valuable source of information for future drilling success, most of it is stored in PDF files in company knowledge silos. Recent advances in cloud technologies and machine learning techniques are enabling an open-source future, in which access to these technical documents provides broad geological knowledge of the world's basins. In this work, we focus on geology reports of wells drilled in the Norwegian Sea, with the goal of learning numerical representations of the geological descriptions in these fields and using those representations to find geological analogs worldwide. Automating analog identification can improve expert interpretation and exploration success, and can save oil and gas companies significant time and effort. We present the numerical encoding approaches we took in pursuit of capturing geological knowledge from these files, the challenges faced during this work, and a road map towards GilBERT (Geologically informed language modeling with BERT) for use in geology-based NLP applications in the oil and gas (O&G) industry.