Automatic Summarization of Technical Documents in the Oil and Gas Industry

Published: 01 Jan 2019, Last Modified: 07 Nov 2024, BRACIS 2019, CC BY-SA 4.0
Abstract: We address extractive summarization of technical documents in the oil and gas industry, a major and urgent task given the large volume of critical reports in that industry. We examine five distinct state-of-the-art extractive algorithms. To assess their performance, we created a new open dataset from the open-access Journal of Petroleum Exploration and Production Technology (JPEPT), using the abstract of each paper as the ground-truth summary. The algorithms were refined to work as well as possible on these documents. Our most effective algorithm achieved a state-of-the-art ROUGE-2 score of 0.123 and took 83 minutes to summarize the entire JPEPT dataset.
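To make the evaluation metric concrete, the following is a minimal sketch of how a ROUGE-2 score can be computed between a generated summary and a reference abstract, as clipped bigram overlap. It is an illustration only, not the evaluation code used in the paper: the function names `bigrams` and `rouge_2`, the simple lowercase alphanumeric tokenization, and the example sentences are all assumptions, and the abstract does not say whether the reported 0.123 is a recall, precision, or F1 value, so the sketch returns all three.

```python
import re
from collections import Counter


def bigrams(text):
    """Count word bigrams after a simple lowercase tokenization (an assumption)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """Return ROUGE-2 precision, recall, and F1 from clipped bigram overlap."""
    cand = bigrams(candidate)
    ref = bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: the reference stands in for a paper's abstract,
# the candidate for an extractive summary produced by one of the algorithms.
reference = "the well was drilled to total depth"
candidate = "the well was drilled quickly"
scores = rouge_2(candidate, reference)
```

In this toy example three of the candidate's four bigrams appear among the reference's six, so precision is 0.75 and recall is 0.5. Scores on real technical papers are typically far lower, since an extractive summary can only reuse sentences from the body, whereas the abstract is written separately.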