A Text Mining Pipeline for Mining the Quantum Cascade Laser Properties

Published: 01 Jan 2023, Last Modified: 25 Jan 2025 | ADBIS (Short Papers) 2023 | CC BY-SA 4.0
Abstract: The development of terahertz laser technology in quantum cascade lasers (QCLs) has opened up great potential for industrial applications. These lasers operate on terahertz electromagnetic waves, in the frequency range from about 100 GHz to 10 THz. To optimize the design process, there is a need to understand the structure of the laser and its influence on performance. One way of collating this information is to build ontologies and knowledge bases that capture the various QCL designs and their performance characteristics. The majority of laser design data is contained in scientific literature. The main drawback of such textual data sources is their unstructured nature. The complexity of the laser design and the varying language styles of authors make this information difficult to retrieve. As a result, existing methods need improvement in order to retrieve laser information with high precision (a minimal number of incorrect records extracted) and high recall (a minimal number of correct records left unextracted). In this paper, we tackle this initial challenge by proposing a text mining pipeline for mining QCL properties, which extends the grammar rules of a conditional random field (CRF) based model using a rule-based approach. The properties of interest are: hetero-structure (laser stacking properties), working temperature, lasing frequency, laser thickness, and optical power. We evaluate the pipeline on a sample of open-access journal papers from AIP, OPTICA, and IOP Publishers.
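As a rough illustration of the rule-based side of such a pipeline, the following minimal Python sketch extracts value-unit pairs for some of the properties listed in the abstract and scores them against a hand-labelled set. It is not the authors' method: it uses plain regular expressions rather than extended CRF grammar rules, and the names `RULES`, `extract_properties`, and `precision_recall`, as well as the unit patterns and the example sentence, are assumptions made only for this example.

```python
import re

# Hypothetical unit rules, one per property. These patterns are illustrative
# assumptions and are not the grammar rules used in the paper.
RULES = {
    "working_temperature": re.compile(r"(\d+(?:\.\d+)?)\s*(K)\b"),
    "lasing_frequency": re.compile(r"(\d+(?:\.\d+)?)\s*(THz|GHz)\b"),
    "optical_power": re.compile(r"(\d+(?:\.\d+)?)\s*(mW|W)\b"),
    "laser_thickness": re.compile(r"(\d+(?:\.\d+)?)\s*(nm|µm|um)\b"),
}


def extract_properties(sentence):
    """Return (property, value, unit) records matched by the unit rules."""
    records = []
    for prop, pattern in RULES.items():
        for match in pattern.finditer(sentence):
            records.append((prop, float(match.group(1)), match.group(2)))
    return records


def precision_recall(extracted, gold):
    """Precision: share of extracted records that are correct.
    Recall: share of correct (gold) records that were extracted."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example sentence, not taken from any of the evaluated papers.
sentence = "The device lases at 3.2 THz up to 120 K with a peak power of 5 mW."
records = extract_properties(sentence)

# Hypothetical gold annotations; the thickness record is deliberately one
# that the sentence-level rules above cannot find.
gold = records + [("laser_thickness", 10.0, "µm")]
precision, recall = precision_recall(records, gold)
```

In this toy setting every extracted record is correct, so precision is perfect, while the one gold record the rules miss lowers recall. That is the trade-off the abstract describes between incorrect records extracted and correct records not extracted.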