Machine Learning for Ionization Potentials and Photoionization Cross Sections of Volatile Organic Compounds
Abstract: Molecular ionization potentials (IP) and photoionization
cross sections (σ) can affect the sensitivity of photoionization detectors
(PIDs) and other sensors for gaseous species. This study employs several
methods of machine learning (ML) to predict IP and σ values at 10.6 eV
(117 nm) for a dataset of 1251 gaseous organic species. The methods treat
the electronic structure of the species with progressively increasing
explicitness. The study compares the ML predictions of the IP and σ
values to those obtained by quantum chemical calculations. The ML
predictions are comparable in performance to those of the quantum
calculations when evaluated against measurements. Pretraining further
reduces the mean absolute errors (ε) compared to the measurements. The
graph-based attentive fingerprint model was the most accurate, with ε_IP =
0.23 ± 0.01 eV relative to measurements and ε_σ = 2.8 ± 0.2 Mb relative to
computed cross sections. The ML predictions for IP correlate well with both the measured IPs (R² = 0.88) and the IPs
computed at the M06-2X/aug-cc-pVTZ level (R² = 0.82). The ML predictions for σ correlate reasonably well with computed
cross sections (R² = 0.66). The developed ML methods for IP and σ values, representing the properties of a generalizable set of
volatile organic compounds (VOCs) relevant to industrial applications and atmospheric chemistry, can be used to quantitatively
describe the species-dependent sensitivity of chemical sensors that use ionizing radiation as part of the sensing mechanism, such as
photoionization detectors.
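
The two figures of merit quoted above, the mean absolute error (ε) and R², can be illustrated with a minimal sketch. It assumes that R² denotes the squared Pearson correlation coefficient between predicted and reference values, which the abstract does not state explicitly. The function names and the sample IP values are hypothetical and are not taken from the study's dataset.

```python
def mean_absolute_error(predicted, reference):
    """Mean absolute error between paired predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def r_squared(predicted, reference):
    """Squared Pearson correlation coefficient (assumed meaning of R2 here)."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("inputs must have equal length and at least two points")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov ** 2 / (var_p * var_r)


# Hypothetical ionization potentials in eV, for illustration only.
ip_predicted = [9.10, 10.05, 8.70, 9.95]
ip_reference = [9.24, 10.00, 8.50, 10.10]

print("MAE (eV):", mean_absolute_error(ip_predicted, ip_reference))
print("R2:", r_squared(ip_predicted, ip_reference))
```

If R² were instead defined as the coefficient of determination (one minus the ratio of residual to total sum of squares), the two quantities would differ whenever the predictions carry a systematic bias.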