Scientific Olfactory Information Extraction: Toward a Unified NLP Framework for Chemosensory Knowledge Discovery

Published: 28 Apr 2026, Last Modified: 28 Apr 2026 | MSLD 2026 Poster | Everyone | CC BY 4.0
Keywords: Information Extraction, Named Entity Recognition, Relation Extraction, Olfaction, Biomedical NLP, Scientific Literature, Annotation Scheme
Abstract: Olfaction plays a key role in fields such as chemistry, neuroscience, and food science, yet the scientific literature on smell remains largely unstructured. Chemists report odor descriptors, thresholds, and intensity scales; neuroscientists describe perceptual judgments and receptor activity; and environmental studies measure odor pollution and volatile emissions (Poivet et al., 2018; Breer, 2003; Nie et al., 2020). Across these domains, the same perceptual quality of the chemical amyl mercaptan might be called “putrid” in a sensory study, “thiol-like” in chemistry, and “sulfurous” in neuroscience, making knowledge extraction difficult. Existing work provides foundations for sensory semantics (Hörberg et al., 2022) and smell extraction from historic text (Menini et al., 2023), but neither addresses the extraction of olfactory knowledge from scientific literature. In particular, chemical terminology, biomedical terminology, and sensory measurements from unstructured scientific texts have not been integrated into a unifying framework. This gap limits our ability to mine scientific literature for applications such as constructing chemosensory knowledge graphs and linking molecular structure to odor perceptual quality, a key task in the computational neuroscience of olfaction. We propose Scientific Olfactory Information Extraction (Sci-OIE), an NLP task spanning chemistry, neuroscience, and food science that aims to extract structured knowledge about olfaction from scientific literature. We developed a preliminary annotation scheme comprising 9 entity types (Odor Quality, Odor Source, Measurement, Instrument, Experiencer, Intensity, Smell Test, Olfactory Dysfunction, Disease) and 5 relation types (DETECTED BY, ASSOCIATED WITH, CAUSES, DIAGNOSES, HAS COMPONENT), and conducted a pilot annotation of 39 titles and abstracts.
Our preliminary results show that 97 of 162 individual Odor Source entities are multi-word, ranging from chemical names such as ‘phenyl ethanol’ to more complicated spans such as ‘fear and anxiety body odors’. Additionally, across these 39 abstracts, 25 Odor Qualities (descriptions) are present (average of ~2 per abstract), whereas full-text annotation of 5 randomly selected articles found an average of ~13 Odor Quality entities per article, reflecting the greater length of full texts. In 2 of these 5 articles, the abstract mentioned human participants describing odor sources, but the Odor Quality entities themselves appeared only in the full text, motivating future exploration of full-text annotation. We will present our ongoing work on our annotation scheme, benchmark dataset, and baseline models, building towards olfactory knowledge discovery from scientific literature.

Heinz Breer. 2003. Olfactory receptors: molecular basis for recognition and discrimination of odors. Analytical and Bioanalytical Chemistry, 377(3):427–433.
Thomas Hörberg, Maria Larsson, and Jonas K. Olofsson. 2022. The Semantic Organization of the English Odor Vocabulary. Cognitive Science, 46(11):e13205.
Stefano Menini, Teresa Paccosi, Serra Sinem Tekiroğlu, and Sara Tonelli. 2023. Scent Mining: Extracting Olfactory Events, Smell Sources and Qualities. In Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, and Stan Szpakowicz, editors, Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 135–140, Dubrovnik, Croatia. Association for Computational Linguistics.
Erqi Nie, Guodi Zheng, and Chuang Ma. 2020. Characterization of odorous pollution and health risk assessment of volatile organic compound emissions in swine facilities. Atmospheric Environment, 223:117233.
Erwan Poivet, Narmin Tahirova, Zita Peterlin, Lu Xu, Dong-Jing Zou, Terry Acree, and Stuart Firestein. 2018. Functional odor classification through a medicinal chemistry approach. Science Advances, 4(2):eaao6086.
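
The annotation scheme described in the abstract (9 entity types, 5 relation types) can be illustrated with a minimal Python sketch. Only the label names come from the abstract; the class names, the character-offset representation, the `find_entity` and `multi_word_fraction` helpers, the toy sentence, and the choice of ASSOCIATED WITH to link an Odor Source to an Odor Quality are all assumptions made for illustration, not the authors' actual data format or guidelines.

```python
from dataclasses import dataclass

# Label inventories as listed in the abstract.
ENTITY_TYPES = frozenset({
    "Odor Quality", "Odor Source", "Measurement", "Instrument", "Experiencer",
    "Intensity", "Smell Test", "Olfactory Dysfunction", "Disease",
})
RELATION_TYPES = frozenset({
    "DETECTED BY", "ASSOCIATED WITH", "CAUSES", "DIAGNOSES", "HAS COMPONENT",
})


@dataclass(frozen=True)
class Entity:
    """A labeled text span (hypothetical representation using character offsets)."""
    text: str
    label: str
    start: int
    end: int

    def __post_init__(self):
        if self.label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.label!r}")

    @property
    def is_multi_word(self) -> bool:
        # Whitespace tokenization is an assumption; the abstract does not
        # say how multi-word spans were counted.
        return len(self.text.split()) > 1


@dataclass(frozen=True)
class Relation:
    """A directed, labeled link between two entities (hypothetical)."""
    head: Entity
    label: str
    tail: Entity

    def __post_init__(self):
        if self.label not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.label!r}")


def find_entity(sentence: str, phrase: str, label: str) -> Entity:
    """Build an Entity from the first occurrence of phrase in sentence."""
    start = sentence.index(phrase)
    return Entity(phrase, label, start, start + len(phrase))


def multi_word_fraction(entities, label: str) -> float:
    """Share of entities with the given label that span more than one word."""
    selected = [e for e in entities if e.label == label]
    if not selected:
        return 0.0
    return sum(e.is_multi_word for e in selected) / len(selected)


# Toy sentence invented for illustration; not taken from the pilot data.
sentence = "Phenyl ethanol was described as floral."
source = find_entity(sentence, "Phenyl ethanol", "Odor Source")
quality = find_entity(sentence, "floral", "Odor Quality")
link = Relation(source, "ASSOCIATED WITH", quality)
```

Under these assumptions, `multi_word_fraction([source, quality], "Odor Source")` evaluates to 1.0 for the toy sentence. For comparison, the pilot figure of 97 multi-word spans among 162 Odor Source entities corresponds to a fraction of roughly 0.60.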
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 162