PlantPhenoLM: Phenotype-Genotype Mapping Inference with Multi-Turn LLM Reasoning and Selective Prediction

Rajashik Datta; Sanjan Baitalik; Amit Kumar Das; Sruti Das Choudhury

Published: 28 Dec 2025, Last Modified: 08 Mar 2026. Venue: AAAI 2026 Bridge LMReasoning. License: CC BY 4.0

Keywords: Phenotype-genotype mapping, Large Language Models, High-throughput phenotyping, Retrieval-augmented generation, Selective prediction, Evidence-grounded reasoning

Abstract: Accurate prediction of plant genotype from high-throughput phenotypic measurements has great potential to accelerate breeding workflows. However, practical deployment requires more than predictions: practitioners need calibrated confidence, evidence-based explanations, and the option to abstain safely when the phenotypic evidence is ambiguous. We introduce PlantPhenoLM, a novel algorithm that wraps a standard phenotype classifier with (i) retrieval-based evidence from phenotypically similar plants and (ii) a Large Language Model (LLM)-based reasoning layer. PlantPhenoLM applies an explicit selective prediction policy, based on an evidence-fusion score, to produce reliable and interpretable outcomes. In cross-validation (aggregated $n{=}42$ held-out plants), PlantPhenoLM achieves strong top-$k$ recovery (top-5 $\approx 0.95$ across modes) and modest gains in top-1 accuracy, demonstrating the efficacy of the algorithm.
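
The abstract does not give the fusion formula or the abstention rule, so the following is only a minimal sketch of how a score-based selective prediction policy of this kind might look. It assumes a convex combination of classifier probabilities and neighbour vote fractions; the function name `fuse_and_select`, the weight `alpha`, and the cut-off `threshold` are all hypothetical, and the LLM reasoning layer is left out entirely.

```python
from collections import Counter


def fuse_and_select(clf_probs, neighbor_labels, alpha=0.5, threshold=0.6, k=5):
    """Illustrative evidence fusion with abstention (not the paper's exact method).

    clf_probs:       dict mapping genotype -> classifier probability
    neighbor_labels: genotypes of the retrieved phenotypically similar plants
    alpha:           assumed weight on the classifier (1 - alpha on retrieval)
    threshold:       assumed minimum fused score required to commit to top-1
    """
    votes = Counter(neighbor_labels)
    total = sum(votes.values())

    fused = {}
    for genotype in set(clf_probs) | set(votes):
        # Retrieval evidence: share of retrieved neighbours with this genotype.
        vote_frac = votes[genotype] / total if total else 0.0
        fused[genotype] = (
            alpha * clf_probs.get(genotype, 0.0) + (1 - alpha) * vote_frac
        )

    if not fused:
        return None, [], fused

    # Rank by fused score, breaking ties alphabetically for determinism.
    ranked = sorted(fused, key=lambda g: (-fused[g], g))
    best = ranked[0]

    # Selective prediction: abstain (None) when the evidence is too weak.
    prediction = best if fused[best] >= threshold else None
    return prediction, ranked[:k], fused


# Classifier and neighbours agree: the policy commits to a genotype.
confident = fuse_and_select(
    {"A": 0.7, "B": 0.2, "C": 0.1}, ["A", "A", "B", "A"]
)

# Classifier and neighbours disagree: the policy abstains but still ranks.
ambiguous = fuse_and_select(
    {"A": 0.4, "B": 0.35, "C": 0.25}, ["B", "A", "C", "B"]
)
```

In this sketch an abstention still returns the ranked top-$k$ list, which is one way a system could report top-5 recovery while declining to commit to a single top-1 genotype.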

Submission Number: 112
