First Comprehensive Benchmark for Tailored Small Molecule-Binding Aptamer Design

Published: 24 Sept 2025, Last Modified: 26 Dec 2025, NeurIPS2025-AI4Science Poster, CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: aptamer, small molecule, binding, prediction, generalization, SELEX, machine learning, benchmark, de-novo design
TL;DR: We present a unified benchmark dataset and baseline evaluation for aptamer–small molecule binding prediction.
Abstract: Aptamers are emerging as robust recognition elements for diagnostics and therapeutics, yet computational discovery pipelines remain limited to protein targets, leaving small-molecule binding largely unexplored. To fill this gap, we present the first unified benchmark for aptamer–small molecule interactions, built from seven curated sources and comprising 2,210 annotated pairs, 1,430 unique DNA- and RNA-based aptamers, and 496 ligands spanning a broad chemical space. Over half of the entries include quantitative binding affinities, enabling both classification and regression tasks, while synthetic negatives generated via cross-pair sampling allow the dataset to be rationally balanced. Using this dataset, we conducted a systematic benchmarking study across multiple splitting and representation strategies for both aptamers and ligands. Our experiments covered discrete encodings, pretrained embeddings, and hybrid fusion schemes, evaluated with both shallow and deep learning (DL) models. This analysis establishes stable baselines for binding prediction and reveals the strengths and weaknesses of sequence- and embedding-based features. Beyond classification, we also provide the first regression baselines isolating the impact of aptamer–molecule compositional information on quantitative binding affinity estimation. This framework represents the next step toward scalable, data-driven aptamer discovery beyond SELEX-based, single-target-centered models and large-scale computational screening using molecular docking.
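The abstract does not spell out how the cross-pair sampling of synthetic negatives is done. As a minimal sketch of one way it could work, the Python below recombines aptamers and ligands from known binding pairs and keeps only combinations that are not annotated as binders. The function name `cross_pair_negatives`, the tuple-based data layout, and the example sequences are all hypothetical and not taken from the paper. The sketch also assumes that an unobserved pair is a non-binder, which may not hold in practice.

```python
import random


def cross_pair_negatives(positives, n_negatives, seed=0):
    """Sample (aptamer, ligand) pairs that do not occur among the known binders.

    Illustrative assumption: any aptamer/ligand combination absent from
    `positives` is treated as a non-binding (negative) pair.
    """
    known = set(positives)
    aptamers = sorted({aptamer for aptamer, _ in positives})
    ligands = sorted({ligand for _, ligand in positives})

    # Number of distinct cross pairs that are not known positives.
    max_negatives = len(aptamers) * len(ligands) - len(known)
    if n_negatives > max_negatives:
        raise ValueError(
            f"requested {n_negatives} negatives, only {max_negatives} available"
        )

    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(aptamers), rng.choice(ligands))
        if pair not in known:  # reject pairs annotated as binders
            negatives.add(pair)
    return sorted(negatives)


# Toy usage with made-up sequences; a 1:1 ratio balances the two classes.
positives = [
    ("ACGUACGU", "theophylline"),
    ("GGCAUUCC", "ATP"),
    ("UUAGGCAA", "dopamine"),
]
negatives = cross_pair_negatives(positives, n_negatives=len(positives))
```

Rejection sampling keeps the sketch short. For a dataset where most cross pairs are already positives, enumerating the remaining pairs and drawing from that list would be a safer choice.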
Submission Number: 464