Track: Track 2: Dataset Proposal Competition
Keywords: Retrosynthesis, Learning To Rank models, LTR, AI for chemistry, AI for retrosynthesis, AI for science
Abstract: Single-step retrosynthesis needs both accurate
first-ranked suggestions and candidate lists that
are rich enough for downstream selection. We
study this as a proposal-selection decomposition. Our system, RETROSPECT, combines
a single Transformer proposal model, which
we call the ChemAlign Transformer, with a
LambdaMART reranker over structural, reaction-template, upstream-score, and optional DFT-derived descriptors. The generator is trained with
hybrid root-aligned and random SMILES augmentation, Pre-LayerNorm, tied embeddings, exponential moving average weights, and a differentiable atom-balance auxiliary loss. On the full
USPTO-50K test set of 5,007 reactions, the generator reaches 55.00% top-1 and 86.18% top-10
exact-match accuracy with 99.86% top-1 validity.
On the merged candidate-pool benchmark used
for reranking, which contains 5,007 test products
and about 111 candidates per product, a LambdaMART model trained on the structural feature
set reaches 59.4% top-1 with 0.7171 mean reciprocal rank. Feature ablations show that upstream
proposal score and template-frequency statistics
provide most of the reranking signal, while DFT
and reaction-center DFT features provide smaller
and less consistent gains. These results support a
modular view of retrosynthesis: stronger single-model proposal and learned candidate selection
are complementary, and the proposal model can
serve as a drop-in component for ensemble systems such as RetroChimera (Maziarz et al., 2024).
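The top-k exact-match accuracy and mean reciprocal rank figures quoted above can be illustrated with a minimal sketch. It assumes each product has a ranked list of candidate reactant strings that are already canonicalized, so exact match reduces to string equality, and that a product whose ground truth is missing from its list contributes zero to the reciprocal rank. The function names and the toy data are hypothetical, and the evaluation code actually used for RETROSPECT may differ in these conventions.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], targets: Sequence[str], k: int) -> float:
    """Fraction of products whose ground truth appears in the first k candidates."""
    hits = sum(1 for cands, target in zip(ranked, targets) if target in cands[:k])
    return hits / len(targets)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Mean of 1/rank of the ground truth; a missing target counts as 0 (assumption)."""
    total = 0.0
    for cands, target in zip(ranked, targets):
        if target in cands:
            total += 1.0 / (list(cands).index(target) + 1)
    return total / len(targets)


# Toy candidate lists for four products (placeholder labels, not real SMILES).
ranked_lists = [
    ["A", "B", "C"],  # ground truth at rank 1
    ["B", "A", "C"],  # ground truth at rank 2
    ["C", "B", "A"],  # ground truth at rank 3
    ["X", "Y", "Z"],  # ground truth absent
]
ground_truth = ["A", "A", "A", "A"]

top1 = top_k_accuracy(ranked_lists, ground_truth, k=1)
top3 = top_k_accuracy(ranked_lists, ground_truth, k=3)
mrr = mean_reciprocal_rank(ranked_lists, ground_truth)
```

In this toy case a reranker that only reorders candidates can raise top-1 and MRR, but it cannot recover the fourth product, whose ground truth never entered the pool. This is one way to read the claim that proposal and selection are complementary.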
Submission Number: 48