Keywords: Large Language Model, Viral Surveillance, Novelty Detection, Foundation Models, Influenza
TL;DR: We use a transformer model to generate influenza viral sequences and explore the evolution of the viral population.
Abstract: Current influenza surveillance relies on hemagglutination inhibition assays that are retrospective, time-consuming, and increasingly unreliable. While viral genome sequencing outpaces serological testing, extracting actionable surveillance intelligence from sequence data remains challenging. We present a framework integrating large language models for viral sequence generation with unsupervised novelty detection to prioritize potentially antigenic variants.
Our approach fine-tunes genSLM on influenza hemagglutinin sequences using codon-level tokenization. We implement two complementary generation strategies: temperature-based autoregressive generation for routine surveillance scenarios, and reward-guided beam search enabling insertions/deletions for pandemic emergence modeling. The reward-guided approach parallels reinforcement learning from human feedback techniques used in ChatGPT, balancing model likelihood with biological constraints.
We apply genomic measures to transform variant assessment from single-sequence prediction to population-based risk stratification. Validation against the 2009 pandemic emergence confirms biological relevance. Our framework demonstrates an application of reward-guided generation to viral surveillance, establishing a new paradigm for AI-augmented pandemic preparedness.
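The temperature-based autoregressive generation described above can be sketched in miniature. This is a hypothetical illustration only: the `toy_logits` function is a stand-in for genSLM's next-codon logits (the real model is not reproduced here), and the codon vocabulary is truncated to four tokens for brevity. The temperature parameter scales the logits before sampling, so lower temperatures concentrate probability on likely codons while higher temperatures broaden exploration.

```python
import math
import random

# Truncated codon vocabulary for illustration; a real tokenizer covers all 64 codons.
CODONS = ["ATG", "GCA", "TGC", "TAA"]

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature: T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def toy_logits(prefix):
    # Assumption: stand-in for a fine-tuned genSLM's next-codon logits.
    random.seed(len(prefix))  # deterministic per position for this sketch
    return [random.uniform(-1.0, 1.0) for _ in CODONS]

def sample_sequence(n_codons, temperature=0.8):
    # Autoregressive loop: sample one codon at a time from the temperature-scaled distribution.
    seq = []
    for _ in range(n_codons):
        probs = softmax(toy_logits(seq), temperature)
        r, acc = random.random(), 0.0
        for codon, p in zip(CODONS, probs):
            acc += p
            if r <= acc:
                seq.append(codon)
                break
        else:
            seq.append(CODONS[-1])  # guard against floating-point rounding
    return "".join(seq)

print(sample_sequence(5))
```

The reward-guided beam search variant would extend this by scoring each candidate continuation with a weighted sum of model likelihood and a biological-constraint reward, keeping the top-k beams at each step; that scoring function is specific to the paper's method and is not sketched here.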
Submission Number: 1