Leveraging Biokinetic Knowledge Priors for Data-Scarce Bioprocess Modeling

Published: 30 May 2026, Last Modified: 30 May 2026, Venue: ICML2026-AI4Science Poster, License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: bioprocess, physics informed, growth curve, ordinary differential equation, biokinetic ode, simulation, pretraining, fed-batch process, batch process
TL;DR: We inject biokinetic ODE knowledge into deep learning via two priors: simulation pretraining (a data prior) and ODE-structured decoders (a structural prior). Injecting biokinetic priors improves performance on data-scarce bioprocess prediction tasks.
Abstract: While deep learning has accelerated drug discovery, its impact on biomanufacturing, the production stage in which candidate molecules are scaled up in bioreactors, has been considerably more limited. The reason is data scarcity: bioreactor experiments are expensive, require days to weeks to complete, and are rarely shared in public form, so generic neural decoders tend to overfit. Microbial growth dynamics, in contrast, have been described by biokinetic ordinary differential equation (ODE) models for several decades; how this knowledge should be injected into a neural network has not been studied systematically. We compare two orthogonal channels for injecting biokinetic priors on a single task with a shared backbone: simulation pre-training, where synthetic curves drawn from biokinetic ODEs pre-train a generic decoder, and architecture-level priors, where the ODE is embedded directly in the decoder. Across 11 datasets and 7 microbial species, both channels improve over no-prior baselines. Simulation pre-training is the more effective of the two: a generic decoder with simulation pre-training attains R² ≈ 0.515, matching a fully bio-structured decoder (0.554) trained on real data alone. The two channels therefore act as substitutes, and biokinetic specificity is the key factor: random-curve simulation fails, and pre-training outperforms joint training. Together, these results position simulation pre-training as a practical, data-efficient strategy for deploying deep learning in data-scarce bioprocess settings.
Submission Number: 258
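
Illustrative sketch (not from the paper): the abstract describes simulation pre-training as drawing synthetic curves from biokinetic ODEs. The abstract does not state which ODE family or parameter ranges are used, so the example below assumes a simple batch Monod model and hypothetical sampling ranges; the names `monod_rhs`, `simulate_curve`, and `make_pretraining_set` are placeholders chosen for illustration.

```python
import random


def monod_rhs(x, s, mu_max, k_s, yield_xs):
    """Batch Monod kinetics (assumed model): biomass x grows on substrate s."""
    s = max(s, 0.0)  # guard against tiny negative substrate from numerical error
    mu = mu_max * s / (k_s + s)  # specific growth rate, 1/h
    dx = mu * x                  # biomass formation
    ds = -dx / yield_xs          # substrate consumption
    return dx, ds


def simulate_curve(x0, s0, mu_max, k_s, yield_xs, t_end=24.0, n_steps=240):
    """Integrate the ODE with classical RK4; returns (times, biomass, substrate)."""
    dt = t_end / n_steps
    p = (mu_max, k_s, yield_xs)
    t, x, s = 0.0, x0, s0
    times, xs, ss = [t], [x], [s]
    for _ in range(n_steps):
        k1x, k1s = monod_rhs(x, s, *p)
        k2x, k2s = monod_rhs(x + 0.5 * dt * k1x, s + 0.5 * dt * k1s, *p)
        k3x, k3s = monod_rhs(x + 0.5 * dt * k2x, s + 0.5 * dt * k2s, *p)
        k4x, k4s = monod_rhs(x + dt * k3x, s + dt * k3s, *p)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        t += dt
        times.append(t)
        xs.append(x)
        ss.append(s)
    return times, xs, ss


def make_pretraining_set(n_curves, seed=0):
    """Sample kinetic parameters and simulate one synthetic curve per sample.

    All ranges below are hypothetical placeholders, not values from the paper.
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_curves):
        params = {
            "x0": rng.uniform(0.01, 0.2),       # initial biomass, g/L
            "s0": rng.uniform(5.0, 20.0),       # initial substrate, g/L
            "mu_max": rng.uniform(0.1, 1.0),    # max specific growth rate, 1/h
            "k_s": rng.uniform(0.05, 2.0),      # half-saturation constant, g/L
            "yield_xs": rng.uniform(0.2, 0.6),  # biomass yield on substrate, g/g
        }
        t, biomass, substrate = simulate_curve(**params)
        dataset.append(
            {"params": params, "t": t, "biomass": biomass, "substrate": substrate}
        )
    return dataset
```

Under these assumptions, a call such as `make_pretraining_set(1000)` would yield synthetic growth curves on which a generic decoder could be pre-trained before fine-tuning on real bioreactor data. Replacing `monod_rhs` with a non-biokinetic curve generator would correspond to the random-curve control mentioned in the abstract.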