Abstract: Thermal object detection must remain reliable as object and
background appearance drifts across time of day, weather, and season.
We tackle this challenge with an appearance-guided Mixture of Experts
(MoE) that learns to route each image to a subset of specialized
backbones. A self-supervised appearance encoder produces embeddings
that drive a lightweight router; experts are pretrained on clusters of
these embeddings to encourage specialization, and all experts share a
single detection head to avoid the linear growth in parameters typical
of ensembles. At inference, we adopt a tuning-free, compute-aware
policy that activates the fewest experts whose cumulative routing
probability exceeds a fixed threshold. Training is stabilized with
complementary batch- and sample-level load-balancing losses that
prevent expert collapse and promote diverse routing. On LTDv2 (natural
long-term drift) and FLIR ADAS (simulated drift), our MoE achieves the
highest peak accuracy and superior month-to-month ranking consistency,
demonstrating that appearance-guided routing provides more reliable
performance across diverse thermal conditions than monolithic scaling.
The result is a practical and scalable detector that remains accurate
under distribution shift and adapts its compute at test time.
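
The inference-time policy described above (activate the fewest experts
whose cumulative routing probability exceeds a fixed threshold) can be
illustrated with a minimal sketch. The function name `select_experts`,
the default threshold of 0.7, and the tie-breaking by expert index are
assumptions made for illustration only; the abstract does not specify
them.

```python
from typing import List, Sequence


def select_experts(router_probs: Sequence[float], threshold: float = 0.7) -> List[int]:
    """Return indices of the fewest experts whose cumulative routing
    probability exceeds `threshold` (hypothetical default, not from the paper).

    `router_probs` is assumed to be one image's router output, i.e. a
    probability distribution over experts.
    """
    # Rank experts from most to least probable. Python's sort is stable,
    # so ties keep their original index order.
    order = sorted(range(len(router_probs)),
                   key=lambda i: router_probs[i],
                   reverse=True)

    selected: List[int] = []
    cumulative = 0.0
    for idx in order:
        selected.append(idx)
        cumulative += router_probs[idx]
        if cumulative > threshold:
            break  # enough probability mass covered; stop adding experts
    return selected


if __name__ == "__main__":
    # Confident routing: a single expert covers the threshold.
    print(select_experts([0.9, 0.05, 0.05]))
    # Ambiguous routing: more experts are activated.
    print(select_experts([0.25, 0.25, 0.25, 0.25]))
```

Under this sketch, an image the router assigns confidently runs through
one backbone, while an ambiguous image activates several, which is how
the compute adapts per sample at test time.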