Abstract: This work presents the first condensation approach for procedural video datasets used in temporal action segmentation (TAS). We propose a condensation framework that leverages a generative prior learned from the dataset, together with network inversion, to condense the data into compact latent codes, yielding significant storage reduction along both the temporal and channel dimensions. Orthogonally, we propose sampling diverse and representative action sequences to minimize video-wise redundancy. Our evaluation on standard benchmarks demonstrates consistent effectiveness in condensing TAS datasets while achieving competitive performance. Specifically, on the Breakfast dataset, our approach reduces storage by over 500× while retaining 83% of the performance obtained by training on the full dataset. Furthermore, when applied to a downstream incremental learning task, it outperforms the state-of-the-art.