Keywords: Multimodal Learning, Temporal Graphs, Graph Neural Network, Representation Learning
Abstract: Multimodal temporal data analysis presents a challenge: it must balance high resolution, needed to capture sudden events, against a wide temporal range, needed for scalability. This tension often yields vast graph models that can be computationally intractable. Current approaches either break sequences into fixed-length segments or trim edges to stay within budget constraints, often at the cost of fidelity.
We introduce EAMC–C2SG, a novel framework that dynamically compresses temporal streams into event-tailored segments and builds a sparse graph model that respects temporal ordering. By curbing the proliferation of nodes and edges, our design achieves strict budget control while reducing complexity from quadratic to near-linear in sequence length.
Our framework preserves salient information in multimodal temporal data: evaluated on extensive clinical datasets (MIMIC-IV + CXR) and diverse cross-domain benchmarks (TimeMMD), it achieves state-of-the-art predictive accuracy with markedly lower latency and memory usage. Beyond raw performance, EAMC–C2SG also offers interpretable segmentations and insightful graph diagnostics, making it a scalable and transparent solution for multimodal temporal learning.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3247