Student Lead Author Indication: Yes
Keywords: multimodal datasets, data improvement, dataset filtering, modality alignment, noise reduction, multimodal tasks.
TL;DR: A novel filtering method using the Unified Filtering Score improves multimodal dataset quality and enhances downstream task performance by evaluating and optimizing modality alignment.
Abstract: Multimodal models have made significant strides in handling diverse downstream tasks, yet the quality of the datasets they rely on remains a critical challenge. While large-scale datasets spanning multiple modalities such as image, text, and audio are crucial for training such models, these datasets often contain noisy data, which hampers performance. Existing approaches primarily filter datasets based on pairwise modality alignment, which is insufficient for datasets with three or more modalities. To address this, we propose a novel filtering method leveraging the Unified Filtering Score (UF-Score), which evaluates data quality by considering the mean and variance of alignment scores across all possible modality pairs. Alignment scores are computed as cosine similarities between modality-specific encoder embeddings in a shared embedding space. Our approach effectively filters low-quality data, retaining subsets that maximize alignment quality. Experiments demonstrate that this method significantly improves performance across multimodal tasks, even with reduced dataset sizes.
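The abstract does not give the exact formula for combining the mean and variance of the pairwise alignment scores, but a minimal sketch of the idea might look like the following. The function name `uf_score`, the weight `lam`, and the mean-minus-weighted-variance combination are illustrative assumptions, not the paper's stated definition.

```python
import numpy as np
from itertools import combinations

def uf_score(embeddings, lam=1.0):
    """Sketch of a Unified Filtering Score for one multimodal sample.

    embeddings: dict mapping modality name -> embedding vector, with all
    vectors assumed to live in a shared embedding space.
    lam: hypothetical weight trading off mean alignment against variance;
    the paper's exact combination of mean and variance is not specified
    in the abstract.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Cosine alignment for every pair of modalities (e.g. image-text,
    # image-audio, text-audio for a three-modality sample).
    sims = [cos(embeddings[m1], embeddings[m2])
            for m1, m2 in combinations(sorted(embeddings), 2)]

    # Reward high average alignment, penalize inconsistency across pairs.
    return float(np.mean(sims) - lam * np.var(sims))
```

Filtering would then keep the samples whose score exceeds a threshold (or the top-k samples), so that a sample strongly aligned on one pair but misaligned on another is penalized by the variance term.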
Submission Number: 25