Keywords: Multimodal Learning, Contrastive Learning, Time Series Data, Text Data, Drilling Operations, Foundation Models, Zero-Shot Learning, Cross-Modal Retrieval, Moirai, Moment, ICML
TL;DR: DriMM uses multimodal contrastive learning with large time series and text models to align drilling sensor data and reports, enabling cross-modal retrieval and zero-shot activity classification.
Abstract: Multimodal contrastive learning can align time series sensor data with textual descriptions, but its use in industrial settings remains rare. This paper introduces DriMM, a Drilling Multimodal Model that learns joint representations from time series sensor data and textual activity labels taken from Daily Drilling Reports. DriMM leverages large time series models and pretrained language models to build a shared embedding space across the two modalities. Our experiments show that DriMM enables cross-modal retrieval and zero-shot classification of drilling activities. In addition, the learned mono-modal representations improve linear-probing classification accuracy compared to generic pretrained baselines. These results demonstrate the potential of large models for multimodal learning in domain-specific industrial tasks.
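The abstract describes contrastive alignment of time series and text embeddings in a shared space, enabling cross-modal retrieval and zero-shot classification. The sketch below is a minimal, illustrative version of such an alignment head, not the paper's implementation: the class name `ContrastiveAligner`, the linear projection heads, the embedding dimensions, and the symmetric InfoNCE loss are assumptions for illustration. In DriMM, the input embeddings would come from large pretrained time series models (e.g., Moment or Moirai) and a pretrained language model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveAligner(nn.Module):
    """Illustrative CLIP-style alignment of time series and text embeddings."""

    def __init__(self, ts_dim: int, text_dim: int, shared_dim: int = 256):
        super().__init__()
        # Projection heads map each modality's encoder output into a shared space.
        self.ts_proj = nn.Linear(ts_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.log_temp = nn.Parameter(torch.tensor(0.0))  # learnable temperature

    def forward(self, ts_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product is cosine similarity.
        z_ts = F.normalize(self.ts_proj(ts_emb), dim=-1)
        z_text = F.normalize(self.text_proj(text_emb), dim=-1)
        logits = z_ts @ z_text.t() * self.log_temp.exp()
        targets = torch.arange(logits.size(0))
        # Symmetric InfoNCE loss: each sensor window should match its paired
        # report text within the batch, and vice versa.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Dummy batch: 8 sensor-window embeddings (dim 512) and 8 text embeddings
    # (dim 768), standing in for outputs of frozen pretrained encoders.
    model = ContrastiveAligner(ts_dim=512, text_dim=768)
    loss = model(torch.randn(8, 512), torch.randn(8, 768))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")
```

Once trained, cross-modal retrieval and zero-shot classification reduce to nearest-neighbor search in the shared space: embed a sensor window and a set of candidate activity descriptions, then rank the candidates by cosine similarity.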
Submission Number: 73