A table is worth a thousand pictures: Multi-modal contrastive learning in house burning classification in wildfire events
Keywords: multimodal classification, wildfires, remote-sensing, large language models
TL;DR: Multimodal classification using LLMs and vision models outperforms traditional methods in classifying house burning in wildfire events.
Abstract: Wildfires have increased in frequency and duration over the last decade in the Western United States. They not only pose a risk to human life but also cause billions of dollars in damage to private and public infrastructure. As climate change potentially worsens the frequency and severity of wildfires, understanding their risk is critical for human adaptation and effective fire prevention. However, current fire spread models often depend on idealized fire and soil parameters, are computationally expensive, and do not predict property damage. In this paper, we use a multimodal model that maps image and text representations into a shared latent space to predict which houses will burn down during a wildfire. Our results indicate that this dual-encoder (DE) model outperforms the unimodal image-only and text-only baselines (ResNet50 and XGBoost, respectively). Moreover, consistent with related models in the literature, it also outperforms these baselines in low-data regimes.
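For readers unfamiliar with the setup described in the abstract, the following is a minimal sketch of a dual-encoder contrastive model that embeds images and text-derived features in a shared latent space. The encoder choices, dimensions, and loss details here are illustrative assumptions, not the authors' exact configuration; only the general dual-encoder pattern (a ResNet image branch, a projection of text embeddings, and a symmetric contrastive objective) is implied by the paper's description.

```python
# Illustrative dual-encoder contrastive sketch (assumed details, not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class DualEncoder(nn.Module):
    def __init__(self, text_feat_dim: int = 768, embed_dim: int = 256):
        super().__init__()
        # Image branch: ResNet50 backbone (the paper's image-only baseline architecture).
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()              # expose the 2048-d pooled features
        self.image_encoder = backbone
        self.image_proj = nn.Linear(2048, embed_dim)
        # Text branch: projects precomputed text embeddings (e.g. from an LLM
        # over tabular/property descriptions) into the shared latent space.
        self.text_proj = nn.Linear(text_feat_dim, embed_dim)
        # Learnable temperature for the contrastive logits.
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, images, text_feats):
        img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        return img, txt


def contrastive_loss(img, txt, logit_scale):
    # Symmetric InfoNCE: matching image/text pairs on the diagonal are positives.
    logits = logit_scale.exp() * img @ txt.t()
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Example forward/backward pass on random data of plausible shapes.
model = DualEncoder()
images = torch.randn(8, 3, 224, 224)
text_feats = torch.randn(8, 768)
img, txt = model(images, text_feats)
loss = contrastive_loss(img, txt, model.logit_scale)
loss.backward()
```

A downstream burned/not-burned classifier could then be trained on the shared embeddings (or the image branch fine-tuned directly), which is one common way such contrastive pretraining is used; the paper's specific classification head is not detailed in the abstract.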
Submission Number: 29