Tabular Deep Learning vs Classical Machine Learning for Urban Land Cover Classification

Published: 24 Nov 2025, Last Modified: 24 Nov 2025 | 5th Muslims in ML Workshop co-located with NeurIPS 2025 | CC BY 4.0
Keywords: Urban Land Cover, Tabular Data, Tabular Deep Learning, Applied Machine Learning, TDL vs GBDT, Remote Sensing
TL;DR: On the UCI ULC dataset, tree ensembles (CatBoost, RF) lead, and TDL models are competitive when trained with class-weighted cross-entropy. Use ensembles as baselines; add imbalance-aware TDL for complex or minority-recall cases.
Abstract: Urban Land Cover (ULC) classification plays a crucial role in urban planning, environmental monitoring, and sustainable development. We study this task using the ULC dataset from the UCI Machine Learning Repository, which includes tabular features derived from high-resolution aerial imagery across nine classes (e.g., roads, trees, grass, water). The dataset presents typical remote sensing challenges, including high dimensionality, heterogeneous features, and class imbalance. In a unified, reproducible pipeline, we benchmark classical machine learning models (e.g., Logistic Regression, SVM, Random Forest, XGBoost, CatBoost) against Tabular Deep Learning (TDL) models (TabNet, FT-Transformer, TabTransformer, TabSeq, and 1D CNNs). To address class imbalance, we employ weighted cross-entropy loss for TDL models and evaluate performance using accuracy, macro-precision, macro-recall, macro-F1, AUC-ROC, and confusion matrices. Our results show that while tree ensembles remain strong general baselines, TDL models can match or exceed their performance when non-linear interactions are significant and imbalance handling is effective, providing complementary advantages for urban land cover mapping. See code: https://github.com/mtesha/tdl-vs-ml-urbanlandcover
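A minimal sketch of the two imbalance-related ingredients the abstract names, class-weighted cross-entropy and macro-F1, in dependency-free Python. The abstract does not say how the class weights are computed, so the inverse-frequency ("balanced") scheme below is an assumption, as are all function names; the linked repository may do this differently.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed weighting scheme; the source does not specify one.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy imbalanced labels: three samples of class 0, one of class 1.
    y = [0, 0, 0, 1]
    w = balanced_class_weights(y)
    # A classifier that always predicts the majority class.
    y_hat = [0, 0, 0, 0]
    print("class weights:", w)
    print("macro-F1 of majority-class predictor:", macro_f1(y, y_hat))
```

On the toy labels, a predictor that always outputs the majority class reaches 75% accuracy but a much lower macro-F1, which is why macro-averaged metrics are reported alongside accuracy on imbalanced data.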
Submission Number: 75