IndiaSat: A Pixel-Level Dataset for Land-Cover Classification on Three Satellite Systems - Landsat-7, Landsat-8, and Sentinel-2

Published: 01 Jan 2021, Last Modified: 31 Aug 2024, COMPASS 2021, CC BY-SA 4.0
Abstract: Land-cover (LC) classification is required for land management and planning models, and is increasingly performed using remote sensing data. Supervised machine learning methods applied to satellite imagery can help with high-resolution LC classification, but they demand a labeled dataset for training and evaluating the models. Such datasets are scarce, however, especially for developing regions such as India. We describe a large pixel-level dataset, IndiaSat, that we have curated and released for open use, consisting of 180,414 pixels labeled into four LC classes: greenery, water bodies, barren land, and built-up area. Initial labels are obtained from the crowd-sourced mapping platform OpenStreetMap (OSM) and then manually curated and corrected. We describe our data cleaning methodology and how we ensure spatial diversity across different geographic regions of the country. We show that the IndiaSat dataset can be used to train simple classifiers, deployed on commodity platforms like Google Earth Engine (GEE), that achieve high accuracy on three popular and openly accessible satellite systems: Landsat-7, Landsat-8, and Sentinel-2. We also show that it can be used to build LC change detection models that determine pixel-level changes over a sequence of several years.