Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

Published: 05 Apr 2023, Last Modified: 05 Apr 2023. Accepted by TMLR.
Abstract: A key challenge for much of the machine learning work on remote sensing and earth observation data is the difficulty of acquiring large amounts of accurately labeled data. This is particularly true for semantic segmentation tasks, which are much less common in the remote sensing domain because of the difficulty of collecting precise, accurate, pixel-level annotations at scale. Recent efforts have addressed these challenges both through the creation of supervised datasets and through the application of self-supervised methods. We continue these efforts on both fronts. First, we generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) that includes raw, full-field imagery for greater experimental flexibility. Second, we extend this dataset with the release of 3600 large, high-resolution (10cm/pixel), full-field, red-green-blue and near-infrared images for pre-training. Third, we incorporate the Pixel-to-Propagation Module (Xie et al., 2021b), originally built on the SimCLR framework, into the MoCo-V2 framework (Chen et al., 2020b). Finally, we demonstrate the usefulness of this data by benchmarking different contrastive learning approaches on both downstream classification and semantic segmentation tasks. We explore both CNN and Swin Transformer (Liu et al., 2021a) architectures within different frameworks based on MoCo-V2. Together, these approaches enable us to better detect key agricultural patterns of interest across a field from aerial imagery so that farmers may be alerted to problematic areas in a timely fashion to inform their management decisions. Furthermore, the release of these datasets will support numerous avenues of research for computer vision in remote sensing for agriculture.
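The MoCo-V2 framework referenced in the abstract trains its encoder with an InfoNCE contrastive objective: a query embedding is pulled toward the key from another augmented view of the same image and pushed away from keys held in a memory queue. The sketch below is a minimal NumPy illustration of that loss for a single query, under our own function name, shapes, and default temperature; it is not the authors' released implementation (see the linked code repository for that).

```python
import numpy as np

def info_nce_loss(q, k_pos, queue, tau=0.2):
    """MoCo-style InfoNCE loss for a single query embedding.

    q      : (d,)   query embedding from the online encoder
    k_pos  : (d,)   positive key from the momentum encoder
    queue  : (K, d) negative keys stored in the memory queue
    tau    : temperature scaling the similarity logits
    """
    # Normalize so dot products are cosine similarities.
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    queue = queue / np.linalg.norm(queue, axis=1, keepdims=True)

    l_pos = q @ k_pos          # similarity with the positive key
    l_neg = queue @ q          # similarities with the queued negatives

    # Cross-entropy with the positive placed at index 0.
    logits = np.concatenate([[l_pos], l_neg]) / tau
    logits -= logits.max()     # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

Because the positive key comes from a different augmentation of the same image while the queue holds keys from other images, minimizing this loss encourages augmentation-invariant features; the paper's MoCo-PixPro variant adds a pixel-level propagation term on top of this instance-level objective.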
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=dGXSfzn1b0
Changes Since Last Submission: `(1) Revisions for Reviews:` Dear reviewers, we have updated our manuscript according to your feedback. Changes are highlighted in a different color. More precisely:
1. We have added more description of the Fine-Grained Segmentation Dataset to make the segmentation task clearer for reviewer **GJ9C**. Details can be found in Section 3.3.
2. We have added a visualization and related information on temporal information/temporal contrast in Sections 3.2 and 4.3 to address the concerns of reviewer **GJ9C**.
3. We have added experimental details comparing the training process of supervised segmentation with our self-supervised learning method in Section 5.3.2, following the suggestions of reviewer **osPe**.
4. We have modified Table 2 to include the number of parameters of the trained models to reflect model size, as reviewer **osPe** suggested.
5. We have added descriptions of the raw images in Section 3.2 and further discussion of the data augmentation pipelines in Section 5.1 to address the concerns of reviewer **VzZZ**.
6. We have added further discussion and analysis of Table 1 in Section 5.3.1 to explain why MoCo-PixPro shows relatively poor performance with small backbones, as pointed out by reviewer **VzZZ**.
7. We have clarified the selection method for positive pairs in Section 4.2.2 to address the concerns of reviewers **VzZZ** and **TPh3**.
8. We have fixed the logic issue pointed out by reviewer **TPh3**.
9. We have added further descriptions of Figure 2 in Section 4.4 to make it self-contained, and clarified the notation of the loss functions in Section 4.2.1, as suggested by **TPh3**.
10. We have added clear experimental setups in Section 5.1 for pre-training, and in Sections 5.2 and 5.3 for the downstream tasks, to address the concerns of reviewer **TPh3**.
11. We have added extra results to Table 2 to illustrate the model's performance with and without NIR channels, as suggested by **TPh3**.
12. We have added a table (Table 1) comparing the data statistics of AV and AV+, as suggested by **TPh3**.

`(2) Camera Ready Revision:` 1. We incorporated the suggested discussions and clarifications. 2. We revised citation formatting and fixed typos throughout the paper. 3. We switched the color of the previously added content to black.
Video: https://youtu.be/2xaKxUpY4iQ
Code: https://github.com/jingwu6/Extended-Agriculture-Vision-Dataset
Assigned Action Editor: ~Kui_Jia1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 515