Adapting the Attention of Cloud-Based Recognition Model to Client-Side Images without Local Re-Training

Published: 05 Sept 2024, Last Modified: 16 Oct 2024 | ACML 2024 Conference Track | CC BY 4.0
Keywords: image recognition, model adaptation
TL;DR: We propose plugging a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models to adapt them to client-side images, requiring only one-time cloud-based training.
Abstract: The mainstream workflow of image recognition applications is to first train one global model on the cloud for a wide range of classes and then serve numerous clients, even though the images uploaded by each client typically come from a small subset of those classes. Given this cloud-client discrepancy in the range of image classes, the recognition model needs strong adaptiveness, intuitively by focusing on each client's local and dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to make them client-adaptive. In particular, given an image to be recognized from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's local images, thereby calibrating the focus and the recognition result. We further identify the bottleneck of ICIIA's overhead as the linear projections, propose to group and shuffle the features before the projections, and show that increasing the number of feature groups dramatically improves efficiency without sacrificing much accuracy. We extensively evaluate ICIIA and compare its performance against several baselines, demonstrating its effectiveness and efficiency. Specifically, on a partitioned version of ImageNet-1K with the backbone models MobileNetV3-L and Swin-B, ICIIA improves the classification accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively. Source code is available in the supplementary materials.
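The abstract describes two mechanisms: multi-head self-attention that retrieves relevant features from a client's local image cache to calibrate the query feature, and grouped linear projections with channel shuffle to cut the projection cost. The PyTorch sketch below illustrates both under our own assumptions; the module layout, hyperparameters (`num_heads`, `groups`), and residual wiring are illustrative guesses, not the authors' reference implementation (see the linked repository for that).

```python
# Minimal sketch of an ICIIA-style module: attend from the feature of the
# query image to a cache of the same client's past image features, then
# calibrate the query feature before the classifier head. All names and
# hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn


class GroupedLinear(nn.Module):
    """Linear projection applied per feature group, followed by channel shuffle.

    Splitting the d-dim feature into g groups cuts the projection cost from
    O(d^2) to O(d^2 / g); shuffling lets information mix across groups.
    """

    def __init__(self, dim: int, groups: int):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        self.proj = nn.ModuleList(
            nn.Linear(dim // groups, dim // groups) for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(self.groups, dim=-1)  # split feature into groups
        out = torch.cat([p(c) for p, c in zip(self.proj, chunks)], dim=-1)
        # channel shuffle: (..., g, d/g) -> transpose -> flatten back to (..., d)
        lead = out.shape[:-1]
        out = out.view(*lead, self.groups, -1).transpose(-2, -1).reshape(*lead, -1)
        return out


class ICIIA(nn.Module):
    """Intra-client, inter-image attention over a client's local feature cache."""

    def __init__(self, dim: int, num_heads: int = 8, groups: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.q_proj = GroupedLinear(dim, groups)
        self.k_proj = GroupedLinear(dim, groups)
        self.v_proj = GroupedLinear(dim, groups)
        self.out_proj = GroupedLinear(dim, groups)

    def forward(self, query: torch.Tensor, cache: torch.Tensor) -> torch.Tensor:
        # query: (B, D) backbone feature of the image to recognize
        # cache: (B, N, D) backbone features of the client's N local images
        B, N, D = cache.shape
        H, Dh = self.num_heads, D // self.num_heads
        q = self.q_proj(query).view(B, 1, H, Dh).transpose(1, 2)  # (B, H, 1, Dh)
        k = self.k_proj(cache).view(B, N, H, Dh).transpose(1, 2)  # (B, H, N, Dh)
        v = self.v_proj(cache).view(B, N, H, Dh).transpose(1, 2)  # (B, H, N, Dh)
        attn = (q @ k.transpose(-2, -1)) / Dh**0.5                # (B, H, 1, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, D)            # (B, D)
        # residual calibration of the query feature (our assumed wiring)
        return query + self.out_proj(out)
```

A hypothetical usage with a MobileNetV3-L-sized feature, purely for shape checking:

```python
iciia = ICIIA(dim=1280, num_heads=8, groups=4)
feat = torch.randn(2, 1280)         # query features for a batch of 2 clients
local = torch.randn(2, 31, 1280)    # each client's cached local features
calibrated = iciia(feat, local)     # (2, 1280), fed to the classifier head
```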
A Signed Permission To Publish Form In Pdf: pdf
Supplementary Material: zip
Url Link To Your Supplementary Code: https://github.com/mikudehuane/ICIIA
Primary Area: Applications (bioinformatics, biomedical informatics, climate science, collaborative filtering, computer vision, healthcare, human activity recognition, information retrieval, natural language processing, social networks, etc.)
Student Author: Yes
Submission Number: 81