Abstract: To enable video models to be applied seamlessly across video tasks in different environments, various Video Unsupervised Domain Adaptation (VUDA) methods have been proposed to improve the robustness and transferability of video models. Despite the improvements in model robustness, these VUDA methods require access to both the source data and the source model parameters for adaptation, which raises serious data privacy and model portability issues. To address these concerns, this paper first formulates Black-box Video Domain Adaptation (BVDA) as a more realistic yet challenging scenario in which the source video model is provided only as a black-box predictor. While a few methods for Black-box Domain Adaptation (BDA) have been proposed in the image domain, they cannot be applied directly to the video domain, since the video modality has more complicated temporal features that are harder to align. To address BVDA, we propose a novel Endo and eXo-TEmporal Regularized Network (EXTERN) that applies mask-to-mix strategies and two video-tailored regularizations: the endo-temporal regularization and the exo-temporal regularization. Both are performed across clip and temporal features, while knowledge is distilled from the predictions obtained from the black-box predictor. Empirical results demonstrate the state-of-the-art performance of EXTERN across various cross-domain closed-set and partial-set action recognition benchmarks, where it even surpasses most existing video domain adaptation methods that have access to source data. Code will be available at https://xuyu0010.github.io/b2vda.html.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=v6x6Nf3FPA
Changes Since Last Submission: 2023/09/27 update:
Fixed the font issue so that the manuscript conforms to the submission template.
2023/12/16 update:
Dear Reviewers:
We appreciate the reviewers' acknowledgement of the contributions of the proposed BVDA task and the EXTERN method, and we thank them for their suggestions. The revised manuscript has been uploaded with updates addressing the concerns of Reviewer Qj8M and Reviewer pFxC. The updates include:
- 1. a comprehensive review of the manuscript with corrections to all the typos and reductions in long sentences;
- 2. additional experiments to explore the relative importance of the endo- and exo-temporal regularization terms;
- 3. clearer definition of both regularization terms and the cluster assumption aided with description re-structuring;
- 4. addition of the missing captions for figures and notations.
We hope that the revised manuscript conforms to the requested changes and that our responses address the concerns raised. All the updates are highlighted in blue in the revised manuscript.
2023/12/18 update:
Dear Reviewers:
We would like to thank the reviewers once again for their suggestions and apologize for the delayed response to Reviewer SMeM. The revised manuscript has been uploaded with updates addressing the concerns of Reviewer SMeM; these updates are also highlighted in blue.
2024/02/15 update:
We would like to thank all reviewers for their recognition of our amendments and their decision to accept the paper. The camera-ready version has been uploaded with a link to the project page and code.
Code: https://xuyu0010.github.io/b2vda.html
Assigned Action Editor: ~David_Fouhey2
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1612