VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

21 May 2021, 20:52 (edited 23 Jan 2022, 05:48) | NeurIPS 2021 Poster | Readers: Everyone