Evaluating visual "common sense" using fine-grained classification and captioning tasks

2018 (modified: 17 May 2023), ICLR (Workshop) 2018, Readers: Everyone
Abstract: We introduce the Something-something V2 dataset, which contains captions of finely-varying human-object interactions. We also discuss various baseline models and find that neural networks perform surprisingly well on many of the very hard, detailed discrimination tasks associated with this dataset.