Abstract: When we are babies, we learn how to see by watching how the world changes and by interacting with it. Can we use these same signals to train vision models? In this thesis, we outline several works which use these paradigms as a basis for learning algorithms. First, we explore learning by watching in which video data is directly used to learn about the visual world. Second, we tackle multiple challenging tasks in embodied environments in which agents learn by interacting with their surroundings.
0 Replies
Loading