Scaling Multimodal Large Language Models: Perception, Long-Horizon Reasoning & Memory

Jitesh Jain

Published: 09 Jun 2026, Last Modified: 09 Jun 2026, CVPR 2026 DC, CC BY 4.0

CV: pdf

Research Statement: pdf

PhD University: Georgia Tech

Graduation Date: November 2026

PhD Advisor: Humphrey Shi

Advisor Confirmation: pdf

Google Scholar: https://scholar.google.com/citations?view_op=list_works&hl=en&user=nygnfNwAAAAJ

Poster Title: SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

First Author Papers: 3+

Requested Mentors Instructions: I understand — my preferred mentors are listed below.

Requested Mentor 1: Victoria Lin, victoriaxlin.ai{AT}gmail.com

Requested Mentor 2: Christopher Clark, chrisclark425@gmail.com

Requested Mentor 3: Saining Xie, saining.xie@nyu.edu

Requested Mentor 4: Chunting Zhou, chunting.violet.zhou@gmail.com

Requested Mentor 5: Jim Fan, dr.jimfan.ai@gmail.com

Requested Mentor 6: Feng Li, fliay@connect.ust.hk

Requested Mentor 7: Jeff Dean, jeff@google.com

Requested Mentor 8: Boyi Li, boyilics@gmail.com

Academia Or Industry: Industry

Mentor Questions:
- What role do you think memory can play in understanding and generating long videos? Do we go the Gemini route (brute-force long context), the SAGE route (adaptive reasoning with retrieval), or some other way?
- During human evolution, vision developed much earlier than language; for AI, however, we are seeing the opposite. Language makes learning vision easier: is that the right approach, or should we go the human route?
- Is robotics (that is, physical AI) the best bet if one wants to work on vision/multimodal problems in the future?

Special Circumstances: I have travel funding from other sources

CVPR Participation: I will be presenting a paper at the main conference

Previous DC Certification: I certify that I have not previously attended a Doctoral Consortium at ICCV, ECCV or CVPR.

In Person Confirmation: I confirm that I plan to attend CVPR 2026 in person.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 19
