Scaling Multimodal Large Language Models: Perception, Long-Horizon Reasoning & Memory

Published: 09 Jun 2026, Last Modified: 09 Jun 2026, CVPR 2026 DC, Everyone, CC BY 4.0
CV: pdf
Research Statement: pdf
PhD University: Georgia Tech
Graduation Date: November 2026
PhD Advisor: Humphrey Shi
Advisor Confirmation: pdf
Google Scholar: https://scholar.google.com/citations?view_op=list_works&hl=en&user=nygnfNwAAAAJ
Poster Title: SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
First Author Papers: 3+
Requested Mentors Instructions: I understand — my preferred mentors are listed below.
Requested Mentor 1: Victoria Lin, victoriaxlin.ai{AT}gmail.com
Requested Mentor 2: Christopher Clark, chrisclark425@gmail.com
Requested Mentor 3: Saining Xie, saining.xie@nyu.edu
Requested Mentor 4: Chunting Zhou, chunting.violet.zhou@gmail.com
Requested Mentor 5: Jim Fan, dr.jimfan.ai@gmail.com
Requested Mentor 6: Feng Li, fliay@connect.ust.hk
Requested Mentor 7: Jeff Dean, jeff@google.com
Requested Mentor 8: Boyi Li, boyilics@gmail.com
Academia Or Industry: Industry
Mentor Questions:
- What role can memory play in understanding and generating long videos? Should we go the Gemini route (brute-force long context), the SAGE route (adaptive reasoning with retrieval), or some other way?
- During human evolution, vision developed much earlier than language; in AI, however, we are seeing the opposite. Language makes learning vision easier. Is that the right approach, or should we follow the human route?
- Is robotics (that is, physical AI) the best bet for someone who wants to work on vision/multimodal problems in the future?
Special Circumstances: I have travel funding from other sources
CVPR Participation: I will be presenting a paper at the main conference
Previous DC Certification: I certify that I have not previously attended a Doctoral Consortium at ICCV, ECCV or CVPR.
In Person Confirmation: I confirm that I plan to attend CVPR 2026 in person.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 19