Abstract: The tutorial "Large Vision-Language Model in the Society" aims to provide a comprehensive overview of state-of-the-art techniques and applications of large vision-language models (LVLMs), which integrate visual and textual data to transform multimedia research and applications. LVLMs are poised to revolutionize domains such as content creation, social media analysis, education, healthcare, and entertainment by enabling sophisticated content analysis, retrieval, and generation. This tutorial will cover the fundamentals of vision-language integration, state-of-the-art models, training techniques, applications, ethical considerations, and future directions. It is designed to be educational and instructive, providing an in-depth introduction rather than a cursory survey. Attendees will gain practical skills and insights into the latest research, and will engage in interactive sessions to reinforce learning. By addressing both technical and societal aspects, the tutorial will significantly benefit the multimedia community, driving innovation and progress in the field.