LGM3A '23: 1st Workshop on Large Generative Models Meet Multimodal Applications

Zheng Wang, Cheng Long, Shihao Xu, Bingzheng Gan, Wei Shi, Zhao Cao, Tat-Seng Chua

Published: 2023, Last Modified: 19 Mar 2026ACM Multimedia 2023EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: A large language model is a type of artificial intelligence model designed to understand and generate natural language text, such as GPT, T5, RoBERTa, BERT, etc. These models are trained on vast amounts of text data, allowing them to learn the patterns and structures of human language. With the increasing amount of multimodal information such as audio, visual, and text data generated, there is a growing need of leveraging large generative language model for multimodal applications. Recently, a few notable multimodal models (e.g., BLIP, Flamingo, KOSMOS, PaLM-E, LLaVA, Visual ChatGPT, GPT-4, etc.) with a combination of large language models significantly enhanced their understanding and generate more accurate and nuanced responses. The workshop will provide an opportunity for researchers, practitioners, and industry professionals to explore the latest trends and best practices in the field of multimodal applications of large generative models. The workshop will also focus on exploring the challenges and opportunities of integrating large language models with other AI technologies such as computer vision and speech recognition.