Abstract: The past few years have witnessed a paradigm shift in artificial intelligence, with foundation models emerging as a unifying and transformative force across diverse domains. These models, trained on massive amounts of data, offer unprecedented generalization capabilities, enabling scalable solutions to complex multimedia tasks. From text and images to video, audio, and beyond, foundation models are increasingly becoming the cornerstone of next-generation multimedia systems.
External IDs: dblp:journals/ieeemm/ChengLGAHL25