Abstract: As enterprises increasingly aim to incorporate artificial intelligence into their workflows to tackle complex, multimodal tasks, the demand for intelligent, robust, and trustworthy systems is paramount. While multimodal large language models offer initial capabilities for processing diverse data streams, their dependence on embedding-based representations limits their effectiveness in delivering semantically grounded explanations and reasoning, qualities essential for enterprise-grade applications. Neurosymbolic approaches provide a promising alternative by enabling traceable, context-aware decision making. However, constructing enterprise-level multimodal knowledge graphs (MMKGs) that enable neurosymbolic approaches remains largely impractical. Although prior efforts have explored MMKG construction, they fall short of the scalability, modularity, and integration necessary for enterprise-grade applications. We present a fully automated MMKG construction framework tailored to real-world enterprise environments. Our system features a modular, self-refining lifecycle with support for human-in-the-loop feedback, enabling scalable, cost-effective, and task-aligned MMKG generation. We demonstrate the practical value of our framework through a real-world case study, showcasing its ability to transform unstructured multimodal data into actionable, semantically grounded knowledge assets for enterprise use.
External IDs: doi:10.1109/mic.2025.3588546