Abstract: Scene understanding in computer vision has advanced remarkably, driven in large part by the development and use of scene graphs, which offer a powerful structural and semantic representation of a scene. This structured representation supports better contextual understanding and facilitates tasks such as image captioning, image generation, image retrieval, human-object interaction, and visual question answering. This tutorial paper comprehensively surveys current scene graph research, covering generation methods, applications, standard datasets, and insights into future development.