Abstract: Semantic image understanding is a challenging topic in computer vision. It requires to detect all objects in an im-
age, but also to identify all the relations between them. Detected objects, their labels and the discovered relations
can be used to construct a scene graph which provides an abstract semantic interpretation of an image. In previous
works, relations were identified by solving an assignment problem formulated as Mixed-Integer Linear Programs. In
this work, we interpret that formulation as Ordinary Differential Equation (ODE). The proposed architecture performs scene graph inference by solving a neural variant of an ODE by end-to-end learning. It achieves state-of-the-art results on all three benchmark tasks: scene graph generation (SGGen), classification (SGCls) and visual relationship detection (PredCls) on Visual Genome benchmark.
0 Replies
Loading