Calibration of Evidential Mass Function for Trustworthy Information Processing in Scene Graph Generation

Published: 01 Jan 2024, Last Modified: 31 Oct 2024, Venue: FUZZ 2024, License: CC BY-SA 4.0
Abstract: Scene graph generation consists of predicting triplets $\langle$subject, predicate, object$\rangle$ that describe the content of an image. Current datasets for this task suffer from incomplete triplet annotations. On the one hand, the annotated triplets are not exhaustive; on the other, the granularity of predicates varies, as in $\langle$person, on, chair$\rangle$ versus $\langle$person, sitting on, chair$\rangle$. Most current models produce a score for each possible predicate, and this score is then used directly for prediction, classically through a pseudo-optimal expectation. The problem is that the set of possible predicates is treated as a probability space even though predicates such as "on" and "sitting on" are not mutually exclusive. This article reconsiders how information is represented in scene graph generation and proposes an approach to predicate classification within belief function theory, based on the Dirichlet multiclass calibration method coupled with hierarchical prior information that makes the relations between predicates explicit. Applying Dirichlet calibration to the scores enables us to interpret the mass associated with a predicate as the probability of knowing nothing more than this predicate. Our experiments are carried out on the Visual Genome dataset using scores estimated by an existing transformer-based scene graph generation model. They show a decrease in log-loss after applying our method and a reduction of the evidential conflict with the ground-truth predicates. Finally, we illustrate the benefit of our evidential representation by applying a hierarchy-based evidential decision rule and comparing it with the baseline.
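As an illustration of the evidential reading described above, the following Python sketch applies the standard Dirichlet calibration map, softmax(W ln p + b), to a predicate score vector and reads each calibrated output as a mass on the set of fine-grained predicates that its label is compatible with. It is not the authors' implementation: the three-predicate frame `OMEGA`, the `FOCAL` hierarchy, all function names, and the assumption that W and b have already been fitted are hypothetical. The conflict with a ground-truth predicate is computed as the mass on focal sets disjoint from it, i.e. the conjunctive conflict with a categorical mass function.

```python
import math

# Hypothetical frame of discernment: the finest predicates that can be decided.
OMEGA = frozenset({"sitting on", "standing on", "holding"})

# Hypothetical hierarchy: each model label denotes the set of fine predicates
# it is compatible with, so the coarse "on" covers "sitting on" and "standing on".
FOCAL = {
    "on": frozenset({"sitting on", "standing on"}),
    "sitting on": frozenset({"sitting on"}),
    "standing on": frozenset({"standing on"}),
    "holding": frozenset({"holding"}),
}
LABELS = list(FOCAL)  # order of the entries in the model's score vector
assert all(A <= OMEGA for A in FOCAL.values())


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def dirichlet_calibrate(p, W, b):
    """Dirichlet calibration map softmax(W ln p + b); W and b assumed already fitted."""
    logp = [math.log(max(v, 1e-12)) for v in p]  # clip to avoid log(0)
    z = [sum(w * lp for w, lp in zip(row, logp)) + bias for row, bias in zip(W, b)]
    return softmax(z)


def mass_function(q):
    """Read each calibrated output as the mass of its label's focal set."""
    m = {}
    for label, v in zip(LABELS, q):
        m[FOCAL[label]] = m.get(FOCAL[label], 0.0) + v
    return m


def belief(m, A):
    # Total mass of focal sets that imply A (subsets of A).
    return sum(v for B, v in m.items() if B <= A)


def plausibility(m, A):
    # Total mass of focal sets compatible with A (non-empty intersection).
    return sum(v for B, v in m.items() if B & A)


def conflict(m, truth):
    # Conjunctive conflict with a categorical mass on `truth`:
    # the mass of focal sets disjoint from the ground-truth set.
    return sum(v for B, v in m.items() if not (B & truth))


def log_loss(q, true_index):
    # Negative log of the calibrated probability of the true label.
    return -math.log(q[true_index])


# Example: scores for ("on", "sitting on", "standing on", "holding").
p = [0.5, 0.2, 0.2, 0.1]
identity = [[float(i == j) for j in range(4)] for i in range(4)]
q = dirichlet_calibrate(p, identity, [0.0] * 4)  # identity map leaves p unchanged
m = mass_function(q)
print(belief(m, FOCAL["on"]))                # about 0.9
print(plausibility(m, FOCAL["sitting on"]))  # about 0.7
print(conflict(m, FOCAL["sitting on"]))      # about 0.3
```

With the identity map the calibration leaves p unchanged; in practice W and b would be fitted on held-out data by minimizing log-loss. The point of the sketch is only the set-valued reading of the output: the mass on "on" supports both "sitting on" and "standing on" without being split between them, which a probability distribution over mutually exclusive predicates cannot express.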