Abstract: Since the advent of transformer-based models, image captioning has achieved remarkable results. Despite this success, one problem remains to be addressed: data bias. We assume that the problem arises because existing studies have focused only on generating captions that are natural in context and have not considered the relationship between the subject and the object. Under this premise, we hypothesize that the bias problem can be eliminated by using a sentence structure that accounts for the relationship between the subject and the object. Based on this hypothesis, we propose a novel image captioning method to address the bias problem. The method introduces a structural representation loss that takes sentence structure into account and a debiasing regularization that is robust to subclasses in which bias exists.