Abstract: Numerous studies have addressed the inference of age and gender. We propose a new approach that leverages contrastive learning to exploit both text and image content for this prediction task. We also consider the case where only text or only image data is available. Under both conditions, we show that our model outperforms state-of-the-art models and still performs well with text or images alone. Moreover, because demographic datasets can be small, we also study combining different datasets to understand when such augmentation is valuable and when it is not.