Abstract: The proliferation of unstructured data generated in online social networks has led to significant research advances in recognizing user profile attributes (e.g., age, gender, and ethnicity), but it also brings new challenges. These attributes are referred to as soft biometrics and provide a semantic description of users. Identifying users' soft biometric traits in online social networks is crucial for a variety of applications such as customized marketing, personalized recommendation, and urban planning. In contrast to conventional studies, we address the importance and challenges of identifying soft biometrics in online social networks and provide a case study on gender recognition of Twitter users based on their tweet texts and profile images. We first apply an efficient approach that labels users' soft biometric attributes from their self-reported names, instead of labor-intensive methods such as manual labeling and crowdsourcing. We then investigate approaches that use texts and profile images, respectively, including hashtag-based TF-IDF, LDA topic distributions, SIFT-based TF-IDF, and Convolutional Neural Networks (CNN). Finally, we propose an ensemble method that combines models learned from both texts and images to enhance gender identification. Our extensive experiments show that its performance is comparable to that of state-of-the-art work.
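As a rough illustration of two steps named in the abstract (labeling users from self-reported names, and combining a text model with an image model), the following minimal sketch may help. It is not the authors' implementation: the `NAME_GENDER` lookup table, the function names, and the weighted-average (soft-voting) combination rule are all assumptions made for illustration, since the abstract does not specify how names are matched or how the ensemble is formed.

```python
from typing import Optional

# Hypothetical name-to-gender lookup (assumption); a real table would be
# built from a large list of first names, which the abstract does not specify.
NAME_GENDER = {"emily": "female", "james": "male"}


def label_from_name(display_name: str) -> Optional[str]:
    """Label a user from the first token of a self-reported name.

    Returns None when the name is empty or not in the lookup table,
    so such users can simply be left out of the labeled set.
    """
    tokens = display_name.strip().lower().split()
    if not tokens:
        return None
    return NAME_GENDER.get(tokens[0])


def ensemble_prob(text_prob: float, image_prob: float,
                  text_weight: float = 0.5) -> float:
    """Weighted average of two models' P(female) estimates.

    Soft voting is an assumed combination rule, not necessarily the
    ensemble method used in the paper.
    """
    return text_weight * text_prob + (1.0 - text_weight) * image_prob


def predict(text_prob: float, image_prob: float,
            text_weight: float = 0.5) -> str:
    """Turn the combined probability into a gender label."""
    combined = ensemble_prob(text_prob, image_prob, text_weight)
    return "female" if combined >= 0.5 else "male"
```

Here `text_prob` and `image_prob` stand in for the outputs of whichever text-based model (for example, one trained on hashtag TF-IDF features) and image-based model (for example, a CNN) are being combined; `text_weight` would be tuned on held-out data.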