Abstract: Investigating users’ behaviour on Web and Social Media platforms usually requires analyzing many heterogeneous features, such as shared textual content, social connections, demographic traits, and temporal attributes. This work aims to compute accurate user similarities on Twitter using only the textual content shared by users, a feature known to be easy and quick to collect. We design and train a two-stage hierarchical Transformer-based model: the first stage processes each tweet independently, and the second stage combines the resulting tweet embeddings into a user-level representation. To evaluate the model, we design a ranking task involving many accounts, which are collected and labeled automatically without the need for human annotators. We extensively investigate hyper-parameters to find the best model configuration. Finally, we check whether the resulting embeddings reflect our idea of similarity by testing them on further tasks, including community visualization, outlier detection, and polarization quantification.
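The two-stage idea and the ranking evaluation can be illustrated with a minimal sketch. It is not the authors' model: it assumes tweet embeddings are already available as plain lists of floats, replaces the second-stage Transformer with simple mean pooling, and ranks accounts by cosine similarity. All names (`mean_pool`, `cosine`, `rank_similar_users`, `toy_users`) and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_pool(tweet_embeddings: List[Vector]) -> Vector:
    """Stand-in for the second stage: average tweet vectors into one user vector.

    Assumption: the paper combines tweets with a Transformer; mean pooling is
    used here only to keep the sketch dependency-free.
    """
    if not tweet_embeddings:
        raise ValueError("a user needs at least one tweet embedding")
    dim = len(tweet_embeddings[0])
    count = len(tweet_embeddings)
    return [sum(vec[i] for vec in tweet_embeddings) / count for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar_users(query: str, users: Dict[str, List[Vector]]) -> List[str]:
    """Return all other users, most similar to `query` first."""
    user_vecs = {name: mean_pool(tweets) for name, tweets in users.items()}
    query_vec = user_vecs[query]
    others = [name for name in user_vecs if name != query]
    return sorted(others, key=lambda n: cosine(query_vec, user_vecs[n]), reverse=True)


# Hypothetical 2-dimensional tweet embeddings for three accounts.
toy_users = {
    "alice": [[1.0, 0.0], [0.9, 0.1]],
    "bob": [[0.8, 0.2], [1.0, 0.1]],
    "carol": [[0.0, 1.0], [0.1, 0.9]],
}
ranking = rank_similar_users("alice", toy_users)
```

In this toy setup, accounts whose tweets point in a similar direction in embedding space rank closer to the query account, which is the property the ranking task described above is meant to measure.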