Abstract: Hashtags are commonly used to highlight the topic of user-generated content on many social media platforms. They are therefore used in many topic-related applications, such as user-topic opinion prediction and hashtag recommendation, for which obtaining hashtag representations is usually a fundamental task. These applications either learn hashtag representations from hashtag adoption or represent a hashtag by the representation of its content. However, most existing hashtag representation learning methods fail to take full advantage of hashtag content, and the effectiveness of representing a hashtag by its content is limited in many cases. In this paper, we propose a content-enhanced hashtag embedding method called ContentHE, which introduces the semantic information of hashtag content into hashtag representation learning by treating the words that compose the hashtags as special nodes in a hashtag network. Specifically, ContentHE first introduces a word embedding space generated by a pre-trained language representation model and establishes a heterogeneous network. Each hashtag in the network connects to a set of user-generated content items and to the words that compose the hashtag. Then, ContentHE uses a multi-task learning model and a sampling strategy called node sampling to map hashtags and user-generated content to the word embedding space while preserving the network structure information. The performance of ContentHE on two real-world tweet collections demonstrates that it significantly improves the accuracy of hashtag clustering tasks and captures the relationships between the topics to which hashtags belong.
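As a rough illustration of the heterogeneous network described in the abstract, the following minimal sketch links each hashtag to the posts that use it and to the words that compose it. It is not the authors' implementation. The helper names `split_hashtag` and `build_network`, the CamelCase-based word segmentation, and the `(type, name)` node encoding are all assumptions made for this example; the abstract does not specify how hashtags are segmented or how the network is stored.

```python
import re
from collections import defaultdict


def split_hashtag(tag):
    """Hypothetical segmenter: split a CamelCase hashtag into lowercase words.

    The abstract does not say how hashtags are decomposed into words;
    CamelCase splitting is only a stand-in for illustration.
    """
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [w.lower() for w in re.findall(pattern, tag.lstrip("#"))]


def build_network(posts):
    """Build an undirected heterogeneous network as an adjacency dict.

    posts: iterable of (post_id, text) pairs.
    Nodes are (type, name) tuples with type in {"hashtag", "content", "word"}.
    Edges: hashtag <-> content that uses it, hashtag <-> words composing it.
    """
    edges = defaultdict(set)
    for post_id, text in posts:
        for tag in re.findall(r"#\w+", text):
            hashtag = ("hashtag", tag.lower())
            content = ("content", post_id)
            edges[hashtag].add(content)
            edges[content].add(hashtag)
            for word in split_hashtag(tag):
                word_node = ("word", word)
                edges[hashtag].add(word_node)
                edges[word_node].add(hashtag)
    return edges


posts = [
    (1, "Rising seas #ClimateChange"),
    (2, "Act now #ClimateChange #ClimateAction"),
]
network = build_network(posts)
```

In this sketch, two hashtags that share a word (here `#ClimateChange` and `#ClimateAction`, via the word node `climate`) become two hops apart even when they never co-occur in a post, which is the kind of content-based link the abstract attributes to the word nodes. The embedding step (multi-task learning with node sampling) is not shown.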