Abstract: With the growing use of social media, millions of micro-messages are exchanged daily. Although micro-messages are a powerful and efficient way for individuals to communicate, their anonymity and short length pose a real challenge for Author Identification studies. In this paper, we tackle the problem of Author Identification for micro-messages using Convolutional Neural Networks (CNNs). Specifically, we introduce a novel Multi-Channel CNN architecture that processes different features of text via word and character embedding layers, and utilizes both pre-trained word embeddings and character bigram embeddings. We examine the usefulness of different feature types and show that the combination of embedding layers can capture different stylometric features. We conduct extensive experiments with a varying number of authors and writing samples per author. Our results show that our proposed method outperforms the state-of-the-art system on a Twitter dataset that contains 1,000 authors.
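To make the two input channels concrete, the following is a minimal sketch, not the paper's implementation, of how a micro-message might be turned into a word-id sequence and a character-bigram-id sequence before being passed to separate embedding layers. The helper names (`char_bigrams`, `build_vocab`, `encode`), the whitespace tokenization, the padding lengths, and the sample messages are all assumptions made for illustration.

```python
def char_bigrams(text):
    """Return the overlapping character bigrams of a message, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_vocab(sequences):
    """Map each distinct unit to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for unit in seq:
            vocab.setdefault(unit, len(vocab))
    return vocab


def encode(seq, vocab, max_len):
    """Convert units to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(unit, 1) for unit in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical micro-messages, used only to exercise the helpers.
messages = ["so tired today", "gr8 game 2nite"]

# Channel 1: word tokens (assumed whitespace tokenization).
word_seqs = [m.split() for m in messages]
# Channel 2: character bigrams over the raw message text.
bigram_seqs = [char_bigrams(m) for m in messages]

word_vocab = build_vocab(word_seqs)
bigram_vocab = build_vocab(bigram_seqs)

# Fixed-length id sequences, one per channel, ready for embedding lookup.
word_channel = [encode(s, word_vocab, 5) for s in word_seqs]
bigram_channel = [encode(s, bigram_vocab, 20) for s in bigram_seqs]
```

In a multi-channel setup of the kind described above, each id sequence would index its own embedding table (pre-trained for words, learned for bigrams), and the convolutional features from both channels would be combined before classification.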