Abstract: With the rise of web-based social networking, a great many short texts and micro-messages are exchanged daily. Although such messages are a powerful and efficient way for individuals to communicate, their anonymity and short length pose a real challenge for author identification studies. In this paper, we tackle the problem of author identification for short texts using Convolutional Neural Networks (CNNs). Specifically, we present a novel Multi-Channel CNN architecture that processes different features of the text through word, character, and part-of-speech (POS) embeddings. We examine the usefulness of the different feature types and show that the combination of embeddings can capture different stylometric features. In addition, we add an identity mapping block to the convolutional layer to preserve as much information from the features as possible. We conducted extensive experiments with our proposed architecture, varying both the number of authors and the number of writing samples per author. In these experiments, our proposed method outperforms the state-of-the-art system on a large Twitter dataset.
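As a rough illustration of the multi-channel idea described in the abstract, the following pure-Python sketch runs word, character, and POS id sequences through separate embedding tables and convolutions, adds a skip connection, and concatenates the pooled results. Everything here is an assumption made for illustration: the function names, the dimensions, the random untrained weights, and in particular the reading of the "identity mapping block" as a residual-style connection, ReLU(conv(x) + x). It is not the authors' implementation.

```python
import random

def make_params(vocab_size, dim, width, rng):
    """Random embedding table and conv kernel (hypothetical sizes, untrained)."""
    table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
    kernel = [[[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
              for _ in range(width)]
    return table, kernel

def embed(ids, table):
    """Look up one embedding vector per id."""
    return [table[i] for i in ids]

def conv1d_same(seq, kernel):
    """1-D convolution over time, zero-padded so output length equals input length.
    seq: T vectors of size d; kernel: k matrices of shape d x d (k odd)."""
    T, d, k = len(seq), len(seq[0]), len(kernel)
    pad = k // 2
    zero = [0.0] * d
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(T):
        vec = [0.0] * d
        for j in range(k):
            x, W = padded[t + j], kernel[j]
            for o in range(d):
                vec[o] += sum(W[o][i] * x[i] for i in range(d))
        out.append(vec)
    return out

def identity_block(seq, kernel):
    """Assumed form of the identity mapping: ReLU(conv(x) + x)."""
    conv = conv1d_same(seq, kernel)
    return [[max(0.0, c + x) for c, x in zip(cv, xv)]
            for cv, xv in zip(conv, seq)]

def max_over_time(seq):
    """Keep the largest activation of each feature across all positions."""
    return [max(column) for column in zip(*seq)]

def multi_channel_features(inputs, params):
    """Concatenate the pooled features of every channel into one vector."""
    feats = []
    for name in ("word", "char", "pos"):
        table, kernel = params[name]
        feats += max_over_time(identity_block(embed(inputs[name], table), kernel))
    return feats

# Toy usage: the ids stand in for a tokenised, POS-tagged short message.
rng = random.Random(0)
DIM = 4
params = {
    "word": make_params(50, DIM, 3, rng),
    "char": make_params(30, DIM, 3, rng),
    "pos": make_params(12, DIM, 3, rng),
}
inputs = {"word": [3, 17, 42], "char": [1, 5, 5, 9, 2, 7], "pos": [0, 4, 11]}
features = multi_channel_features(inputs, params)
```

The resulting `features` vector has 3 × `DIM` entries, one pooled block per channel; in a full model it would feed a classifier over candidate authors. Keeping the channels separate until the concatenation lets each embedding type contribute its own stylometric signal.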