Abstract: Style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The assumption underlying the AV training task (same author approximates same writing style) enables self-supervised and, thus, extensive training. However, AV usually does not control for topic, or does so only at a coarse-grained level. The resulting representations might therefore encode topical information in addition to, or instead of, style. We introduce a variation of the AV training task that controls for topic, using conversation or domain as a topic proxy (alongside a no-topic-control baseline). To evaluate whether trained representations prefer style over topic information, we propose an original variation of the recent STEL framework. We find that representations trained by controlling for conversation are better than representations trained with domain or no topic control at representing style independently of topic.
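The conversation-controlled AV setup described in the abstract can be sketched as contrastive pair construction: positive pairs share an author, and topic is held roughly constant by drawing both texts of a pair from the same conversation. The sketch below is an illustration of that idea only; the data layout (a list of dicts with `author`, `conversation`, and `text` keys) and the function name are assumptions, not the paper's released code.

```python
import random

def make_av_pairs(utterances, seed=0):
    """Build same-author (label 1) / different-author (label 0) pairs
    for authorship verification, using the conversation as a topic
    proxy: both texts in a pair come from the same conversation, so
    the remaining learnable signal is (approximately) style.

    `utterances`: hypothetical list of dicts with keys
    'author', 'conversation', 'text'.
    """
    rng = random.Random(seed)

    # Group utterances by conversation (the topic proxy).
    by_conv = {}
    for u in utterances:
        by_conv.setdefault(u["conversation"], []).append(u)

    pairs = []
    for conv_utts in by_conv.values():
        # Within a conversation, group texts by author.
        by_author = {}
        for u in conv_utts:
            by_author.setdefault(u["author"], []).append(u["text"])
        authors = list(by_author)

        # Positive pairs: same author, same conversation.
        for a in authors:
            if len(by_author[a]) >= 2:
                t1, t2 = rng.sample(by_author[a], 2)
                pairs.append((t1, t2, 1))

        # Negative pair: different authors, same conversation,
        # so the negative differs in style but not (much) in topic.
        if len(authors) >= 2:
            a1, a2 = rng.sample(authors, 2)
            pairs.append((rng.choice(by_author[a1]),
                          rng.choice(by_author[a2]), 0))
    return pairs
```

Because negatives also come from the same conversation, a model trained on these pairs cannot rely on topical cues to separate them, which is the motivation behind the conversation-level control. Swapping the grouping key from `conversation` to a domain label yields the coarser domain-control variant.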
Paper Type: long