Improving event representation learning via generating and utilizing synthetic data

Yubo Feng

Published: 30 Jun 2025, Last Modified: 20 Feb 2025. Information Processing & Management. CC BY 4.0

Abstract: Event representations are important for a wide range of event-related tasks. Recent advances in event representation learning have centered on contrastive learning (CL), yielding remarkable progress. However, relying solely on dropout as the data augmentation technique in CL methods may make the model sensitive to length differences between event pairs. Moreover, CL methods ignore the evidence that similarities differ across positive pairs and that these encoder-aware similarities change dynamically as training progresses. This may cause the event encoder to learn the alignment of positive pairs only at a coarse-grained level. In this paper, we propose LLM-CL, a Large Language Model-driven self-adaptive contrastive learning framework for event representation learning. Specifically, we present an event knowledge graph-augmented synthetic data generation method designed to alleviate the sensitivity of CL-based models to length differences between event pairs. This method generates large-scale, high-quality event pairs with equivalent semantics, little lexical overlap, and varying text lengths. Additionally, we propose a novel CL method, called self-adaptive contrastive learning, that helps the event encoder learn the alignment of synthetic data effectively, efficiently, and at a fine-grained level. This method dynamically estimates encoder-aware similarities and scales the CL losses accordingly. Experimental results show that LLM-CL outperforms strong baselines in both intrinsic and extrinsic evaluations.
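The idea of scaling CL losses by encoder-aware similarity can be illustrated with a minimal sketch, assuming a SimCSE-style InfoNCE objective with in-batch negatives over pre-computed embeddings. The abstract does not say how the estimated similarities are turned into loss scales, so the weighting below (each pair's weight proportional to 1 - cos(anchor, positive), normalized to mean 1, so poorly aligned pairs count more) is a hypothetical choice for illustration and not the LLM-CL formulation. The function names `info_nce_losses`, `adaptive_weights`, and `self_adaptive_loss` and the temperature value 0.05 are likewise assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; the small floor avoids division by zero for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def info_nce_losses(anchors: List[Vector], positives: List[Vector],
                    temperature: float = 0.05) -> List[float]:
    """Per-pair InfoNCE losses: positives[i] is the positive for anchors[i],
    and every other positives[j] in the batch acts as an in-batch negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return losses


def adaptive_weights(anchors: List[Vector], positives: List[Vector]) -> List[float]:
    """HYPOTHETICAL weighting (not taken from the paper): pairs the current
    encoder aligns poorly (low cosine similarity) get larger weights.
    Weights are normalized to mean 1 so the overall loss scale is preserved."""
    raw = [max(0.0, 1.0 - cosine(a, p)) for a, p in zip(anchors, positives)]
    total = sum(raw)
    if total <= 0.0:  # every pair already perfectly aligned: fall back to uniform
        return [1.0] * len(raw)
    return [len(raw) * r / total for r in raw]


def self_adaptive_loss(anchors: List[Vector], positives: List[Vector],
                       temperature: float = 0.05) -> float:
    """Batch loss: each pair's InfoNCE term scaled by its adaptive weight."""
    losses = info_nce_losses(anchors, positives, temperature)
    # In a real training loop these weights would be treated as constants
    # (no gradient) and recomputed from the current encoder at every step.
    weights = adaptive_weights(anchors, positives)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

For example, with anchors `[[1.0, 0.0], [0.0, 1.0]]` and positives `[[1.0, 0.0], [1.0, 1.0]]`, the first pair is already perfectly aligned and receives weight 0, while the second receives weight 2. Because the weights are recomputed from the encoder's current embeddings at every step, they track the encoder as training progresses, which is the dynamic behavior the abstract describes; the specific weighting function is the part most likely to differ from the paper.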