Revealing the Inevitable Infection of Semantic Similarities in Understanding Emotional Dialogues in Foundation Models
Keywords: Foundational Models, Semantic Similarity, Artificial General Intelligence, Subpopulation Attacks
Abstract: Semantic textual similarity is deeply rooted in natural language studies, where the focus lies on conveying meaning rather than syntactic structure. Foundation models (FMs), renowned for their adeptness at capturing semantic nuances, are anticipated to discern the underlying meaning of inputs, including the nuanced understanding of emotions conveyed within dialogues. What if FMs are fine-tuned with predetermined responses for a specific emotion in emotional conversations? Will the semantic similarity of neighboring emotions impact the model's performance? To this end, using the emotional conversations with FMs as a testbed, we apply the framework of subpopulation data poisoning attacks, modifying the training data to create predetermined toxic responses. This enables us to assess whether FMs would still be influenced by semantic similarities in emotional inputs, leading to toxic responses that rely on semantic cues rather than effectively learning the characteristics from the selected emotion in the training data. Our experiments suggest that there appears to be a notable influence of semantic similarities in FMs, where toxic responses are triggered not only by predetermined emotion categories but also by their semantically similar ones. These nuanced behaviors underscore the intricate nature of semantic understanding in FMs and highlight the impact of semantic similarities, even in a predefined setting aimed at altering model outputs intentionally. Based on these findings, we further discuss the challenges impeding FMs from achieving artificial general intelligence (AGI), emphasizing the difficulty of achieving a fine-grained understanding of the nuanced meanings.
Submission Number: 35
Loading