SNS-Bench: Defining, Building, and Assessing Capabilities of Large Language Models in Social Networking Services

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY-NC-SA 4.0
Abstract: With the rapid advancement of Social Networking Services (SNS), the need for intelligent and efficient interaction within diverse platforms has become more crucial. Large Language Models (LLMs) play an important role in SNS, as they have the potential to revolutionize user experience, content generation, and communication dynamics. However, recent studies focus on isolated SNS tasks rather than a comprehensive evaluation. In this paper, we introduce SNS-Bench, a benchmark specifically constructed to assess the abilities of large language models across different Social Networking Services, covering a wide range of SNS-related information. SNS-Bench encompasses 8 different tasks, such as note classification, query-content relevance, and highlight word generation in comments. In total, SNS-Bench includes 6,658 questions based on social media text, comprising subjective, single-choice, and multiple-choice questions. Further, we evaluate the performance of more than 25 current, diverse LLMs on SNS-Bench. Models of different sizes exhibit performance variations, yet adhere to the scaling law. Moreover, we hope to provide more insights to revolutionize the techniques of social network services with LLMs.
Lay Summary: Social media platforms like Twitter, Instagram, and TikTok are part of our everyday lives. As these platforms grow, it becomes important to make interactions on them smarter and more helpful. Large language models (LLMs) — the same kind of AI behind tools like ChatGPT — could greatly improve how we create content, understand posts, and connect with others online. But until now, most research has only tested these models on narrow tasks, not across the full range of real-world social media needs. We built a new test called SNS-Bench to evaluate how well these AI models perform on different types of social media tasks, like figuring out what a post is about, checking if a comment matches a search, or picking out key words. Our benchmark includes over 6,600 questions and covers eight tasks. By testing more than 25 popular models, we show where they succeed and where they fall short — helping guide better design for future social media tools.
Primary Area: Deep Learning->Large Language Models
Keywords: Benchmark, Social Network Services, Large Language Models
Submission Number: 5997