{
  "title": "Research Briefing: Social Media User Simulator",
  "turns": [
    {
      "turn": 1,
      "user": "From now on, you're my AI research topic guide assistant. Keep in mind that you've published many papers at WWW, WSDM, KDD, SIGIR, CIKM, ACL, EMNLP, NAACL and other NLP and data mining venues. This material is the research-topic deck I showed my professor at our meeting this week. Before we discuss this topic together, give me a briefing first.",
      "assistant": "Yes, I reviewed your slides. I think it's a very interesting topic that aligns well with our lab's research direction. Before we dive into the discussion, let me give a quick briefing on what I understood.\n\n* * *\n\nResearch Topic Briefing: SNS User Simulator and Evaluation\n-----------------------------\n\nThe core of this work is to develop a next-generation SNS user simulator that can precisely measure **\"How similar is the behavior to real users?\"**\n\n### Motivation and Problem Statement\n\nLLM-based user simulators are becoming important in areas like personalization, recommendation, and search, but they face two major limitations.\n\n1.  **Simplistic simulation:** They fail to capture the complex, nuanced behavior patterns of real users and tend to rely on ad-hoc personas.\n2.  **Lack of reliable evaluation:** There is a shortage of robust evaluation methodology to verify how well a simulator imitates real users.\n\nTo address this, the study proposes a **user-centric high-fidelity (high-fidelity) evaluation framework** that directly involves real users in the evaluation process to precisely assess the simulator's credibility.\n\n* * *\n\n### Key Contributions (Contribution)\n\nThis work aims for three main contributions.\n\n1.  **Building a multi-action user history benchmark:** Beyond simple watch logs, construct a high-quality benchmark that includes a user's explicit goal (objective) and sequential interactions such as 'Like', 'Subscribe', and 'Comment'.\n2.  **Proposing a holistic LLM-based user simulator:** Beyond simple persona injection, propose a simulator that, based on real user history and session goals, simultaneously performs 'video selection' and 'multi-action' generation.\n3.  **Introducing a user-centric high-fidelity evaluation protocol:** The actual user who was simulated directly evaluates the simulation outputs. 
This enables precise measurement of the simulator's **'Personalization Fidelity'**.\n\n* * *\n\n### Methodology\n\nThe central task is to have the LLM imitate a real user's video and action sequences.\n\n*   **Input:** User profile (age, interests, etc.), viewing goal, past video viewing sequence, and actions for each video (Like, Comment, etc.).\n*   **Output:** The LLM generates a new video sequence and decides actions for each video based on the given information.\n\n#### Evaluation Protocol\n\nEvaluation proceeds in two stages.\n\n1.  **Real-user-based evaluation:** The real user being simulated directly rates, on a 1–5 scale, \"Would I have selected this video?\" and \"Would I have taken this action?\"\n2.  **Automated evaluation model:** Use the user ratings from step 1 to train an evaluation model. This model predicts user assessments for new generations; once its reliability is validated against the real ratings, it can automate and scale evaluation.\n\n#### Dataset Construction\n\nWe plan to design our own user study to build the dataset.\n\n*   **Profile and goal definition:** Through a survey, define the user's YouTube usage goals; collect a preference list and a free-form self-introduction to build the user profile.\n*   **Data collection demo:** The user selects a session goal, the system recommends videos, and we log all interactions during viewing (watch time, Like, Subscribe, Comment, etc.) to collect data.\n\n* * *\n\nOverall, the problem is well-defined, and the contributions and experimental design are logically structured. In particular, having the actual user directly evaluate their own simulation results (the 'Personalization Fidelity' measurement) could be the strongest differentiator.\n\nSo, shall we dive into the discussion points you raised on the last slide, one by one?"
    },
    {
      "turn": 2,
      "user": "The task I’m trying to do now is to have an LLM simulator generate YouTube video viewing sequences and see how well it performs the actions for each generated video like the actual user. But with the current level of task formulation I’ve written, my advisor says it’s not challenging and not practical. And he asked who would use this data if we collected and released it. In other words, if we publish a study that simulates and evaluates video watching like this, what can we do next? So I want to keep building the LLM simulator and providing the evaluation framework the same, but find another appropriate task. What could it be? It should be general, familiar to people, and suitable for collecting data. Please exclude recommender systems! That field is already being studied a lot.",
      "assistant": "Yes, your advisor’s feedback is very sharp and on point. To raise the impact, we shouldn’t stop at building the “tools” (the simulator and the evaluation framework) and saying “we built them well,” but rather demonstrate, “with these tools, we can tackle important, hard problems.”\n\nKeeping the core of your work (LLM simulator + user-centered evaluation) while excluding recommender systems, here are several tasks that are both more challenging and practical. The key is to shift the current “simple sequence generation” **open-loop(Open-loop)** Task into a goal-driven agent that interacts with a system to achieve objectives in a **closed-loop(Closed-loop)** Task.\n\n* * *\n\n### Alternative task proposals (excluding recommender systems)\n\n#### 1. Goal-directed simulation for evaluating Personalized Search\n\n*   **Task concept:** Simulate a user who is trying to find specific information or videos and interacts with a search engine. This concretizes the “Personalized Web Search Agent” mentioned in your slides.\n*   **How does it work?**\n    1.  Give the simulator a clear **information-seeking goal**. (e.g., “Find trustworthy review videos about the iPhone 17’s camera performance.”)\n    2.  Based on the goal, the simulator generates \\*\\*query(Query)\\*\\* terms to enter into the YouTube search box.\n    3.  Given the results (a list of videos), the simulator decides which video to “click” by looking at titles, thumbnails, etc.\n    4.  If the results aren’t satisfactory, it performs \\*\\*query reformulation(Query Reformulation)\\*\\* and searches again. (e.g., “아이폰 17 카메라 리뷰” -> “iPhone 17 pro max camera deep dive review”)\n    5.  
Repeat until the goal is achieved or the user “gives up,” logging all interactions (queries, clicks, final satisfaction).\n*   **Why is this better?**\n    *   **Challenging:** Goes beyond simple sequence generation to require **query generation, result assessment, and query reformulation**, which is more complex, intelligent behavior.\n    *   **Practicality and data users:** This data is highly valuable for **evaluating and improving personalized search**. Without recruiting large numbers of real users, search teams can test engines with diverse simulated personas as a virtual testbed. Search engine developers would want this.\n    *   **What’s next?:** Leads directly to developing better personalized search models.\n\n#### 2. Simulating user adaptation to UI/UX changes\n\n*   **Task concept:** When a platform like YouTube changes its UI, simulate how user behavior patterns change and how quickly users adapt.\n*   **How does it work?**\n    1.  First, the simulator learns from user behavior logs in the existing UI (UI ‘A’).\n    2.  Give the simulator a goal. (e.g., “Subscribe to the channel I’m watching now.”)\n    3.  Then present a new UI (UI ‘B’) where, for example, the “Subscribe” button’s position or shape has changed.\n    4.  Measure how much the simulator struggles (time, misclicks) and how satisfaction changes until it achieves the goal in the new UI.\n*   **Why is this better?**\n    *   **Challenging:** You must model users’ mental models of interfaces and their adaptation process to change. Incorporating a Vision-Language Model (VLM) makes it even more challenging.\n    *   **Practicality and data users:** The primary audience is **UI/UX designers and product managers**. Before shipping to real users, they can rapidly and cheaply predict the impact of a new design, using the simulator as an **A/B testing simulator**.\n    *   **What’s next?:** Can evolve into AI that proposes better UI designs by predicting user behavior.\n\n#### 3. 
Simulating opinion formation and comment interactions on video content\n\n*   **Task concept:** After watching a specific (potentially contentious) video, the simulator writes a new comment and participates in discussion, considering its persona and the context of existing comments. This deepens the “comment” action you mentioned.\n*   **How does it work?**\n    1.  Give the simulator a persona (e.g., “a technically savvy critic of the video,” “a devoted fan”) and a specific video.\n    2.  The simulator understands the video content and reads existing top comments.\n    3.  In line with its persona and the existing flow, it generates a comment that agrees, counters, or brings a new perspective.\n*   **Why is this better?**\n    *   **Challenging:** Requires high-level NLU/NLG—video understanding, grasping multi-party social context, and persona-consistent opinion generation.\n    *   **Practicality and data users:** Useful for **Trust and Safety teams and social science researchers**. Enables research on misinformation spread, advanced toxic comment detection, and opinion dynamics—socially important problems.\n    *   **What’s next?:** Can lead to AI moderators that foster healthy discourse, or models that predict opinion shifts in specific communities.\n\nAll three tasks answer “Who will use this data?” and “What comes next?” They also mirror familiar everyday behaviors—searching, using UIs, and commenting—making data collection feasible."
    },
    {
      "turn": 3,
      "user": "Find some in more diverse domains.",
      "assistant": "Sure, sounds good. Beyond the specific platform of YouTube, I'll suggest tasks across a variety of domains from a more fundamental human-computer interaction (HCI) perspective. These tasks require modeling users' complex decision-making processes, so they are much more challenging and have clear applications across industries.\n\n* * *\n\n### LLM simulator task proposals for diverse domains\n\n#### 1\\. E-commerce domain\n\n*   **Task concept: Simulating goal-directed shopping under budget constraints**\n    *   Simulate a \"shopping mission\" where a user must purchase multiple items within a limited budget. This goes far beyond single-item recommendation and is a composite task.\n*   **How does it work?**\n    1.  **Assign mission/goal:** Give the simulator a concrete mission. (e.g., \"Buy all the furniture and household items needed for a studio apartment with a 500,000 KRW budget\")\n    2.  **Interaction:** In a virtual e-commerce site, the simulator repeats the following behaviors:\n        *   **Search:** Look up needed items like \"desk,\" \"chair,\" and \"lamp.\"\n        *   **Compare & evaluate:** Compare price, brand, user reviews, and detailed specs to choose items to add to the cart.\n        *   **Budget management:** Track the cart total and remaining budget in real time. If over budget, make **trade-off (Trade-off)** decisions such as swapping an item for a cheaper one or removing a lower-priority item.\n    3.  **End:** The session ends successfully when it purchases all (or most) items required by the mission within budget.\n*   **Why is this good?**\n    *   **Challenging task:** Requires high-level cognition such as **Planning, Constraint Satisfaction, evaluating multiple alternatives, and dynamic decision-making**.\n    *   **Practicality and data usage:** E-commerce companies can use this simulator to deeply understand the **customer decision journey**. 
It is highly valuable for business-critical tests like pricing policy, promotion strategy, and improving search/filter features.\n    *   **\"What's next?\":** Leads directly to developing an AI shopping assistant that composes the optimal shopping list given a user's budget and preferences.\n\n#### 2\\. Gaming domain\n\n*   **Task concept: Simulating new players' game learning process and measuring tutorial effectiveness**\n    *   The simulator plays the role of a complete novice, going through the game's tutorial or early stages to measure how well it learns core systems.\n*   **How does it work?**\n    1.  **Assign learning goals:** Give the simulator a target system to learn. (e.g., \"In the combat system, master 'parry' and 'counter' techniques\")\n    2.  **Interaction:** In a virtual game environment, it behaves as follows:\n        *   Perceives tutorial instructions (text, icons, etc.) and attempts the corresponding controls.\n        *   If instructions are unclear or difficult, it makes mistakes or enters a state of **confusion**.\n        *   After multiple attempts, it shows a pattern of success; we measure how quickly and accurately it ultimately \"masters\" the skill.\n    3.  **End:** When the success rate for the given skill exceeds a threshold, learning is considered complete and the session ends.\n*   **Why is this good?**\n    *   **Challenging task:** Connects textual/visual instructions to actual actions and models the **Trial-and-Error learning process** itself.\n    *   **Practicality and data usage:** Game studios invest heavily in improving tutorials to reduce **player churn**. This simulator lets them rapidly and iteratively test and improve tutorials without continuously recruiting real users.\n    *   **\"What's next?\":** Contributes to developing **adaptive tutorial systems** that dynamically adjust difficulty or provide personalized help based on the player's learning speed.\n\n#### 3\\. 
Education (E-learning) domain\n\n*   **Task concept: Simulating problem solving and knowledge acquisition in an online learning environment**\n    *   The simulator acts as a student who watches online lectures and solves practice problems to master a specific concept.\n*   **How does it work?**\n    1.  **Assign learning goals:** Give the simulator a learning objective. (e.g., \"Understand the quadratic formula and solve problems\")\n    2.  **Interaction:** On a virtual learning platform (LMS), it behaves as follows:\n        *   \"Watches\" concept-explainer videos.\n        *   While solving practice problems, it sometimes submits **incorrect answers**.\n        *   Infers the cause of the mistake and makes strategic decisions such as **rewatching specific video segments, requesting hints, or tackling easier prerequisite problems first**.\n    3.  **End:** Learning ends when the accuracy on the practice set reaches the target level.\n*   **Why is this good?**\n    *   **Challenging task:** Models **metacognition** (assessing what one knows and doesn't know) and the choice of optimal actions to fill knowledge gaps.\n    *   **Practicality and data usage:** Edtech companies can use it to validate the effectiveness of learning content and systems, answering questions like \"Where do students typically get stuck?\" and \"Which hints are most effective?\"\n    *   **\"What's next?\":** Plays a decisive role in evaluating and advancing **Intelligent Tutoring Systems**, which identify each student's weaknesses and recommend optimal learning paths."
    },
    {
      "turn": 4,
      "user": "What I'm thinking of is a User Simulator. Anything other than those domains? I mean, focus on web/online/SNS.",
      "assistant": "Got it. You want to focus more on the core domains for this research—**web, online, and SNS**—rather than e-commerce or gaming. Sounds good. I'll propose new tasks, different from the earlier ones, that let you dig deep into these domains while meeting the 'challenge' and 'practicality' your professor pointed out.\n\nThe key is still to go beyond simple behavior imitation and simulate the user's \\*\\*'goals' and 'cognitive process'\\*\\*.\n\n* * *\n\n### Advanced Task Proposals for Web/Online/SNS Domains\n\n#### 1\\. Information credibility evaluation and belief formation simulation (directly tied to misinformation research)\n\n*   **Task concept:** The simulator encounters a social media feed where **true, false, and biased information are mixed**, and we track how it forms and updates its opinions (beliefs).\n*   **How does it work?**\n    1.  **Topic and initial belief setup:** Give the simulator a neutral initial belief on a contentious topic (e.g., \"Is AI-generated art actually art?\").\n    2.  **Interaction (feed consumption):** It scrolls through a simulated SNS feed (news articles, expert columns, ordinary users' tweets, etc.) and 'reads' each post.\n    3.  **Cognitive behavior modeling:** For each post, the simulator makes judgments such as:\n        *   **Credibility assessment:** \"Is this source trustworthy?\" (1–5)\n        *   **Stance detection:** \"Is this post positive or negative about the topic?\"\n        *   **Action decision:** \"Do I 'like' this, 'share' it, or 'ignore' it?\"\n    4.  **Belief update:** After each interaction, it updates its internal 'belief' state based on the post's credibility and stance.\n*   **Why is this good?**\n    *   **Challenging task:** Models complex cognition like source evaluation, bias detection, and gradual belief formation. 
It directly addresses how users are influenced by 'fake news' or 'propaganda'.\n    *   **Practicality and data use cases:** Becomes a valuable dataset for **Trust & Safety teams at platforms, media/journalism researchers, and fact-checking organizations**. It's a 'virtual laboratory' to test which information types most affect user beliefs and which interventions (e.g., stronger source labeling) are effective.\n    *   **\"What's next?\":** Predict the impact of platform UI/UX interventions to curb misinformation, or develop AI that delivers personalized media literacy education.\n\n#### 2\\. Goal-driven content curation and creation simulation\n\n*   **Task concept:** The simulator takes a clear goal and collects, processes, and reorganizes scattered web/SNS content to produce a finished artifact (e.g., a blog post or a YouTube playlist), simulating a **creative workflow**.\n*   **How does it work?**\n    1.  **Creation goal:** Give a concrete goal. (e.g., \"Become a travel YouTuber and make a '3-day, 2-night Busan trip itinerary' playlist\" or \"Be an IT blogger and write a roundup post on 'AI technologies to watch at CES 2025'\")\n    2.  **Interaction (creation process):** On a virtual platform, the simulator performs steps like:\n        *   **Gathering materials:** Acquire relevant videos, images, and text via web/YouTube search.\n        *   **Selection and processing:** Pick the most suitable items for the goal and refine them.\n        *   **Structuring and publishing:** Decide playlist order and titles, or draft a blog post structure (intro-body-conclusion) and write the text to 'publish' the final output.\n*   **Why is this good?**\n    *   **Challenging task:** Goes beyond clicks to model the high-level, structured activity of **creation**. 
Requires multi-step planning and information reorganization, with complex final outputs.\n    *   **Practicality and data use cases:** Provides crucial data for **content platform product managers (e.g., Medium, Velog, Pinterest)** to understand creation workflows and build better editing tools or assistive features. Helps analyze \"At which steps do users get stuck most when creating content?\"\n    *   **\"What's next?\":** Directly leads to AI-powered creation assistants that understand user goals, recommend materials, or help draft content.\n\n#### 3\\. Dynamic privacy settings and trade-off simulation\n\n*   **Task concept:** As the simulator uses SNS/apps, it faces various privacy-setting situations and decides how to trade off between **'convenience' and 'security'**.\n*   **How does it work?**\n    1.  **Assign a privacy persona:** Give it a disposition (e.g., \"open sharer who shares everything\", \"highly sensitive security-first\", \"pragmatist who shares only when needed\").\n    2.  **Interaction (decision scenarios):** Present a sequence of choices, such as:\n        *   \"A new app requests access to your location. Allow? (If allowed, the 'nearby restaurants' feature works)\"\n        *   \"You're about to post a photo. Who can see it? (Public / Friends only / Only me)\"\n        *   \"A friend tagged you in an unwanted photo. What do you do? (Remove tag / Ask friend to delete / Ignore)\"\n    3.  **Log and evaluate decisions:** Record all choices and analyze per persona what decisions are made.\n*   **Why is this good?**\n    *   **Challenging task:** Models latent psychological processes like a user's **risk assessment and utility calculation**. It addresses behavior that weighs invisible values (privacy) against immediate benefits (feature use).\n    *   **Practicality and data use cases:** Very important for **platform UI/UX designers, privacy officers, and policy makers**. 
When a new privacy policy or consent UI ships, you can pre-test how well users understand it and whether they act as intended.\n    *   **\"What's next?\":** Build toward a 'personalized privacy guardian' AI agent that identifies a user's privacy disposition to suggest tailored defaults or warn about apps requesting risky permissions."
    }
  ]
}