{
  "title": "CHI Paper Guide: Metaverse AI Assistant",
  "turns": [
    {
      "turn": 1,
      "user": "> A Context-...age Models - PDF\n\nFrom now on, you're my paper-reading helper. I'm planning to submit a paper to the CHI conference. So please read the following paper and explain it to me in detail in a way that's easy to understand. Remember that I'm a beginner in this field!",
      "assistant": "Hello! So you're preparing to submit a paper to CHI. What a fantastic goal. Since you mentioned you're a beginner in this area, I'll be your paper-reading helper and explain this paper clearly and in detail. As we go through it together, we'll look at what makes a strong HCI (Human-Computer Interaction) study and how the paper is structured.\n\nThe title of this paper is **\"A Context-Aware Onboarding Agent for the Metaverse Based on Large Language Models (LLMs)\"**.\n\nIn plain terms, **\"they built an AI assistant (agent) that understands a user's context so that newcomers to the metaverse can adapt smoothly\"**.\n\n### Overall flow of the paper (a classic CHI-style study)\n\nPapers submitted to top-tier venues like CHI usually follow a flow like this. This paper is an excellent example of that structure.\n\n1.  **Problem statement**: Why is this research needed? (Real-world pain point)\n2.  **Formative user study**: Observe users to understand the problem more deeply. (Formative Study)\n3.  **Solution design**: Design a system (solution) based on lessons from the user study. (PICAN system)\n4.  **System evaluation**: Rigorously test whether the system actually works. (Pipeline Evaluation, User Study)\n5.  **Discussion and conclusion**: Present the implications, limitations, and future directions. (Discussion, Conclusion)\n\nNow let's look at each stage in detail.\n\n* * *\n\n### Step 1: What problem are they solving? (Introduction)\n\n*   **The problem**: Metaverses (virtual worlds such as Roblox and Zepeto) allow free-form exploration without a fixed sequence. 
That's great, but first-time visitors often get lost and don't know what to do.\n*   **Limitations of existing approaches**: Traditional onboarding (e.g., linear tutorials, NPCs that only point to specific spots) doesn't reflect personal interests or the user's current situation, so it's less effective.\n*   **Research goal**: This study aims to build an AI NPC that understands the user's **context** and provides personalized assistance.\n\nBy \"context,\" they mean two main things:\n\n1.  **Short-term spatiotemporal context**: The user's current location, recent conversation history, and most recent actions (what's happening right now).\n2.  **Long-term exploration context**: The user's past activity history across the metaverse (what they've already tried).\n\n* * *\n\n### Step 2: Understand users first! (Formative Study)\n\nGood HCI research doesn't start with building; it starts with deeply understanding users. This paper uses the **Wizard-of-Oz** method.\n\n*   **What is a Wizard-of-Oz study?**\n    *   A human pretends to be the AI from behind the scenes (the \"Wizard's Room\") and answers user questions.\n    *   Participants (16 people) believe they're talking to a real AI while they explore a metaverse (Virtuoville).\n    *   This reveals what users would ask, and what help they would actually need, if a perfect AI helper existed.\n*   **Findings (What did they learn?)**\n    1.  **Importance of short-term context**: Users often asked questions with deictic words like \"here\" and \"this\", e.g., \"Where is this place?\" or \"How do I use this?\" (24.26% of all questions). So the AI must understand spatial context (current location), conversational context (previous turns), and action context (recent actions).\n    2.  **Importance of long-term context**: To answer recommendation-type questions like \"What should I do now?\", the AI should avoid suggesting things already done and propose new activities. 
That requires knowing the user's past activity history (long-term exploration context).\n    3.  **Need for proactive conversation**: When users seem lost, the AI should proactively reach out and offer help.\n\n* * *\n\n### Step 3: Design and build the solution (PICAN system)\n\nBased on insights from the user study, they designed and built a system called **PICAN**. PICAN is an LLM-based system (using GPT-4) with a two-stage pipeline.\n\n_Figure 3: The overall architecture of the PICAN system. When a user's question comes in, it generates a response using short- and long-term context._\n\n*   **Stage 1: Reference Resolution (leveraging short-term context)**\n    *   When a user asks \"What's this?\", PICAN first checks the user's current location, nearby objects, and recent action logs.\n    *   From that, it infers that \"this\" refers to, say, an \"apple tree,\" and rewrites the question more explicitly as \"What is this apple tree?\"\n    *   This makes it possible for the LLM to generate an accurate answer.\n*   **Stage 2: Suggestion Generation & Response Rewording (leveraging long-term context)**\n    *   **For information-seeking questions**: It augments the base answer from Stage 1 (e.g., \"This is an apple tree\") with the user's past experience.\n        > Example: \"This is **the very apple tree you planted earlier**!\"\n    *   **For recommendation questions** (\"What should I do next?\"): It prioritizes activities that the user hasn't tried yet and that are near the current location.\n\n* * *\n\n### Step 4: Evaluate the system (Pipeline Evaluation & User Study)\n\nCHI papers don't stop at building; they rigorously verify effectiveness. This study conducts two evaluations.\n\n#### Evaluation 1: Pipeline evaluation (Ablation Study)\n\n*   **What did they do?** To test whether each component (context module) truly matters, they compared the full system to versions with certain functions removed. 
They generated answers under five conditions and had 20 judges rank them for how \"useful\" and \"immersive\" they were.\n    1.  **PICAN (all features included)**\n    2.  No long-term context\n    3.  No short-term context\n    4.  No context at all\n    5.  Traditional model (Non-LLM)\n*   **Results**:\n    *   Answers that used **short-term context** (PICAN, and the version without long-term context) were rated much more useful and immersive than those that did not.\n    *   **Understanding short-term context (i.e., resolving 'here' and 'this') had the most decisive impact on user experience.**\n\n#### Evaluation 2: User study\n\n*   **What did they do?** They deployed the finished PICAN system to 21 real users, who freely explored the metaverse using it. They then conducted in-depth interviews and performed qualitative analysis of satisfaction and pain points.\n*   **Results**:\n    *   **(What worked well)** Users loved being able to ask comfortably without knowing exact place names, using phrases like \"here\" or \"this building.\" When the AI remembered past actions and used them in answers (e.g., \"That's the farm where you took care of the coffee trees before!\"), it greatly helped users understand locations.\n    *   **(What could be improved)** When the AI gave incorrect information, that error sometimes carried over into subsequent turns (an \"error cascade\"). It also occasionally mentioned past actions irrelevant to the user's intent or over-explained, disrupting the flow of conversation.\n\n* * *\n\n### Step 5: Discussion & Conclusion\n\nWhat did we learn, and what remains for future work?\n\n*   **Contributions**:\n    *   The study empirically shows how crucial **context understanding** is when building AI assistants for complex environments like the metaverse.\n    *   In particular, \"short-term spatiotemporal context\" is essential for natural interaction, while \"long-term exploration context\" strengthens learning and 
personalization.\n*   **Limitations and future work**:\n    *   Good papers clearly state their limitations. This study was conducted in a single metaverse, recruited participants from specific countries and age groups, and suffered from high response latency.\n    *   Future work should explore detecting and recovering from LLM hallucinations, and designing proactive assistance that is smarter yet non-intrusive.\n\n### Summary: Lessons for writing a CHI paper\n\nYou can learn a solid CHI-style research process from this paper.\n\n1.  **Identify a real problem**: Start from a clear issue: \"Newcomers struggle in the metaverse.\"\n2.  **User-centered research**: Before building, observe users to find the true needs and problems.\n3.  **Systematic design**: Use study insights to justify each component of the system (PICAN) with clear logic.\n4.  **Rigorous evaluation**: Show that the system works, and why, with both quantitative (pipeline) and qualitative (user study) evaluations.\n5.  **Balanced discussion**: Be honest about unsatisfactory cases and limitations, and propose future directions based on them.\n\nIf you take cues from this structure and depth, it will help you write a strong paper. If you have more questions, feel free to ask anytime!"
    },
    {
      "turn": 2,
      "user": "Please explain in detail the paper’s motivation, the method for it, and how the experiments for it are structured.",
      "assistant": "Sure. I'll explain in detail the motivation the paper starts from, the method it uses to address it, and how it demonstrates effectiveness through experiments.\n\n### 1\\. Research motivation\n\nThe starting point of this study is: **\"How can we help first-time users in the metaverse overcome the difficulties they face?\"**\n\n*   **Core problem**: Metaverse platforms are highly open-ended with no fixed goals or prescribed order. Because of this, first-time visitors often don't know what to do or where to go, and it's easy to get lost. This is a major reason why users lose interest and leave.\n*   **Limitations of existing approaches**:\n    *   **Fixed tutorials**: They present the same steps to everyone in order, so they cannot reflect individual questions or interests.\n    *   **Simple wayfinding NPCs**: They can only guide you when you already know your destination, so they are useless for novices who don't even have a destination.\n*   **Research direction**: Therefore, this work moves beyond one-size-fits-all approaches and aims to build an AI NPC that can understand the user's **personal context** in real time and provide tailored help. In short, the key motivation is to grasp the user's diverse situations and provide intelligent responses that fit them.\n\n* * *\n\n### 2\\. Research method\n\nTo address the motivation above, the study proceeds in two major steps: **(1) a user study to understand the problem** and **(2) development of the solution system**.\n\n#### (1) Methodology 1: A Wizard-of-Oz study to understand the problem (Formative Study)\n\nBefore full-scale system development, a Wizard-of-Oz study was conducted to ask, \"If we had an ideal AI assistant, how would users behave?\"\n\n*   **Procedure**: A researcher pretended to be the AI behind the scenes and directly answered user questions. 
Participants (16) explored the metaverse (Virtuoville) believing they were conversing with a real AI. The team observed this process and collected data on what kinds of help users needed.\n*   **Key findings**: This study allowed the authors to concretely define the essential capabilities the AI assistant should have, that is, the kinds of context it needs.\n    *   **Short-term spatiotemporal context**: Understanding the user's current location, recent conversation, and just-performed actions.\n    *   **Long-term exploration context**: Remembering and leveraging what the user has done in the past.\n\n#### (2) Methodology 2: Building the solution system 'PICAN'\n\nBased on these findings, **PICAN**, an LLM-based AI pipeline, was developed.\n\n*   **Core architecture**: PICAN operates in two stages.\n    1.  **Stage 1: Reference resolution (leveraging short-term context)**\n        *   When a user asks ambiguous questions like \"Where is here?\" or \"How do I use this?\", the system first analyzes the user's current location (space), recent conversation (dialogue), and recent action logs (behavior).\n        *   This allows it to infer that \"here\" is \"the coffee shop\" and \"this\" is \"the apple tree,\" rewriting the ambiguous question into a clear one.\n    2.  **Stage 2: Suggestion and response rewording (leveraging long-term context)**\n        *   It personalizes answers by consulting the user's past activity history.\n        *   **Information-seeking questions**: It enriches a basic answer like \"This is an apple tree\" with prior experiences, e.g., \"This is **the very apple tree you planted earlier**!\"\n        *   **Recommendation questions**: It prioritizes recommending activities the user hasn't tried yet, especially those near the current location.\n*   **Technology**: It is based on GPT-4 and uses retrieval-augmented generation (RAG), which consults metaverse-related knowledge documents so the AI is less likely to invent inaccurate information.\n\n* * *\n\n### 3\\. 
Experiments\n\nTwo complementary experiments were designed to verify that PICAN is truly effective.\n\n#### (1) Experiment 1: Pipeline evaluation (technical performance verification)\n\n*   **Goal**: Technically verify whether each component (context module) in PICAN actually improves answer quality.\n*   **Design (Ablation Study)**: For 40 real user questions, answers were generated under five different system conditions for comparison.\n    1.  **PICAN (all features ON)**\n    2.  **Long-term context OFF**\n    3.  **Short-term context OFF**\n    4.  **All context OFF**\n    5.  **Existing non-LLM model (Non-LLM)**\n*   **Evaluation**: Twenty raters saw each question with five randomly ordered answers and ranked them by **usefulness** and **immersion**.\n*   **Results**: They statistically demonstrated that **understanding short-term context (the user's immediate situation) plays the most decisive role** in increasing the usefulness and immersion of answers.\n\n#### (2) Experiment 2: User study (real-use experience verification)\n\n*   **Goal**: Beyond technical performance, deeply understand what experiences real users have when using PICAN.\n*   **Design**: Twenty-one new participants were given a metaverse with PICAN and asked to explore freely and converse with the AI for a set period.\n*   **Evaluation**: All interactions during the session were logged, and in-depth interviews were conducted afterward to collect qualitative data about satisfying and unsatisfying experiences.\n*   **Results**: Users responded very positively when the AI remembered their location or past actions and used that in conversation. 
On the other hand, concrete drawbacks also emerged, such as discomfort when the AI misunderstood context and gave irrelevant answers or provided inaccurate information.\n\nIn this way, the paper follows a systematic and persuasive structure: **\"real-world problem identification → problem concretization via user study → solution design and development → rigorous multi-stage experimental validation\"**."
    },
    {
      "turn": 3,
      "user": "I’m looking to analyze CHI papers. Please closely analyze and explain the Formative study and Evaluation of the paper I gave you.",
      "assistant": "Sure, I'll help you analyze CHI papers. As requested, I'll closely analyze the paper's **Formative Study** and **Evaluation** sections, and explain in detail what goals each study had, how they were designed, and what they discovered.\n\nThese two parts are the core of good HCI research, and this paper is an exemplary showcase of that structure.\n\n* * *\n\n### Formative Study Analysis (Section 3)\n\nThe core goal of this stage is to answer **\"What should we build?\"**. In other words, before building anything, you first understand users to set the system's required capabilities and design goals.\n\n#### **1\\. Research Goals and Methodology**\n\n*   **Goal**: To fundamentally understand what value an AI NPC can provide in metaverse onboarding, and what kinds of assistance and contextual information users actually need.\n*   **Methodology**: Adopted a **Wizard-of-Oz** study.\n    *   **Why this choice**: To observe \"how users would interact if a perfect AI assistant existed,\" unconstrained by current AI limitations, and thereby derive ideal requirements for the AI.\n    *   **Procedure**: A researcher played the hidden “wizard,” watching the participant’s screen and audio in real time and typing responses as if they were the AI. Participants (16) believed they were conversing with a real AI while performing tasks.\n\n#### **2\\. Findings and Implications: Deriving the system’s design requirements**\n\nThe observations were distilled into three core requirements the subsequent system (PICAN) must satisfy.\n\n1.  
**Finding 1: Understanding short-term spatiotemporal context is essential.**\n    *   In 24.26% of questions, users used ambiguous deictic terms like “this,” “here,” or “earlier.”\n    *   This means the AI must understand **① current location/nearby objects (spatial context), ② the immediately preceding dialogue (dialogue context), ③ the most recent action (action context)** for smooth communication. This directly motivates PICAN’s Stage 1 (reference resolution) capability.\n2.  **Finding 2: Long-term exploration context is key to personalized responses.**\n    *   Users often asked for suggestions like “What should I do next?” Good suggestions require knowing what they’ve already done (exploration history), so you avoid repeats and propose something new.\n    *   When explaining a new place, linking it to prior experiences (e.g., “the farm where you planted seeds earlier”) helps users understand more easily. This finding underpins PICAN’s Stage 2 (answer personalization) capability.\n3.  **Finding 3: The AI sometimes needs to initiate the conversation.**\n    *   When users seem lost or stuck repeating the same actions, the AI should proactively offer help.\n    *   However, if the user is engrossed in something, interruptions can be intrusive; proactive help must be timed judiciously.\n\n> **Analysis Summary**: This formative study provides a strong rationale for **\"Why is the system designed this way?\"**. It exemplifies the classic HCI workflow of identifying the essence of the problem through user observation and translating it into concrete design requirements.\n\n* * *\n\n### Evaluation Analysis (Sections 5 & 6)\n\nThe goal here is to answer **\"Is our system (PICAN) truly effective?\"**. The paper supports its claims through two systematically designed, complementary evaluations.\n\n#### **1\\. 
Evaluation 1: Pipeline Evaluation (technical performance verification)**\n\n*   **Goal**: To technically verify how much each context module in PICAN independently improves answer quality and how accurately it operates.\n*   **Methodology**: An **ablation study** that removes specific components to measure their contribution.\n*   **Design**: For 40 real user questions, generate answers under five conditions and compare.\n    1.  **PICAN (all features)**\n    2.  **No long-term context (remove long-term context module)**\n    3.  **No short-term context (remove short-term context module)**\n    4.  **No context (remove all context modules)**\n    5.  **Non-LLM baseline (non-LLM model)**\n*   **Measures**: 20 raters saw each question with five randomly ordered answers and ranked them by **usefulness** and **immersion**. They also judged whether context was used correctly.\n*   **Key results**:\n    *   Conditions using **short-term context** (1 and 2) scored markedly higher on usefulness and immersion than those without it (3 and 4).\n    *   This statistically demonstrates that **\"understanding the user's immediate situation is most critical to the quality of the experience\"**.\n\n#### **2\\. 
Evaluation 2: User Study (real-use experience verification)**\n\n*   **Goal**: Beyond controlled rankings, qualitatively understand users’ overall experience interacting with PICAN.\n*   **Methodology**: 21 participants used the metaverse with PICAN for a set time, exploring freely and chatting with the AI; all interactions were logged, followed by in-depth interviews.\n*   **Analysis**: For each context type (spatial, dialogue, long-term exploration), analyze the **satisfactory cases** and **unsatisfactory cases** where the context was used.\n*   **Key results**:\n    *   **(Satisfactory)** Users highly valued being able to ask with ambiguous expressions like “here” or “this building” without knowing exact place names, and appreciated when the AI leveraged past actions (e.g., “the mall where you bought clothes earlier”) to aid understanding.\n    *   **(Unsatisfactory)** When the AI misinterpreted context (e.g., interpreting the scope of 'here' too broadly) or produced inaccurate information, errors could cascade into subsequent turns. Over-referencing irrelevant past actions or over-explaining also disrupted conversational flow.\n\n> **Analysis Summary**: This Evaluation section demonstrates a **multi-faceted and rigorous validation of the claims**. Through the pipeline evaluation it shows **quantitatively \"our system is effective\"**, and through the user study it shows **qualitatively \"why and how it's effective, and what its limitations are\"** in depth. Conducting both quantitative and qualitative evaluations to maximize persuasiveness is a hallmark of CHI papers."
    }
  ]
}