{
  "title": "Paper-Reading Assistant Chat Log",
  "turns": [
    {
      "turn": 1,
      "user": "From now on, you're my paper-reading assistant. I'm planning to submit a paper to the CHI conference. So please read the following paper and explain it to me in detail in an easy-to-understand way. Remember that I'm a beginner in this field!",
      "assistant": "Hello! That's awesome that you're aiming to submit a paper to CHI. The paper you picked, \"Do What I Mean, Not What I Say!\", is an excellent study on how to design systems that understand user intent and context, and it's a great fit for the character of CHI.\n\nSince you're a beginner, I'll walk you through the core ideas and value of this paper step by step in an easy-to-understand way.\n\n### One-line summary of the paper\n\n**\"A story about how to build a smart natural language system that provides data visualizations by grasping what users truly want (intent and context), not just what they say verbatim, and how lessons from real experiments were applied to develop Tableau's 'Ask Data' feature\"**\n\n* * *\n\n### 1\\. Why is this research important? (Background)\n\nWhen we talk to AI assistants like Siri or Google Assistant, we expect to converse naturally, as if speaking to a person, and have them take care of everything. But in reality, the system often interprets our words literally, which can be frustrating.\n\nIt's the same when analyzing data. If analysts say \"show me sales,\" and then \"break this down by region,\" they expect the system to understand that \"this\" refers to the \"sales\" they just saw. But building systems that understand such \"intent\" and \"context\" is very hard.\n\nThis paper tackles the important problem of\n\n**'the gap between users’ expectations and what the system can actually do.'**\n\n### 2\\. What problem was this paper trying to solve? (Research question)\n\nBefore developing Tableau’s natural language-based data analysis feature called 'Ask Data', the research team wanted to answer a fundamental question:\n\n> \"To build an analytical conversational system that satisfies users, to what extent must the system understand \\*\\*'intent'\\*\\* and \\*\\*'context'\\*\\*?\"\n\nThere are two key terms here.\n\n*   **Intent:** The analytical goal a user truly wants to achieve with a given query. 
For example, when they ask, \"What’s the best-selling product?\", their intent often includes not just seeing a list but seeing it \"sorted in descending order.\"\n*   **Context:** The flow of the conversation. The ability to remember prior questions or the current visualization state and interpret the next query accordingly. For example, when the user says, \"Split that by gender,\" the system should know \"that\" refers to the previous visualization.\n\n* * *\n\n### 3\\. How did they study it? A clever 'Wizard of Oz' experiment\n\nTo test a smart AI system that didn’t exist yet, the team used a classic yet highly effective method called the 'Wizard of Oz' experiment.\n\n*   **What is a 'Wizard of Oz' experiment?** Participants think they’re interacting with a computer system, but in fact a \"person behind the curtain (the wizard)\" pretends to be the system, handles the user’s requests, and shows the results. This lets researchers observe users’ expectations and reactions to a future system before actually building it.\n*   **Study design:**\n    *   **Participants:** 41 people with data analysis experience\n    *   **Task:** Given the Titanic survivor dataset, participants freely analyze \"which kinds of passengers were more likely to survive.\"\n    *   **The 'wizard':** An expert Tableau user watched participants’ natural language queries and manually created and displayed the visualizations\n    *   **Four experimental conditions:** They varied how well the system understood 'intent' and 'context' for comparison.\n        1.  **B (Baseline):** The most basic condition. No intent or context; it creates a basic chart solely from the attributes mentioned in the query (not smart).\n        2.  **I (Intent):** Doesn’t remember context, but tries to infer the user’s hidden intent (e.g., sees the word \"top\" and applies sorting).\n        3.  **C (Context):** Doesn’t infer intent, but remembers context such as filters from previous queries.\n        4.  
**CI (Context & Intent):** The smartest condition, understanding both intent and context (the wizard behaves as intelligently as possible).\n\n* * *\n\n### 4\\. What did they find? (Key results)\n\n#### A. The 'Conversational Transitions Model'\n\nDuring the analysis, the team developed a neat model describing the interaction between users and the system (Figure 4 in the paper).\n\n*   Users view and interpret a visualization, then decide what to do next.\n*   At this point, user goals fall into five categories:\n    **Elaborate**, **Adjust/Pivot**, **Start New**, **Retry**, **Undo**.\n*   Based on these goals, the system modifies four aspects of the current visualization state —\n    **Attributes**, **Transform**, **Filter**, **Encode** — to carry the conversation forward.\n\nThis model is an important contribution because it systematically organizes \"what should happen\" in conversational analysis.\n\n#### B. Unexpected but important findings\n\n*   **The worst experience was condition 'C':** One of the most interesting results was that the 'C' condition, which clumsily remembered only context, caused the most confusion and dissatisfaction. Users couldn’t predict which pieces of context the system would remember or drop.\n*   **Condition 'B' was actually better:** Although the simplest 'B' condition was inconvenient, its behavior was **predictable**, so users learned \"okay, it can only do so much\" and adapted by revising their queries.\n*   **Conclusion 1: Remembering context without understanding intent is harmful.** They learned a strong lesson: if you add context retention, you must pair it with intent understanding.\n*   **Conclusion 2: Users want the system to \"do the right thing\" proactively.** Users didn’t just want the raw requested data. They appreciated when the system automatically binned data, computed percentages, and chose the most appropriate chart type —\n    **'proactive'** behavior was highly satisfying.\n\n* * *\n\n### 5\\. 
How were these findings applied to a real product?\n\nThe work didn’t stop there; it directly influenced the development of Tableau’s 'Ask Data' feature.\n\n1.  **The \"No Dead Ends\" design principle:** Even if the system can’t fully understand a query, prioritize showing a reasonable fallback via inference or providing an interface that lets users easily correct it, instead of throwing errors.\n2.  **Introduce context management gradually:** Learning from the failure of condition 'C', instead of adding full-blown context management from the start, they had users manage context explicitly via clicks to edit prior queries or add new elements. More sophisticated linguistic context understanding was added incrementally.\n3.  **Strengthen smart inferencing:** If a user says \"highest sales,\" sort descending; if they ask about \"correlation,\" show a scatterplot, etc. They developed various heuristics to infer hidden intent.\n\n* * *\n\n### Takeaways for preparing your CHI paper\n\nThis paper has many of the qualities CHI appreciates.\n\n1.  **A clear, important user problem:** It starts from a human-centered issue—the gap between user expectations and reality—rather than focusing on technology first.\n2.  **A creative method:** Using a Wizard of Oz study to explore an AI system that’s hard to implement upfront is a very clever approach.\n3.  **Solid theoretical contribution:** They present a framework, like the Conversational Transitions Model, that explains phenomena and guides future system design.\n4.  **Practical, concrete design guidelines:** Specific lessons developers can apply right away, such as \"shallow context understanding is harmful,\" \"predictability matters,\" and \"be proactive.\"\n5.  
**Connection between research and product:** By showing how the findings fed into a real commercial product, they demonstrated the value and impact of the research.\n\nIf you read this paper closely, you’ll gain great insight into what makes good HCI research, how to design a study, present results, and demonstrate impact. I hope this helps you prepare your paper!"
    },
    {
      "turn": 2,
      "user": "Please explain in detail the paper’s motivation, the method, and how the experiment is set up.",
      "assistant": "Sure, I’ll explain in detail the paper’s Motivation, Method, and the specific Experimental Setup.\n\n### 1. Motivation\n\nThis study began as an attempt to bridge the gap between users’ high expectations and the real technical limits of natural language–based analytic systems.\n\nThe authors observed that people tend to overestimate a system’s intelligence, so when it fails to meet those expectations, the experience feels disappointing.\n\nBefore investing in full-scale development, they wanted to answer fundamental questions:\n\n* To deliver a satisfying user experience, what level of “intelligence” does a conversational analysis system actually need?\n* How important is it that the system understand the user’s analytic intent and the conversation’s context?\n* What kinds of analytic intents do people have, and how do they express them in language?\n* Is a minimal system that simply makes a basic chart from the fields the user names (ignoring context and intent) good enough, or does that cause serious problems?\n\nUltimately, the goal was to define the minimum viable product and collect qualitative data to plan improvements before committing to expensive engineering.\n\n### 2. 
Method\n\nTo probe expectations for an intelligent system that did not yet exist, the team used a Wizard of Oz study.\n\n* **Core idea:** Participants believe they’re interacting with an intelligent system, but behind the scenes a human “wizard” interprets the natural language input and manually constructs the visualization the system appears to produce.\n* **Research goal:** Use this method to gather qualitative data about users’ expectations regarding intent and context.\n* **Experimental design:** To avoid learning and fatigue effects across conditions, they used a between-subjects design, where each participant experienced only one condition.\n* **Data analysis:** Collected data (participants’ utterances, actions, and the generated visualizations) were analyzed using a grounded theory approach with qualitative, multi-stage open coding, focusing on intent, context, and their relationship in user–system behavior.\n\n### 3. Experimental Setup\n\nThe experiment was structured as follows.\n\n* **Participants:** 41 volunteers with data analysis experience from various industries, randomly assigned to one of four conditions.\n* **Task and data:**\n  * **Task:** Explore the data freely to answer the open-ended question: “Which types of passengers were more likely to survive the Titanic?”\n  * **Data:** A Titanic dataset with 1,309 records and 10 fields. Participants knew about the Titanic disaster but were unfamiliar with this dataset.\n* **Environment and tools:**\n  * The participant-facing interface was a custom Tableau build showing only a text input box and the result visualization.\n  * The wizard used the full Tableau Desktop interface to construct visualizations.\n  * While the wizard worked, the participant’s screen displayed “Processing, please wait...” Typical response time was 30–60 seconds.\n* **Four conditions (intent × context):**\n  1. **B (Baseline):** Intent ×, Context ×. 
Ignores hidden intent and prior context; produces a basic chart from only the fields explicitly mentioned.\n  2. **I (Intent):** Intent ○, Context ×. Does not remember prior turns, but infers the user’s analytic intent to produce a smarter visualization (e.g., mentions of “correlation” yield a scatterplot).\n  3. **C (Context):** Intent ×, Context ○. Does not infer hidden intent, but preserves prior visualization context (e.g., filters, dimensions) using predefined rules.\n  4. **CI (Context & Intent):** Intent ○, Context ○. The most capable condition, where the wizard considers both intent and context to make optimal choices.\n* **Procedure:**\n  * Each session lasted about 25 minutes.\n  * After the participant submitted a text query, while the wizard was preparing the visualization, the facilitator asked what the participant expected the system to do, to capture expectations.\n  * Once the visualization appeared, the team collected feedback on the result."
    },
    {
      "turn": 3,
      "user": "I'm trying to analyze CHI conference papers. Please closely analyze and explain the Formative Study and the Evaluation of the paper I gave you.",
      "assistant": "Sure. As requested, I'll provide a close analysis of the paper's Formative Study and Evaluation. This paper gives substantial weight to the Formative Study and describes the Evaluation in a way that showcases the study's “impact,” following a classic CHI paper structure—it's an excellent example to analyze.\n\n### Formative Study (Formative Research) Analysis\n\nIn this paper, the **Formative Study is the core and the heart** of the work. It is a study conducted **pre-design** to answer “what kind of system should we build?”\n\n#### 1. Purpose: Reducing uncertainty\n\nBefore full-scale development, the team's primary goal was to derive system requirements. In particular, they wanted qualitative data on the level to which the system must understand a user's **intent** and **context** to deliver satisfaction. This was to reduce uncertainty in an expensive development process and define the scope of the Minimum Viable Product (MVP).\n\n#### 2. Method: 'Wizard of Oz (Wizard of Oz)'\n\n*   **Methodology:** They designed a Wizard of Oz study to simulate an intelligent system that did not yet exist. 
Participants believed they were interacting with a system, but a skilled researcher (“the wizard”) read their natural-language inputs and manually created visualizations.\n*   **Experimental conditions:** To systematically compare the importance of “intent” and “context,” they set up four conditions along these two axes:\n    *   **B (Baseline):** A basic system that understands neither intent nor context.\n    *   **I (Intent):** Understands intent but does not maintain context.\n    *   **C (Context):** Maintains context under predefined rules but does not understand intent.\n    *   **CI (Context & Intent):** The most intelligent system that understands both (relies on the wizard’s judgment).\n*   **Data collection:** After participants entered a query, while the wizard was building a visualization (30–60 seconds), the moderator asked “what do you expect to see?” to gather rich qualitative data on participants’ spoken **expectations**.\n\n#### 3. Findings and significance: Setting the design direction\n\n*   They derived a theoretical framework called the **Conversational Transitions Model**.\n*   The most critical finding was that **“rough context maintenance without understanding intent (Condition C) produces the worst experience due to unpredictability.”** By contrast, the predictable Baseline (B) fared better.\n*   The Formative Study directly informed the subsequent system's **design principles, requirements, evaluation criteria, and implementation priorities**.\n\n* * *\n\n### Evaluation Analysis\n\nThis paper does not present a traditional “final system evaluation.” That is, it does not run a summative study of the finished Ask Data product measuring task success rates or times.\n\nInstead, the Evaluation is a **retrospective demonstration of how the Formative Study influenced the product’s design and development**, an effective strategy to demonstrate the value of the formative work.\n\n#### 1. 
Nature of the evaluation: Retrospective analysis of “impact”\n\n*   The paper explicitly takes a **retrospective look** at how Ask Data was designed to answer the research questions.\n*   It focuses on **reflecting** on how the Formative Study influenced the actual system.\n*   This approach emphasizes that the results did not remain in the lab but added value to a real product.\n\n#### 2. Evaluation content: Concrete applications of the Formative Study\n\n*   **Establishing design principles:**\n    *   From the observation that users need to maintain analytic flow came the core principle **“no dead ends.”**\n    *   The finding that unnatural responses shift users’ attention from data to the system itself led to the principle **“the interface disappears.”**\n    *   These guidelines later served as **evaluation criteria** in subsequent usability work.\n*   **Implementing key capabilities:**\n    *   To address underspecified queries, they developed **inference logic and heuristics** that fill in missing details.\n    *   Because context maintenance proved complex and risky, they first introduced **UI-based explicit refinement** (letting users revise prior queries or add elements via the UI) and only later added more sophisticated linguistic context understanding.\n*   **Adopting proactive behaviors:**\n    *   Since users responded positively to proactive help, the product included features that **anticipate user needs**, such as surfacing filter controls unprompted or precomputing useful ratios (%).\n\nIn sum, this paper’s Evaluation functions as evidence of the **value of the Formative Study itself**. By showing how the insights translated into concrete design principles and shipped features, it demonstrates the work’s practical contribution and impact."
    }
  ]
}