TextVista: NLP-Enriched Time-Series Text Data Visualizations

Published: 13 May 2024, Last Modified: 28 May 2024, GI 2024 SD, License: CC BY 4.0
Letter Of Changes: Thank you so much for your helpful feedback! We have made the following revisions based on the reviewers' suggestions.

Reviewer: I strongly suggest that the authors look at the papers below and consider citing some or all of these works.
Response: We thank the reviewers for bringing these papers to our attention. We have cited the suggested papers in the introduction and related work sections.

Reviewer: Handling large datasets with TextVista could demand substantial computational power, potentially restricting its use to organizations with adequate technological resources.
Response: At this stage of the prototype, we have only tested our system with relatively small datasets as a proof of concept, and we acknowledge that our system cannot yet handle large datasets. We have added the following line to the Limitations section: “Future work should focus on assessing TextVista's performance and scalability with larger datasets.”

Reviewer: The number of participants involved in the validation studies was relatively small, which may not provide a fully representative assessment of the tool's performance across different user groups.
Response: We understand the reviewers' concerns and have added the following line to the Limitations section: “The number of participants in the design process was small, which may not represent all potential future users, so the findings and TextVista's implementation may have unintentionally perpetuated biases or overlooked the needs of future users.”

Reviewer: The relationship between the participants' prior work experience in text analysis and their feedback on the tool is not well-defined, which could influence the applicability of the results.
Response: We have added the following line to Sections 5.1 and 6.1: “They all had experience in analyzing various forms of data, including text.”

Reviewer: The document does not sufficiently detail the methodology and process used by the two researchers conducting the qualitative analysis during the first study, obscuring the robustness of these findings. Did the same coders code all three studies? What is the background and positionality of the coders? Moreover, grounded theory is used to describe different approaches to thematic analysis, and the paper needs to provide more details to know if they correctly followed the grounded theory methodology or other methods, e.g., Braun and Clarke, etc.
Response: To address the reviewers' concerns, we have added the following text to Sections 3.2, 5.1, and 6.1: “Following this, we conducted a grounded theory analysis of the transcripts using open coding in NVivo software. Each transcript was independently coded by two researchers, both of whom are master's students in human-computer interaction. These researchers underwent training in qualitative research from an experienced qualitative analyst. Researchers discussed and developed a codebook, resolving discrepancies during the coding process. Emerging themes and patterns from the codes were identified and discussed.”

Reviewer: The ML/NLP techniques used, such as PCA, are not adequately justified in Section 4.1, leaving a gap in understanding why these particular methods were chosen over others.
Response: To clarify our choices, we have added the following text to the TextVista section: “We experimented with 3 different dimensionality reduction methods: PCA, t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). When we plotted the reduced data in a 2D space, the PCA gave us the most visually appropriate results.”

Reviewer: The absence of a baseline technique for comparison is noted, but there's insufficient discussion on how TextVista's approach differs from similar systems, and why a baseline was deemed unnecessary.
Response: We acknowledge this limitation of our work and have added the following line to Section 5: “We did not conduct a comparison study with a baseline system, as the goal of our system was to help analysts find insight into their data.”

Thank you again for your valuable input.
Best Regards,
Authors of “TextVista: NLP-Enriched Time-Series Text Data Visualizations”
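The letter reports that PCA, t-SNE, and UMAP were compared for projecting the data into 2D, with PCA giving the most visually appropriate result. The snippet below is a minimal, dependency-free sketch of a 2D PCA projection using power iteration. It assumes each text item has already been converted into a fixed-length numeric vector (for example, an embedding). The function name `pca_2d` and its helpers are hypothetical illustrations, not TextVista's actual code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _project_out(v: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of v along each (orthonormal) vector in basis."""
    for u in basis:
        p = _dot(v, u)
        v = [x - p * y for x, y in zip(v, u)]
    return v


def _normalize(v: Vector) -> Vector:
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _leading_direction(cov: List[Vector], found: List[Vector],
                       max_iter: int = 1000, tol: float = 1e-12) -> Vector:
    """Power iteration for the top eigenvector of cov, kept orthogonal to `found`."""
    d = len(cov)
    # Deterministic start vector, with already-found directions removed.
    v = _normalize(_project_out([1.0 / (i + 1) for i in range(d)], found))
    for _ in range(max_iter):
        w = _project_out([_dot(row, v) for row in cov], found)
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-15:  # no variance left in this subspace; keep v as is
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Eigenvectors are defined only up to sign; make the largest entry positive.
    pivot = max(v, key=abs)
    return v if pivot >= 0 else [-x for x in v]


def pca_2d(points: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project d-dimensional points (d >= 2) onto their first two principal components."""
    n = len(points)
    if n < 2 or len(points[0]) < 2:
        raise ValueError("need at least two points with at least two dimensions")
    d = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Population covariance matrix (d x d) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    components: List[Vector] = []
    for _ in range(2):
        components.append(_leading_direction(cov, components))
    # 2D coordinates = centered data projected onto the two components.
    return [(_dot(r, components[0]), _dot(r, components[1])) for r in centered]
```

Dividing by n rather than n - 1 rescales the eigenvalues but leaves the principal directions unchanged. In practice, a library routine such as scikit-learn's `PCA(n_components=2)` would typically replace this sketch. The two nonlinear alternatives the authors tried, t-SNE and UMAP, would need dedicated implementations such as scikit-learn's `TSNE` or the umap-learn package.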
Keywords: Data Analysis, Reasoning, Problem Solving, Decision Making, Qualitative Evaluation, Text/Document Data
Abstract: A vast amount of unstructured text data is generated every day, and analyzing and making sense of these text-based datasets is a complex, cumbersome task. Existing visualization tools that analyze text data using Natural Language Processing (NLP) techniques are often tailored for structured text-based data. They also fail to support reading, a crucial analysis task for validating the output of NLP techniques. We designed and developed TextVista, an NLP-enriched visualization tool that supports analysts in analyzing unstructured text with temporal references. Our tool combines techniques including clustering, sentiment analysis, and threat detection with three views that visualize high-level patterns in the data to encourage reading. We report on TextVista's iterative design process, which included a focus group to distill design requirements, a think-aloud interview study with data analysts to understand their impressions of the tool, and a diary study to assess its long-term usage. Through this process, we identified how TextVista supported the analysis of unstructured text with temporal references using NLP techniques and fostered methods to promote reading in situ. TextVista also encouraged serendipity when analyzing data via its question-focused overviews and flexible avenues to explore data.
Supplementary Material: zip
Submission Number: 16