Measuring Students' Perception of LLM Capabilities and Usage

Published: 28 Apr 2026, Last Modified: 28 Apr 2026 · MSLD 2026 Poster · CC BY 4.0
Keywords: Large Language Models, Education, Social Norms, Computer Science, Software Engineering
Abstract: Large Language Models (LLMs), such as OpenAI's ChatGPT and other generative AI systems, have become deeply embedded in higher education workflows. Students increasingly use these systems for coding, writing, summarizing, problem-solving, and exam preparation (Plecerda, 2024). While prior research has examined students' general attitudes toward AI adoption (Alderabi et al., 2025), perceived usefulness (Zhou et al., 2024; Yan et al., 2025), and behavioral intention (Grande et al., 2024; Kim et al., 2023), there remains limited empirical investigation into a critical distinction: what students believe LLMs can do, what they believe their peers use them for, and what actually occurs in practice.

Existing studies of AI in higher education have primarily used mixed-method surveys (Zhou et al., 2024) and interviews to capture perceived benefits and risks (Yan et al., 2025) and adoption determinants (Alderabi et al., 2025). Other quantitative work has applied Structural Equation Modeling (SEM) (Alderabi et al., 2025) to explain behavioral intention, while computational sentiment analysis (Maulizidan and Tania, 2025) has explored public discourse around generative AI on the social media platform X. Research has also investigated student-AI collaboration and its effects on learning task performance, highlighting potential benefits and limitations (Kim and Lee, 2023), as well as students' perceptions of LLMs making coursework easier or more efficient (Vallade, Kaufmann, and Upchurch, 2025). However, these approaches largely measure attitudes, acceptance, or public opinion rather than directly quantifying perception accuracy and the misalignment between belief and reality. In particular, some research (Ursavaş et al., 2025) investigates whether students overestimate or underestimate LLM autonomy (Okoyeagu et al., 2026), and whether students accurately perceive how widely their peers actually use these tools (McClain et al., 2026).

This study addresses this gap by systematically measuring: (1) students' perceived capabilities of LLMs, (2) students' perceived peer usage of LLMs, (3) self-reported actual usage behavior, (4) the discrepancy between perceived and actual usage rates, and (5) the discrepancy between perceived capability and documented technical limitations. By quantifying both capability perception gaps and social norm misperceptions (i.e., pluralistic ignorance regarding peer usage), this work-in-progress seeks to provide a clearer understanding of how beliefs about LLM functionality and prevalence shape student behavior. The findings should inform AI literacy interventions, academic integrity policies, and curriculum design by distinguishing between myth, perception, and measurable reality.

The methodology employs large-scale computational discourse analysis to examine students' perceptions of LLM capabilities and peer usage patterns in higher education contexts. Unlike prior survey-based perception studies, this research analyzes naturally occurring discussions on public social media platforms (i.e., Reddit and TikTok) to capture authentic, unsolicited beliefs and behavioral disclosures from online posts and comments. The research questions are: (1) What do users, especially students, perceive AI is used for? (2) From social media discussions, what are students' opinions about using AI for homework? (3) How often do students report using AI for their homework? (4) How do user perceptions of AI capabilities for educational tasks change over time? The required data fields include submission and comment text, timestamps, de-identified usernames, upvote/downvote counts, likes/shares, and thread and discussion context.
These elements are necessary to assess discourse trends, temporal changes, and community-specific norms surrounding AI discussions. The data will be processed using a structured taxonomy designed to categorize discussions about AI capabilities. Posts and comments will be human-annotated and classified by (1) perceived AI capability (e.g., autonomous task completion, assisted task completion, or misunderstanding/overestimation), (2) task domain (such as education and/or software development), (3) level of human involvement (implied or stated), and (4) sentiment or stance. This reviewed, annotated subset will be used to train and evaluate an AI model on the same labeling task, with model predictions compared directly against human labels to assess reliability. After this validation step, the model will label a larger corpus of posts to support scalable analysis. The final analysis will be conducted on the full labeled dataset, with additional human review of selected model-labeled samples to confirm accuracy and provide qualitative context for patterns such as recurring misconceptions, shifting expectations, and differences in perception and usage over time.
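The comparison of model predictions against human labels could be quantified with a chance-corrected agreement statistic such as Cohen's kappa. The abstract does not specify a metric, so this is a hedged sketch with hypothetical taxonomy labels:

```python
from collections import Counter

def cohens_kappa(human, model):
    """Chance-corrected agreement between two label sequences."""
    assert len(human) == len(model) and human
    n = len(human)
    # Observed agreement: fraction of items given the same label.
    po = sum(h == m for h, m in zip(human, model)) / n
    # Expected chance agreement from each annotator's marginal label rates.
    ch, cm = Counter(human), Counter(model)
    pe = sum((ch[lab] / n) * (cm[lab] / n) for lab in set(human) | set(model))
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

# Hypothetical labels from the perceived-capability category above.
human = ["autonomous", "assisted", "assisted", "overestimation"]
model = ["autonomous", "assisted", "autonomous", "overestimation"]
print(round(cohens_kappa(human, model), 3))  # prints 0.636
```

A kappa near 1 would justify letting the model label the larger corpus, while a low value would signal that the taxonomy or the model needs revision before scaling up.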
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 22