BigTokDetect: A Clinically‑Informed Vision–Language Model Framework for Detecting Pro‑Bigorexia Videos on TikTok
Abstract: Social media platforms increasingly struggle to detect harmful content that promotes muscle‑dysmorphic behaviors, particularly pro‑bigorexia material that disproportionately affects adolescent males. Unlike traditional eating‑disorder detection focused on the “thin ideal,” pro‑bigorexia content often masquerades as legitimate fitness guidance, combining visual displays, coded language, and motivational messaging in ways that evade text‑only detection systems. We address this challenge with BigTokDetect, a clinically informed framework for identifying pro‑bigorexia videos on TikTok. At its core is BigTok, the first expert‑annotated multimodal dataset of more than 2,200 TikTok videos labeled by clinical psychologists and psychiatrists across five primary categories: body image, nutrition, exercise, supplements, and masculinity. Through a comprehensive evaluation of state‑of‑the‑art vision–language models, BigTokDetect achieves 82.9 percent accuracy on primary‑category classification and 69.0 percent on subcategory detection after domain‑specific fine‑tuning. Ablation studies show that multimodal fusion outperforms text‑only approaches by 5–10 percentage points, with video features providing the most discriminative signals. These results establish new benchmarks for multimodal harmful‑content detection and offer a scalable, clinically grounded approach to content moderation in specialized mental‑health domains.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: human behavior analysis, NLP tools for social analysis, quantitative analyses of social media, multimodality, video processing, healthcare applications, clinical NLP, NLP for social good, corpus creation, benchmarking
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Ethics Statement
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4: Data Development, Appendix C
B2 Discuss The License For Artifacts: N/A
B2 Elaboration: Appendix C, D.1
B3 Artifact Use Consistent With Intended Use: N/A
B3 Elaboration: Appendix D.1
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Appendix C
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4
B6 Statistics For Data: Yes
B6 Elaboration: Section 4.2.4 and Appendix C
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 5.2
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 5.2
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5.2
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix D.1
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Appendix B.2, Ethics Statement
D2 Recruitment And Payment: N/A
D2 Elaboration: Appendix B.1
D3 Data Consent: N/A
D3 Elaboration: Appendix C
D4 Ethics Review Board Approval: Yes
D4 Elaboration: Ethics Statement
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Table 12, Appendix
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethics Statement
Author Submission Checklist: yes
Submission Number: 757