Position: Multi-Modal LLMs for Video Behavioral Coding in High-Stakes Decision-Making Are Bounded by Polysemy, Not by Model Scale
Keywords: multi-modal LLMs, video understanding, behavioral coding, identifiability, life sciences, position paper
TL;DR: Polysemy in observation-to-code mappings caps the Bayes-optimal accuracy of video LLMs. With K=18 codes, raising polysemy from 1 to 7 latent codes per surface behavior drops Bayes-optimal accuracy from 70% to 22%; multi-modal augmentation helps but is sub-additive when the individual modalities are weak.
Abstract: Multi-modal LLMs are increasingly applied to video data in life-science-adjacent settings: clinical training assessment, surgical performance evaluation, doctor-patient communication audits, police body-camera analysis (Dube et al. 2025 QJE), and inter-agency emergency response coordination. Standard reporting practice evaluates these models by aggregate accuracy on benchmark datasets in which each surface behavior maps cleanly to a single ground-truth code. Real high-stakes behavioral coding, by contrast, is characterized by polysemy: the same observable behavior maps to multiple latent codes, with prior probabilities that depend on context. We argue that aggregate benchmark accuracy is a poor proxy for clinical or operational utility because it does not surface the polysemy-induced identifiability gap. We support the position with a controlled simulation calibrated to an 18-code IPA-style scheme adapted from Bales (1950) for inter-agency conflict analysis: as polysemy grows from 1 to 7 latent codes per surface behavior, Bayes-optimal accuracy drops from 0.70 to 0.22 and posterior entropy over latent codes grows from 0.48 to 1.85 nats (against a maximum of ln 18 ≈ 2.89). Multi-modal augmentation helps, but sub-additively: combining audio (44% accurate), visual (54%), and text (62%) modalities yields 71% accuracy, 9 points above the best single modality. We propose a per-segment uncertainty reporting protocol for video-LLM papers in life-science applications.
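The two quantities reported in the abstract, Bayes-optimal accuracy and posterior entropy, can be illustrated with a minimal sketch. It assumes a toy setting that is not the paper's calibrated simulation: a surface behavior is compatible with m latent codes and the posterior over them is uniform, so the numbers it produces will not match the reported 0.70 to 0.22 curve. The function names `map_accuracy`, `entropy_nats`, and `uniform` are hypothetical and introduced only for this illustration.

```python
import math

K = 18  # number of codes in the scheme, as stated in the abstract


def uniform(m):
    """Toy assumption: posterior spread evenly over m candidate latent codes."""
    return [1.0 / m] * m


def map_accuracy(posterior):
    """Accuracy of the Bayes-optimal (MAP) guess: the largest posterior mass."""
    return max(posterior)


def entropy_nats(posterior):
    """Shannon entropy of the posterior in nats; zero-mass codes contribute nothing."""
    return -sum(p * math.log(p) for p in posterior if p > 0)


if __name__ == "__main__":
    print(f"entropy ceiling ln K = {math.log(K):.2f} nats")
    for m in (1, 3, 5, 7):
        post = uniform(m)
        print(f"polysemy={m}: MAP accuracy={map_accuracy(post):.3f}, "
              f"entropy={entropy_nats(post):.3f} nats")
```

Even in this idealized case no model, however large, can exceed the MAP accuracy for a given posterior, which is the sense in which polysemy rather than scale bounds performance. Reporting the per-segment entropy alongside the predicted code is one way the proposed uncertainty protocol could be realized.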
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 98