Abstract: Highlights
• We propose a language-driven visible light human activity recognition method that leverages a generative large language model to fuse information from visible light signals and natural language text.
• We propose a framework that aligns visible light signals with textual descriptions by tokenizing the signals into word-level embeddings, which are then decoded by a generative large language model.
• We construct a prototype system and build a custom dataset. Extensive evaluations demonstrate that the proposed method achieves effective visible light human activity recognition in a realistic indoor space.