Leveraging Multimodal Retrieval-Augmented Generation for Cyber Attack Detection in Transit Systems

Muhaimin Bin Munir, Yuchen Cai, Latifur Khan, Bhavani Thuraisingham

Published: 28 Oct 2024, Last Modified: 13 Mar 2026
Venue: 2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA)
License: CC BY 4.0

Abstract: Large Language Models (LLMs) often hallucinate, in part because of limitations in their training datasets. These datasets are vast and the training process is resource-intensive, so models cannot be retrained frequently, which makes LLMs unreliable for answering questions about recent information. To address this issue, Retrieval-Augmented Generation (RAG) retrieves indexed text chunks from relevant, up-to-date knowledge bases and uses them to generate more accurate and current responses. Our project explores the use of RAG in the domain of transit security. Transit security systems include physical objects such as video and audio surveillance equipment, alarms, threat sensors, and infrastructure monitoring sensors, which scan the environment for potential threats and relay this information to the Transit Management Center, Transit Vehicles, and Emergency Management Center, among others. We aimed to predict the cyber threats that adversaries might exploit against these information flows to infiltrate the systems. Using the description of each information flow and other characteristics of the data, we leveraged LLMs with RAG to map possible cyber attack techniques from the MITRE ATT&CK knowledge base. Because the MITRE ATT&CK technique database is continuously updated to track new cyberattack techniques, using RAG enhances our ability to predict how adversaries might target transit security information flows. We analyzed information flows of transit systems from the USDOT public website and manually annotated possible attack techniques to establish a benchmark. Our multimodal RAG model achieved an F1 score of 40.5% and a precision of 42.5%, representing a 73.65% improvement over the baseline approach. These results demonstrate the effectiveness of integrating LLMs with RAG and incorporating multimodality in predicting cyber threats in transit cybersecurity.
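
To make the retrieve-then-evaluate idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy knowledge base of four MITRE ATT&CK techniques (the IDs and names are real, the descriptions are short paraphrases written for this example), uses bag-of-words cosine similarity as a stand-in for whatever retriever the paper uses, and omits the LLM generation step and the multimodal inputs entirely. The function names `retrieve` and `precision_recall_f1` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative knowledge base. Technique IDs and names are
# real ATT&CK entries; the descriptions are paraphrased for this sketch only.
TECHNIQUES = {
    "T1040": "Network Sniffing: adversary captures network traffic to read data sent between systems",
    "T1557": "Adversary-in-the-Middle: adversary positions between two devices to intercept or alter communication",
    "T1565": "Data Manipulation: adversary inserts, deletes, or modifies data to influence outcomes",
    "T1499": "Endpoint Denial of Service: adversary exhausts resources to make a service unavailable",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(flow_description, k=2):
    """Return the IDs of the k techniques most similar to the flow description."""
    query = Counter(tokenize(flow_description))
    scored = [
        (cosine(query, Counter(tokenize(description))), technique_id)
        for technique_id, description in TECHNIQUES.items()
    ]
    # Highest score first; ties are broken by technique ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [technique_id for _, technique_id in scored[:k]]


def precision_recall_f1(predicted, annotated):
    """Set-based precision, recall, and F1 against manual annotations."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical information-flow description and annotation.
    flow = "surveillance video traffic sent over the network between field sensors and the management center"
    predicted = retrieve(flow, k=2)
    print(predicted, precision_recall_f1(predicted, ["T1040", "T1565"]))
```

In a full RAG pipeline the retrieved technique descriptions would be placed in the LLM prompt together with the information-flow description, and the model's predicted techniques would then be scored against the manual annotations in the same way.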