Abstract: This paper proposes a novel approach for applying machine learning techniques to data collected from emerging cooperative intelligent transportation systems (C-ITS) that use Vehicle-to-Vehicle (V2V) broadcast communications. Our approach considers the temporal and spatial aspects of the collected data to avoid correlation between the training set and the validation set. Connected vehicles broadcast messages containing safety-critical information at high frequency, so detecting faulty messages induced by attacks is crucial for road users' safety. High-frequency broadcasting makes the temporal aspect decisive when building the cross-validation sets at the data preparation stage of the data mining cycle. Therefore, we conduct a statistical study considering various fake position attacks. We statistically examine the difficulty of detecting the faulty messages and generate useful features from the raw data. We then apply machine learning methods for misbehavior detection and discuss the obtained results. We apply our data splitting approach to message-based and communication-based data modeling and compare it to traditional splitting approaches. Our study shows that the performance reported under traditional splitting approaches is biased because they cause data leakage, and we observe a 10% drop in performance in the testing phase compared to our approach. This result implies that traditional approaches cannot be trusted to give equivalent performance once deployed and thus are not compatible with V2V broadcast communications.
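To illustrate why high-frequency broadcast makes the temporal aspect decisive, the following minimal sketch contrasts a random per-message split with a time-based split. It is not the splitting procedure of this paper, which the abstract does not specify. The `Beacon` record, the function names, the 10 Hz rate, the cut time, and the guard interval are all assumptions chosen for illustration, and the data is synthetic.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Beacon:
    """Hypothetical V2V message record; real messages carry many more fields."""
    sender_id: int
    time: float  # seconds since the start of the recording


def random_split(beacons: List[Beacon], val_fraction: float = 0.25,
                 seed: int = 0) -> Tuple[List[Beacon], List[Beacon]]:
    """Traditional split: shuffle individual messages, ignoring time and sender."""
    shuffled = list(beacons)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def temporal_split(beacons: List[Beacon], cut_time: float,
                   guard: float) -> Tuple[List[Beacon], List[Beacon]]:
    """Time-based split: train before cut_time, validate from cut_time + guard on.

    Messages inside the guard interval are dropped so that near-identical
    consecutive messages never end up on both sides of the split.
    """
    train = [b for b in beacons if b.time < cut_time]
    val = [b for b in beacons if b.time >= cut_time + guard]
    return train, val


def min_same_sender_gap(train: List[Beacon], val: List[Beacon]) -> float:
    """Smallest time difference between a training and a validation message
    sent by the same vehicle (a crude proxy for leakage risk)."""
    gaps = [abs(t.time - v.time)
            for t in train for v in val if t.sender_id == v.sender_id]
    return min(gaps) if gaps else float("inf")


# Synthetic data (assumption): 3 vehicles broadcasting at 10 Hz for 20 s.
beacons = [Beacon(sender, tick / 10) for sender in range(3) for tick in range(200)]

rand_train, rand_val = random_split(beacons)
time_train, time_val = temporal_split(beacons, cut_time=15.0, guard=1.0)

random_gap = min_same_sender_gap(rand_train, rand_val)
temporal_gap = min_same_sender_gap(time_train, time_val)
print(f"random split:   closest train/validation pair is {random_gap:.1f} s apart")
print(f"temporal split: closest train/validation pair is {temporal_gap:.1f} s apart")
```

Under the random split, a validation message typically has a training neighbor from the same vehicle only one broadcast period away, so the two are almost identical. The time-based split enforces a minimum separation at the cost of discarding the messages in the guard interval. A spatial criterion, which the abstract also mentions, is not modeled here.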