Synthesized Minority Oversampling Technique-Reverse K-Nearest Neighbors-K-Dimensional Tree for Dairy Food Safety Risk Evaluation
Abstract: With the increasing focus on food safety risks, evaluating the safety risk level of dairy products remains challenging because of small sample sizes, noisy data, and the poor performance of complex risk evaluation models. To address these issues, this paper introduces a data extension method that combines the Synthesized Minority Oversampling Technique (SMOTE), Reverse K-Nearest Neighbors (RKNN), and a K-Dimensional (KD) Tree (SMOTE-RKNN-KD Tree). SMOTE enlarges the dataset by generating synthetic samples for underrepresented classes, addressing the small sample size. An improved RKNN method then identifies and removes noisy data by evaluating class proportions, improving classification accuracy and robustness. The KD Tree indexes the data, significantly reducing the computational complexity of the RKNN search and improving the operational efficiency of the risk evaluation model. The proposed method is validated on a real sterilized milk product dataset. According to U-tests and silhouette coefficients, the distribution of the extended data closely matches that of the original dataset. The SMOTE-RKNN-KD Tree is then combined with EXtreme Gradient Boosting (XGBoost) to construct the SMOTE-RKNN-KD Tree-XGBoost food safety risk assessment model. This model is compared with the Random Forest (RF), Recurrent Neural Network (RNN), XGBoost, SMOTE-XGBoost, and SMOTE-RKNN-XGBoost models trained on the original dataset. The results show that the SMOTE-RKNN-KD Tree-XGBoost model significantly outperforms the other methods in both classification accuracy and operational efficiency. These findings highlight the effectiveness of the proposed method in addressing small sample sizes and noisy data, providing a reliable tool for evaluating food safety risks in dairy products.
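The abstract describes the pipeline only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. The function names `smote` and `rknn_keep_mask`, the same-label-share rule, the 0.5 threshold, and the decision to keep samples with no reverse neighbours are all hypothetical stand-ins for the paper's "improved RKNN" class-proportion criterion, which is not specified here. SciPy's `KDTree` provides the KD-tree index: every sample's k nearest neighbours are found with one tree build and n queries, and inverting those lists yields every sample's reverse k-nearest neighbours without an all-pairs distance scan.

```python
import numpy as np
from scipy.spatial import KDTree


def smote(X_min, n_new, k=5, seed=None):
    """SMOTE-style oversampling of one minority class.

    Each synthetic row lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes len(X_min) > k and that X_min has no duplicate rows.
    """
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours: the first hit for every row is the row itself.
    _, nn = KDTree(X_min).query(X_min, k=k + 1)
    base = rng.integers(0, len(X_min), size=n_new)
    partner = nn[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def rknn_keep_mask(X, y, k=5, threshold=0.5):
    """Boolean mask that is False for samples flagged as noise.

    Hypothetical rule (an assumption, not the paper's exact criterion):
    look at a sample's reverse k-nearest neighbours, i.e. the samples that
    list it among their own k nearest neighbours. If the share of them with
    the same label is below `threshold`, flag the sample. Samples with no
    reverse neighbours are kept. Assumes len(X) > k and no duplicate rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # One KD-tree build plus n queries instead of an O(n^2) all-pairs scan.
    _, nn = KDTree(X).query(X, k=k + 1)
    nn = nn[:, 1:]                        # drop each sample's self-match
    owners = np.repeat(np.arange(n), k)   # sample i produced row i of nn,
    targets = nn.ravel()                  # so i is a reverse neighbour of each target
    total = np.bincount(targets, minlength=n)
    agree = (y[owners] == y[targets]).astype(float)
    same = np.bincount(targets, weights=agree, minlength=n)
    share = np.divide(same, total, out=np.ones(n), where=total > 0)
    return share >= threshold


# Usage on synthetic data: oversample the minority class, then filter noise.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(40, 3))
X_minor = rng.normal(3.0, 1.0, size=(8, 3))
X_syn = smote(X_minor, n_new=32, k=5, seed=0)
X_all = np.vstack([X_major, X_minor, X_syn])
y_all = np.array([0] * 40 + [1] * 40)
keep = rknn_keep_mask(X_all, y_all, k=5)
X_clean, y_clean = X_all[keep], y_all[keep]
print(f"kept {keep.sum()} of {len(y_all)} samples")
```

In the pipeline the abstract describes, the cleaned set would then train an XGBoost classifier (for example `xgboost.XGBClassifier`); that step is left out to keep the sketch dependency-light. Building each k-NN list once and inverting it produces all reverse-neighbour sets in a single pass, which is where the KD-tree index saves work compared with testing every pair of samples.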