Kona: An Efficient Privacy-Preservation Framework for KNN Classification by Communication Optimization
Abstract: K-nearest neighbors (KNN) classification plays a significant role in various applications due to its interpretability. The accuracy of KNN classification relies heavily on large amounts of high-quality data, which are often distributed among different parties and contain sensitive information. Dozens of privacy-preserving frameworks have been proposed for performing KNN classification with data from different parties while preserving data privacy. However, existing privacy-preserving frameworks for KNN classification are communication-inefficient in the online phase because of two main issues: (1) they incur a large communication size for secure squared Euclidean distance computations, and (2) they require numerous communication rounds to select the $k$ nearest neighbors. In this paper, we present $\texttt{Kona}$, an efficient privacy-preserving framework for KNN classification. We resolve the above communication issues by (1) designing novel Euclidean triples, which eliminate the online communication for secure squared Euclidean distance computations, and (2) proposing a divide-and-conquer bubble protocol, which significantly reduces the communication rounds for selecting the $k$ nearest neighbors. Experimental results on eight real-world datasets demonstrate that $\texttt{Kona}$ significantly outperforms the state-of-the-art framework by $1.1\times \sim 3121.2\times$ in communication size, $19.1\times \sim 5783.2\times$ in communication rounds, and $1.1\times \sim 232.6\times$ in runtime.
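For orientation, the two steps the abstract identifies as communication bottlenecks (squared Euclidean distance computation and selection of the $k$ nearest neighbors) can be sketched in plaintext as follows. This is a minimal, non-private reference sketch, not the Kona protocol: it involves no secret sharing, no Euclidean triples, and no divide-and-conquer bubble protocol. The function names `squared_euclidean` and `knn_classify` and the toy data are illustrative assumptions that do not come from the paper or its code.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(train_points, train_labels, query, k):
    """Plaintext KNN: compute all distances, keep the k smallest, majority vote."""
    # Step 1: distance from the query to every training point.
    distances = [
        (squared_euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: select the k nearest neighbors (a plain sort here; a secure
    # protocol would have to do this without revealing the distances).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the labels of the selected neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["a", "a", "a", "b", "b"]

prediction = knn_classify(train_points, train_labels, (0.5, 0.5), 3)
```

In a privacy-preserving setting, steps 1 and 2 would run on protected data held by different parties, which is where the communication size and round counts discussed in the abstract arise.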
Lay Summary: Imagine several hospitals each have valuable but sensitive patient data—like medical histories or test results. To improve medical decision-making, they may want to compare a new patient’s case with similar past cases from other hospitals. A common way to do this is using a machine learning method called K-nearest neighbors (KNN), which finds the most similar past cases to help predict outcomes like likely diagnoses. The more relevant data the model can access, the more accurate its predictions become.
But here's the problem: due to privacy laws, hospitals can't just share raw patient data with each other.
Our work introduces Kona, an efficient system that allows different organizations to use their data together to perform KNN while keeping their data private. What makes Kona special is that it solves a big problem in existing privacy-protecting systems: they often require too much online data transfer, which makes them slow and hard to use.
Kona speeds things up in two smart ways. First, it changes how distances between data points are calculated, so no extra communication is needed during that step. Second, it streamlines the process of picking the nearest neighbors, cutting down the number of times systems have to "talk" to each other. When we tested Kona with real-world data, it was up to 3,000 times more efficient in communication and over 200 times faster than the best existing methods—all while keeping data private and getting the same accurate results.
Link To Code: https://github.com/FudanMPL/Garnet/tree/kona
Primary Area: Social Aspects->Privacy
Keywords: Privacy Preservation, K-Nearest Neighbors
Submission Number: 11251