Abstract: In this paper, we introduce SeiFS, a Secure and Lightweight Feature Selection system designed to ensure high-quality inputs for Machine Learning (ML) tasks. Unlike previous approaches involving multiple non-colluding servers, SeiFS operates in a natural ML scenario where multiple entities interact with a single server, without relying on additional strong assumptions. Our work presents intrinsic optimizations in feature selection that yield substantial performance improvements, including a customized data encoding method, a size-optimized comparison circuit, and a shared oblivious dimensionality reduction technique. The customized data encoding method, combined with an optimized secure data access protocol, reduces the number of expensive comparison operations from $O(m)$ to $O(\log m)$, where $m$ denotes the number of samples. The size-optimized comparison circuit achieves up to a $4\times$ reduction in size compared to naïve implementations. Additionally, the shared oblivious dimensionality reduction technique incorporates a novel approximated top-$k$ selection algorithm, reducing circuit size by approximately a factor of $k$. Comprehensive experiments conducted across various network settings demonstrate that our protocols outperform existing solutions, delivering efficiency improvements of an order of magnitude. Specifically, the end-to-end execution of SeiFS on real-life datasets achieves at least a $62.7\times$ runtime improvement over the naïve implementation and runs up to $112.9\times$ faster than the state of the art in the LAN setting.