Abstract: Data-driven soft sensing enables real-time monitoring and control of complex industrial processes. Although recent data stream mining algorithms bolster predictive modeling on soft sensing data, which grow in volume and vary in feature dimensions, they operate mainly in closed-world settings, where all class labels must be known beforehand. This is restrictive in practical applications such as semiconductor manufacturing, where new wafer defect types emerge dynamically and unpredictably. This study aims to advance online algorithms by allowing learners to abstain from making predictions at a certain cost. Our key idea is to establish a universal representation space that aligns the feature dimensions of incoming data points while delineating the geometric shape that underpins them. On this shape, we minimize the region spanned by points of known classes by optimizing the trade-off between empirical risk and abstention cost. Theoretical results justify our universal representation learning design. We benchmark our approach on six datasets, including a real-world wafer fault-diagnostics dataset collected from chip manufacturing lines at Seagate. Experimental results substantiate the effectiveness of our proposed approach, which outperforms six state-of-the-art competing models. Code and datasets are openly accessible via an anonymous link: https://github.com/X1aoLian/OWSS.
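The abstract does not state the exact abstention rule. As a point of reference only, the classical Chow-style reject rule illustrates what "trading off empirical risk against abstention cost" means: the learner predicts its most probable class only when the expected 0-1 loss of that prediction does not exceed a fixed abstention cost `c`, and abstains otherwise. The sketch below assumes that rule. The function names and the `-1` abstain marker are hypothetical, and this is not the paper's geometric method on the universal representation space.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker for "no prediction"

def predict_with_abstention(probs, cost):
    """Chow-style reject rule (assumed, not the paper's method).

    Predict the argmax class unless its expected 0-1 loss,
    1 - max probability, exceeds the abstention cost; then abstain.
    """
    probs = np.asarray(probs, dtype=float)
    top = probs.max(axis=1)        # confidence of the best class per point
    labels = probs.argmax(axis=1)  # best class per point
    return np.where(1.0 - top > cost, ABSTAIN, labels)

def risk_with_abstention(preds, y_true, cost):
    """Average loss: 1 per wrong prediction, `cost` per abstention."""
    preds = np.asarray(preds)
    y_true = np.asarray(y_true)
    abstained = preds == ABSTAIN
    errors = (~abstained) & (preds != y_true)
    return (errors.sum() + cost * abstained.sum()) / len(preds)

# Usage: three points, two classes, abstention cost 0.3.
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
preds = predict_with_abstention(probs, cost=0.3)
risk = risk_with_abstention(preds, y_true=[0, 1, 0], cost=0.3)
print(preds.tolist(), round(risk, 4))  # [0, -1, 1] 0.4333
```

Raising `cost` makes abstention more expensive, so the learner answers more often and accepts more errors; lowering it does the reverse.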