Keywords: Clustering, Unsupervised Learning, Taxonomy
TL;DR: We introduce Bridged Clustering, an algorithm that uses existing unlabeled datasets to help achieve new supervised objectives in scientific research.
Abstract: We introduce Bridged Clustering, an algorithm that uses existing unlabeled datasets to help achieve new supervised objectives in scientific research. Applying supervised learning in scientific research is often limited by the cost of labeling enough samples to support inference at scale. As an alternative to extensive labeling, our algorithm draws on unlabeled data that is either already available from existing research or easier to collect. Bridged Clustering takes two distinct sets of unlabeled data and a sparse supervised dataset as input. It clusters the input and output feature spaces independently, then uses the supervised set to learn a mapping between the two sets of clusters. This mapping bridges the otherwise disparate data sources, improving predictive performance without requiring extensive labeled data. We demonstrate Bridged Clustering in a biological context, where it infers genetic information about leaf samples from their morphological traits. More generally, Bridged Clustering offers a robust framework for using available unlabeled data to support new inference objectives in scientific research, especially where labeled data is scarce.
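The three steps named in the abstract (cluster the inputs, cluster the outputs, learn a cluster-to-cluster mapping from the sparse supervised set) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the submission: the use of k-means with a deterministic farthest-first initialization, the majority-vote rule for the mapping, the choice to predict the centroid of the mapped output cluster, and all names (`kmeans`, `fit_bridge`, `predict`) and toy data.

```python
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, max_iters=100):
    """Small Lloyd's k-means (assumed clustering method); returns k centroids."""
    # Deterministic farthest-first initialization: start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

def fit_bridge(x_unlabeled, y_unlabeled, labeled_pairs, k_in, k_out):
    """Cluster each space independently, then map input to output clusters."""
    x_centroids = kmeans(x_unlabeled, k_in)
    y_centroids = kmeans(y_unlabeled, k_out)
    votes = {}
    for x, y in labeled_pairs:
        cx, cy = nearest(x, x_centroids), nearest(y, y_centroids)
        votes.setdefault(cx, Counter())[cy] += 1
    # Assumed rule: each input cluster maps to its most frequent output cluster.
    bridge = {cx: counts.most_common(1)[0][0] for cx, counts in votes.items()}
    return x_centroids, y_centroids, bridge

def predict(x, x_centroids, y_centroids, bridge):
    """Return the centroid of the bridged output cluster, or None if unmapped."""
    cy = bridge.get(nearest(x, x_centroids))
    return None if cy is None else y_centroids[cy]

# Toy data (hypothetical): two input clusters, two output clusters,
# and only two labeled pairs linking them.
x_unlabeled = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
y_unlabeled = [(100.0,), (102.0,), (104.0,), (200.0,), (202.0,), (204.0,)]
labeled_pairs = [((0.2, 0.3), (101.0,)), ((10.8, 10.1), (203.0,))]

x_centroids, y_centroids, bridge = fit_bridge(
    x_unlabeled, y_unlabeled, labeled_pairs, k_in=2, k_out=2
)
prediction = predict((0.9, 0.4), x_centroids, y_centroids, bridge)
```

In this sketch an input cluster that receives no labeled pair has no entry in `bridge`, so `predict` returns `None` for points assigned to it; how the actual algorithm handles that case is not stated in the abstract.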
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 17