Clustering Without Knowing How To: Application and Evaluation

Published: 01 Jan 2023, Last Modified: 14 Aug 2023, ECIR (3) 2023
Abstract: Clustering plays a crucial role in data mining, enabling convenient exploration of datasets and the bootstrapping of new ones. However, it requires knowing the distances between objects, which are not always obtainable because of the complexity of formalizing them or the subjectivity of the criteria. Such problems are easier for people to understand, so human judgements can be useful here. In this paper, we demonstrate a scalable crowdsourced system for image clustering, release its code at https://github.com/Toloka/crowdclustering under a permissive license, and publish a demo as an interactive Python notebook. Our experiments on two different image datasets, dresses from Zalando’s FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that meaningful clusters can be obtained purely through crowdsourcing, without any machine learning. In addition, these two cases show the usefulness of such an approach for domain-specific clustering in fashion recommendation systems and e-commerce.
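To make the idea of substituting human judgements for a distance function more concrete, here is a minimal Python sketch. It is not the system described in the paper, nor the API of the crowdclustering repository. It assumes a hypothetical input format in which crowd workers answer "do these two images belong together?" for pairs of images, resolves each pair by majority vote, and treats the connected components of the "same" pairs as clusters. The image IDs and votes are made up for illustration.

```python
from collections import defaultdict


def aggregate_votes(votes):
    """Majority-vote each pair of items.

    `votes` is a hypothetical list of (item_a, item_b, same) tuples, one per
    worker answer. Returns {pair: True} when strictly more workers said "same"
    than "different"; ties count as "different".
    """
    tally = defaultdict(lambda: [0, 0])  # pair -> [same_count, different_count]
    for a, b, same in votes:
        pair = (min(a, b), max(a, b))  # order-independent key
        tally[pair][0 if same else 1] += 1
    return {pair: s > d for pair, (s, d) in tally.items()}


def cluster(items, votes):
    """Group items into connected components of majority-"same" pairs (union-find)."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), same in aggregate_votes(votes).items():
        if same:
            parent[find(a)] = find(b)  # merge the two components

    groups = defaultdict(list)
    for x in items:
        groups[find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example: three dress images and two shoe images.
items = ["d1", "d2", "d3", "s1", "s2"]
votes = [
    ("d1", "d2", True), ("d1", "d2", True), ("d1", "d2", False),  # 2 vs 1 -> same
    ("d2", "d3", True),                                            # 1 vs 0 -> same
    ("s1", "s2", True),
    ("d3", "s1", False),
]
print(cluster(items, votes))  # [['d1', 'd2', 'd3'], ['s1', 's2']]
```

Connected components make grouping transitive, so a single wrong majority can join two otherwise separate clusters; a real crowdsourcing pipeline would likely need more robust aggregation and careful task design to keep the number of required judgements manageable.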
