Experiences and lessons in developing industry-strength machine learning and data mining software

Chih-Jen Lin

2012 (modified: 16 Jul 2019), KDD 2012

Abstract: Traditionally, academic machine learning and data mining researchers focus on proposing new algorithms. The task of implementing these methods is often left to companies that develop software packages. However, the gap between the two sides has caused some problems. First, the practical deployment of new algorithms still involves challenging issues that need to be studied by researchers. Second, without further investigation after publishing their papers, researchers have the opportunity neither to work with real problems nor to see how their methods are used. We discuss our experiences in developing two machine learning packages, LIBSVM and LIBLINEAR, that are widely used in both academia and industry. We demonstrate that interaction with users has led us to identify some important research problems. For example, the decision to study and then support multi-class SVM was essential in the early stage of developing LIBSVM. The birth of LIBLINEAR was driven by the need to classify large-scale document collections at Internet companies. For fast training on large-scale problems, we had to create new algorithms different from those used in LIBSVM for kernel SVM. We present some practical uses of LIBLINEAR in Internet applications. Finally, we give lessons learned and future perspectives for developing industry-strength machine learning and data mining software.
