Abstract: This paper explores machine learning methods, including logistic regression, support vector machines, and gradient-based variants, for predicting multi-label, multi-class movie genres from plot summaries. To vectorize the plot summaries, two text representation methods are implemented: Term Frequency-Inverse Document Frequency (TF-IDF) and Bag-of-Words (BoW). A comparison of the text representation models showed that logistic regression outperformed the other machine learning models, including stochastic gradient descent, support vector machine, and gradient-boosting variants, with a discounted cumulative gain score of 0.504 and an F1-score of 0.628 using the TF-IDF approach.
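To make the two text representations and the ranking metric concrete, here is a minimal pure-Python sketch. It is illustrative only and not the authors' implementation: the example plots, the function names, the whitespace tokenizer, the idf variant ln(N / df), and the DCG form with a log2 discount are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) using naive whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight raw counts by idf = ln(N / df), one common textbook variant (assumed)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


# Hypothetical plot summaries, made up for illustration.
plots = [
    "a detective hunts a killer",
    "a robot hunts a detective",
    "a couple falls in love",
]

vocab, bow = bag_of_words(plots)
_, weighted = tf_idf(plots)

# "a" occurs in every plot, so BoW counts it but TF-IDF weights it to zero.
print(bow[0][vocab.index("a")], weighted[0][vocab.index("a")])

# Genres ranked by predicted score, with 1 marking a true genre of the movie.
print(dcg([1, 0, 1]))
```

The sketch shows why the two representations can lead to different classifier behavior: BoW keeps frequent but uninformative words at full weight, while TF-IDF downweights terms that appear in many summaries. Library implementations typically apply smoothing and normalization that this sketch omits.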
External IDs: dblp:conf/icaiic/KimKKKJ24