Abstract: A new question classification approach is presented for questions in community question answering (CQA) systems. Most questions in CQA are non-factoid and cannot readily be classified by answer type, as factoid questions are. A coarse-grained category set is therefore introduced, and multi-label classification is used: a question can belong to several categories rather than exactly one, and the classification result is a set of categories. A two-step strategy is used for multi-label question classification. In the first step, a series of binary classifiers, one per question category, is applied independently. In the second step, the outputs of those classifiers are combined and a set of question categories is returned as the classification result. Each binary classifier uses a hybrid kernel model that combines a tree kernel with a polynomial kernel. A data set of 22,000 questions is built, of which 20,000 are used as training data and the remaining 2,000 as test data. Experimental results show that the hybrid model is effective. A question paraphrase recognition experiment is carried out to verify the effectiveness of multi-label classification. Its results show that multi-label classification performs better than single-label classification for questions in CQA.
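The two-step strategy and the hybrid kernel described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the two kernels are combined, so the weighted sum with mixing weight `alpha` is an assumption, as are the decision threshold, the fallback to the top-scoring category, and the toy tree kernel, which simply counts shared grammar productions in place of a real tree kernel. All function names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# A question is represented here as (set of parse-tree productions, feature vector).
Question = Tuple[Set[str], Sequence[float]]


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    """Standard polynomial kernel: (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def toy_tree_kernel(a: Set[str], b: Set[str]) -> float:
    """Stand-in for a tree kernel (assumption): number of shared productions."""
    return float(len(a & b))


def hybrid_kernel(tree_k: Callable[[Set[str], Set[str]], float],
                  vec_k: Callable[[Sequence[float], Sequence[float]], float],
                  alpha: float = 0.5) -> Callable[[Question, Question], float]:
    """Weighted sum of a tree kernel and a vector kernel (assumed combination)."""
    def kernel(q1: Question, q2: Question) -> float:
        (tree1, vec1), (tree2, vec2) = q1, q2
        return alpha * tree_k(tree1, tree2) + (1.0 - alpha) * vec_k(vec1, vec2)
    return kernel


def combine_labels(scores: Dict[str, float], threshold: float = 0.0) -> Set[str]:
    """Step two: merge per-category binary decisions into a category set.

    `scores` maps each category to the decision value of its binary
    classifier from step one. Categories scoring above `threshold` are kept.
    """
    labels = {category for category, score in scores.items() if score > threshold}
    # Assumed fallback: if no classifier fires, return the top-scoring category.
    return labels or {max(scores, key=scores.get)}
```

In this sketch, `hybrid_kernel(toy_tree_kernel, polynomial_kernel)` yields a kernel function that a kernel-based binary classifier could use for each category. `combine_labels` then turns the per-category decision values into the final category set.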