Abstract: Training a model typically involves using data similar to the context in which it will be tested. This is a logical and common practice, since text from similar contexts tends to share a lexicon. However, it is unclear whether drawing on similar sources is actually necessary for good results, and this work explores that question. Categorizing data is a time-consuming task, so any way to reduce it would be beneficial: if sources could be mixed, a significant amount of annotation time could be saved. In this study, we compare data from Amazon, which has a diverse range of writers, with more homogeneous data such as poems, to determine whether mixing sources can aid automatic classification.
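The kind of cross-domain comparison described above can be probed with a simple experiment: train a text classifier on one domain and evaluate it on another. The sketch below is a minimal illustration, not the authors' pipeline; the toy sentences, labels, and choice of a bag-of-words model with scikit-learn are all assumptions for demonstration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Source domain: product-review style sentences (hypothetical toy data).
train_texts = [
    "great product works well",
    "terrible quality broke fast",
    "love this item highly recommend",
    "awful waste of money",
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Target domain: more "poetic" sentences with a partly different lexicon.
test_texts = [
    "a great joy blooms in my heart",
    "my soul is awful and broken",
]
test_labels = [1, 0]

# Fit the vocabulary on the source domain only; target-domain words
# unseen during training are simply dropped at transform time.
vec = CountVectorizer()
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

clf = LogisticRegression().fit(X_train, train_labels)
accuracy = clf.score(X_test, test_labels)
```

The gap between in-domain and cross-domain accuracy in a setup like this is one way to quantify how much the lexical mismatch between sources (e.g., reviews versus poems) actually hurts classification.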