Multi Language Application of Previously Developed Transcripts Classifier

Theodora Danciulescu, Stella Heras, Javier Palanca, Vicente Julián, Marian Cristian Mihaescu

Published: 2021, Last Modified: 15 May 2023, IDEAL 2021, Readers: Everyone

Abstract: Developing a classification model and then applying it to another, similar data-set is a challenging task. We have adapted an existing data analysis pipeline that classified Spanish educational video transcripts from Universitat Politècnica de València (UPV) to process English video transcripts from TedTalks. As in the initial study, the adaptation and the experiments were carried out in an educational context. We found that the process needs only minor adaptations of the data analysis pipeline, but the overall results are highly dependent on the size of the data-set and especially on the class balance. Parametrisation of the pipeline may open the way to deploying the model in other similar contexts by transfer learning.
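To make the two ideas in the abstract concrete (a pipeline driven by parameters rather than hard-coded for one language, and a check on class balance before reusing it), here is a minimal sketch. It is not the authors' pipeline: `PipelineConfig`, `class_shares`, `is_balanced`, the `min_class_share` threshold, and the toy labels are all hypothetical names and values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical parameters; the paper's actual settings are not stated here.
    language: str            # e.g. "es" for UPV transcripts, "en" for TedTalks
    min_class_share: float   # smallest acceptable share of any single class


def class_shares(labels):
    """Return each class's fraction of the data-set."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_balanced(labels, config):
    """True when every class reaches the configured minimum share."""
    return min(class_shares(labels).values()) >= config.min_class_share


# Toy labels, invented for illustration only.
spanish = PipelineConfig(language="es", min_class_share=0.2)
english = PipelineConfig(language="en", min_class_share=0.2)
balanced_labels = ["science", "arts", "science", "arts"]
skewed_labels = ["science"] * 9 + ["arts"]
```

Under these assumptions, switching from Spanish to English transcripts changes only the configuration object, while `is_balanced` flags a data-set such as `skewed_labels`, where one class holds a tenth of the samples, before any model is trained on it.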
