Abstract: Information extraction techniques can retrieve useful information from unstructured data, which improves the effectiveness of data analytics and plays a key role in the consumer decision-making process. The growth of sponsored video content on social media increases the demand for knowing how effective sponsorship investments are in terms of the engagement obtained. This study analyzes an approach to inferring sponsored content in videos from top digital influencers on YouTube by applying knowledge acquisition techniques to audio transcriptions. A dataset of 34,563 videos from 103 different YouTubers' channels was used in a model comprising six stages: data acquisition, document preprocessing, identification of candidate videos, processing of manual transcriptions, processing of automatic transcriptions, and tuple filtering. Although we encountered several difficulties in recognizing speech from YouTube videos, such as the absence of clear boundaries between words, the presence of two or more people talking in the video, and colloquial expressions, the approach identifies sponsored videos from their audio transcripts in a feasible process by identifying keywords, obtaining knowledge from tuples, and recognizing named entities.