Abstract: Whenever we watch a TV show or movie, we process a substantial amount of information conveyed through multiple media channels, in particular visual, textual, and audio. These signals carry distinctive properties that together shape a unique motion picture experience. In an effort not only to produce a more personalised recommender system but also to tackle the problem of popularity bias, we develop a system that incorporates multimodal information. Specifically, we investigate the correlations between features extracted from visual characteristics, audio patterns, and subtitles using state-of-the-art techniques and deep learning models. The framework is evaluated on a dataset of 145 BBC TV programmes against genre and user baselines. We demonstrate that personalised recommendations can not only be improved with the use of multimodal information, but can also outperform genre- and user-based models in terms of diversity, whilst maintaining comparable levels of accuracy.