Adapting session based recommendation for features through transfer learning

Even Oldridge

Published: 01 Jan 2018, Last Modified: 27 Jul 2024RecSys 2018EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: This industry talk covers the deep learning architecture developed at Realtor.com to recommend real estate listings to our userbase. The recommendation of homes is a different problem than most other domains both in the sense that listings are unique and that there are additional geographic and time constraints that increase the sparsity of interactions and make recommendation of individual listings more challenging. In particular time on market in a hot area can be limited to weeks or even days, and listing cold-start is critical to providing up to date market information. Thankfully the structured feature data for listings is incredibly rich and provides a framework from which to map listings into a meaningful vector space. User first impressions are also incredibly important in this highly competitive field, and offline recommendation or models that don't adapt during the users session are less desirable.In order to solve this recommendation problem we have developed a model based off of session based recommendation [1]. The architecture utilizes state of the art techniques from Natural Language Processing, including the AWD-LSTM language model developed by Salesforce [2]. To solve for cold-start of listings a structured data based denoising autoencoder was adapted from the methodology described in the winning entry of the Puerto Segurno Safe Driver Kaggle Competition [3]. This model is not used in the common way of generating fixed feature vectors, but rather the entire head of the autoencoder model, from the feature inputs to the middle layer commonly used as the vector output, is first trained to encode listing features, and then becomes the input to the AWD-LSTM architecture. This style of transfer learning is common in Computer Vision, and has recently been utilized in NLP to achieve state of the art results for text classification [4]. By including the head we are able to further optimize the listing encoder network and embeddings to take user interactions into account. As in traditional session based recommendation users are represented as the sequence of listings that they view, however those listings are fed into the model as the sequence of features.The final system consists of several components. The first attempts to calculate and maintain the users' feature vector and model hidden weights in near realtime, providing a representation for the user within the system. This representation is used by several downstream components, most notably the search rerank and recommendation modules which calculate users' interest in listings both in the context of the output of more traditional elasticsearch queries via cosine similarity of user/listing vectors and through approximate nearest neighbor vector space searches for relevant listings which form the input set for a pointwise scoring model trained on time on listing as done by YouTube [5].