Graph Neural Networks for Nomination and Representation Learning of Web Elements

Stefan Magureanu, Alexandra Hotti, Riccardo Sven Risuleo, Aref Moradi, Jens Lagergren

24 Dec 2022, OpenReview Archive Direct Upload

Abstract: This paper tackles the under-explored problem of DOM element nomination and representation learning with three contributions. First, we present a large-scale, realistic dataset of webpages that is far richer and more diverse than other datasets proposed for element representation learning, classification and nomination on the web. The dataset contains 51,701 manually labeled product pages from 8,175 real e-commerce websites. Second, we adapt several Graph Neural Network (GNN) architectures to website DOM trees and benchmark their performance on a diverse set of element nomination tasks using the proposed dataset. In element nomination, a single element on a page is selected for a given class. We show that on our challenging dataset a simple Convolutional GNN outperforms state-of-the-art methods on web element nomination. Finally, we propose a new training method that further boosts element nomination accuracy. In web element nomination, classification (assigning a class to a given element) is usually used during training as a surrogate objective for nomination. Our novel training methodology steers the classification objective towards the more complex and useful nomination objective.
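The abstract contrasts classification (one class per element) with nomination (one element per class). The sketch below illustrates that distinction on a toy DOM tree; it is not the authors' implementation. It assumes a DOM tree stored as a parent list, a single mean-aggregation graph convolution standing in for the "simple Convolutional GNN" (the abstract does not give its exact form), and hypothetical features and weights. The names `conv_layer`, `classify` and `nominate` are illustrative only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def neighbours(parent: Sequence[int]) -> List[List[int]]:
    """Undirected adjacency of the DOM tree, with a self-loop on every node.

    parent[i] is the index of node i's parent, or -1 for the root.
    """
    adj = [[i] for i in range(len(parent))]
    for child, p in enumerate(parent):
        if p >= 0:
            adj[child].append(p)
            adj[p].append(child)
    return adj


def conv_layer(x: Matrix, parent: Sequence[int], w: Matrix) -> Matrix:
    """One mean-aggregation graph convolution: h_i = ReLU(mean_{j in N(i)} x_j W).

    Assumption: plain mean aggregation, not the symmetric normalisation of a classic GCN.
    """
    out = []
    for nbrs in neighbours(parent):
        # Average the feature vectors of the node and its DOM neighbours.
        agg = [sum(x[j][k] for j in nbrs) / len(nbrs) for k in range(len(x[0]))]
        # Linear projection to class scores, followed by ReLU.
        out.append([max(0.0, sum(agg[k] * w[k][c] for k in range(len(agg))))
                    for c in range(len(w[0]))])
    return out


def classify(scores: Matrix) -> List[int]:
    """Classification: give every element its highest-scoring class."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def nominate(scores: Matrix) -> List[int]:
    """Nomination: for every class, select the single highest-scoring element."""
    return [max(range(len(scores)), key=lambda i: scores[i][c])
            for c in range(len(scores[0]))]


# Hypothetical toy page: 0=<body>, 1=<div>, 2=<h1> (title), 3=<span> (price), 4=<img>
parent = [-1, 0, 1, 1, 1]
x = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy 2-d features
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely for illustration
h = conv_layer(x, parent, w)
print("classification:", classify(h))  # [0, 0, 0, 1, 0]
print("nomination:", nominate(h))      # [2, 3]
```

Classification labels every element, so four of the five toy nodes come out as class 0, while nomination returns exactly one element per class: node 2 for class 0 and node 3 for class 1. This gap is why a classifier trained per element is only a surrogate for the nomination task.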
