Abstract: This paper tackles the under-explored problem of DOM element
nomination and representation learning with three important contributions. First, we present a large-scale and realistic dataset of
webpages, far richer and more diverse than other datasets proposed
for element representation learning, classification and nomination
on the web. The dataset contains 51,701 manually labeled product
pages from 8,175 real e-commerce websites. Second, we adapt several Graph Neural Network (GNN) architectures to website DOM
trees and benchmark their performance on a diverse set of element
nomination tasks using our proposed dataset. In element nomination, a single element on a page is selected for a given class. We
show that on our challenging dataset a simple Convolutional GNN
outperforms state-of-the-art methods on web element nomination.
Finally, we propose a new training method that further boosts the
element nomination accuracy. In web element nomination, classification (assigning a class to a given element) is typically used as
a surrogate objective for nomination during training. Our novel
training methodology steers the classification objective towards
the more complex and useful nomination objective.
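
The difference between classification and nomination can be illustrated with a minimal sketch. It is not the paper's method or training procedure. The score matrix, the class indices (0 = background, 1 = price, 2 = product name) and the function names `classify` and `nominate` are all assumptions made for illustration.

```python
from typing import Dict, List


def classify(scores: List[List[float]]) -> List[int]:
    """Classification: assign each element its highest-scoring class."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def nominate(scores: List[List[float]], class_ids: List[int]) -> Dict[int, int]:
    """Nomination: for each class, select the single best element on the page."""
    return {
        c: max(range(len(scores)), key=lambda i: scores[i][c])
        for c in class_ids
    }


# Hypothetical per-element scores for one page (assumed values).
# Rows are DOM elements; columns are classes:
# 0 = background, 1 = price, 2 = product name.
page_scores = [
    [0.70, 0.20, 0.10],
    [0.50, 0.40, 0.10],
    [0.20, 0.10, 0.70],
    [0.90, 0.05, 0.05],
]

per_element_classes = classify(page_scores)
per_class_nominees = nominate(page_scores, class_ids=[1, 2])
```

In this toy page, element 1 is classified as background because 0.50 exceeds 0.40, yet it is still the nominee for the price class because no other element scores higher in that column. The two objectives can therefore disagree, which is one way to see why a classification loss is only a surrogate for nomination.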