GEM: Gestalt Enhanced Markup Language Model for Web Understanding via Render Tree

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, Venue: EMNLP 2023 Main
Submission Type: Regular Long Paper
Submission Track: NLP Applications
Submission Track 2: NLP Applications
Keywords: Gestalt, Markup Language, Web Understanding, Language Model
TL;DR: The Gestalt Enhanced Markup (GEM) Language Model is proposed, which flexibly incorporates heterogeneous visual information from rendered web pages into the LM without adding visual modality input.
Abstract: Inexhaustible web content carries abundant perceptible information beyond text. Unfortunately, most prior efforts in pre-trained Language Models (LMs) ignore such cyber-richness, and the few that do consider it employ only plain HTML, excluding crucial information in the rendered web page, such as visual, layout, and style information. Intuitively, such perceptible web information can provide essential intelligence to facilitate content understanding tasks. This study presents an innovative Gestalt Enhanced Markup (GEM) Language Model, inspired by Gestalt psychological theory, that hosts heterogeneous visual information from the render tree in the language model without requiring additional visual input. Comprehensive experiments on multiple downstream tasks, i.e., web question answering and web information extraction, validate GEM's superiority.
Submission Number: 437