WebQuality: A Large-scale Multi-modal Web Page Quality Assessment Dataset with Multiple Scoring Dimensions
Abstract: The assessment of web page quality plays a critical role in a range of downstream applications, yet there is a notable absence of datasets for the evaluation of web page quality. This research presents the pioneering task of web page quality assessment and introduces the first comprehensive, multi-modal Chinese dataset named WebQuality specifically designed for this task. The dataset includes over 65,000 detailed an-notations spanning four sub-dimensions and incorporates elements such as HTML+CSS, text, and visual screenshot, facilitating in-depth modeling and assessment of web page quality. We performed evaluations using a variety of baseline models to demonstrate the complexity of the task. Additionally, we propose Hydra, an integrated multi-modal analysis model, and rigorously assess its performance and limitations through extensive ablation studies. To advance the field of web quality assessment, we offer unrestricted access to our dataset and codebase for the research community, available at url-is-hidden-for-blind-review.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English,Chinese
0 Replies
Loading