Visual Web Archive Quality Assessment

2022 (modified: 04 Oct 2022), TPDL 2022
Abstract: The large size of today’s web archives makes it impossible to manually assess the quality of each archived web page, i.e., to check whether a page can be reproduced faithfully from an archive. For automated web archive quality assessment, previous work proposed measuring the pixel difference between a screenshot of the original page and a screenshot of the same page when reproduced from the archive. However, when categorizing types of reproduction errors (we introduce a taxonomy of such errors in this paper), one finds that some errors cause high pixel differences between the screenshots but only a negligible degradation in the user experience of the reproduced web page. Therefore, we propose to visually align page segments in such cases before measuring the pixel differences. Since the diversity of reproduction error types precludes a one-size-fits-all solution for visual alignment, we focus on one common type (translated segments) and investigate the usefulness of video compression algorithms for this task.
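
The following is a minimal sketch of the idea described in the abstract, not the authors' implementation. It assumes screenshots are equally sized grayscale images stored as nested lists, that a segment is a horizontal band of rows, and that the segment has only been translated vertically. The function names `pixel_difference` and `find_vertical_shift` are hypothetical, and the exhaustive search over shifts is a simplified stand-in for the block-matching motion estimation used in video compression.

```python
def pixel_difference(a, b):
    """Fraction of pixel positions whose values differ (images of equal size)."""
    total = len(a) * len(a[0])
    differing = sum(
        1 for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b) if pa != pb
    )
    return differing / total


def find_vertical_shift(original, archived, top, bottom, max_shift):
    """Exhaustive block matching for one segment (rows top..bottom-1).

    Tries every vertical offset dy in [-max_shift, max_shift] and returns the
    (dy, cost) pair with the lowest sum of absolute differences (SAD) between
    the segment in `original` and the shifted rows in `archived`.
    """
    block = original[top:bottom]
    best_dy, best_cost = 0, None
    for dy in range(-max_shift, max_shift + 1):
        if top + dy < 0 or bottom + dy > len(archived):
            continue  # shifted block would fall outside the screenshot
        candidate = archived[top + dy:bottom + dy]
        cost = sum(
            abs(p - q)
            for row_b, row_c in zip(block, candidate)
            for p, q in zip(row_b, row_c)
        )
        if best_cost is None or cost < best_cost:
            best_dy, best_cost = dy, cost
    return best_dy, best_cost


# Toy data (assumed for illustration): an 8x4 page whose only content is a
# two-row segment. In the archived version the segment sits two rows lower.
blank = [0, 0, 0, 0]
segment = [[9, 1, 9, 1], [1, 9, 1, 9]]
original = [blank, blank] + segment + [blank] * 4
archived = [blank] * 4 + segment + [blank] * 2

raw_difference = pixel_difference(original, archived)
shift, residual = find_vertical_shift(original, archived, top=2, bottom=4, max_shift=3)
```

In this toy case the raw pixel difference is high even though the segment's content is unchanged, while the residual after compensating for the estimated shift is zero. That gap is the situation the abstract describes for translated segments.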