Structure and Content Based Blog Pages Identification

Published: 01 Jan 2008, Last Modified: 13 Jun 2025, Venue: FSKD (2) 2008, License: CC BY-SA 4.0
Abstract: Blogs are becoming increasingly popular with the rapid development of the Internet. An automatic way to distinguish blog pages from ordinary Web pages is needed to support content extraction from blog pages and the discovery of blog communities. This paper describes some basic concepts and ideas in the area of blogs and proposes a method for identifying blog pages based on their structure and content. Experiments show that the method achieves high precision.