Spelling suggestions: "subject:"informatics, computer science, eeb sites"" "subject:"informatics, computer science, beb sites""
1 |
Coarse-grained Classification of Web Sites by Their Structural PropertiesLindemann, Christoph, Littig, Lars 28 January 2019 (has links)
In this paper, we identify and analyze structural properties which reflect the functionality of a Web site. These structural properties consider the size, the organization, the composition of URLs, and the link structure of Web sites. Opposed to previous work, we perform a comprehensive measurement study to delve into the relation between the structure and the functionality of Web sites. Our study focuses on five of the most relevant functional classes, namely Academic, Blog, Corporate, Personal, and Shop. It is based upon more than 1,400 Web sites composed of 7 million crawled and 47 million known Web pages. We present a detailed statistical analysis which provides insight into how structural properties can be used to distinguish between Web sites from different functional classes. Building on these results, we introduce a content-independent approach for the automated coarse-grained classification of Web sites. A naïve Bayesian classifier with advanced density estimation yields a precision of 82% and recall of 80% for the classification of Web sites into the considered classes.
|
Page generated in 0.1445 seconds