Finding structure and characteristic of web documents for classification.

by Wong, Wai Ching.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 91-94).
Abstracts in English and Chinese.

Abstract --- p.ii
Acknowledgments --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Semistructured Data --- p.2
Chapter 1.2 --- Problem Addressed in the Thesis --- p.4
Chapter 1.2.1 --- Labels and Values --- p.4
Chapter 1.2.2 --- Discover Labels for the Same Attribute --- p.5
Chapter 1.2.3 --- Classifying A Web Page --- p.6
Chapter 1.3 --- Organization of the Thesis --- p.8
Chapter 2 --- Background --- p.8
Chapter 2.1 --- Related Work on Web Data --- p.8
Chapter 2.1.1 --- Object Exchange Model (OEM) --- p.9
Chapter 2.1.2 --- Schema Extraction --- p.11
Chapter 2.1.3 --- Discovering Typical Structure --- p.15
Chapter 2.1.4 --- Information Extraction of Web Data --- p.17
Chapter 2.2 --- Automatic Text Processing --- p.19
Chapter 2.2.1 --- Stopwords Elimination --- p.19
Chapter 2.2.2 --- Stemming --- p.20
Chapter 3 --- Web Data Definition --- p.22
Chapter 3.1 --- Web Page --- p.22
Chapter 3.2 --- Problem Description --- p.27
Chapter 4 --- Hierarchical Structure --- p.32
Chapter 4.1 --- Types of HTML Tags --- p.33
Chapter 4.2 --- Tag-tree --- p.36
Chapter 4.3 --- Hierarchical Structure Construction --- p.41
Chapter 4.4 --- Hierarchical Structure Statistics --- p.50
Chapter 5 --- Similar Labels Discovery --- p.53
Chapter 5.1 --- Expression of Hierarchical Structure --- p.53
Chapter 5.2 --- Labels Discovery Algorithm --- p.55
Chapter 5.2.1 --- Phase 1: Remove Non-label Nodes --- p.57
Chapter 5.2.2 --- Phase 2: Identify Label Nodes --- p.61
Chapter 5.2.3 --- Phase 3: Discover Similar Labels --- p.66
Chapter 5.3 --- Performance Evaluation of Labels Discovery Algorithm --- p.76
Chapter 5.3.1 --- Phase 1 Results --- p.75
Chapter 5.3.2 --- Phase 2 Results --- p.77
Chapter 5.3.3 --- Phase 3 Results --- p.81
Chapter 5.4 --- Classifying a Web Page --- p.83
Chapter 5.4.1 --- Similarity Measurement --- p.84
Chapter 5.4.2 --- Performance Evaluation --- p.86
Chapter 6 --- Conclusion --- p.89

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_323129
Date: January 2000
Contributors: Wong, Wai Ching., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, xii, 94 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
