
A generic Chinese PAT tree data structure for Chinese documents clustering.

Kwok Chi Leong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 122-127).
Abstracts in English and Chinese.

Abstract --- p.ii
Acknowledgment --- p.vi
Table of Contents --- p.vii
List of Tables --- p.x
List of Figures --- p.xi
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Contributions --- p.2
Chapter 1.2 --- Thesis Overview --- p.3
Chapter 2 --- Background Information --- p.5
Chapter 2.1 --- Documents Clustering --- p.5
Chapter 2.1.1 --- Review of Clustering Techniques --- p.5
Chapter 2.1.2 --- Suffix Tree Clustering --- p.7
Chapter 2.2 --- Chinese Information Processing --- p.8
Chapter 2.2.1 --- Sentence Segmentation --- p.8
Chapter 2.2.2 --- Keyword Extraction --- p.10
Chapter 3 --- The Generic Chinese PAT Tree --- p.12
Chapter 3.1 --- PAT Tree --- p.13
Chapter 3.1.1 --- Patricia Tree --- p.13
Chapter 3.1.2 --- Semi-Infinite String --- p.14
Chapter 3.1.3 --- Structure of Tree Nodes --- p.17
Chapter 3.1.4 --- Some Examples of PAT Tree --- p.22
Chapter 3.1.5 --- Storage Complexity --- p.24
Chapter 3.2 --- The Chinese PAT Tree --- p.26
Chapter 3.2.1 --- The Chinese PAT Tree Structure --- p.26
Chapter 3.2.2 --- Some Examples of Chinese PAT Tree --- p.30
Chapter 3.2.3 --- Storage Complexity --- p.33
Chapter 3.3 --- The Generic Chinese PAT Tree --- p.34
Chapter 3.3.1 --- Structure Overview --- p.34
Chapter 3.3.2 --- Structure of Tree Nodes --- p.35
Chapter 3.3.3 --- Essential Node --- p.37
Chapter 3.3.4 --- Some Examples of the Generic Chinese PAT Tree --- p.41
Chapter 3.3.5 --- Storage Complexity --- p.45
Chapter 3.4 --- Problems of Embedded Nodes --- p.46
Chapter 3.4.1 --- The Reduced Structure --- p.47
Chapter 3.4.2 --- Disadvantages of Reduced Structure --- p.48
Chapter 3.4.3 --- A Case Study of Reduced Design --- p.50
Chapter 3.4.4 --- Experiments on Frequency Mismatch --- p.51
Chapter 3.5 --- Strengths of the Generic Chinese PAT Tree --- p.55
Chapter 4 --- Performance Analysis on the Generic Chinese PAT Tree --- p.58
Chapter 4.1 --- The Construction of the Generic Chinese PAT Tree --- p.59
Chapter 4.2 --- Counting the Essential Nodes --- p.61
Chapter 4.3 --- Performance of Various PAT Trees --- p.62
Chapter 4.4 --- The Implementation Analysis --- p.64
Chapter 4.4.1 --- Pure Dynamic Memory Allocation --- p.64
Chapter 4.4.2 --- Node Production Factory Approach --- p.66
Chapter 4.4.3 --- Experiment Result of the Factory Approach --- p.68
Chapter 5 --- The Chinese Documents Clustering --- p.70
Chapter 5.1 --- The Clustering Framework --- p.70
Chapter 5.1.1 --- Documents Cleaning --- p.73
Chapter 5.1.2 --- PAT Tree Construction --- p.76
Chapter 5.1.3 --- Essential Node Extraction --- p.77
Chapter 5.1.4 --- Base Clusters Detection --- p.80
Chapter 5.1.5 --- Base Clusters Filtering --- p.86
Chapter 5.1.6 --- Base Clusters Combining --- p.94
Chapter 5.1.7 --- Documents Assigning --- p.95
Chapter 5.1.8 --- Result Presentation --- p.96
Chapter 5.2 --- Discussion --- p.96
Chapter 5.2.1 --- Flexibility of Our Framework --- p.96
Chapter 5.2.2 --- Our Clustering Model --- p.97
Chapter 5.2.3 --- More About Clusters Detection --- p.98
Chapter 5.2.4 --- Analysis and Complexity --- p.100
Chapter 6 --- Evaluations on the Chinese Documents Clustering --- p.101
Chapter 6.1 --- Details of Experiment --- p.101
Chapter 6.1.1 --- Parameter of Weighted Frequency --- p.105
Chapter 6.1.2 --- Effect of CLP Analysis --- p.105
Chapter 6.1.3 --- Result of Clustering --- p.108
Chapter 6.2 --- Clustering on Larger Collection --- p.109
Chapter 6.2.1 --- Comparing the Base Clusters --- p.109
Chapter 6.2.2 --- Result of Clustering --- p.111
Chapter 6.2.3 --- Discussion --- p.112
Chapter 6.3 --- Clustering with Part of Documents --- p.113
Chapter 6.3.1 --- Clustering with News Headlines --- p.114
Chapter 6.3.2 --- Clustering with News Abstract --- p.117
Chapter 7 --- Conclusion --- p.119
Bibliography --- p.122

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_323902
Date: January 2002
Contributors: Kwok, Chi Leong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, xii, 127 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
