Kwok Chi Leong.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 122-127).
Abstracts in English and Chinese.

Abstract --- p.ii
Acknowledgment --- p.vi
Table of Contents --- p.vii
List of Tables --- p.x
List of Figures --- p.xi
Chapter 1 --- Introduction --- p.1
  1.1 --- Contributions --- p.2
  1.2 --- Thesis Overview --- p.3
Chapter 2 --- Background Information --- p.5
  2.1 --- Documents Clustering --- p.5
    2.1.1 --- Review of Clustering Techniques --- p.5
    2.1.2 --- Suffix Tree Clustering --- p.7
  2.2 --- Chinese Information Processing --- p.8
    2.2.1 --- Sentence Segmentation --- p.8
    2.2.2 --- Keyword Extraction --- p.10
Chapter 3 --- The Generic Chinese PAT Tree --- p.12
  3.1 --- PAT Tree --- p.13
    3.1.1 --- Patricia Tree --- p.13
    3.1.2 --- Semi-Infinite String --- p.14
    3.1.3 --- Structure of Tree Nodes --- p.17
    3.1.4 --- Some Examples of PAT Tree --- p.22
    3.1.5 --- Storage Complexity --- p.24
  3.2 --- The Chinese PAT Tree --- p.26
    3.2.1 --- The Chinese PAT Tree Structure --- p.26
    3.2.2 --- Some Examples of Chinese PAT Tree --- p.30
    3.2.3 --- Storage Complexity --- p.33
  3.3 --- The Generic Chinese PAT Tree --- p.34
    3.3.1 --- Structure Overview --- p.34
    3.3.2 --- Structure of Tree Nodes --- p.35
    3.3.3 --- Essential Node --- p.37
    3.3.4 --- Some Examples of the Generic Chinese PAT Tree --- p.41
    3.3.5 --- Storage Complexity --- p.45
  3.4 --- Problems of Embedded Nodes --- p.46
    3.4.1 --- The Reduced Structure --- p.47
    3.4.2 --- Disadvantages of Reduced Structure --- p.48
    3.4.3 --- A Case Study of Reduced Design --- p.50
    3.4.4 --- Experiments on Frequency Mismatch --- p.51
  3.5 --- Strengths of the Generic Chinese PAT Tree --- p.55
Chapter 4 --- Performance Analysis on the Generic Chinese PAT Tree --- p.58
  4.1 --- The Construction of the Generic Chinese PAT Tree --- p.59
  4.2 --- Counting the Essential Nodes --- p.61
  4.3 --- Performance of Various PAT Trees --- p.62
  4.4 --- The Implementation Analysis --- p.64
    4.4.1 --- Pure Dynamic Memory Allocation --- p.64
    4.4.2 --- Node Production Factory Approach --- p.66
    4.4.3 --- Experiment Result of the Factory Approach --- p.68
Chapter 5 --- The Chinese Documents Clustering --- p.70
  5.1 --- The Clustering Framework --- p.70
    5.1.1 --- Documents Cleaning --- p.73
    5.1.2 --- PAT Tree Construction --- p.76
    5.1.3 --- Essential Node Extraction --- p.77
    5.1.4 --- Base Clusters Detection --- p.80
    5.1.5 --- Base Clusters Filtering --- p.86
    5.1.6 --- Base Clusters Combining --- p.94
    5.1.7 --- Documents Assigning --- p.95
    5.1.8 --- Result Presentation --- p.96
  5.2 --- Discussion --- p.96
    5.2.1 --- Flexibility of Our Framework --- p.96
    5.2.2 --- Our Clustering Model --- p.97
    5.2.3 --- More About Clusters Detection --- p.98
    5.2.4 --- Analysis and Complexity --- p.100
Chapter 6 --- Evaluations on the Chinese Documents Clustering --- p.101
  6.1 --- Details of Experiment --- p.101
    6.1.1 --- Parameter of Weighted Frequency --- p.105
    6.1.2 --- Effect of CLP Analysis --- p.105
    6.1.3 --- Result of Clustering --- p.108
  6.2 --- Clustering on Larger Collection --- p.109
    6.2.1 --- Comparing the Base Clusters --- p.109
    6.2.2 --- Result of Clustering --- p.111
    6.2.3 --- Discussion --- p.112
  6.3 --- Clustering with Part of Documents --- p.113
    6.3.1 --- Clustering with News Headlines --- p.114
    6.3.2 --- Clustering with News Abstract --- p.117
Chapter 7 --- Conclusion --- p.119
Bibliography --- p.122

Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_323902 |
Date | January 2002 |
Contributors | Kwok, Chi Leong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | print, xii, 127 leaves : ill. ; 30 cm. |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |