
An effective Chinese indexing method based on partitioned signature files.

Wong Chi Yin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 107-114). / Abstract also in Chinese. / Abstract --- p.ii / Acknowledgements --- p.vi / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Introduction to Chinese IR --- p.1 / Chapter 1.2 --- Contributions --- p.3 / Chapter 1.3 --- Organization of this Thesis --- p.5 / Chapter 2 --- Background --- p.6 / Chapter 2.1 --- Indexing methods --- p.6 / Chapter 2.1.1 --- Full-text scanning --- p.7 / Chapter 2.1.2 --- Inverted files --- p.7 / Chapter 2.1.3 --- Signature files --- p.9 / Chapter 2.1.4 --- Clustering --- p.10 / Chapter 2.2 --- Information Retrieval Models --- p.10 / Chapter 2.2.1 --- Boolean model --- p.11 / Chapter 2.2.2 --- Vector space model --- p.11 / Chapter 2.2.3 --- Probabilistic model --- p.13 / Chapter 2.2.4 --- Logical model --- p.14 / Chapter 3 --- Investigation of Segmentation on the Vector Space Retrieval Model --- p.15 / Chapter 3.1 --- Segmentation of Chinese Texts --- p.16 / Chapter 3.1.1 --- Character-based segmentation --- p.16 / Chapter 3.1.2 --- Word-based segmentation --- p.18 / Chapter 3.1.3 --- N-Gram segmentation --- p.21 / Chapter 3.2 --- Performance Evaluation of Three Segmentation Approaches --- p.23 / Chapter 3.2.1 --- Experimental Setup --- p.23 / Chapter 3.2.2 --- Experimental Results --- p.24 / Chapter 3.2.3 --- Discussion --- p.29 / Chapter 4 --- Signature File Background --- p.32 / Chapter 4.1 --- Superimposed coding --- p.34 / Chapter 4.2 --- False drop probability --- p.36 / Chapter 5 --- Partitioned Signature File Based On Chinese Word Length --- p.39 / Chapter 5.1 --- Fixed Weight Block (FWB) Signature File --- p.41 / Chapter 5.2 --- Overview of PSFC --- p.45 / Chapter 5.3 --- Design Considerations --- p.50 / Chapter 6 --- New Hashing Techniques for Partitioned Signature Files --- p.59 / Chapter 6.1 --- Direct Division Method --- p.61 / Chapter 6.2 --- Random Number Assisted Division Method --- p.62 / 
Chapter 6.3 --- Frequency-based hashing method --- p.64 / Chapter 6.4 --- Chinese character-based hashing method --- p.68 / Chapter 7 --- Experiments and Results --- p.72 / Chapter 7.1 --- Performance evaluation of partitioned signature file based on Chinese word length --- p.74 / Chapter 7.1.1 --- Retrieval Performance --- p.75 / Chapter 7.1.2 --- Signature Reduction Ratio --- p.77 / Chapter 7.1.3 --- Storage Requirement --- p.79 / Chapter 7.1.4 --- Discussion --- p.81 / Chapter 7.2 --- Performance evaluation of different dynamic signature generation methods --- p.82 / Chapter 7.2.1 --- Collision --- p.84 / Chapter 7.2.2 --- Retrieval Performance --- p.86 / Chapter 7.2.3 --- Discussion --- p.89 / Chapter 8 --- Conclusions and Future Work --- p.91 / Chapter 8.1 --- Conclusions --- p.91 / Chapter 8.2 --- Future work --- p.95 / Chapter A --- Notations of Signature Files --- p.96 / Chapter B --- False Drop Probability --- p.98 / Chapter C --- Experimental Results --- p.103 / Bibliography --- p.107

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_322281
Date: January 1998
Contributors: Wong, Chi Yin; Chinese University of Hong Kong Graduate School, Division of Systems Engineering and Engineering Management
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, x, 114 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
