Text segmentation and error detection for Chinese spell checking.

Ng Mau Kit Michael.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1999.
Includes bibliographical references (leaves 117-120).
Abstract and appendix in English and Chinese.

Abstract --- p.i
Acknowledgments --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Background Knowledge and Basic Concepts --- p.7
Chapter 2.1 --- Classification of Natural Languages --- p.7
Chapter 2.2 --- Chinese Spell Checking --- p.9
Chapter 2.3 --- Characteristics of Chinese --- p.12
Chapter 2.3.1 --- Word Frequency and Statistical Information of Chinese Words --- p.12
Chapter 2.3.2 --- Chinese Grammar --- p.15
Chapter 2.3.2.1 --- Word Class --- p.15
Chapter 2.3.2.2 --- Grammar Rules --- p.17
Chapter 3 --- Problems with Chinese Spell Checking and Related Work --- p.18
Chapter 3.1 --- Ambiguities --- p.19
Chapter 3.2 --- Unknown Words --- p.20
Chapter 3.3 --- Text Errors --- p.21
Chapter 3.4 --- Combinatory Explosion --- p.23
Chapter 3.5 --- Related Work --- p.26
Chapter 4 --- The Chinese Spell Checking System --- p.33
Chapter 4.1 --- Architecture of the Chinese Spell Checking System (CSCS) --- p.35
Chapter 4.2 --- The Segmenter and the Error Detector --- p.39
Chapter 5 --- The Block-of-Combinations Segmentation Algorithm and Error Detection --- p.42
Chapter 5.1 --- Single-character-word Function --- p.43
Chapter 5.2 --- Segmentation Strategy --- p.46
Chapter 5.3 --- Maximum Number of Combinations of the BOC --- p.51
Chapter 5.4 --- A Case Study of the BOC --- p.54
Chapter 5.5 --- Evaluation of the BOC --- p.59
Chapter 5.5.1 --- Accuracy --- p.59
Chapter 5.5.2 --- Speed --- p.61
Chapter 5.5.3 --- Discussion --- p.62
Chapter 5.6 --- Experiments on Error Detection for the BOC --- p.63
Chapter 5.6.1 --- Experimental Results of the Error Detection for the BOC --- p.65
Chapter 6 --- The Genetic Algorithm Segmentation Method --- p.69
Chapter 6.1 --- Basic Concepts of Genetic Algorithm --- p.69
Chapter 6.2 --- Genetic Algorithm Model --- p.73
Chapter 6.2.1 --- Chromosome Representation --- p.75
Chapter 6.2.2 --- The Flow of the GAS --- p.76
Chapter 6.2.2.1 --- Crossover --- p.77
Chapter 6.2.2.2 --- Replacement --- p.78
Chapter 6.2.2.3 --- Mutation --- p.80
Chapter 6.2.2.4 --- Termination Criteria --- p.80
Chapter 6.2.3 --- Fitness Function --- p.81
Chapter 6.2.3.1 --- Single-character-word Function --- p.82
Chapter 6.2.3.2 --- Known-word Function and Unknown-word Function --- p.83
Chapter 6.2.3.3 --- Grammar Rules Scoring Function --- p.83
Chapter 6.3 --- Maximum Number of Combinations of the GAS --- p.86
Chapter 6.4 --- Evaluation of the GAS --- p.86
Chapter 6.5 --- Discussion --- p.88
Chapter 7 --- The Improved-BOC Algorithm for Handling Unknown Words and Errors --- p.90
Chapter 7.1 --- Segmentation Principle of the Improved-BOC Method --- p.91
Chapter 7.2 --- Improvement of the Scoring Function --- p.93
Chapter 7.2.1 --- The Choice of Grammar Rules --- p.93
Chapter 7.2.2 --- Phrase-structure Style --- p.96
Chapter 7.2.3 --- Computer Model of Grammar Rules for Handling Unknown Words --- p.98
Chapter 7.3 --- Evaluation of Segmentation --- p.102
Chapter 7.4 --- Error Detection --- p.104
Chapter 7.4.1 --- Evaluation of Error Detection --- p.106
Chapter 7.5 --- Discussion --- p.108
Chapter 7.6 --- Comparison between the MM, BOC, GA and Improved-BOC --- p.109
Chapter 8 --- Conclusion --- p.114
Bibliography --- p.117
Appendix A: Sample Result of the Genetic Algorithm Segmentation Method --- p.121
Appendix B: Set of Grammar Rules --- p.123

Identifier oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_322747
Date January 1999
Contributors Ng, Mau Kit Michael., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets The Chinese University of Hong Kong
Language English, Chinese
Detected Language English
Type Text, bibliography
Format print, x, 127 leaves : ill. ; 30 cm.
Rights Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
