11 |
Text compression for Chinese documents. January 1995
by Chi-kwun Kan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. / Includes bibliographical references (leaves 133-137). / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Importance of Text Compression --- p.1 / Chapter 1.2 --- Historical Background of Data Compression --- p.2 / Chapter 1.3 --- The Essences of Data Compression --- p.4 / Chapter 1.4 --- Motivation and Objectives of the Project --- p.5 / Chapter 1.5 --- Definition of Important Terms --- p.6 / Chapter 1.5.1 --- Data Models --- p.6 / Chapter 1.5.2 --- Entropy --- p.10 / Chapter 1.5.3 --- Statistical and Dictionary-based Compression --- p.12 / Chapter 1.5.4 --- Static and Adaptive Modelling --- p.12 / Chapter 1.5.5 --- One-Pass and Two-Pass Modelling --- p.13 / Chapter 1.6 --- Benchmarks and Measurements of Results --- p.15 / Chapter 1.7 --- Sources of Testing Data --- p.16 / Chapter 1.8 --- Outline of the Thesis --- p.16 / Chapter 2 --- Literature Survey --- p.18 / Chapter 2.1 --- Data compression Algorithms --- p.18 / Chapter 2.1.1 --- Statistical Compression Methods --- p.18 / Chapter 2.1.2 --- Dictionary-based Compression Methods (Ziv-Lempel Family) --- p.23 / Chapter 2.2 --- Cascading of Algorithms --- p.33 / Chapter 2.3 --- Problems of Current Compression Programs on Chinese --- p.34 / Chapter 2.4 --- Previous Chinese Data Compression Literatures --- p.37 / Chapter 3 --- Chinese-related Issues --- p.38 / Chapter 3.1 --- Characteristics in Chinese Data Compression --- p.38 / Chapter 3.1.1 --- Large and Not Fixed Size Character Set --- p.38 / Chapter 3.1.2 --- Lack of Word Segmentation --- p.40 / Chapter 3.1.3 --- Rich Semantic Meaning of Chinese Characters --- p.40 / Chapter 3.1.4 --- Grammatical Variance of Chinese Language --- p.41 / Chapter 3.2 --- Definition of Different Coding Schemes --- p.41 / Chapter 3.2.1 --- Big5 Code --- p.42 / Chapter 3.2.2 --- GB (Guo Biao) Code --- p.43 / Chapter 3.2.3 --- Unicode --- p.44 / Chapter
3.2.4 --- HZ (Hanzi) Code --- p.45 / Chapter 3.3 --- Entropy of Chinese and Other Languages --- p.45 / Chapter 4 --- Huffman Coding on Chinese Text --- p.49 / Chapter 4.1 --- The use of the Chinese Character Identification Routine --- p.50 / Chapter 4.2 --- Result --- p.51 / Chapter 4.3 --- Justification of the Result --- p.53 / Chapter 4.4 --- Time and Memory Resources Analysis --- p.58 / Chapter 4.5 --- The Heuristic Order-n Huffman Coding for Chinese Text Compression --- p.61 / Chapter 4.5.1 --- The Algorithm --- p.62 / Chapter 4.5.2 --- Result --- p.63 / Chapter 4.5.3 --- Justification of the Result --- p.64 / Chapter 4.6 --- Chapter Conclusion --- p.66 / Chapter 5 --- The Ziv-Lempel Compression on Chinese Text --- p.67 / Chapter 5.1 --- The Chinese LZSS Compression --- p.68 / Chapter 5.1.1 --- The Algorithm --- p.69 / Chapter 5.1.2 --- Result --- p.73 / Chapter 5.1.3 --- Justification of the Result --- p.74 / Chapter 5.1.4 --- Time and Memory Resources Analysis --- p.80 / Chapter 5.1.5 --- Effects in Controlling the Parameters --- p.81 / Chapter 5.2 --- The Chinese LZW Compression --- p.92 / Chapter 5.2.1 --- The Algorithm --- p.92 / Chapter 5.2.2 --- Result --- p.94 / Chapter 5.2.3 --- Justification of the Result --- p.95 / Chapter 5.2.4 --- Time and Memory Resources Analysis --- p.97 / Chapter 5.2.5 --- Effects in Controlling the Parameters --- p.98 / Chapter 5.3 --- A Comparison of the performance of the LZSS and the LZW --- p.100 / Chapter 5.4 --- Chapter Conclusion --- p.101 / Chapter 6 --- Chinese Dictionary-based Huffman coding --- p.103 / Chapter 6.1 --- The Algorithm --- p.104 / Chapter 6.2 --- Result --- p.107 / Chapter 6.3 --- Justification of the Result --- p.108 / Chapter 6.4 --- Effects of Changing the Size of the Dictionary --- p.111 / Chapter 6.5 --- Chapter Conclusion --- p.114 / Chapter 7 --- Cascading of Huffman coding and LZW compression --- p.116 / Chapter 7.1 --- Static Cascading Model --- p.117 / Chapter 7.1.1 --- The Algorithm ---
p.117 / Chapter 7.1.2 --- Result --- p.120 / Chapter 7.1.3 --- Explanation and Analysis of the Result --- p.121 / Chapter 7.2 --- Adaptive (Dynamic) Cascading Model --- p.125 / Chapter 7.2.1 --- The Algorithm --- p.125 / Chapter 7.2.2 --- Result --- p.126 / Chapter 7.2.3 --- Explanation and Analysis of the Result --- p.127 / Chapter 7.3 --- Chapter Conclusion --- p.128 / Chapter 8 --- Concluding Remarks --- p.129 / Chapter 8.1 --- Conclusion --- p.129 / Chapter 8.2 --- Future Work Direction --- p.130 / Chapter 8.2.1 --- Improvement in Efficiency and Resources Consumption --- p.130 / Chapter 8.2.2 --- The Compressibility of Chinese and Other Languages --- p.131 / Chapter 8.2.3 --- Use of Grammar Model --- p.131 / Chapter 8.2.4 --- Lossy Compression --- p.131 / Chapter 8.3 --- Epilogue --- p.132 / Bibliography --- p.133
|
12 |
A comprehensive Chinese thesaurus system. January 1995
by Chen Hong Yi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. / Includes bibliographical references (leaves 62-65). / Abstract --- p.ii / Acknowledgement --- p.iv / List of Tables --- p.viii / List of Figures --- p.ix / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Background Information And Thesis Scope --- p.6 / Chapter 2.1 --- Basic Concepts and Terminologies --- p.6 / Chapter 2.1.1 --- Semantic Classification Of A Word --- p.6 / Chapter 2.1.2 --- Relationship Link And Relationship Type --- p.7 / Chapter 2.1.3 --- "Semantic Closeness, Link Weight And Semantic Distance" --- p.8 / Chapter 2.1.4 --- Thesaurus Model And Semantic Net --- p.9 / Chapter 2.1.5 --- Thesaurus Building And Maintaining Tool --- p.9 / Chapter 2.2 --- Chinese Information Processing --- p.9 / Chapter 2.2.1 --- The Segmentation of Chinese Words --- p.10 / Chapter 2.2.2 --- The Ambiguity of Chinese Words --- p.10 / Chapter 2.2.3 --- Multiple Chinese Character Code Set Standards --- p.11 / Chapter 2.3 --- Related Work --- p.11 / Chapter 2.4 --- Thesis Scope --- p.13 / Chapter 3 --- System Design Principles --- p.15 / Chapter 3.1 --- Application Context Of TheSys --- p.15 / Chapter 3.2 --- Overall System Architecture --- p.16 / Chapter 3.3 --- Entry-Term Construct And Thesaurus Frame --- p.19 / Chapter 3.3.1 --- "Words, Entry Terms And Entry Term Construct" --- p.21 / Chapter 3.3.2 --- "Semanteme, Relationship And Thesaurus Frame" --- p.23 / Chapter 3.3.3 --- Dealing With Term Ambiguity --- p.28 / Chapter 3.4 --- Weighting Scheme --- p.33 / Chapter 3.4.1 --- Assumption --- p.33 / Chapter 3.4.2 --- Quantify The Relevancy Between Two Directly Linked Concepts --- p.34 / Chapter 3.4.3 --- Quantify The Relevancy Between Two Indirectly Linked Concepts --- p.35 / Chapter 3.5 --- Term Ranking --- p.38 / Chapter 3.6 --- Thesaurus Module and Maintenance Module --- p.39 / Chapter 3.6.1 --- The Procedure Of Building A Thesaurus --- p.40 / Chapter 3.6.2 --- Thesaurus Nomination --- p.41 / 
Chapter 3.6.3 --- Semantic Classification Tree Construction --- p.41 / Chapter 3.6.4 --- Relation Type Definition --- p.42 / Chapter 3.6.5 --- Entry Term Construct Construction --- p.42 / Chapter 3.6.6 --- Thesaurus Frame Construction --- p.43 / Chapter 3.6.7 --- Thesaurus Query --- p.44 / Chapter 4 --- System Implementation --- p.45 / Chapter 4.1 --- Data Structure --- p.45 / Chapter 4.1.1 --- Entry Term Construct --- p.45 / Chapter 4.1.2 --- Thesaurus Frame --- p.49 / Chapter 4.2 --- API --- p.50 / Chapter 4.3 --- User Interface --- p.54 / Chapter 4.3.1 --- Widget And Its Callback --- p.54 / Chapter 4.3.2 --- Bilingual User Interface --- p.55 / Chapter 4.3.3 --- Chinese Character Input Method --- p.57 / Chapter 5 --- Conclusion And Future Work --- p.60 / Chapter A --- System Installation --- p.66 / Chapter A.1 --- Files In TheSys --- p.67 / Chapter A.2 --- Employ TheSys As Application Package --- p.70 / Chapter A.3 --- Set Up TheSys With UI --- p.71 / Chapter A.4 --- Verify The Word Using External Dictionary --- p.74 / Chapter B --- API Description --- p.77 / Chapter B.1 --- thesys.h File --- p.77 / Chapter B.2 --- API Reference --- p.82 / Chapter C --- User Interface Reference --- p.108
|
13 |
Hybrid tag-set for natural language processing. January 1999
Leung Wai Kwong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 90-95). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Objective --- p.3 / Chapter 1.3 --- Organization of thesis --- p.3 / Chapter 2 --- Background --- p.5 / Chapter 2.1 --- Chinese Noun Phrases Parsing --- p.5 / Chapter 2.2 --- Chinese Noun Phrases --- p.6 / Chapter 2.3 --- Problems with Syntactic Parsing --- p.11 / Chapter 2.3.1 --- Conjunctive Noun Phrases --- p.11 / Chapter 2.3.2 --- De-de Noun Phrases --- p.12 / Chapter 2.3.3 --- Compound Noun Phrases --- p.13 / Chapter 2.4 --- Observations --- p.15 / Chapter 2.4.1 --- Inadequacy in Part-of-Speech Categorization for Chinese NLP --- p.16 / Chapter 2.4.2 --- The Need of Semantic in Noun Phrase Parsing --- p.17 / Chapter 2.5 --- Summary --- p.17 / Chapter 3 --- Hybrid Tag-set --- p.19 / Chapter 3.1 --- Objectives --- p.19 / Chapter 3.1.1 --- Resolving Parsing Ambiguities --- p.19 / Chapter 3.1.2 --- Investigation of Nominal Compound Noun Phrases --- p.20 / Chapter 3.2 --- Definition of Hybrid Tag-set --- p.20 / Chapter 3.3 --- Introduction to Cilin --- p.21 / Chapter 3.4 --- Problems with Cilin --- p.23 / Chapter 3.4.1 --- Unknown words --- p.23 / Chapter 3.4.2 --- Multiple Semantic Classes --- p.25 / Chapter 3.5 --- Introduction to Chinese Word Formation --- p.26 / Chapter 3.5.1 --- Disyllabic Word Formation --- p.26 / Chapter 3.5.2 --- Polysyllabic Word Formation --- p.28 / Chapter 3.5.3 --- Observation --- p.29 / Chapter 3.6 --- Automatic Assignment of Hybrid Tag to Chinese Word --- p.31 / Chapter 3.7 --- Summary --- p.34 / Chapter 4 --- Automatic Semantic Assignment --- p.35 / Chapter 4.1 --- Previous Researches on Semantic Tagging --- p.36 / Chapter 4.2 --- SAUW - Automatic Semantic Assignment of Unknown Words --- p.37 / Chapter 4.2.1 --- POS-to-SC Association (Process 1) --- p.38 / Chapter 4.2.2 ---
Morphology-based Deduction (Process 2) --- p.39 / Chapter 4.2.3 --- Di-syllabic Word Analysis (Process 3 and 4) --- p.41 / Chapter 4.2.4 --- Poly-syllabic Word Analysis (Process 5) --- p.47 / Chapter 4.3 --- Illustrative Examples --- p.47 / Chapter 4.4 --- Evaluation and Analysis --- p.49 / Chapter 4.4.1 --- Experiments --- p.49 / Chapter 4.4.2 --- Error Analysis --- p.51 / Chapter 4.5 --- Summary --- p.52 / Chapter 5 --- Word Sense Disambiguation --- p.53 / Chapter 5.1 --- Introduction to Word Sense Disambiguation --- p.54 / Chapter 5.2 --- Previous Works on Word Sense Disambiguation --- p.55 / Chapter 5.2.1 --- Linguistic-based Approaches --- p.56 / Chapter 5.2.2 --- Corpus-based Approaches --- p.58 / Chapter 5.3 --- Our Approach --- p.60 / Chapter 5.3.1 --- Bi-gram Co-occurrence Probabilities --- p.62 / Chapter 5.3.2 --- Tri-gram Co-occurrence Probabilities --- p.63 / Chapter 5.3.3 --- Design consideration --- p.65 / Chapter 5.3.4 --- Error Analysis --- p.67 / Chapter 5.4 --- Summary --- p.68 / Chapter 6 --- Hybrid Tag-set for Chinese Noun Phrase Parsing --- p.69 / Chapter 6.1 --- Resolving Ambiguous Noun Phrases --- p.70 / Chapter 6.1.1 --- Experiment --- p.70 / Chapter 6.1.2 --- Results --- p.72 / Chapter 6.2 --- Summary --- p.78 / Chapter 7 --- Conclusion --- p.80 / Chapter 7.1 --- Summary --- p.80 / Chapter 7.2 --- Difficulties Encountered --- p.83 / Chapter 7.2.1 --- Lack of Training Corpus --- p.83 / Chapter 7.2.2 --- Features of Chinese word formation --- p.84 / Chapter 7.2.3 --- Problems with linguistic sources --- p.85 / Chapter 7.3 --- Contributions --- p.86 / Chapter 7.3.1 --- Enrichment to the Cilin --- p.86 / Chapter 7.3.2 --- Enhancement in syntactic parsing --- p.87 / Chapter 7.4 --- Further Researches --- p.88 / Chapter 7.4.1 --- Investigation into words that undergo semantic changes --- p.88 / Chapter 7.4.2 --- Incorporation of more information into the hybrid tag-set --- p.89 / Chapter A --- POS Tag-set by Tsinghua University (清華大學) --- p.96 / 
Chapter B --- Morphological Rules --- p.100 / Chapter C --- Syntactic Rules for Di-syllabic Words Formation --- p.104
|
14 |
Linguistic constraints for large vocabulary speech recognition. January 1999
by Roger H.Y. Leung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 79-84). / Abstracts in English and Chinese. / ABSTRACT --- p.I / Keywords: --- p.I / ACKNOWLEDGEMENTS --- p.III / TABLE OF CONTENTS: --- p.IV / Table of Figures: --- p.VI / Table of Tables: --- p.VII / Chapter CHAPTER 1 --- INTRODUCTION --- p.1 / Chapter 1.1 --- Languages in the World --- p.2 / Chapter 1.2 --- Problems of Chinese Speech Recognition --- p.3 / Chapter 1.2.1 --- Unlimited word size: --- p.3 / Chapter 1.2.2 --- Too many Homophones: --- p.3 / Chapter 1.2.3 --- Difference between spoken and written Chinese: --- p.3 / Chapter 1.2.4 --- Word Segmentation Problem: --- p.4 / Chapter 1.3 --- Different types of knowledge --- p.5 / Chapter 1.4 --- Chapter Conclusion --- p.6 / Chapter CHAPTER 2 --- FOUNDATIONS --- p.7 / Chapter 2.1 --- Chinese Phonology and Language Properties --- p.7 / Chapter 2.1.1 --- Basic Syllable Structure --- p.7 / Chapter 2.2 --- Acoustic Models --- p.9 / Chapter 2.2.1 --- Acoustic Unit --- p.9 / Chapter 2.2.2 --- Hidden Markov Model (HMM) --- p.9 / Chapter 2.3 --- Search Algorithm --- p.11 / Chapter 2.4 --- Statistical Language Models --- p.12 / Chapter 2.4.1 --- Context-Independent Language Model --- p.12 / Chapter 2.4.2 --- Word-Pair Language Model --- p.13 / Chapter 2.4.3 --- N-gram Language Model --- p.13 / Chapter 2.4.4 --- Backoff n-gram --- p.14 / Chapter 2.5 --- Smoothing for Language Model --- p.16 / Chapter CHAPTER 3 --- LEXICAL ACCESS --- p.18 / Chapter 3.1 --- Introduction --- p.18 / Chapter 3.2 --- Motivation: Phonological and lexical constraints --- p.20 / Chapter 3.3 --- Broad Classes Representation --- p.22 / Chapter 3.4 --- Broad Classes Statistic Measures --- p.25 / Chapter 3.5 --- Broad Classes Frequency Normalization --- p.26 / Chapter 3.6 --- Broad Classes Analysis --- p.27 / Chapter 3.7 --- Isolated Word Speech Recognizer using Broad Classes --- p.33 / Chapter 3.8 --- Chapter Conclusion --- 
p.34 / Chapter CHAPTER 4 --- CHARACTER AND WORD LANGUAGE MODEL --- p.35 / Chapter 4.1 --- Introduction --- p.35 / Chapter 4.2 --- Motivation --- p.36 / Chapter 4.2.1 --- Perplexity --- p.36 / Chapter 4.3 --- Call Home Mandarin corpus --- p.38 / Chapter 4.3.1 --- Acoustic Data --- p.38 / Chapter 4.3.2 --- Transcription Texts --- p.39 / Chapter 4.4 --- Methodology: Building Language Model --- p.41 / Chapter 4.5 --- Character Level Language Model --- p.45 / Chapter 4.6 --- Word Level Language Model --- p.48 / Chapter 4.7 --- Comparison of Character level and Word level Language Model --- p.50 / Chapter 4.8 --- Interpolated Language Model --- p.54 / Chapter 4.8.1 --- Methodology --- p.54 / Chapter 4.8.2 --- Experiment Results --- p.55 / Chapter 4.9 --- Chapter Conclusion --- p.56 / Chapter CHAPTER 5 --- N-GRAM SMOOTHING --- p.57 / Chapter 5.1 --- Introduction --- p.57 / Chapter 5.2 --- Motivation --- p.58 / Chapter 5.3 --- Mathematical Representation --- p.59 / Chapter 5.4 --- Methodology: Smoothing techniques --- p.61 / Chapter 5.4.1 --- Add-one Smoothing --- p.62 / Chapter 5.4.2 --- Witten-Bell Discounting --- p.64 / Chapter 5.4.3 --- Good Turing Discounting --- p.66 / Chapter 5.4.4 --- Absolute and Linear Discounting --- p.68 / Chapter 5.5 --- Comparison of Different Discount Methods --- p.70 / Chapter 5.6 --- Continuous Word Speech Recognizer --- p.71 / Chapter 5.6.1 --- Experiment Setup --- p.71 / Chapter 5.6.2 --- Experiment Results: --- p.72 / Chapter 5.7 --- Chapter Conclusion --- p.74 / Chapter CHAPTER 6 --- SUMMARY AND CONCLUSIONS --- p.75 / Chapter 6.1 --- Summary --- p.75 / Chapter 6.2 --- Further Work --- p.77 / Chapter 6.3 --- Conclusion --- p.78 / REFERENCE --- p.79
|
15 |
Domain-optimized Chinese speech generation. January 2001
Fung Tien Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 119-128). / Abstracts in English and Chinese. / Abstract --- p.1 / Acknowledgement --- p.1 / List of Figures --- p.7 / List of Tables --- p.11 / Chapter 1 --- Introduction --- p.14 / Chapter 1.1 --- General Trends on Speech Generation --- p.15 / Chapter 1.2 --- Domain-Optimized Speech Generation in Chinese --- p.16 / Chapter 1.3 --- Thesis Organization --- p.17 / Chapter 2 --- Background --- p.19 / Chapter 2.1 --- Linguistic and Phonological Properties of Chinese --- p.19 / Chapter 2.1.1 --- Articulation --- p.20 / Chapter 2.1.2 --- Tones --- p.21 / Chapter 2.2 --- Previous Development in Speech Generation --- p.22 / Chapter 2.2.1 --- Articulatory Synthesis --- p.23 / Chapter 2.2.2 --- Formant Synthesis --- p.24 / Chapter 2.2.3 --- Concatenative Synthesis --- p.25 / Chapter 2.2.4 --- Existing Systems --- p.31 / Chapter 2.3 --- Our Speech Generation Approach --- p.35 / Chapter 3 --- Corpus-based Syllable Concatenation: A Feasibility Test --- p.37 / Chapter 3.1 --- Capturing Syllable Coarticulation with Distinctive Features --- p.39 / Chapter 3.2 --- Creating a Domain-Optimized Wavebank --- p.41 / Chapter 3.2.1 --- Generate-and-Filter --- p.44 / Chapter 3.2.2 --- Waveform Segmentation --- p.47 / Chapter 3.3 --- The Use of Multi-Syllable Units --- p.49 / Chapter 3.4 --- Unit Selection for Concatenative Speech Output --- p.50 / Chapter 3.5 --- A Listening Test --- p.51 / Chapter 3.6 --- Chapter Summary --- p.52 / Chapter 4 --- Scalability and Portability to the Stocks Domain --- p.55 / Chapter 4.1 --- Complexity of the ISIS Responses --- p.56 / Chapter 4.2 --- XML for input semantic and grammar representation --- p.60 / Chapter 4.3 --- Tree-Based Filtering Algorithm --- p.63 / Chapter 4.4 --- Energy Normalization --- p.67 / Chapter 4.5 --- Chapter Summary --- p.69 / Chapter 5 --- Investigation in Tonal Contexts --- p.71 / Chapter 5.1 --- The Nature 
of Tones --- p.74 / Chapter 5.1.1 --- Human Perception of Tones --- p.75 / Chapter 5.2 --- Relative Importance of Left and Right Tonal Context --- p.77 / Chapter 5.2.1 --- Tonal Contexts in the Date-Time Subgrammar --- p.77 / Chapter 5.2.2 --- Tonal Contexts in the Numeric Subgrammar --- p.82 / Chapter 5.2.3 --- Conclusion regarding the Relative Importance of Left versus Right Tonal Contexts --- p.86 / Chapter 5.3 --- Selection Scheme for Tonal Variants --- p.86 / Chapter 5.3.1 --- Listening Test for our Tone Backoff Scheme --- p.90 / Chapter 5.3.2 --- Error Analysis --- p.92 / Chapter 5.4 --- Chapter Summary --- p.94 / Chapter 6 --- Summary and Future Work --- p.95 / Chapter 6.1 --- Contributions --- p.97 / Chapter 6.2 --- Future Directions --- p.98 / Chapter A --- Listening Test Questionnaire for FOREX Response Generation --- p.100 / Chapter B --- Major Response Types For ISIS --- p.102 / Chapter C --- Recording Corpus for Tone Investigation in Date-time Subgrammar --- p.105 / Chapter D --- Statistical Test for Left Tonal Context --- p.109 / Chapter E --- Statistical Test for Right Tonal Context --- p.112 / Chapter F --- Listening Test Questionnaire for Backoff Unit Selection Scheme --- p.115 / Chapter G --- Statistical Test for the Backoff Unit Selection Scheme --- p.117 / Chapter H --- Statistical Test for the Backoff Unit Selection Scheme --- p.118 / Bibliography --- p.119
|
16 |
An investigation on Chinese noun phrase extraction. January 2000
Chan Kun-Chung Timothy. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 79-83). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Outline of Thesis --- p.3 / Chapter 2 --- Background --- p.5 / Chapter 2.1 --- Chinese Noun Phrase Structure --- p.5 / Chapter 2.2 --- Literature Review --- p.6 / Chapter 2.3 --- Observations --- p.10 / Chapter 2.4 --- Chapter Summary --- p.11 / Chapter 3 --- Maximal Chinese Noun Phrase Extraction System --- p.13 / Chapter 3.1 --- Background --- p.13 / Chapter 3.1.1 --- Part-of-speech Tagset --- p.13 / Chapter 3.1.2 --- The Tagging System --- p.14 / Chapter 3.1.3 --- Chinese Corpus --- p.16 / Chapter 3.1.4 --- Grammar Rules and Boundary Information --- p.17 / Chapter 3.1.5 --- Feature Selection --- p.19 / Chapter 3.2 --- Overview of Our Chinese Noun Phrase Extraction System --- p.19 / Chapter 3.2.1 --- Training --- p.19 / Chapter 3.2.2 --- Testing --- p.21 / Chapter 3.3 --- Chapter Summary --- p.21 / Chapter 4 --- Preliminary Noun Phrase Extraction --- p.23 / Chapter 4.1 --- Framework --- p.23 / Chapter 4.2 --- Boundary Information Acquisition --- p.24 / Chapter 4.3 --- Candidate Boundary Insertion --- p.26 / Chapter 4.4 --- Pairing of Candidate Boundaries --- p.27 / Chapter 4.4.1 --- Conditional Probability-based Model --- p.28 / Chapter 4.4.2 --- Heuristic-based Model --- p.29 / Chapter 4.4.3 --- Dynamic Programming-based Model --- p.30 / Chapter 4.4.4 --- Model Selection --- p.31 / Chapter 4.4.5 --- Revised Dynamic Programming Model --- p.32 / Chapter 4.4.6 --- Analysis of the Impact of the Revised DP Model --- p.35 / Chapter 4.4.7 --- Experiments of Dynamic Programming-based Model --- p.38 / Chapter 4.4.8 --- Result Analysis --- p.42 / Chapter 4.5 --- Concluding Remarks on DP-Based Model --- p.47 / Chapter 4.6 --- Chapter Summary --- p.49 / Chapter 5 --- Automatic Error Correction --- p.50 / Chapter 
5.1 --- Introduction --- p.50 / Chapter 5.1.1 --- Statistical Properties of TEL --- p.54 / Chapter 5.1.2 --- Related Applications --- p.55 / Chapter 5.2 --- Settings of Main Components --- p.57 / Chapter 5.2.1 --- Initial State --- p.58 / Chapter 5.2.2 --- Transformation Actions --- p.58 / Chapter 5.2.3 --- Triggering Features of Transformation Templates --- p.58 / Chapter 5.2.4 --- Evaluation of Rule --- p.62 / Chapter 5.2.5 --- Stopping Threshold --- p.62 / Chapter 5.3 --- Experiments and Results --- p.63 / Chapter 5.3.1 --- Setup and Procedure --- p.63 / Chapter 5.3.2 --- Overall Performance --- p.63 / Chapter 5.3.3 --- Contribution of Rules --- p.67 / Chapter 5.3.4 --- Remarks on Rules Learning --- p.69 / Chapter 5.3.5 --- Discussion on Recall Performance --- p.70 / Chapter 5.4 --- Chapter Summary --- p.73 / Chapter 6 --- Conclusion --- p.74 / Chapter 6.1 --- Summary --- p.74 / Chapter 6.2 --- Contributions --- p.76 / Chapter 6.3 --- Future Work --- p.76 / Bibliography --- p.79 / Chapter A --- Chinese POS Tag Set --- p.84 / Chapter B --- Algorithms of Boundary Pairing Models --- p.88 / Chapter B.1 --- Heuristic based Model --- p.88 / Chapter B.2 --- Dynamic Programming based Model --- p.89 / Chapter C --- Triggering Environments of Transformation Templates --- p.91
|
17 |
A generic Chinese PAT tree data structure for Chinese documents clustering. January 2002
Kwok Chi Leong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 122-127). / Abstracts in English and Chinese. / Abstract --- p.ii / Acknowledgment --- p.vi / Table of Contents --- p.vii / List of Tables --- p.x / List of Figures --- p.xi / Chapter Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Contributions --- p.2 / Chapter 1.2 --- Thesis Overview --- p.3 / Chapter Chapter 2 --- Background Information --- p.5 / Chapter 2.1 --- Documents Clustering --- p.5 / Chapter 2.1.1 --- Review of Clustering Techniques --- p.5 / Chapter 2.1.2 --- Suffix Tree Clustering --- p.7 / Chapter 2.2 --- Chinese Information Processing --- p.8 / Chapter 2.2.1 --- Sentence Segmentation --- p.8 / Chapter 2.2.2 --- Keyword Extraction --- p.10 / Chapter Chapter 3 --- The Generic Chinese PAT Tree --- p.12 / Chapter 3.1 --- PAT Tree --- p.13 / Chapter 3.1.1 --- Patricia Tree --- p.13 / Chapter 3.1.2 --- Semi-Infinite String --- p.14 / Chapter 3.1.3 --- Structure of Tree Nodes --- p.17 / Chapter 3.1.4 --- Some Examples of PAT Tree --- p.22 / Chapter 3.1.5 --- Storage Complexity --- p.24 / Chapter 3.2 --- The Chinese PAT Tree --- p.26 / Chapter 3.2.1 --- The Chinese PAT Tree Structure --- p.26 / Chapter 3.2.2 --- Some Examples of Chinese PAT Tree --- p.30 / Chapter 3.2.3 --- Storage Complexity --- p.33 / Chapter 3.3 --- The Generic Chinese PAT Tree --- p.34 / Chapter 3.3.1 --- Structure Overview --- p.34 / Chapter 3.3.2 --- Structure of Tree Nodes --- p.35 / Chapter 3.3.3 --- Essential Node --- p.37 / Chapter 3.3.4 --- Some Examples of the Generic Chinese PAT Tree --- p.41 / Chapter 3.3.5 --- Storage Complexity --- p.45 / Chapter 3.4 --- Problems of Embedded Nodes --- p.46 / Chapter 3.4.1 --- The Reduced Structure --- p.47 / Chapter 3.4.2 --- Disadvantages of Reduced Structure --- p.48 / Chapter 3.4.3 --- A Case Study of Reduced Design --- p.50 / Chapter 3.4.4 --- Experiments on Frequency Mismatch --- p.51 / Chapter 3.5 --- Strengths 
of the Generic Chinese PAT Tree --- p.55 / Chapter Chapter 4 --- Performance Analysis on the Generic Chinese PAT Tree --- p.58 / Chapter 4.1 --- The Construction of the Generic Chinese PAT Tree --- p.59 / Chapter 4.2 --- Counting the Essential Nodes --- p.61 / Chapter 4.3 --- Performance of Various PAT Trees --- p.62 / Chapter 4.4 --- The Implementation Analysis --- p.64 / Chapter 4.4.1 --- Pure Dynamic Memory Allocation --- p.64 / Chapter 4.4.2 --- Node Production Factory Approach --- p.66 / Chapter 4.4.3 --- Experiment Result of the Factory Approach --- p.68 / Chapter Chapter 5 --- The Chinese Documents Clustering --- p.70 / Chapter 5.1 --- The Clustering Framework --- p.70 / Chapter 5.1.1 --- Documents Cleaning --- p.73 / Chapter 5.1.2 --- PAT Tree Construction --- p.76 / Chapter 5.1.3 --- Essential Node Extraction --- p.77 / Chapter 5.1.4 --- Base Clusters Detection --- p.80 / Chapter 5.1.5 --- Base Clusters Filtering --- p.86 / Chapter 5.1.6 --- Base Clusters Combining --- p.94 / Chapter 5.1.7 --- Documents Assigning --- p.95 / Chapter 5.1.8 --- Result Presentation --- p.96 / Chapter 5.2 --- Discussion --- p.96 / Chapter 5.2.1 --- Flexibility of Our Framework --- p.96 / Chapter 5.2.2 --- Our Clustering Model --- p.97 / Chapter 5.2.3 --- More About Clusters Detection --- p.98 / Chapter 5.2.4 --- Analysis and Complexity --- p.100 / Chapter Chapter 6 --- Evaluations on the Chinese Documents Clustering --- p.101 / Chapter 6.1 --- Details of Experiment --- p.101 / Chapter 6.1.1 --- Parameter of Weighted Frequency --- p.105 / Chapter 6.1.2 --- Effect of CLP Analysis --- p.105 / Chapter 6.1.3 --- Result of Clustering --- p.108 / Chapter 6.2 --- Clustering on Larger Collection --- p.109 / Chapter 6.2.1 --- Comparing the Base Clusters --- p.109 / Chapter 6.2.2 --- Result of Clustering --- p.111 / Chapter 6.2.3 --- Discussion --- p.112 / Chapter 6.3 --- Clustering with Part of Documents --- p.113 / Chapter 6.3.1 --- Clustering with News Headlines --- p.114 / Chapter 
6.3.2 --- Clustering with News Abstract --- p.117 / Chapter Chapter 7 --- Conclusion --- p.119 / Bibliography --- p.122
|
18 |
Automatic noun phrase extraction from full Chinese text. / CUHK electronic theses & dissertations collection. January 1997
by Li Wenjie. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (p. 209-226). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web.
|
19 |
A natural language based indexing technique for Chinese information retrieval. January 1997
Pang Chun Kiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 101-107). / Chapter 1 --- Introduction --- p.2 / Chapter 1.1 --- Chinese Indexing using Noun Phrases --- p.6 / Chapter 1.2 --- Objectives --- p.8 / Chapter 1.3 --- An Overview of the Thesis --- p.8 / Chapter 2 --- Background --- p.10 / Chapter 2.1 --- Technology Influences on Information Retrieval --- p.10 / Chapter 2.2 --- Related Work --- p.13 / Chapter 2.2.1 --- Statistical/Keyword Approaches --- p.13 / Chapter 2.2.2 --- Syntactical approaches --- p.15 / Chapter 2.2.3 --- Semantic approaches --- p.17 / Chapter 2.2.4 --- Noun Phrases Approach --- p.18 / Chapter 2.2.5 --- Chinese Information Retrieval --- p.20 / Chapter 2.3 --- Our Approach --- p.21 / Chapter 3 --- Chinese Noun Phrases --- p.23 / Chapter 3.1 --- Different types of Chinese Noun Phrases --- p.23 / Chapter 3.2 --- Ambiguous noun phrases --- p.27 / Chapter 3.2.1 --- Ambiguous English Noun Phrases --- p.27 / Chapter 3.2.2 --- Ambiguous Chinese Noun Phrases --- p.28 / Chapter 3.2.3 --- Statistical data on the three NPs --- p.33 / Chapter 4 --- Index Extraction from De-de Conj. 
NP --- p.35 / Chapter 4.1 --- Word Segmentation --- p.36 / Chapter 4.2 --- Part-of-speech tagging --- p.37 / Chapter 4.3 --- Noun Phrase Extraction --- p.37 / Chapter 4.4 --- The Chinese noun phrase partial parser --- p.38 / Chapter 4.5 --- Handling Parsing Ambiguity --- p.40 / Chapter 4.6 --- Index Building Strategy --- p.41 / Chapter 4.7 --- The cross-set generation rules --- p.44 / Chapter 4.8 --- Example 1: Indexing De-de NP --- p.46 / Chapter 4.9 --- Example 2: Indexing Conjunctive NP --- p.48 / Chapter 4.10 --- Experimental results and Discussion --- p.49 / Chapter 5 --- Indexing Compound Nouns --- p.52 / Chapter 5.1 --- Previous Researches on Compound Nouns --- p.53 / Chapter 5.2 --- Indexing two-term Compound Nouns --- p.55 / Chapter 5.2.1 --- About the thesaurus《同義詞詞林》 --- p.56 / Chapter 5.3 --- Indexing Compound Nouns of three or more terms --- p.58 / Chapter 5.4 --- Corpus learning approach --- p.59 / Chapter 5.4.1 --- An Example --- p.60 / Chapter 5.4.2 --- Experimental Setup --- p.63 / Chapter 5.4.3 --- An Experiment using the third level of the Cilin --- p.65 / Chapter 5.4.4 --- An Experiment using the second level of the Cilin --- p.66 / Chapter 5.5 --- Contextual Approach --- p.68 / Chapter 5.5.1 --- The algorithm --- p.69 / Chapter 5.5.2 --- An Illustrative Example --- p.71 / Chapter 5.5.3 --- Experiments on compound nouns --- p.72 / Chapter 5.5.4 --- Experiment I: Word Distance Based Extraction --- p.73 / Chapter 5.5.5 --- Experiment II: Semantic Class Based Extraction --- p.75 / Chapter 5.5.6 --- Experiments III: On different boundaries --- p.76 / Chapter 5.5.7 --- The Final Algorithm --- p.79 / Chapter 5.5.8 --- Experiments on other compounds --- p.82 / Chapter 5.5.9 --- Discussion --- p.83 / Chapter 6 --- Overall Effectiveness --- p.85 / Chapter 6.1 --- Illustrative Example for the Integrated Algorithm --- p.86 / Chapter 6.2 --- Experimental Setup --- p.90 / Chapter 6.3 --- Experimental Results & Discussion --- p.91 / Chapter 7 --- Conclusion 
--- p.95 / Chapter 7.1 --- Summary --- p.95 / Chapter 7.2 --- Contributions --- p.97 / Chapter 7.3 --- Future Directions --- p.98 / Chapter 7.3.1 --- Word-sense determination --- p.98 / Chapter 7.3.2 --- Hybrid approach for compound noun indexing --- p.99 / Chapter A --- Cross-set Generation Rules --- p.108 / Chapter B --- Tag set by Tsinghua University --- p.110 / Chapter C --- Noun Phrases Test Set --- p.113 / Chapter D --- Compound Nouns Test Set --- p.124 / Chapter D.1 --- Three-term Compound Nouns --- p.125 / Chapter D.1.1 --- NVN --- p.125 / Chapter D.1.2 --- Other three-term compound nouns --- p.129 / Chapter D.2 --- Four-term Compound Nouns --- p.133 / Chapter D.3 --- Five-term and six-term Compound Nouns --- p.134
|
20 |
Chinese information access through internet on X-open system. January 1997
by Yao Jian. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 109-112). / Abstract --- p.i / Acknowledgments --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Basic Concepts And Related Work --- p.6 / Chapter 2.1 --- Codeset and Codeset Conversion --- p.7 / Chapter 2.2 --- HTML Language --- p.10 / Chapter 2.3 --- HTTP Protocol --- p.13 / Chapter 2.4 --- I18N And L10N --- p.18 / Chapter 2.5 --- Proxy Server --- p.19 / Chapter 2.6 --- Related Work --- p.20 / Chapter 3 --- Design Principles And System Architecture --- p.23 / Chapter 3.1 --- Use of Existing Web System --- p.23 / Chapter 3.1.1 --- Protocol --- p.23 / Chapter 3.1.2 --- Avoid Duplication of Documents for Different Codesets --- p.25 / Chapter 3.1.3 --- Support On-line Codeset Conversion Facility --- p.27 / Chapter 3.1.4 --- Provide Internationalized Interface of Web Browser --- p.28 / Chapter 3.2 --- Our Approach --- p.29 / Chapter 3.2.1 --- Enhancing the Existing Browsers and Servers --- p.30 / Chapter 3.2.2 --- Incorporating Proxies in Our Scheme --- p.32 / Chapter 3.2.3 --- Automatic Codeset Conversion --- p.34 / Chapter 3.3 --- Overall System Architecture --- p.38 / Chapter 3.3.1 --- Architecture of Our Web System --- p.38 / Chapter 3.3.2 --- Flexibility of Our Design --- p.40 / Chapter 3.3.3 --- Which side do the codeset conversion?
--- p.42 / Chapter 3.3.4 --- Caching --- p.42 / Chapter 4 --- Design Details of An Enhanced Server --- p.44 / Chapter 4.1 --- Architecture of The Enhanced Server --- p.44 / Chapter 4.2 --- Procedure on Processing Client's Request --- p.45 / Chapter 4.3 --- Modifications of The Enhanced Server --- p.48 / Chapter 4.3.1 --- Interpretation of Client's Codeset Announcement --- p.48 / Chapter 4.3.2 --- Codeset Identification of Web Documents on the Server --- p.49 / Chapter 4.3.3 --- Codeset Notification to the Web Client --- p.52 / Chapter 4.3.4 --- Codeset Conversion --- p.54 / Chapter 4.4 --- Experiment Results --- p.54 / Chapter 5 --- Design Details of An Enhanced Browser --- p.58 / Chapter 5.1 --- Architecture of The Enhanced Browser --- p.58 / Chapter 5.2 --- Procedure on Processing Users' Requests --- p.61 / Chapter 5.3 --- Event Management and Handling --- p.63 / Chapter 5.3.1 --- Basic Control Flow of the Browser --- p.63 / Chapter 5.3.2 --- Event Handlers --- p.64 / Chapter 5.4 --- Internationalization of Browser Interface --- p.75 / Chapter 5.4.1 --- Locale --- p.76 / Chapter 5.4.2 --- Resource File --- p.77 / Chapter 5.4.3 --- Message Catalog System --- p.79 / Chapter 5.5 --- Experiment Result --- p.85 / Chapter 6 --- Another Scheme - CGI --- p.89 / Chapter 6.1 --- Form and CGI --- p.90 / Chapter 6.2 --- CGI Control Flow --- p.96 / Chapter 6.3 --- Automatic Codeset Detection --- p.96 / Chapter 6.3.1 --- Analysis of code range for GB and Big5 --- p.98 / Chapter 6.3.2 --- Control Flow of Automatic Codeset Detection --- p.99 / Chapter 6.4 --- Experiment Results --- p.101 / Chapter 7 --- Conclusions and Future Work --- p.104 / Chapter 7.1 --- Current Status --- p.105 / Chapter 7.2 --- System Efficiency --- p.106 / Chapter 7.3 --- Future Work --- p.107 / Bibliography --- p.109 / Chapter A --- Programmer's Guide --- p.113 / Chapter A.1 --- Data Structure --- p.113 / Chapter A.2 --- Calling Sequence of Functions --- p.114 / Chapter A.3 --- Modification of Source Code
--- p.116 / Chapter A.4 --- Modification of Resources --- p.133 / Chapter B --- User Manual --- p.135
|