A corpus-based induction learning approach to natural language processing.

by Leung Chi Hong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 1996.
Includes bibliographical references (leaves 163-171).

Chapter 1. --- Introduction --- p.1
Chapter 2. --- Background Study of Natural Language Processing --- p.9
2.1. --- Knowledge-based approach --- p.9
2.1.1. --- Morphological analysis --- p.10
2.1.2. --- Syntactic parsing --- p.11
2.1.3. --- Semantic parsing --- p.16
2.1.3.1. --- Semantic grammar --- p.19
2.1.3.2. --- Case grammar --- p.20
2.1.4. --- Problems of knowledge acquisition in knowledge-based approach --- p.22
2.2. --- Corpus-based approach --- p.23
2.2.1. --- Beginning of corpus-based approach --- p.23
2.2.2. --- An example of corpus-based application: word tagging --- p.25
2.2.3. --- Annotated corpus --- p.26
2.2.4. --- State of the art in the corpus-based approach --- p.26
2.3. --- Knowledge-based approach versus corpus-based approach --- p.28
2.4. --- Co-operation between two different approaches --- p.32
Chapter 3. --- Induction Learning applied to Corpus-based Approach --- p.35
3.1. --- General model of traditional corpus-based approach --- p.36
3.1.1. --- Division of a problem into a number of sub-problems --- p.36
3.1.2. --- Solution selected from a set of predefined choices --- p.36
3.1.3. --- Solution selection based on a particular kind of linguistic entity --- p.37
3.1.4. --- Statistical correlations between solutions and linguistic entities --- p.37
3.1.5. --- Prediction of the best solution based on statistical correlations --- p.38
3.2. --- First problem in the corpus-based approach: Irrelevance in the corpus --- p.39
3.3. --- Induction learning --- p.41
3.3.1. --- General issues about induction learning --- p.41
3.3.2. --- Reasons of using induction learning in the corpus-based approach --- p.43
3.3.3. --- General model of corpus-based induction learning approach --- p.45
3.3.3.1. --- Preparation of positive corpus and negative corpus --- p.45
3.3.3.2. --- Statistical correlations between solutions and linguistic entities --- p.46
3.3.3.3. --- Combination of the statistical correlations obtained from the positive and negative corpora --- p.48
3.4. --- Second problem in the corpus-based approach: Modification of initial probabilistic approximations --- p.50
3.5. --- Learning feedback modification --- p.52
3.5.1. --- Determination of which correlation scores to be modified --- p.52
3.5.2. --- Determination of the magnitude of modification --- p.53
3.5.3. --- A general algorithm of learning feedback modification --- p.56
Chapter 4. --- Identification of Phrases and Templates in Domain-specific Chinese Texts --- p.59
4.1. --- Analysis of the problem solved by the traditional corpus-based approach --- p.61
4.2. --- Phrase identification based on positive and negative corpora --- p.63
4.3. --- Phrase identification procedure --- p.64
4.3.1. --- Step 1: Phrase seed identification --- p.65
4.3.2. --- Step 2: Phrase construction from phrase seeds --- p.65
4.4. --- Template identification procedure --- p.67
4.5. --- Experiment and result --- p.70
4.5.1. --- Testing data --- p.70
4.5.2. --- Details of experiments --- p.71
4.5.3. --- Experimental results --- p.72
4.5.3.1. --- Phrases and templates identified in financial news articles --- p.72
4.5.3.2. --- Phrases and templates identified in political news articles --- p.73
4.6. --- Conclusion --- p.74
Chapter 5. --- A Corpus-based Induction Learning Approach to Improving the Accuracy of Chinese Word Segmentation --- p.76
5.1. --- Background of Chinese word segmentation --- p.77
5.2. --- Typical methods of Chinese word segmentation --- p.78
5.2.1. --- Syntactic and semantic approach --- p.78
5.2.2. --- Statistical approach --- p.79
5.2.3. --- Heuristic approach --- p.81
5.3. --- Problems in word segmentation --- p.82
5.3.1. --- Chinese word definition --- p.82
5.3.2. --- Word dictionary --- p.83
5.3.3. --- Word segmentation ambiguity --- p.84
5.4. --- Corpus-based induction learning approach to improving word segmentation accuracy --- p.86
5.4.1. --- Rationale of approach --- p.87
5.4.2. --- Method of constructing modification rules --- p.89
5.5. --- Experiment and results --- p.94
5.6. --- Characteristics of modification rules constructed in experiment --- p.96
5.7. --- Experiment constructing rules for compound words with suffixes --- p.98
5.8. --- Relationship between modification frequency and Zipf's first law --- p.99
5.9. --- Problems in the approach --- p.100
5.10. --- Conclusion --- p.101
Chapter 6. --- Corpus-based Induction Learning Approach to Automatic Indexing of Controlled Index Terms --- p.103
6.1. --- Background of automatic indexing --- p.103
6.1.1. --- Definition of index term and indexing --- p.103
6.1.2. --- Manual indexing versus automatic indexing --- p.105
6.1.3. --- Different approaches to automatic indexing --- p.107
6.2. --- Corpus-based induction learning approach to automatic indexing --- p.109
6.2.1. --- Fundamental concept about corpus-based automatic indexing --- p.110
6.2.2. --- Procedure of automatic indexing --- p.111
6.2.2.1. --- Learning process --- p.112
6.2.2.2. --- Indexing process --- p.118
6.3. --- Experiments of corpus-based induction learning approach to automatic indexing --- p.118
6.3.1. --- An experiment evaluating the complete procedures --- p.119
6.3.1.1. --- Testing data used in the experiment --- p.119
6.3.1.2. --- Details of the experiment --- p.119
6.3.1.3. --- Experimental result --- p.121
6.3.2. --- An experiment comparing with the traditional approach --- p.122
6.3.3. --- An experiment determining the optimal indexing score threshold --- p.124
6.3.4. --- An experiment measuring the precision and recall of indexing performance --- p.127
6.4. --- Learning feedback modification --- p.128
6.4.1. --- Positive feedback --- p.129
6.4.2. --- Negative feedback --- p.131
6.4.3. --- Change of indexed proportions of positive/negative training corpus in feedback iterations --- p.132
6.4.4. --- An experiment evaluating the learning feedback modification --- p.134
6.4.5. --- An experiment testing the significance factor in merging process --- p.136
6.5. --- Conclusion --- p.138
Chapter 7. --- Conclusion --- p.140
Appendix A: Some examples of identified phrases in financial news articles --- p.149
Appendix B: Some examples of identified templates in financial news articles --- p.150
Appendix C: Some examples of texts containing the templates in financial news articles --- p.151
Appendix D: Some examples of identified phrases in political news articles --- p.152
Appendix E: Some examples of identified templates in political news articles --- p.153
Appendix F: Some examples of texts containing the templates in political news articles --- p.154
Appendix G: Syntactic tags used in word segmentation modification rule experiment --- p.155
Appendix H: An example of semantic approach to automatic indexing --- p.156
Appendix I: An example of syntactic approach to automatic indexing --- p.158
Appendix J: Samples of INSPEC and MEDLINE Records --- p.161
Appendix K: Examples of Promoting and Demoting Words --- p.162
References --- p.163

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_321659
Date: January 1996
Contributors: Leung, Chi Hong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Publisher: Chinese University of Hong Kong
Source Sets: The Chinese University of Hong Kong
Language: English
Detected Language: English
Type: Text, bibliography
Format: print, vii, 171 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)