Chinese readability analysis and its applications on the internet.

Lau Tak Pang.
Thesis submitted in: October 2006.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 110-122).
Abstracts in English and Chinese.

Contents:
Abstract --- p.i
Acknowledgement --- p.v
Chapter 1 --- Introduction --- p.1
  Chapter 1.1 --- Motivation and Major Contributions --- p.1
    Chapter 1.1.1 --- Chinese Readability Analysis --- p.1
    Chapter 1.1.2 --- Web Readability Analysis --- p.3
  Chapter 1.2 --- Thesis Chapter Organization --- p.6
Chapter 2 --- Related Work --- p.7
  Chapter 2.1 --- Readability Assessment --- p.7
    Chapter 2.1.1 --- Assessment for Text Document --- p.8
    Chapter 2.1.2 --- Assessment for Web Page --- p.13
  Chapter 2.2 --- Support Vector Machine --- p.14
    Chapter 2.2.1 --- Characteristics and Advantages --- p.14
    Chapter 2.2.2 --- Applications --- p.16
  Chapter 2.3 --- Chinese Word Segmentation --- p.16
    Chapter 2.3.1 --- Difficulty in Chinese Word Segmentation --- p.16
    Chapter 2.3.2 --- Approaches for Chinese Word Segmentation --- p.17
Chapter 3 --- Chinese Readability Analysis --- p.20
  Chapter 3.1 --- Chinese Readability Factor Analysis --- p.20
    Chapter 3.1.1 --- Systematic Analysis --- p.20
    Chapter 3.1.2 --- Feature Extraction --- p.30
    Chapter 3.1.3 --- Limitation of Our Analysis and Possible Extension --- p.32
  Chapter 3.2 --- Research Methodology --- p.33
    Chapter 3.2.1 --- Definition of Readability --- p.33
    Chapter 3.2.2 --- Data Acquisition and Sampling --- p.34
    Chapter 3.2.3 --- Text Processing and Feature Extraction --- p.35
    Chapter 3.2.4 --- Regression Analysis using Support Vector Regression --- p.36
    Chapter 3.2.5 --- Evaluation --- p.36
  Chapter 3.3 --- Introduction to Support Vector Regression --- p.38
    Chapter 3.3.1 --- Basic Concept --- p.38
    Chapter 3.3.2 --- Non-Linear Extension using Kernel Technique --- p.41
  Chapter 3.4 --- Implementation Details --- p.42
    Chapter 3.4.1 --- Chinese Word Segmentation --- p.42
    Chapter 3.4.2 --- Building Basic Chinese Character / Word Lists --- p.47
    Chapter 3.4.3 --- Full Sentence Detection --- p.49
    Chapter 3.4.4 --- Feature Selection Using Genetic Algorithm --- p.50
  Chapter 3.5 --- Experiments --- p.55
    Chapter 3.5.1 --- Experiment 1: Evaluation on Chinese Word Segmentation using the LMR-RC Tagging Scheme --- p.56
    Chapter 3.5.2 --- Experiment 2: Initial SVR Parameters Searching with Different Kernel Functions --- p.61
    Chapter 3.5.3 --- Experiment 3: Feature Selection Using Genetic Algorithm --- p.63
    Chapter 3.5.4 --- Experiment 4: Training and Cross-validation Performance using the Selected Feature Subset --- p.67
    Chapter 3.5.5 --- Experiment 5: Comparison with Linear Regression --- p.74
  Chapter 3.6 --- Summary and Future Work --- p.76
Chapter 4 --- Web Readability Analysis --- p.78
  Chapter 4.1 --- Web Page Readability --- p.79
    Chapter 4.1.1 --- Readability as Comprehension Difficulty --- p.79
    Chapter 4.1.2 --- Readability as Grade Level --- p.81
  Chapter 4.2 --- Web Site Readability --- p.83
  Chapter 4.3 --- Experiments --- p.85
    Chapter 4.3.1 --- Experiment 1: Web Page Readability Analysis - Comprehension Difficulty --- p.87
    Chapter 4.3.2 --- Experiment 2: Web Page Readability Analysis - Grade Level --- p.92
    Chapter 4.3.3 --- Experiment 3: Web Site Readability Analysis --- p.98
  Chapter 4.4 --- Summary and Future Work --- p.101
Chapter 5 --- Conclusion --- p.104
Chapter A --- List of Symbols and Notations --- p.107
Chapter B --- List of Publications --- p.110
Bibliography --- p.113

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_325858
Date: January 2007
Contributors: Lau, Tak Pang., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, xiii, 122 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)