Global ETD Search

41	An automated Chinese text processing system (ACCESS): user-friendly interface and feature enhancement. January 1994 (has links) Suen Tow Sunny. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 65-67). / Introduction --- p.1 / Chapter 1. --- ACCESS with an Extendible User-friendly X/Chinese Interface --- p.4 / Chapter 1.1. --- System requirement --- p.4 / Chapter 1.1.1. --- User interface issue --- p.4 / Chapter 1.1.2. --- Development issue --- p.5 / Chapter 1.2. --- Development decision --- p.6 / Chapter 1.2.1. --- X window system --- p.6 / Chapter 1.2.2. --- X/Chinese toolkit --- p.7 / Chapter 1.2.3. --- C language --- p.8 / Chapter 1.2.4. --- Source code control system --- p.8 / Chapter 1.3. --- System architecture --- p.9 / Chapter 1.4. --- User interface --- p.10 / Chapter 1.5. --- Sample screen --- p.13 / Chapter 1.6. --- System extension --- p.14 / Chapter 1.7. --- System portability --- p.18 / Chapter 2. --- Study on Algorithms for Automatically Correcting Characters in Chinese Cangjie-typed Text --- p.19 / Chapter 2.1. --- Chinese character input --- p.19 / Chapter 2.1.1. --- Chinese keyboards --- p.20 / Chapter 2.1.2. --- Keyboard redefinition scheme --- p.21 / Chapter 2.2. --- Cangjie input method --- p.24 / Chapter 2.3. --- Review on existing techniques for automatically correcting words in English text --- p.26 / Chapter 2.3.1. --- Nonword error detection --- p.27 / Chapter 2.3.2. --- Isolated-word error correction --- p.28 / Chapter 2.3.2.1. --- Spelling error patterns --- p.29 / Chapter 2.3.2.2. --- Correction techniques --- p.31 / Chapter 2.3.3. --- Context-dependent word correction research --- p.32 / Chapter 2.3.3.1. --- Natural language processing approach --- p.33 / Chapter 2.3.3.2. --- Statistical language model --- p.35 / Chapter 2.4. --- Research on error rates and patterns in Cangjie input method --- p.37 / Chapter 2.5. 
--- Similarities and differences between Chinese and English typed text --- p.41 / Chapter 2.5.1. --- Similarities --- p.41 / Chapter 2.5.2. --- Differences --- p.42 / Chapter 2.6. --- Proposed algorithm for automatic Chinese text correction --- p.44 / Chapter 2.6.1. --- Sentence level --- p.44 / Chapter 2.6.2. --- Part-of-speech level --- p.45 / Chapter 2.6.3. --- Character level --- p.47 / Conclusion --- p.50 / Appendix A Cangjie Radix Table --- p.51 / Appendix B Sample Text --- p.52 / Article 1 --- p.52 / Article 2 --- p.53 / Article 3 --- p.56 / Article 4 --- p.58 / Appendix C Error Statistics --- p.61 / References --- p.65 Chinese language--Data processing Text processing (Computer science) User interfaces (Computer systems) Input design, Computer
42	M&A2: a complete associative word network based Chinese document search engine. January 2001 (has links) Hu Ke. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 56-58). / Abstracts in English and Chinese. Web search engines Chinese language--Data processing Information retrieval Text processing (Computer science) World Wide Web
43	Automatic construction of wrappers for semi-structured documents. January 2001 (has links) Lin Wai-yip. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 114-123). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Information Extraction --- p.1 / Chapter 1.2 --- IE from Semi-structured Documents --- p.3 / Chapter 1.3 --- Thesis Contributions --- p.7 / Chapter 1.4 --- Thesis Organization --- p.9 / Chapter 2 --- Related Work --- p.11 / Chapter 2.1 --- Existing Approaches --- p.11 / Chapter 2.2 --- Limitations of Existing Approaches --- p.18 / Chapter 2.3 --- Our HISER Approach --- p.20 / Chapter 3 --- System Overview --- p.23 / Chapter 3.1 --- Hierarchical record Structure and Extraction Rule learning (HISER) --- p.23 / Chapter 3.2 --- Hierarchical Record Structure --- p.29 / Chapter 3.3 --- Extraction Rule --- p.29 / Chapter 3.4 --- Wrapper Adaptation --- p.32 / Chapter 4 --- Automatic Hierarchical Record Structure Construction --- p.34 / Chapter 4.1 --- Motivation --- p.34 / Chapter 4.2 --- Hierarchical Record Structure Representation --- p.36 / Chapter 4.3 --- Constructing Hierarchical Record Structure --- p.38 / Chapter 5 --- Extraction Rule Induction --- p.43 / Chapter 5.1 --- Rule Representation --- p.43 / Chapter 5.2 --- Extraction Rule Induction Algorithm --- p.47 / Chapter 6 --- Experimental Results of Wrapper Learning --- p.54 / Chapter 6.1 --- Experimental Methodology --- p.54 / Chapter 6.2 --- Results on Electronic Appliance Catalogs --- p.56 / Chapter 6.3 --- Results on Book Catalogs --- p.60 / Chapter 6.4 --- Results on Seminar Announcements --- p.62 / Chapter 7 --- Adapting Wrappers to Unseen Information Sources --- p.69 / Chapter 7.1 --- Motivation --- p.69 / Chapter 7.2 --- Support Vector Machines --- p.72 / Chapter 7.3 --- Feature Selection --- p.76 / Chapter 7.4 --- Automatic Annotation of Training Examples --- p.80 / Chapter 7.4.1 --- Building SVM 
Models --- p.81 / Chapter 7.4.2 --- Seeking Potential Training Example Candidates --- p.82 / Chapter 7.4.3 --- Classifying Potential Training Examples --- p.84 / Chapter 8 --- Experimental Results of Wrapper Adaptation --- p.86 / Chapter 8.1 --- Experimental Methodology --- p.86 / Chapter 8.2 --- Results on Electronic Appliance Catalogs --- p.89 / Chapter 8.3 --- Results on Book Catalogs --- p.93 / Chapter 9 --- Conclusions and Future Work --- p.97 / Chapter 9.1 --- Conclusions --- p.97 / Chapter 9.2 --- Future Work --- p.100 / Chapter A --- Sample Experimental Pages --- p.101 / Chapter B --- Detailed Experimental Results of Wrapper Adaptation of HISER --- p.109 / Bibliography --- p.114 Text processing (Computer science)
44	Extracting causation knowledge from natural language texts. January 2002 (has links) Chan Ki, Cecia. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 95-99). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Our Contributions --- p.4 / Chapter 1.2 --- Thesis Organization --- p.5 / Chapter 2 --- Related Work --- p.6 / Chapter 2.1 --- Using Knowledge-based Inferences --- p.7 / Chapter 2.2 --- Using Linguistic Techniques --- p.8 / Chapter 2.2.1 --- Using Linguistic Clues --- p.8 / Chapter 2.2.2 --- Using Graphical Patterns --- p.9 / Chapter 2.2.3 --- Using Lexicon-syntactic Patterns of Causative Verbs --- p.10 / Chapter 2.2.4 --- Comparisons with Our Approach --- p.10 / Chapter 2.3 --- Discovery of Extraction Patterns for Extracting Relations --- p.11 / Chapter 2.3.1 --- Snowball system --- p.12 / Chapter 2.3.2 --- DIRT system --- p.12 / Chapter 2.3.3 --- Comparisons with Our Approach --- p.13 / Chapter 3 --- Semantic Expectation-based Knowledge Extraction --- p.14 / Chapter 3.1 --- Semantic Expectations --- p.14 / Chapter 3.2 --- Semantic Template --- p.16 / Chapter 3.2.1 --- Causation Semantic Template --- p.16 / Chapter 3.3 --- Sentence Templates --- p.17 / Chapter 3.4 --- Consequence and Reason Templates --- p.22 / Chapter 3.5 --- Causation Knowledge Extraction Framework --- p.25 / Chapter 3.5.1 --- Template Design --- p.25 / Chapter 3.5.2 --- Sentence Screening --- p.27 / Chapter 3.5.3 --- Semantic Processing --- p.28 / Chapter 4 --- Using Thesaurus and Pattern Discovery for SEKE --- p.33 / Chapter 4.1 --- Using a Thesaurus --- p.34 / Chapter 4.2 --- Pattern Discovery --- p.37 / Chapter 4.2.1 --- Use of Semantic Expectation-based Knowledge Extraction --- p.37 / Chapter 4.2.2 --- Use of Part of Speech Information --- p.39 / Chapter 4.2.3 --- Pattern Representation --- p.39 / Chapter 4.2.4 --- Constructing the Patterns --- p.40 / Chapter 4.2.5 --- Merging the 
Patterns --- p.43 / Chapter 4.3 --- Pattern Matching --- p.44 / Chapter 4.3.1 --- Matching Score --- p.46 / Chapter 4.3.2 --- Support of Patterns --- p.48 / Chapter 4.3.3 --- Relevancy of Sentence Templates --- p.48 / Chapter 4.4 --- Applying the Newly Discovered Patterns --- p.49 / Chapter 5 --- Applying SEKE on Hong Kong Stock Market Domain --- p.52 / Chapter 5.1 --- Template Design --- p.53 / Chapter 5.1.1 --- Semantic Templates --- p.53 / Chapter 5.1.2 --- Sentence Templates --- p.53 / Chapter 5.1.3 --- Consequence and Reason Templates: --- p.55 / Chapter 5.2 --- Pattern Discovery --- p.58 / Chapter 5.2.1 --- Support of Patterns --- p.58 / Chapter 5.2.2 --- Relevancy of Sentence Templates --- p.58 / Chapter 5.3 --- Causation Knowledge Extraction Result --- p.58 / Chapter 5.3.1 --- Evaluation Approach --- p.61 / Chapter 5.3.2 --- Parameter Investigations --- p.61 / Chapter 5.3.3 --- Experimental Results --- p.65 / Chapter 5.3.4 --- Knowledge Discovered --- p.68 / Chapter 5.3.5 --- Parameter Effect --- p.75 / Chapter 6 --- Applying SEKE on Global Warming Domain --- p.80 / Chapter 6.1 --- Template Design --- p.80 / Chapter 6.1.1 --- Semantic Templates --- p.81 / Chapter 6.1.2 --- Sentence Templates --- p.81 / Chapter 6.1.3 --- Consequence and Reason Templates --- p.83 / Chapter 6.2 --- Pattern Discovery --- p.85 / Chapter 6.2.1 --- Support of Patterns --- p.85 / Chapter 6.2.2 --- Relevancy of Sentence Templates --- p.85 / Chapter 6.3 --- Global Warming Domain Result --- p.85 / Chapter 6.3.1 --- Evaluation Approach --- p.85 / Chapter 6.3.2 --- Experimental Results --- p.88 / Chapter 6.3.3 --- Knowledge Discovered --- p.89 / Chapter 7 --- Conclusions and Future Directions --- p.92 / Chapter 7.1 --- Conclusions --- p.92 / Chapter 7.2 --- Future Directions --- p.93 / Bibliography --- p.95 / Chapter A --- Penn Treebank Part of Speech Tags --- p.100 Text processing (Computer science) Semantics--Data processing Computational linguistics
45	Automatic text categorization for information filtering. January 1998 (has links) Ho Chao Yang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 157-163). / Abstract also in Chinese. / Abstract --- p.i / Acknowledgment --- p.iii / List of Figures --- p.viii / List of Tables --- p.xiv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Automatic Document Categorization --- p.1 / Chapter 1.2 --- Information Filtering --- p.3 / Chapter 1.3 --- Contributions --- p.6 / Chapter 1.4 --- Organization of the Thesis --- p.7 / Chapter 2 --- Related Work --- p.9 / Chapter 2.1 --- Existing Automatic Document Categorization Approaches --- p.9 / Chapter 2.1.1 --- Rule-Based Approach --- p.10 / Chapter 2.1.2 --- Similarity-Based Approach --- p.13 / Chapter 2.2 --- Existing Information Filtering Approaches --- p.19 / Chapter 2.2.1 --- Information Filtering Systems --- p.19 / Chapter 2.2.2 --- Filtering in TREC --- p.21 / Chapter 3 --- Document Pre-Processing --- p.23 / Chapter 3.1 --- Document Representation --- p.23 / Chapter 3.2 --- Classification Scheme Learning Strategy --- p.26 / Chapter 4 --- A New Approach - IBRI --- p.31 / Chapter 4.1 --- Overview of Our New IBRI Approach --- p.31 / Chapter 4.2 --- The IBRI Representation and Definitions --- p.34 / Chapter 4.3 --- The IBRI Learning Algorithm --- p.37 / Chapter 5 --- IBRI Experiments --- p.43 / Chapter 5.1 --- Experimental Setup --- p.43 / Chapter 5.2 --- Evaluation Metric --- p.45 / Chapter 5.3 --- Results --- p.46 / Chapter 6 --- A New Approach - GIS --- p.50 / Chapter 6.1 --- Motivation of GIS --- p.50 / Chapter 6.2 --- Similarity-Based Learning --- p.51 / Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS) --- p.58 / Chapter 6.4 --- Using GIS Classifiers for Classification --- p.63 / Chapter 6.5 --- Time Complexity --- p.64 / Chapter 7 --- GIS Experiments --- p.68 / Chapter 7.1 --- Experimental Setup --- p.68 / Chapter 7.2 --- Results --- p.73 / 
Chapter 8 --- A New Information Filtering Approach Based on GIS --- p.87 / Chapter 8.1 --- Information Filtering Systems --- p.87 / Chapter 8.2 --- GIS-Based Information Filtering --- p.90 / Chapter 9 --- Experiments on GIS-based Information Filtering --- p.95 / Chapter 9.1 --- Experimental Setup --- p.95 / Chapter 9.2 --- Results --- p.100 / Chapter 10 --- Conclusions and Future Work --- p.108 / Chapter 10.1 --- Conclusions --- p.108 / Chapter 10.2 --- Future Work --- p.110 / Chapter A --- Sample Documents in the corpora --- p.111 / Chapter B --- Details of Experimental Results of GIS --- p.120 / Chapter C --- Computational Time of Reuters-21578 Experiments --- p.141 Text processing (Computer science) Nearest neighbor analysis (Statistics) Information retrieval
46	Associative information network and applications to an intelligent search engine. / CUHK electronic theses & dissertations collection January 1998 (has links) Qin An. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (p. 135-142). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese. Web search engines Text processing (Computer science) World Wide Web Chinese language--Data processing
47	Probabilistic models for information extraction: from cascaded approach to joint approach. / CUHK electronic theses & dissertations collection January 2010 (has links) Based on these observations and analysis, we propose a joint discriminative probabilistic framework to optimize all relevant subtasks simultaneously. This framework defines a joint probability distribution for both segmentations in sequence data and relations of segments in the form of an exponential family. This model allows tight interactions between segmentations and relations of segments and it offers a natural way for IE tasks. Since exact parameter estimation and inference are prohibitively intractable, a structured variational inference algorithm is developed to perform parameter estimation approximately. For inference, we propose a strong bi-directional MH approach to find the MAP assignments for joint segmentations and relations to explore mutual benefits in both directions, such that segmentations can aid relations, and vice-versa. / Information Extraction (IE) aims at identifying specific pieces of information (data) in an unstructured or semi-structured textual document and transforming unstructured information in a corpus of documents or Web pages into a structured database. There are several representative tasks in IE: named entity recognition (NER), which aims at identifying phrases that denote types of named entities; entity relation extraction, which aims at discovering the events or relations related to the entities; and coreference resolution, which aims at determining whether two extracted mentions of entities refer to the same object. IE is useful for a wide variety of applications. / The end-to-end performance of high-level IE systems for compound tasks is often hampered by the use of cascaded frameworks. The integrated model we proposed can alleviate some of these problems, but it is only loosely coupled. 
Parameter estimation is performed independently and it only allows information to flow in one direction. In this top-down integration model, the decision of the bottom sub-model could guide the decision of the upper sub-model, but not vice-versa. Thus, deep interactions and dependencies between different tasks can hardly be well captured. / We have investigated and developed a cascaded framework in an attempt to consider entity extraction and qualitative domain knowledge based on undirected, discriminatively-trained probabilistic graphical models. This framework consists of two stages and it is the combination of statistical learning and first-order logic. As a pipeline model, the first stage is a base model and the second stage is used to validate and correct the errors made in the base model. We incorporated domain knowledge that can be well formulated into first-order logic to extract entity candidates from the base model. We have applied this framework and achieved encouraging results in Chinese NER on the People's Daily corpus. / We perform extensive experiments on three important IE tasks using real-world datasets, namely Chinese NER, entity identification and relationship extraction from Wikipedia's encyclopedic articles, and citation matching, to test our proposed models, including the bidirectional model, the integrated model, and the joint model. Experimental results show that our models significantly outperform current state-of-the-art probabilistic models, such as decoupled and joint models, illustrating the feasibility and promise of our proposed approaches. (Abstract shortened by UMI.) / We present a general, strongly-coupled, and bidirectional architecture based on discriminatively trained factor graphs for information extraction, which consists of two components---segmentation and relation. First we introduce joint factors connecting variables of relevant subtasks to capture dependencies and interactions between them. 
We then propose a strong bidirectional Markov chain Monte Carlo (MCMC) sampling inference algorithm which allows information to flow in both directions to find the approximate maximum a posteriori (MAP) solution for all subtasks. Notably, our framework is considerably simpler to implement, and outperforms previous ones. / Yu, Xiaofeng. / Adviser: Zam Wai. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 109-123). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Graphical modeling (Statistics) Names, Chinese Random fields Text processing (Computer science)
48	An empirical study on Chinese text compression: from character-based to word-based approach. January 1997 (has links) by Kwok-Shing Cheng. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 114-120). / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Importance of Text Compression --- p.1 / Chapter 1.2 --- Motivation of this Research --- p.2 / Chapter 1.3 --- Characteristics of Chinese --- p.2 / Chapter 1.3.1 --- Huge size of character set --- p.3 / Chapter 1.3.2 --- Lack of word segmentation --- p.3 / Chapter 1.3.3 --- Rich semantics --- p.3 / Chapter 1.4 --- Different Coding Schemes for Chinese --- p.4 / Chapter 1.4.1 --- Big5 Code --- p.4 / Chapter 1.4.2 --- GB (Guo Biao) Code --- p.4 / Chapter 1.4.3 --- HZ (Hanzi) Code --- p.5 / Chapter 1.4.4 --- Unicode Code --- p.5 / Chapter 1.5 --- Modeling and Coding for Chinese Text --- p.6 / Chapter 1.6 --- Static and Adaptive Modeling --- p.6 / Chapter 1.7 --- One-Pass and Two-Pass Modeling --- p.8 / Chapter 1.8 --- Ordering of models --- p.9 / Chapter 1.9 --- Two Sets of Benchmark Files and the Platform --- p.9 / Chapter 1.10 --- Outline of the Thesis --- p.11 / Chapter 2 --- A Survey of Chinese Text Compression --- p.13 / Chapter 2.1 --- Entropy for Chinese Text --- p.14 / Chapter 2.2 --- Weakness of Traditional Compression Algorithms on Chinese Text --- p.15 / Chapter 2.3 --- Statistical Class Algorithms for Compressing Chinese --- p.16 / Chapter 2.3.1 --- Huffman coding scheme --- p.17 / Chapter 2.3.2 --- Arithmetic Coding Scheme --- p.22 / Chapter 2.3.3 --- Restricted Variable Length Coding Scheme --- p.26 / Chapter 2.4 --- Dictionary-based Class Algorithms for Compressing Chinese --- p.27 / Chapter 2.5 --- Experiments and Results --- p.32 / Chapter 2.6 --- Chapter Summary --- p.35 / Chapter 3 --- Indicator Dependent Huffman Coding Scheme --- p.37 / Chapter 3.1 --- Chinese Character Identification Routine --- 
p.37 / Chapter 3.2 --- Reduction of Header Size --- p.39 / Chapter 3.3 --- Semi-adaptive IDC for Chinese Text --- p.44 / Chapter 3.3.1 --- Theoretical Analysis of Partition Technique for Compression --- p.48 / Chapter 3.3.2 --- Experiments and Results of the Semi-adaptive IDC --- p.50 / Chapter 3.4 --- Adaptive IDC for Chinese Text --- p.54 / Chapter 3.4.1 --- Experiments and Results of the Adaptive IDC --- p.57 / Chapter 3.5 --- Chapter Summary --- p.58 / Chapter 4 --- Cascading LZ Algorithms with Huffman Coding Schemes --- p.59 / Chapter 4.1 --- Variations of Huffman Coding Scheme --- p.60 / Chapter 4.1.1 --- Analysis of EPDC and PDC --- p.60 / Chapter 4.1.2 --- Analysis of PDC, 16Huff and IDC --- p.65 / Chapter 4.1.3 --- Time and Memory Consumption --- p.71 / Chapter 4.2 --- Cascading LZSS with PDC, 16Huff and IDC --- p.73 / Chapter 4.2.1 --- Experimental Results --- p.76 / Chapter 4.3 --- Cascading LZW with PDC, 16Huff and IDC --- p.79 / Chapter 4.3.1 --- Experimental Results --- p.82 / Chapter 4.4 --- Chapter Summary --- p.84 / Chapter 5 --- Applying Compression Algorithms to Word-segmented Chinese Text --- p.85 / Chapter 5.1 --- Background of word-based compression algorithms --- p.86 / Chapter 5.2 --- Terminology and Benchmark Files for Word Segmentation Model --- p.88 / Chapter 5.3 --- Word Segmentation Model --- p.88 / Chapter 5.4 --- Chinese Entropy from Byte to Word --- p.91 / Chapter 5.5 --- The Generalized Compression and Decompression Model for Word-segmented Chinese text --- p.92 / Chapter 5.6 --- Applying Huffman Coding Scheme to Word-segmented Chinese text --- p.94 / Chapter 5.7 --- Applying WLZSSHUF to Word-segmented Chinese text --- p.97 / Chapter 5.8 --- Applying WLZWHUF to Word-segmented Chinese text --- p.102 / Chapter 5.9 --- Match Ratio and Compression Ratio --- p.105 / Chapter 5.10 --- Chapter Summary --- p.108 / Chapter 6 --- Concluding Remarks --- p.110 / Chapter 6.1 --- Conclusions --- p.110 / Chapter 6.2 --- Contributions --- 
p.111 / Chapter 6.3 --- Future Directions --- p.112 / Chapter 6.3.1 --- Integrate Decremental Coding Scheme with IDC --- p.112 / Chapter 6.3.2 --- Re-order the Character Sequences in the Sliding Window of LZSS --- p.113 / Chapter 6.3.3 --- Multiple Huffman Trees for Word-based Compression --- p.113 / Bibliography --- p.114 Data compression (Computer Science) Text processing (Computer Science) Chinese language--Data processing Computer algorithms
49	Automatic construction and adaptation of wrappers for semi-structured web documents. January 2003 (has links) Wong Tak Lam. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 88-94). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Wrapper Induction for Semi-structured Web Documents --- p.1 / Chapter 1.2 --- Adapting Wrappers to Unseen Web Sites --- p.6 / Chapter 1.3 --- Thesis Contributions --- p.7 / Chapter 1.4 --- Thesis Organization --- p.8 / Chapter 2 --- Related Work --- p.10 / Chapter 2.1 --- Related Work on Wrapper Induction --- p.10 / Chapter 2.2 --- Related Work on Wrapper Adaptation --- p.16 / Chapter 3 --- Automatic Construction of Hierarchical Wrappers --- p.20 / Chapter 3.1 --- Hierarchical Record Structure Inference --- p.22 / Chapter 3.2 --- Extraction Rule Induction --- p.30 / Chapter 3.3 --- Applying Hierarchical Wrappers --- p.38 / Chapter 4 --- Experimental Results for Wrapper Induction --- p.40 / Chapter 5 --- Adaptation of Wrappers for Unseen Web Sites --- p.52 / Chapter 5.1 --- Problem Definition --- p.52 / Chapter 5.2 --- Overview of Wrapper Adaptation Framework --- p.55 / Chapter 5.3 --- Potential Training Example Candidate Identification --- p.58 / Chapter 5.3.1 --- Useful Text Fragments --- p.58 / Chapter 5.3.2 --- Training Example Generation from the Unseen Web Site --- p.60 / Chapter 5.3.3 --- Modified Nearest Neighbour Classification --- p.63 / Chapter 5.4 --- Machine Annotated Training Example Discovery and New Wrapper Learning --- p.64 / Chapter 5.4.1 --- Text Fragment Classification --- p.64 / Chapter 5.4.2 --- New Wrapper Learning --- p.69 / Chapter 6 --- Case Study and Experimental Results for Wrapper Adaptation --- p.71 / Chapter 6.1 --- Case Study on Wrapper Adaptation --- p.71 / Chapter 6.2 --- Experimental Results --- p.73 / Chapter 6.2.1 --- Book Domain --- p.74 / Chapter 6.2.2 --- Consumer Electronic Appliance Domain --- 
p.79 / Chapter 7 --- Conclusions and Future Work --- p.83 / Bibliography --- p.88 / Chapter A --- Detailed Performance of Wrapper Induction for Book Domain --- p.95 / Chapter B --- Detailed Performance of Wrapper Induction for Consumer Electronic Appliance Domain --- p.99 Text processing (Computer science) World Wide Web
50	Generating documents by means of computational registers. Oldham, Joseph Dowell. January 2000 (has links) (PDF) Thesis (Ph. D.)--University of Kentucky, 2000. / Title from document title page. Document formatted into pages; contains ix, 169 p. : ill. Includes abstract. Includes bibliographical references (p. 160-167).
