About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Improvements in the style of computer generated natural language text /

Juell, Paul Lincoln January 1981 (has links)
No description available.
2

On-line recognition of English and numerical characters.

January 1992 (has links)
by Cheung Wai-Hung Wellis. / Thesis (M.Sc.)--Chinese University of Hong Kong, 1992. / Includes bibliographical references (leaves 52-54).
Contents:
  ACKNOWLEDGEMENTS / ABSTRACT
  1 INTRODUCTION, p.1
    1.1 CLASSIFICATION OF CHARACTER RECOGNITION, p.1
    1.2 HISTORICAL DEVELOPMENT, p.3
    1.3 RECOGNITION METHODOLOGY, p.4
  2 ORGANIZATION OF THIS REPORT, p.7
  3 DATA SAMPLING, p.8
    3.1 GENERAL CONSIDERATION, p.8
    3.2 IMPLEMENTATION, p.9
  4 PREPROCESSING, p.10
    4.1 GENERAL CONSIDERATION, p.10
    4.2 IMPLEMENTATION, p.12
      4.2.1 Stroke connection, p.12
      4.2.2 Rotation, p.12
      4.2.3 Scaling, p.14
      4.2.4 De-skewing, p.15
  5 STROKE SEGMENTATION, p.17
    5.1 CONSIDERATION, p.17
    5.2 IMPLEMENTATION, p.20
  6 LEARNING, p.26
  7 PROTOTYPE MANAGEMENT, p.27
  8 RECOGNITION, p.29
    8.1 CONSIDERATION, p.29
      8.1.1 Delayed Stroke Tagging, p.29
      8.1.2 Bi-gram, p.29
      8.1.3 Character Scoring, p.30
      8.1.4 Ligature Handling, p.32
      8.1.5 Word Scoring, p.32
    8.2 IMPLEMENTATION, p.33
      8.2.1 Simple Matching, p.33
      8.2.2 Best First Search Matching, p.33
      8.2.3 Multiple Track Method, p.35
    8.3 SYSTEM PERFORMANCE TUNING, p.37
  9 POST-PROCESSING, p.38
    9.1 PROBABILITY MODEL, p.38
    9.2 WORD DICTIONARY APPROACH, p.39
  10 SYSTEM IMPLEMENTATION AND PERFORMANCE, p.41
  11 DISCUSSION, p.43
  12 EPILOG, p.47
  APPENDIX I: PROBLEMS ENCOUNTERED AND SUGGESTED ENHANCEMENTS ON THE SYSTEM, p.48
  APPENDIX II: GLOSSARIES, p.51
  REFERENCES, p.52
3

Development of a Cantonese-English code-mixing speech recognition system. / CUHK electronic theses & dissertations collection

January 2011 (has links)
Code-mixing is a common phenomenon in bilingual societies: the intra-sentential switching of two languages within a spoken utterance. This thesis addresses the automatic recognition of Cantonese-English code-mixing speech, which is widely used in Hong Kong.

While automatic speech recognition (ASR) of either Cantonese or English alone has achieved a great degree of success, recognition of Cantonese-English code-mixing speech is far from trivial. Unknown language boundaries, accents in code-switched English words, phonetic and phonological differences between Cantonese and English, the absence of a regulated grammatical structure, and the scarcity of speech and text data make the ASR of code-mixing utterances much more than a simple integration of two monolingual recognition systems. Moreover, this highly dynamic language phenomenon is poorly understood: unlike in monolingual speech recognition research, there are very few linguistic studies to draw on.

The study starts with an investigation of the linguistic properties of Cantonese-English code-mixing, based on a large collection of real code-mixing text corpora gathered from the internet and other sources. The effects of language mixing on the automatic recognition of code-mixing utterances are analyzed systematically, and the problems of pronunciation dictionary construction, acoustic modeling, and language modeling are investigated. A large-vocabulary code-mixing speech recognition system is then developed and implemented.

A data-driven computational approach reveals significant pronunciation variations in code-mixing speech; the findings are applied to constructing a more relevant bilingual pronunciation dictionary and to selecting effective training materials. For acoustic modeling, cross-lingual acoustic models are shown to be more appropriate than language-dependent models. Various cross-lingual unit inventories are derived from different combination schemes and similarity measurements, and the proposed data-driven approach based on K-L divergence and a phonetic confusion matrix outperforms an IPA-based approach that uses phonetic knowledge alone. Initials and finals are also found to be more appropriate basic Cantonese units than phonemes for code-mixing speech recognition. For language modeling, a text database of more than 9 million characters is compiled; class-based language models with automatically clustered classes prove ineffective for code-mixing speech recognition, so a semantics-based n-gram mapping approach is proposed to increase the counts of code-mixing n-grams at language boundaries, significantly improving both language model perplexity and recognition performance. Cross-lingual speaker adaptation is also investigated: speaker-independent (SI) model mappings between Cantonese and English are established at different levels of acoustic units, viz. phones, states, and Gaussian mixture components, and a novel approach to cross-lingual speaker adaptation via Gaussian component mapping proves effective in most speech recognition tasks.

The proposed system achieves 75.0% overall accuracy for Cantonese-English code-mixing speech (76.1% for Cantonese characters and 65.5% for English words), and attains a reasonable character accuracy of 75.3% for monolingual Cantonese speech.

Cao, Houwei. / Adviser: P.C. Ching. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 129-140). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
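The K-L-divergence-based merging of cross-lingual acoustic units that the abstract describes can be made concrete with a toy sketch. This is not the author's implementation: the single-feature univariate Gaussians, the phone labels, and the symmetric-divergence criterion below are invented assumptions chosen only to illustrate the idea of pairing each Cantonese unit with its acoustically closest English unit.

```python
import math

def kl_gaussian(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for univariate Gaussians, in nats."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def symmetric_kl(p, q):
    """Symmetrized divergence used as a distance between two phone models."""
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)

# Hypothetical single-feature acoustic models: phone -> (mean, std. dev.).
cantonese = {"aa": (0.0, 1.0), "i": (2.0, 0.5)}
english = {"ah": (0.1, 1.1), "iy": (2.1, 0.6)}

# Pair each Cantonese unit with its acoustically closest English unit.
pairs = {c: min(english, key=lambda e: symmetric_kl(cantonese[c], english[e]))
         for c in cantonese}
```

In a real system the distance would be computed between full mixture-of-Gaussian HMM states over many acoustic features, but the merge decision follows the same shape: keep language-dependent units apart when the divergence is large, share a cross-lingual unit when it is small.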
4

Automatic topic detection of multi-lingual news stories.

January 2000 (has links)
Wong Kam Lai. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 92-98). / Abstracts in English and Chinese.
Contents:
  1 Introduction, p.1
    1.1 Our Contributions, p.5
    1.2 Organization of this Thesis, p.5
  2 Literature Review, p.7
    2.1 Dragon Systems, p.7
    2.2 Carnegie Mellon University (CMU), p.9
    2.3 University of Massachusetts (UMass), p.10
    2.4 IBM T.J. Watson Research Center, p.11
    2.5 BBN Technologies, p.12
    2.6 National Taiwan University (NTU), p.13
    2.7 Drawbacks of Existing Approaches, p.14
  3 Overview of Proposed Approach, p.15
    3.1 News Source, p.15
    3.2 Story Preprocessing, p.18
    3.3 Concept Term Generation, p.20
    3.4 Named Entity Extraction, p.21
    3.5 Gross Translation of Chinese to English, p.21
    3.6 Topic Detection Method, p.22
      3.6.1 Deferral Period, p.22
      3.6.2 Detection Approach, p.23
  4 Concept Term Model, p.25
    4.1 Background of Contextual Analysis, p.25
    4.2 Concept Term Generation, p.28
      4.2.1 Concept Generation Algorithm, p.28
      4.2.2 Concept Term Representation for Detection, p.33
  5 Topic Detection Model, p.35
    5.1 Text Representation and Term Weights, p.35
      5.1.1 Story Representation, p.35
      5.1.2 Topic Representation, p.43
      5.1.3 Similarity Score, p.43
      5.1.4 Time Adjustment Scheme, p.46
    5.2 Gross Translation Method, p.48
    5.3 The Detection System, p.50
      5.3.1 Detection Requirement, p.50
      5.3.2 The Top Level Model, p.52
    5.4 The Clustering Algorithm, p.55
      5.4.1 Similarity Calculation, p.55
      5.4.2 Grouping Related Elements, p.56
      5.4.3 Topic Identification, p.60
  6 Experimental Results and Analysis, p.63
    6.1 Evaluation Model, p.63
      6.1.1 Evaluation Methodology, p.64
    6.2 Experiments on the effects of tuning the parameter, p.68
      6.2.1 Experiment Setup, p.68
      6.2.2 Results and Analysis, p.69
    6.3 Experiments on the effects of named entities and concept terms, p.74
      6.3.1 Experiment Setup, p.74
      6.3.2 Results and Analysis, p.75
    6.4 Experiments on the effect of using time adjustment, p.77
      6.4.1 Experiment Setup, p.77
      6.4.2 Results and Analysis, p.79
    6.5 Experiments on mono-lingual detection, p.80
      6.5.1 Experiment Setup, p.80
      6.5.2 Results and Analysis, p.80
  7 Conclusions and Future Work, p.83
    7.1 Conclusions, p.83
    7.2 Future Work, p.85
  A List of Topics annotated for TDT3 Corpus, p.86
  B Matching evaluation topics to hypothesized topics, p.90
  Bibliography, p.92
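The similarity score and time adjustment scheme listed in this thesis outline (Sections 5.1.3 and 5.1.4) are a standard ingredient of topic detection: stories are compared to topic clusters by cosine similarity of term-weight vectors, and matches between stories far apart in time are discounted. The sketch below is illustrative only; the exponential decay form, the decay rate, and the toy term vectors are assumptions, not details taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def time_adjusted_similarity(story, topic, days_apart, decay=0.05):
    """Story-topic similarity, discounted for stories far apart in time."""
    return cosine(story, topic) * math.exp(-decay * days_apart)

# Toy term-weight vectors (e.g. tf-idf weights) for a story and a topic.
story = {"election": 2.0, "vote": 1.0}
topic = {"election": 1.5, "vote": 0.5, "poll": 0.5}

same_day = time_adjusted_similarity(story, topic, days_apart=0)
month_later = time_adjusted_similarity(story, topic, days_apart=30)
```

A clustering detector then assigns the story to the best-scoring topic if its adjusted similarity exceeds a threshold, or starts a new topic otherwise.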
5

Towards a computational theory of definite anaphora comprehension in English discourse

Sidner, Candace Lee January 1979 (has links)
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1979. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Bibliography: leaves 275-282. / by Candace Lee Sidner. / Ph.D.
6

Automatic topic detection from news stories.

January 2001 (has links)
Hui Kin. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 115-120). / Abstracts in English and Chinese.
Contents:
  1 Introduction, p.1
    1.1 Topic Detection Problem, p.2
      1.1.1 What is a Topic?, p.2
      1.1.2 Topic Detection, p.3
    1.2 Our Contributions, p.5
      1.2.1 Thesis Organization, p.6
  2 Literature Review, p.7
    2.1 Dragon Systems, p.7
    2.2 University of Massachusetts (UMass), p.9
    2.3 Carnegie Mellon University (CMU), p.10
    2.4 BBN Technologies, p.11
    2.5 IBM T. J. Watson Research Center, p.12
    2.6 National Taiwan University (NTU), p.13
    2.7 Drawbacks of Existing Approaches, p.14
  3 System Overview, p.16
    3.1 News Sources, p.17
    3.2 Story Preprocessing, p.21
    3.3 Named Entity Extraction, p.22
    3.4 Gross Translation, p.22
    3.5 Unsupervised Learning Module, p.24
  4 Term Extraction and Story Representation, p.27
    4.1 IBM Intelligent Miner For Text, p.28
    4.2 Transformation-based Error-driven Learning, p.31
      4.2.1 Learning Stage, p.32
      4.2.2 Design of New Tags, p.33
      4.2.3 Lexical Rules Learning, p.35
      4.2.4 Contextual Rules Learning, p.39
    4.3 Extracting Named Entities Using Learned Rules, p.42
    4.4 Story Representation, p.46
      4.4.1 Basic Representation, p.46
      4.4.2 Enhanced Representation, p.47
  5 Gross Translation, p.52
    5.1 Basic Translation, p.52
    5.2 Enhanced Translation, p.60
      5.2.1 Parallel Corpus Alignment Approach, p.60
      5.2.2 Enhanced Translation Approach, p.62
  6 Unsupervised Learning Module, p.68
    6.1 Overview of the Discovery Algorithm, p.68
    6.2 Topic Representation, p.70
    6.3 Similarity Calculation, p.72
      6.3.1 Similarity Score Calculation, p.72
      6.3.2 Time Adjustment Scheme, p.74
      6.3.3 Language Normalization Scheme, p.75
    6.4 Related Elements Combination, p.78
  7 Experimental Results and Analysis, p.84
    7.1 TDT Corpora, p.84
    7.2 Evaluation Methodology, p.85
    7.3 Experimental Results on Various Parameter Settings, p.88
    7.4 Experimental Results on Various Named Entity Extraction Approaches, p.89
    7.5 Experimental Results on Various Story Representation Approaches, p.100
    7.6 Experimental Results on Various Translation Approaches, p.104
    7.7 Experimental Results on the Effect of the Language Normalization Scheme on Detection Approaches, p.106
    7.8 TDT2000 Topic Detection Result, p.110
  8 Conclusions and Future Work, p.112
    8.1 Conclusions, p.112
    8.2 Future Work, p.114
  Bibliography, p.115
  A List of Topics annotated for TDT2 Corpus, p.121
  B Significant Test Results, p.124
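Section 4.2 of this thesis applies transformation-based error-driven learning (Brill's approach) to named entity extraction: start from an initial tagging, then apply an ordered list of learned correction rules, each of which retags a word when a contextual condition holds. A minimal sketch of the rule-application step follows; the rule, tag set, and example sentence are invented for illustration and are not taken from the thesis.

```python
def apply_rules(words, initial_tags, rules):
    """Apply an ordered list of contextual transformation rules.

    Each rule is (from_tag, to_tag, condition), where condition(words,
    tags, i) decides whether position i's tag should be rewritten.
    """
    tags = list(initial_tags)
    for from_tag, to_tag, cond in rules:
        for i in range(len(tags)):
            if tags[i] == from_tag and cond(words, tags, i):
                tags[i] = to_tag
    return tags

# Hypothetical learned rule: retag NOUN as NAME when preceded by "Mr".
rules = [("NOUN", "NAME", lambda w, t, i: i > 0 and w[i - 1] == "Mr")]

words = ["Mr", "Chan", "bought", "shares"]
initial = ["TITLE", "NOUN", "VERB", "NOUN"]
tags = apply_rules(words, initial, rules)
```

During learning, candidate rules are scored by how many tagging errors they fix on annotated training data, and the best rule is appended to the list at each iteration; the sketch above covers only the application phase.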
7

You talking to me? : zero auxiliary constructions in British English

Caines, Andrew Paul January 2011 (has links)
No description available.
8

Simulation games through the computer to teach ESL students

Grubbs, Vivian Louise January 1982 (has links)
This thesis looks at the potential use of computer simulation games in the ESL classroom. Simulation games that are educational as well as entertaining fall under the field of Computer Aided Instruction (CAI). CAI has drawbacks: computers and programs are expensive, programs are difficult to create and often specialized, and differences in computer hardware make software difficult to transfer between machines. Yet the advantages far outweigh the disadvantages. With CAI, the student receives instant feedback and individualized instruction, can work at his or her own pace, and has the computer's full attention. It has not been fully determined whether students learn better or faster with CAI than with traditional classroom instruction. The major portion of this thesis is a computer simulation game written to teach the directions right, left, straight, and back-up. With this program, supported by in-class instruction, the student should learn directions quickly.
9

Entropy reduction of English text using variable length grouping

Ast, Vincent Norman 01 July 1972 (has links)
It is known that the entropy of English text can be reduced by arranging the text into groups of two or more letters each; the higher the order of the grouping, the greater the entropy reduction. Using this principle in a computer text-compressing system brings difficulties, however, because the number of entries required in the translation table increases exponentially with group size. This experiment examined the possibility of using a translation table containing only selected entries of all group sizes, with the expectation of obtaining a substantial entropy reduction from a relatively small table. An expression was derived showing that the groups which should be included in the table are not necessarily those that occur frequently, but rather those that occur more frequently than random occurrence would predict. This is complicated by the fact that any grouping affects the frequency of occurrence of many other related groups. An algorithm was developed in which the table starts with the regular 26 letters of the alphabet and the space. Entries, consisting of letter groups, complete words, and word groups, are then added one by one based on the selection criterion, and after each addition adjustments are made to account for the interaction of the groups. The algorithm was programmed on a computer and run on a text sample of about 7000 words. The results showed that the entropy could easily be reduced to 3 bits per letter with a table of fewer than 200 entries; with about 500 entries the entropy could be reduced to about 2.5 bits per letter. About 60% of the table was composed of letter groups, 42% of single words, and 8% of word groups, which indicated that the extra complications involved in handling word groups may not be worthwhile. A visual examination of the table showed that many entries were strongly oriented to the particular sample. This may or may not be desirable, depending on the intended use of the translating system.
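The selection criterion described above (include groups that occur more often than independent random occurrence would predict) is the same idea behind modern byte-pair-style tokenizers. A minimal sketch of one greedy step follows: rank adjacent pairs by the ratio of observed to expected frequency, merge the most surprising pair into a single table entry, and re-measure the entropy in bits per original letter. The toy corpus, the surprise ratio, and the single-merge procedure are illustrative assumptions, not the thesis's actual algorithm.

```python
import math
from collections import Counter

def bits_per_letter(tokens, n_letters):
    """Zeroth-order entropy of the token stream, in bits per original letter."""
    counts = Counter(tokens)
    total = sum(counts.values())
    h = -sum(c / total * math.log2(c / total) for c in counts.values())
    return h * total / n_letters

def most_surprising_pair(tokens):
    """Adjacent pair whose observed count most exceeds the count expected
    if its two halves occurred independently at random."""
    total = len(tokens)
    unigram = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    def surprise(p):
        a, b = p
        expected = unigram[a] / total * unigram[b] / total * (total - 1)
        return pairs[p] / expected
    return max(pairs, key=surprise)

text = list("the then they the them the")
n = len(text)  # number of original letters
pair = most_surprising_pair(text)

# Merge the chosen pair into a single group and re-measure the entropy.
merged, i = [], 0
while i < len(text):
    if i + 1 < len(text) and (text[i], text[i + 1]) == pair:
        merged.append(text[i] + text[i + 1]); i += 2
    else:
        merged.append(text[i]); i += 1
```

Repeating this step, while re-adjusting counts after each merge as the thesis describes, grows exactly the kind of mixed table of letter groups, words, and word groups that the experiment built.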
10

Concept space approach for cross-lingual information retrieval

陸穎剛, Luk, Wing-kong. January 2000 (has links)
published_or_final_version / abstract / toc / Computer Science and Information Systems / Master / Master of Philosophy
