  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Using a rewriting system to model individual writing styles

Lin, Jing January 2012 (has links)
Each individual has a distinctive writing style, but natural language generation systems produce text with much less variety. Is it possible to produce more human-like text from natural language generation systems by mimicking the style of particular authors? We start by analysing the text of real authors. We collect a corpus of texts from a single genre (food recipes), each text identified with its author, and summarise a variety of writing features in these texts. Each author's writing style is the combination of a set of features. Analysis of the writing features shows that not only does each author write differently, but the differences are consistent over the whole of their corpus. Hence we conclude that authors maintain a consistent style made up of a variety of different features.

When we discuss notions such as the style and meaning of texts, we are referring to the reaction that readers have to them. It is important, therefore, in the field of computational linguistics to experiment by showing texts to people and assessing their interpretation of those texts. In our research we move the thesis beyond discussion and statistical analysis of the properties of text and NLG systems, performing experiments to verify the actual impact that lexical preference has on real readers. Through experiments that require participants to follow a recipe and prepare food, we conclude that it is possible to alter the lexicon of a recipe without altering the actions performed by the cook, hence that word choice is an aspect of style rather than semantics; and also that word choice is one of the writing features readers employ in identifying the author of a text. Among all writing features, individual lexical preference is very important both for analysing and for generating texts, so we choose individual lexical choice as our principal topic of research.

Using a modified version of distributional similarity (DS) helps us to choose words used by individual authors without the limitations of many other solutions, such as reliance on a pre-built thesaurus. We present an algorithm for analysis and rewriting, assess the results, and, based on those results, propose some further improvements.
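The distributional-similarity idea underlying this work can be sketched in a few lines; the toy corpus, the sentence-window context definition, and the plain cosine measure below are illustrative assumptions, not the thesis's modified DS measure:

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy recipe-like corpus; the thesis uses a large single-genre corpus.
corpus = [
    "chop the onion finely",
    "dice the onion finely",
    "chop the carrot roughly",
    "dice the carrot roughly",
    "slice the bread thickly",
]

# Distributional profile: each word is represented by the multiset of
# words appearing in the same sentence (a deliberately crude context).
profiles = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w in words:
        for c in words:
            if c != w:
                profiles[w][c] += 1

def similarity(a, b):
    """Cosine similarity between the context profiles of two words."""
    pa, pb = profiles[a], profiles[b]
    dot = sum(pa[k] * pb[k] for k in pa)
    na = sqrt(sum(v * v for v in pa.values()))
    nb = sqrt(sum(v * v for v in pb.values()))
    return dot / (na * nb) if na and nb else 0.0

# "chop" and "dice" occur in near-identical contexts, so they come out as
# close distributional neighbours; "slice" is further away. This is how
# author-specific word alternatives can be found without a pre-built thesaurus.
assert similarity("chop", "dice") > similarity("chop", "slice")
```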
32

Domain independent generation from RDF instance data

Sun, Xiantang January 2008 (has links)
The next generation of the web, the Semantic Web, integrates distributed web resources from various domains by allowing data (instantial and ontological) to be shared and reused across application, enterprise and community boundaries based on the Resource Description Framework (RDF). Nevertheless, RDF was not designed for casual users who are unfamiliar with it but interested in the data it represents. NLG may be a possible solution for bridging the gap between casual users and RDF data, but the cost of separately applying fine-grained NLG techniques to every domain in the Semantic Web would be extremely high, and hence not realistic.
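The problem the author describes can be made concrete with a deliberately naive sketch of domain-independent verbalisation of RDF triples; the triples, the camelCase-splitting heuristic, and the is/has linking rule are all invented for illustration, and real systems need far more linguistic and ontological knowledge:

```python
# Each RDF statement is a (subject, predicate, object) triple.
triples = [
    ("Aberdeen", "locatedIn", "Scotland"),
    ("Aberdeen", "population", "189120"),
]

def verbalise(predicate):
    """Derive a phrase from the predicate name: 'locatedIn' -> 'located in'."""
    words, current = [], ""
    for ch in predicate:
        if ch.isupper() and current:
            words.append(current)
            current = ch.lower()
        else:
            current += ch
    words.append(current)
    return " ".join(words)

def realise(triple):
    s, p, o = triple
    vp = verbalise(p)
    # Crude linking heuristic: past participles read as "is X", nouns as "has X".
    link = "is" if vp.split()[0].endswith("ed") else "has"
    return f"{s} {link} {vp} {o}."

sentences = [realise(t) for t in triples]
```

Even this tiny example shows why domain-independent generation is hard: the output is only as fluent as the predicate names allow, which is exactly the gap the thesis addresses.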
33

Geographic referring expressions : doing geometry with words

Gomes de Oliveira, Rodrigo January 2017 (has links)
No description available.
34

GLR parsing with multiple grammars for natural language queries.

January 2000 (has links)
Luk Po Chui.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 97-100).
Abstracts in English and Chinese.

Contents:
1 Introduction
   1.1 Efficiency and Memory
   1.2 Ambiguity
   1.3 Robustness
   1.4 Thesis Organization
2 Background
   2.1 Introduction
   2.2 Context-Free Grammars
   2.3 The LR Parsing Algorithm
   2.4 The Generalized LR Parsing Algorithm
      2.4.1 Graph-Structured Stack
      2.4.2 Packed Shared Parse Forest
   2.5 Time and Space Complexity
   2.6 Related Work on Parsing
      2.6.1 GLR*
      2.6.2 TINA
      2.6.3 PHOENIX
   2.7 Chapter Summary
3 Grammar Partitioning
   3.1 Introduction
   3.2 Motivation
   3.3 Previous Work on Grammar Partitioning
   3.4 Our Grammar Partitioning Approach
      3.4.1 Definitions and Concepts
      3.4.2 Guidelines for Grammar Partitioning
   3.5 An Example
   3.6 Chapter Summary
4 Parser Composition
   4.1 Introduction
   4.2 GLR Lattice Parsing
      4.2.1 Lattice with Multiple Granularity
      4.2.2 Modifications to the GLR Parsing Algorithm
   4.3 Parser Composition Algorithms
      4.3.1 Parser Composition by Cascading
      4.3.2 Parser Composition with Predictive Pruning
      4.3.3 Comparison of Parser Composition by Cascading and Parser Composition with Predictive Pruning
   4.4 Chapter Summary
5 Experimental Results and Analysis
   5.1 Introduction
   5.2 Experimental Corpus
   5.3 ATIS Grammar Development
   5.4 Grammar Partitioning and Parser Composition on ATIS Domain
      5.4.1 ATIS Grammar Partitioning
      5.4.2 Parser Composition on ATIS
   5.5 Ambiguity Handling
   5.6 Semantic Interpretation
      5.6.1 Best Path Selection
      5.6.2 Semantic Frame Generation
      5.6.3 Post-Processing
   5.7 Experiments
      5.7.1 Grammar Coverage
      5.7.2 Size of Parsing Table
      5.7.3 Computational Costs
      5.7.4 Accuracy Measures in Natural Language Understanding
      5.7.5 Summary of Results
   5.8 Chapter Summary
6 Conclusions
   6.1 Thesis Summary
   6.2 Thesis Contributions
   6.3 Future Work
      6.3.1 Statistical Approach on Grammar Partitioning
      6.3.2 Probabilistic Modeling for Best Parse Selection
      6.3.3 Robust Parsing Strategies
Bibliography
A ATIS-3 Grammar
   A.1 English ATIS-3 Grammar Rules
   A.2 Chinese ATIS-3 Grammar Rules
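The ambiguity that GLR parsers must manage (and that packed shared parse forests keep tractable) can be illustrated with a minimal CYK-style parse counter; this is not a GLR parser with a graph-structured stack, and the grammar and input below are toy assumptions:

```python
from functools import lru_cache

# A tiny CNF grammar: S -> S S | 'a'. Binary rules and terminal rules.
binary = {("S", ("S", "S"))}
terminal = {("S", "a")}

tokens = ["a"] * 4

@lru_cache(maxsize=None)
def count(sym, i, j):
    """Number of distinct parses of tokens[i:j] as sym (CYK-style).

    Memoisation plays the role a packed forest plays in GLR parsing:
    each span is analysed once, however many parses share it.
    """
    if j - i == 1:
        return 1 if (sym, tokens[i]) in terminal else 0
    total = 0
    for (lhs, (b, c)) in binary:
        if lhs != sym:
            continue
        for k in range(i + 1, j):
            total += count(b, i, k) * count(c, k, j)
    return total

# Four tokens under S -> S S | a admit Catalan(3) = 5 distinct parse trees,
# though the memoised table stays polynomial in the input length.
assert count("S", 0, 4) == 5
```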
35

Natural language understanding across application domains and languages.

January 2002 (has links)
Tsui Wai-Ching.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 115-122).
Abstracts in English and Chinese.

Contents:
1 Introduction
   1.1 Overview
   1.2 Natural Language Understanding Using Belief Networks
   1.3 Integrating Speech Recognition with Natural Language Understanding
   1.4 Thesis Goals
   1.5 Thesis Organization
2 Background
   2.1 Natural Language Understanding Approaches
      2.1.1 Rule-based Approaches
      2.1.2 Stochastic Approaches
      2.1.3 Mixed Approaches
   2.2 Portability of Natural Language Understanding Frameworks
      2.2.1 Portability across Domains
      2.2.2 Portability across Languages
      2.2.3 Portability across both Domains and Languages
   2.3 Spoken Language Understanding
      2.3.1 Integration of Speech Recognition Confidence into Natural Language Understanding
      2.3.2 Integration of Other Potential Confidence Features into Natural Language Understanding
   2.4 Belief Networks
      2.4.1 Overview
      2.4.2 Bayesian Inference
   2.5 Transformation-based Parsing Technique
   2.6 Chapter Summary
3 Portability of the Natural Language Understanding Framework across Application Domains and Languages
   3.1 Natural Language Understanding Framework
      3.1.1 Semantic Tagging
      3.1.2 Informational Goal Inference with Belief Networks
   3.2 The ISIS Stocks Domain
   3.3 A Unified Framework for English and Chinese
      3.3.1 Semantic Tagging for the ISIS Domain
      3.3.2 Transformation-based Parsing
      3.3.3 Informational Goal Inference with Belief Networks for the ISIS Domain
   3.4 Experiments
      3.4.1 Goal Identification Experiments
      3.4.2 A Cross-language Experiment
   3.5 Chapter Summary
4 Enhancement in the Belief Networks for Informational Goal Inference
   4.1 Semantic Concept Selection in Belief Networks
      4.1.1 Selection of Positive Evidence
      4.1.2 Selection of Negative Evidence
   4.2 Estimation of Statistical Probabilities in the Enhanced Belief Networks
      4.2.1 Estimation of Prior Probabilities
      4.2.2 Estimation of Posterior Probabilities
   4.3 Experiments
      4.3.1 Belief Networks Developed with Positive Evidence
      4.3.2 Belief Networks with the Injection of Negative Evidence
   4.4 Chapter Summary
5 Integration between Speech Recognition and Natural Language Understanding
   5.1 The Speech Corpus for the Chinese ISIS Stocks Domain
   5.2 Our Extended Natural Language Understanding Framework for Spoken Language Understanding
      5.2.1 Integrated Scoring for Chinese Speech Recognition and Natural Language Understanding
   5.3 Experiments
      5.3.1 Training and Testing on the Perfect Reference Data Sets
      5.3.2 Mismatched Training and Testing Conditions: Perfect Reference versus Imperfect Hypotheses
      5.3.3 Comparing Goal Identification between the Use of Single-best versus N-best Recognition Hypotheses
      5.3.4 Integration of Speech Recognition Confidence Scores into Natural Language Understanding
      5.3.5 Feasibility of Our Approach for Spoken Language Understanding
      5.3.6 Justification of Using Max-of-max Classifier in Our Single Goal Identification Scheme
   5.4 Chapter Summary
6 Conclusions and Future Work
   6.1 Conclusions
   6.2 Contributions
   6.3 Future Work
Bibliography
A Semantic Frames for Chinese
B Semantic Frames for English
C The Concept Set of Positive Evidence for the Nine Goals in English
D The Concept Set of Positive Evidence for the Ten Goals in Chinese
E The Complete Concept Set including Both the Positive and Negative Evidence for the Ten Goals in English
F The Complete Concept Set including Both the Positive and Negative Evidence for the Ten Goals in Chinese
G The Assignment of Statistical Probabilities for Each Selected Concept under the Corresponding Goals in Chinese
H The Assignment of Statistical Probabilities for Each Selected Concept under the Corresponding Goals in English
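The informational goal inference this thesis performs with belief networks can be approximated, for illustration only, by a naive-Bayes scorer over semantic concepts; the goals, concepts, and probabilities below are invented placeholders, not the thesis's trained networks:

```python
from math import log

# Hypothetical parameters for two informational goals in a stocks domain:
# goal priors and P(concept | goal). Real systems estimate these from
# annotated utterances.
priors = {"QuoteQuery": 0.6, "NewsQuery": 0.4}
likelihood = {
    "QuoteQuery": {"stock_name": 0.9, "price": 0.8, "news": 0.1},
    "NewsQuery":  {"stock_name": 0.7, "price": 0.1, "news": 0.9},
}

def infer_goal(concepts):
    """Pick the goal maximising log P(goal) + sum of log P(concept | goal)."""
    def score(goal):
        s = log(priors[goal])
        for c in concepts:
            # Smooth unseen concepts with a small floor probability.
            s += log(likelihood[goal].get(c, 0.01))
        return s
    return max(priors, key=score)

# Concepts tagged in an utterance drive the inferred goal.
assert infer_goal(["stock_name", "price"]) == "QuoteQuery"
assert infer_goal(["stock_name", "news"]) == "NewsQuery"
```

The naive-Bayes independence assumption is a simplification; the belief networks in the thesis additionally model negative evidence and concept selection, which this sketch omits.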
36

Robust parsing with confluent preorder parser. / CUHK electronic theses & dissertations collection

January 1996 (has links)
by Ho, Kei Shiu Edward.
"June 1996."
Thesis (Ph.D.)--Chinese University of Hong Kong, 1996.
Includes bibliographical references (p. 186-193).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
37

Spectral Methods for Natural Language Processing

Stratos, Karl January 2016 (has links)
Many state-of-the-art results in natural language processing (NLP) are achieved with statistical models involving latent variables. Unfortunately, computational problems associated with such models (for instance, finding optimal parameter values) are typically intractable, forcing practitioners to rely on heuristic methods without strong guarantees. While heuristics are often sufficient for empirical purposes, their de-emphasis on theoretical aspects has certain negative ramifications. First, it can impede the development of the rigorous theoretical understanding that generates new ideas and algorithms. Second, it can lead to black-art solutions that are unreliable and difficult to reproduce. In this thesis, we argue that spectral methods (methods that use singular value decomposition or similar matrix or tensor factorizations) can effectively remedy these negative ramifications. To this end, we develop spectral methods for two unsupervised language processing tasks. The first is learning lexical representations from unannotated text (e.g., hierarchical clustering of a vocabulary). The second is estimating parameters of latent-variable models used in NLP applications (e.g., for unsupervised part-of-speech tagging). We show that our spectral algorithms have the following advantages over previous methods:
1. They provide a new theoretical framework that is amenable to rigorous analysis; in particular, they are shown to be statistically consistent.
2. They are simple to implement, efficient, and scalable to large amounts of data, and they yield results that are competitive with the state of the art.
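The core spectral recipe for lexical representations (factorise a word-context count matrix and keep the top singular directions) can be sketched as follows; the counts are a toy illustration, not the thesis's algorithms or guarantees:

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows: words, cols: contexts).
words = ["cat", "dog", "car", "truck"]
counts = np.array([
    [8, 7, 0, 1],   # cat
    [7, 8, 1, 0],   # dog
    [0, 1, 8, 7],   # car
    [1, 0, 7, 8],   # truck
], dtype=float)

# A low-rank SVD yields spectral word embeddings: rows with similar
# context distributions map to nearby points in the reduced space.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :2] * s[:2]          # rank-2 representation

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

e = dict(zip(words, embeddings))
# The animal words cluster together, the vehicle words cluster together.
assert cos(e["cat"], e["dog"]) > cos(e["cat"], e["truck"])
```

Such embeddings can then feed hierarchical clustering of the vocabulary, one of the two tasks the abstract mentions.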
38

Data-Driven Solutions to Bottlenecks in Natural Language Generation

Biran, Or January 2016 (has links)
Concept-to-text generation suffers from what can be called generation bottlenecks: aspects of the generated text which should change for different subject domains, and which are usually hard to obtain or require manual work. Examples include domain-specific content, a type system, a dictionary, discourse style and lexical style. These bottlenecks have stifled attempts to create generation systems that are generic, or that at least apply to a wide range of domains in non-trivial applications. This thesis comprises two parts. In the first, we propose data-driven solutions that automate obtaining the information and models required to solve some of these bottlenecks. Specifically, we present an approach to mining domain-specific paraphrasal templates from a simple text corpus; an approach to extracting a domain-specific taxonomic thesaurus from Wikipedia; and a novel document planning model which determines both ordering and discourse relations, and which can be extracted from a domain corpus. We evaluate each solution individually, independently of its ultimate use in generation, and show significant improvements in each. In the second part of the thesis, we describe a framework for creating generation systems that rely on these solutions, as well as on hybrid concept-to-text and text-to-text generation, and that can be automatically adapted to any domain using only a domain-specific corpus. We illustrate the breadth of applications this framework supports with three examples: biography generation and company description generation, which we use to evaluate the framework itself and the contribution of our solutions; and justification of machine learning predictions, a novel application which we evaluate in a task-based study to show its importance to users.
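The paraphrasal-template idea can be sketched with a trivial slot-filling realiser; the templates and message below are invented examples, whereas the thesis mines such templates automatically from a domain corpus:

```python
# Hypothetical paraphrasal templates for one message type; slots are
# named so that any template with matching slots can realise the message.
templates = [
    "<company> was founded in <year> by <founder>.",
    "Founded in <year>, <company> was started by <founder>.",
]
message = {"company": "Acme", "year": "1999", "founder": "J. Doe"}

def realise(template, message):
    """Fill each <slot> in the template with the message's value for it."""
    out = template
    for slot, value in message.items():
        out = out.replace(f"<{slot}>", value)
    return out

# Multiple templates for the same message give the generator
# paraphrase variety without hand-written domain rules.
variants = [realise(t, message) for t in templates]
```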
39

Apply syntactic features in a maximum entropy framework for English and Chinese reading comprehension. / CUHK electronic theses & dissertations collection

January 2008 (has links)
Automatic reading comprehension (RC) systems integrate various natural language processing (NLP) technologies to analyze a given passage and generate or extract answers in response to questions about the passage. Previous work applied many NLP technologies, including shallow syntactic analyses (e.g. base noun phrases), semantic analyses (e.g. named entities) and discourse analyses (e.g. pronoun referents), in the bag-of-words (BOW) matching approach. This thesis proposes a novel RC approach that integrates a set of NLP technologies in a maximum entropy (ME) framework to estimate the probability that each candidate sentence answers the question. In contrast to previous RC approaches, which are English-only, the presented approach is the first for both English and Chinese, the two languages used by the most people in the world. To support the evaluation of the bilingual RC systems, a parallel English and Chinese corpus is also designed and developed, with annotations deemed relevant to the RC task. In addition, useful NLP technologies are explored from a new perspective: referring to the pedagogical guidelines used with human readers, reading skills are summarized and mapped to various NLP technologies. Practical NLP technologies, categorized as shallow syntactic analyses (i.e. part-of-speech tags, voices and tenses) and deep syntactic analyses (i.e. syntactic parse trees and dependency parse trees), are then selected for integration. The proposed approach is evaluated on an English corpus, namely Remedia, and on our bilingual corpus. The experimental results show that our approach significantly improves the RC results on both English and Chinese corpora.
Xu, Kui.
Adviser: Helen Mei-Ling Meng.
Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3618.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 132-141).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
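The ME-style ranking of candidate answer sentences can be sketched, under heavy simplification, as a logistic scorer over features of a (question, sentence) pair; the weights and feature set here are invented, and the syntactic features that do the real work in the thesis would come from taggers and parsers:

```python
from math import exp

# Hypothetical feature weights, standing in for parameters learned by
# maximum-entropy training on annotated question-answer pairs.
weights = {"word_overlap": 1.2, "pos_match": 0.8, "dependency_match": 1.5}

def features(question, sentence):
    """Extract features for a (question, candidate sentence) pair."""
    q, s = set(question.split()), set(sentence.split())
    return {
        "word_overlap": len(q & s) / max(len(q), 1),
        "pos_match": 0.0,          # placeholder for tagger-derived features
        "dependency_match": 0.0,   # placeholder for parser-derived features
    }

def score(question, sentence):
    """Probability-like score from a weighted feature sum (logistic link)."""
    z = sum(weights[f] * v for f, v in features(question, sentence).items())
    return 1.0 / (1.0 + exp(-z))

question = "what did the dog chase"
candidates = ["the dog chased the cat", "rain fell all day"]
best = max(candidates, key=lambda s: score(question, s))
```

With only the word-overlap feature active this degenerates to BOW matching; the thesis's point is that adding shallow and deep syntactic features to the same framework improves on that baseline.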
40

A natural language based indexing technique for Chinese information retrieval.

January 1997 (has links)
Pang Chun Kiu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 101-107).

Contents:
1 Introduction
   1.1 Chinese Indexing using Noun Phrases
   1.2 Objectives
   1.3 An Overview of the Thesis
2 Background
   2.1 Technology Influences on Information Retrieval
   2.2 Related Work
      2.2.1 Statistical/Keyword Approaches
      2.2.2 Syntactical Approaches
      2.2.3 Semantic Approaches
      2.2.4 Noun Phrases Approach
      2.2.5 Chinese Information Retrieval
   2.3 Our Approach
3 Chinese Noun Phrases
   3.1 Different Types of Chinese Noun Phrases
   3.2 Ambiguous Noun Phrases
      3.2.1 Ambiguous English Noun Phrases
      3.2.2 Ambiguous Chinese Noun Phrases
      3.2.3 Statistical Data on the Three NPs
4 Index Extraction from De-de Conjunctive NP
   4.1 Word Segmentation
   4.2 Part-of-speech Tagging
   4.3 Noun Phrase Extraction
   4.4 The Chinese Noun Phrase Partial Parser
   4.5 Handling Parsing Ambiguity
   4.6 Index Building Strategy
   4.7 The Cross-set Generation Rules
   4.8 Example 1: Indexing De-de NP
   4.9 Example 2: Indexing Conjunctive NP
   4.10 Experimental Results and Discussion
5 Indexing Compound Nouns
   5.1 Previous Research on Compound Nouns
   5.2 Indexing Two-term Compound Nouns
      5.2.1 About the Thesaurus 《同義詞詞林》 (Cilin)
   5.3 Indexing Compound Nouns of Three or More Terms
   5.4 Corpus Learning Approach
      5.4.1 An Example
      5.4.2 Experimental Setup
      5.4.3 An Experiment Using the Third Level of the Cilin
      5.4.4 An Experiment Using the Second Level of the Cilin
   5.5 Contextual Approach
      5.5.1 The Algorithm
      5.5.2 An Illustrative Example
      5.5.3 Experiments on Compound Nouns
      5.5.4 Experiment I: Word Distance Based Extraction
      5.5.5 Experiment II: Semantic Class Based Extraction
      5.5.6 Experiment III: On Different Boundaries
      5.5.7 The Final Algorithm
      5.5.8 Experiments on Other Compounds
      5.5.9 Discussion
6 Overall Effectiveness
   6.1 Illustrative Example for the Integrated Algorithm
   6.2 Experimental Setup
   6.3 Experimental Results & Discussion
7 Conclusion
   7.1 Summary
   7.2 Contributions
   7.3 Future Directions
      7.3.1 Word-sense Determination
      7.3.2 Hybrid Approach for Compound Noun Indexing
A Cross-set Generation Rules
B Tag Set by Tsinghua University
C Noun Phrases Test Set
D Compound Nouns Test Set
   D.1 Three-term Compound Nouns
      D.1.1 NVN
      D.1.2 Other Three-term Compound Nouns
   D.2 Four-term Compound Nouns
   D.3 Five-term and Six-term Compound Nouns
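The pipeline this thesis builds on (segment, tag, then extract noun-phrase patterns as index terms) can be sketched for a toy English example; a real system would use a Chinese word segmenter and tagger, and the single ADJ*-N+ pattern here is an invented simplification of the partial parser:

```python
import re

# Pre-tagged tokens as (word, tag) pairs; for Chinese these would come
# from a segmenter and part-of-speech tagger.
tagged = [("red", "ADJ"), ("wine", "N"), ("sauce", "N"),
          ("simmers", "V"), ("gently", "ADV")]

def extract_nps(tagged):
    """Extract maximal ADJ* N+ sequences as candidate index terms."""
    # Encode the tag sequence as one character per token, then pattern-match.
    tags = "".join("A" if t == "ADJ" else "N" if t == "N" else "x"
                   for _, t in tagged)
    phrases = []
    for m in re.finditer(r"A*N+", tags):
        phrases.append(" ".join(w for w, _ in tagged[m.start():m.end()]))
    return phrases

# The compound "red wine sauce" becomes a single index term instead of
# three unrelated keywords.
assert extract_nps(tagged) == ["red wine sauce"]
```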
