121

Using a rewriting system to model individual writing styles

Lin, Jing January 2012 (has links)
Each individual has a distinctive writing style, but natural language generation systems produce text with much less variety. Is it possible to produce more human-like text from natural language generation systems by mimicking the style of particular authors? We start by analysing the text of real authors. We collect a corpus of texts from a single genre (food recipes), with each text identified with its author, and summarise a variety of writing features in these texts. Each author's writing style is the combination of a set of features. Analysis of the writing features shows that not only does each individual author write differently, but the differences are consistent over the whole of their corpus. Hence we conclude that authors maintain a consistent style consisting of a variety of different features. When we discuss notions such as the style and meaning of texts, we are referring to the reaction that readers have to them. It is important, therefore, in the field of computational linguistics to experiment by showing texts to people and assessing their interpretation of them. In our research we move from simple discussion and statistical analysis of the properties of text and NLG systems to experiments that verify the actual impact that lexical preference has on real readers. Through experiments that require participants to follow a recipe and prepare food, we conclude that it is possible to alter the lexicon of a recipe without altering the actions performed by the cook, hence that word choice is an aspect of style rather than semantics; and also that word choice is one of the writing features employed by readers in identifying the author of a text. Among all writing features, individual lexical preference is very important both for analysing and generating texts, so we choose individual lexical choice as our principal topic of research. Using a modified version of distributional similarity (DS) helps us to choose words used by individual authors without the limitations of many other solutions, such as a pre-built thesaurus. We present an algorithm for analysis and rewriting, and assess the results. Based on the results we propose some further improvements.
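Since the abstract singles out distributional similarity (DS) for modelling individual lexical choice, here is a minimal sketch of one common DS instantiation (cosine similarity over context-count vectors) on a toy recipe corpus; the window size, the cosine measure, and the example sentences are assumptions, not the thesis's modified DS measure:

```python
from collections import Counter
import math

def context_vector(word, sentences, window=2):
    """Count words co-occurring with `word` within a +/- window."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sentences = [
    "chop the onion finely".split(),
    "dice the onion finely".split(),
    "chop the carrot coarsely".split(),
]
# High similarity suggests "dice" could substitute for "chop" for an
# author who prefers it -- the kind of lexical swap the thesis studies.
print(cosine(context_vector("chop", sentences),
             context_vector("dice", sentences)))
```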
122

Domain independent generation from RDF instance data

Sun, Xiantang January 2008 (has links)
The next generation of the web, the Semantic Web, integrates distributed web resources from various domains by allowing data (instantial and ontological data) to be shared and reused across application, enterprise, and community boundaries, based on the Resource Description Framework (RDF). Nevertheless, RDF was not developed for casual users who are unfamiliar with it but are interested in the data it represents. Natural language generation (NLG) may be a possible solution for bridging the gap between casual users and RDF data, but the cost of separately applying fine-grained NLG techniques to every domain in the Semantic Web would be extremely high, and hence unrealistic.
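As a rough sketch of the per-domain baseline whose cost this abstract argues against, the snippet below verbalizes RDF-style triples with hand-written, per-predicate templates; the triples, prefixes, and templates are invented for illustration, and a domain-independent approach is precisely an alternative to hand-writing such templates for every domain:

```python
# Toy RDF-style triples (subject, predicate, object), invented for
# illustration; a real system would load these from an RDF store.
triples = [
    ("ex:Turing", "ex:bornIn", "ex:London"),
    ("ex:Turing", "ex:fieldOfWork", "ex:Logic"),
]

# One English template per predicate -- exactly the per-domain,
# hand-crafted effort that does not scale across the Semantic Web.
templates = {
    "ex:bornIn": "{s} was born in {o}.",
    "ex:fieldOfWork": "{s} worked in the field of {o}.",
}

def local_name(uri):
    """Strip the namespace prefix to get a readable surface form."""
    return uri.split(":")[-1]

for s, p, o in triples:
    tpl = templates.get(p, "{s} {p} {o}.")  # crude fallback verbalization
    print(tpl.format(s=local_name(s), p=local_name(p), o=local_name(o)))
```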
123

An Entropy Estimate of Written Language and Twitter Language: A Comparison between English and Swedish

Juhlin, Sanna January 2017 (has links)
The purpose of this study is to estimate and compare the entropy and redundancy of written English and Swedish. We also investigate and compare the entropy and redundancy of Twitter language. This is done by extracting n consecutive characters, called n-grams, and calculating their frequencies. Because the amount of text is finite while entropy is defined for text length tending towards infinity, no precise values are obtained. However, we do obtain results for n = 1, ..., 6, and they show that written Swedish has higher entropy than written English and that the redundancy is lower for Swedish. When comparing Twitter with the standard languages, we find that for Twitter the entropy is higher and the redundancy is lower.
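As a rough illustration of the method described in this abstract, the sketch below estimates per-character block entropy from n-gram frequencies, plus a redundancy figure relative to the observed alphabet; the toy text and this particular estimator are assumptions, not the thesis's exact procedure:

```python
from collections import Counter
import math

def block_entropy_per_char(text, n):
    """Shannon entropy of n-character blocks, divided by n, estimated
    from observed n-gram frequencies (a finite-text approximation)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values()) / n

text = "the quick brown fox jumps over the lazy dog " * 50
for n in range(1, 7):
    print(n, round(block_entropy_per_char(text, n), 3))

# Redundancy relative to the maximum entropy of the observed alphabet.
alphabet = set(text)
H1 = block_entropy_per_char(text, 1)
print("redundancy:", round(1 - H1 / math.log2(len(alphabet)), 3))
```

Note the estimates shrink as n grows on a repetitive toy text; with finite real corpora the same downward bias is why no precise limiting values are obtained.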
124

Automatic Tagging of Communication Data

Hoyt, Matthew Ray 08 1900 (has links)
Globally distributed software teams are widespread throughout industry. But finding reliable methods that can properly assess a team's activities is a real challenge. Methods such as surveys and manual coding of activities are too time consuming and are often unreliable. Recent advances in information retrieval and linguistics, however, suggest that automated and/or semi-automated text classification algorithms could be an effective way of finding differences in the communication patterns among individuals and groups. Communication among group members is frequent and generates a significant amount of data. Thus having a web-based tool that can automatically analyze the communication patterns among global software teams could lead to a better understanding of group performance. The goal of this thesis, therefore, is to compare automatic and semi-automatic measures of communication and evaluate their effectiveness in classifying different types of group activities that occur within a global software development project. In order to achieve this goal, we developed a web-based component that can be used to help clean and classify communication activities. The component was then used to compare different automated text classification techniques on various group activities to determine their effectiveness in correctly classifying data from a global software development team project.
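As a hedged sketch of the kind of automated text classification technique the thesis compares, the snippet below trains a TF-IDF plus logistic regression pipeline on a few invented communication messages; the labels, the messages, and this particular pipeline are illustrative assumptions, not the thesis's evaluated configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented team-communication messages with activity labels.
messages = [
    "fixed the null pointer bug in the login module",
    "let's schedule the sprint planning for Monday",
    "refactored the database layer, please review",
    "reminder: status meeting moved to 3pm",
]
labels = ["technical", "coordination", "technical", "coordination"]

# Bag-of-n-grams features feeding a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(messages, labels)
print(clf.predict(["please review the patch for the parser"]))
```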
125

Content-Based Geolocation Prediction of Canadian Twitter Users and Their Tweets

Metin, Ali Mert 13 August 2019 (has links)
The last decade witnessed the rise of online social networks, especially Twitter. Today, Twitter is a giant social platform with over 250 million users, who produce massive amounts of data every day. This creates many research opportunities, specifically for Natural Language Processing (NLP), in which text is utilized to extract information that could be used in many applications. One problem NLP might help solve is geolocation inference, or geolocation detection, from online social networks. Detecting the location of Twitter users based on the text of their tweets is useful, since not many users publicly declare their locations or geotag their tweets. Location information is crucial for a variety of applications such as event detection, disease and illness tracking, and user profiling. These tasks are not trivial, because online content is often noisy; it includes misspellings, incomplete words or phrases, idiomatic expressions, abbreviations, acronyms, and Twitter-specific language. In this work, we attempted to detect the location of Canadian users, and of tweets sent from Canada, at the metropolitan-area and province levels; to the best of our knowledge, this had not been done before. In order to do this, we collected two different datasets and applied a variety of machine learning methods, including deep learning. As a novel approach, we also attempted to geolocate users based on their social graph (i.e., a user's friends and followers).
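As a toy illustration of the social-graph idea mentioned at the end of the abstract, the sketch below infers a user's location by majority vote over friends with known locations; the data, the names, and the simple vote rule are invented for illustration and are not the thesis's method:

```python
from collections import Counter

# Users whose locations are publicly declared (invented data).
known_locations = {"alice": "Toronto", "bob": "Toronto", "carol": "Montreal"}
# Social graph: user -> list of friends/followers.
friends = {"dave": ["alice", "bob", "carol"]}

def infer_location(user):
    """Majority location among a user's friends with known locations."""
    votes = Counter(known_locations[f]
                    for f in friends.get(user, [])
                    if f in known_locations)
    return votes.most_common(1)[0][0] if votes else None

print(infer_location("dave"))  # Toronto
```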
126

Geographic referring expressions: doing geometry with words

Gomes de Oliveira, Rodrigo January 2017 (has links)
No description available.
127

GLR parsing with multiple grammars for natural language queries.

January 2000 (has links)
Luk Po Chui. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 97-100). Abstracts in English and Chinese.

Contents:
1. Introduction (p.1): efficiency and memory; ambiguity; robustness; thesis organization.
2. Background (p.7): context-free grammars; the LR parsing algorithm; the generalized LR parsing algorithm (graph-structured stack, packed shared parse forest); time and space complexity; related work on parsing (GLR*, TINA, PHOENIX).
3. Grammar Partitioning (p.22): motivation; previous work on grammar partitioning; our grammar partitioning approach (definitions and concepts, guidelines for grammar partitioning); an example.
4. Parser Composition (p.35): GLR lattice parsing (lattice with multiple granularity, modifications to the GLR parsing algorithm); parser composition algorithms (composition by cascading, composition with predictive pruning, and their comparison).
5. Experimental Results and Analysis (p.56): experimental corpus; ATIS grammar development; grammar partitioning and parser composition on the ATIS domain; ambiguity handling; semantic interpretation (best path selection, semantic frame generation, post-processing); experiments (grammar coverage, size of parsing table, computational costs, accuracy measures in natural language understanding, summary of results).
6. Conclusions (p.92): thesis summary; thesis contributions; future work (statistical approach to grammar partitioning, probabilistic modelling for best parse selection, robust parsing strategies).
Bibliography (p.97). Appendix A: ATIS-3 Grammar (p.101): English and Chinese ATIS-3 grammar rules.
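The contents above centre on the generalized LR (GLR) algorithm and its graph-structured stack. As an illustration of that core data structure only (not of the thesis's grammar partitioning or parser composition algorithms), this minimal sketch shows how two parse branches reaching the same parser state at the same input position are merged into one shared stack node instead of duplicating the stack:

```python
class GSSNode:
    def __init__(self, state):
        self.state = state
        self.preds = set()   # edges back to previous stack tops

def shift(frontier, tops, state):
    """Push `state` on top of each node in `tops`; nodes created at the
    same input position with the same state are shared via `frontier`."""
    node = frontier.get(state)
    if node is None:
        node = frontier[state] = GSSNode(state)
    node.preds.update(tops)
    return node

frontier = {}                # stack tops created at this input position
a = GSSNode(0)               # initial stack bottom
b = shift(frontier, {a}, 3)  # branch 1 reaches state 3
c = shift(frontier, {a}, 3)  # branch 2 also reaches state 3
print(b is c, len(b.preds))  # True 1 -- the branches share one node
```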
128

A framework for specifying business rules based on logic with a syntax close to natural language

Roettenbacher, Christian Wolfgang January 2017 (has links)
The systematic interaction of software developers with business domain experts, who are usually not software developers themselves, is crucial to software system creation and maintenance, and has surfaced as the big challenge of modern software engineering. Existing frameworks promoting typical programming languages with artificial syntax are suitable for processing by computers but do not cater to domain experts, who are used to documents written in natural language as a means of interaction. Other frameworks that claim to be fully automated, such as those using natural language processing, are too imprecise to handle the typical requirements documents written in heterogeneous flavours of natural language. In this thesis, a framework is proposed that supports the specification of business rules that are, on the one hand, understandable to non-programmers and, on the other hand, semantically founded, which makes them processable by computers. This is achieved by the novel language Adaptive Business Process and Rule Integration Language (APRIL). Specifications in APRIL can be written in a style close to natural language and are thus suitable for humans, which was empirically evaluated with a representative group of test participants. A useful and uncommon feature of APRIL is the ability to define reusable abstract mixfix operators as sentence patterns that can mimic natural language. The semantic underpinning of the mixfix operators is achieved by customizable atomic formulas, allowing APRIL to be tailored to specific domains. Atomic formulas are underpinned by a denotational semantics, which is based on Tempura (an executable subset of Interval Temporal Logic (ITL)) to describe behaviour and on the Object Constraint Language (OCL) to describe invariants and pre- and postconditions. APRIL statements can be used as the basis for automatically generating test code for software systems. An additional aspect of enhancing the quality of specification documents comes with a novel formal method technique (ISEPI), applicable to behavioural business rules, semantically based on Propositional Interval Temporal Logic (PITL) and complying with the newly discovered 2-to-1 property. This work shows how the ISE subset of ISEPI can be used to express complex behavioural business rules in a more concise and understandable way. The evaluation of ISE is done through an example specification taken from the car industry describing system behaviour, using the tools MONA and PITL2MONA. Finally, a methodology is presented that helps to guide a continuous transformation, starting from a purely natural-language business rule specification to the APRIL specification, which can then be transformed into test code. The methodologies, language concepts, algorithms, tools and techniques devised in this work are part of the APRIL framework.
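To make the idea of mixfix sentence patterns concrete, here is a minimal Python sketch in which a pattern with named holes is compiled to a matcher; the pattern syntax, the example rule, and all names are illustrative assumptions and do not reproduce APRIL's actual notation or semantics:

```python
import re

# Hypothetical mixfix-style sentence pattern with named holes; the
# {braces} notation is invented for this sketch, not APRIL syntax.
PATTERN = "every {noun} must be {action} within {days} days"

def pattern_to_regex(pattern):
    """Compile a sentence pattern into a regex with named groups."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal sentence fragment
        else:
            regex += f"(?P<{part}>.+?)"     # named hole
    return re.compile("^" + regex + "$", re.IGNORECASE)

rule = pattern_to_regex(PATTERN)
m = rule.match("Every invoice must be archived within 30 days")
if m:
    print(m.groupdict())
    # {'noun': 'invoice', 'action': 'archived', 'days': '30'}
```

The extracted bindings are where a framework like APRIL would attach semantically founded atomic formulas; this sketch stops at surface matching.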
129

Robust parsing with confluent preorder parser.

January 1996 (has links)
by Ho, Kei Shiu Edward. "June 1996." Thesis (Ph.D.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (p. 186-193). Electronic reproduction: Hong Kong, Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Mode of access: World Wide Web.
130

Spectral Methods for Natural Language Processing

Stratos, Karl January 2016 (has links)
Many state-of-the-art results in natural language processing (NLP) are achieved with statistical models involving latent variables. Unfortunately, computational problems associated with such models (for instance, finding the optimal parameter values) are typically intractable, forcing practitioners to rely on heuristic methods without strong guarantees. While heuristics are often sufficient for empirical purposes, their de-emphasis on theoretical aspects has certain negative ramifications. First, it can impede the development of rigorous theoretical understanding which can generate new ideas and algorithms. Second, it can lead to black art solutions that are unreliable and difficult to reproduce. In this thesis, we argue that spectral methods (that is, methods that use singular value decomposition or other similar matrix or tensor factorization) can effectively remedy these negative ramifications. To this end, we develop spectral methods for two unsupervised language processing tasks. The first task is learning lexical representations from unannotated text (e.g., hierarchical clustering of a vocabulary). The second task is estimating parameters of latent-variable models used in NLP applications (e.g., for unsupervised part-of-speech tagging). We show that our spectral algorithms have the following advantages over previous methods:

1. The algorithms provide a new theoretical framework that is amenable to rigorous analysis. In particular, they are shown to be statistically consistent.
2. The algorithms are simple to implement, efficient, and scalable to large amounts of data. They also yield results that are competitive with the state-of-the-art.
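As a minimal illustration of the spectral idea for the first task (learning lexical representations from unannotated text), the sketch below factors a toy word co-occurrence matrix with SVD and compares two of the resulting low-dimensional word vectors; the corpus, window size, and rank are assumptions, and the thesis's algorithms come with the scaling and consistency guarantees this toy omits:

```python
import numpy as np

# Toy corpus; a real application would use large unannotated text.
corpus = ["the cat sat on the mat", "the dog sat on the rug",
          "a cat and a dog played"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for t in tokens for w in t})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-context co-occurrence counts within a +/-1 window.
M = np.zeros((len(vocab), len(vocab)))
for t in tokens:
    for i, w in enumerate(t):
        for j in range(max(0, i - 1), min(len(t), i + 2)):
            if j != i:
                M[idx[w], idx[t[j]]] += 1.0

# Rank-2 spectral representation from the singular value decomposition.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
embeddings = U[:, :2] * S[:2]

cat, dog = embeddings[idx["cat"]], embeddings[idx["dog"]]
cos = cat @ dog / (np.linalg.norm(cat) * np.linalg.norm(dog))
print(round(float(cos), 3))  # "cat" and "dog" share distributional contexts
```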
