21

The production of prosodic focus and contour in dialogue

Youd, Nicholas John January 1992 (has links)
Computer programs designed to converse with humans in natural language provide a framework against which to test supra-sentential theories of language production and interpretation. This thesis seeks to flesh out, in terms of a computer model, two basic assumptions concerning prosody: that speakers use intonation to convey intention, or attitude, and that prosodic prominence serves to convey conceptual prominence. A model of an information-providing agent is proposed, based on an analysis of a corpus of spontaneous dialogues. This uses an architecture of communicating processes, which perform interpretation, application-specific planning, repair, and the production of output. Dialogue acts are then defined as feature bundles corresponding to significant events. A corpus of read dialogues is analysed in terms of these features, and using conventional intonational labelling. Correlations between the two are examined. Prosodic prominence is examined at three levels. At the level of surface encoding, re-use of substrings and structural parallelism can reduce processing for the speaker and the listener. At the level of conceptual planning, similar benefits exist, given that speakers and listeners assume a common discourse model wherever possible. At these levels use is made of a short-term buffer of recent forms. A speaker may additionally use contrastive prominence to draw the listener's attention to disparities. Finally, at the level of intentions, a speaker may wish to highlight certain information, regardless of accessibility. Prosodic focus is represented relationally, rather than via a simple binary-valued feature. This has the advantage of facilitating the mapping between levels; it also renders straightforward the notion of focus as the product of a number of potentially conflicting influences. Those parts of the theory concerned with discourse representation, language generation, and prosodic focus have been implemented as part of the Sundial dialogue system. In this system, discoursal and pragmatic decisions affecting prosody are converted to annotations on a text string, for realisation by a rule-based synthesizer.
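The abstract leaves the Sundial annotation format unspecified; purely as an illustration of the idea that prosodic focus is relational (a ranking shaped by several potentially conflicting influences) and is realised as annotations on a text string, here is a minimal Python sketch. The feature names, weights, and markup are hypothetical, not Sundial's.

```python
# Minimal sketch of relational prosodic focus: each word carries a focus
# score combining several potentially conflicting influences (newness,
# contrast, speaker intention), and the relative ranking -- not a binary
# feature -- decides which words receive accent annotations.
# All markup and weights here are hypothetical illustrations.

def focus_scores(words):
    """Combine simple influences into one relational focus score per word."""
    scores = []
    for w in words:
        score = 0.0
        if w["new"]:          # not in the recent discourse model
            score += 1.0
        if w["contrastive"]:  # speaker draws attention to a disparity
            score += 1.5
        if w["intended"]:     # speaker highlights it regardless of givenness
            score += 1.0
        scores.append(score)
    return scores

def annotate(words, accent_mark="*"):
    """Emit a text string with accent annotations on the most focused words."""
    scores = focus_scores(words)
    threshold = max(scores) if scores else 0.0
    out = []
    for w, s in zip(words, scores):
        token = w["form"]
        # Accent only the maximally focused word(s) relative to the utterance.
        out.append(accent_mark + token if s == threshold and s > 0 else token)
    return " ".join(out)

utterance = [
    {"form": "flights", "new": False, "contrastive": False, "intended": False},
    {"form": "to",      "new": False, "contrastive": False, "intended": False},
    {"form": "Paris",   "new": True,  "contrastive": True,  "intended": False},
]
print(annotate(utterance))  # flights to *Paris
```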
22

Driving semantics for a limited domain

Palmer, Martha Stone January 1985 (has links)
No description available.
23

Planning multisentential English text using communicative acts

Maybury, Mark Thomas January 1991 (has links)
The goal of this research is to develop explanation presentation mechanisms for knowledge based systems which enable them to define domain terminology and concepts, narrate events, elucidate plans, processes, or propositions and argue to support a claim or advocate action. This requires the development of devices which select, structure, order and then linguistically realize explanation content as coherent and cohesive English text. With the goal of identifying generic explanation presentation strategies, a wide range of naturally occurring texts were analyzed with respect to their communicative structure, function, content and intended effects on the reader. This motivated an integrated theory of communicative acts which characterizes text at the level of rhetorical acts (e.g., describe, define, narrate), illocutionary acts (e.g., inform, request), and locutionary acts (e.g., ask, command). Taken as a whole, the identified communicative acts characterize the structure, content and intended effects of four types of text: description, narration, exposition, argument. These text types have distinct effects such as getting the reader to know about entities, to know about events, to understand plans, processes, or propositions, or to believe propositions or want to perform actions. In addition to identifying the communicative function and effect of text at multiple levels of abstraction, this dissertation details a tripartite theory of focus of attention (discourse focus, temporal focus, and spatial focus) which constrains the planning and linguistic realization of text. To test the integrated theory of communicative acts and tripartite theory of focus of attention, a text generation system TEXPLAN (Textual EXplanation PLANner) was implemented that plans and linguistically realizes multisentential and multiparagraph explanations from knowledge based systems. The communicative acts identified during text analysis were formalized as over sixty compositional and (in some cases) recursive plan operators in the library of a hierarchical planner. Discourse, temporal, and spatial focus models were implemented to track and use attentional information to guide the organization and realization of text. Because the plan operators distinguish between the communicative function (e.g., argue for a proposition) and the expected effect (e.g., the reader believes the proposition) of communicative acts, the system is able to construct a discourse model of the structure and function of its textual responses as well as a user model of the expected effects of its responses on the reader's knowledge, beliefs, and desires. The system uses both the discourse model and user model to guide subsequent utterances. To test its generality, the system was interfaced to a variety of domain applications including a neuropsychological diagnosis system, a mission planning system, and a knowledge based mission simulator. The system produces descriptions, narrations, expositions, and arguments from these applications, thus exhibiting a broader range of rhetorical coverage than previous text generation systems.
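TEXPLAN's operator syntax is not reproduced in the abstract; the sketch below shows one plausible way such a compositional plan operator might be encoded, keeping the communicative function separate from the expected effect on the reader. All field names and the toy expansion routine are assumptions for illustration.

```python
# Hypothetical encoding of a communicative-act plan operator in the spirit
# of TEXPLAN: each operator names its rhetorical function, the constraints
# under which it applies, a decomposition into finer-grained acts, and the
# expected effect on the reader (kept separately so the system can maintain
# both a discourse model and a user model).

DEFINE_ENTITY = {
    "name": "define",                     # rhetorical act
    "function": "describe(entity)",       # communicative function
    "constraints": ["known(class_of(entity))"],
    "decomposition": [                    # finer-grained (illocutionary) acts
        "inform(class_of(entity))",
        "inform(distinguishing_attributes(entity))",
    ],
    "effect": "reader_knows(entity)",     # expected effect on the reader
}

def expand(operator, knowledge):
    """Naive hierarchical expansion: apply an operator if its constraints hold."""
    if all(c in knowledge for c in operator["constraints"]):
        return operator["decomposition"], operator["effect"]
    return [], None

acts, effect = expand(DEFINE_ENTITY, knowledge={"known(class_of(entity))"})
print(acts)    # the subordinate acts to realise as text
print(effect)  # recorded in the user model
```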
24

Analysis of Moving Events Using Tweets

Patil, Supritha Basavaraj 02 July 2019 (has links)
The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attack, solar eclipse, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would be beneficial to researchers if we could perform a spatial-temporal analysis to test some hypotheses and to find any trends that tweets reveal for such events. I apply an existing algorithm to detect locations from tweets, modifying it to work better with the types of datasets I work with. I use the time captured in tweets and also identify the tense of the sentences in tweets to perform the temporal analysis. I build a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study whether tweets can be a relevant source of information for understanding the movement of the event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results from the analysis, I noted that Twitter can be a reliable source for identifying places affected by moving events almost immediately. The locations obtained are at a more detailed level than in newswires. We can also identify, by date, the time that an event affected a particular region. / Master of Science / News now travels faster on social media than through news channels. Information from social media can help retrieve minute details that might not be emphasized in news. People tend to describe their actions or sentiments in tweets. I aim to study whether such collections of tweets are dependable sources for identifying the paths of moving events. In events like hurricanes, using Twitter can help in analyzing people’s reactions to such moving events. These may include actions such as dislocation or emotions during different phases of the event. The results obtained in the experiments concur with the actual path of the events with respect to the regions affected and time. The number of affected locations identified is significantly greater than in newswires.
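The thesis's rule-based tense model is not spelled out in the abstract; a minimal sketch of the general idea, using NLTK part-of-speech tags (an assumption about tooling, with illustrative rules), might look like this:

```python
# Minimal rule-based tense sketch for a tweet, using Penn Treebank POS tags.
# This is an illustrative assumption, not the thesis's actual rule set.
import nltk

# One-time downloads (uncomment if missing):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

def tweet_tense(text):
    tokens = nltk.word_tokenize(text)
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    lowered = [t.lower() for t in tokens]
    if any(w in lowered for w in ("will", "shall", "gonna")):
        return "future"          # modal future, e.g. "will reach"
    if any(t in ("VBD", "VBN") for t in tags):
        return "past"            # simple past / past participle
    if any(t in ("VBZ", "VBP", "VBG") for t in tags):
        return "present"         # present / progressive
    return "unknown"

print(tweet_tense("The eclipse will reach Nashville at 1:27 pm"))              # future
print(tweet_tense("Hurricane Irma destroyed several homes in the Keys today")) # past
```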
25

Personality and alignment processes in dialogue : towards a lexically-based unified model

Brockmann, Carsten January 2009 (has links)
This thesis explores approaches to modelling individual differences in language use. The differences under consideration fall into two broad categories: Variation of the personality projected through language, and modelling of language alignment behaviour between dialogue partners. In a way, these two aspects oppose each other – language related to varying personalities should be recognisably different, while aligning speakers agree on common language during a dialogue. The central hypothesis is that such variation can be captured and produced with restricted computational means. Results from research on personality psychology and psycholinguistics are transformed into a series of lexically-based Affective Language Production Models (ALPMs) which are parameterisable for personality and alignment. The models are then explored by varying the parameters and observing the language they generate. ALPM-1 and ALPM-2 re-generate dialogues from existing utterances which are ranked and filtered according to manually selected linguistic and psycholinguistic features that were found to be related to personality. ALPM-3 is based on true overgeneration of paraphrases from semantic representations using the OPENCCG framework for Combinatory Categorial Grammar (CCG), in combination with corpus-based ranking and filtering by way of n-gram language models. Personality effects are achieved through language models built from the language of speakers of known personality. In ALPM-4, alignment is captured via a cache language model that remembers the previous utterance and thus influences the choice of the next. This model provides a unified treatment of personality and alignment processes in dialogue. In order to evaluate the ALPMs, dialogues between computer characters were generated and presented to human judges who were asked to assess the characters’ personality. In further internal simulations, cache language models were used to reproduce results of psycholinguistic priming studies. The experiments showed that the models are capable of producing natural language dialogue which exhibits human-like personality and alignment effects.
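The ALPM-4 cache model is described only at a high level; the following is a minimal unigram sketch of the general cache-language-model idea, interpolating a background model with counts from the partner's previous utterance. The interpolation weight and toy counts are illustrative assumptions.

```python
# Minimal unigram cache language model sketch: the probability of a word is
# an interpolation of a background model and a cache filled from the previous
# utterance, so recently used words become more likely (a crude stand-in for
# the alignment behaviour modelled in ALPM-4). Weights are illustrative.
from collections import Counter

class CacheLM:
    def __init__(self, background_counts, lam=0.7):
        self.bg = Counter(background_counts)
        self.bg_total = sum(self.bg.values())
        self.cache = Counter()
        self.lam = lam  # weight on the background model

    def observe(self, utterance):
        """Fill the cache with the partner's previous utterance."""
        self.cache = Counter(utterance.lower().split())

    def prob(self, word):
        p_bg = self.bg[word] / self.bg_total if self.bg_total else 0.0
        cache_total = sum(self.cache.values())
        p_cache = self.cache[word] / cache_total if cache_total else 0.0
        return self.lam * p_bg + (1 - self.lam) * p_cache

lm = CacheLM({"sofa": 2, "couch": 8, "the": 50})
print(round(lm.prob("sofa"), 3))       # background only
lm.observe("I really like that sofa")  # partner said "sofa"
print(round(lm.prob("sofa"), 3))       # boosted by the cache
```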
26

A System for Natural Language Unmarked Clausal Transformations in Text-to-Text Applications

Miller, Daniel 01 June 2009 (has links)
A system is proposed which separates clauses from complex sentences into simpler stand-alone sentences. This is useful as an initial step on raw text, where the resulting processed text may be fed into text-to-text applications such as Automatic Summarization, Question Answering, and Machine Translation, in which complex sentences are difficult to process. Grammatical natural language transformations provide a possible method of simplifying complex sentences to enhance the results of text-to-text applications. Using shallow parsing, this system improves on existing systems in identifying and separating marked and unmarked embedded clauses in complex sentence structure, resulting in syntactically simplified source text for further processing.
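The shallow-parsing pipeline itself is not given in the abstract; as a rough illustration of turning an embedded clause into a stand-alone sentence, the sketch below handles only the simplest, explicitly marked coordinated case and substitutes a spaCy dependency parse for the shallow parser, purely to keep the example short.

```python
# Rough illustration of separating a coordinated clause into a stand-alone
# sentence. The thesis relies on shallow parsing; this sketch substitutes a
# spaCy dependency parse purely for brevity, and handles only the simplest
# (explicitly marked) case.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def split_coordinated_clauses(sentence):
    doc = nlp(sentence)
    root = next(t for t in doc if t.dep_ == "ROOT")
    conj_verbs = [t for t in root.children if t.dep_ == "conj" and t.pos_ == "VERB"]
    if not conj_verbs:
        return [sentence]
    second = sorted(conj_verbs[0].subtree, key=lambda t: t.i)
    second_idx = {t.i for t in second}
    first = [t.text for t in doc
             if t.i not in second_idx and t.dep_ != "cc" and not t.is_punct]
    second_text = " ".join(t.text for t in second if not t.is_punct)
    return [" ".join(first) + ".", second_text[0].upper() + second_text[1:] + "."]

print(split_coordinated_clauses(
    "The storm damaged the bridge and the county closed the road."))
# expected: ['The storm damaged the bridge.', 'The county closed the road.']
```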
27

Examination of Gender Bias in News Articles

Damin Zhang (11814182) 19 December 2021 (has links)
Reading news articles from online sources has become a major way of obtaining information for many people. Authors of news articles can introduce their own biases, either unintentionally or intentionally, by using or choosing to use different words to describe otherwise neutral and factual information. Such intentional word choices can create conflicts among different social groups, reflecting explicit and implicit biases. Any type of bias within the text can affect the reader’s view of the information. One type of bias in natural language is gender bias, which has been discovered in many Natural Language Processing (NLP) models and is largely attributed to implicit biases in the training text corpora. Analyzing gender bias or stereotypes in such large corpora is a hard task. Previous methods of bias detection were applied to short text like tweets and to manually built datasets, but little work has been done on long text like news articles in large corpora. Simply detecting bias in annotated text does not help to understand how it was generated and reproduced. Instead, we used structural topic modeling on a large unlabelled corpus of news articles, incorporating qualitative results and quantitative analysis, to examine how gender bias was generated and reproduced. This research extends prior knowledge of bias detection and proposes a method for understanding gender bias in real-world settings. We found that author gender correlates with topic-gender prevalence, and that the skewed media-gender distribution assists in understanding gender bias within news articles.
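Structural topic modeling is usually run with the R stm package; as a hedged stand-in, the sketch below uses gensim's plain LDA on a toy corpus and then compares average topic prevalence by author gender. The documents, gender labels, and topic count are illustrative assumptions, not the thesis's data or model.

```python
# Stand-in sketch: plain LDA (via gensim) instead of structural topic modeling,
# then a comparison of average topic prevalence by author gender. The toy
# documents, gender labels, and topic count are illustrative assumptions only.
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ("f", "senator proposes childcare funding bill education".split()),
    ("m", "quarterback leads team to championship victory".split()),
    ("f", "minister announces education reform childcare".split()),
    ("m", "team wins championship after record season".split()),
]
dictionary = corpora.Dictionary(text for _, text in docs)
corpus = [dictionary.doc2bow(text) for _, text in docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20, random_state=0)

# Average per-topic prevalence for each author gender.
prevalence = {"f": [0.0, 0.0], "m": [0.0, 0.0]}
counts = {"f": 0, "m": 0}
for (gender, _), bow in zip(docs, corpus):
    for topic_id, weight in lda.get_document_topics(bow, minimum_probability=0.0):
        prevalence[gender][topic_id] += weight
    counts[gender] += 1
for g in prevalence:
    prevalence[g] = [w / counts[g] for w in prevalence[g]]
print(prevalence)  # skew here would hint at topic-gender correlation
```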
28

From distributional to semantic similarity

Curran, James Richard January 2004 (has links)
Lexical-semantic resources, including thesauri and WORDNET, have been successfully incorporated into a wide range of applications in Natural Language Processing. However they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance. Systems that extract thesauri often identify similar words using the distributional hypothesis that similar words appear in similar contexts. This approach involves using corpora to examine the contexts each word appears in and then calculating the similarity between context distributions. Different definitions of context can be used, and I begin by examining how different types of extracted context influence similarity. To be of most benefit these systems must be capable of finding synonyms for rare words. Reliable context counts for rare events can only be extracted from vast collections of text. In this dissertation I describe how to extract contexts from a corpus of over 2 billion words. I describe techniques for processing text on this scale and examine the trade-off between context accuracy, information content and quantity of text analysed. Distributional similarity is at best an approximation to semantic similarity. I develop improved approximations motivated by the intuition that some events in the context distribution are more indicative of meaning than others. For instance, the object-of-verb context wear is far more indicative of a clothing noun than get. However, existing distributional techniques do not effectively utilise this information. The new context-weighted similarity metric I propose in this dissertation significantly outperforms every distributional similarity metric described in the literature. Nearest-neighbour similarity algorithms scale poorly with vocabulary and context vector size. To overcome this problem I introduce a new context-weighted approximation algorithm with bounded complexity in context vector size that significantly reduces the system runtime with only a minor performance penalty. I also describe a parallelized version of the system that runs on a Beowulf cluster for the 2 billion word experiments. To evaluate the context-weighted similarity measure I compare ranked similarity lists against gold-standard resources using precision and recall-based measures from Information Retrieval, since the alternative, application-based evaluation, can often be influenced by distributional as well as semantic similarity. I also perform a detailed analysis of the final results using WORDNET. Finally, I apply my similarity metric to the task of assigning words to WORDNET semantic categories. I demonstrate that this new approach outperforms existing methods and overcomes some of their weaknesses.
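The thesis's context-weighted metric is not reproduced here; the sketch below illustrates the general recipe with positive PMI weighting of object-of-verb contexts (a common choice, used here as an assumption) and cosine comparison of the weighted vectors.

```python
# General distributional-similarity recipe: collect (word, context) counts,
# weight them so informative contexts count more (positive PMI here, as a
# stand-in for the thesis's context-weighted metric), then compare weighted
# vectors with cosine similarity. Counts below are tiny illustrative numbers.
import math
from collections import defaultdict

counts = {  # (noun, object-of-verb context) -> count
    ("shirt", "wear"): 8, ("shirt", "get"): 5, ("shirt", "buy"): 4,
    ("jacket", "wear"): 7, ("jacket", "get"): 6,
    ("idea", "get"): 9, ("idea", "have"): 12,
}

total = sum(counts.values())
word_tot = defaultdict(int)
ctx_tot = defaultdict(int)
for (w, c), n in counts.items():
    word_tot[w] += n
    ctx_tot[c] += n

def pmi_vector(word):
    vec = {}
    for (w, c), n in counts.items():
        if w != word:
            continue
        pmi = math.log((n / total) / ((word_tot[w] / total) * (ctx_tot[c] / total)))
        vec[c] = max(pmi, 0.0)  # positive PMI: keep only informative contexts
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(round(cosine(pmi_vector("shirt"), pmi_vector("jacket")), 3))  # clothing nouns
print(round(cosine(pmi_vector("shirt"), pmi_vector("idea")), 3))    # less similar
```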
29

Improving statistical machine translation with linguistic information

Hoang, Hieu January 2011 (has links)
Statistical machine translation (SMT) should benefit from linguistic information to improve performance, but current state-of-the-art models rely purely on data-driven models. There are several reasons why prior efforts to build linguistically annotated models have failed or not even been attempted. Firstly, the practical implementation often requires too much work to be cost effective. Where ad-hoc implementations have been created, they impose constraints that are too strict to be of general use. Lastly, many linguistically-motivated approaches are language dependent, tackling peculiarities in certain languages that do not apply to other languages. This thesis successfully integrates linguistic information about part-of-speech tags, lemmas and phrase structure to improve MT quality. The major contributions of this thesis are: 1. We enhance the phrase-based model to incorporate linguistic information as additional factors in the word representation. The factored phrase-based model allows us to make use of different types of linguistic information in a systematic way within the predefined framework. We show how this model improves translation by as much as 0.9 BLEU for small German-English training corpora, and 0.2 BLEU for larger corpora. 2. We extend the factored model to the factored template model to focus on improving reordering. We show that by generalising translation with part-of-speech tags, we can improve performance by as much as 1.1 BLEU on a small French-English system. 3. Finally, we switch from the phrase-based model to a syntax-based model with the mixed syntax model. This allows us to transition from the word-level approaches using factors to multiword linguistic information such as syntactic labels and shallow tags. The mixed syntax model uses source language syntactic information to inform translation. We show that the model is able to explain translation better, leading to a 0.8 BLEU improvement over the baseline hierarchical phrase-based model for a small German-English task. Also, since the model requires only labels on continuous source spans and is not dependent on a tree structure, other types of syntactic information can be integrated into the model. We experimented with a shallow parser and saw a gain of 0.5 BLEU for the same dataset. Training with more data, we improve translation by 0.6 BLEU (1.3 BLEU out-of-domain) over the hierarchical baseline. During the development of these three models, we discovered that attempting to rigidly model translation as a linguistic transfer process results in degraded performance. However, by combining the advantages of standard SMT models with linguistically-motivated models, we are able to achieve better translation performance. Our work shows the importance of balancing the specificity of linguistic information with the robustness of simpler models.
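In factored phrase-based models of the Moses family, each surface token is conventionally extended with extra factors joined by "|" (for example word|lemma|POS). The small sketch below builds such factored corpus lines; the token annotations are illustrative, not drawn from the thesis corpora.

```python
# Sketch of preparing factored training data in the Moses-style convention,
# where each token is represented as surface|lemma|POS. The example tokens
# and their annotations are illustrative, not taken from the thesis corpora.
def factored_line(tokens):
    """tokens: list of (surface, lemma, pos) triples -> one factored corpus line."""
    return " ".join("|".join(t) for t in tokens)

german = [("die", "die", "ART"), ("häuser", "haus", "NN"),
          ("sind", "sein", "VAFIN"), ("klein", "klein", "ADJD")]
english = [("the", "the", "DT"), ("houses", "house", "NNS"),
           ("are", "be", "VBP"), ("small", "small", "JJ")]

print(factored_line(german))   # die|die|ART häuser|haus|NN sind|sein|VAFIN klein|klein|ADJD
print(factored_line(english))  # the|the|DT houses|house|NNS are|be|VBP small|small|JJ
```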
30

A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods

Sedghi, Elham 30 March 2017 (has links)
Early detection and treatment of stroke can save lives. Before any procedure is planned, the patient is traditionally subjected to a brain scan such as Magnetic Resonance Imaging (MRI) in order to make sure he/she receives a safe treatment. Before any imaging is performed, the patient is checked into the Emergency Room (ER) and clinicians from the Stroke Rapid Assessment Unit (SRAU) perform an evaluation of the patient's signs and symptoms. The question we address in this thesis is: Can Data Mining (DM) algorithms be employed to reliably predict the occurrence of stroke in a patient based on the signs and symptoms gathered by the clinicians and other staff in the ER or the SRAU? A reliable DM algorithm would be very useful in helping the clinicians make a better decision whether to escalate the case or classify it as a non-life threatening mimic and not put the patient through unnecessary imaging and tests. Such an algorithm would not only make the life of patients and clinicians easier but would also enable the hospitals to cut down on their costs. Most of the signs and symptoms gathered by clinicians in the ER or the SRAU are stored in free-text format in hospital information systems. Using techniques from Natural Language Processing (NLP), the vocabularies of interest can be extracted and classified. A big challenge in this process is that medical narratives are full of misspelled words and clinical abbreviations. It is a well known fact that the quality of data mining results crucially depends on the quality of input data. In this thesis, as a first contribution, we describe a procedure to preprocess the raw data and transform it into clean, well-structured data that can be effectively used by DM learning algorithms. Another contribution of this thesis is producing a set of carefully crafted rules to perform detection of negated meaning in free-text sentences. Using these rules, we were able to get the correct semantics of sentences and provide much more useful datasets to DM learning algorithms. This thesis consists of three main parts. In the first part, we focus on building classifiers to reliably distinguish stroke and Transient Ischemic Attack (TIA) from mimic cases. For this, we used text extracted from the "chief complaint" and "history of patient illness" fields available in the patients' files at the Victoria General Hospital (VGH). In collaboration with stroke specialists, we identified a well-defined set of stroke-related keywords. Next, we created practical tools to accurately assign keywords from this set to each patient. Then, we performed extensive experiments for finding the right learning algorithm to build the best classifier that provides a good balance between sensitivity, specificity, and a host of other quality indicators. In the second part, we focus on the most important mimic case, migraine, and how to effectively distinguish it from stroke or TIA. This is a challenging problem because migraine has many signs and symptoms that are similar to those of stroke or TIA. Another challenge we address is the imbalance that our datasets have with respect to migraine. Namely, the migraine cases are a minority of the overall cases. In order to alleviate this rarity problem, we propose a randomization procedure which is able to drastically improve the classifier quality. Finally, in the third part, we provide a detailed study on data mining algorithms for extracting the most important predictors that can help to detect and prevent posterior circulation stroke.
We compared our findings with the attributes reported by the Heart and Stroke Foundation of Canada, and the features found in our study performed better in terms of accuracy, sensitivity, and ROC. / Graduate
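The negation rules developed in the thesis are not reproduced in the abstract; the sketch below illustrates the general trigger-and-scope idea in the style of NegEx. The trigger list, terminators, window size, and target terms are all illustrative assumptions.

```python
# NegEx-style sketch of negation detection in clinical free text: a negation
# trigger ("denies", "no", "without", ...) negates target terms that fall in a
# short window after it, and the scope closes early at a terminator such as
# "but". Trigger list, term list, and window size are illustrative assumptions,
# not the rule set developed in the thesis.
import re

NEG_TRIGGERS = {"no", "denies", "without", "not"}
TERMINATORS = {"but", "however", "although"}   # close a negation scope early
TERMS = ["weakness", "numbness", "slurred speech", "facial droop", "headache"]
WINDOW = 6  # maximum number of tokens a trigger's scope extends over

def negated_spans(tokens):
    """Return the set of token positions covered by some negation scope."""
    covered = set()
    for i, tok in enumerate(tokens):
        if tok in NEG_TRIGGERS:
            for j in range(i + 1, min(i + 1 + WINDOW, len(tokens))):
                if tokens[j] in TERMINATORS:
                    break
                covered.add(j)
    return covered

def classify_terms(sentence):
    tokens = re.findall(r"[a-z]+", sentence.lower())
    covered = negated_spans(tokens)
    results = {}
    for term in TERMS:
        term_toks = term.split()
        for i in range(len(tokens) - len(term_toks) + 1):
            if tokens[i:i + len(term_toks)] == term_toks:
                negated = all(p in covered for p in range(i, i + len(term_toks)))
                results[term] = "negated" if negated else "affirmed"
    return results

print(classify_terms("Patient denies numbness but reports slurred speech and headache."))
# {'numbness': 'negated', 'slurred speech': 'affirmed', 'headache': 'affirmed'}
```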
