11. Planning multisentential English text using communicative acts. Maybury, Mark Thomas, January 1991.
The goal of this research is to develop explanation presentation mechanisms for knowledge-based systems which enable them to define domain terminology and concepts, narrate events, elucidate plans, processes, or propositions, and argue to support a claim or advocate action. This requires the development of devices which select, structure, order, and then linguistically realize explanation content as coherent and cohesive English text. With the goal of identifying generic explanation presentation strategies, a wide range of naturally occurring texts were analyzed with respect to their communicative structure, function, content, and intended effects on the reader. This motivated an integrated theory of communicative acts which characterizes text at the level of rhetorical acts (e.g., describe, define, narrate), illocutionary acts (e.g., inform, request), and locutionary acts (e.g., ask, command). Taken as a whole, the identified communicative acts characterize the structure, content, and intended effects of four types of text: description, narration, exposition, and argument. These text types have distinct effects such as getting the reader to know about entities, to know about events, to understand plans, processes, or propositions, or to believe propositions or want to perform actions. In addition to identifying the communicative function and effect of text at multiple levels of abstraction, this dissertation details a tripartite theory of focus of attention (discourse focus, temporal focus, and spatial focus) which constrains the planning and linguistic realization of text. To test the integrated theory of communicative acts and the tripartite theory of focus of attention, a text generation system, TEXPLAN (Textual EXplanation PLANner), was implemented that plans and linguistically realizes multisentential and multiparagraph explanations from knowledge-based systems. The communicative acts identified during text analysis were formalized as over sixty compositional and (in some cases) recursive plan operators in the library of a hierarchical planner. Discourse, temporal, and spatial focus models were implemented to track and use attentional information to guide the organization and realization of text. Because the plan operators distinguish between the communicative function (e.g., argue for a proposition) and the expected effect (e.g., the reader believes the proposition) of communicative acts, the system is able to construct a discourse model of the structure and function of its textual responses as well as a user model of the expected effects of its responses on the reader's knowledge, beliefs, and desires. The system uses both the discourse model and the user model to guide subsequent utterances. To test its generality, the system was interfaced to a variety of domain applications, including a neuropsychological diagnosis system, a mission planning system, and a knowledge-based mission simulator. The system produces descriptions, narrations, expositions, and arguments from these applications, thus exhibiting a broader range of rhetorical coverage than previous text generation systems.
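For illustration, the compositional, recursive plan operators the abstract describes can be sketched as data structures expanded by a hierarchical planner. The following Python is a minimal, hypothetical encoding; the operator names, effects, and toy decomposition are invented for this sketch and are not TEXPLAN's actual operator library.

```python
# A minimal sketch of compositional plan operators for communicative acts.
# All names and the toy decomposition are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class PlanOperator:
    act: str                  # communicative act, e.g. "define"
    effect: str               # intended effect on the reader
    decomposition: list = field(default_factory=list)  # subordinate acts

LIBRARY = {
    "define": PlanOperator(
        act="define",
        effect="reader knows the concept",
        decomposition=["identify-class", "describe-attributes", "give-example"],
    ),
    "identify-class": PlanOperator("identify-class", "reader knows the superclass"),
    "describe-attributes": PlanOperator("describe-attributes", "reader knows key attributes"),
    "give-example": PlanOperator("give-example", "reader knows an instance"),
}

def expand(act: str, depth: int = 0) -> None:
    """Hierarchically expand a communicative act into subordinate acts."""
    op = LIBRARY[act]
    print("  " * depth + f"{op.act} -> {op.effect}")
    for sub in op.decomposition:
        expand(sub, depth + 1)

expand("define")
```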
12. Analysis of Moving Events Using Tweets. Patil, Supritha Basavaraj, 02 July 2019.
The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attacks, solar eclipses, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would be beneficial to researchers if we could perform a spatial-temporal analysis to test hypotheses and to find any trends that tweets reveal for such events.
I apply an existing algorithm to detect locations in tweets, modifying it to work better with the types of datasets I work with. To perform the temporal analysis, I use the time captured in tweets and also identify the tense of the sentences in tweets, building a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study whether tweets can be a relevant source of information for understanding the movement of an event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results of the analysis, I noted that Twitter can be a reliable source for identifying places affected by moving events almost immediately. The locations obtained are at a more detailed level than in newswires. We can also identify, by date, when an event affected a particular region. / Master of Science / News now travels faster on social media than through news channels. Information from social media can help retrieve minute details that might not be emphasized in the news. People tend to describe their actions or sentiments in tweets. I aim to study whether such collections of tweets are dependable sources for identifying the paths of moving events. For events like hurricanes, Twitter can help in analyzing people's reactions to such moving events, including actions such as dislocation, or emotions during different phases of the event. The results obtained in the experiments concur with the actual paths of the events with respect to the regions affected and the timing. The frequency of tweets increases during event peaks. The number of affected locations identified is significantly greater than in newswires.
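As a rough illustration of the rule-based tense model described above, the sketch below guesses a tweet's tense from part-of-speech tags. The rules and example tweets are invented and far simpler than the thesis's actual model; it assumes NLTK with the punkt and tagger data installed.

```python
# A toy rule-based tense classifier over POS tags; illustrative only.
import nltk  # assumes punkt and averaged_perceptron_tagger data are installed

def tweet_tense(text: str) -> str:
    tokens = nltk.word_tokenize(text.lower())
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    if "will" in tokens and "MD" in tags:      # "will reach" -> future
        return "future"
    if "VBD" in tags or "VBN" in tags:         # "struck", "has struck" -> past
        return "past"
    if any(t in tags for t in ("VBZ", "VBP", "VBG")):
        return "present"
    return "unknown"

print(tweet_tense("The eclipse will reach Oregon at 10:15 am"))
print(tweet_tense("Hurricane Irma struck Miami this morning"))
```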
13. Examination of Gender Bias in News Articles. Damin Zhang, 19 December 2021.
Reading news articles from online sources has become a major way of obtaining information for many people. Authors of news articles can introduce their own biases, unintentionally or intentionally, by choosing different words to describe otherwise neutral and factual information. Such word choices can create conflicts among social groups, revealing explicit and implicit biases. Any bias within the text can affect the reader's view of the information. One type of bias in natural language is gender bias, which has been discovered in many Natural Language Processing (NLP) models and is largely attributed to implicit biases in the training corpora. Analyzing gender bias or stereotypes in such large corpora is a hard task. Previous bias-detection methods were applied to short texts like tweets and to manually built datasets, but little work has been done on long texts like news articles in large corpora. Simply detecting bias in annotated text does not explain how it is generated and reproduced. Instead, we applied structural topic modeling to a large unlabelled corpus of news articles and combined qualitative and quantitative analysis to examine how gender bias is generated and reproduced. This research extends prior work on bias detection and proposes a method for understanding gender bias in real-world settings. We found that author gender correlates with topic-gender prevalence, and that the skewed media-gender distribution helps explain gender bias within news articles.
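Structural topic modeling, which lets document covariates such as author gender affect topic prevalence, is typically done with R's stm package. As a loose Python stand-in, the sketch below fits plain LDA with gensim and then compares topic prevalence across author-gender groups by hand; the toy documents and labels are invented, and this is not the thesis's pipeline.

```python
# Plain-LDA stand-in for structural topic modeling: fit topics, then
# compare average topic weight per author-gender group.
from gensim import corpora, models

docs = [
    ["senator", "vote", "bill", "economy"],        # toy, pre-tokenized articles
    ["actress", "fashion", "family", "interview"],
    ["coach", "team", "season", "win"],
]
genders = ["m", "f", "m"]                          # toy author-gender labels

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# Average topic weight per gender group ~ topic-gender prevalence.
prevalence = {"m": [0.0, 0.0], "f": [0.0, 0.0]}
for bow, g in zip(corpus, genders):
    for topic_id, weight in lda.get_document_topics(bow, minimum_probability=0.0):
        prevalence[g][topic_id] += weight / genders.count(g)
print(prevalence)
```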
14. From distributional to semantic similarity. Curran, James Richard, January 2004.
Lexical-semantic resources, including thesauri and WORDNET, have been successfully incorporated into a wide range of applications in Natural Language Processing. However, they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias, and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance. Systems that extract thesauri often identify similar words using the distributional hypothesis that similar words appear in similar contexts. This approach involves using corpora to examine the contexts each word appears in and then calculating the similarity between context distributions. Different definitions of context can be used, and I begin by examining how different types of extracted context influence similarity. To be of most benefit, these systems must be capable of finding synonyms for rare words. Reliable context counts for rare events can only be extracted from vast collections of text. In this dissertation I describe how to extract contexts from a corpus of over 2 billion words. I describe techniques for processing text on this scale and examine the trade-off between context accuracy, information content, and quantity of text analysed. Distributional similarity is at best an approximation to semantic similarity. I develop improved approximations motivated by the intuition that some events in the context distribution are more indicative of meaning than others. For instance, the object-of-verb context wear is far more indicative of a clothing noun than get. However, existing distributional techniques do not effectively utilise this information. The new context-weighted similarity metric I propose in this dissertation significantly outperforms every distributional similarity metric described in the literature. Nearest-neighbour similarity algorithms scale poorly with vocabulary and context vector size. To overcome this problem I introduce a new context-weighted approximation algorithm with bounded complexity in context vector size that significantly reduces the system runtime with only a minor performance penalty. I also describe a parallelized version of the system that runs on a Beowulf cluster for the 2 billion word experiments. To evaluate the context-weighted similarity measure I compare ranked similarity lists against gold-standard resources using precision and recall-based measures from Information Retrieval, since the alternative, application-based evaluation, can often be influenced by distributional as well as semantic similarity. I also perform a detailed analysis of the final results using WORDNET. Finally, I apply my similarity metric to the task of assigning words to WORDNET semantic categories. I demonstrate that this new approach outperforms existing methods and overcomes some of their weaknesses.
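To make the context-weighting intuition concrete, here is a small sketch of context-weighted distributional similarity: raw (word, context) counts are reweighted by pointwise mutual information and compared with a weighted Dice measure. This is one plausible instantiation under those assumptions, not necessarily the exact weight/measure combination the dissertation found best.

```python
# Context-weighted distributional similarity: PMI weights + weighted Dice.
import math
from collections import Counter

def pmi_weights(vec: Counter, totals: Counter, n: int) -> dict:
    """Reweight raw (word, context) counts by pointwise mutual information."""
    total_w = sum(vec.values())
    return {
        c: max(0.0, math.log((f / n) / ((total_w / n) * (totals[c] / n))))
        for c, f in vec.items()
    }

def weighted_dice(u: dict, v: dict) -> float:
    """2 * sum of shared weight mass over total weight mass."""
    shared = set(u) & set(v)
    num = 2 * sum(min(u[c], v[c]) for c in shared)
    den = sum(u.values()) + sum(v.values())
    return num / den if den else 0.0

# Toy (context -> count) vectors, e.g. object-of-verb contexts for nouns.
coat = Counter({"wear": 8, "get": 5, "buy": 3})
shirt = Counter({"wear": 6, "get": 7, "iron": 2})
totals = coat + shirt
n = sum(totals.values())

u, v = pmi_weights(coat, totals, n), pmi_weights(shirt, totals, n)
print(f"sim(coat, shirt) = {weighted_dice(u, v):.3f}")
```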
15. A novel stroke prediction model based on clinical natural language processing (NLP) and data mining methods. Sedghi, Elham, 30 March 2017.
Early detection and treatment of stroke can save lives. Before any procedure is planned, the patient is traditionally subjected to a brain scan such as Magnetic Resonance Imaging (MRI) in order to make sure he/she receives a safe treatment. Before any imaging is performed, the patient is checked into the Emergency Room (ER), and clinicians from the Stroke Rapid Assessment Unit (SRAU) perform an evaluation of the patient's signs and symptoms. The question we address in this thesis is: can Data Mining (DM) algorithms be employed to reliably predict the occurrence of stroke in a patient based on the signs and symptoms gathered by the clinicians and other staff in the ER or the SRAU? A reliable DM algorithm would be very useful in helping clinicians decide whether to escalate a case or classify it as a non-life-threatening mimic, sparing the patient unnecessary imaging and tests. Such an algorithm would not only make the lives of patients and clinicians easier but would also enable hospitals to cut costs.
Most of the signs and symptoms gathered by clinicians in the ER or the SRAU are stored in free-text format in hospital information systems. Using techniques from Natural Language Processing (NLP), the vocabularies of interest can be extracted and classified. A big challenge in this process is that medical narratives are full of misspelled words and clinical abbreviations, and it is a well-known fact that the quality of data mining results crucially depends on the quality of the input data. As a first contribution, we describe a procedure to preprocess the raw data and transform it into clean, well-structured data that can be effectively used by DM learning algorithms. Another contribution of this thesis is a set of carefully crafted rules for detecting negated meaning in free-text sentences. Using these rules, we were able to capture the correct semantics of sentences and provide much more useful datasets to DM learning algorithms.
This thesis consists of three main parts. In the first part, we focus on building classifiers to reliably distinguish stroke and Transient Ischemic Attack (TIA) from mimic cases. For this, we used text extracted from the "chief complaint" and "history of patient illness" fields available in the patients' files at the Victoria General Hospital (VGH). In collaboration with stroke specialists, we identified a well-defined set of stroke-related keywords. Next, we created practical tools to accurately assign keywords from this set to each patient. Then, we performed extensive experiments to find the right learning algorithm to build the best classifier, one that provides a good balance between sensitivity, specificity, and a host of other quality indicators. In the second part, we focus on the most important mimic case, migraine, and how to effectively distinguish it from stroke or TIA. This is a challenging problem because migraine has many signs and symptoms similar to those of stroke or TIA. Another challenge we address is the imbalance our datasets have with respect to migraine: the migraine cases are a minority of the overall cases. To alleviate this rarity problem, we propose a randomization procedure which is able to drastically improve classifier quality. Finally, in the third part, we provide a detailed study of data mining algorithms for extracting the most important predictors that can help detect and prevent posterior circulation stroke. We compared our findings with the attributes reported by the Heart and Stroke Foundation of Canada, and the features found in our study performed better in accuracy, sensitivity, and ROC. / Graduate
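The negation rules mentioned above can be illustrated with a minimal NegEx-style sketch: a negation trigger within a short window before a concept negates it, and conjunctions end the scope. The trigger and terminator lists, the 5-token window, and the example sentence are invented for illustration, not the thesis's actual rules.

```python
# Toy rule-based negation detection over clinical free text.
NEG_TRIGGERS = {"no", "not", "denies", "without", "negative"}
TERMINATORS = {"but", "however", "although"}

def negated_concepts(sentence: str, concepts: list) -> set:
    tokens = sentence.lower().split()
    flagged = set()
    for concept in concepts:
        for i, tok in enumerate(tokens):
            if tok != concept.lower():
                continue
            window = tokens[max(0, i - 5):i]          # look back 5 tokens
            for j in range(len(window) - 1, -1, -1):  # cut scope at conjunction
                if window[j] in TERMINATORS:
                    window = window[j + 1:]
                    break
            if NEG_TRIGGERS & set(window):
                flagged.add(concept)
    return flagged

print(negated_concepts("patient denies numbness but reports dizziness",
                       ["numbness", "dizziness"]))   # {'numbness'}
```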
16. An architecture for the semantic processing of natural language input to a policy workbench. Custy, E. John, 03 1900.
Approved for Public Release; distribution is unlimited / Formal methods hold significant potential for automating the development, refinement, and implementation of policy. For this potential to be realized, however, improved techniques are required for converting natural-language statements of policy into a computational form. In this paper we present and analyze an architecture for carrying out this conversion. The architecture employs semantic networks to represent both policy statements and objects in the domain of those statements. We present a case study which illustrates how a system based on this architecture could be developed. The case study consists of an analysis of natural language policy statements taken from a policy document for web sites at a university, and is carried out with support from a software tool we developed which converts text output from a natural language parser into a graphical form. / Naval Postgraduate School author (civilian).
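For illustration, a semantic network over policy statements can be sketched as a directed graph of typed nodes and labeled relations. The policy statement, node names, and relation inventory below are invented for this sketch and are not the paper's actual schema.

```python
# Toy semantic network for a policy statement, as labeled directed edges.
import networkx as nx

g = nx.DiGraph()
# "Web pages must display the university copyright notice."
g.add_edge("web-page", "copyright-notice", relation="must-display")
g.add_edge("copyright-notice", "university", relation="owned-by")
g.add_edge("web-page", "university-site", relation="part-of")

for src, dst, data in g.edges(data=True):
    print(f"{src} --{data['relation']}--> {dst}")
```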
17. SemNet: the knowledge representation of LOLITA. Baring-Gould, Sengan, January 2000.
Many systems of Knowledge Representation exist, but none were designed specifically for general-purpose, large-scale natural language processing. This thesis introduces a set of metrics to evaluate the suitability of representations for this purpose, derived from an analysis of the problems such processing introduces. These metrics address three broad categories of question: Is the representation sufficiently expressive to perform its task? What implications does its design have for the architecture of the system using it? What inefficiencies are intrinsic to its design? An evaluation of existing Knowledge Representation systems reveals that none of them satisfies the needs of general-purpose, large-scale natural language processing. To remedy this lack, this thesis develops a new representation: SemNet. SemNet benefits not only from the detailed requirements analysis but also from insights gained from its use as the core representation of the large-scale general-purpose system LOLITA (Large-scale Object-based Linguistic Interactor, Translator, and Analyser). The mapping process between natural language and the representation is presented in detail, showing that the representation achieves its goals in practice.
18. Improvement on belief network framework for natural language understanding. January 2003.
Mok, Oi Yan. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 94-99). Abstracts in English and Chinese. Contents:
1. Introduction: Overview; Thesis Goals; Thesis Outline.
2. Background: Natural Language Understanding (Rule-based, Phrase-spotting, and Stochastic Approaches); Belief Network Framework - the N Binary Formulation (Introduction of Belief Networks; The N Binary Formulation; Semantic Tagging; Belief Networks Development; Goal Inference; Potential Problems); The ATIS Domain; Chapter Summary.
3. Belief Network Framework - the One N-ary Formulation: The One N-ary Formulation; Belief Network Development; Goal Inference (Multiple Selection Strategy; Maximum Selection Strategy); Advantages of the One N-ary Formulation; Chapter Summary.
4. Evaluation on the N Binary and the One N-ary Formulations: Evaluation Metrics (Accuracy Measure; Macro-Averaging; Micro-Averaging); Experiments (Network Dimensions; Thresholds; Overall Goal Identification; Out-Of-Domain Rejection; Multiple Goal Identification; Computation); Chapter Summary.
5. Portability to Chinese: The Chinese ATIS Domain (Word Tokenization and Parsing); Experiments (Network Dimension; Overall Goal Identification; Out-Of-Domain Rejection; Multiple Goal Identification); Chapter Summary.
6. Conclusions: Summary; Contributions; Future Work.
Appendices: A. The Communicative Goals; B. Distribution of the Communicative Goals; C. The Hand-Designed Grammar Rules; D. The Selected Concepts for each Belief Network; E. The Recalls and Precisions of the Goal Identifiers in Macro-Averaging.
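The contents contrast an "N binary" formulation (one binary belief network per communicative goal, each thresholded independently) with a "one N-ary" formulation (a single network whose goal variable ranges over all N goals). The naive-Bayes-style sketch below is only a guess at the general idea of the N-ary variant; the goals, concepts, and probabilities are invented, not the thesis's networks.

```python
# Toy N-ary goal inference from observed semantic concepts.
GOALS = {
    # P(concept observed | goal): toy ATIS-flavoured numbers, invented.
    "flight_info": {"flight": 0.9, "fare": 0.2, "ground": 0.05},
    "fare_info":   {"flight": 0.4, "fare": 0.9, "ground": 0.05},
    "ground_svc":  {"flight": 0.1, "fare": 0.1, "ground": 0.90},
}
PRIOR = {g: 1 / len(GOALS) for g in GOALS}

def infer_goal(concepts: set) -> tuple:
    """Score every value of the single N-ary goal variable, then normalize."""
    scores = {}
    for goal, likelihoods in GOALS.items():
        p = PRIOR[goal]
        for concept, like in likelihoods.items():
            p *= like if concept in concepts else (1 - like)
        scores[goal] = p
    total = sum(scores.values())
    posterior = {g: s / total for g, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

print(infer_goal({"flight", "fare"}))
```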
19. Natural language response generation in mixed-initiative dialogs. January 2004.
Yip Wing Lin Winnie. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 102-105). Abstracts in English and Chinese. Contents:
1. Introduction: Overview; Thesis Goals; Thesis Outline.
2. Background: Natural Language Generation (Template-based, Rule-based, Statistical, Hybrid, and Machine Learning Approaches); Evaluation Method (Cooperative Principles); Chapter Summary.
3. Natural Language Understanding: The CUHK Restaurant Domain; Task Goals, Dialog Acts, Concept Categories and Annotation (Task Goals (TGs) and Dialog Acts (DAs); Concept Categories (CTG/CDA); Utterance Segmentation and Annotation); Task Goal and Dialog Act Identification (Belief Networks Development; Task Goal and Dialog Act Inference; Network Dimensions); Chapter Summary.
4. Automatic Utterance Segmentation: Utterance Definition; Segmentation Procedure (Tokenization; POS Tagging; Multi-Parser Architecture (MPA) Language Parsing; Top-down Generalized Representation); Evaluation (Results; Analysis); Chapter Summary.
5. Natural Language Response Generation: System Overview; Corpus-derived Dialog State Transition Rules; Hand-designed Text Generation Templates; Performance Evaluation (Task Completion Rate; Grice's Maxims and Perceived User Satisfaction; Error Analysis); Chapter Summary.
6. Bilingual Response Generation using Semi-Automatically-Induced Response Templates: Response Data; Semi-Automatic Grammar Induction (Agglomerative Clustering; Parameters Selection); Application to Response Grammar Induction (Parameters Selection; Unsupervised Grammar Induction; Post-processing; Prior Knowledge Injection); Response Templates Generation (Induced Response Grammar; Template Formation; Bilingual Response Templates); Evaluation (Task Completion Rate, Grice's Maxims and User Satisfaction); Chapter Summary.
7. Conclusion: Summary; Contributions; Future Work.
Appendices: A. Domain-Specific Task Goals in the CUHK Restaurants Domain; B. Full List of VERBMOBIL-2 Dialog Acts; C. Dialog Acts for Customer Requests and Waiter Responses in the CUHK Restaurants Domain; D. Grammar for Task Goal and Dialog Act Identification; E. Utterance Definition; F. Dialog State Transition Rules; G. Full List of Templates Selection Conditions; H. Hand-designed Text Generation Templates; I. Evaluation Test Questionnaire for Dialog System in the CUHK Restaurant Domain; J. POS Tags; K. Full List of Lexicon and Contextual Rule Modifications; L. Top-down Generalized Representations; M. Sample Outputs for Automatic Utterance Segmentation; N. Induced Grammar; O. Seeded Categories; P. Semi-Automatically-Induced Response Templates; Q. Details of the Statistical Testing Regarding Grice's Maxims and User Satisfaction.
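To illustrate the kind of pipeline the contents describe, where dialog state transition rules select hand-designed text generation templates, here is a toy sketch; the states, dialog acts, slots, and templates are invented for illustration and are not the thesis's rules.

```python
# Toy template-based response generation driven by state transition rules.
TEMPLATES = {
    "ask_cuisine": "What kind of food would you like?",
    "recommend":   "I suggest {restaurant}, which serves {cuisine} food.",
    "confirm":     "Shall I book a table at {restaurant}?",
}

TRANSITIONS = {
    # (current state, user dialog act) -> next system act
    ("start", "request_recommendation"): "ask_cuisine",
    ("ask_cuisine", "inform_cuisine"): "recommend",
    ("recommend", "accept"): "confirm",
}

def respond(state: str, user_act: str, slots: dict) -> tuple:
    next_act = TRANSITIONS.get((state, user_act), "ask_cuisine")
    return next_act, TEMPLATES[next_act].format(**slots)

state, text = respond("ask_cuisine", "inform_cuisine",
                      {"restaurant": "Sha Tin Garden", "cuisine": "Cantonese"})
print(text)  # I suggest Sha Tin Garden, which serves Cantonese food.
```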
20. Inference of string mappings for speech technology. Jansche, Martin, January 2003.
Thesis (Ph. D.)--Ohio State University, 2003. / Title from first page of PDF file. Document formatted into pages; contains xv, 268 p.; also includes graphics. Includes abstract and vita. Advisor: Chris Brew, Dept. of Linguistics. Includes bibliographical references (p. 252-266) and index.