About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Proposition-based summarization with a coherence-driven incremental model

Fang, Yimai January 2019
Summarization models that operate on meaning representations of documents have been neglected in the past, although they are a promising and interesting class of methods for summarization and text understanding. In this thesis, I present one such summarizer, which uses the proposition as its meaning representation. My summarizer is an implementation of Kintsch and van Dijk's model of comprehension, which uses a tree of propositions to represent working memory. The input document is processed incrementally in iterations. In each iteration, new propositions are connected to the tree under the principle of local coherence, and a forgetting mechanism is then applied so that only a few important propositions are retained in the tree for the next iteration. A summary can be generated from the propositions that are most frequently retained. Originally, this model was only worked through by hand by its inventors, using human-created propositions. In this work, I turn it into a fully automatic model using current NLP technologies. First, I create propositions by obtaining and then transforming a syntactic parse. Second, I devise algorithms to numerically evaluate alternative ways of adding a new proposition, as well as to predict necessary changes in the tree. Third, I compare different methods of modelling local coherence, including coreference resolution, distributional similarity, and lexical chains. In the first group of experiments, my summarizer realizes summary propositions by sentence extraction; these experiments show that it outperforms several state-of-the-art summarizers. The second group of experiments concerns abstractive generation from propositions, a collaborative project. I investigate the option of compressing extracted sentences, but generation from propositions is shown to provide better information packaging.
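Read as an algorithm, the retain-and-forget cycle above fits in a few lines. The sketch below is a minimal illustration, not the thesis's implementation: the `Proposition` type, the argument-overlap coherence score and the memory size are all assumptions, whereas the thesis attaches propositions to a tree and models coherence with coreference, distributional similarity or lexical chains.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    predicate: str
    arguments: tuple  # e.g. Proposition("chase", ("dog", "cat"))

def coherence(p, q):
    """Toy local-coherence signal: number of shared arguments."""
    return len(set(p.arguments) & set(q.arguments))

def summarize(batches, memory_size=4, summary_size=3):
    """Process one batch of new propositions per iteration; a forgetting
    step keeps only the best-connected propositions in working memory."""
    memory, retention = [], {}
    for batch in batches:
        memory.extend(batch)  # connect new propositions to working memory
        # score each proposition by its coherence with the rest of memory
        scores = {p: sum(coherence(p, q) for q in memory if q is not p)
                  for p in memory}
        memory = sorted(memory, key=scores.get, reverse=True)[:memory_size]
        for p in memory:  # count how often each proposition is retained
            retention[p] = retention.get(p, 0) + 1
    # summary propositions: those retained over the most iterations
    return sorted(retention, key=retention.get, reverse=True)[:summary_size]
```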
202

Rule Mining and Sequential Pattern Based Predictive Modeling with EMR Data

Abar, Orhan 01 January 2019
Electronic medical record (EMR) data is collected daily at hospitals and other healthcare facilities to track patients' health, including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited for secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high-level insights into disease progression with interventional potential. In this dissertation, using a large-scale, realistic EMR dataset of over one million patients visiting University of Kentucky healthcare facilities, we explore data mining and machine learning methods for association rule (AR) mining and predictive modeling, with mood and anxiety disorders as use cases. Our first study analyzes existing quantitative measures of rule interestingness to assess how well they align with a practicing psychiatrist's sense of novelty or surprise for ARs identified from EMRs. Our second study mines causal ARs, with depression and anxiety disorders as target conditions, through matching methods that account for computationally identified confounding attributes. Our final effort involves an efficient (GPU-based) implementation and application of contrast pattern mining to predictive modeling for mental conditions, using various representational methods and recurrent neural networks. Overall, we demonstrate the effectiveness of rule mining methods in secondary analyses of EMR data for identifying causal associations and building predictive models for diseases.
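As background for the first study, the standard interestingness measures that such analyses start from (support, confidence, lift) can be sketched as follows. The per-patient code sets and the example rule are invented for illustration and are not drawn from the Kentucky EMR data.

```python
# toy per-patient "transactions": sets of condition/medication codes (invented)
transactions = [
    {"anxiety", "ssri", "insomnia"},
    {"depression", "ssri"},
    {"anxiety", "benzodiazepine"},
    {"depression", "anxiety", "ssri"},
]

def support(itemset):
    """Fraction of patients whose record contains every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimate of P(rhs | lhs) over the transactions."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Co-occurrence of lhs and rhs relative to independence."""
    return confidence(lhs, rhs) / support(rhs)

lhs, rhs = {"anxiety"}, {"ssri"}
print(support(lhs | rhs))    # 0.5
print(confidence(lhs, rhs))  # ~0.667
print(lift(lhs, rhs))        # ~0.889 (< 1: slightly negative association here)
```

Measures like these are what the first study tests against a psychiatrist's sense of novelty; the later studies go further, e.g. by matching on confounders to mine causal rules.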
203

”Man får uppskattning om man skriver kreativt men inte om man stavar kreativt” (“You get appreciation if you write creatively, but not if you spell creatively”) : a literature study of the phenomenon of dyslexia

Jeppsson, Rebecka January 2007
The aim of this study was to examine the phenomenon of dyslexia through a literature review. I give a thorough account of what dyslexia is, its history, and previous research in the area. I then pose two research questions: What testimonies are there in the literature about what it is like to live with dyslexia? and What learning styles are there for people with dyslexia? For the first question, I review the testimonies available from people who themselves live with dyslexia. For the second, I present a number of learning-style approaches in use: learning through your senses (neurolinguistic programming), the Dunn and Dunn learning styles model, and support from the computer. The results were analysed on the basis of labelling theory. They show that many people with dyslexia have similar experiences of living with it; among other things, they have felt left out and have met ignorance from their surroundings. The results for the learning-style models show that especially the first two take the student's unique learning style into consideration in different ways. One woman recounts how she overcame her dyslexia.
204

Automatisk FAQ med Latent Semantisk Analys (Automatic FAQ with Latent Semantic Analysis)

Larsson, Patrik January 2009
This thesis presents techniques for automatically answering questions posed in natural language, given access to a collection of previously asked questions and their answers. I build a prototype system based on a database of e-mail conversations from the HP Help Desk. The system combines Latent Semantic Analysis with a density-based clustering algorithm and a simple classification algorithm to identify frequent answers and to answer new questions. The automatically generated answers are evaluated automatically, and the results are compared with those previously reported for the same data set. The influence of different parameters is also studied in detail. The study shows that this approach yields good results without requiring any linguistic preprocessing at all.
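A rough sketch of the pipeline as described in the abstract: LSA to build a semantic space, density-based clustering to find groups of questions that share a frequent answer, and a simple nearest-neighbour classifier for new questions. scikit-learn is an assumed stand-in toolkit here, and the toy Q&A pairs and all parameter values are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_similarity

past_questions = [
    "How do I reset my password?",
    "I forgot my password, can you reset it?",
    "The printer shows a paper jam error.",
    "Printer reports a paper jam but no paper is stuck.",
]
past_answers = [
    "Use the self-service password reset page.",
    "Use the self-service password reset page.",
    "Open the rear tray and clear the rollers.",
    "Open the rear tray and clear the rollers.",
]

# LSA: tf-idf followed by truncated SVD yields a low-rank semantic space in
# which differently worded questions about the same issue lie close together
vec = TfidfVectorizer()
lsa = TruncatedSVD(n_components=2, random_state=0)
X = lsa.fit_transform(vec.fit_transform(past_questions))

# density-based clustering: dense groups of similar questions indicate
# frequent questions whose answers can be reused
print(DBSCAN(eps=0.7, min_samples=2).fit_predict(X))

def answer(new_question):
    """Simple classifier: return the answer of the most similar past question."""
    q = lsa.transform(vec.transform([new_question]))
    return past_answers[cosine_similarity(q, X)[0].argmax()]

print(answer("please reset my password"))
```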
206

Unsupervised Natural Language Processing for Knowledge Extraction from Domain-specific Textual Resources

Hänig, Christian 25 April 2013
This thesis aims to develop a Relation Extraction algorithm to extract knowledge from automotive data. While most approaches to Relation Extraction are evaluated only on newspaper data dealing with general relations from the business world, their applicability to other data sets is not well studied.

Part I of this thesis deals with the theoretical foundations of Information Extraction algorithms. Text mining cannot be seen as the simple application of data mining methods to textual data; instead, sophisticated methods have to be employed to accurately extract knowledge from text, which can then be mined using statistical methods from the field of data mining. Information Extraction itself can be divided into two subtasks: Entity Detection and Relation Extraction. The detection of entities is very domain-dependent due to terminology, abbreviations and general language use within the given domain. Thus, this task has to be solved for each domain using thesauri or another type of lexicon. Supervised approaches to Named Entity Recognition will not achieve reasonable results unless they have been trained for the given type of data. The task of Relation Extraction can basically be approached by pattern-based and kernel-based algorithms. The latter achieve state-of-the-art results on newspaper data and point out the importance of linguistic features. In order to analyze relations contained in textual data, syntactic features like part-of-speech tags and syntactic parses are essential. Chapter 4 presents the machine learning approaches and linguistic foundations essential for syntactic annotation of textual data and Relation Extraction. Chapter 6 analyzes the performance of state-of-the-art algorithms for POS tagging, syntactic parsing and Relation Extraction on automotive data. The finding is that supervised methods trained on newspaper corpora do not achieve accurate results when applied to automotive data. This has several causes: besides low-quality text, the nature of automotive relations poses the main challenge, since the relation types of interest (e.g., component – symptom) are rather arbitrary compared to well-studied relation types like is-a or is-head-of. To achieve acceptable results, algorithms have to be trained directly on this kind of data. As the manual annotation of data for each language and data type is too costly and inflexible, unsupervised methods are the ones to rely on.

Part II deals with the development of dedicated algorithms for all three essential tasks. Unsupervised POS tagging (Chapter 7) is a well-studied task, and algorithms that achieve accurate tagging exist; however, none of them disambiguates high-frequency words, only out-of-lexicon words. Most high-frequency words bear syntactic information, so it is very important to differentiate between their different functions; domain languages in particular contain ambiguous high-frequency words bearing semantic information (e.g., pump). To improve POS tagging, an algorithm for disambiguation is developed and used to enhance an existing state-of-the-art tagger. The approach is based on context clustering, which is used to detect a word type's different syntactic functions. Evaluation shows that tagging accuracy is raised significantly.

An approach to unsupervised syntactic parsing (Chapter 8) is developed to satisfy the requirements of Relation Extraction: high-precision results on nominal and prepositional phrases, as these contain the entities relevant for Relation Extraction. Furthermore, accurate shallow parsing is more desirable than deep binary parsing, as it facilitates Relation Extraction more. Endocentric and exocentric constructions can be distinguished, which improves proper phrase labeling. unsuParse is based on preferred positions of word types within phrases to detect phrase candidates; iterating the detection of simple phrases successively induces deeper structures. The proposed algorithm fulfills all of the demanded criteria and achieves competitive results on standard evaluation setups.

Syntactic Relation Extraction (Chapter 9) is an approach exploiting syntactic statistics and text characteristics to extract relations between previously annotated entities. The approach is based on entity distributions in a corpus and thus provides a way to extend text mining processes to new data in an unsupervised manner. Evaluation on two languages and two text types from the automotive domain shows that it achieves accurate results on repair-order data. Results are less accurate on internet data, but the tasks of sentiment analysis and extraction of the opinion target can be mastered; the incorporation of internet data is therefore possible and important, as it provides useful insight into the customer's thoughts.

To conclude, this thesis presents a complete unsupervised workflow for Relation Extraction – except for the highly domain-dependent Entity Detection task – improving the performance of each of the involved subtasks compared to state-of-the-art approaches. Furthermore, this work applies Natural Language Processing methods and Relation Extraction approaches to real-world data, unveiling challenges that do not occur in high-quality newspaper corpora.
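To make the context-clustering idea concrete, here is a much-simplified sketch in which occurrences of the ambiguous high-frequency word "pump" are clustered by their surrounding words, so that noun-like and verb-like uses tend to separate. The toy contexts, the bag-of-words features and the use of k-means are assumptions for illustration, not the thesis's implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# surrounding words for four occurrences of "pump" (target word removed);
# the first two are noun-like uses, the last two verb-like uses
contexts = [
    "the fuel was leaking again",     # "the fuel pump was leaking again"
    "replace the old water first",    # "replace the old water pump first"
    "they had to air into the tire",  # "they had to pump air into the tire"
    "we slowly coolant through it",   # "we slowly pump coolant through it"
]
X = CountVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # occurrences grouped by (approximate) syntactic function
```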
208

Improvement Of Corpus-based Semantic Word Similarity Using Vector Space Model

Esin, Yunus Emre 01 July 2009
This study presents a new approach to finding semantically similar words in corpora using window-based context methods. Previous studies mainly concentrate on either finding new combinations of distance and weight measures or proposing new context methods. The main difference of this approach is that it reprocesses the outputs of existing methods to update the representations of the related word vectors used for measuring semantic distance between words, improving the results further. Moreover, this technique provides a solution to the data sparseness of vectors, a common problem in methods that use the vector space model. The main advantage of the approach is that it is applicable to many of the existing word-similarity methods based on the vector space model; most importantly, it improves the performance of several of these existing word-similarity measures.
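For reference, the window-based context vectors that such methods build on, with cosine similarity as the semantic measure, can be sketched as follows; the toy corpus and window size are invented. The reprocessing step proposed in the thesis operates on top of exactly this kind of representation.

```python
from collections import Counter
import math

corpus = "the cat sat on the mat the dog sat on the rug".split()
WINDOW = 2  # context words taken from each side of the target (assumed)

def context_vector(word):
    """Count the words occurring within WINDOW positions of each occurrence."""
    ctx = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            lo, hi = max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)
            ctx.update(corpus[j] for j in range(lo, hi) if j != i)
    return ctx

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) \
         * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# "cat" and "dog" share contexts ("the", "sat", "on"), so they come out similar
print(cosine(context_vector("cat"), context_vector("dog")))
```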
209

Application of common sense computing for the development of a novel knowledge-based opinion mining engine

Cambria, Erik January 2011
The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity for their product, brand, or organisation in the minds of their customers. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early work aimed to classify entire documents by overall positive or negative polarity, or by review rating scores. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist's overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target: contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent work, the granularity of text analysis has been taken down to the segment and sentence level, e.g., by using the presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language, as they mainly rely on knowledge bases that are too limited to efficiently process text at the sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions they convey. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques over two common sense knowledge bases is exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data.
The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance was compared both with results obtained using standard sentiment analysis techniques and with results based on different state-of-the-art knowledge bases such as Princeton's WordNet, MIT's ConceptNet and Microsoft's Probase. Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence they convey. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as the Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.
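For contrast, the keyword-spotting baseline that the thesis identifies as the shallowest category can be written in a few lines; the affect lexicon and scoring below are invented for illustration. Its failure on negation and context is precisely the gap that concept-level, common-sense analysis targets.

```python
# toy affect lexicon: "fairly unambiguous affect words" with polarity scores
AFFECT = {"good": 1, "great": 1, "happy": 1, "love": 1,
          "bad": -1, "awful": -1, "sad": -1, "hate": -1}

def keyword_spotting_polarity(text):
    """Classify by presence of affect words alone, ignoring all structure."""
    score = sum(AFFECT.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(keyword_spotting_polarity("I love the new interface"))  # positive
print(keyword_spotting_polarity("not bad at all"))            # negative (wrong)
```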
210

Natural language processing techniques for the purpose of sentinel event information extraction

Barrett, Neil 23 November 2012
An approach to biomedical language processing is to apply existing natural language processing (NLP) solutions to biomedical texts. Often, existing NLP solutions are less successful in the biomedical domain relative to their non-biomedical performance (e.g., on newspaper text). Biomedical NLP is likely best served by methods, information and tools that account for its particular challenges. In this thesis, I describe an NLP system specifically engineered for sentinel event extraction from clinical documents. The NLP system's design accounts for several biomedical NLP challenges. The specific contributions are as follows.
- Biomedical tokenizers differ, lack consensus over output tokens and are difficult to extend. I developed an extensible tokenizer, providing a tokenizer design pattern and implementation guidelines. It evaluated as equivalent to a leading biomedical tokenizer (MedPost).
- Biomedical part-of-speech (POS) taggers are often trained on non-biomedical corpora and applied to biomedical corpora, which decreases tagging accuracy. I built a token-centric POS tagger, TcT, that is more accurate than three existing POS taggers (mxpost, TnT and Brill) when trained on a non-biomedical corpus and evaluated on biomedical corpora. TcT achieves this increase in accuracy by ignoring previously assigned POS tags and restricting the tagger's scope to the current token, the previous token and the following token (a sketch of this idea follows the list).
- Two parsers, MST and Malt, have previously been evaluated using perfect POS-tag input. Given that perfect input is unlikely in biomedical NLP tasks, I evaluated these two parsers on imperfect POS-tag input and compared their results. MST was most affected by imperfectly tagged biomedical text; I attributed its drop in performance to verbs and adjectives, where MST had more potential for performance loss than Malt. I attributed Malt's resilience to POS-tagging errors to its use of a rich feature set and a local scope in decision making.
- Previous automated clinical coding (ACC) research focuses on mapping narrative phrases to terminological descriptions (e.g., concept descriptions). These methods make little or no use of the additional semantic information available through topology. I developed a token-based ACC approach that encodes tokens and manipulates token-level encodings by mapping linguistic structures to topological operations in SNOMED CT. My ACC method recalled most concepts given their descriptions and performed significantly better than MetaMap.

I extended these contributions for the purpose of sentinel event extraction from clinical letters. The extensions account for negation in text, use medication brand names during ACC, and model (coarse) temporal information. My software system's performance is similar to state-of-the-art results. Given all of the above, my thesis is a blueprint for building a biomedical NLP system, and my contributions likely apply to NLP systems in general.
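A schematic sketch of the token-centric idea behind TcT, as described in the second contribution: the tag for each token is chosen from statistics over the (previous, current, next) token window alone, never from previously assigned tags. The tiny count tables are invented stand-ins for corpus-derived statistics, not TcT's actual model.

```python
from collections import Counter

# stand-ins for counts estimated from a tagged (non-biomedical) training corpus
trigram = {
    ("the", "walk", "was"): Counter({"NN": 9, "VB": 1}),
    ("i", "walk", "home"): Counter({"VB": 8, "NN": 1}),
}
unigram = {
    "the": Counter({"DT": 10}), "walk": Counter({"NN": 5, "VB": 5}),
    "was": Counter({"VBD": 10}), "i": Counter({"PRP": 10}),
    "home": Counter({"NN": 7, "RB": 3}),
}

def tag(tokens):
    """Each decision uses only (prev, cur, next); earlier output tags are ignored."""
    padded = [None] + tokens + [None]
    tags = []
    for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
        dist = trigram.get((prev, cur, nxt)) or unigram.get(cur, Counter({"NN": 1}))
        tags.append(dist.most_common(1)[0][0])
    return tags

print(tag(["the", "walk", "was"]))  # ['DT', 'NN', 'VBD']
```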
