181

MINING CAUSAL ASSOCIATIONS FROM GERIATRIC LITERATURE

Krishnan, Anand 14 August 2013 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Literature pertaining to geriatric care contains rich information regarding best practices related to geriatric health care issues. The publication domain of geriatric care is small compared to other health-related areas; nevertheless, there are over a million articles describing different cases and case interventions that capture best-practice outcomes. If the data in these articles could be harvested and processed effectively, such knowledge could be translated from research to practice more quickly and efficiently. Geriatric literature spans multiple domains or practice areas, and within these domains is a wealth of information such as interventions, information on care for the elderly, case studies, and real-life scenarios. These articles contain a variety of causal relationships, such as the relationship between interventions and disorders. The goal of this study is to identify these causal relations in published abstracts. Natural language processing and statistical methods were adopted to identify and extract the causal relations. Using the developed methods, causal relations were extracted with a precision of 79.54% and a recall of 81%, with a false positive rate of only 8%.
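The abstract does not specify the extraction machinery; purely as an illustration of the kind of pattern-based causal-relation extraction it alludes to (the cue patterns, function name, and example sentence below are assumptions, not the author's method), a minimal sketch might look like:

```python
import re

# Hypothetical cue patterns; a real system would rely on NLP parses and
# statistical filtering rather than surface regular expressions.
CAUSAL_PATTERNS = [
    re.compile(r"(?P<cause>[\w\s-]+?)\s+(?:reduces|reduced|improves|improved|prevents|prevented)\s+(?P<effect>[\w\s-]+)", re.I),
    re.compile(r"(?P<effect>[\w\s-]+?)\s+(?:is|was|were)\s+associated with\s+(?P<cause>[\w\s-]+)", re.I),
]

def extract_causal_pairs(sentence):
    """Return (cause, effect) pairs whose surface form matches a causal cue."""
    pairs = []
    for pattern in CAUSAL_PATTERNS:
        for match in pattern.finditer(sentence):
            pairs.append((match.group("cause").strip(), match.group("effect").strip()))
    return pairs

if __name__ == "__main__":
    text = "Early mobility intervention reduced delirium in hospitalized elderly patients."
    print(extract_causal_pairs(text))
    # [('Early mobility intervention', 'delirium in hospitalized elderly patients')]
```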
182

The Acquisition Of Lexical Knowledge From The Web For Aspects Of Semantic Interpretation

Schwartz, Hansen A 01 January 2011 (has links)
This work investigates the effective acquisition of lexical knowledge from the Web to perform semantic interpretation. The Web provides an unprecedented amount of natural language from which to gain knowledge useful for semantic interpretation. The knowledge acquired is described as common sense knowledge: information one uses in daily life to understand language and perception. Novel approaches are presented both for the acquisition of this knowledge and for its use in semantic interpretation algorithms. The goal is to increase accuracy over other automatic semantic interpretation systems and, in turn, enable stronger real-world applications such as machine translation, advanced Web search, sentiment analysis, and question answering. The major contributions of this dissertation are two methods of acquiring lexical knowledge from the Web: a database of common sense knowledge and Web selectors. The first is a framework for acquiring a database of concept relationships. To acquire this knowledge, relationships between nouns are found on the Web and analyzed over WordNet using information theory, producing information about concepts rather than ambiguous words. For the second contribution, words called Web selectors are retrieved which can take the place of an instance of a target word in its local context. The selectors let the system learn the types of concepts that the sense of a target word should be similar to. Web selectors are acquired dynamically as part of a semantic interpretation algorithm, while the relationships in the database are useful to stand-alone programs. A final contribution concerns a novel semantic similarity measure and an evaluation of similarity and relatedness measures on concept-similarity tasks. Such tasks are useful when applying acquired knowledge to semantic interpretation. Applications to word sense disambiguation, an aspect of semantic interpretation, are used to evaluate the contributions. Disambiguation systems that utilize semantically annotated training data are considered supervised. The algorithms of this dissertation are considered minimally supervised; they do not require training data created by humans, though they may use human-created data sources. In evaluating the database of common sense knowledge, integrating the knowledge into an existing minimally supervised disambiguation system significantly improved results, yielding a 20.5% error reduction. Similarly, the Web selectors disambiguation system, which acquires knowledge directly as part of the algorithm, achieved results comparable with top minimally supervised systems: an F-score of 80.2% on a standard noun disambiguation task. This work enables the study of many subsequent related tasks for improving semantic interpretation and its application to real-world technologies. Other aspects of semantic interpretation, such as semantic role labeling, could utilize the same methods presented here for word sense disambiguation. As the Web continues to grow, the capabilities of the systems in this dissertation are expected to increase. Although the Web selectors system achieves strong results, a study in this dissertation indicates likely improvements from acquiring more data. Furthermore, the methods for acquiring a database of common sense knowledge could be applied more exhaustively for other types of common sense knowledge.
Finally, perhaps the greatest benefits of this work will come from enabling real-world technologies that utilize semantic interpretation.
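As a hedged illustration of the Web-selector idea (not the dissertation's actual system), one could score each WordNet sense of a target word against selector words using a similarity measure; here NLTK's WordNet interface and Wu-Palmer similarity are used, and the selector list is a made-up stand-in for words that would actually be retrieved from the Web:

```python
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data to be downloaded

def disambiguate_with_selectors(target, selectors, pos=wn.NOUN):
    """Pick the sense of `target` most similar, on average, to the selectors'
    senses, using Wu-Palmer similarity over WordNet."""
    best_sense, best_score = None, -1.0
    for sense in wn.synsets(target, pos=pos):
        scores = []
        for selector in selectors:
            sims = [sense.wup_similarity(s) or 0.0
                    for s in wn.synsets(selector, pos=pos)]
            if sims:
                scores.append(max(sims))
        score = sum(scores) / len(scores) if scores else 0.0
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score

# Hypothetical selectors that might have replaced "bank" in a financial context.
print(disambiguate_with_selectors("bank", ["lender", "institution", "company"]))
```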
183

Determining Whether and When People Participate in the Events They Tweet About

Sanagavarapu, Krishna Chaitanya 05 1900 (has links)
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past, present, and future events. We define an event participant as a person directly involved in an event, regardless of whether they are the agent, the recipient, or play another role. We present an annotation effort, guidelines, and quality analysis covering 1,096 event mentions, and discuss the label distributions and event behavior in the annotated corpus. We also describe the features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant in the event mentioned in the tweet. Finally, we discuss trends in the results obtained and draw conclusions.
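A minimal sketch of the kind of supervised setup described, assuming toy features and toy labels that are not the thesis's actual feature set or corpus:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def tweet_features(tweet, event_verb):
    """Illustrative features only; the thesis's actual feature set is richer."""
    tokens = tweet.lower().split()
    return {
        "has_first_person": any(t in {"i", "we", "my", "our"} for t in tokens),
        "verb_past": event_verb.endswith("ed"),
        "verb": event_verb.lower(),
        "has_future_cue": any(t in {"will", "gonna", "tomorrow"} for t in tokens),
    }

# Toy training data: (tweet, event verb) -> does the author participate?
examples = [(("I am flying to Austin tonight", "flying"), True),
            (("Riots erupted downtown yesterday", "erupted"), False)]
X = [tweet_features(t, v) for (t, v), _ in examples]
y = [label for _, label in examples]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)
print(model.predict([tweet_features("We just landed in Denton", "landed")]))
```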
184

Filtragem automática de opiniões falsas: comparação compreensiva dos métodos baseados em conteúdo / Automatic filtering of false opinions: comprehensive comparison of content-based methods

Cardoso, Emerson Freitas 04 August 2017 (has links)
No funding received. / Before buying a product or choosing a travel destination, people often seek other people's opinions to get a sense of the quality of what they want to acquire. Opinions have therefore always had great influence on purchase decisions. With the growth of the Internet and a huge increase in the volume of data traffic, social networks were created that let users post and view all kinds of information, and people began searching for opinions on the Web as well. Sites like TripAdvisor and Yelp make it easy to share online reviews, since they let users post their opinions from anywhere via smartphones and enable product manufacturers to gain relevant feedback quickly and in a centralized way. As a result, most people nowadays trust online reviews as much as personal recommendations. However, competition between service providers and product manufacturers has also increased in social media, leading to the first cases of spam reviews: deceptive opinions published by hired writers to promote or defame products or businesses. These reviews are carefully written to look authentic, making them difficult for humans or automatic methods to detect. They are thus used in an attempt to sway general opinion, causing financial harm to business owners and users. Several approaches have been proposed for spam review detection, most of them using machine learning and natural language processing techniques. Despite this progress, relevant questions remain open and require careful analysis to be properly answered. For instance, there is no consensus on whether the performance of traditional classification methods is affected by incremental learning or by changes in review features over time, nor on whether there is a statistically significant difference between the performances of content-based classification methods. In this scenario, this work offers a comprehensive comparison of traditional machine learning methods applied to spam review detection. The comparison is made in multiple setups, employing different types of learning and different data sets. The experiments performed, together with a statistical analysis of the results, provide well-founded answers to these open questions. In addition, all results obtained can be used as a baseline for future comparisons.
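A rough sketch of how such a comparison of content-based classifiers might be set up with scikit-learn; the reviews, labels, and chosen classifiers below are placeholders, not the dissertation's data or full method set:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder data; a real comparison would use a labeled spam-review corpus.
reviews = ["Great hotel, friendly staff, would stay again",
           "Best product ever, buy it now, amazing, perfect, flawless",
           "The room was clean but the breakfast was average",
           "Absolutely incredible, life changing, five stars, no flaws at all"]
labels = [0, 1, 0, 1]  # 0 = genuine, 1 = deceptive (toy labels)

classifiers = {"nb": MultinomialNB(), "svm": LinearSVC(), "lr": LogisticRegression()}
for name, clf in classifiers.items():
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    scores = cross_val_score(pipeline, reviews, labels, cv=2, scoring="f1")
    print(name, scores.mean())
```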
185

Normalização textual e indexação semântica aplicadas da filtragem de SMS spam / Text normalization and semantic indexing to enhance SMS spam filtering

Silva, Tiago Pasqualini da 01 July 2016 (has links)
No funding received. / The rapid popularization of smartphones has contributed to the growth of SMS as an alternative means of communication. The increasing number of users, along with the trust they inherently place in their devices, makes SMS messages a propitious environment for spammers. In fact, reports indicate that the volume of mobile phone spam is increasing dramatically year by year. SMS spam poses a challenging problem for traditional filtering methods, since such messages are usually short and rife with slang, idioms, symbols, and acronyms that make even tokenization difficult. In this scenario, this thesis proposes and evaluates a method to normalize and expand the original short, noisy SMS text messages in order to obtain better attributes and enhance classification performance. The proposed text-processing approach is based on lexicographic and semantic dictionaries, along with state-of-the-art techniques for semantic analysis and context detection. The technique is used to normalize terms and create new attributes that change and expand the original text samples, mitigating factors that can degrade the algorithms' performance, such as redundancies and inconsistencies. The approach was validated on a real, public, non-encoded dataset with several established machine learning methods. The experiments were carefully designed to ensure statistically sound results, which indicate that the proposed text-processing techniques can in fact enhance SMS spam filtering.
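As an illustration only, a toy normalization step of the sort described, assuming a hypothetical slang dictionary rather than the lexicographic and semantic resources used in the dissertation:

```python
import re

# Hypothetical slang/abbreviation dictionary; the dissertation relies on
# lexicographic and semantic resources rather than this toy mapping.
SLANG = {"u": "you", "gr8": "great", "txt": "text", "2nite": "tonight", "pls": "please"}

def normalize_sms(message):
    """Lowercase, collapse repeated characters, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    normalized = []
    for token in tokens:
        token = re.sub(r"(.)\1{2,}", r"\1\1", token)  # "freeee" -> "free"
        normalized.append(SLANG.get(token, token))
    return " ".join(normalized)

print(normalize_sms("FREEEE entry!! Txt WIN 2nite, u could be gr8 winner pls"))
# "free entry text win tonight you could be great winner please"
```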
186

Um sistema de disseminação seletiva da informação baseado em Cross-Document Structure Theory / A system for selective dissemination of information based on Cross-Document Structure Theory

Beltrame, Walber Antonio Ramos 30 August 2011 (has links)
A selective dissemination of information system is a type of information system that aims to channel new intellectual products, from any source, to environments where the probability of interest is high. The inherent computational challenge is to establish a model that maps specific information needs, for a large audience, in a personalized way. To do so, it is necessary to mediate the structuring of the information unit so that it covers the plurality of attributes to be considered by the content-selection process. Recent publications propose systems based on metadata markup over texts (metadata models), so that information processing takes place between computation over semi-structured data and inference mechanisms over meta-models. Such approaches only associate the data structure with the profile of interest. To improve on this, this work proposes the construction of a selective information dissemination system based on the analysis of multiple discourses through the automatic generation of conceptual graphs from texts, thereby bringing unstructured data (text) into the solution as well. The proposal is motivated by Cross-Document Structure Theory (CST), a model recently popularized in Natural Language Processing for automatic summarization. The model establishes semantic correlations between discourses, for example, whether information across multiple texts is identical, complementary, or contradictory. One of the aspects discussed in this dissertation is that these correlations can be used in the content-selection process, as already shown in related work. Additionally, the algorithm of the original model is revised to make it easier to apply.
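Purely as a sketch of the underlying idea, cross-document sentence pairs can be labeled with rough CST-like relations from lexical similarity alone; the thresholds and relation names below are arbitrary assumptions, and real CST analysis is far richer:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def relate_sentences(doc_a, doc_b, same=0.8, overlap=0.3):
    """Label cross-document sentence pairs with rough CST-like relations
    based only on lexical similarity (thresholds are arbitrary)."""
    vectorizer = TfidfVectorizer().fit(doc_a + doc_b)
    sims = cosine_similarity(vectorizer.transform(doc_a), vectorizer.transform(doc_b))
    relations = []
    for i, row in enumerate(sims):
        for j, score in enumerate(row):
            if score >= same:
                relations.append((i, j, "identity"))
            elif score >= overlap:
                relations.append((i, j, "overlap"))
    return relations

doc_a = ["The mayor announced a new transit plan on Monday."]
doc_b = ["A new transit plan was announced by the mayor.",
         "Critics say the plan lacks funding."]
print(relate_sentences(doc_a, doc_b))
```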
187

Extrator de conhecimento coletivo: uma ferramenta para democracia participativa / Collective knowledge extractor: a tool for participatory democracy

Angelo, Tiago Novaes, 1983- 26 August 2018 (has links)
Advisors: Ricardo Ribeiro Gudwin, Cesar José Bonjuani Pagan / Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / The emergence of information and communication technologies has brought a new perspective to the strengthening of democracy in modern societies. Representative democracy, the prevalent model in today's societies, is undergoing a credibility crisis whose main consequence is citizens' withdrawal from political participation, weakening democratic ideals. In this context, technology emerges as a means to build a new model of popular participation that fosters more active citizenship, inaugurating what is called digital democracy. The objective of this research was to develop and implement a tool called the "Collective Knowledge Extractor", whose purpose is to discover what a collective thinks about its reality from short reports written by its participants, giving voice to the population in a process of participatory democracy. Its theoretical foundations draw on data mining methods, extractive summarizers, and complex networks. The tool was implemented and tested using a database of customer reviews about their stays at a hotel, and the results were satisfactory. As future work, the proposal is that the Collective Knowledge Extractor become the data-processing core of a virtual space where people can express themselves and actively exercise their citizenship. / Master's degree in Electrical Engineering (Computer Engineering concentration)
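A minimal sketch of graph-based extractive summarization over short reports, in the spirit of the data mining and complex-network methods mentioned (not the thesis's actual implementation):

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(reports, top_n=2):
    """Rank short reports by centrality in a lexical-similarity graph and
    return the most central ones as a rough collective summary."""
    matrix = TfidfVectorizer().fit_transform(reports)
    graph = nx.from_numpy_array(cosine_similarity(matrix))
    scores = nx.pagerank(graph)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [reports[i] for i in ranked[:top_n]]

reports = ["The breakfast was excellent and varied.",
           "Breakfast was great, lots of options.",
           "The pool area was noisy at night.",
           "Staff were friendly at check-in."]
print(summarize(reports))
```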
188

Mining Biomedical Literature to Extract Pharmacokinetic Drug-Drug Interactions

Karnik, Shreyas 03 February 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Polypharmacy is common clinical practice, and there is a high chance that multiple administered drugs will interfere with each other, a phenomenon called a drug-drug interaction (DDI). A DDI occurs when co-administered drugs change each other's pharmacokinetic (PK) or pharmacodynamic (PD) response. DDIs can affect a drug's overall effectiveness or pose a risk of serious side effects to patients, which makes them a significant challenge for successful drug development and clinical patient care. The biomedical literature is a rich source of in-vitro and in-vivo DDI reports, and there is a growing need for automated methods to extract DDI-related information from unstructured text. In this work we present an ontology (the PK ontology), which defines annotation guidelines for PK DDI studies. Using the ontology, we have assembled a corpus of PK DDI studies, which serves as an excellent resource for training machine-learning-based DDI extraction algorithms. Finally, we demonstrate the use of the PK ontology and corpus for extracting PK DDIs from the biomedical literature using machine learning algorithms.
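As a hedged illustration of candidate DDI-pair generation from sentences (the drug lexicon and cue patterns are made up; the thesis relies on an ontology-annotated corpus and machine learning instead):

```python
import itertools
import re

# Hypothetical drug lexicon; a real system would use a curated resource.
DRUGS = {"ketoconazole", "midazolam", "warfarin", "rifampin"}
INTERACTION_CUES = re.compile(r"\b(increase[ds]?|decrease[ds]?|inhibit(?:s|ed)?|induce[ds]?)\b", re.I)

def candidate_ddis(sentence):
    """Pair up drugs mentioned in one sentence and flag pairs whose sentence
    contains a PK interaction cue."""
    mentions = [t for t in re.findall(r"[a-z]+", sentence.lower()) if t in DRUGS]
    has_cue = bool(INTERACTION_CUES.search(sentence))
    return [(a, b, has_cue) for a, b in itertools.combinations(sorted(set(mentions)), 2)]

print(candidate_ddis("Ketoconazole increased the AUC of midazolam by 16-fold."))
# [('ketoconazole', 'midazolam', True)]
```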
189

Advanced natural language processing and temporal mining for clinical discovery

Mehrabi, Saeed 17 August 2015 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) following the HITECH Act of 2009. It is estimated that around 80% of the clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered opportunities to extract information from unstructured clinical texts needed for various clinical applications. A popular method for enabling secondary uses of EHRs is information or concept extraction, a subtask of NLP that seeks to locate and classify elements within text based on context. Extracting clinical concepts without considering the context has many complications, including inaccurate diagnosis of patients and contamination of study cohorts. Identifying the negation status of a concept and whether it refers to the patient or to a family member are two of the challenges in context detection. A negation algorithm called Dependency Parser Negation (DEEPEN) was developed in this research by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford Dependency Parser. The study results demonstrate that DEEPEN can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. Additionally, an NLP system consisting of section segmentation and relation discovery was developed to identify patients' family history. To assess the generalizability of the negation and family history algorithms, data from a different clinical institution were used in both evaluations.
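DEEPEN itself builds on the Stanford Dependency Parser; purely to illustrate the idea of checking dependency relations between a negation cue and a concept, a rough spaCy-based sketch (not the DEEPEN algorithm) could be:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def is_negated(sentence, concept):
    """Rough check: is the concept token, or its syntactic head, modified by
    a 'neg' dependency? DEEPEN's actual rules are considerably richer."""
    doc = nlp(sentence)
    for token in doc:
        if token.text.lower() == concept.lower():
            for head in (token, token.head):
                if any(child.dep_ == "neg" for child in head.children):
                    return True
    return False

print(is_negated("The patient does not have a fever.", "fever"))  # True
print(is_negated("The patient has a fever.", "fever"))            # False
```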
190

Query Segmentation For E-Commerce Sites

Gong, Xiaojing 12 July 2013 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Query segmentation is a component of natural language processing that analyzes users' queries and divides them into separate phrases. Published work on query segmentation focuses on web search using the Google n-gram frequency corpus or on text retrieval from relational databases. However, this module is also useful in the E-Commerce domain for product search. In this thesis, we discuss query segmentation in the context of E-Commerce. We propose a hybrid unsupervised segmentation methodology based on a prefix tree, mutual information, and relative frequency counts to score adjacent query-word pairs, and we use Wikipedia for new-word recognition. Furthermore, we use two E-Commerce-specific evaluation methods to quantify the accuracy of our query segmentation method.
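A toy sketch of scoring adjacent query-word pairs with pointwise mutual information and splitting where the score is low; the frequency counts and threshold are invented for illustration and stand in for counts drawn from a product-title corpus:

```python
import math
from collections import Counter

# Toy frequency counts standing in for counts from a product-title corpus.
UNIGRAMS = Counter({"red": 120, "apple": 200, "iphone": 300, "case": 250, "13": 180})
BIGRAMS = Counter({("iphone", "13"): 150, ("13", "case"): 60, ("red", "apple"): 5})
TOTAL = sum(UNIGRAMS.values())

def pmi(w1, w2):
    """Pointwise mutual information of an adjacent word pair (add-one smoothed)."""
    p12 = (BIGRAMS[(w1, w2)] + 1) / TOTAL
    p1, p2 = UNIGRAMS[w1] / TOTAL, UNIGRAMS[w2] / TOTAL
    return math.log(p12 / (p1 * p2))

def segment(query, threshold=0.3):
    """Insert a break between adjacent words whose PMI falls below threshold."""
    words = query.lower().split()
    segments, current = [], [words[0]]
    for prev, word in zip(words, words[1:]):
        if pmi(prev, word) >= threshold:
            current.append(word)
        else:
            segments.append(" ".join(current))
            current = [word]
    segments.append(" ".join(current))
    return segments

print(segment("red iphone 13 case"))
# ['red', 'iphone 13 case']
```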
