341 |
Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building
Cui, Hong; Xu, Dongfang; Chong, Steven S.; Ramirez, Martin; Rodenhausen, Thomas; Macklin, James A.; Ludäscher, Bertram; Morris, Robert A.; Soto, Eduardo M.; Koch, Nicolás Mongiardino. 17 November 2016.
Background: Taxonomic descriptions are traditionally composed in natural language and published in a format that cannot be directly used by computers. The Exploring Taxon Concepts (ETC) project has been developing a set of web-based software tools that convert morphological descriptions published in telegraphic style into character data that can be reused and repurposed. This paper introduces the first semi-automated pipeline, to our knowledge, that converts morphological descriptions into taxon-character matrices to support systematics and evolutionary biology research. We then demonstrate and evaluate the use of the ETC Input Creation - Text Capture - Matrix Generation pipeline to generate body part measurement matrices from a set of 188 spider morphological descriptions and report the findings. Results: From the given set of spider taxonomic publications, two versions of input (original and normalized) were generated and used by the ETC Text Capture and ETC Matrix Generation tools. The tools produced two corresponding spider body part measurement matrices, and the matrix from the normalized input was found to be much more similar to a gold standard matrix hand-curated by the scientist co-authors. The lower performance on the original input was attributed to special conventions used in the original descriptions (e.g., the omission of measurement units). The results show that simple normalization of the description text greatly increased the quality of the machine-generated matrix and reduced the editing effort. The machine-generated matrix also helped identify issues in the gold standard matrix. Conclusions: ETC Text Capture and ETC Matrix Generation are low-barrier and effective tools for extracting measurement values from spider taxonomic descriptions, and they are more effective when the descriptions are self-contained.
Special conventions that make the description text less self-contained challenge the automated extraction of data from biodiversity descriptions and hinder the automated reuse of the published knowledge. The tools will be updated to support the new requirements revealed in this case study.
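The core extraction step can be sketched as a pattern-matching pass over a telegraphic description. Below is a minimal illustration in Python; the part names and phrasing are invented, the real ETC Text Capture grammar is far richer, and values are assumed to be in millimetres, as in the normalized input discussed above:

```python
import re

# Hypothetical measurement pattern; invented part names, not the ETC grammar.
MEASURE = re.compile(
    r"(?P<part>total length|carapace|abdomen)\s+(?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_measurements(description: str) -> dict:
    """Return one matrix row: {body part: measurement} for one description."""
    return {
        m.group("part").lower(): float(m.group("value"))
        for m in MEASURE.finditer(description)
    }

row = extract_measurements("Total length 5.2. Carapace 2.1 long. Abdomen 3.0.")
```

Rows like this, one per taxon, would then be assembled into the taxon-character measurement matrix.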
|
342 |
Identifying Genetic Pleiotropy through a Literature-wide Association Study (LitWAS) and a Phenotype Association Study (PheWAS) in the Age-related Eye Disease Study 2 (AREDS2)
Simmons, Michael. 26 May 2017.
A Thesis submitted to The University of Arizona College of Medicine - Phoenix in partial fulfillment of the requirements for the Degree of Doctor of Medicine. / Genetic association studies simplify the investigation of genotype-phenotype relationships by considering only the presence of a given polymorphism and the presence or absence of a given downstream phenotype. Although such associations do not indicate causation, collections of phenotypes sharing association with a single genetic polymorphism may provide valuable mechanistic insights. In this thesis we explore such genetic pleiotropy with Deep Phenotype Association Studies (DeePAS) using data from the Age-Related Eye Disease Study 2 (AREDS2). We also employ a novel text mining approach to extract pleiotropic associations from the published literature as a hypothesis generation mechanism. Is it possible to identify pleiotropic genetic associations across multiple published abstracts and validate these in data from AREDS2? Data from the AREDS2 trial include 123 phenotypes, including AMD features, other ocular conditions, cognitive function, and cardiovascular, neurological, gastrointestinal, and endocrine disease. A previously validated relationship extraction algorithm was used to isolate descriptions of genetic associations with these phenotypes in MEDLINE abstracts. Results were filtered to exclude negated findings and to normalize variant mentions. Genotype data were available for 1826 AREDS2 participants. A DeePAS was performed by evaluating the association between selected SNPs and all available phenotypes. Associations that remained significant after Bonferroni correction were replicated in AREDS. The LitWAS analysis identified 9372 SNPs with literature support for at least two distinct phenotypes, with an average of 3.1 phenotypes per SNP.
The PheWAS analyses revealed that two variants of the ARMS2-HTRA1 locus at 10q26, rs10490924 and rs3750846, were significantly associated with sub-retinal hemorrhage in AMD (rs3750846 OR 1.79 (1.41-2.27), p = 1.17×10⁻⁷). This association remained significant even in the subpopulation of participants with neovascular AMD. Furthermore, odds ratios for the development of sub-retinal hemorrhage in the presence of the rs3750846 SNP were similar between the incident and prevalent AREDS2 sub-populations (OR: 1.94 vs 1.75). This association was also replicated in data from the AREDS trial. No literature-defined pleiotropic association tested remained significant after multiple-testing correction. The rs3750846 variant of the ARMS2-HTRA1 locus is associated with sub-retinal hemorrhage. Automatic literature mining, when paired with clinical data, is a promising method for exploring genotype-phenotype relationships.
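The statistical screen behind a PheWAS of this kind reduces to an odds ratio per SNP-phenotype pair plus a multiple-testing correction. A minimal sketch, with invented counts chosen only to illustrate the arithmetic (these are not AREDS2 data):

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR for a 2x2 table: a, b = carriers with/without the outcome;
    c, d = non-carriers with/without the outcome."""
    return (a * d) / (b * c)

def bonferroni_significant(p: float, n_tests: int, alpha: float = 0.05) -> bool:
    """Bonferroni correction: compare p against alpha / number of tests."""
    return p < alpha / n_tests

# Invented 2x2 counts, picked so the arithmetic is easy to follow
or_est = odds_ratio(90, 50, 60, 60)         # (90*60)/(50*60) = 1.8
sig = bonferroni_significant(1.17e-7, 123)  # 123 phenotypes were tested
```

With 123 phenotypes, the Bonferroni threshold is 0.05/123 ≈ 4.1×10⁻⁴, so a p-value of 1.17×10⁻⁷ survives comfortably.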
|
343 |
A Web Personalization Artifact for Utility-Sensitive Review Analysis
Flory, Long, Mrs. 01 January 2015.
Online customer reviews are web content voluntarily posted by the users of a product (e.g. a camera) or service (e.g. a hotel) to express their opinions about it. Online reviews are important resources for businesses and consumers. This dissertation focuses on the important consumer concern of review utility, i.e., the helpfulness or usefulness of online reviews in informing consumer purchase decisions. Review utility matters to consumers because not all online reviews are useful or helpful, and the quantity of online reviews for a product/service tends to be very large. Manual assessment of review utility is not only time-consuming but also overwhelming given the volume of information involved. To address this issue, review helpfulness research (RHR) has become a very active research stream dedicated to studying utility-sensitive review analysis (USRA) techniques for automating review utility assessment.
Unfortunately, prior RHR solutions are inadequate, and RHR researchers have called for more suitable USRA approaches. Our research responds to this call by addressing the research problem: what is an adequate USRA approach? We address this problem by offering novel Design Science (DS) artifacts for personalized USRA (PUSRA). Our proposed solution extends not only RHR research but also web personalization research (WPR), which studies web-based solutions for personalized web provision. We have evaluated the proposed solution by applying three evaluation methods: analytical, descriptive, and experimental. The evaluations corroborate the practical efficacy of our proposed solution.
This research contributes what we believe to be (1) the first DS artifacts in the knowledge body of RHR and WPR, and (2) the first PUSRA contribution to USRA practice. Moreover, we consider our evaluations of the proposed solution the first comprehensive assessment of USRA solutions. In addition, this research contributes to the advancement of decision support research and practice. The proposed solution is a web-based decision support artifact with the capability to substantially improve accurate personalized webpage provision. Website designers can also apply our solution to fundamentally transform their work, which can add substantial value to businesses.
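To make "personalized utility-sensitive scoring" concrete, here is a toy stand-in: rank reviews by overlap with a user's interest terms plus a capped length prior. The scoring function is entirely invented for illustration; the dissertation's actual DS artifacts are not described at this level of detail:

```python
def utility_score(review: str, user_interests: set) -> float:
    """Toy personalized utility score: interest-term overlap plus a
    capped length prior. An invented sketch, not the PUSRA artifact."""
    words = review.lower().split()
    overlap = sum(1 for w in words if w in user_interests)
    return overlap + min(len(words), 50) / 50.0

reviews = [
    "Great battery life and a sharp lens for the price",
    "ok",
]
ranked = sorted(reviews, key=lambda r: utility_score(r, {"battery", "lens"}),
                reverse=True)
```

A real personalized system would learn such weights from user behavior rather than hand-coding them, but the ranking shape (score each review against a user profile, sort) is the same.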
|
344 |
Vulnerability Reports Analysis and Management
Domány, Dušan. January 2011.
Various vulnerabilities in software products can represent a significant security threat if they are discovered by malicious attackers. It is therefore important to identify these vulnerabilities and report their presence to the responsible parties before they are exploited. The number of security reports about discovered vulnerabilities in various software products has grown rapidly over the last decade, and it is becoming more and more difficult to process all of the incoming reports manually. This work discusses various methods that can be used to automate several important processes in collecting and sorting the reports. The reports are analyzed in various ways, including text mining techniques, and the results of the analysis are applied in the form of a practical implementation.
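One of the simplest automations in this space, flagging near-duplicate incoming reports, can be sketched with bag-of-words cosine similarity. The report texts below are invented examples:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two report texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

r1 = "buffer overflow in http parser"
r2 = "http parser buffer overflow reported"
r3 = "sql injection in login form"
near_duplicate = cosine(r1, r2) > cosine(r1, r3)
```

Production triage systems would add TF-IDF weighting, stemming, and thresholds tuned on labeled report pairs, but the grouping principle is the same.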
|
345 |
Abordagem simbólica de aprendizado de máquina na recuperação automática de artigos científicos a partir da web / A symbolic machine learning approach for the automatic retrieval of scientific articles from the web
Brasil, Christiane Regina Soares. 07 April 2006.
Atualmente, devido ao incessante aumento dos documentos científicos disponíveis na rede mundial de computadores, as ferramentas de busca tornaram-se um importante auxílio para recuperação de informação a partir da Internet em todas as áreas de conhecimento para pesquisadores e usuários. Entretanto, as atuais ferramentas de busca disponíveis selecionam uma enorme lista de páginas, cabendo ao usuário a tarefa final de escolher aquelas que realmente são relevantes a sua pesquisa. Assim, é importante o desenvolvimento de técnicas e ferramentas que não apenas retornem uma lista de possíveis documentos relacionados com a consulta apresentada pelo usuário, mas que organizem essa informação de acordo com o conteúdo de tais documentos, e apresentem o resultado da busca em uma representação gráfica que auxilie a exploração e o entendimento geral dos documentos recuperados. Neste contexto, foi proposto o projeto de uma Ferramenta Inteligente de Apoio à Pesquisa (FIP), do qual este trabalho é parte. O objetivo deste trabalho é analisar estratégias de recuperação automática de artigos científicos sobre uma determinada área de pesquisa a partir da Web, que poderá ser adotada pelo módulo de recuperação da FIP. Neste trabalho são considerados artigos escritos em inglês, no formato PDF, abrangendo as áreas da Ciência da Computação. Corpora de treino e teste foram usados para avaliação das abordagens simbólicas de Aprendizado de Máquina na indução de regras que poderão ser inseridas em um crawler inteligente para recuperação automática de artigos dessas áreas. Diversos experimentos foram executados para definir parâmetros de pré-processamento apropriados ao domínio, bem como para definir a melhor estratégia de aplicação das regras induzidas e do melhor algoritmo simbólico de indução.
/ Today, due to the increase of scientific documents available on the World Wide Web, search tools have become an important aid for information retrieval from the Internet in all fields of knowledge for researchers and users. However, the search tools currently available, in general, select a huge list of pages, leaving the user with the final task of choosing those pages that actually fit their research. It is important to develop techniques and tools that return a list of documents related to the query made by the user in accordance with the content of such documents, and then present the result in a meaningful graphical representation with the aim of improving the exploration and understanding of the retrieved articles. In this context, a project of an Intelligent Tool for Research Supporting (FIP) was proposed. This MSc work is part of this project. The objective of this work is to analyze strategies for the automatic retrieval of scientific articles of a specific field from the Web. Such a strategy must fit the requirements of the retrieval module of the FIP. In this work, articles written in English, in PDF format, covering the fields of Computer Science, were considered. Training and testing corpora were used to evaluate the symbolic approaches of Machine Learning in the induction of rules. These rules could be embedded in an intelligent crawler for the automatic retrieval of articles in the chosen fields. Several experiments have been carried out in order to define parameters such as attribute weights, cut-off points, and domain stopwords, as well as the best strategy to apply the rules for the categorization of the articles and the best symbolic algorithm to induce the rules.
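A hypothetical sketch of how induced symbolic rules might be applied inside such a crawler to keep or discard a candidate document's extracted text. The rules below are invented placeholders; in the thesis they are induced from the training corpora:

```python
# Each rule is a set of terms that must all occur in the document text.
# Invented placeholder rules, standing in for the machine-induced ones.
RULES = [
    {"abstract", "references"},
    {"introduction", "conclusion"},
]

def is_relevant(text: str) -> bool:
    """Keep a candidate document if any rule fires on its token set."""
    tokens = set(text.lower().split())
    return any(rule <= tokens for rule in RULES)

keep = is_relevant("Abstract We study crawling ... References 1. Smith 2004")
```

The disjunction-of-conjunctions shape shown here is exactly what rule-induction algorithms such as decision lists produce, which is why their output plugs into a crawler so directly.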
|
346 |
Fatoração de matrizes no problema de coagrupamento com sobreposição de colunas / Matrix factorization for overlapping columns coclustering
Brunialti, Lucas Fernandes. 31 August 2016.
Coagrupamento é uma estratégia para análise de dados capaz de encontrar grupos de dados, então denominados cogrupos, que são formados considerando subconjuntos diferentes das características descritivas dos dados. Contextos de aplicação caracterizados por apresentar subjetividade, como mineração de texto, são candidatos a serem submetidos à estratégia de coagrupamento; a flexibilidade em associar textos de acordo com características parciais representa um tratamento adequado a tal subjetividade. Um método para implementação de coagrupamento capaz de lidar com esse tipo de dados é a fatoração de matrizes. Nesta dissertação de mestrado são propostas duas estratégias para coagrupamento baseadas em fatoração de matrizes não-negativas, capazes de encontrar cogrupos organizados com sobreposição de colunas em uma matriz de valores reais positivos. As estratégias são apresentadas em termos de suas definições formais e seus algoritmos para implementação. Resultados experimentais quantitativos e qualitativos são fornecidos a partir de problemas baseados em conjuntos de dados sintéticos e em conjuntos de dados reais, sendo esses últimos contextualizados na área de mineração de texto. Os resultados são analisados em termos de quantização do espaço e capacidade de reconstrução, capacidade de agrupamento utilizando as métricas índice de Rand e informação mútua normalizada e geração de informação (interpretabilidade dos modelos). Os resultados confirmam a hipótese de que as estratégias propostas são capazes de descobrir cogrupos com sobreposição de forma natural, e que tal organização de cogrupos fornece informação detalhada, e portanto de valor diferenciado, para as áreas de análise de agrupamento e mineração de texto / Coclustering is a data analysis strategy which is able to discover data clusters, known as coclusters. This technique allows data to be clustered based on different subsets defined by data descriptive features. 
Application contexts characterized by subjectivity, such as text mining, are candidates for the coclustering strategy due to the flexibility to associate documents according to partial features. Coclustering can be implemented by means of matrix factorization, which is suitable for handling this type of data. In this thesis, two coclustering strategies based on non-negative matrix factorization are proposed. These strategies are able to find column-overlapping coclusters in a given dataset of positive data, and they are presented in terms of their formal definitions as well as their algorithms' implementation. Quantitative and qualitative experimental results are presented for synthetic datasets and for real datasets contextualized in text mining. The analysis covers space quantization and reconstruction capability, clustering capability, and generated information (interpretability of the models). The well-known external metrics Rand index and normalized mutual information are used to analyze clustering capability. The results confirm the hypothesis that the proposed strategies are able to discover overlapping coclusters naturally. Moreover, the coclusters produced by the new algorithms provide detailed information and are thus valuable for future research in cluster analysis and text mining.
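The factorization machinery can be illustrated with plain multiplicative-update NMF (the classic Lee-Seung rule), shown here in dependency-free Python as a simplified stand-in for the overlapping-coclustering variants proposed in the thesis:

```python
import random

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, iters=300, seed=0):
    """Basic multiplicative-update NMF: V (n x m) ~= W (n x k) @ H (k x m).
    The thesis' strategies extend this to yield column-overlapping coclusters."""
    rnd = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rnd.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rnd.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + 1e-9) for j in range(m)]
             for i in range(k)]
        HT = transpose(H)
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + 1e-9) for j in range(k)]
             for i in range(n)]
    return W, H

V = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W, H = nmf(V, k=2)
R = matmul(W, H)
err = sum((V[i][j] - R[i][j]) ** 2 for i in range(3) for j in range(3)) ** 0.5
```

In a coclustering reading, the rows of H indicate which columns each latent factor uses, and a column contributing to more than one factor is exactly the column overlap the thesis studies.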
|
347 |
Evidence-based software engineering: systematic literature review process based on visual text mining / Engenharia de software baseada em evidências: processo de revisão sistemática de literatura baseado em mineração visual de texto
Scannavino, Katia Romero Felizardo. 15 May 2012.
Context: A systematic literature review (SLR) is a methodology used to aggregate all relevant evidence for a specific research question. One of the activities associated with the SLR process is the selection of primary studies. This selection can be arduous, particularly when the researcher faces large volumes of primary studies. Another activity associated with an SLR is the presentation of the results of the primary studies that meet the SLR purpose. The results are generally summarized in tables, and an alternative that reduces the time needed to understand the data is the use of graphical representations. Systematic mapping (SM) is a more open form of SLR used to build a classification and categorization scheme for a field of interest. The categorization and classification activities in SM are not trivial tasks, since they require manual effort and domain knowledge from the reviewers to achieve adequate results. Although clearly crucial, both the SLR and SM processes are time-consuming, and most activities are conducted manually. Objective: The aim of this research is to use Visual Text Mining (VTM) to support different activities of the SLR and SM processes, e.g., the selection of primary studies, the presentation of SLR results, and the categorization and classification in an SM. Method: Extensions to the SLR and SM processes based on VTM were proposed. A series of case studies were conducted to demonstrate the usefulness of VTM techniques in the selection, review, presentation of results, and categorization contexts. Results: The findings showed that the application of VTM is promising in terms of providing positive support to the study selection activity, and that visual representations of SLR data led to a reduction in the time taken for their analysis, with no loss of data comprehensibility. The application of VTM is also relevant in the context of SM.
Conclusions: VTM techniques can be successfully employed to assist the SLR and SM processes / Contexto: Revisão Sistemática (RS) é uma metodologia utilizada para reunir evidências sobre uma questão de pesquisa específica. Uma das atividades associadas à RS é a seleção de estudos primários. Quando o pesquisador se depara com grandes volumes de estudos, torna-se difícil selecionar artigos relevantes para uma análise mais aprofundada. Outra atividade associada à RS é a apresentação dos resultados dos estudos primários que atendem aos propósitos da RS. Os resultados são geralmente resumidos em tabelas e uma alternativa para reduzir o tempo consumido para entender os dados é o uso de representações gráficas. Mapeamento sistemático (MS) é uma forma mais aberta de RS, usado para construir um esquema de classificação e categorização sobre uma área de interesse. As atividades de categorização e classificação no MS não são tarefas triviais, pois exigem um esforço manual e conhecimento do domínio por parte dos revisores para a geração de resultados adequados. Embora relevantes, ambos os processos de RS e MS são demorados e muitas das atividades são realizadas manualmente. Objetivo: O objetivo desta pesquisa é a utilização de Mineração Visual de Texto (VTM) para apoiar as diferentes atividades dos processos de RS e MS como, por exemplo, suporte à seleção de estudos primários, apresentação de resultados de RSs e a categorização e classificação em MSs. Métodos: Foram propostas extensões para os processos de RS e MS com base em VTM. Uma série de estudos de caso foram realizados para demonstrar a utilidade de técnicas VTM no contexto de seleção, revisão, apresentação de resultados e categorização.
Resultados: Os resultados mostraram que a aplicação de VTM é promissora em termos de apoio positivo para a atividade de seleção de estudos primários e que o uso de representações visuais para apresentar resultados de RSs leva a uma redução do tempo necessário para sua análise, sem perda de compreensão de dados. A aplicação da VTM é relevante também no contexto do MS. Conclusões: Técnicas VTM podem ser empregadas com sucesso para ajudar nos processos de RS e MS.
|
348 |
O efeito do uso de diferentes formas de extração de termos na compreensibilidade e representatividade dos termos em coleções textuais na língua portuguesa / The effect of using different forms of term extraction on the comprehensibility and representativeness of terms in Portuguese text collections
Conrado, Merley da Silva. 10 September 2009.
A extração de termos em coleções textuais, que é uma atividade da etapa de Pré-Processamento da Mineração de Textos, pode ser empregada para diversos fins nos processos de extração de conhecimento. Esses termos devem ser cuidadosamente extraídos, uma vez que os resultados de todo o processo dependerão, em grande parte, da "qualidade" dos termos obtidos. A "qualidade" dos termos, neste trabalho, abrange tanto a representatividade dos termos no domínio em questão como sua compreensibilidade. Tendo em vista sua importância, neste trabalho, avaliou-se o efeito do uso de diferentes técnicas de simplificação de termos na compreensibilidade e representatividade dos termos em coleções textuais na Língua Portuguesa. Os termos foram extraídos seguindo os passos da metodologia apresentada neste trabalho e as técnicas utilizadas durante essa atividade de extração foram a radicalização, lematização e substantivação. Para apoiar tal metodologia, foi desenvolvida uma ferramenta, a ExtraT (Ferramenta para Extração de Termos). Visando garantir a "qualidade" dos termos extraídos, os mesmos são avaliados objetiva e subjetivamente. As avaliações subjetivas, ou seja, com o auxílio de especialistas do domínio em questão, abrangem a representatividade dos termos em seus respectivos documentos, a compreensibilidade dos termos obtidos ao utilizar cada técnica e a preferência geral subjetiva dos especialistas em cada técnica. As avaliações objetivas, que são auxiliadas por uma ferramenta desenvolvida (a TaxEM - Taxonomia em XML da Embrapa), levam em consideração a quantidade de termos extraídos por cada técnica, além de abranger também a representatividade dos termos extraídos a partir de cada técnica em relação aos seus respectivos documentos. Essa avaliação objetiva da representatividade dos termos utiliza como suporte a medida CTW (Context Term Weight). Oito coleções de textos reais do domínio de agronegócio foram utilizadas na avaliação experimental.
Como resultado foram indicadas algumas das características positivas e negativas da utilização das técnicas de simplificação de termos, mostrando que a escolha pelo uso de alguma dessas técnicas para o domínio em questão depende do objetivo principal pré-estabelecido, que pode ser desde a necessidade de se ter termos compreensíveis para o usuário até a necessidade de se trabalhar com uma menor quantidade de termos / The task of term extraction in textual domains, which is a subtask of text pre-processing in Text Mining, can be used for many purposes in knowledge extraction processes. These terms must be carefully extracted, since their quality will have a high impact on the results. In this work, the quality of these terms involves both their representativity in the specific domain and their comprehensibility. Considering this high importance, in this work the effects produced on the comprehensibility and representativity of terms were evaluated when different term simplification techniques are applied to text collections in Portuguese. The term extraction process follows the methodology presented in this work, and the techniques used were radicalization (stemming), lematization and substantivation. To support this methodology, a term extraction tool was developed and is presented as ExtraT. In order to guarantee the quality of the extracted terms, they were evaluated in an objective and subjective way. The subjective evaluations, assisted by domain specialists, analyze the representativity of the terms in the related documents, the comprehensibility of the terms obtained with each technique, and the specialists' overall preference for each technique. The objective evaluations, which are assisted by TaxEM and by Thesagro (National Agricultural Thesaurus), consider the number of terms extracted by each technique and their representativity in the related documents. This objective evaluation of representativity uses the CTW measure (Context Term Weight) as support.
Eight real text collections from the agribusiness domain were used in the experimental evaluation. As a result, some positive and negative characteristics of each technique were pointed out, showing that the best choice of technique for this domain depends on the main pre-established goal, which can range from the need for terms that are comprehensible to the user to the need to work with a smaller number of terms.
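The effect that term simplification has on term counts can be illustrated with a crude suffix stripper. English suffixes are used here for readability; the thesis applies proper Portuguese radicalization, lemmatization, and substantivation:

```python
def toy_stem(word: str) -> str:
    """Crude suffix stripping, a toy stand-in for real radicalization."""
    for suffix in ("ing", "ion", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

raw_terms = ["extract", "extracts", "extracted", "extracting", "term", "terms"]
stems = {toy_stem(w) for w in raw_terms}
# Simplification shrinks the vocabulary, at the cost of producing stems
# that may be less comprehensible to domain experts.
reduction = len(raw_terms) - len(stems)
```

This is precisely the trade-off measured in the evaluation above: fewer terms to process versus terms that remain readable to specialists.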
|
349 |
Extraction d'information spatiale à partir de données textuelles non-standards / Spatial information extraction from non-standard textual data
Zenasni, Sarah. 05 January 2018.
L’extraction d’information spatiale à partir de données textuelles est désormais un sujet de recherche important dans le domaine du Traitement Automatique du Langage Naturel (TALN). Elle répond à un besoin devenu incontournable dans la société de l’information, en particulier pour améliorer l’efficacité des systèmes de Recherche d’Information (RI) pour différentes applications (tourisme, aménagement du territoire, analyse d’opinion, etc.). De tels systèmes demandent une analyse fine des informations spatiales contenues dans les données textuelles disponibles (pages web, courriels, tweets, SMS, etc.). Cependant, la multitude et la variété de ces données ainsi que l’émergence régulière de nouvelles formes d’écriture rendent difficile l’extraction automatique d’information à partir de corpus souvent peu standards d’un point de vue lexical voire syntaxique. Afin de relever ces défis, nous proposons, dans cette thèse, des approches originales de fouille de textes permettant l’identification automatique de nouvelles variantes d’entités et relations spatiales à partir de données textuelles issues de la communication médiée. Ces approches sont fondées sur trois principales contributions qui sont cruciales pour fournir des méthodes de navigation intelligente. Notre première contribution se concentre sur la problématique de reconnaissance et d’extraction des entités spatiales à partir de corpus de messages courts (SMS, tweets) marqués par une écriture peu standard. La deuxième contribution est dédiée à l’identification de nouvelles formes/variantes de relations spatiales à partir de ces corpus spécifiques. Enfin, la troisième contribution concerne l’identification des relations sémantiques associées à l’information spatiale contenue dans les textes. Les évaluations menées sur des corpus réels, principalement en français (SMS, tweets, presse), soulignent l’intérêt de ces contributions.
Ces dernières permettent d’enrichir la typologie des relations spatiales définies dans la communauté scientifique et, plus largement, de décrire finement l’information spatiale véhiculée dans les données textuelles non standards issues d’une communication médiée aujourd’hui foisonnante. / The extraction of spatial information from textual data has become an important research topic in the field of Natural Language Processing (NLP). It meets a crucial need in the information society, in particular, to improve the efficiency of Information Retrieval (IR) systems for different applications (tourism, spatial planning, opinion analysis, etc.). Such systems require a detailed analysis of the spatial information contained in the available textual data (web pages, e-mails, tweets, SMS, etc.). However, the multitude and the variety of these data, as well as the regular emergence of new forms of writing, make the automatic extraction of information from such corpora difficult. To meet these challenges, we propose, in this thesis, new text mining approaches allowing the automatic identification of variants of spatial entities and relations from textual data of mediated communication. These approaches are based on three main contributions that provide intelligent navigation methods. Our first contribution focuses on the problem of recognition and identification of spatial entities from short message corpora (SMS, tweets) characterized by weakly standardized modes of writing. The second contribution is dedicated to the identification of new forms/variants of spatial relations from these specific corpora. Finally, the third contribution concerns the identification of the semantic relations associated with the textual spatial information.
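Recognizing spatial-entity variants in noisy SMS text can be sketched as token normalization plus a variant dictionary feeding a gazetteer lookup. The gazetteer, variant spellings, and sample message below are all invented for illustration:

```python
import re

GAZETTEER = {"montpellier", "paris"}                         # toy gazetteer
VARIANTS = {"mtp": "montpellier", "montpel": "montpellier"}  # invented spellings

def normalize(token: str) -> str:
    """Lowercase and strip punctuation/digits, keeping accented letters."""
    return re.sub(r"[^a-zà-ÿ]", "", token.lower())

def find_places(sms: str) -> list:
    """Map each token through normalization and the variant dictionary,
    then keep those found in the gazetteer."""
    places = []
    for tok in sms.split():
        t = VARIANTS.get(normalize(tok), normalize(tok))
        if t in GAZETTEER:
            places.append(t)
    return places

hits = find_places("suis a MTP!! puis paris demain")
```

The hard part addressed by the thesis is discovering entries like "mtp" automatically from corpora rather than listing them by hand, as done here.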
|
350 |
Aplicações de mineração de textos na gestão de operações / Applications of Text Mining Techniques in Operations Management
Lucini, Filipe Rissieri. January 2018.
A presente tese apresenta proposições para o desenvolvimento e aplicação de técnicas de mineração de textos, de modo a contribuir para a gestão de operações nas áreas médicas e de negócios. Os objetivos desta tese são: (i) identificar e estruturar técnicas de mineração de texto, de modo a elaborar um método para prever internações de pacientes provenientes de emergências hospitalares, tendo como base somente os registros textuais não estruturados escritos por médicos durante o primeiro encontro médico-paciente; (ii) comparar previsões realizadas pelo método proposto no objetivo (i) com análises médicas realizadas por humanos, de modo a verificar se computadores podem atuar de forma autônoma na tarefa de previsão de internações de pacientes provenientes de emergências hospitalares; e (iii) identificar e estruturar técnicas de mineração de texto, de modo a elaborar um método para prever a satisfação de clientes de companhias aéreas, tendo como base as avaliações escritas e publicadas por passageiros na internet. Os métodos propostos utilizaram diferentes técnicas de mineração de textos, sendo validados por estudos de caso. Em relação à área médica, o método proposto pode realizar previsões em tempo real sobre a necessidade de leitos, ajudando as equipes de gerenciamento de leitos a melhorar os processos de fluxo de pacientes. Além disso, verificou-se que tanto médicos (iniciantes ou experientes), quanto máquina, tiveram desempenhos semelhantes na tarefa de previsão de internação de pacientes. Já em relação à área de negócios, o método proposto permitiu extrair dimensões de satisfação de avaliações online, além dos sentimentos associados a elas, considerando diferentes perfis de passageiros, serviços e períodos de tempo. Desta forma, foi possível prever a recomendação de companhias aéreas baseado nas avaliações escritas por passageiros. 
/ This dissertation presents propositions for the development and application of text mining techniques, in order to contribute to operations management in the medical and business areas. The objectives of this dissertation are: (i) to identify and structure text mining techniques in order to propose a method to predict admissions of patients arriving at hospital emergency departments, based only on the unstructured textual records written by physicians during the first encounter with patients; (ii) to compare predictions made by the method proposed in objective (i) with medical analyses carried out by humans, in order to verify whether computers can work autonomously in predicting hospitalizations of patients arriving at hospital emergency departments; and (iii) to identify and structure text mining techniques to develop a method for predicting airline customer satisfaction based on online customer reviews. The proposed methods used different text mining techniques and were validated by case studies. Regarding the medical area, the proposed method was able to perform real-time forecasts of the need for beds, helping bed management teams to improve patient flow processes. In addition, it was found that both physicians (novice or experienced) and the machine had similar performance in predicting patient hospitalization. In relation to the business area, the proposed method allowed the extraction of satisfaction dimensions from online customer reviews, as well as the sentiments associated with them, considering different profiles of passengers, services, and time periods. It also enabled the prediction of airline recommendation based on online customer reviews.
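The admission-prediction idea can be sketched as a tiny bag-of-words Naive Bayes classifier over free-text notes. The notes and labels below are invented examples, not hospital data, and the dissertation's actual method is considerably more elaborate:

```python
import math
from collections import Counter

def train(labeled_notes):
    """Count word occurrences per class, plus class priors."""
    counts = {label: Counter() for _, label in labeled_notes}
    priors = Counter(label for _, label in labeled_notes)
    for text, label in labeled_notes:
        counts[label].update(text.lower().split())
    return counts, priors

def predict(text, counts, priors):
    """Laplace-smoothed Naive Bayes class decision."""
    vocab = {w for cnt in counts.values() for w in cnt}
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for label, cnt in counts.items():
        lp = math.log(priors[label] / total)
        denom = sum(cnt.values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((cnt[w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

notes = [
    ("severe chest pain dyspnea", "admit"),
    ("acute respiratory distress hypoxia", "admit"),
    ("mild headache resolved", "discharge"),
    ("minor sprain stable", "discharge"),
]
counts, priors = train(notes)
label = predict("chest pain and hypoxia", counts, priors)
```

Trained on thousands of real physician notes instead of four toy ones, the same pipeline shape (tokenize, count, score classes) supports the real-time bed-demand forecasts described above.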
|