Global ETD Search

571	Learning with Sparsity: Structures, Optimization and Applications Chen, Xi 01 July 2013
The development of modern information technology has made it possible to collect data of unprecedented size and complexity. Examples include web text data, microarray and proteomics data, and data from scientific domains such as meteorology. When learning from such high-dimensional and complex data, traditional machine learning techniques often suffer from the curse of dimensionality and unaffordable computational cost. Yet learning from large-scale high-dimensional data promises big payoffs in text mining, gene analysis, and numerous other consequential tasks. Recently developed sparse learning techniques provide a suite of tools for understanding and exploring high-dimensional data from many areas of science and engineering. By exploiting sparsity, we can learn a parsimonious and compact model that is more interpretable and computationally tractable at application time. When the underlying model is known to be sparse, sparse learning methods can provide a more consistent model and much improved prediction performance. However, existing methods are still insufficient for modeling complex or dynamic structures in the data, such as those found in pathways of genomic data, gene regulatory networks, and synonyms in text data. This thesis develops structured sparse learning methods, along with scalable optimization algorithms, to explore and predict high-dimensional data with complex structures. In particular, it addresses three aspects of structured sparse learning:
1. Efficient and scalable optimization methods with fast convergence guarantees for a wide spectrum of high-dimensional learning tasks, including single-task and multi-task structured regression, canonical correlation analysis, and online sparse learning.
2. Learning dynamic structures of different types of undirected graphical models, e.g., conditional Gaussian or conditional forest graphical models.
3. Demonstrating the usefulness of the proposed methods in various applications, e.g., computational genomics and spatial-temporal climatological data.
In addition, the thesis designs specialized sparse learning methods for text mining applications, including ranking and latent semantic analysis. The last part of the thesis presents future directions for high-dimensional structured sparse learning from both computational and statistical perspectives.
Machine Learning; Sparse Learning; Optimization; Structure Regression; Multi-task Regression; Canonical Correlation Analysis; Undirected Graphical Models; First-order Method; Stochastic Optimization; Text Mining; Ranking; Latent Semantic Analysis; Spatial-temporal Data; Computational Genomics; Computer Sciences
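As a rough illustration of what exploiting sparsity with a first-order method can look like, the sketch below fits an L1-penalized least-squares model (the lasso) by proximal gradient descent. It is a generic textbook technique, not the algorithm developed in the thesis of entry 571; the function names, the toy data, and the values of `lam` and `step` are all assumptions chosen for the example.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink `value` toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_ista(X, y, lam, step, iterations=500):
    """Minimize (1/2n)*||Xw - y||^2 + lam*||w||_1 by proximal gradient steps."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iterations):
        # Residuals of the current model on every sample.
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares part.
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


# Toy data (invented): the target depends only on the first feature.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.4]]
y = [2.0, 4.0, 6.0, 8.0]
weights = lasso_ista(X, y, lam=0.1, step=0.05)
print(weights)
```

On this toy problem the penalty drives the weight of the irrelevant second feature to exactly zero, which is the sense in which the fitted model is parsimonious.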
572	巨量資料環境下之新聞主題暨輿情與股價關係之研究 / A Study of the Relevance between News Topics & Public Opinion and Stock Prices in Big Data 張良杰, Chang, Liang Chieh Unknown Date
In recent years, advances in technology, networks, and storage media have caused the volume of generated data to grow explosively, ushering in the era of big data. With big data, researchers no longer need to rely on traditional sampling to collect data, and analysis is no longer limited by samples too small to represent the population. Once these limits are removed, the essence of big data lies in finding the valuable information within it. For example, social networking sites (SNS) hold a large amount of public opinion and interpersonal interaction, and scholars have found that sentiment on SNS is positively correlated with stock prices. This thesis applies the same idea to online news, which shares the characteristics of big data. A web crawler collected 30,879 economic news articles published by the Central News Agency between July 2013 and May 2014, and topic detection and tracking and sentiment analysis were applied to them. Based on the similarity between news events, the articles were linked into a network, and the relationship between news sentiment and the stock price index was analyzed. The results show that news events can be linked into specific news topics, that different news topics can be identified within a large network, and that linking news topics together yields a news topic context. This offers a new way to grasp the content of a huge volume of news quickly and to trace news topics and news events back effectively. Regarding news sentiment and the stock price index, the study finds that news sentiment affects fluctuations of the stock price index, with a correlation coefficient of 0.733562. A comparison of news sentiment with the psychological line and the trading-willingness indicator further suggests that news sentiment can serve, to a certain degree, as a reference for judging stock prices.
Big data; Text mining; News topic detection and tracking; Link analysis; Sentiment analysis
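The abstract of entry 572 reports a correlation coefficient between news sentiment and the stock price index without saying which coefficient was used. Assuming it is the Pearson coefficient, the computation might be sketched as follows; the two short series are invented placeholders, not data from the thesis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values: aggregated news sentiment and a stock index.
daily_sentiment = [0.10, 0.40, 0.35, 0.80, 0.60]
stock_index = [8000.0, 8100.0, 8050.0, 8300.0, 8250.0]
r = pearson(daily_sentiment, stock_index)
print(round(r, 3))
```

A value near 1 indicates that the two series tend to rise and fall together; on its own it says nothing about which one drives the other.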
573	法人說明會資訊對供應鏈機構投資人投資行為之影響-以我國半導體產業為例 / The Effect of Up-stream Company’s Conference Call Information on Down-stream’s Company’s institutional investors– An Example From Semi-conductor Industry in Taiwan 劉士豪, Liu, Shih Hao Unknown Date
This study examines whether, after IC design companies in the upstream of the semiconductor supply chain hold conference calls, the trading behavior of institutional investors in the midstream and downstream manufacturing and packaging-and-testing companies is affected by the information announced, given the tightly linked nature of Taiwan's semiconductor supply chain. In other words, it tests whether conference call information has a vertical information transfer effect within the semiconductor supply chain. The empirical results show that, on the day news of a conference call is first reported in the newspapers, the upstream company's conference call information does affect changes in the shareholdings of institutional investors in midstream and downstream companies: they buy on good news and sell on bad news. This indicates that institutional investors, through their professional teams and private information, can adjust their trading strategies ahead of the general public. The information transfer effect is gradually diluted as the distance between companies in the supply chain increases. In addition, the results show that foreign institutional investors, because of geographic constraints, rely more than investment trusts and dealers on conference call information when adjusting their holdings, again buying on good news and selling on bad news.
Semiconductor supply chain; Vertical information transfer; Conference call; Institutional investors; Text mining
574	USO DE TEORIAS NO CAMPO DE SISTEMAS DE INFORMAÇÃO: MAPEAMENTO USANDO TÉCNICAS DE MINERAÇÃO DE TEXTOS [Use of Theories in the Field of Information Systems: Mapping Using Text Mining Techniques] Pinheiro, José Claudio dos Santos 17 September 2009
This dissertation maps the use of information systems theories using information retrieval techniques together with data mining and text mining methodologies. The theories addressed are Transaction Cost Economics (TCE), the Resource-Based View (RBV), and Institutional Theory (IT), chosen because they are highly relevant to studies of investment allocation and implementation in information systems. The data are the textual content (in English) of the abstract and literature review sections of articles published in Information Systems Research (ISR), Management Information Systems Quarterly (MISQ), and Journal of Management Information Systems (JMIS) from 2000 to 2008. The results obtained with text mining combined with data mining were compared with those of the EBSCO advanced search tool and proved more efficient at identifying content. Articles grounded in the three theories accounted for 10% of all articles in the three journals, and the most productive publication years were 2001 and 2007.
Latent semantic analysis; Theories of information systems; Data mining; Text mining and semantic content
575	Um processo baseado em parágrafos para a extração de tratamentos de artigos científicos do domínio biomédico [A Paragraph-Based Process for Extracting Treatments from Scientific Papers in the Biomedical Domain] Duque, Juliana Lilian 24 February 2012
The medical literature currently produces a large amount of unstructured information (i.e., in textual format). Given this volume, doctors and specialists cannot manually analyze all the relevant literature, so techniques for analyzing the documents automatically are required. To identify relevant information, structure and store it in a database, and enable the later discovery of significant relationships, this dissertation proposes a paragraph-based process for extracting treatments from scientific papers in the biomedical domain. The hypothesis is that an initial search for sentences containing complication terms improves the identification and extraction of treatment terms. This is because treatments mostly occur in the same sentence as a complication or in nearby sentences of the same paragraph. The methodology employs three information extraction approaches found in the literature: a machine learning approach, to classify the sentences of interest that contain terms to be extracted; a dictionary-based approach, which uses terms validated by a domain expert; and a rule-based approach. The methodology was validated as a proof of concept using papers from the biomedical domain, specifically papers on sickle cell anemia. The proof of concept covered the classification of sentences and the identification of relevant terms. Sentence classification accuracy was 79% for the complication classifier and 71% for the treatment classifier. These values were obtained by combining the Support Vector Machine learning algorithm with the Noise Removal and Class Balancing filters. In the identification of relevant terms, the proposed methodology achieved a higher F-measure (42%) than manual classification (31%) and than the partial process, i.e., the process without the complication classifier (36%). Even with low overall recall, recall reached 100% for the distinct treatment terms, so the extraction process was not affected, and the hypothesis considered in this work was confirmed.
Artificial intelligence; Databases; Pattern recognition; Information extraction; Treatments; Text mining; Preprocessing; Biomedical domain; Sickle cell anemia
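The paragraph-based idea in entry 575 (find sentences that mention a complication, then look for treatment terms in the same or neighboring sentences of that paragraph) can be sketched with plain dictionary lookups. This simplification replaces the thesis's machine learning classifiers with substring matching; the term dictionaries, the `window` parameter, and the sample paragraph are invented for illustration.

```python
import re

# Hypothetical dictionaries; the thesis uses expert-validated term lists.
COMPLICATION_TERMS = {"stroke", "pain crisis"}
TREATMENT_TERMS = {"hydroxyurea", "transfusion"}


def split_sentences(paragraph):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def find_terms(sentence, terms):
    """Return the dictionary terms that occur in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return sorted(term for term in terms if term in lowered)


def extract_treatments(paragraph, window=1):
    """Pair each complication with treatments found within `window` sentences."""
    sentences = split_sentences(paragraph)
    pairs = set()
    for i, sentence in enumerate(sentences):
        for complication in find_terms(sentence, COMPLICATION_TERMS):
            start, end = max(0, i - window), min(len(sentences), i + window + 1)
            for neighbor in sentences[start:end]:
                for treatment in find_terms(neighbor, TREATMENT_TERMS):
                    pairs.add((complication, treatment))
    return sorted(pairs)


# Invented example paragraph.
paragraph = (
    "The patient had a stroke last year. "
    "A transfusion was given afterwards. "
    "Other topics are discussed elsewhere."
)
pairs = extract_treatments(paragraph)
print(pairs)
```

Restricting the search to a small window around complication sentences is what keeps unrelated treatment mentions elsewhere in the paper out of the result.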
576	Anotação semântica baseada em ontologia: um estudo do português brasileiro em documentos históricos do final do século XIX [Ontology-Based Semantic Annotation: A Study of Brazilian Portuguese in Historical Documents from the Late 19th Century] Pereira, Juliana Wolf 01 July 2014
Funding: Financiadora de Estudos e Projetos.
This dissertation presents an approach to automatic semantic annotation of 19th-century historical documents that discuss the constitution of the mother tongue, the Portuguese language in Brazil. The objective is to generate a set of documents semantically annotated in accordance with a domain ontology. To provide this domain ontology, the Instrumento Linguístico ontology was built, and it supported the automatic semantic annotation process. The annotation results were compared with the gold standard and showed a high level of agreement, between 0.86 and 1.00 on the F1-score. In addition, it was possible to locate new documents about the domain in a sample of the Revistas Brazileiras. These results support the efficacy of the automatic semantic annotation approach.
Text processing (Computing); Semantic relation extraction; Semantic annotation; Ontology-based information extraction; Ontology; Historical documents; Text mining; Natural language processing
577	Um método para descoberta de relacionamentos semânticos do tipo causa e efeito em sentenças de artigos científicos do domínio biomédico [A Method for Discovering Cause-and-Effect Semantic Relationships in Sentences of Scientific Papers in the Biomedical Domain] Scheicher, Ricardo Brigato 28 November 2013
Funding: Financiadora de Estudos e Projetos.
Today there is an enormous amount of scientific material written in textual form and published electronically (papers in conference proceedings and journal articles). In the biomedical field, researchers need to assimilate a large part of this content to keep their knowledge up to date and, consequently, to make more precise diagnoses and apply more modern and effective treatments. Acquiring this knowledge is very costly, and the manual process of annotating relationships and proposing new treatment hypotheses is slow and error-prone. As a result of this master's research, a method is proposed for extracting cause-and-effect semantic relationships from sentences of scientific papers in the biomedical domain. More specifically, the goal is to propose and implement a solution that (1) extracts biomedical domain terms from scientific documents (genes, proteins, chemical components, anatomical structures and processes, cell components and structures, and treatments), (2) identifies relationships present in the texts on the basis of the extracted terms, and (3) suggests a knowledge network based on the extracted cause-and-effect relationships. Using an approach based on rules and textual patterns, the proposed method extracted semantic relations with a precision of 94.83%, a recall of 98.10%, and an F-measure of 96.43%.
Artificial intelligence; Semantic relations; Semantic networks; Information extraction; Text mining; Biomedical domain; Sickle cell anemia
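The F-measure reported for entry 577 can be checked against its precision and recall, assuming the usual balanced definition (the harmonic mean of the two, F1), which the abstract does not state explicitly. The helper name below is chosen for this example.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall as reported in the abstract of entry 577.
f1 = f_measure(0.9483, 0.9810)
print(f"{f1:.4f}")
```

Under that assumption the score comes out at roughly 0.964, in line with the reported 96.43% up to rounding.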
578	Insight : uma abordagem guiada pela informação para análise qualitativa com suporte de visualização e mineração de texto [Insight: An Information-Guided Approach to Qualitative Analysis Supported by Visualization and Text Mining] Hernandes, Elis Cristina Montoro 25 August 2014
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Instituto Nacional de Estudos e Pesquisas Educacionais (INEP).
Experimental studies conducted to generate evidence in the various scientific fields usually produce a large volume of qualitative data that researchers must analyze. This is the case, for instance, with defect lists generated by software inspection activities, whose defects are qualitative data that must be discussed and classified. This scenario arose in the Readers Project, where defect lists took hours or even days to analyze, which made decisions about the defects difficult. Feedback questionnaires also deserved careful analysis, since they carried relevant information about the experimental study. Aim: Motivated by this situation, this research aimed to support qualitative analysis with the Coding technique in an information-guided way, using visualization and text mining. In the context of software inspection, these techniques allow similarly described defects to be analyzed together, which favors consistent decisions about them. The same holds for feedback questionnaires. Methodology: Based on this objective, the ways in which visualization and text mining could make qualitative analysis more effective and efficient were investigated and defined. On that basis, the Insight tool was developed to enable information-guided qualitative analysis. Experimental studies were conducted to validate and evaluate the approach in different contexts. Results: Four experimental studies were conducted: (i) a feasibility study with text documents, which gave evidence that visualization and text mining met expectations and were feasible to adopt; (ii) a study in the context of software inspection meetings in which, although the quantitative data did not reach statistical significance, the descriptive statistics and the analysis of feedback questionnaires gave evidence that visualization and text mining make the activity more effective and efficient; (iii) a study in the context of questionnaires, whose results were similar to those of the previous study; and (iv) a case study on the qualitative analysis of long text documents, which gave evidence of the utility and ease of use of the approach. Conclusion: In the different contexts in which it was evaluated, Insight, an information-guided qualitative analysis approach based on visualization and text mining, gave evidence of improvements in the efficiency and effectiveness of the activity. Beyond its relevance to software engineering, this research also contributes to other scientific fields, which often lack technological support for conducting their research.
Software engineering; Qualitative analysis; Coding; Text mining; Software inspection; Software inspection meeting; Feedback questionnaires; Tree-map; Experimental software engineering
579	Aplicação de técnicas de mineração de texto na recuperação de informação clínica em prontuário eletrônico do paciente / Application of text mining techniques in clinical information retrieval in the electronic patient record Carvalho, Ricardo César de [UNESP] 08 May 2017 (has links) Submitted by RICARDO CÉSAR DE CARVALHO (ricdon@gmail.com) on 2017-06-02T04:41:34Z No. of bitstreams: 1 Mestrado_Ricardo_Carvalho.pdf: 4464660 bytes, checksum: ba1819b77212278eb1a2808fd9658e4c (MD5) / Approved for entry into archive by Luiz Galeffi (luizgaleffi@gmail.com) on 2017-06-02T13:47:17Z (GMT) No. of bitstreams: 1 carvalho_rc_me_mar.pdf: 4464660 bytes, checksum: ba1819b77212278eb1a2808fd9658e4c (MD5) / Made available in DSpace on 2017-06-02T13:47:17Z (GMT). No. of bitstreams: 1 carvalho_rc_me_mar.pdf: 4464660 bytes, checksum: ba1819b77212278eb1a2808fd9658e4c (MD5) Previous issue date: 2017-05-08 / Na área da saúde, as tecnologias digitais fornecem recursos para a geração, controle, manutenção e arquivamento dos dados vitais dos pacientes, pesquisas biomédicas, captura e disponibilização de imagens diagnósticas. Ao criar grandes bancos de dados sobre a saúde das pessoas, o processamento das informações contidas no prontuário do paciente permitirá uma nova visão a respeito do conhecimento atual do processo de diagnóstico médico. Existem diversos problemas nessa área, porque o acesso ao prontuário analógico é complicado, e em formato eletrônico não está disponível para todos, apesar do conhecido potencial desses documentos como fonte informacional. Uma das formas para a organização desse conhecimento é por meio da mineração de textos, que possibilita o processamento dos dados descritos em linguagem natural. Entretanto, é preciso levar em consideração o fato da redação médica não poder ser padronizada, embora exista a normativa do Conselho Federal de Medicina que orienta nessa direção. 
É neste contexto, que esta pesquisa se norteia com o objetivo básico de investigar a aplicabilidade da metodologia de mineração de textos para a extração de informações provenientes da anamnese de prontuários eletrônicos do paciente divulgados no ciberespaço visando a qualidade na recuperação de informações. Trata-se de uma pesquisa de cunho exploratório, tendo-se realizado a mineração de textos sobre um conjunto de 46 anamneses divulgadas no ciberespaço visando a recuperação de informação. Em seguida, fez-se um cotejamento com os dados recuperados de forma manual, efetuando-se a interpretação da linguagem de comunicação médico-paciente. Esses dois resultados foram registrados em um protótipo construído e simulando o ambiente de um consultório médico. Os resultados evidenciam que a utilização da mineração de texto como ferramenta de extração na busca e recuperação de informações em saúde encontrou diversas dificuldades decorrentes das inúmeras formas de se redigir uma anamnese, além dos erros ortográficos, erros gramaticais, remoção de sufixos e prefixos, sinônimos, abreviações, siglas, símbolos, pontuações, termos e jargões médicos. Esse fato evidencia que ao se planejar um sistema computacional ele deve ser capaz de interpretar informações descritas de inúmeras formas, não excluindo palavras importantes ou ignorando aqueles relevantes que poderiam colocar em risco as ações de cuidados do paciente. Ao aplicar os processos de tokenization, remoção de stopwords, normalização morfológica, stemming e cálculo da relevância, conjuntamente contribuíram para que os termos resultantes fossem muito diferentes daqueles extraídos manualmente, ou seja, há ainda muitos desafios em cada uma dessas etapas na busca da qualidade na recuperação de informações concernente à anamnese. 
Conclui-se que embora a mineração seja uma ferramenta útil ao se tratar de textos estruturados e de outros domínios, quando aplicada a anamnese que é um texto mais livre tal ferramenta deixa a desejar, posto que ao se tratar da área da saúde, a redução de termos compostos, bem como a utilização de siglas, símbolos, abreviaturas ou outra forma de redução linguística trará interferências danosas para a recuperação de informação. A construção do protótipo ilustra a criação de uma ferramenta leve e intuitiva aplicando os conceitos discutidos nessa dissertação, além de se tornar o pontapé inicial de trabalhos futuros. / In the health area, digital technologies provide resources for the generation, control, maintenance and vital patient data archiving biomedical research, diagnostic images capture and availability. By creating large databases on people´s health records, processing the information contained in the patient's medical record, will provide a new insight into current knowledge of the medical diagnostic process. There are several problems in this area, because the access to analogical records is very complex and electronic format is not available for all of them, despite the known potential of these documents as informational source. One of the ways to arrange this knowledge is by the text mining which enables the data processing in natural language. However, it is necessary to consider the fact that medical writing cannot be standardized, although there is a Federal Council of Medicine policy that directs to that path. This is the context which this research is guided by the basic goal of investigating the methodology applicability of text mining for extracting information from the anamnesis of patients' electronic medical records divulged in cyberspace and aiming at the quality of information retrieval. This is an exploratory research, with texts mining on a set of 46 anamnesis published in cyberspace aimed at information retrieval. 
Then, a comparison was made with the data retrieved manually, through interpretation of the doctor-patient communication language. Both sets of results were recorded in a prototype built to simulate the environment of a doctor's office. The results show that using text mining as an extraction tool in the search and retrieval of health information met several difficulties due to the numerous ways of writing an anamnesis, as well as spelling errors, grammatical errors, removal of suffixes and prefixes, synonyms, abbreviations, acronyms, symbols, punctuation, and medical terms and jargon. This shows that a computer system, when planned, should be able to interpret information described in many different ways, without excluding important words or ignoring relevant ones, which could jeopardize patient care. Applying tokenization, stopword removal, morphological normalization, stemming and relevance calculation together caused the resulting terms to be very different from those extracted manually; there are still many challenges in each of those steps for the quality of information retrieval from anamneses. We conclude that, although mining is a useful tool for structured texts and other domains, when applied to the anamnesis, which is a freer text, it falls short, since in the health area the reduction of compound terms, as well as the use of acronyms, symbols, abbreviations or other forms of linguistic reduction, harms information retrieval. The prototype illustrates the creation of a light and intuitive tool applying the concepts discussed in this dissertation, and may become the starting point for future work. Mineração de textos Recuperação de informação Prontuário eletrônico de paciente Ciência da Informação Ciência da saúde Text mining Information retrieval Electronic patient record Information Science Health science
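The preprocessing steps named in the abstract above (tokenization, stopword removal, stemming, relevance calculation) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the stopword list, the suffix list standing in for a stemmer, the use of raw term frequency as "relevance", and the sample note are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "with", "for", "in", "has", "no"}

# Toy suffix list standing in for a real stemmer (an assumption).
SUFFIXES = ("ing", "es", "s")

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_terms(text):
    """Tokenize, drop stopwords, stem, and count term frequency."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return Counter(stem(t) for t in tokens)

# Made-up clinical note, for illustration only.
note = "Patient reports headaches and coughing; no history of headaches."
terms = extract_terms(note)
print(terms.most_common(3))
```

In this sketch the stopword filter silently drops "no", so the negation in "no history of headaches" is lost, and the stemmer reduces "headaches" to "headach". Both effects are small instances of the kind of divergence from manual extraction that the abstract describes.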
580	Les conversations des internautes. Approche pragmatique d'acquisition de connaissances à partir de conversations textuelles pour la recherche marketing / Conversations of internet users. A Pragmatic Approach to knowledge Acquisition from Textual Conversations for Marketing Research Leenhardt, Marguerite 17 January 2017 (has links) Ce travail de recherche s'inscrit dans le cadre des méthodes de la linguistique de corpus et procède des besoins d'exploitation formulés dans le domaine du marketing à l'égard des conversations des internautes. Deux pistes sont poursuivies, la première relevant de leur description du point de vue de l'analyse des conversations et de la textométrie, la seconde visant des applications pratiques relatives à la fouille de textes. Une méthode de description systématique et automatisable est proposée, à partir de laquelle un procédé de mesure de l'engagement conversationnel des participants est mis en œuvre. L'étude des diagrammes d'engagement conversationnel (DEC) produits à partir de cette mesure permet d'observer des régularités typologiques dans les postures manifestées par les participants. Ce travail met également en exergue l'apport de la méthode textométrique pour l'acquisition de connaissances utiles à des fins de catégorisation automatique. Plusieurs analyses textométriques sont utilisées (spécificités, segments répétés, inventaires distributionnels) pour élaborer un modèle de connaissance dédié à la détection des intentions d'achat dans des fils de discussion issus d'un forum automobile. Les résultats obtenus, encourageants malgré la rareté des signaux exploitables au sein du corpus étudié, soulignent l'intérêt d'articuler des techniques d'analyse textométrique et de fouille de données textuelles au sein d'un même procédé d'acquisition de connaissances pour l'analyse automatique des conversations des internautes. 
/ This research falls within the methods of corpus linguistics and stems from needs expressed in the field of marketing regarding the conversations of internet users. Two lines of research are pursued: the first describes these conversations from the perspective of conversation analysis and textometry, and the second targets practical applications in text mining. A systematic, automatable method of description is proposed, from which a measure of participants' conversational engagement is implemented. The study of the conversational engagement diagrams (CED) produced from this measure makes it possible to observe typological regularities in how participants position themselves in conversations. This work also highlights the contribution of the textometric method to acquiring knowledge useful for supervised classification. Several textometric analyses are used (specificities, repeated segments, distributional inventories) to develop a knowledge model for detecting purchase intentions in discussion threads from an automotive forum. The results, encouraging despite the scarcity of usable signals in the corpus, underline the value of combining textometric analysis and text mining techniques within a single knowledge acquisition process for the automatic analysis of internet users' conversations. Textométrie Analyse des conversations Recherche marketing Veille marketing Fouille de textes Acquisition de connaissances Catégorisation automatique supervisée Mesure de l'engagement conversationnel Textometry Conversation Marketing research Marketing intelligence Text mining Knowledge acquisition Supervised acquisition Measure of conversational engagement
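One of the textometric analyses mentioned above, repeated segments, can be illustrated with a short sketch. It assumes that a repeated segment is a sequence of consecutive word forms occurring at least twice in the corpus, uses whitespace tokenization, and fixes the segment length at three words; the function name, its parameters and the sample posts are hypothetical and not taken from the thesis.

```python
from collections import Counter

def repeated_segments(messages, n=3, min_count=2):
    """Return word n-grams that occur at least min_count times across messages."""
    counts = Counter()
    for message in messages:
        words = message.lower().split()
        # Slide a window of n consecutive words over the message.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {seg: c for seg, c in counts.items() if c >= min_count}

# Made-up forum posts, for illustration only.
posts = [
    "i plan to buy a diesel next year",
    "we plan to buy a hybrid soon",
    "the diesel was noisy",
]
segments = repeated_segments(posts)
for segment, count in segments.items():
    print(" ".join(segment), count)
```

Recurring sequences such as "plan to buy" are the sort of surface cue that could feed a purchase-intention model, though the thesis's actual knowledge model is not specified in the abstract.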
