Global ETD Search

571	Insight : uma abordagem guiada pela informação para análise qualitativa com suporte de visualização e mineração de texto Hernandes, Elis Cristina Montoro 25 August 2014 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Instituto Nacional de Estudos e Pesquisas Educacionais (INEP) / Experimental studies conducted to generate evidence in different scientific fields usually produce large amounts of qualitative data for researchers to analyze. This is the case, for instance, of the defect lists generated by software inspection, whose defects are qualitative data that must be discussed and classified. This scenario arose during the Readers Project, where defect lists took a long time to analyze, sometimes hours or days, which made decisions about the defects difficult. Moreover, feedback questionnaires deserved careful analysis, since they carried relevant information about the experimental study. Aim: Driven by this situation, this research aimed to support the conduct of qualitative analysis in an information-guided way, using visualization and text mining techniques.
With these techniques, software inspection defects that are described similarly can be analyzed together, which may make the decisions about them more homogeneous. The same holds for feedback questionnaires. Methodology: Based on this objective, the ways in which visualization and text mining could make qualitative analysis more effective and efficient were investigated and designed. On that basis, the Insight tool was developed to enable information-guided qualitative analysis. Experimental studies were conducted to validate and evaluate the approach in different contexts. Results: Four experimental studies were conducted: (i) a feasibility study with text documents, which gave evidence that the use of visualization and text mining met expectations and was feasible; (ii) a study in the context of software inspection in which, although the quantitative data did not reach statistical significance, the descriptive statistics and the analysis of feedback questionnaires gave evidence that visualization and text mining make qualitative analysis more effective and efficient; (iii) a study in the context of questionnaires, whose results were similar to those of the previous study; (iv) a case study on the qualitative analysis of long text documents, which gave evidence of the utility and ease of use of the approach. Conclusion: In the different contexts in which it was evaluated, Insight, an information-guided qualitative analysis approach based on visualization and text mining, gave evidence of improvements in efficiency and effectiveness. Beyond its relevance to software engineering, this research also contributes to other scientific fields, which often lack technological support for conducting their research.
/ Em geral, os estudos experimentais, que são responsáveis pela construção de evidências nas várias áreas da ciência, geram grande volume de dados qualitativos que devem ser analisados. Esse é o caso, por exemplo, de listas de defeitos derivadas de atividades de inspeção, cujos defeitos correspondem a dados qualitativos que devem ser discutidos e analisados. Esse cenário ocorreu no Projeto Readers, em que as listas de defeitos consumiam horas e dias para serem analisadas, o que dificultava a decisão sobre eles. Além disso, os questionários de feedback também mereciam análise cuidadosa, pois traziam informações relevantes sobre o estudo experimental. Objetivo: Motivado pela situação descrita, esta pesquisa teve o objetivo de apoiar a condução da técnica Coding de forma que a análise dos dados seja feita guiada pela informação, utilizando para isso visualização e mineração de texto. Isso possibilita que, no contexto de inspeção de software, relatos de defeitos semelhantes sejam tratados em um mesmo momento, em decorrência do uso dessas técnicas, fazendo que as decisões sejam homogêneas. O mesmo acontece com informações do questionário de feedback. Metodologia: Com base no objetivo estabeleceram-se as formas com que a visualização e mineração de texto poderiam contribuir para tornar a análise dos dados mais efetiva e eficiente. Com base nessas definições a ferramenta Insight foi desenvolvida para tornar viável a codificação guiada pela informação. Estudos experimentais foram conduzidos para validar e avaliar a tese em diferentes contextos. 
Resultados: Foram realizados quatro estudos experimentais: (i) estudo de viabilidade conduzido com documentos textuais que evidenciou que o uso de visualização e mineração de texto atendia à expectativa e era viável de ser adotado; (ii) estudo realizado no contexto de reunião de inspeção que, embora não tenha apresentado significância estatística dos resultados, evidenciou por meio das análises descritivas e análise dos questionários de feedback que o uso de visualização e mineração de texto para esta atividade a torna mais efetiva e eficiente; (iii) estudo realizado no contexto de questionários, cujos resultados foram semelhantes ao anterior; (iv) estudo de caso feito no contexto de análise qualitativa que evidenciou a utilidade e facilidade de uso da abordagem para analisar documentos textuais extensos. Conclusão: Nos diferentes contextos em que foi avaliada, a abordagem de análise qualitativa guiada pela informação, baseada em visualização e mineração de texto, evidenciou melhorias na efetividade e eficiência da atividade. Além da relevância da pesquisa para a área de Engenharia de Software, ressalta-se que ela é uma contribuição para outras áreas do conhecimento, muitas vezes carentes de suporte tecnológico para a condução de suas pesquisas. Keywords: Engenharia de software; Análise qualitativa; Engenharia de software experimental; Mineração de textos; Questionário de feedback; Inspeção de software; Qualitative analysis; Coding; Text mining; Software inspection meeting; Feedback questionnaires; Tree-map; Experimental software engineering
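The idea in record 571 of analyzing similarly described defects together can be illustrated with a bag-of-words cosine similarity. This is only a sketch and not the Insight tool's actual algorithm, which the abstract does not specify; the function names, the 0.5 threshold and the sample defect descriptions are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar(items, threshold=0.5):
    """Greedy grouping: add each item to the first group whose first member is similar enough."""
    groups = []
    for item in items:
        vec = bag_of_words(item)
        for group in groups:
            if cosine(vec, bag_of_words(group[0])) >= threshold:
                group.append(item)
                break
        else:
            # No existing group matched, so this item starts a new one.
            groups.append([item])
    return groups


# Hypothetical defect descriptions, invented for this example.
defects = [
    "Missing validation of the user password",
    "User password validation is missing",
    "Report layout is not specified",
]
groups = group_similar(defects)
```

With these sample texts the two password defects land in one group, so a reviewer could decide on both at the same time, which is the homogeneity the abstract aims for.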
572	Aplicação de técnicas de mineração de texto na recuperação de informação clínica em prontuário eletrônico do paciente / Application of text mining techniques in clinical information retrieval in the electronic patient record Carvalho, Ricardo César de [UNESP] 08 May 2017 Na área da saúde, as tecnologias digitais fornecem recursos para a geração, controle, manutenção e arquivamento dos dados vitais dos pacientes, pesquisas biomédicas, captura e disponibilização de imagens diagnósticas. Ao criar grandes bancos de dados sobre a saúde das pessoas, o processamento das informações contidas no prontuário do paciente permitirá uma nova visão a respeito do conhecimento atual do processo de diagnóstico médico. Existem diversos problemas nessa área, porque o acesso ao prontuário analógico é complicado, e em formato eletrônico não está disponível para todos, apesar do conhecido potencial desses documentos como fonte informacional. Uma das formas para a organização desse conhecimento é por meio da mineração de textos, que possibilita o processamento dos dados descritos em linguagem natural. Entretanto, é preciso levar em consideração o fato da redação médica não poder ser padronizada, embora exista a normativa do Conselho Federal de Medicina que orienta nessa direção.
É neste contexto, que esta pesquisa se norteia com o objetivo básico de investigar a aplicabilidade da metodologia de mineração de textos para a extração de informações provenientes da anamnese de prontuários eletrônicos do paciente divulgados no ciberespaço visando a qualidade na recuperação de informações. Trata-se de uma pesquisa de cunho exploratório, tendo-se realizado a mineração de textos sobre um conjunto de 46 anamneses divulgadas no ciberespaço visando a recuperação de informação. Em seguida, fez-se um cotejamento com os dados recuperados de forma manual, efetuando-se a interpretação da linguagem de comunicação médico-paciente. Esses dois resultados foram registrados em um protótipo construído e simulando o ambiente de um consultório médico. Os resultados evidenciam que a utilização da mineração de texto como ferramenta de extração na busca e recuperação de informações em saúde encontrou diversas dificuldades decorrentes das inúmeras formas de se redigir uma anamnese, além dos erros ortográficos, erros gramaticais, remoção de sufixos e prefixos, sinônimos, abreviações, siglas, símbolos, pontuações, termos e jargões médicos. Esse fato evidencia que ao se planejar um sistema computacional ele deve ser capaz de interpretar informações descritas de inúmeras formas, não excluindo palavras importantes ou ignorando aqueles relevantes que poderiam colocar em risco as ações de cuidados do paciente. Ao aplicar os processos de tokenization, remoção de stopwords, normalização morfológica, stemming e cálculo da relevância, conjuntamente contribuíram para que os termos resultantes fossem muito diferentes daqueles extraídos manualmente, ou seja, há ainda muitos desafios em cada uma dessas etapas na busca da qualidade na recuperação de informações concernente à anamnese. 
Conclui-se que embora a mineração seja uma ferramenta útil ao se tratar de textos estruturados e de outros domínios, quando aplicada a anamnese que é um texto mais livre tal ferramenta deixa a desejar, posto que ao se tratar da área da saúde, a redução de termos compostos, bem como a utilização de siglas, símbolos, abreviaturas ou outra forma de redução linguística trará interferências danosas para a recuperação de informação. A construção do protótipo ilustra a criação de uma ferramenta leve e intuitiva aplicando os conceitos discutidos nessa dissertação, além de se tornar o pontapé inicial de trabalhos futuros. / In the health area, digital technologies provide resources for generating, controlling, maintaining and archiving patients' vital data, for biomedical research, and for capturing and making available diagnostic images. By creating large databases of people's health records, processing the information contained in the patient's medical record will provide new insight into current knowledge of the medical diagnostic process. There are several problems in this area: access to paper records is complicated, and the electronic format is not available to everyone, despite the known potential of these documents as a source of information. One way to organize this knowledge is text mining, which enables the processing of data described in natural language. However, it is necessary to consider that medical writing cannot be standardized, although there is a Federal Council of Medicine regulation that points in that direction. In this context, the basic goal of this research is to investigate the applicability of text mining methodology for extracting information from the anamneses of electronic patient records published in cyberspace, aiming at quality in information retrieval. This is exploratory research, in which text mining was carried out on a set of 46 anamneses published in cyberspace with a view to information retrieval.
The results were then compared with data retrieved manually by interpreting the language of doctor-patient communication. Both sets of results were recorded in a prototype built to simulate the environment of a doctor's office. The results show that using text mining as an extraction tool for the search and retrieval of health information ran into several difficulties, owing to the many ways of writing an anamnesis as well as spelling errors, grammatical errors, removal of suffixes and prefixes, synonyms, abbreviations, acronyms, symbols, punctuation, and medical terms and jargon. This shows that a computer system must be designed to interpret information described in many different ways, without excluding important words or ignoring relevant ones, which could jeopardize patient care. Applied together, the processes of tokenization, stopword removal, morphological normalization, stemming and relevance calculation produced terms that were very different from those extracted manually; that is, there are still many challenges in each of these steps in the pursuit of quality information retrieval from anamneses. The conclusion is that although text mining is a useful tool for structured texts and for other domains, it falls short when applied to the anamnesis, which is a freer text, since in the health area the reduction of compound terms, as well as the use of acronyms, symbols, abbreviations or other forms of linguistic reduction, interferes harmfully with information retrieval. The prototype illustrates the creation of a light and intuitive tool that applies the concepts discussed in this dissertation, and it may become the starting point for future work. Keywords: Mineração de textos; Recuperação de informação; Prontuário eletrônico de paciente; Ciência da Informação; Ciência da saúde; Text mining; Information retrieval; Electronic patient record; Information Science; Health science
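The processing chain named in record 572 (tokenization, stopword removal, stemming, relevance calculation) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the dissertation's implementation: the stopword list, the naive suffix stripper (not a real stemmer such as Porter or RSLP), the use of raw term frequency as "relevance", and the English sample note are all invented here.

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "with", "for", "has", "is"}  # toy list (assumption)
SUFFIXES = ("ing", "s")  # naive suffix stripping, not a real stemming algorithm


def tokenize(text):
    """Split lowercase text into alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def relevant_terms(text, top_n=3):
    """Rank stemmed, non-stopword tokens by raw frequency."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    counts = Counter(stem(t) for t in tokens)
    return counts.most_common(top_n)


# Hypothetical anamnesis fragment, invented for this example.
note = "Patient reports headaches and coughing. Headache for 3 days, cough with fever."
terms = relevant_terms(note)
```

Even this small example loses clinically useful detail: the tokenizer drops the "3" in "3 days", so the duration of the symptom never reaches the term list. That is a small instance of the gap between automatic and manual extraction that the abstract reports.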
573	Les conversations des internautes. Approche pragmatique d'acquisition de connaissances à partir de conversations textuelles pour la recherche marketing / Conversations of internet users. A Pragmatic Approach to knowledge Acquisition from Textual Conversations for Marketing Research Leenhardt, Marguerite 17 January 2017 Ce travail de recherche s'inscrit dans le cadre des méthodes de la linguistique de corpus et procède des besoins d'exploitation formulés dans le domaine du marketing à l'égard des conversations des internautes. Deux pistes sont poursuivies, la première relevant de leur description du point de vue de l'analyse des conversations et de la textométrie, la seconde visant des applications pratiques relatives à la fouille de textes. Une méthode de description systématique et automatisable est proposée, à partir de laquelle un procédé de mesure de l'engagement conversationnel des participants est mis en œuvre. L'étude des diagrammes d'engagement conversationnel (DEC) produits à partir de cette mesure permet d'observer des régularités typologiques dans les postures manifestées par les participants. Ce travail met également en exergue l'apport de la méthode textométrique pour l'acquisition de connaissances utiles à des fins de catégorisation automatique. Plusieurs analyses textométriques sont utilisées (spécificités, segments répétés, inventaires distributionnels) pour élaborer un modèle de connaissance dédié à la détection des intentions d'achat dans des fils de discussion issus d'un forum automobile. Les résultats obtenus, encourageants malgré la rareté des signaux exploitables au sein du corpus étudié, soulignent l'intérêt d'articuler des techniques d'analyse textométrique et de fouille de données textuelles au sein d'un même procédé d'acquisition de connaissances pour l'analyse automatique des conversations des internautes.
/ This research draws on the methods of corpus linguistics and stems from needs expressed in the field of marketing regarding the conversations of internet users. Two lines of research are pursued: the first describes these conversations from the perspective of conversation analysis and textometry; the second targets practical applications in text mining. A systematic and automatable description method is proposed, from which a procedure for measuring participants' conversational engagement is implemented. The study of the conversational engagement diagrams (CED) produced from this measure makes it possible to observe typological regularities in the stances that participants adopt. This work also highlights the contribution of the textometric method to acquiring knowledge useful for supervised classification. Several textometric analyses (specificities, repeated segments, distributional inventories) are used to build a knowledge model for detecting purchase intentions in discussion threads from an automotive forum. The results, encouraging despite the scarcity of usable signals in the corpus, underline the value of combining textometric analysis and text mining techniques within a single knowledge acquisition process for the automatic analysis of internet users' conversations. Keywords: Textométrie; Analyse des conversations; Recherche marketing; Veille marketing; Fouille de textes; Acquisition de connaissances; Catégorisation automatique supervisée; Mesure de l'engagement conversationnel; Textometry; Conversation; Marketing research; Marketing intelligence; Text mining; Knowledge acquisition; Supervised acquisition; Measure of conversational engagement
574	A interpretação semântica de textos científicos em português na perspectiva da Ciência da Informação: procedimentos e aplicação à área de Ciências Agrárias / A interpretação semântica de textos científicos em português: procedimentos e aplicações à área de Ciências Agrárias na perspectiva da Ciência da Informação CORRÊA, Dominique de Lira Vieira 29 February 2016 Facepe / A presente pesquisa se desenvolveu no âmbito do Observatório Temático e Laboratório – Ensino, Tecnologia, Ciência e Informação (OtletCI) com a intenção de avançar na questão de como extrair informação relevante e de como representá-la para fins de recuperação semântica da informação, em particular no caso de textos de publicações científicas em português. Para tanto, como metodologia, investigou-se a tecnologia da busca semântica quanto aos fundamentos teóricos, sua utilidade no contexto do OtletCI e requisitos para aplicação em textos científicos em português. Como experimento, buscou-se explicitar os requisitos da busca semântica para a aplicação em textos científicos, através da análise da extração de relacionamentos semânticos do tipo “causa e efeito” em 60 resumos, em português, de artigos científicos da área de Ciências Agrárias. O estudo apresentou, por meio de considerações de ordem qualitativa e quantitativa, uma comparação entre o processo manual e automático de extração de sentenças de causa e efeito.
Esses documentos foram previamente analisados de forma manual, e as sentenças de causa e efeito foram extraídas através da leitura dos resumos. Para o processo automático, com os dados transferidos do software PALAVRAS para a planilha do Excel, foi possível realizar uma programação para localizar sentenças de causa e efeito automaticamente. O objetivo foi comparar as sentenças identificadas diretamente pelo pesquisador e as sentenças reconstruídas automaticamente a partir do conjunto de células programadas. Conclui-se enfatizando que a possibilidade de usar técnicas automáticas acelera o processo de criação e extração de relações de causa e efeito e pode ser usada como alternativa ao processo custoso de identificação manual de informações semânticas. Porém, mais importante que propor uma estrutura de relações de causa e efeito para a construção de sistemas de busca, o que pode-se apontar como o resultado mais expressivo da presente pesquisa é o estabelecimento preliminar de rotinas para a versão automatizada. / This research was developed within the Thematic Observatory and Laboratory - Education, Technology, Science and Information (OtletCI) with the intention of advancing the question of how to extract relevant information and how to represent it for semantic information retrieval, particularly in the case of scientific publications in Portuguese. To that end, as a methodology, we investigated semantic search technology with respect to its theoretical foundations, its usefulness in the context of OtletCI, and the requirements for applying it to scientific texts in Portuguese. As an experiment, we sought to make explicit the requirements of semantic search for scientific texts by analyzing the extraction of "cause and effect" semantic relationships in 60 abstracts, in Portuguese, of scientific articles in the area of Agricultural Sciences.
The study presents, through qualitative and quantitative considerations, a comparison between the manual and automatic processes for extracting cause-and-effect sentences. The documents were first analyzed manually, and the cause-and-effect sentences were extracted by reading the abstracts. For the automatic process, the data were transferred from the PALAVRAS software to an Excel spreadsheet, where it was possible to program the automatic location of cause-and-effect sentences. The goal was to compare the sentences identified directly by the researcher with the sentences reconstructed automatically from the set of programmed cells. The research concludes that automatic techniques accelerate the creation and extraction of cause-and-effect relationships and can be used as an alternative to the costly manual identification of semantic information. However, more important than proposing a structure of cause-and-effect relationships for building search systems, the most significant result of this research is the preliminary establishment of routines for the automated version.
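The automatic location of cause-and-effect sentences in record 574 can be approximated, very roughly, by matching causal verbs. The study itself worked from PALAVRAS parser output in programmed Excel cells, and its rules are not given in the abstract, so the sketch below is a stand-in technique: the marker list, the function name and the two sample sentences are assumptions.

```python
import re

# Hypothetical causal markers; the thesis's actual rules are not stated in the abstract.
CAUSE_FIRST = re.compile(
    r"^(?P<cause>.+?)\s+(?:causa|provoca|resulta em)\s+(?P<effect>.+?)\.?$",
    re.IGNORECASE,
)


def extract_cause_effect(sentence):
    """Return (cause, effect) if the sentence contains a causal marker, else None."""
    match = CAUSE_FIRST.match(sentence.strip())
    if match is None:
        return None
    return match.group("cause"), match.group("effect")


# Invented sample sentences in the style of an agricultural abstract.
sentences = [
    "O excesso de irrigação provoca a lixiviação de nutrientes.",
    "O experimento foi conduzido em casa de vegetação.",
]
pairs = [extract_cause_effect(s) for s in sentences]
```

A surface pattern like this only covers sentences where the cause precedes the marker; a syntactic parse, as used in the study, is what allows other constructions to be handled.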
575	Uma Metodologia para Mineração de Regras de Associação Usando Ontologias para Integração de Dados Estruturados e Não-Estruturados / A Methodology for Mining Association Rules Using Ontologies for Integrating Structured and Non-Structured Data CAMILO, Cassio Oliveira 23 August 2010 Data and text mining methods have been applied in several areas of knowledge with the purpose of extracting useful information from large data volumes. Among the various data mining methods reported in the specialized literature, association rule mining has proved useful in producing understandable rules. However, one of its major problems is the large number of rules produced, which hampers the selection of the most relevant rules needed to answer a query. This study proposes a method for mining data from structured and unstructured sources in order to generate association rules between the terms extracted. The process of mining data from unstructured sources is assisted by an ontology that maps knowledge from a specific domain. The result of this process is converted into structured data and combined with data from other structured sources. In addition to the support and confidence model, a combination of objective and subjective interest measures is used to filter the set of rules obtained. To verify the feasibility of this method in real-life situations, it was applied to a database of police occurrence reports from a government institution, which included data stored in structured and unstructured sources. / Métodos de mineração de dados e mineração de textos têm sido aplicados em diversas áreas do conhecimento para recuperação de informações úteis a partir de grandes volumes de dados.
Dentre os diversos métodos de mineração de dados propostos na literatura, a mineração de regras de associação tem sido de grande utilidade. Entretanto, um dos grandes problemas gerados pela aplicação deste método sobre um grande volume de dados é, em geral, a produção de uma quantidade significativa de regras, dificultando a escolha daquelas mais relevantes para responder a uma consulta. O presente trabalho propõe uma metodologia para minerar dados de fontes estruturadas e não estruturadas, visando gerar regras de associação entre termos extraídos dessas fontes. O processo de mineração de dados de fontes não-estruturadas é auxiliado por uma Ontologia para mapear conhecimentos de um domínio específico. O resultado desta etapa é convertido para uma representação estruturada, e é então combinado com os dados obtidos de outras fontes estruturadas. Além do modelo de suporte e confiança, utiliza-se uma combinação das medidas de interesse objetivas e subjetivas para filtrar o conjunto de regras obtido. Para analisar sua viabilidade em situações reais, a metodologia proposta neste trabalho foi submetida à aplicação de ocorrências policiais de uma instituição governamental, sob conjuntos de dados armazenados em fontes estruturadas e não estruturadas. Keywords: Mineração de Dados; Mineração de Texto; Recuperação de Informação; Extração de Informação; Conceitos; Ontologia; Regras de Associação; Data mining; Text mining; Information Retrieval; Information Extraction; Concept; Ontology; Association Rules
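The support and confidence model mentioned in record 575 follows the standard definitions: the support of an itemset is the fraction of records that contain it, and the confidence of a rule A -> B is support(A and B) divided by support(A). The sketch below computes both on a handful of invented records; the field values are hypothetical and stand in for structured fields merged with terms extracted from free text, and the interest measures and ontology step from the dissertation are not modeled.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Hypothetical occurrence records, invented for this example.
reports = [
    {"night", "downtown", "theft"},
    {"night", "downtown", "theft"},
    {"night", "suburb", "vandalism"},
    {"day", "downtown", "theft"},
]
rule_support = support(reports, {"downtown", "theft"})
rule_confidence = confidence(reports, {"downtown"}, {"theft"})
```

Here the rule downtown -> theft has support 0.75 and confidence 1.0. On real data, thresholds on these two values alone tend to leave many rules, which is why the abstract adds further interest measures as a filter.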
576	BOOKISH: Uma ferramenta para contextualização de documentos utilizando mineração de textos e expansão de consulta / BOOKISH: A tool for background documents using text mining and query expansion SILVA, Luciana Oliveira e 14 August 2009 The continuous development of technology and its dissemination in all domains have caused significant changes in society and in education. The new global society demands new skills and provides an opportunity to introduce new technologies into the educational process, improving traditional education systems. The focus should be on the search for meaningful information, on research, and on the development of projects, rather than on the mere transmission of content. When teaching a given subject, teachers often provide additional sources that help students deepen their understanding of the topic and carry out activities. Furthermore, it is desirable to have proactive students, capable of interpreting and identifying other sources of information that complement and expand the subject being studied. However, one of the challenges today is information overload: there are many documents available and few effective ways to handle them. Every day, large numbers of documents are stored and made available. These documents contain a lot of relevant information, but finding that knowledge is a difficult task. The BOOKISH system, proposed in this work, assists students in their search activity. By analyzing PowerPoint slide presentations, the tool identifies contextually similar electronic documents, minimizing the time spent searching for additional relevant material and directing students to the content they need.
The tool presented in this document uses text mining techniques and automatic query expansion. / O contínuo desenvolvimento da tecnologia e sua disseminação em todas as áreas têm provocado mudanças significativas na sociedade e na educação. É preciso buscar a formação necessária às novas competências do mundo globalizado e considerar que o momento proporciona uma oportunidade de aproximar novas tecnologias ao processo educativo como possibilidade de melhorar os sistemas de ensino tradicionais. O foco deve ser a busca da informação significativa e da pesquisa, o desenvolvimento de projetos e não predominantemente a simples transmissão de conteúdo. Ao ministrar conteúdo de determinada disciplina, o professor muitas vezes disponibiliza fontes complementares que ajudam na compreensão do tema e auxiliam os alunos na execução de atividades. Já o aluno, dentro de uma abordagem pró-ativa, deve ser capaz de interpretar e identificar outras fontes que melhor complementem e expandam assunto. No entanto, um dos desafios atuais é a sobrecarga de informação - são muitos documentos à disposição e poucas formas eficientes de tratá-los. O sistema BOOKISH, proposto neste trabalho, busca auxiliar os alunos na atividade de identificar e filtrar informações relevantes e dentro do contexto que está sendo estudado em sala de aula. A partir de apresentações em forma de slides disponibilizados pelos professores, a ferramenta identifica documentos eletrônicos contextualmente semelhantes e os disponibiliza para os alunos. É objetivo minimizar o tempo gasto nas atividades de busca por material complementar relevante e direcionar o aluno para o conteúdo do qual necessita. A ferramenta apresentada neste trabalho utiliza técnicas de mineração de textos e expansão automática de consultas com esta finalidade. Keywords: Mineração de Textos; Expansão de Consulta; Text Mining; Query Expansion
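The combination described in record 576, mining key terms from slides and then expanding the query automatically, can be sketched as follows. This is not the BOOKISH implementation, whose details the abstract does not give: the stopword list, the expansion table RELATED, the "OR" query syntax and the slide titles are all assumptions for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}  # toy list (assumption)
# Hypothetical expansion table; a real system might derive it from a thesaurus
# or from term co-occurrence statistics.
RELATED = {"sorting": ["ordering"], "algorithms": ["algorithm"]}


def key_terms(slide_texts, top_n=2):
    """Pick the most frequent non-stopword terms across all slides."""
    words = []
    for text in slide_texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [term for term, _ in Counter(words).most_common(top_n)]


def expand_query(terms):
    """Append related terms to each key term and join them into one query string."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(RELATED.get(term, []))
    return " OR ".join(expanded)


# Invented slide titles standing in for text extracted from a presentation.
slides = ["Sorting algorithms", "Complexity of sorting algorithms", "Sorting in practice"]
query = expand_query(key_terms(slides))
```

The resulting string could then be sent to any document search backend; expansion widens recall so that documents using a different wording from the slides can still be found.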
577	The Information Value of Unstructured Analyst Opinions / Studies on the Determinants of Information Value and its Relationship to Capital Markets Eickhoff, Matthias 29 June 2017 No description available. Keywords: 330; Analyst Opinion; Sentiment Analysis; Topic Modelling; Information Value; Media Richness Theory; Wisdom of Crowds; Capital Markets; Decision Support; Decision Support Systems; Text Mining; Data Mining; Unstructured Data; Financial Decision Support Systems; Wirtschaftswissenschaften (PPN621567140)
578	應用文本主題與關係探勘於多文件自動摘要方法之研究：以電影評論文章為例 / Application of text topic and relationship mining for multi-document summarization: using movie reviews as an example 林孟儀 Unknown Date 由於網際網路的普及造成資訊量愈來愈大，在資訊的搜尋、整理與閱讀上會耗費許多時間，因此本研究提出一應用文本主題及關係探勘的方法，將多份文件自動生成一篇摘要，以幫助使用者能降低資訊的閱讀時間，並能快速理解文件所欲表達之意涵。本研究以電影評論文章為例，結合文章結構的概念，將影評摘要分為「電影資訊」、「電影劇情介紹」及「心得結論」三部分，其中「電影資訊」及「心得結論」為透過本研究建置之電影領域相關詞庫比對得出。接著將餘下之段落歸屬於「電影劇情介紹」，並透過LDA主題模型將段落分群，再運用主題關係地圖的概念挑選各群之代表段落並排序，最後將各段落去除連接詞及將代名詞還原為其所指之主詞，以形成一篇列點式影評摘要。研究結果顯示，本研究所實驗之三部電影，產生之摘要能涵蓋較多的資訊內容，提升了摘要之多樣性，在與最佳範本摘要的相似度比對上，分別提升了10.8228%、14.0123%及25.8142%，可知本研究方法能有效掌握文件之重點內容，生成之摘要更為全面，藉由此方法讓使用者自動彙整電影評論文章，以生成一精簡之摘要，幫助使用者節省其在資訊的搜尋及閱讀的時間，以便能快速了解相關電影之資訊及評論。 / The rapid development of information technology over the past decades has dramatically increased the amount of online information. Because absorbing large amounts of information costs users a great deal of time, this thesis presents a method that uses text topic and relationship mining for multi-document summarization, helping users grasp the theme of multiple documents quickly and easily by reading an accurate summary instead of the full documents. We use movie reviews as an example of multi-document summarization and apply the concept of article structure to divide the summary into three parts: film data, film orientation (plot introduction), and conclusion. The film data and conclusion parts are obtained by matching against a thesaurus of the movie review domain built in this thesis. The remaining paragraphs are assigned to film orientation and clustered into topics with Latent Dirichlet Allocation (LDA). Next, we apply the concept of a text relationship map, a network in which each node is a paragraph and each edge indicates that the two corresponding paragraphs are related, to extract the most representative paragraph of each topic and order the extracted paragraphs. Finally, we remove conjunctions and replace pronouns with the names they refer to in each extracted paragraph, and generate a bullet-point summary.
The results show that the summaries produced by this method cover a wider range of topics and are more diverse. For the three movies tested, similarity to the best-sample summaries improved by 10.8228%, 14.0123%, and 25.8142%, respectively. The method presented in this thesis captures the key content effectively and generates a more comprehensive summary. With this method, users can aggregate movie reviews automatically into a concise summary, reducing the time they spend searching for and reading articles. 文字探勘多文件自動摘要 LDA主題模型主題關係地圖 Text mining Multi-document summarization LDA Topic Model Text relationship map
579	輿論對外匯趨勢的影響 / The effects of public opinions on exchange rate movements 林子翔, Lin, Tzu Hsiang Unknown Date (has links) 本研究要探討的是在新聞、論壇和社群媒體討論的相關訊息是否真的會影響匯率的運動的假設。對於這樣的研究目標，我們建立了一個實驗，首先以文字探勘技術應用在新聞、論壇與社群媒體來產生與匯率相關的數值表示。接著，機器學習技術應用於學習得到的數值表示和匯率波動之間的關係。最後，我們證明透過檢驗所獲得的關係的有效性的假設。在此研究中，我們提出一種兩階段的神經網路來學習與預測每日美金兌台幣匯率的走勢。不同於其他專注於新聞或者社群媒體的研究，我們將他們進行整合，並將論壇的討論納為輸入資料。不同的資料組合產生出多種觀點，而三個資料來源的不同組合可能會以不同的方式影響預測準確率。透過該方法，初步實驗的結果顯示此方法優於隨機漫步模型。 / This study explores the hypothesis that relevant information in news, forum posts, and social media discussions affects the daily movement of exchange rates. To test it, we set up an experiment in which text mining is first applied to news, forums, and social media to generate numerical representations of the textual information relevant to the exchange rate. Machine learning is then applied to learn the relationship between the derived numerical representations and the movement of exchange rates. Finally, we assess the hypothesis by examining the effectiveness of the learned relationship. We propose a hybrid, two-stage neural network to learn and forecast the daily movements of the USD/TWD exchange rate. Unlike other studies, which focus on either news or social media, we integrate both and add forum discussions as input data. Different combinations of the three data sources yield different views and may affect forecasting accuracy in different ways. In preliminary experiments, the method outperformed the random walk model. 文字探勘機器學習匯率類神經網路 TensorFlow 圖形處理器 Text mining Machine learning Exchange rates Artificial neural networks Tensorflow Graphic processing units
580	運用文字探勘技術分析金融科技之發展與趨勢 / Applying text mining techniques to the development and trends of fintech's patent 郝紹君, Hao, Shao Chun Unknown Date (has links) 現今科技日新月異，不斷突破創新，產業環境變動的步調也越來越快，新竄出之金融科技(Finance Technology)的應用，使得許多企業越加注重技術方面的研發創新，尤其，善加運用專利資訊能有效節省研發經費與時間。因此如何有效運用專利是企業維持競爭優勢不可或缺的一環。有鑑於此，本研究搜集近年各國專利資料庫之專利資料，將資料分為三個時期，並區分申請中與已申請之專利資料，透過文字探勘技術與機會探索分析出金融科技之發展與趨勢，了解各時期詞彙間之關聯性與差異，再搭配視覺化工具KeyGraph，以描繪出金融科技領域之相關詞彙關聯趨勢圖，挖掘未來潛在趨勢。本研究之結果了解金融科技在各時期的趨勢發展變化與尋求脈絡，以及過去各時期之專利佈局，因而從結果中發現金融科技之發展方向主體為支付領域，許多支付科技接連出現在三個時期中。然而近幾年，其他金融領域如投資、融資、保險、資料分析等也漸漸浮出，從本研究之第三個時期的高頻字詞高達34個可看出，可見金融科技之專利發展佈局已快速從支付領域拓展至其他金融領域。本研究所挖掘出之潛在趨勢顯示了未來金融科技領域中將會有五大重點發展領域，分別為服務整合領域之雲端科技、支付領域之生物辨識與穿戴支付與加密貨幣、資料分析領域之機器學習與人工智慧、信息收集與處理領域之遠程信息處理科技、以及理財投資領域之理財機器人。期望本研究結果能幫助企業，在面臨新科技不斷衝擊產業，而產業不斷尋求創新發展之下，能夠快速檢閱目前市場趨勢，藉此釐清並改善自身之發展策略，以因應外部環境之變動，提供企業作為金融科技發展之策略參考，也能有助於企業釐清與制定金融科技之投資方向，以擁有持續的競爭優勢。 / With the rapid advancement of technology, the business environment is changing at an ever faster pace. The emergence of financial technology (fintech) applications has led many enterprises to pay more attention to research and development (R&D). In particular, making good use of patent information can save R&D budget and time, so using patents effectively is indispensable for enterprises that want to maintain their competitive advantage. This study collected recent patent data from the patent databases of several countries, divided the data into three periods, and distinguished between patents still under application and those already applied for. Using text mining techniques and chance discovery, the study analyzed the development and trends of financial technology and examined the associations and differences between the terms in each period. Then, with the visualization tool KeyGraph, the study mapped the associations between related terms in the fintech field and identified potential future trends from the graphs. 
The results of this study trace how fintech trends changed across the three periods and reveal the patent portfolios of each period. The study found that the main direction of fintech development is the payment field: many payment technologies appear successively in all three periods. In recent years, however, other financial areas such as investment, financing, insurance, and data analysis have gradually emerged, as shown by the 34 high-frequency terms found in the third period. This indicates that fintech patent portfolios have expanded rapidly from payment to other financial areas. The potential trends identified in this study point to five key areas of future fintech development: cloud technology for service integration; biometrics, wearable payment, and cryptocurrency for payment; machine learning and artificial intelligence for data analysis; telematics for information collection and processing; and robo-advisors for wealth management and investment. It is expected that this study can serve as a strategic reference for fintech development, helping enterprises quickly review current market trends and clarify and improve their own development strategies in response to changes in the external environment. It is also hoped that the results can help enterprises clarify and set their fintech investment directions in order to sustain their competitive advantage. 文字探勘金融科技專利趨勢分析機會探索關鍵圖 Text mining Fintech Patent trend analysis Chance discovery KeyGraph
