261

Disorderclassifier: classificação de texto para categorização de transtornos mentais / Disorderclassifier: text classification for the categorization of mental disorders

NUNES, Francisca Pâmela Carvalho 23 August 2016 (has links)
In recent years, the Internet has made communication broader and more accessible. With the growth of social media, blogs, and websites in general, an extensive base of diverse content has emerged in which users share their opinions and personal accounts. These reports can be relevant for future observation or even for supporting other people's decisions. However, this mass of information is scattered across the web in free-text form, making manual analysis and categorization of the texts impractical. Automating this work is the best option, but understanding free-form text is not a simple task for a computer, given the irregularities and imprecision of natural language. Under these circumstances, systems are emerging that automatically classify texts by theme, genre, features, and other criteria, using concepts from the field of Text Mining (TM). TM aims to extract relevant information from text by analyzing collections of documents. Many TM studies have been proposed in a variety of domains, including psychiatry, where they seek to identify textual features for detecting psychological disorders, analyzing patient sentiment, detecting security problems in medical records, or exploring the biomedical literature. The work proposed here analyzes personal testimonies of potential patients in order to categorize the texts by type of mental disorder, following the DSM-5 taxonomy. The proposed procedure classifies the collected personal accounts into four disorder types (anorexia, OCD, autism, and schizophrenia). TM techniques were used for text pre-processing and classification, supported by the Weka software packages. Experimental results show that the proposed method achieves high precision and that the text pre-processing phase has a clear impact on these results. The Support Vector Machine (SVM) classifier outperformed the other techniques from the literature that were evaluated.
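As a rough illustration of the pipeline this abstract describes (text pre-processing followed by SVM classification), the sketch below uses scikit-learn in place of Weka. The testimonies, labels, and pre-processing choices are invented placeholders, not the thesis's data or configuration.

```python
# Minimal sketch of a pre-process-then-classify pipeline with a linear SVM,
# in the spirit of the approach described above (scikit-learn, not Weka).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

# Placeholder testimonies, each labeled with one of four disorder types.
texts = [
    "I can't stop checking the locks over and over before leaving home.",
    "I avoid eating with others and count every calorie obsessively.",
    "I hear voices commenting on my actions throughout the day.",
    "Eye contact and small talk have always felt impossible to me.",
]
labels = ["ocd", "anorexia", "schizophrenia", "autism"]

# Pre-processing (lowercasing, stop-word removal, TF-IDF weighting)
# feeds a linear SVM; with a real corpus one would cross-validate.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("svm", LinearSVC()),
])
pipeline.fit(texts, labels)

print(pipeline.predict(["I repeat rituals to quiet intrusive thoughts."]))
```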
262

Learning about corruption: a statistical framework for working with audit reports

Pereira, Laura Sant’Anna Gualda 26 March 2018 (has links)
Quantitative studies aiming to disentangle the effects of public corruption often emphasize the lack of objective information in this research area. The CGU Random Audits Anti-Corruption Program, based on extensive and unannounced audits of transfers from the federal government to municipalities, has emerged as a potential source for filling this gap. The reports generated by these audits describe corrupt and mismanagement practices in detail, but reading and coding them manually is laborious and requires specialized personnel. We propose a statistical framework to guide the use of text data in constructing objective indicators of corruption and using them in inferential models. It consists of two main steps. In the first, we use machine learning methods for text classification to create an indicator of corruption based on the irregularities reported in audit reports. In the second, we use this indicator in a regression model, accounting for the measurement error carried over from the first step. To validate this framework, we replicate the empirical strategy of Ferraz et al. (2012) to estimate the effects of corruption in educational funds on primary school students' outcomes between 2006 and 2015. We achieved an expected accuracy of 92% on the binary classification of irregularities, and our results support the findings of Ferraz et al.: students in municipal schools perform significantly worse on standardized tests in municipalities where corruption in education was found.
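A hedged sketch of the two-step framework follows: a text classifier produces a noisy corruption indicator, which then enters a regression whose slope is adjusted for classification error. All data are placeholders, and the crude attenuation adjustment shown (dividing by sensitivity + specificity - 1, which ignores a variance-ratio term) merely stands in for the dissertation's more careful measurement-error treatment.

```python
# Hypothetical sketch of the two-step framework: (1) classify audit-report
# irregularities as corruption or not, (2) regress an outcome on the noisy
# indicator and adjust the slope for classification error. Data and names
# are placeholders; the adjustment is a crude stand-in (see note above).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1: classifier trained on labeled irregularity descriptions.
irregularities = ["overpriced school meal contract awarded without bidding",
                  "minor delay in filing quarterly paperwork"]
is_corruption = [1, 0]
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(irregularities), is_corruption)

# Noisy corruption indicator for new municipalities' audit reports.
new_reports = ["school contract awarded without bidding at inflated prices",
               "quarterly paperwork filed with a minor delay"]
w = clf.predict(vec.transform(new_reports))

# Step 2: naive OLS of test scores on the noisy indicator, then a rough
# attenuation adjustment using classifier sensitivity and specificity
# estimated on held-out data.
scores = np.array([4.1, 5.3])                 # placeholder student outcomes
X = np.column_stack([np.ones_like(scores), w])
beta_naive = np.linalg.lstsq(X, scores, rcond=None)[0][1]
sensitivity, specificity = 0.92, 0.92
beta_adjusted = beta_naive / (sensitivity + specificity - 1)
print(beta_naive, beta_adjusted)
```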
263

Patentes e inovação frugal em uma perspectiva contributiva / Patents and frugal innovation in a contributory perspective

Mazieri, Marcos Rogério 02 December 2016 (has links)
This research contributes to the study of innovation by investigating one of its possible faces: innovations developed with almost no resources, known as frugal innovation. It brings together conceptual, theoretical, and practical aspects of frugal innovation, seeking to gather enough elements to systematize the discussion of the use of patents in this context. From a management point of view, observing the conceptual, theoretical, and practical aspects of frugal innovation carried out in environments of intense resource constraints, whether of natural or financial resources, facilitates reflection on the use of resources, as found in this research. These findings also support the construction of more effective, structured innovation processes, which can serve as a structuring guideline for building new business models, processes, products, services, organizational arrangements, and marketing methods. They can enable, for example, the entry of companies into emerging and underdeveloped markets, the valorization of products developed by modest communities, and the improvement of living conditions in regions with severe resource constraints; the research therefore carries an intrinsic social responsibility. Using mixed methods (qualitative methods for the inductive interpretation of results, together with quantitative analysis and text mining techniques for multivariate analysis of the text segments that form the patent abstracts), eleven propositions were discussed and corroborated. Beyond the methodological contributions, such as full-text analysis, it was concluded that frugal innovation is not a type of innovation but a response to an observable restrictive context, and can therefore coexist with incremental, architectural, modular, and radical innovations. The theoretical contributions go beyond the definition of frugal innovation to include the definition of semantic classes in patent contexts, demonstrating that patents can contribute to frugal innovation and offering some directions on how to make this contribution (Patents-Frugal Innovation) more effective.
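One way to picture the multivariate analysis of patent-abstract segments mentioned above is clustering segments into semantic classes. The sketch below uses TF-IDF and k-means as stand-ins; the segments, class count, and algorithm choice are assumptions, not the thesis's actual method.

```python
# Hedged sketch: group patent-abstract text segments into semantic classes
# and inspect each class's top-weighted terms. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

segments = [
    "low-cost water filter using locally available clay and sawdust",
    "solar lamp assembled from recycled components for off-grid villages",
    "method for compressing video streams on mobile devices",
    "hand-powered centrifuge made from paper for field diagnostics",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(segments)

# Partition segments into k semantic classes; the top terms per cluster
# give a rough picture of each class's theme.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for c in range(2):
    top = [terms[i] for i in km.cluster_centers_[c].argsort()[::-1][:5]]
    print(f"class {c}: {top}")
```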
264

Ett verktyg för konstruktion av ontologier från text / A Tool for Facilitating Ontology Construction from Texts

Chétrit, Héloïse January 2004 (has links)
With the growth of information stored on the Internet, especially in the biological field, and with discoveries being made daily in this domain, scientists are faced with an overwhelming number of articles. Reading all published articles is a tedious and time-consuming process, so a way to summarise the information they contain is needed. One solution is to derive an ontology that represents the knowledge enclosed in a set of articles and allows users to browse through them. In this thesis we present the tool Ontolo, which builds an initial ontology of a domain from a set of domain-related articles inserted into the system. The quality of the ontology construction was tested by comparing our ontology's results for keywords to those provided by the Gene Ontology for the same keywords. The results obtained are quite promising for a first prototype of the system, as it finds many terms common to both ontologies from just a few hundred inserted articles.
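For flavour, the sketch below shows one classic heuristic for bootstrapping an initial term hierarchy from a document set: co-occurrence subsumption (Sanderson and Croft, 1999). It illustrates the general idea of deriving ontology structure from articles and is not a description of Ontolo's actual algorithm.

```python
# Co-occurrence subsumption: term x is a candidate parent of term y if x
# appears in (nearly) every document where y appears, but y is rarer.
from collections import defaultdict

docs = [
    {"protein", "kinase", "phosphorylation"},
    {"protein", "kinase"},
    {"protein", "receptor"},
    {"protein", "receptor", "ligand"},
]

# term -> set of document ids containing it
occurs = defaultdict(set)
for i, d in enumerate(docs):
    for t in d:
        occurs[t].add(i)

def subsumes(x, y, threshold=0.8):
    inter = len(occurs[x] & occurs[y])
    return inter / len(occurs[y]) >= threshold and len(occurs[y]) < len(occurs[x])

for x in occurs:
    for y in occurs:
        if x != y and subsumes(x, y):
            print(f"{x} -> {y}")   # candidate parent -> child edge
```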
265

Deriving Genetic Networks Using Text Mining

Olsson, Elin January 2002 (has links)
An enormous amount of information is available on the Internet in unstructured form. The purpose of a text mining tool is to collect this information and present it in a more structured form. In this report, text mining is used to create an algorithm that searches abstracts available from PubMed and finds specific relationships between genes that can be used to create a network. The algorithm can also be used to find information about a specific gene. Using the algorithm, the network created by Mendoza et al. (1999) was verified in all but one of its connections; that connection contained implicit information. The results suggest that the algorithm is better at extracting information about specific genes than at finding connections between genes. One advantage of the algorithm is that it can also find connections between genes and proteins, and between genes and other chemical substances.
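A hypothetical sketch of the core idea, extracting gene-gene relationships from abstract text with simple patterns and collecting them as network edges, is shown below. The gene list, verb patterns, and sentences are illustrative assumptions rather than the report's actual rules.

```python
# Pattern-based extraction of gene-gene relations from abstract text,
# collected as directed edges for a regulatory network. Illustrative only.
import re

genes = {"wg", "en", "hh", "ptc"}
relation_pattern = re.compile(
    r"\b(\w+)\s+(activates|represses|inhibits)\s+(\w+)\b", re.IGNORECASE)

abstracts = [
    "Our results show that wg activates en in neighboring cells.",
    "We find that en represses ptc, while hh activates ptc indirectly.",
]

edges = []
for text in abstracts:
    for src, verb, dst in relation_pattern.findall(text):
        if src.lower() in genes and dst.lower() in genes:
            edges.append((src.lower(), verb.lower(), dst.lower()))

print(edges)  # e.g. [('wg', 'activates', 'en'), ...]
```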
266

Extraction of database and software usage patterns from the bioinformatics literature

Duck, Geraint January 2015 (has links)
Method forms the basis of scientific research, enabling criticism, selection and extension of current knowledge. However, methods are usually confined to the literature, where they are often difficult to find, understand, compare, or repeat. Bioinformatics and computational biology provide a rich opportunity for resource creation and discovery, with a rapidly expanding "resourceome". Many of these resources are difficult to find due to the large choice available, and there are only a limited number of sufficiently populated lists that can help inform resource selection. Text mining has enabled large scale data analysis and extraction from within the scientific literature, and as such can provide a way to help explore the vast wealth of resources available, which form the basis of bioinformatics methods. This thesis therefore aims to survey the computational biology literature, using text mining to extract database and software resource name mentions. By evaluating the common pairs and patterns of usage of these resources within such articles, an abstract approximation of the in silico methods employed within the target domain is developed. Specifically, this thesis provides an analysis of the difficulties of resource name extraction from the literature, then uses this knowledge to develop bioNerDS - a rule-based system that can detect database and software name mentions within full-text documents (with a final F-score of 67%). bioNerDS is then applied to the full-text document corpus from PubMed Central, and the results are explored to identify differences in resource usage between domains (bioinformatics, biology and medicine) over time, across journals and across document sections. In particular, the well-established resources (e.g., BLAST, GO and GenBank) remain pervasive throughout the domains, although they are seeing a slight decline in usage. Statistical programs see high levels of usage, with R in bioinformatics and SPSS in medicine frequently mentioned throughout the literature. An overview of the common resource pairs has been generated by pairing database and software names which directly co-occur after one another in text. Combining and aggregating these resource pairs across the literature enables the generation of a network of common resource patterns within computational biology, which provides an abstract representation of the common in silico methods used. For example, sequence alignment tools remain an important part of several computational biology analysis pipelines, and GO is a strong network sink (primarily used for data annotation). The networks also show the emergence of proteomics and next generation sequencing resources, and provide a specialised overview of a typical phylogenetics method. This work performs an analysis of common resource usage patterns, and thus provides an important first step towards in silico method extraction using text mining. This should have future implications for community best practice, in both resource and method selection.
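The pairing step described above (linking resource names that directly co-occur after one another in text) can be sketched as follows; a small dictionary stands in for bioNerDS's rule-based recognizer, and the sentences are invented examples.

```python
# Detect resource names with a dictionary lookup, then count pairs of names
# that directly follow one another; aggregated counts form a usage network.
from collections import Counter

resources = {"blast", "genbank", "go", "r", "spss", "clustalw"}

sentences = [
    "Sequences were retrieved from GenBank and aligned with ClustalW.",
    "Homologs were identified using BLAST and annotated with GO.",
    "Statistical analysis was performed in R.",
]

pairs = Counter()
for s in sentences:
    tokens = [t.strip(".,").lower() for t in s.split()]
    mentions = [t for t in tokens if t in resources]
    # pair each mention with the one that directly follows it
    for a, b in zip(mentions, mentions[1:]):
        pairs[(a, b)] += 1

# Aggregating these pairs across a full corpus yields the method network.
print(pairs.most_common())
```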
267

Classification of Stock Exchange News

Kroha, Petr, Baeza-Yates, Ricardo 24 November 2004 (has links)
In this report we investigate how much similarity good news and bad news may have in the context of long-term market trends. We discuss the relation between text mining, classification, and information retrieval. We present examples that use identical sets of words but have quite different meanings, and examples that can be interpreted in either a positive or a negative sense, so that the decision is as difficult after reading them as before. These examples show that information retrieval methods are not strong enough to solve problems of this kind. To search for common properties in groups of news items, we used classifiers (e.g., the naive Bayes classifier) after finding that diagnostic methods did not deliver reasonable results. For our experiments we used historical data on the German market index DAX 30.
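A minimal sketch of the naive Bayes classification of news follows. The headlines and good/bad labels are invented placeholders, not the report's DAX 30 dataset.

```python
# Naive Bayes classification of news headlines as good or bad, in the
# spirit of the approach described above. Data are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

headlines = [
    "Profits rise as demand surges in export markets",
    "Company warns of losses amid falling orders",
    "Record quarterly revenue lifts outlook",
    "Shares tumble after disappointing earnings report",
]
labels = ["good", "bad", "good", "bad"]

vec = CountVectorizer(lowercase=True)
X = vec.fit_transform(headlines)
clf = MultinomialNB().fit(X, labels)

test = ["Earnings fall short of expectations"]
print(clf.predict(vec.transform(test)))  # expected: ['bad']
```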
268

中國古典詩歌對應探勘及詞彙分析工具 / Tools for Pattern Comparison and Word Analysis of Chinese Classical Poetry

黃植琨 Unknown Date (has links)
This study takes digitized texts of the Shijing (Classic of Poetry), the Chuci (Songs of Chu), the Quan Tangshi (Complete Tang Poems), the Quan Songshi (Complete Song Poems), and the Quan Songci (Complete Song Lyrics) as its basis and uses information technology to build tools for analyzing borrowing between literary works. The tools rely on string- or word-level matching, and users can set filters to surface possible correspondences, particularly textual similarities among the Quan Tangshi, Quan Songshi, and Quan Songci. The study draws on scholarship in the humanities to evaluate the tools' effectiveness. From a computer science perspective, we also compute statistics on correspondences, such as those between Tang poetry and Song-dynasty poetry, and on correspondences within a single corpus (Shijing with Shijing, Chuci with Chuci, Quan Tangshi with Quan Tangshi, Quan Songci with Quan Songci, and Quan Songshi with Quan Songshi) in order to uncover correspondences among works of the same era. The study also experiments with word segmentation of classical Chinese poetry and with analyzing the semantics of the vocabulary in poems, with the future goal of comparing poems semantically. Although this research does not go as deep as traditional humanities scholarship, it sifts the essential from large corpora, provides statistics and related services, saves the time humanities researchers would spend organizing and analyzing texts, and lends digital support to research in the humanities.
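The string-matching idea can be sketched as finding shared character n-grams between two corpora and reporting them as candidate correspondences for expert review. The two example lines and the n-gram length are illustrative assumptions.

```python
# Find shared character n-grams between two poem corpora as candidate
# borrowings; the example lines and n-gram length are placeholders.
def char_ngrams(text, n=3):
    return {text[i:i + n] for i in range(len(text) - n + 1)}

tang_poems = {"tang_1": "床前明月光疑是地上霜"}
song_poems = {"song_1": "明月光照我床前梦不成"}

n = 3
for tid, tpoem in tang_poems.items():
    for sid, spoem in song_poems.items():
        shared = char_ngrams(tpoem, n) & char_ngrams(spoem, n)
        if shared:
            # candidate correspondence for a human expert to review
            print(tid, sid, sorted(shared))
```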
269

Development of a Hepatitis C Virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance

Kojo, Kwofie Samuel January 2011 (has links)
Philosophiae Doctor - PhD / Ameliorating Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the wealth of data published in the biomedical literature to gain a greater understanding of HCV pathobiological mechanisms. The multitude of metadata originating from HCV clinical trials, as well as from low- and high-throughput experiments embedded in text corpora, can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthwhile and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. This thesis reports the development of two freely available HCV-specific web-based resources: (i) the Dragon Exploratory System on Hepatitis C Virus (DESHCV), accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/, and (ii) the Hepatitis C Virus Protein Interaction Database (HCVpro), accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence-based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV's development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols, to enable HCV-specific knowledge exploration. DESHCV queries consist of user-defined keywords, phrases and concepts. DESHCV is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance. Additionally, users can retrieve a list of abstracts containing tagged concepts, which can be used to overcome the herculean task of manual biocuration. DESHCV has been used to simulate the previously reported thalidomide-chronic hepatitis C hypothesis and to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. Users can retrieve enriched information, including interaction metadata, from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used in detecting the interactions, PubMed IDs of journal articles reporting the interactions, annotated protein interaction IDs from external databases, and "string searches". The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes (ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1) have been recommended for investigation of their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based prioritization of candidate HCC genes for validation by experimental biologists.
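The co-occurrence approach behind a system like DESHCV can be sketched as tagging dictionary concepts in abstracts and counting which concepts appear together, as weak evidence of association. The concept dictionary and abstracts below are placeholders, not DESHCV's actual dictionaries.

```python
# Tag known concepts in abstracts and count concept co-occurrences;
# frequently co-occurring pairs suggest candidate hypotheses (for example,
# a link between two drugs via a shared interaction partner).
from itertools import combinations
from collections import Counter

concept_dict = {"thalidomide", "amantadine", "ns5a", "interferon"}

abstracts = [
    "Thalidomide modulated cytokine response in patients receiving interferon.",
    "NS5A inhibits the antiviral action of interferon in HCV replicons.",
    "Amantadine was evaluated in combination with interferon therapy.",
]

cooccur = Counter()
for text in abstracts:
    tokens = {t.strip(".,").lower() for t in text.split()}
    found = sorted(tokens & concept_dict)
    for a, b in combinations(found, 2):
        cooccur[(a, b)] += 1

print(cooccur.most_common())
```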
270

Information extraction from pharmaceutical literature

Batista-Navarro, Riza Theresa Bautista January 2014 (has links)
With the constantly growing amount of biomedical literature, methods for automatically distilling information from unstructured data, collectively known as information extraction, have become indispensable. Whilst most biomedical information extraction efforts in the last decade have focussed on the identification of gene products and interactions between them, the biomedical text mining community has recently extended their scope to capture associations between biomedical and chemical entities with the aim of supporting applications in drug discovery. This thesis is the first comprehensive study focussing on information extraction from pharmaceutical chemistry literature. In this research, we describe our work on (1) recognising names of chemical compounds and drugs, facilitated by the incorporation of domain knowledge; (2) exploring different coreference resolution paradigms in order to recognise co-referring expressions given a full-text article; and (3) defining drug-target interactions as events and distilling them from pharmaceutical chemistry literature using event extraction methods.
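As a small illustration of the first strand of work, dictionary-assisted recognition of chemical and drug names can be sketched as below; the tiny lexicon and sentence are invented stand-ins for the domain knowledge the thesis actually incorporates.

```python
# Dictionary-assisted recognition of chemical and drug name mentions.
# The lexicon and sentence are placeholders, not the thesis's resources.
import re

drug_lexicon = {"imatinib", "aspirin", "sorafenib"}

def tag_chemicals(sentence):
    spans = []
    for m in re.finditer(r"[A-Za-z][A-Za-z0-9-]+", sentence):
        if m.group().lower() in drug_lexicon:
            spans.append((m.start(), m.end(), m.group()))
    return spans

text = "Imatinib inhibited kinase activity more potently than sorafenib."
print(tag_chemicals(text))
# e.g. [(0, 8, 'Imatinib'), (54, 63, 'sorafenib')]
```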
