371
Concept Based Knowledge Discovery from Biomedical Literature. Radovanovic, Aleksandar. January 2009
This thesis describes and introduces novel methods for knowledge discovery and presents a software system that extracts information from biomedical literature, reviews interesting connections between various biomedical concepts and, in so doing, generates new hypotheses. The experimental results obtained using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. The thesis shows how the technology presented can be integrated with researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
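The abstract does not spell out the discovery method; as a hedged illustration of the co-occurrence style of literature-based discovery it alludes to, the sketch below implements Swanson's classic ABC model in Python, linking two concepts through shared intermediate concepts. The abstracts, concepts and function names are all invented for illustration.

```python
from collections import defaultdict

# Toy "abstracts": each is a set of recognized biomedical concepts.
# In a real system these would come from named-concept recognition over PubMed.
abstracts = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud's disease"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynaud's disease"},
]

# Build a co-occurrence map: concept -> set of concepts seen with it.
cooccurs = defaultdict(set)
for concepts in abstracts:
    for a in concepts:
        cooccurs[a] |= concepts - {a}

def bridge_concepts(a, c):
    """Concepts B such that A-B and B-C co-occur while A-C never do: a candidate hypothesis."""
    if c in cooccurs[a]:
        return set()  # already directly connected, nothing novel to propose
    return cooccurs[a] & cooccurs[c]

print(bridge_concepts("fish oil", "raynaud's disease"))
# -> {'blood viscosity', 'platelet aggregation'} (set order may vary)
```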
372
The development of a single nucleotide polymorphism database for forensic identification of specified physical traits. Naidu, Alecia Geraldine. January 2009
Many Single Nucleotide Polymorphisms (SNPs) found in coding or regulatory regions within the human genome lead to phenotypic differences that make prediction of physical appearance, based on genetic analysis, potentially useful in forensic investigations. Complex traits such as pigmentation can be predicted from the genome sequence, provided that genes with strong effects on the trait exist and are known. Phenotypic traits may also be associated with variations in gene expression due to the presence of SNPs in promoter regions. In this project, genes associated with physical traits of potential forensic relevance were collated from the literature using a text mining platform and hand curation. The SNPs associated with these genes were acquired from public SNP repositories such as the International HapMap project, dbSNP and Ensembl. Different population groups were characterized based on these SNPs, and the results and data were stored in a MySQL database. This database contains SNP genotyping data with respect to physical phenotypic differences of forensic interest. The potential forensic relevance of the SNP information contained in this database has been verified through in silico SNP analysis aimed at establishing possible relationships between SNP occurrence and phenotype. The software used for this analysis is MATCH™.
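The abstract names MySQL as the backend but does not publish a schema; the sketch below uses Python's built-in sqlite3 as a stand-in, with an invented table layout and invented frequency values, to show the kind of phenotype-by-population query such a database could support (rs1042602/TYR and rs1426654/SLC24A5 are real pigmentation-associated SNPs, but the numbers here are placeholders).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE snp_freq (
    rsid TEXT, gene TEXT, trait TEXT,
    population TEXT, allele TEXT, frequency REAL)""")

# Invented example rows; real frequencies would come from HapMap/dbSNP/Ensembl.
con.executemany("INSERT INTO snp_freq VALUES (?,?,?,?,?,?)", [
    ("rs1042602", "TYR",     "pigmentation", "CEU", "A", 0.41),
    ("rs1042602", "TYR",     "pigmentation", "YRI", "A", 0.02),
    ("rs1426654", "SLC24A5", "pigmentation", "CEU", "A", 0.99),
])

# Which pigmentation-associated alleles differ most between two populations?
rows = con.execute("""
    SELECT a.rsid, a.gene, a.frequency - b.frequency AS diff
    FROM snp_freq a JOIN snp_freq b
      ON a.rsid = b.rsid AND a.allele = b.allele
    WHERE a.trait = 'pigmentation'
      AND a.population = 'CEU' AND b.population = 'YRI'
    ORDER BY ABS(diff) DESC""").fetchall()
print(rows)  # [('rs1042602', 'TYR', 0.39...)]
```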
373
Development of a Hepatitis C Virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance. Kwofie, Samuel Kojo. January 2011
To ameliorate Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the plethora of rich data published in biomedical literature to gain greater understanding of HCV pathobiological mechanisms. The multitude of metadata originating from HCV clinical trials as well as low- and high-throughput experiments embedded in text corpora can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthy and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. This thesis reports the development of two freely available HCV-specific web-based resources: (i) the Dragon Exploratory System on Hepatitis C Virus (DESHCV), accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/, and (ii) the Hepatitis C Virus Protein Interaction Database (HCVpro), accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols, to enable HCV-specific knowledge exploration. DESHCV accepts user-defined keywords, phrases and concepts as query inputs. It is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance. Additionally, users can retrieve a list of abstracts containing tagged concepts, which can be used to overcome the herculean task of manual biocuration. DESHCV has been used to simulate the previously reported thalidomide-chronic hepatitis C hypothesis and to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. Users can retrieve enriched information, including interaction metadata, from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used in detecting the interactions, PubMed IDs of journal articles reporting the interactions, annotated protein interaction IDs from external databases, and string searches. The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that seem to support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes, comprising ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1, have been recommended for possible investigation to evaluate their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based candidate HCC gene prioritization for possible validation by experimental biologists.
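HCVpro's prioritization procedure is not described in the abstract; one minimal sketch of network-based candidate gene prioritization, ranking proteins by degree centrality in a toy interaction graph with networkx, might look as follows. The edge list is invented and is not HCVpro data.

```python
import networkx as nx

# Toy HCV-human interaction edges; real ones would be queried from HCVpro.
edges = [
    ("NS5A", "VIM"), ("NS5A", "SIAH1"), ("CORE", "VIM"),
    ("CORE", "DDX3X"), ("NS3", "DDX3X"), ("VIM", "THBS1"),
]
g = nx.Graph(edges)

# Rank human candidates by how central they are in the interaction network.
centrality = nx.degree_centrality(g)
candidates = {"VIM", "SIAH1", "DDX3X", "THBS1"}
for gene in sorted(candidates, key=centrality.get, reverse=True):
    print(gene, round(centrality[gene], 2))
```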
374
The Relationship between Internal Control Weakness and Financial Reporting Consistency. 許正昇. Unknown date
This study examines the relationship between internal control weakness and financial reporting consistency. TF-IDF text mining is used to analyze the Management's Discussion & Analysis of Financial Condition and Results of Operations (MD&A) together with the financial information disclosed in annual reports. The sample consists of annual reports of US-listed companies from 2002 to 2014. The results show that material internal control weaknesses significantly affect financial reporting consistency, and that internal control weakness is negatively correlated with consistency: companies with effective internal control disclose MD&A narratives that are more consistent with their financial information.
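The study's TF-IDF pipeline is not reproduced in the abstract; a minimal sketch of the underlying idea, vectorizing an MD&A passage and a textualized financial summary with scikit-learn and using cosine similarity as a consistency proxy, assuming invented toy inputs:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins: the MD&A narrative and a textual rendering of the financials.
mdna = "revenue grew strongly while operating costs declined and margins improved"
financials = "revenue increased, operating costs decreased, operating margin improved"

# Fit TF-IDF on both documents and measure their similarity as a
# proxy for reporting consistency (higher = more consistent).
vec = TfidfVectorizer()
tfidf = vec.fit_transform([mdna, financials])
consistency = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"consistency score: {consistency:.2f}")
```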
375
Applying Tag Recommendation Based on Document Similarity in Question-and-Answer Websites. 葉早彬 (Yeh, Tsao-Pin). Unknown date
As people's habits change, acquiring new knowledge from the internet is gradually replacing traditional media, and this shift has produced many new behaviors. Social tagging, popular in recent years, lets users classify and annotate information through their own labels; unlike traditional taxonomy, which requires items to be classified into predefined categories, social tagging has no such requirement and therefore adjusts easily as content changes.
Question-and-answer websites such as Quora, Stack Overflow and Yahoo Knowledge+ are open knowledge-sharing platforms that have emerged in recent years. Users interact with each other through questions and answers, and the discussion combines the crowd's experience and expertise to help users find satisfactory answers. A dedicated Q&A system spares users from hunting for answers across category-oriented forums or through keyword search results.
This study builds a tag recommendation system for Q&A websites: automatic tag classification helps users find the questions they need more efficiently and lets the platform group the large volume of user-generated questions.
We collected 20,638 questions from the Stack Exchange Q&A site and used a naïve Bayes algorithm together with document similarity computation to recommend suitable tags for new documents. In our evaluation, the recommended tags reached an accuracy of 64.2%, showing that relevant tags can be effectively recommended for new questions.
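A hedged sketch of the naïve Bayes half of the method, using scikit-learn on an invented toy corpus (the actual study used 20,638 multi-tagged Stack Exchange questions and also incorporated document similarity):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training questions with a single tag each (invented examples).
questions = [
    "how do I join two tables in sql",
    "sql query to group rows by date",
    "python list comprehension with condition",
    "read a csv file in python",
]
tags = ["sql", "sql", "python", "python"]

vec = TfidfVectorizer()
X = vec.fit_transform(questions)
clf = MultinomialNB().fit(X, tags)

# Recommend tags for a new question, ranked by class probability.
new_q = vec.transform(["filter a python list by value"])
probs = clf.predict_proba(new_q)[0]
ranked = sorted(zip(clf.classes_, probs), key=lambda t: -t[1])
print(ranked)  # e.g. [('python', 0.7...), ('sql', 0.2...)]
```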
376
Finding early signals of emerging trends in text through topic modeling and anomaly detection. Redyuk, Sergey. January 2018
Trend prediction has become an extremely popular practice in many industrial sectors and academia. It is beneficial for strategic planning and decision making, and facilitates exploring new research directions that are not yet mature. To anticipate future trends in an academic environment, a researcher needs to analyze an extensive amount of literature and scientific publications and gain expertise in the particular research domain. This approach is time-consuming and extremely complicated due to the abundance and diversity of the data. Modern machine learning tools, on the other hand, are capable of processing tremendous volumes of data, reaching real-time human-level performance for various applications. Achieving high performance in unsupervised prediction of emerging trends in text can indicate promising directions for future research and potentially lead to breakthrough discoveries in any field of science. This thesis addresses the problem of emerging trend prediction in text in three main steps: it utilizes an HDP topic model to represent the latent topic space of a given temporal collection of documents, applies the DBSCAN clustering algorithm to detect groups with high-density regions in the document space that potentially lead to emerging trends, and applies KL divergence in order to capture deviating text that might indicate the birth of a new, not-yet-seen phenomenon. In order to empirically evaluate the effectiveness of the proposed framework and estimate its predictive capability, both synthetically generated corpora and real-world text collections from arXiv.org, an open-access electronic archive of scientific publications (category: Computer Science), and NIPS publications are used. For synthetic data, a text generator is designed which provides ground truth to evaluate the performance of the anomaly detection algorithms. This work contributes to the body of knowledge in the area of emerging trend prediction in several ways. First of all, the method of incorporating topic modeling and anomaly detection algorithms for emerging trend prediction is a novel approach and highlights new perspectives in the subject area. Secondly, a three-level word-document-topic topology of anomalies is formalized in order to detect anomalies in temporal text collections which might lead to emerging trends. Finally, a framework for unsupervised detection of early signals of emerging trends in text is designed. The framework captures new vocabulary, documents with deviating word/topic distributions, and drifts in the latent topic space as three main indicators of a novel phenomenon, in accordance with the three-level topology of anomalies. The framework is not limited to particular sources of data and can be applied to any temporal text collection in combination with any online method for soft clustering.
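The framework's implementation is not included in the abstract; the sketch below wires the three named components together, an HDP topic model (gensim), DBSCAN outlier detection (scikit-learn) and KL divergence (SciPy), on a tiny invented corpus. All parameter values are placeholders.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import HdpModel
from sklearn.cluster import DBSCAN
from scipy.stats import entropy

# Toy tokenized documents; the thesis used arXiv/NIPS abstracts over time.
docs = [
    ["neural", "network", "training"], ["deep", "neural", "network"],
    ["topic", "model", "inference"], ["bayesian", "topic", "model"],
    ["quantum", "annealing", "hardware"],  # the "deviating" document
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# 1) HDP: nonparametric topic model of the latent topic space.
hdp = HdpModel(corpus, id2word=dictionary, random_state=0)
n_topics = hdp.get_topics().shape[0]
vecs = np.zeros((len(corpus), n_topics))
for i, bow in enumerate(corpus):
    for topic, prob in hdp[bow]:
        vecs[i, topic] = prob

# 2) DBSCAN over topic vectors: label -1 marks outliers, i.e. documents
#    outside dense regions that may signal an emerging trend.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(vecs)

# 3) KL divergence of each document's (smoothed) word distribution from
#    the corpus background, to flag deviating vocabulary.
background = np.ones(len(dictionary))
for bow in corpus:
    for wid, cnt in bow:
        background[wid] += cnt
background /= background.sum()
for i, bow in enumerate(corpus):
    p = np.ones(len(dictionary))
    for wid, cnt in bow:
        p[wid] += cnt
    p /= p.sum()
    print(i, labels[i], round(entropy(p, background), 3))
```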
377
Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization. Shokat, Imran. January 2018
Text visualization is a field dedicated to the visual representation of textual data by means of computer technology. A large number of visualization techniques are available, and it is becoming harder for researchers and practitioners to choose the optimal technique for a particular task among them. To overcome this problem, the ISOVIS Group developed an interactive survey browser for text visualization techniques. ISOVIS researchers gathered papers which describe text visualization techniques or tools and categorized them according to a taxonomy, manually assigning several categories to each visualization technique. In this thesis, we analyze the dataset of this browser. We carried out several analyses to find temporal trends and correlations of the categories present in the browser dataset, and compared these categories with a computational approach. Our results show that some categories have become more popular than before whereas others have declined in popularity. Cases of positive and negative correlation between various categories were found and analyzed. Comparisons between the manually labeled dataset and the results of computational text analyses were presented to the experts, with an opportunity to refine the dataset. The data analyzed in this thesis project are specific to the text visualization field; however, the methods used in the analyses can be generalized to other datasets of scientific literature surveys or, more generally, other manually curated collections of textual documents.
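As a hedged illustration of the temporal-trend and correlation analyses described, a pandas sketch over invented rows (one row per surveyed paper, with its year and binary category flags; the category names are made up, not actual ISOVIS taxonomy entries):

```python
import pandas as pd

# Invented rows: publication year plus binary category assignments.
df = pd.DataFrame({
    "year":       [2008, 2010, 2012, 2014, 2016, 2016, 2017],
    "word_cloud": [1,    1,    0,    0,    0,    0,    0],
    "node_link":  [0,    0,    1,    1,    1,    1,    1],
    "clustering": [0,    0,    1,    0,    1,    1,    0],
})

# Temporal trend: how often each category appears per year.
trend = df.groupby("year")[["word_cloud", "node_link", "clustering"]].sum()
print(trend)

# Correlation between category co-assignments across papers.
print(df[["word_cloud", "node_link", "clustering"]].corr())
```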
378
Computerized workflow and history tracking for final-year projects (Controle informatizado de fluxo e histórico de trabalho de conclusão de curso). Santos, Jary Alves dos. 28 April 2017
The writing of articles and scientific works is a constant in the daily life of an undergraduate student. One of the main scientific productions is the final-year project (Trabalho de Conclusão de Curso, TCC), which in some cases is a partial requirement for obtaining the undergraduate degree. Experience in the academic environment shows that several errors can be committed when writing a TCC, and that this process has many nuances that must be observed. A web system for monitoring, versioning and controlling the workflow of the text can be an excellent aid in the close working relationship between supervisors and students. This dissertation develops a web system for this purpose, called "Academic DUX": an intelligent software agent that performs preliminary checks of the work, compares versions, and tracks comments through a timeline. Automating the process of monitoring the projects, using a methodology related to text mining, supports learning from the students' successes and mistakes as well as from the supervisors' main guidance. Dissertation (Professional Master's), Programa de Pós-Graduação em Tecnologia, Saúde e Sociedade, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017.
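Academic DUX's code is not part of the abstract; its version-comparison feature could be sketched with Python's standard difflib as follows (draft contents and file labels are invented):

```python
import difflib

# Two invented draft versions of a TCC section.
v1 = ["Introduction", "This work studies text mining.", "Methods TBD."]
v2 = ["Introduction", "This work studies text mining in theses.",
      "We use a supervised classifier."]

# Unified diff: the raw material for tracking changes on a timeline.
for line in difflib.unified_diff(v1, v2, fromfile="draft_v1",
                                 tofile="draft_v2", lineterm=""):
    print(line)

# A similarity ratio could feed a "how much changed" indicator.
print(difflib.SequenceMatcher(None, "\n".join(v1), "\n".join(v2)).ratio())
```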
379
A classifier model for the Apache Project mailing list that combines a neurolinguistic dictionary with an ontology (Um modelo classificador da lista de e-mail do Projeto Apache que combina dicionário neurolinguístico com ontologia). Farias, Mário André de Freitas. 23 December 2011
Electronic mailing lists and discussion groups are commonly used by programmers to discuss and refine the tasks to be performed during software development. Open Source Software (OSS) projects use these lists as their primary tool for collaboration and cooperation. In projects of this kind the developers are typically spread around the world, so means of interaction and communication are needed to ensure collaboration between them, as well as efficiency in building and maintaining projects of this size. Mailing lists can thus be an important data source for discovering useful information about developers' behavior patterns, of interest to project managers. Neurominer is a text mining tool that determines the Preferred Representational System (PRS) of software developers in a specific context; its novelty is the combination of Neuro-Linguistic Programming (NLP) theory with text mining and statistical techniques. In this context, we propose an extension of the tool that applies ontology techniques to its dictionary, allowing sensory predicates to be combined with software engineering terms and giving the dictionary greater contextual power. Text mining combined with NLP theory and an ontology is therefore a natural candidate for a solution that improves the mining of textual information from discussion lists in order to support software project managers in decision making. This combination led to quite significant results, yielding an efficient and effective solution.
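Neurominer's dictionary and classifier are not published in the abstract; a minimal sketch of the core PRS idea, counting sensory predicates per representational channel, with an invented predicate list (the real tool also maps software engineering terms through an ontology):

```python
from collections import Counter
import re

# Invented sensory-predicate dictionary; Neurominer's actual dictionary
# is richer and, in the proposed extension, ontology-backed.
PREDICATES = {
    "visual":      {"see", "look", "clear", "show", "picture"},
    "auditory":    {"hear", "sounds", "tell", "discuss", "say"},
    "kinesthetic": {"feel", "grasp", "handle", "touch", "solid"},
}

def preferred_representational_system(text):
    """Count sensory predicates per channel and return the dominant one."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for channel, preds in PREDICATES.items():
        counts[channel] = sum(w in preds for w in words)
    return counts.most_common(1)[0][0], counts

email = "I see your point, the patch is clear, let's discuss and I will show the diff."
print(preferred_representational_system(email))
# -> ('visual', Counter({'visual': 3, 'auditory': 1, 'kinesthetic': 0}))
```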
380
Textual representations and hub generation: a comparative study (Representações textuais e a geração de hubs: um estudo comparativo). Aguiar, Raul Freire. January 2017
Advisor: Prof. Dr. Ronaldo Pratti. Dissertation (Master's), Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2017.
The hubness phenomenon, associated with the curse of dimensionality, has been studied from different perspectives in recent years. These studies point out that the problem is present in several real-world data sets and that the presence of hubs (the tendency of some examples to appear frequently in the nearest-neighbor lists of other examples) brings a series of undesirable consequences, such as degrading classifier performance. In text mining tasks, the problem also depends on the way chosen to represent the documents. The main objective of this dissertation is therefore to evaluate the impact of hub formation across different textual representations. To the best of our knowledge, and during the period of this research, no in-depth study of the implications of the hubness effect on different textual representations could be found in the literature. The results suggest that different textual representations produce corpora with different propensities for hub formation. It was also noticed that the incidence of hubs in the different textual representations has a similar influence on some classifiers. We also analyzed classifier performance after removing documents flagged as hubs, in pre-established proportions of the total data set size. For some algorithms this removal brought a trend of performance improvement. Thus, although not always effective, the strategy of identifying and removing hubs with a mostly bad neighborhood can be an interesting preprocessing technique to consider in order to improve the predictive performance of the text classification task.
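The dissertation's pipeline is not reproduced in the abstract; a minimal sketch of hub detection, computing each document's k-occurrence N_k (how often it appears in other documents' k-nearest-neighbor lists) and flagging the extremes, under invented data and thresholds:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.stats import skew

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # stand-in for TF-IDF or embedding vectors

k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point finds itself
_, idx = nn.kneighbors(X)

# k-occurrence N_k(x): how many neighbor lists each point appears in
# (column 0 is the point itself, so it is dropped).
n_k = np.bincount(idx[:, 1:].ravel(), minlength=len(X))

# A right-skewed N_k distribution signals hubness; flag extreme points as hubs.
print("skewness of N_k:", round(skew(n_k), 2))
hubs = np.where(n_k > n_k.mean() + 2 * n_k.std())[0]
print("hub documents:", hubs)
```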