  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Finding Combinatorial Connections between Concepts in the Biomedical Literature

Gresock, Joseph Aaron 11 May 2007
There are now a multitude of articles published in a diversity of journals providing information about genes, proteins, pathways, and entire processes. Each article investigates particular subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must computationally integrate information across multiple publications. This is especially important in problems such as modeling cross-talk in signaling networks, designing drug therapies for combinatorial selectivity, and unraveling the role of gene interactions in deleterious phenotypes, where the cost of performing combinatorial screens is exorbitant. In this thesis, we present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for unraveling combinatorial relationships. It involves the systematic application of a 'storytelling' algorithm followed by a series of filtering and compression operations over the mined stories. Given a start and end publication, typically with little or no overlap in content, storytelling identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. The stories thus discovered provide an argued approach to relating distant concepts through compositions of related concepts. The chains of links employed by stories are then mined to find frequently reused sub-stories, which can be compressed to yield compact templates of connections. We demonstrate a successful application of storytelling to finding combinatorial connections between biological concepts using two application case studies. / Master of Science
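The chaining idea in this abstract can be illustrated with a minimal sketch. It assumes each publication is reduced to a set of terms, that "significant content similarity" means Jaccard similarity at or above a fixed threshold, and that a breadth-first search is an acceptable way to find a chain. None of these choices come from the thesis, and the function names, threshold, and toy documents are hypothetical.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity between two term sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_story(docs, start, end, threshold=0.3):
    """Return a chain of document ids from start to end, or None.

    Neighboring documents in the chain have Jaccard similarity >= threshold.
    Breadth-first search returns a chain with the fewest intermediate documents.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            # Walk the parent links back to the start and reverse them.
            chain = []
            while current is not None:
                chain.append(current)
                current = parents[current]
            return chain[::-1]
        for other in docs:
            if other not in parents and jaccard(docs[current], docs[other]) >= threshold:
                parents[other] = current
                queue.append(other)
    return None


# Toy term sets (hypothetical): A and D share no terms, but neighbors overlap.
docs = {
    "A": {"p53", "apoptosis", "dna-damage"},
    "B": {"apoptosis", "dna-damage", "caspase"},
    "C": {"caspase", "apoptosis", "inflammation"},
    "D": {"inflammation", "caspase", "cytokine"},
}
story = find_story(docs, "A", "D", threshold=0.4)
print(story)
```

In this toy corpus the start and end documents have no terms in common, so any connection has to pass through intermediate documents, which is the situation the abstract describes.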
2

Automating the gathering of relevant information from biomedical text

Canevet, Catherine January 2009
More and more, database curators rely on literature-mining techniques to help them gather and make use of the knowledge encoded in text documents. This thesis investigates how an assisted annotation process can help and explores the hypothesis that it is only with respect to full-text publications that a system can tell relevant and irrelevant facts apart by studying their frequency. A semi-automatic annotation process was developed for a particular database, the Nuclear Protein Database (NPD), based on a set of full-text articles newly annotated with regard to subnuclear protein localisation, along with eight lexicons. The annotation process is carried out online, retrieving relevant documents (abstracts and full-text papers) and highlighting sentences of interest in them. The process also offers a summary table of the facts found, clustered by type of information. Each method involved in each step of the tool is evaluated using cross-validation results on the training data as well as test set results. The performance of the final tool, called the “NPD Curator System Interface”, is estimated empirically in an experiment where the NPD curator updates the database with pieces of information found relevant in 31 publications using the interface. A final experiment complements our main methodology by showing its extensibility to retrieving information on protein function rather than localisation. I argue that the general methods, the results they produced and the discussions they engendered are useful for any subsequent attempt to generate semi-automatic database annotation processes. The annotated corpora, gazetteers, methods and tool are fully available on request from the author (catherine.canevet@bbsrc.ac.uk).
3

High Specificity Literature Mining Method Based on Microarray Expression Profile for Discovering Hidden Connections among Diseases, Genes, and Drugs

Wu, Jain-Shing 05 September 2011
In recent years, with microarray techniques widely adopted, a large body of biomedical literature has been published that provides much useful information. However, some relationships among diseases, genes and drugs remain to be explored, since authors tend to focus only on the genes significant to a disease or the genes significant to a drug, without connecting the two to obtain new relationships. Several methods have been proposed for finding such hidden relationships, but many of them require manual involvement. The main objective of this dissertation is to discover the hidden connections between human diseases and genes, and the connections between drugs and the same genes. To achieve this goal, the intermediate nodes (significant genes) must be found first. When a gene shows a more significant difference in the observed group (abnormal patients) than in the control group (normal persons), it is called a significant gene for the disease. These significant genes often play a crucial role in cancer diagnosis and treatment. By classifying microarray gene expression data to find these significant genes, doctors can obtain feasible and appropriate information about treatments that can be given to patients according to their cancer symptoms. A variety of classifiers have been proposed for this problem. However, most of them work inefficiently when the number of attributes grows into the thousands. To further improve the accuracy and speed of existing classifiers, a novel microarray attribute reduction scheme (MARS) is proposed for selecting the genes significant to a disease. Experimental results demonstrate that combining the proposed scheme with a multiclass support vector machine (MCSVM) yields better performance than other gene selection methods with the same MCSVM. In addition, the proposed scheme with MCSVM performs better than the results listed in the existing literature.
Furthermore, 19 of the 22 genes selected by the proposed scheme in the acute lymphoblastic leukemia and acute myeloid leukemia (AML-ALL) dataset have been reported in the literature as related to AML and ALL. Thus the proposed scheme not only significantly reduces the large number of attributes (genes) in the gene expression classification problem, but also increases classification accuracy. MARS finds the related gene set according to a threshold determined using the receiver operating characteristic (ROC) curve. However, determining the best threshold requires repeating the experiment many times. Hence, we propose a novel disease-oriented feature selection algorithm (DOFA) to improve MARS. DOFA uses a genetic algorithm (GA) in the selection method to pick related genes automatically, with a support vector machine (SVM) and K-nearest-neighbor (KNN) as the classifiers. DOFA is tested on selecting related genes for the AML-ALL and Colon datasets, for which it selects 21 genes and 25 genes, respectively. According to the literature, 20 of the 21 genes selected for the AML-ALL dataset are related to the disease or to related cancers, and one gene is still uncertain. Likewise, 20 of the 25 genes selected for the Colon dataset are directly related to colon cancer or to related cancers, and 5 genes are still uncertain. Three more experiments are conducted to verify the discriminability of the genes selected by DOFA. The experimental results all indicate that DOFA performs better than competing methods. Thus DOFA not only selects the genes related to the diseases, but also increases classification accuracy. After obtaining the significant gene group, we can further use these genes to obtain the hidden connections. We propose a high specificity literature mining method based on microarray expression profiles for discovering hidden connections among diseases, drugs, and genes.
The proposed method automatically selects related genes from the disease or drug microarray expression profiles, and uses the disease or drug names together with the names or aliases of the selected genes to obtain the related abstract collections. An alias expansion scheme and a weight function are used to eliminate unrelated literature. We test three scenarios to verify the proposed method. Experimental results show that the proposed method can obtain the hidden connections among diseases, genes and drugs. The ROC curve shows that the proposed method not only finds the hidden connections between diseases and drugs but also has high specificity. In conclusion, the goal of this dissertation is to discover the hidden connections between diseases and drugs. To achieve this goal, we first proposed MARS to select the genes significant to the diseases. We then proposed DOFA to improve on MARS. Finally, we proposed a high specificity literature mining method based on microarray expression profiles for discovering the hidden connections among diseases, genes, and drugs. The proposed method draws on DOFA's ability to find the genes significant to a disease in order to obtain the hidden connections. Experimental results show that the proposed method not only obtains the hidden connections among diseases, genes, and drugs, but also has high specificity.
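The role of significant genes as intermediate nodes can be shown with a small sketch: a disease and a drug are linked when their selected gene sets overlap. This is only an illustration of the linking step under that assumption. It does not reproduce MARS, DOFA, the alias expansion scheme, or the weight function, and every identifier and gene label below is a hypothetical placeholder.

```python
def hidden_connections(disease_genes, drug_genes):
    """Map each (disease, drug) pair to the sorted list of genes they share.

    Both arguments map a name to the set of genes selected for it.
    Pairs with no shared gene are omitted.
    """
    links = {}
    for disease, d_genes in disease_genes.items():
        for drug, g_genes in drug_genes.items():
            shared = sorted(d_genes & g_genes)
            if shared:
                links[(disease, drug)] = shared
    return links


# Placeholder inputs (hypothetical labels, not real biological findings).
disease_genes = {
    "disease_X": {"GENE1", "GENE2", "GENE3"},
    "disease_Y": {"GENE4"},
}
drug_genes = {
    "drug_P": {"GENE2", "GENE3", "GENE5"},
    "drug_Q": {"GENE6"},
}
links = hidden_connections(disease_genes, drug_genes)
print(links)
```

A candidate link found this way is only a hypothesis. The abstract describes a further literature-based filtering stage before a connection is reported.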
4

Problems of textual transmission in early German books on mining "Der Ursprung Gemeynner Berckrecht" and the Norwegian "Bergkordnung"

Connolly, David E. January 2005
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Includes bibliographical references (p. 663-677).
5

A multi-layered approach to information extraction from tables in biomedical documents

Milosevic, Nikola January 2018
The quantity of literature in the biomedical domain is growing exponentially. It is becoming impossible for researchers to cope with this ever-increasing amount of information. Text mining provides methods that can improve access to information of interest through information retrieval, information extraction and question answering. However, most of these systems focus on information presented in the main body of text while ignoring other parts of the document such as tables and figures. Tables are a potentially important component of research presentation, as authors often include more detailed information in tables than in the textual sections of a document. Tables allow presentation of large amounts of information in relatively limited space, due to their structural flexibility and ability to present multi-dimensional information. Table processing poses specific challenges that table mining systems need to take into account. Challenges include a variety of visual and semantic structures in tables, a variety of information presentation formats, and dense content in table cells. The work presented in this thesis examines a multi-layered approach to information extraction from tables in biomedical documents. In this thesis we propose a representation model of tables and a method for table structure disentangling and information extraction. The model describes table structures and how they are read. We propose a method for information extraction that consists of: (1) table detection, (2) functional analysis, (3) structural analysis, (4) semantic tagging, (5) pragmatic analysis, (6) cell selection and (7) syntactic processing and extraction. In order to validate our approach, show its potential and identify remaining challenges, we applied our methodology to two case studies. The aim of the first case study was to extract baseline characteristics of clinical trials (number of patients, age, gender distribution, etc.) from tables.
The second case study explored how the methodology can be applied to relationship extraction, examining extraction of drug-drug interactions. Our method performed functional analysis with a precision score of 0.9425, recall score of 0.9428 and F1-score of 0.9426. Relationships between cells were recognized with a precision of 0.9238, recall of 0.9744 and F1-score of 0.9484. The information extraction methodology achieves state-of-the-art performance in table information extraction, recording an F1-score range of 0.82-0.93 for demographic data, adverse event and drug-drug interaction extraction, depending on the complexity of the task and available semantic resources. The presented methodology demonstrated that information can be efficiently extracted from tables in biomedical literature. Information extraction from tables can be important for enhancing data curation, information retrieval, question answering and decision support systems with additional information from tables that cannot be found in the other parts of the document.
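The F1-scores quoted in this abstract can be checked against the precision and recall values it reports, using the standard definition of F1 as the harmonic mean of precision and recall. The sketch below is a consistency check only, and the function and variable names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
functional_f1 = f1_score(0.9425, 0.9428)  # functional analysis
structural_f1 = f1_score(0.9238, 0.9744)  # relationships between cells

print(f"functional analysis F1: {functional_f1:.4f}")
print(f"cell relationship F1:   {structural_f1:.4f}")
```

Both computed values agree with the reported F1-scores of 0.9426 and 0.9484 to four decimal places.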
6

Computational biology approaches in drug repurposing and gene essentiality screening

Philips, Santosh 20 June 2016
Indiana University-Purdue University Indianapolis (IUPUI) / The rapid innovations in biotechnology have led to an exponential growth of data and electronically accessible scientific literature. Within this enormous body of scientific data, knowledge can be exploited and novel discoveries can be made. In my dissertation, I have focused on novel molecular mechanisms and therapeutic discoveries from big data for complex diseases. It is evident today that complex diseases have many contributing factors, including genetic and environmental effects. The discovery of these factors is challenging and critical in personalized medicine. The increasing cost and time needed to develop new drugs pose a further challenge in effectively treating complex diseases. In this dissertation, we aim to demonstrate the use of existing data and literature as a potential resource for discovering novel therapies and repositioning existing drugs. The key to identifying novel knowledge is in integrating information from decades of research across the different scientific disciplines to uncover interactions that are not explicitly stated. This puts critical information at the fingertips of researchers and clinicians, who can take advantage of this newly acquired knowledge to make informed decisions. This dissertation utilizes computational biology methods to identify and integrate existing scientific data and literature resources in the discovery of novel molecular targets and drugs that can be repurposed. In chapter 1 of my dissertation, I extensively sifted through the scientific literature and identified a novel interaction between Vitamin A and CYP19A1 that could lead to a potential increase in the production of estrogens. Further, in chapter 2, by exploring a microarray dataset from an estradiol gene sensitivity study, I was able to identify a potential novel anti-estrogenic indication for the commonly used urinary analgesic, phenazopyridine. Both discoveries were experimentally validated in the laboratory.
In chapter 3 of my dissertation, through the use of a manually curated corpus and machine learning algorithms, I identified and extracted genes that are essential for cell survival. These results highlight that novel knowledge with potential clinical applications can be discovered from existing data and literature by integrating information across various scientific disciplines.
