31 |
Dotazovací jazyk pro databáze biologických dat / Query Language for Biological Databases
Bahurek, Tomáš, January 2015
With the rising amount of biological data, biological databases are becoming more important every day. Knowledge discovery (the identification of connections that were unknown at the time of data entry) is an essential aspect of these databases. To gain knowledge from these databases, one has to construct complicated SQL queries, which requires advanced knowledge of SQL and of the database schema in use. Biologists usually lack this knowledge, which creates a need for a tool that offers a more intuitive interface for querying biological databases. This work proposes ChQL, an intuitive query language for the biological database Chado. ChQL allows biologists to assemble queries using terms they are familiar with, without knowledge of SQL or the Chado database schema. This work also implements an application for querying a Chado database using ChQL. A web interface guides the user through the process of assembling a ChQL sentence. The application translates this sentence into an SQL query, sends it to the Chado database, and displays the returned data in a table. The results were evaluated by testing queries on real data.
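ChQL's actual grammar and translation rules are not given in this abstract. As a hypothetical sketch of the translation step only, the Python below maps a structured three-slot sentence (feature type, genus, species) onto a parameterized SQL join over simplified versions of the Chado `feature`, `cvterm`, and `organism` tables, then runs it against a toy in-memory SQLite database. The class `QuerySentence`, the helper names, the reduced column sets, and all data are illustrative assumptions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class QuerySentence:
    """Hypothetical structured sentence; NOT the actual ChQL grammar."""
    feature_type: str  # matched against cvterm.name, e.g. "gene"
    genus: str         # matched against organism.genus
    species: str       # matched against organism.species


def to_sql(s: QuerySentence) -> tuple[str, tuple[str, ...]]:
    """Translate the sentence into a parameterized SQL query over Chado-style tables."""
    sql = (
        "SELECT f.uniquename, f.name "
        "FROM feature AS f "
        "JOIN cvterm AS t ON f.type_id = t.cvterm_id "
        "JOIN organism AS o ON f.organism_id = o.organism_id "
        "WHERE t.name = ? AND o.genus = ? AND o.species = ? "
        "ORDER BY f.uniquename"
    )
    return sql, (s.feature_type, s.genus, s.species)


def demo_database() -> sqlite3.Connection:
    """Build a tiny in-memory database with a few Chado-like columns (simplified, toy data)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, organism_id INTEGER,
                              type_id INTEGER, name TEXT, uniquename TEXT);
        INSERT INTO cvterm VALUES (1, 'gene'), (2, 'mRNA');
        INSERT INTO organism VALUES (1, 'Drosophila', 'melanogaster'), (2, 'Mus', 'musculus');
        INSERT INTO feature VALUES
            (1, 1, 1, 'white', 'DM_G1'),
            (2, 1, 2, 'white-RA', 'DM_T1'),
            (3, 2, 1, 'Pax6', 'MM_G1');
        """
    )
    return con


if __name__ == "__main__":
    con = demo_database()
    sql, params = to_sql(QuerySentence("gene", "Drosophila", "melanogaster"))
    for row in con.execute(sql, params):
        print(row)
```

Binding user-chosen terms as parameters (`?`) instead of concatenating strings keeps them from altering the SQL. A production Chado instance usually runs on PostgreSQL, whose Python drivers use a different placeholder style.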
|
32 |
Family-Wise Error Rate Control in Quantitative Trait Loci (QTL) Mapping and Gene Ontology Graphs with Remarks on Family Selection
Saunders, Garrett, 01 May 2014
One of the great aims of statistics, the science of collecting, analyzing, and interpreting data, is to guard against falsely rejecting an accepted claim, or hypothesis, given observed data from an experiment. This is generally known as protecting against a Type I Error, or controlling the Type I Error rate. Extending this protection to situations where thousands upon thousands of hypotheses are examined simultaneously is known as multiple hypothesis testing. This dissertation presents an improvement to an existing multiple hypothesis testing approach, the Focus Level method, specific to gene set testing (a branch of genomics) on Gene Ontology graphs. This improvement resolves a long-standing computational difficulty of the Focus Level method, providing more than a 15,000-fold increase in computational efficiency. This dissertation also presents a solution to a multiple testing problem in genetics, where a specific approach to mapping genes underlying quantitative traits of interest requires a multiplicity adjustment that both corrects for the number of tests and ensures logical consistency. The solution is shown to have greater power than the current standard approach to the problem. A side issue of this modeling framework led to the development of a new bivariate approach to quantitative trait marker detection, which is also presented. The overall contribution of this dissertation to the statistics literature is a set of novel solutions that meet real needs of practitioners in genetics and genomics, with the aim of ensuring both that truth is discovered and that discoveries are actually true.
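The abstract does not describe the internals of the Focus Level method. For orientation only, the sketch below shows a standard family-wise error rate procedure, Holm's step-down method, next to the Bonferroni correction; it is not the Focus Level method or the dissertation's improvement to it, and the function names are illustrative.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Bonferroni: reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm's step-down procedure, which controls the FWER at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # first non-rejection: all larger p-values are retained
    return reject


if __name__ == "__main__":
    p = [0.01, 0.015, 0.03, 0.2]
    print("Bonferroni:", bonferroni_reject(p))
    print("Holm:      ", holm_reject(p))
```

Holm rejects every hypothesis that Bonferroni rejects, and sometimes more, while still controlling the FWER at `alpha` under arbitrary dependence between tests.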
|
33 |
Gene Ontology-Guided Force-Directed Visualization of Protein Interaction Networks
King, James Lowell, 01 January 2019
Protein interaction data are being generated at unprecedented rates thanks to advances in high-throughput techniques such as mass spectrometry and DNA microarrays. Biomedical researchers, operating under budgetary constraints, have found it difficult to scale their efforts to keep up with the ever-increasing amount of available data. They often lack the resources and manpower required to analyze the data using existing methodologies. These research deficiencies impede our ability to understand diseases, delay the advancement of clinical therapeutics, and ultimately cost lives.
One of the most commonly used techniques for analyzing protein interaction data is the construction and visualization of protein interaction networks. This research investigated the effectiveness and efficiency of novel domain-specific algorithms for visualizing protein interaction networks. Existing domain-agnostic algorithms were compared with the novel algorithms using several performance, aesthetic, and biological relevance metrics. The graph drawing algorithms proposed here add novel domain-specific forces to existing force-directed graph drawing algorithms. The innovations include an attractive force and a graph coarsening policy based on semantic similarity, and a novel graph refinement algorithm.
The experiments demonstrated that the novel graph drawing algorithms consistently produce more biologically meaningful layouts than the existing methods. Aggregated over the 480 tests performed, and quantified using the Biological Evaluation Percentage metric defined in the Methodology chapter, the novel graph drawing algorithms created layouts that are 237 percent more biologically meaningful than those of the next best algorithm. This improvement came at the cost of additional edge crossings and smaller minimum angles between adjacent edges, both of which are undesirable aesthetics. The aesthetic and performance tradeoffs are experimentally quantified in this study, and dozens of algorithmically generated graph drawings are presented to illustrate the benefits of the novel algorithms visually. The graph drawing algorithms proposed in this study will help biomedical researchers produce high-quality, interactive protein interaction network drawings more efficiently, for improved discovery and communication.
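The study's exact force definitions, coarsening policy, and refinement algorithm are not given in the abstract. The hypothetical sketch below only illustrates the general idea of adding an attractive term weighted by semantic similarity to a Fruchterman-Reingold-style layout: every pair repels with k²/d, interaction edges attract with d²/k, and each pair is additionally pulled together by `w_sem * similarity * d²/k`. The `similarity` input (for example, a GO-based semantic similarity between proteins), the weight `w_sem`, and the cooling schedule are all assumptions.

```python
import math


def semantic_layout(nodes, edges, similarity, positions,
                    k=1.0, iterations=200, t0=0.1, w_sem=1.0):
    """Fruchterman-Reingold-style layout with an extra semantic attractive force.

    nodes:      list of node ids
    edges:      iterable of (u, v) interaction edges
    similarity: dict frozenset({u, v}) -> score in [0, 1] (hypothetical input)
    positions:  dict node -> (x, y) starting coordinates
    """
    pos = {n: list(positions[n]) for n in nodes}
    edge_set = {frozenset(e) for e in edges}
    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                pair = frozenset((u, v))
                force = k * k / d                  # repulsion between all pairs
                if pair in edge_set:
                    force -= d * d / k             # spring attraction along edges
                # Additional attraction proportional to semantic similarity.
                force -= w_sem * similarity.get(pair, 0.0) * d * d / k
                # Positive force pushes u away from v; negative pulls them together.
                fx, fy = force * dx / d, force * dy / d
                disp[u][0] += fx
                disp[u][1] += fy
                disp[v][0] -= fx
                disp[v][1] -= fy
        t = t0 * (1 - it / iterations)             # linear cooling schedule
        for n in nodes:
            length = math.hypot(*disp[n])
            if length > 0:
                step = min(length, t)              # cap each move at the temperature
                pos[n][0] += disp[n][0] / length * step
                pos[n][1] += disp[n][1] / length * step
    return {n: tuple(p) for n, p in pos.items()}


if __name__ == "__main__":
    start = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 1.0)}
    sim = {frozenset(("A", "B")): 1.0}  # A and B share annotations; C does not
    print(semantic_layout(["A", "B", "C"], [], sim, start))
```

With no interaction edges at all, the semantic term alone keeps the similar pair A, B close while the unrelated node drifts away; this O(n²)-per-iteration version is only suitable for small illustrative graphs.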
|
34 |
Systems Biology Modeling of Bovine Fertility using Proteomics
Peddinti, Divya swetha, 30 April 2011
Beef and milk production industries represent the largest agricultural industries in the United States, with a retail equivalent value of approximately $112 billion (USDA, 2008). Infertility is a major problem in mammalian reproduction. In the United States, approximately 66% of cows are bred by artificial insemination (AI), but only about 50% of these inseminations result in successful pregnancies. Infertility can arise from either male factors (spermatozoon) or female factors (oocyte), with male factors contributing approximately 40% of cases. Infertility costs the producer approximately $5 per exposed cow for every 1% reduction in pregnancy rate. Despite this economic impact of millions of dollars, the precise molecular events and mechanisms that determine the fertilizing potential of an oocyte and a spermatozoon are not well defined. The thesis of my doctoral dissertation is that proteomics-based “systems biology” modeling of the bovine oocyte and spermatozoon can facilitate a rapid understanding of fertility. To test this thesis, I first needed to identify the proteins associated with the bovine oocyte, its associated cumulus cells, and the spermatozoon. The next step was functional annotation of the experimentally confirmed proteins to identify the major functions associated with the oocyte, cumulus cells, and spermatozoon. The final step was to generate proteomics-based systems biology models of the communication between the bovine oocyte and its cumulus cells, and of male fertility. The results of my dissertation established methods that provide a foundation for high-throughput proteomics approaches to bovine oocyte and cumulus cell biology, and allowed me to model the intricate cross-communication between oocyte and cumulus cells using systems biology approaches. Proteomics-based systems biology modeling of oocytes and cumulus cells identified the signaling pathways and proteins associated with this communication, which may have implications for oocyte maturation.
In addition, systems biology modeling of differential spermatozoa proteomes from bulls of varying fertility rates enabled the identification of putative molecular markers and key pathways associated with male fertility. Ultimately, these results can benefit biomedical research by providing useful information for comparative biology, a better understanding of bovine oocyte and spermatozoon development and of infertility, biomarker discovery, and eventually the development of therapies to treat infertility in cattle as well as humans.
|
35 |
Novel Algorithms for Cross-Ontology Multi-Level Data Mining
Manda, Prashanti, 15 December 2012
The widespread use of ontologies in many scientific areas creates a wealth of ontology-annotated data and necessitates the development of ontology-based data mining algorithms. We have developed generalization and mining algorithms for discovering cross-ontology relationships via ontology-based data mining, and we present new interestingness measures to evaluate the discovered relationships. The methods presented in this dissertation employ generalization as an ontology traversal technique to discover interesting and informative relationships, at multiple levels of abstraction, between concepts from different ontologies. The generalization algorithms combine ontological annotations with the structure and semantics of the ontologies themselves to discover interesting cross-ontology relationships. The first algorithm uses the depth of ontological concepts as a guide for generalization: the ontology annotations are translated to higher levels of abstraction one level at a time, accompanied by incremental association rule mining. The second algorithm generalizes ontology terms to all their ancestors via transitive ontology relations and then mines cross-ontology multi-level association rules from the generalized transactions. Our interestingness measures use implicit knowledge conveyed by the relation semantics of the ontologies to capture the usefulness of cross-ontology relationships. We describe the use of information-theoretic metrics to capture the interestingness of cross-ontology relationships and the specificity of ontology terms with respect to an annotation dataset. Our generalization and data mining algorithms are applied to the Gene Ontology and the postnatal Mouse Anatomy Ontology. The results demonstrate that our generalization algorithms and interestingness measures discover more interesting, higher-quality relationships than approaches that do not use generalization.
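To make the second algorithm's idea concrete, here is a minimal sketch under stated assumptions: each annotation set is expanded with all ancestors of its terms, and single-term cross-ontology rules are then mined with plain support and confidence thresholds. The dissertation's own interestingness measures are not reproduced; `information_content` is a standard -log2(frequency) specificity score used here as an illustrative stand-in. All term names, the `go:`/`ma:` prefixes, and the thresholds are hypothetical.

```python
import math
from itertools import permutations


def ancestors(term, parents):
    """Return the term plus all of its ancestors, following parent links transitively."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def generalize(transactions, parents):
    """Extend each annotation set with every ancestor of every annotated term."""
    return [set().union(*(ancestors(t, parents) for t in tx)) for tx in transactions]


def cross_ontology_rules(transactions, ontology, min_support=0.5, min_confidence=0.8):
    """Mine single-term rules x -> y where x and y belong to different ontologies."""
    n = len(transactions)
    terms = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(terms, 2):
        if ontology(x) == ontology(y):
            continue
        n_x = sum(x in tx for tx in transactions)
        n_xy = sum(x in tx and y in tx for tx in transactions)
        if n_x and n_xy / n >= min_support and n_xy / n_x >= min_confidence:
            rules.append((x, y, n_xy / n, n_xy / n_x))  # (antecedent, consequent, support, confidence)
    return rules


def information_content(term, transactions):
    """-log2 of the fraction of transactions containing the term (term must occur at least once)."""
    freq = sum(term in tx for tx in transactions) / len(transactions)
    return -math.log2(freq)


def ontology_prefix(term):
    """Toy convention: the ontology is the part of the term name before ':'."""
    return term.split(":", 1)[0]


# Toy data (hypothetical): "go:" terms on one side, "ma:" anatomy terms on the other.
PARENTS = {
    "go:axon_guidance": ["go:neuron_development"],
    "go:neuron_development": ["go:development"],
    "go:muscle_development": ["go:development"],
    "ma:hippocampus": ["ma:brain"],
    "ma:cortex": ["ma:brain"],
    "ma:brain": ["ma:organ"],
    "ma:heart": ["ma:organ"],
}
GENES = [
    {"go:neuron_development", "ma:hippocampus"},
    {"go:axon_guidance", "ma:cortex"},
    {"go:neuron_development", "ma:cortex"},
    {"go:muscle_development", "ma:heart"},
]

if __name__ == "__main__":
    generalized = generalize(GENES, PARENTS)
    for rule in cross_ontology_rules(generalized, ontology_prefix):
        print(rule)
```

On the raw annotations no pair reaches 50% support, whereas after generalization rules such as `go:neuron_development -> ma:brain` emerge. Trivial rules between root-like terms (information content 0) also appear, which is the kind of result an interestingness measure is meant to down-rank.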
Our algorithms can be used by researchers and ontology developers to discover inter-ontology connections. Additionally, the cross-ontology relationships discovered using our algorithms can be used by researchers to understand different aspects of entities that interest them.
|
36 |
Protein Function Prediction Using Decision Tree Technique
Yedida, Venkata Rama Kumar Swamy, 02 September 2008
No description available.
|
37 |
Analysis of Gene Expression Data for Gene Ontology Based Protein Function Prediction
Macholan, Robert Daniel, 13 May 2011
No description available.
|
38 |
A Method for Integrating Heterogeneous Datasets based on GO Term Similarity
Thanthiriwatte, Chamali Lankara, 11 December 2009
This thesis presents a method for integrating heterogeneous gene/protein datasets at the functional level based on Gene Ontology (GO) term similarity. Biologists often want to integrate heterogeneous datasets obtained from different biological samples. A major challenge in this process is how to link the datasets. Currently, the most common approach is to link them through common reference database identifiers, which tends to yield only a small number of matching identifiers because standard accession schemes are lacking. As a result, biologists may miss biological phenomena that are revealed only by combining the data and not by any dataset individually. We discuss an approach that integrates heterogeneous datasets by computing the similarity among their entries based on the similarity of their GO annotations. We then group genes and/or proteins with similar annotations by applying a hierarchical clustering algorithm. The results provide a more comprehensive understanding of the biological processes involved.
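The abstract does not name the GO similarity measure or the linkage criterion. The sketch below is a simplified stand-in under those assumptions: it uses Jaccard overlap of GO annotation sets (a real implementation would more likely use a semantic similarity that accounts for the GO graph) and average-linkage agglomerative clustering with a distance cutoff. Entries from two datasets with no shared identifiers end up grouped by annotation alone. All identifiers and term labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two annotation sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(annotations, threshold):
    """Agglomerative clustering with average linkage on distance = 1 - Jaccard.

    Repeatedly merges the two closest clusters; stops when the closest pair is
    farther apart than `threshold` or only one cluster remains.
    """
    clusters = [[item] for item in sorted(annotations)]

    def distance(c1, c2):
        total = sum(1 - jaccard(annotations[a], annotations[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return clusters


# Hypothetical entries from two datasets that share no identifiers, only GO-style annotations.
ANNOTATIONS = {
    "ds1:geneA": {"term1", "term2", "term3"},
    "ds1:geneB": {"term7", "term8"},
    "ds2:prot1": {"term1", "term2"},
    "ds2:prot2": {"term7", "term8", "term9"},
}

if __name__ == "__main__":
    print(average_linkage(ANNOTATIONS, threshold=0.5))
```

The cutoff plays the role of cutting the dendrogram; lowering it yields tighter, more numerous functional groups.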
|
39 |
Concept Lattice Analysis for Annotation Objects
Yi, Wenting, 02 September 2009
No description available.
|
40 |
De novo genome-scale prediction of protein-protein interaction networks using ontology-based background knowledge
Niu, Kexin, 18 July 2022
Proteins and their functions play essential roles in many biological processes, so the study of protein-protein interactions (PPI) is of considerable importance. PPI network data are of great scientific value; however, they are incomplete, and experimental identification is time-consuming and expensive. Available computational methods perform well at predicting PPIs in model organisms but poorly in novel organisms. Because interaction data are incomplete, it is challenging to train a model for a novel organism. In addition, millions to billions of candidate interactions need to be evaluated, which is extremely compute-intensive.
We aim to improve the prediction of whether a pair of proteins will interact, given only their two sequences as input, and to efficiently predict a PPI network given a whole proteome of sequences as input.
We hypothesize that information about the cellular locations where proteins are active, together with the proteins' 3D structures, can significantly improve prediction performance.
To overcome the lack of experimental data, we use structures predicted by AlphaFold2 and cellular locations predicted by DeepGOPlus.
We believe that proteins belonging to disjoint cellular components have very little chance of interacting. We manually chose several disjoint pairs of components and further confirmed their disjointness using experimental PPI data.
We generate new non-interacting pairs from proteins in disjoint classes to update the D-SCRIPT dataset. As a result, AUPR improves by 10% compared with the original D-SCRIPT dataset. In addition, for de novo PPI network prediction, we pre-filter negatives instead of enumerating all potential PPIs; for E. coli, this allows around a million negative interactions to be skipped.
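The abstract does not state the exact rule that turns disjoint components into negative pairs. A minimal sketch, assuming a pair is treated as a presumed non-interactor when every predicted location of one protein is declared disjoint from every predicted location of the other, shows how the same test can both generate training negatives and pre-filter pairs before scoring. The location labels, the `disjoint` set, and the function names are hypothetical, not the component pairs chosen in the thesis.

```python
from itertools import combinations


def is_presumed_negative(locs_a, locs_b, disjoint):
    """True if every location of A is declared disjoint from every location of B.

    Proteins with no predicted location are never treated as negatives.
    """
    if not locs_a or not locs_b:
        return False
    return all(frozenset((x, y)) in disjoint for x in locs_a for y in locs_b)


def split_candidate_pairs(locations, disjoint):
    """Partition all protein pairs into pairs to score and pre-filtered negatives."""
    to_score, negatives = [], []
    for a, b in combinations(sorted(locations), 2):
        if is_presumed_negative(locations[a], locations[b], disjoint):
            negatives.append((a, b))
        else:
            to_score.append((a, b))
    return to_score, negatives


if __name__ == "__main__":
    # Hypothetical predicted locations and a hypothetical set of disjoint component pairs.
    locations = {"P1": {"nucleus"}, "P2": {"nucleus", "cytoplasm"}, "P3": {"extracellular"}}
    disjoint = {frozenset(("nucleus", "extracellular")), frozenset(("cytoplasm", "extracellular"))}
    print(split_candidate_pairs(locations, disjoint))
```

Only the pairs in `to_score` would be passed to the interaction model, so the cost of proteome-wide prediction shrinks by the number of pre-filtered pairs.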
To combine structure and sequence information, we generate a graph for each protein. A graph convolutional network with Self-Attention Graph Pooling in a Siamese architecture learns from these graphs to predict PPIs. This improves AUPR by around 20% compared with our baseline model, D-SCRIPT.
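How the per-protein graphs are built is not detailed in the abstract. One common construction, assumed here, is a residue contact graph: one node per residue carrying a one-hot amino-acid feature (the sequence information) and an edge between residues whose C-alpha atoms lie within a distance cutoff (the structure information, e.g. from AlphaFold2 coordinates). The 8 Å cutoff and the function name are illustrative assumptions; the GCN, Self-Attention Graph Pooling, and Siamese head are not sketched.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_graph(sequence, ca_coords, cutoff=8.0):
    """Build a residue contact graph for one protein.

    Nodes: residues, each with a one-hot amino-acid feature vector
           (non-standard residues such as 'X' get an all-zero vector).
    Edges: residue pairs (i, j), i < j, whose C-alpha atoms are within `cutoff` angstroms.
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have the same length")
    features = [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in sequence]
    edges = []
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return features, edges


if __name__ == "__main__":
    feats, edges = residue_graph("ACG", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)])
    print(len(feats), edges)
```

In a Siamese setup, both proteins of a pair would be converted this way and passed through the same shared-weight graph encoder before their embeddings are compared.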
|