21 |
Using semantic similarity measures across Gene Ontology to predict protein-protein interactions
Helgadóttir, Hanna Sigrún January 2005
Living cells are controlled by proteins and genes that interact through complex molecular pathways to achieve a specific function. Determining protein-protein interactions is therefore fundamental to understanding the cell’s life cycle and functions. The function of a protein is also largely determined by its interactions with other proteins. The amount of protein-protein interaction data available has multiplied with the emergence of large-scale technologies for detecting interactions, but the drawback of such technologies is the relatively high amount of noise present in the data. Determining protein-protein interactions experimentally is time-consuming, and the aim of this project is therefore to create a computational method that predicts interactions with high sensitivity and specificity. Semantic similarity measures were applied across the Gene Ontology terms assigned to proteins in S. cerevisiae to predict protein-protein interactions. Three semantic similarity measures were tested to see which one performs best in predicting such interactions. Based on the results, a method that predicts the function of proteins in connection with connectivity was devised. The results show that semantic similarity is a useful measure for predicting protein-protein interactions.
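As an illustration of how a semantic similarity measure over Gene Ontology terms can score a protein pair, here is a minimal Python sketch. The abstract does not name the three measures that were tested, so the sketch assumes Resnik's information-content measure as one well-known example. The toy DAG, the term names, the annotations, and the function names are all hypothetical.

```python
import math

# Toy GO-like DAG (hypothetical term names): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical annotations: protein -> directly assigned terms.
ANNOTATIONS = {
    "P1": {"dna_binding"},
    "P2": {"rna_binding"},
    "P3": {"kinase"},
    "P4": {"dna_binding", "kinase"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(n / count(t)); count(t) = proteins annotated to t or a descendant."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for term in covered:
            counts[term] += 1
    n = len(ANNOTATIONS)
    return {term: math.log(n / c) for term, c in counts.items() if c > 0}


IC = information_content()


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(IC[t] for t in common)


def protein_similarity(p1, p2):
    """Score a protein pair by its best-matching pair of annotated terms."""
    return max(resnik(a, b) for a in ANNOTATIONS[p1] for b in ANNOTATIONS[p2])
```

Under these assumptions, a pair would be predicted to interact when its score exceeds a threshold chosen on known interactions. In the toy data, P1 and P4 share a specific term and score higher than P1 and P2, which share only the more general "binding" ancestor.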
|
22 |
A Novel Approach to Ontology Management
Kim, Jong Woo 01 August 2010
The term ontology is defined as the explicit specification of a conceptualization. While much of the prior research has focused on technical aspects of ontology management, little attention has been paid to the issues that limit the widespread use of ontologies or to evaluating how effectively ontologies improve task performance. This dissertation addresses this void through the development of approaches to ontology creation, refinement, and evaluation. It follows a multi-paper model focusing on ontology creation, refinement, and evaluation. The first study develops and evaluates a method for ontology creation using knowledge available on the Web. The second study develops a methodology for ontology refinement through pruning and empirically evaluates the effectiveness of this method. The third study investigates the impact of an ontology in use case modeling, which is a complex, knowledge-intensive organizational task in the context of IS development. The three studies follow the design science research approach, and each builds and evaluates IT artifacts. These studies contribute to knowledge by developing solutions to three important issues in the effective development and use of ontologies.
|
23 |
Innovative Algorithms and Evaluation Methods for Biological Motif Finding
Kim, Wooyoung 05 May 2012
Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Because of their wide range of applications, many algorithms and computational tools have been developed to search for biological motifs efficiently. As a result, there are more computationally derived motifs than experimentally validated motifs, and how to validate the biological significance of these ‘candidate motifs’ becomes an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences, and are stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose.
In this thesis, we focus not only on computational efficiency but also on the biological meaning of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods. For example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched for by various clustering algorithms using Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs.
In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins that are vital for development to a fertile adult and for cellular life in an organism. We design a new centrality algorithm based on biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques.
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently, and biological information is easily integrated with the analysis; 2) we focus more on the biological meaning of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.
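To make the idea of a motif-based centrality concrete, the sketch below scores each protein by the number of triangles (the simplest three-node network motif) it takes part in. The abstract does not define MCGO or say how GO information enters the score, so this is only an illustrative stand-in and not the thesis's algorithm. The network and the function name `triangle_centrality` are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected PPI network: protein -> set of interaction partners.
PPI = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}


def triangle_centrality(graph):
    """Score each protein by the number of triangles it belongs to."""
    scores = {node: 0 for node in graph}
    for node, neighbours in graph.items():
        # Two neighbours of `node` that also interact with each other
        # close a triangle through `node`.
        for u, v in combinations(sorted(neighbours), 2):
            if v in graph[u]:
                scores[node] += 1
    return scores


scores = triangle_centrality(PPI)
# Proteins ranked from most to least motif-central (ties broken by name).
ranking = sorted(scores, key=lambda node: (-scores[node], node))
```

In a setting like the one described, such a score could be used alongside degree or betweenness as one feature for a classifier of essential proteins.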
|
24 |
Automatic Assignment of Protein Function with Supervised Classifiers
Jung, Jae 16 January 2010
High-throughput genome sequencing and sequence analysis technologies have created the need for automated annotation and analysis of large sets of genes. The Gene Ontology (GO) provides a common controlled vocabulary for describing gene function. However, annotating proteins with GO terms usually involves tedious manual curation by trained professional annotators. With the wealth of genomic data now available, there is a need for accurate automated annotation methods.
The overall objective of my research is to improve our ability to automatically annotate proteins with GO terms. The first method, Automatic Annotation of Protein Functional Class (AAPFC), employs protein functional domains as features and learns an independent Support Vector Machine classifier for each GO term. This approach relies only on protein functional domains as features, and demonstrates that statistical pattern recognition can outperform expert-curated mapping of protein functional domain features to protein functions. The second method, Predict of Gene Ontology (PoGO), is a meta-classification method that integrates multiple heterogeneous data sources. This method achieves better performance than the protein domain method can achieve alone.
Apart from these two methods, several systems have been developed that employ pattern recognition to assign gene function using a variety of features, such as sequence similarity, the presence of protein functional domains, and gene expression patterns. Most of these approaches have not considered the hierarchical relationships among the terms, which take the form of a directed acyclic graph (DAG). The DAG represents the functional relationships between the GO terms, so it should be an important component of an automated annotation system. I describe a Bayesian network used as a multi-layered classifier that incorporates the relationships among GO terms found in the GO DAG. I also describe an inference algorithm for quickly assigning GO terms to unlabeled proteins. A comparative analysis of the method against previously described annotation systems shows that it provides improved annotation accuracy when the performance on individual GO terms is compared. More importantly, this method enables the assignment of significantly more GO terms to more proteins than was previously possible.
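The following sketch illustrates why the GO DAG matters when independent per-term classifiers are used: their scores can be mutually inconsistent, with a specific term scored higher than its own ancestors. It is not the Bayesian network described in the abstract. It is a much simpler stand-in that applies GO's true-path rule (an annotation to a term implies annotation to all of its ancestors) by raising each ancestor's score to at least that of its descendants. The term IDs, scores, and function names are hypothetical.

```python
# Hypothetical fragment of a GO-like DAG: term -> list of parent terms.
GO_PARENTS = {
    "GO:root": [],
    "GO:binding": ["GO:root"],
    "GO:nucleic_acid_binding": ["GO:binding"],
    "GO:dna_binding": ["GO:nucleic_acid_binding"],
    "GO:catalytic": ["GO:root"],
}


def ancestors(term, parents):
    """Return all proper ancestors of a term in the DAG."""
    found = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def propagate_scores(scores, parents):
    """Make per-term scores consistent with the true-path rule."""
    consistent = dict(scores)
    for term, score in scores.items():
        for ancestor in ancestors(term, parents):
            # An ancestor is at least as likely as any of its descendants.
            consistent[ancestor] = max(consistent.get(ancestor, 0.0), score)
    return consistent


# Hypothetical outputs of independent per-term classifiers for one protein.
raw = {"GO:dna_binding": 0.9, "GO:binding": 0.4, "GO:catalytic": 0.2}
consistent = propagate_scores(raw, GO_PARENTS)
```

After propagation, every ancestor of "GO:dna_binding" carries a score of at least 0.9, so thresholding the scores can no longer assign a term without also assigning its ancestors.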
|
25 |
Multi-resolution Visualization Of Large Scale Protein Networks Enriched With Gene Ontology Annotations
Yasar, Sevgi 01 September 2009
Genome-scale protein-protein interactions (PPIs) are interpreted, from the perspective of computer science, as networks or graphs with thousands of nodes. PPI networks represent various types of possible interactions among the proteins or genes of a genome. PPI data are vital in protein function prediction, since the functions of the cell are performed by groups of interacting proteins and the main complexes of the cell are made of interacting proteins.
The recent increase in protein interaction prediction techniques has made a large amount of protein-protein interaction data available for genomes. As a consequence, systematic visualization and analysis techniques have become crucial.
To the best of our knowledge, no PPI visualization tool considers multi-resolution viewing of PPI networks. In this thesis, we implemented a new approach for PPI network visualization which supports multi-resolution viewing of compound graphs. We construct compound nodes and label them using gene set enrichment methods based on Gene Ontology annotations.
This thesis further suggests new methods for PPI network visualization.
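A compound node can be labelled with the GO term most over-represented among its member proteins. The abstract does not say which enrichment test was used, so the sketch below assumes the commonly used one-sided hypergeometric test. The annotation data and the function names `hypergeom_pvalue` and `label_compound_node` are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): n proteins drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def label_compound_node(members, annotations):
    """Return (term, p-value) for the most enriched term among the members."""
    n_background = len(annotations)
    n_members = len(members)
    candidate_terms = set().union(*(annotations[p] for p in members))
    best = None
    for term in sorted(candidate_terms):
        in_background = sum(term in terms for terms in annotations.values())
        in_members = sum(term in annotations[p] for p in members)
        p_value = hypergeom_pvalue(in_members, n_members, in_background, n_background)
        if best is None or p_value < best[1]:
            best = (term, p_value)
    return best


# Hypothetical annotations for an eight-protein network.
ANNOTATIONS = {
    "P1": {"translation", "cytoplasm"},
    "P2": {"translation", "cytoplasm"},
    "P3": {"translation"},
    "P4": {"transport", "cytoplasm"},
    "P5": {"transport", "cytoplasm"},
    "P6": {"transport"},
    "P7": {"cytoplasm"},
    "P8": {"transport"},
}

label, p_value = label_compound_node({"P1", "P2", "P3"}, ANNOTATIONS)
```

Here the compound node containing P1, P2, and P3 is labelled "translation", because all three members carry that term while only three of the eight background proteins do. A real analysis over many terms would also need a multiple-testing correction.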
|
26 |
Mining Microarray Data For Biologically Important Gene Sets
Korkmaz, Gulberal Kircicegi Yoksul 01 March 2012
Microarray technology enables researchers to measure the expression levels of thousands of genes simultaneously in order to understand relationships between genes, extract pathways, and, in general, understand a diverse range of biological processes such as diseases and cell cycles. While microarrays provide a great opportunity to reveal information about biological processes, mining the huge amount of information contained in microarray datasets is a challenging task. Generally, since an accurate model for the data is missing, a clustering algorithm is applied first and the resulting clusters are then examined manually to find genes that are related to the biological process under inspection. We need automated methods for this analysis that can eliminate unrelated genes from the data and mine for biologically important genes. Here, we introduce a general methodology which makes use of traditional clustering algorithms and integrates the two main sources of biological information, Gene Ontology and interaction networks, with microarray data in order to eliminate unrelated information and find a clustering result containing only genes related to a given biological process. We applied our methodology successfully to a number of different cases and to different organisms. We assessed the results with the Gene Set Enrichment Analysis method and showed that our final clusters are highly enriched. We also analyzed the results manually and found that most of the genes in the final clusters are indeed related to the biological process under inspection.
|
28 |
The evolutionary significance of DNA methylation in human genome
Zeng, Jia 13 January 2014
In eukaryotic genomes ranging from plants to mammals, DNA methylation is a major epigenetic modification of DNA in which a methyl group is added exclusively to cytosine residues. In mammalian genomes such as the human genome, these cytosine bases are usually followed by guanine. Although it does not change the primary DNA sequence, this covalent modification plays critical roles in several regulatory processes and can impact gene activity in a heritable fashion. More importantly, DNA methylation is essential for mammalian embryonic development, and aberrant DNA methylation is implicated in several human diseases, in particular neuro-developmental syndromes (such as the fragile X and Rett syndromes) and cancer. This biological significance underlines the importance of understanding the genomic patterns and functional role of DNA methylation in humans, as an initial step toward understanding the epigenotype and how it connects phenotype and genotype.
Two key papers in 1975 independently suggested that methylation of CpG dinucleotides in vertebrates could be established de novo and inherited through somatic cell divisions by the protein machinery of DNA methyltransferases that recognize hemi-methylated CpG palindromes. They also indicated that the methyl group could be recognized by DNA-binding proteins and that DNA methylation directly silences gene expression. After almost four decades, several key points in these foundational papers have proved to be true. In the mammalian genome, for example, several findings indicate epigenetic repression of gene expression by DNA methylation. These include X-chromosome inactivation, gene imprinting, and suppression of the proliferation of transposable elements and repeat elements of viral or retroviral origin. In addition, many novel roles of DNA methylation have been revealed. For example, DNA methylation can regulate alternative splicing by preventing CTCF, an evolutionarily conserved zinc-finger protein, from binding to DNA. Experiments using fluorescence resonance energy transfer (FRET) and fluorescence polarization have also shown that DNA methylation increases nucleosome compaction through DNA-histone contacts. More importantly, DNA methylation is essential for mammalian embryonic development, and aberrant changes in DNA methylation have been related to diseases such as cancer. However, it is also notable that several lines of evidence contradict the relationship between DNA methylation and gene silencing. For example, a comparison of DNA methylation levels on the active and inactive X chromosomes in the human genome showed reduced methylation specifically over gene bodies on inactive X chromosomes. Beyond humans, DNA methylation is usually targeted to the transcription units of actively transcribed genes in invertebrate species. These results show that the function of DNA methylation is challenging to unravel.
In addition, thanks to advances in sequencing techniques, whole-genome DNA methylation profiles have been obtained for diverse species. Comparison of genomic patterns of DNA methylation shows considerable variation among taxa, especially between vertebrates and invertebrates. However, even as extensive studies reveal the patterns and functions of DNA methylation in different species, they also highlight the limits of our understanding of this complex epigenetic system.
During my Ph.D., in order to understand the complexity of DNA methylation and its functions, I investigated and analyzed DNA methylation profiles in diverse species, ranging from insects to primates and including both model and non-model organisms. This dissertation, which constitutes an important part of my research, mainly focuses on the DNA methylation profiles of primates, including human and chimpanzee. I use three chapters to present my work in generating and interpreting whole-genome DNA methylation data. First, we generated nucleotide-resolution whole-genome methylation maps of the prefrontal cortex of multiple humans and chimpanzees, and then performed comprehensive comparative studies of these DNA methylation maps, integrating data on gene expression as well. This work demonstrates that differential DNA methylation might be an important molecular mechanism driving gene-expression divergence between human and chimpanzee brains, and might also contribute to human-specific traits, such as the evolution of disease vulnerabilities. Second, we performed global analyses of CpG island (CGI) methylation across multiple human methylomes of distinct cellular origins. The results show that human CpG islands can be classified into distinct clusters based solely on their DNA methylation profiles, and that these CpG island clusters reflect their distinctive nature at many biological levels, including both genomic characteristics and evolutionary features. Moreover, these CpG island clusters are non-randomly associated with several important biological phenomena and processes such as diseases, aging, and gene imprinting. These new findings shed light on the regulatory mechanisms of CpG islands in human health and disease.
Finally, using the DNA methylome from human sperm and the genetic map generated by the International HapMap Consortium project, we investigated the hypothesis that germ line DNA methylation has a potential role in affecting meiotic recombination, which is essential for successful meiosis and various evolutionary processes. Even though the results imply that DNA methylation is an important factor affecting regional recombination rate, the correlation between the two is not as strong as previously reported. In addition, high-throughput analyses indicate that other epigenetic modifications, tri-methylation of histone 3 lysine 4 and histone 3 lysine 27, are also global features at recombination hotspots and may interact with methylation to affect the recombination pattern. This work suggests that epigenetic mechanisms are additional factors affecting recombination, which cannot be fully explained by the DNA sequence itself. In summary, I hope the results of this work can expand our knowledge of the common and variable patterns of DNA methylation in different taxa, and shed light on its functional role and its major changes during animal evolution.
|
30 |
Biologically plausible visual representation of modular decomposition
Rahm, Jonas January 2005
Modular decompositions of protein interaction networks can be used to identify modules of cooperating proteins. The biological plausibility of these modules might be questioned, though. This report describes how a modular decomposition can be supplemented with semantic information in the visual representation. Possible methods for creating modules of functionally related proteins are also proposed in this work. The results show that such modules can be advantageously combined with modules from a graph decomposition to find proteins that are likely to cooperate to perform certain functions in organisms.
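For readers unfamiliar with the graph-theoretic notion behind modular decomposition: a set of vertices is a module when every vertex outside the set is adjacent either to all of its members or to none of them. The sketch below checks that property on a small hypothetical interaction network. The graph and the function name `is_module` are illustrative and are not taken from the report.

```python
# Hypothetical undirected interaction network: protein -> set of partners.
NETWORK = {
    "A": {"C"},
    "B": {"C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def is_module(graph, candidate):
    """Check that every outside vertex sees all of the candidate set or none of it."""
    candidate = set(candidate)
    for outsider in set(graph) - candidate:
        linked = candidate & graph[outsider]
        if linked and linked != candidate:
            # The outsider distinguishes between members, so this is not a module.
            return False
    return True
```

In this toy network, A and B have identical partners, so `is_module(NETWORK, {"A", "B"})` holds, whereas `{"A", "C"}` is not a module because B interacts with C but not with A.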
|