Spelling suggestions: "subject:"biolological database"" "subject:"bybiological database""
1 |
Systematic approaches to mine, predict and visualize biological functionsChang, Yi-Chien 12 February 2016 (has links)
With advances in high-throughput technologies and next-generation sequencing, the amount of genomic and proteomic data is dramatically increasing in the post-genomic era. One of the biggest challenges that has arisen is the connection of sequences to their activities and the understanding of their cellular functions and interactions. In this dissertation, I present three different strategies for mining, predicting and visualizing biological functions.
In the first part, I present the COMputational Bridges to Experiments (COMBREX) project, which facilitates the functional annotation of microbial proteins by leveraging the power of scientific community. The goal is to bring computational biologists and biochemists together to expand our knowledge. A database-driven web portal has been built to serve as a hub for the community. Predicted annotations will be deposited into the database and the recommendation system will guide biologists to the predictions whose experimental validation will be more beneficial to our knowledge of microbial proteins. In addition, by taking advantage of the rich content, we develop a web service to help community members enrich their genome annotations.
In the second part, I focus on identifying the genes for enzyme activities that lack genetic details in the major biological databases. Protein sequences are unknown for about one-third of the characterized enzyme activities listed in the EC system, the so-called orphan enzymes. Our approach considers the similarities between enzyme activities, enabling us to deal with broad types of orphan enzymes in eukaryotes. I apply our framework to human orphan enzymes and show that we can successfully fill the knowledge gaps in the human metabolic network.
In the last part, I construct a platform for visually analyzing the eco-system level metabolic network. Most microbes live in a multiple-species environment. The underlying nutrient exchange can be seen as a dynamic eco-system level metabolic network. The complexity of the network poses new visualization challenges. Using the data predicted by Computation Of Microbial Ecosystems in Time and Space (COMETS), I demonstrate that our platform is a powerful tool for investigating the interactions of the microbial community. We apply it to the exploration of a simulated microbial eco-system in the human gut. The result reflects both known knowledge and novel mutualistic interactions, such as the nutrients exchanges between E. coli, C. difficile and L. acidophilus.
|
2 |
SubtiWiki 3.0: A relational database for the functional genome annotation of the model organism Bacillus subtilisZhu, Bingyao 11 January 2018 (has links)
No description available.
|
3 |
Dotazovací jazyk pro databáze biologických dat / Query Language for Biological DatabasesBahurek, Tomáš January 2015 (has links)
With rising amount of biological data, biological databases are becoming more important each day. Knowledge discovery (identification of connections that were unknown at the time of data entry) is an essential aspect of these databases. To gain knowledge from these databases one has to construct complicated SQL queries, which requires advanced knowledge of SQL language and used database schema. Biologists usually don't have this knowledge, which creates need for tool, that would offer more intuitive interface for querying biological databases. This work proposes ChQL, an intuitive query language for biological database Chado. ChQL allows biologists to assemble query using terms they are familiar without knowledge of SQL language or Chado database schema. This work implements application for querying Chado database using ChQL. Web interface guides user through process of assembling sentence in ChQL. Application translates this sentence to SQL query, sends it to Chado database and displays returned data in table. Results are evaluated by testing queries on real data.
|
4 |
Biological and clinical data integration and its applications in healthcareHagen, Matthew 07 January 2016 (has links)
Answers to the most complex biological questions are rarely determined solely from the experimental evidence. It requires subsequent analysis of many data sources that are often heterogeneous. Most biological data repositories focus on providing only one particular type of data, such as sequences, molecular interactions, protein structure, or gene expression. In many cases, it is required for researchers to visit several different databases to answer one scientific question. It is essential to develop strategies to integrate disparate biological data sources that are efficient and seamless to facilitate the discovery of novel associations and validate existing hypotheses.
This thesis presents the design and development of different integration strategies of biological and clinical systems.
The BioSPIDA system is a data warehousing solution that integrates
many NCBI databases and other biological sources on protein sequences,
protein domains, and biological pathways. It utilizes a universal
parser facilitating integration without developing separate source
code for each data site. This enables users to execute fine-grained
queries that can filter genes by their protein interactions, gene
expressions, functional annotation, and protein domain
representation. Relational databases can powerfully return and
generate quickly filtered results to research questions, but they are not the most suitable solution in all cases. Clinical patients and genes are typically annotated by concepts in hierarchical ontologies and performance of relational databases are weakened considerably when traversing and representing graph structures. This thesis illustrates when relational databases are most suitable as well as comparing the performance benchmarks of semantic web technologies and graph databases when comparing ontological concepts.
Several approaches of analyzing integrated data will be discussed to demonstrate the advantages over dependencies on remote data centers. Intensive Care Patients are prioritized by their length of stay and their severity class is estimated by their diagnosis to help minimize wait time and preferentially treat patients by their condition. In a separate study, semantic clustering of patients is conducted by integrating a clinical database and a medical ontology to help identify multi-morbidity patterns.
In the biological area, gene pathways, protein interaction networks, and functional annotation are integrated to help predict and prioritize candidate disease genes. This thesis will present the results that were able to be generated from each project through utilizing a local repository of genes, functional annotations, protein interactions, clinical patients, and medical ontologies.
|
5 |
Leishmania braziliensis: reanota??o estrutura??o das informa??es em modelos de dados relacionaisTorres, Felipe Guimar?es 18 December 2015 (has links)
Submitted by Luis Ricardo Andrade da Silva (lrasilva@uefs.br) on 2016-03-01T23:07:05Z
No. of bitstreams: 1
Dissertacao_Filipe Torres.pdf: 3327655 bytes, checksum: a687ff94c9854a5e290b5d00055299e0 (MD5) / Made available in DSpace on 2016-03-01T23:07:05Z (GMT). No. of bitstreams: 1
Dissertacao_Filipe Torres.pdf: 3327655 bytes, checksum: a687ff94c9854a5e290b5d00055299e0 (MD5)
Previous issue date: 2015-12-18 / Funda??o de Amparo ? Pesquisa do Estado da Bahia - FAPEB / Cutaneous leishmaniasis affects 12 million people around the world. The main etiological agent in Brazil is Leishmania braziliensis. Its annotation is not very well characterized and have inaccurate regions because of the large number of hypothetical and putative genes. To solve this problem, we annotated the Leishmania braziliensis genome (MHOM / BR / 75 / M2904) with new algorithm and curated database by similarity. Initially, we downloaded these sequences of Leishmania braziliensis from NCBI and used GENSCAN and GLIMMER algorithms. We compared the predicted genes with SWISSPROT protein database. The comparison process was made by BLASTx and SWISS-PROT. We also used StructRNAFinder to predict the non-coding RNA, ncrna. We identified 25,539 ORF's, 4,916 genes, 735 ncRNA and 863 proteins. 94.75\% of predicted genes from Leishmania braziliensis were present on Leishmania panamensis genome (MHOM/PA/94/PSC-1). The comparison of ncrna number and proteins highlights a relation between chromosome and ncrna. All genes found were mapped and aligned to the Leishmania braziliensis genome and stored in a relational database model built in PHP 5.0 and MySQL. All data resulting of analyses in this work is available at www.leishdb.com. / A leishmaniose cut?nea afeta cerca 12 milh?es de pessoas ao redor do mundo. O principal agente etiol?gico dessa patologia no Brasil ? a Leishmania braziliensis. A anota??o dela n?o est? bem caracterizada e possui regi?es sem uma grande certeza devido ao grande n?mero de genes hipot?ticos e putativos. Para resolver esse problema,n?s reanotamos o genoma da Leishmania braziliensis (MHOM/BR/75/M2904) com novas vers?es de algoritmos e uma base de dados curada. Inicialmente, n?s baixamos e predizemos os genes com uma base de prote?nas. A compara??o foi feita porBLASTx e o banco de dados SWISS-PROT. Tamb?m utilizamos o preditor StructRNA Finder para predi??o de RNA?s n?o codificantes. Como resultado desse trabalho identificamos 25539 ORF?s, 4916 genes, 735 ncRNA e 863 prote?nas. Cerca de 94.75% genes preditos tamb?m estavam presentes em anota??o da Leishmania panamensis (MHOM/PA/94/PSC-1). Comparando o n?mero de rna n?o codificante e prote?nas foi evidenciado a presen?a de rela??es entre cromossomos e ncrna. Todos os genes encontrados foram mapeados no genoma da Leishmania braziliensis e armazenados em um modelo de banco de dados relacional constru?do em MySQL e PHP 5.0. Todos os dados resultantes desse trabalho est?o dispon?veis.
|
6 |
Identification de nouveaux substrats des kinases Erk1/2 par une approche bio-informatique, pharmacologique et phosphoprotéomiqueCourcelles, Mathieu 12 1900 (has links)
La phosphorylation est une modification post-traductionnelle omniprésente des protéines Cette modification est ajoutée et enlevée par l’activité enzymatique respective des protéines kinases et phosphatases. Les kinases Erk1/2 sont au cœur d’une voie de signalisation importante qui régule l’activité de protéines impliquées dans la traduction, le cycle cellulaire, le réarrangement du cytosquelette et la transcription. Ces kinases sont aussi impliquées dans le développement de l’organisme, le métabolisme du glucose, la réponse immunitaire et la mémoire. Différentes pathologies humaines comme le diabète, les maladies cardiovasculaires et principalement le cancer, sont associées à une perturbation de la phosphorylation sur les différents acteurs de cette voie. Considérant l’importance biologique et clinique de ces deux kinases, connaître l’étendue de leur activité enzymatique pourrait mener au développement de nouvelles thérapies pharmacologiques.
Dans ce contexte, l’objectif principal de cette thèse était de mesurer l’influence de cette voie sur le phosphoprotéome et de découvrir de nouveaux substrats des kinases Erk1/2. Une étude phosphoprotéomique de cinétique d’inhibition pharmacologique de la voie de signalisation Erk1/2 a alors été entreprise. Le succès de cette étude était basé sur trois technologies clés, soit l’enrichissement des phosphopeptides avec le dioxyde de titane, la spectrométrie de masse haut débit et haute résolution, et le développement d’une plateforme bio-informatique nommée ProteoConnections. Cette plateforme permet d’organiser les données de protéomique, évaluer leur qualité, indiquer les changements d’abondance et accélérer l’interprétation des données. Une fonctionnalité distinctive de ProteoConnections est l’annotation des sites phosphorylés identifiés (kinases, domaines, structures, conservation, interactions protéiques phospho-dépendantes). Ces informations ont été essentielles à l’analyse des 9615 sites phosphorylés sur les 2108 protéines identifiées dans cette étude, soit le plus large ensemble rapporté chez le rat jusqu’à ce jour. L’analyse des domaines protéiques a révélé que les domaines impliqués dans les interactions avec les protéines, les acides nucléiques et les autres molécules sont les plus fréquemment phosphorylés et que les sites sont stratégiquement localisés pour affecter les interactions.
Un algorithme a été implémenté pour trouver les substrats potentiels des kinases Erk1/2 à partir des sites identifiés selon leur motif de phosphorylation, leur cinétique de stimulation au sérum et l’inhibition pharmacologique de Mek1/2. Une liste de 157 substrats potentiels des kinases Erk1/2 a ainsi été obtenue. Parmi les substrats identifiés, douze ont déjà été rapportés et plusieurs autres ont des fonctions associées aux substrats déjà connus. Six substrats (Ddx47, Hmg20a, Junb, Map2k2, Numa1, Rras2) ont été confirmés par un essai kinase in vitro avec Erk1. Nos expériences d’immunofluorescence ont démontré que la phosphorylation de Hmg20a sur la sérine 105 par Erk1/2 affecte la localisation nucléocytoplasmique de cette protéine.
Finalement, les phosphopeptides isomériques positionnels, soit des peptides avec la même séquence d’acides aminés mais phosphorylés à différentes positions, ont été étudiés avec deux nouveaux algorithmes. Cette étude a permis de déterminer leur fréquence dans un extrait enrichi en phosphopeptides et d’évaluer leur séparation par chromatographie liquide en phase inverse. Une stratégie analytique employant un des algorithmes a été développée pour réaliser une analyse de spectrométrie de masse ciblée afin de découvrir les isomères ayant été manqués par la méthode d’analyse conventionnelle. / Phosphorylation is an omnipresent post-translational modification of proteins that regulates numerous cellular processes. This modification is controlled by the enzymatic activity of protein kinases and phosphatases. Erk1/2 kinases are central to an important signaling pathway that modulates translation, cell cycle, cytoskeleton rearrangement and transcription. They are also implicated in organism development, glucose metabolism, immune response and memory. Different human pathologies such as diabetes, cardiovascular diseases, and most importantly cancer, are associated with misregulation or mutations in members of this pathway. Considering the biological and clinical importance of those two kinases, discovering the extent of their enzymatic activity could favor the development of new pharmacological therapies.
In this context, the principal objective of this thesis was to measure the influence of this pathway on the phosphoproteome and to discover new substrates of the Erk1/2 kinases. A phosphoproteomics study on the pharmacological inhibition kinetics of the Erk1/2 signaling pathway was initiated. The success of this study was based on three key technologies such as phosphopeptides enrichment with titanium dioxide, high-throughput and high-resolution mass spectrometry, and the development of ProteoConnections, a bioinformatics analysis platform. This platform is dedicated to organize proteomics data, evaluate data quality, report changes of abundance and accelerate data interpretation. A distinctive functionality of ProteoConnections is the annotation of phosphorylated sites (kinases, domains, structures, conservation, phospho-dependant protein interactions, etc.). This information was essential for the dataset analysis of 9615 phosphorylated sites identified on 2108 proteins during the study, which is, until now, the largest one reported for rat. Protein domain analysis revealed that domains implicated in proteins, nucleic acids and other molecules binding were the most frequently phosphorylated and that these sites are strategically located to affect the interactions.
An algorithm was implemented to find Erk1/2 kinases potential substrates of identified sites using their phosphorylation motif, serum stimulation and Mek1/2 inhibition kinetic profile. A list of 157 potential Erk1/2 substrates was obtained. Twelve of them were previously reported and many more have functions associated to known substrates. Six substrates (Ddx47, Hmg20a, Junb, Map2k2, Numa1, and Rras2) were confirmed by in vitro kinase assays with Erk1. Our immunofluorescence experiments demonstrated that the phosphorylation of Hmg20a on serine 105 by Erk1/2 affects the nucleocytoplasmic localization of this protein.
Finally, phosphopeptides positional isomers, peptides with the same amino acids sequence but phosphorylated at different positions, were studied with two new algorithms. This study allowed us to determine their frequency in an enriched phosphopeptide extract and to evaluate their separation by reverse-phase liquid chromatography. An analytical strategy that uses one of the algorithms was developed to do a targeted mass spectrometry analysis to discover the isomers that had been missed by the conventional method.
|
7 |
Identification de nouveaux substrats des kinases Erk1/2 par une approche bio-informatique, pharmacologique et phosphoprotéomiqueCourcelles, Mathieu 12 1900 (has links)
La phosphorylation est une modification post-traductionnelle omniprésente des protéines Cette modification est ajoutée et enlevée par l’activité enzymatique respective des protéines kinases et phosphatases. Les kinases Erk1/2 sont au cœur d’une voie de signalisation importante qui régule l’activité de protéines impliquées dans la traduction, le cycle cellulaire, le réarrangement du cytosquelette et la transcription. Ces kinases sont aussi impliquées dans le développement de l’organisme, le métabolisme du glucose, la réponse immunitaire et la mémoire. Différentes pathologies humaines comme le diabète, les maladies cardiovasculaires et principalement le cancer, sont associées à une perturbation de la phosphorylation sur les différents acteurs de cette voie. Considérant l’importance biologique et clinique de ces deux kinases, connaître l’étendue de leur activité enzymatique pourrait mener au développement de nouvelles thérapies pharmacologiques.
Dans ce contexte, l’objectif principal de cette thèse était de mesurer l’influence de cette voie sur le phosphoprotéome et de découvrir de nouveaux substrats des kinases Erk1/2. Une étude phosphoprotéomique de cinétique d’inhibition pharmacologique de la voie de signalisation Erk1/2 a alors été entreprise. Le succès de cette étude était basé sur trois technologies clés, soit l’enrichissement des phosphopeptides avec le dioxyde de titane, la spectrométrie de masse haut débit et haute résolution, et le développement d’une plateforme bio-informatique nommée ProteoConnections. Cette plateforme permet d’organiser les données de protéomique, évaluer leur qualité, indiquer les changements d’abondance et accélérer l’interprétation des données. Une fonctionnalité distinctive de ProteoConnections est l’annotation des sites phosphorylés identifiés (kinases, domaines, structures, conservation, interactions protéiques phospho-dépendantes). Ces informations ont été essentielles à l’analyse des 9615 sites phosphorylés sur les 2108 protéines identifiées dans cette étude, soit le plus large ensemble rapporté chez le rat jusqu’à ce jour. L’analyse des domaines protéiques a révélé que les domaines impliqués dans les interactions avec les protéines, les acides nucléiques et les autres molécules sont les plus fréquemment phosphorylés et que les sites sont stratégiquement localisés pour affecter les interactions.
Un algorithme a été implémenté pour trouver les substrats potentiels des kinases Erk1/2 à partir des sites identifiés selon leur motif de phosphorylation, leur cinétique de stimulation au sérum et l’inhibition pharmacologique de Mek1/2. Une liste de 157 substrats potentiels des kinases Erk1/2 a ainsi été obtenue. Parmi les substrats identifiés, douze ont déjà été rapportés et plusieurs autres ont des fonctions associées aux substrats déjà connus. Six substrats (Ddx47, Hmg20a, Junb, Map2k2, Numa1, Rras2) ont été confirmés par un essai kinase in vitro avec Erk1. Nos expériences d’immunofluorescence ont démontré que la phosphorylation de Hmg20a sur la sérine 105 par Erk1/2 affecte la localisation nucléocytoplasmique de cette protéine.
Finalement, les phosphopeptides isomériques positionnels, soit des peptides avec la même séquence d’acides aminés mais phosphorylés à différentes positions, ont été étudiés avec deux nouveaux algorithmes. Cette étude a permis de déterminer leur fréquence dans un extrait enrichi en phosphopeptides et d’évaluer leur séparation par chromatographie liquide en phase inverse. Une stratégie analytique employant un des algorithmes a été développée pour réaliser une analyse de spectrométrie de masse ciblée afin de découvrir les isomères ayant été manqués par la méthode d’analyse conventionnelle. / Phosphorylation is an omnipresent post-translational modification of proteins that regulates numerous cellular processes. This modification is controlled by the enzymatic activity of protein kinases and phosphatases. Erk1/2 kinases are central to an important signaling pathway that modulates translation, cell cycle, cytoskeleton rearrangement and transcription. They are also implicated in organism development, glucose metabolism, immune response and memory. Different human pathologies such as diabetes, cardiovascular diseases, and most importantly cancer, are associated with misregulation or mutations in members of this pathway. Considering the biological and clinical importance of those two kinases, discovering the extent of their enzymatic activity could favor the development of new pharmacological therapies.
In this context, the principal objective of this thesis was to measure the influence of this pathway on the phosphoproteome and to discover new substrates of the Erk1/2 kinases. A phosphoproteomics study on the pharmacological inhibition kinetics of the Erk1/2 signaling pathway was initiated. The success of this study was based on three key technologies such as phosphopeptides enrichment with titanium dioxide, high-throughput and high-resolution mass spectrometry, and the development of ProteoConnections, a bioinformatics analysis platform. This platform is dedicated to organize proteomics data, evaluate data quality, report changes of abundance and accelerate data interpretation. A distinctive functionality of ProteoConnections is the annotation of phosphorylated sites (kinases, domains, structures, conservation, phospho-dependant protein interactions, etc.). This information was essential for the dataset analysis of 9615 phosphorylated sites identified on 2108 proteins during the study, which is, until now, the largest one reported for rat. Protein domain analysis revealed that domains implicated in proteins, nucleic acids and other molecules binding were the most frequently phosphorylated and that these sites are strategically located to affect the interactions.
An algorithm was implemented to find Erk1/2 kinases potential substrates of identified sites using their phosphorylation motif, serum stimulation and Mek1/2 inhibition kinetic profile. A list of 157 potential Erk1/2 substrates was obtained. Twelve of them were previously reported and many more have functions associated to known substrates. Six substrates (Ddx47, Hmg20a, Junb, Map2k2, Numa1, and Rras2) were confirmed by in vitro kinase assays with Erk1. Our immunofluorescence experiments demonstrated that the phosphorylation of Hmg20a on serine 105 by Erk1/2 affects the nucleocytoplasmic localization of this protein.
Finally, phosphopeptides positional isomers, peptides with the same amino acids sequence but phosphorylated at different positions, were studied with two new algorithms. This study allowed us to determine their frequency in an enriched phosphopeptide extract and to evaluate their separation by reverse-phase liquid chromatography. An analytical strategy that uses one of the algorithms was developed to do a targeted mass spectrometry analysis to discover the isomers that had been missed by the conventional method.
|
8 |
MIDB : um modelo de integração de dados biológicosPerlin, Caroline Beatriz 29 February 2012 (has links)
Made available in DSpace on 2016-06-02T19:05:56Z (GMT). No. of bitstreams: 1
4370.pdf: 1089392 bytes, checksum: 82daa0e51d37184f8864bd92d9342dde (MD5)
Previous issue date: 2012-02-29 / In bioinformatics, there is a huge volume of data related to biomolecules and to nucleotide and amino acid sequences that reside (in almost their totality) in several Biological Data Bases (BDBs). For a specific sequence, there are some informational classifications: genomic data, evolution-data, structural data, and others. Some BDBs store just one or some of these classifications. Those BDBs are hosted in different sites and servers, with several data base management systems with different data models. Besides, instances and schema might have semantic heterogeneity. In such scenario, the objective of this project is to propose a biological data integration model, that adopts new schema integration and instance integration techniques. The proposed integration model has a special mechanism of schema integration and another mechanism that performs the instance integration (with support of a dictionary) allowing conflict resolution in the attribute values; and a Clustering Algorithm is used in order to cluster similar entities. Besides, a domain specialist participates managing those clusters. The proposed model was validated through a study case focusing on schema and instance integration about nucleotide sequence data from organisms of Actinomyces gender, captured from four different data sources. The result is that about 97.91% of the attributes were correctly categorized in the schema integration, and the instance integration was able to identify that about 50% of the clusters created need support from a specialist, avoiding errors on the instance resolution. Besides, some contributions are presented, as the Attributes Categorization, the Clustering Algorithm, the distance functions proposed and the proposed model itself. / Na bioinformática, existe um imenso volume de dados sendo produzidos, os quais estão relacionados a sequências de nucleotídeos e aminoácidos que se encontram, em quase a sua totalidade, armazenados em Bancos de Dados Biológicos (BDBs). Para uma determinada sequência existem algumas classificações de informação: dados genômicos, dados evolutivos, dados estruturais, dentre outros. Existem BDBs que armazenam somente uma ou algumas dessas classificações. Tais BDBs estão hospedados em diferentes sites e servidores, com sistemas gerenciadores de banco de dados distintos e com uso de diferentes modelos de dados, além de terem instâncias e esquemas com heterogeneidade semântica. Dentro desse contexto, o objetivo deste projeto de mestrado é propor um Modelo de Integração de Dados Biológicos, com novas técnicas de integração de esquemas e integração de instâncias. O modelo de integração proposto possui um mecanismo especial de integração de esquemas, e outro mecanismo que realiza a integração de instâncias de dados (com um dicionário acoplado) permitindo resolução de conflitos nos valores dos atributos; e um Algoritmo de Clusterização é utilizado, com o objetivo de realizar o agrupamento de entidades similares. Além disso, o especialista de domínio participa do gerenciamento desses agrupamentos. Esse modelo foi validado por meio de um estudo de caso com ênfase na integração de esquemas e integração de instâncias com dados de sequências de nucleotídeos de genes de organismos do gênero Actinomyces, provenientes de quatro diferentes fontes de dados. Como resultado, obteve-se que aproximadamente 97,91% dos atributos foram categorizados corretamente na integração de esquemas e a integração de instâncias conseguiu identificar que aproximadamente 50% dos clusters gerados precisam de tratamento do especialista, evitando erros de resolução de entidades. Além disso, algumas contribuições são apresentadas, como por exemplo a Categorização de Atributos, o Algoritmo de Clusterização, as funções de distância propostas e o modelo MIDB em si.
|
Page generated in 0.0489 seconds