1 |
Characterization of protein families, sequence patterns, and functional annotations in large data sets
Bresell, Anders January 2008
Bioinformatics involves storing, analyzing and making predictions on massive amounts of protein and nucleotide sequence data. The thesis consists of six papers and is focused on proteins. It describes the utilization of bioinformatics techniques to characterize protein families and to detect patterns in gene expression and in polypeptide occurrences. Two protein families were bioinformatically characterized: the membrane-associated proteins in eicosanoid and glutathione metabolism (MAPEG) and the tripartite motif (TRIM) protein families. In the study of the MAPEG super-family, application of different bioinformatic methods made it possible to characterize many new members, leading to a doubling of the family size. Furthermore, the MAPEG members were subdivided into families. Remarkably, in six families with previously predominantly mammalian members, fish representatives were also now detected, which dated the origin of these families back to the Cambrian "species explosion", thus earlier than previously anticipated. Sequence comparisons made it possible to define diagnostic sequence patterns that can be used in genome annotations. Upon publication of several MAPEG structures, these patterns were confirmed to be part of the active sites. In the TRIM study, the bioinformatic analyses made it possible to subdivide the proteins into three subtypes and to characterize a large number of members. In addition, the analyses showed crucial structural dependencies between the RING and the B-box domains of the TRIM member Ro52. The linker region between the two domains, denoted RBL, is known to be disease associated. Now, an amphipathic helix was found to be a characteristic feature of the RBL region, which was also used to divide the family into three subtypes. The ontology annotation treebrowser (OAT) tool was developed to detect functional similarities or common concepts in long lists of proteins or genes, typically generated from proteomics or microarray experiments.
OAT was the first annotation browser to include both Gene Ontology (GO) and Medical Subject Headings (MeSH) into the same framework. The complementarity of these two ontologies was demonstrated. OAT was used in the TRIM study to detect differences in functional annotations between the subtypes. In the oligopeptide study, we investigated pentapeptide patterns that were over- or under-represented in the current de facto standard database of protein knowledge and a set of completed genomes, compared to what could be expected from amino acid compositions. We found three predominant categories of patterns: (i) patterns originating from frequently occurring families, e.g. respiratory chain-associated proteins and translation machinery proteins; (ii) proteins with structurally and/or functionally favored patterns; (iii) multicopy species-specific retrotransposons, only found in the genome set. Such patterns may influence amino acid residue based prediction algorithms. These findings in the oligopeptide study were utilized for development of a new method that detects translated introns in unverified protein predictions, which are available in great numbers due to the many completed and ongoing genome projects. A new comprehensive database of protein sequences from completed genomes was developed, denoted genomeLKPG. This database was of central importance in the MAPEG, TRIM and oligopeptide studies. The new sequence database has also been proven useful in several other studies.
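The comparison of observed pentapeptide counts with counts expected from amino acid composition can be illustrated with a minimal sketch. It assumes a simple null model in which residues occur independently at their overall frequencies; the abstract does not state the statistical model actually used, and the function names below are hypothetical.

```python
from collections import Counter
from math import prod


def kmer_counts(sequences, k=5):
    """Count every overlapping k-mer (pentapeptide for k=5) in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def residue_frequencies(sequences):
    """Overall amino acid composition of the whole sequence set."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def observed_expected_ratios(sequences, k=5):
    """Ratio of observed to expected k-mer counts.

    Assumption: expected count = number of k-mer windows multiplied by the
    product of the single-residue frequencies (independent residues).
    Ratios above 1 suggest over-representation, below 1 under-representation.
    """
    observed = kmer_counts(sequences, k)
    freqs = residue_frequencies(sequences)
    n_windows = sum(observed.values())
    ratios = {}
    for kmer, obs in observed.items():
        expected = n_windows * prod(freqs[residue] for residue in kmer)
        ratios[kmer] = obs / expected
    return ratios


if __name__ == "__main__":
    toy_proteins = ["ACACACACAC", "MKVLATMKVLAT"]  # toy data, not real proteins
    ranked = sorted(observed_expected_ratios(toy_proteins).items(),
                    key=lambda item: item[1], reverse=True)
    for kmer, ratio in ranked[:3]:
        print(kmer, round(ratio, 2))
```

This sketch only scores patterns that occur at least once; detecting under-represented patterns that are entirely absent would require enumerating all possible pentapeptides, and a real analysis would also need a significance test.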
|
2 |
Evolution and diversification of secreted protein effectors in the order Legionellales
Ammunet, Tea January 2018
The evolution of a large, diverse group of intracellular bacteria was previously very difficult to study. Recent advancements in both metagenomic methods and bioinformatics have made it possible. This thesis investigates the evolution of the order Legionellales. The study concentrates on a group of proteins essential for pathogenesis and host manipulation in the order, called effector proteins. The role of effectors in host adaptation, evolutionary history and the diversification of the order were investigated using a multitude of bioinformatics methods. First, the abundance and distribution of the known effector proteins in the order was found to cover newly discovered clades. There was a clear distinction between the proteins present in Legionellales and the outgroup, indicating the important role of the effectors in the order. Further, the effectors with known functions found in the new clades, particularly in Berkiella, revealed potential modes of host manipulation of this group. Secondly, the evolution of the effector gene content in the order shed light on the evolution of the order, as well as on the potential evolutionary differences between Legionellaceae and Coxiellaceae. In general, most of the effectors were gained early in the last common ancestor of Legionellales and Legionellaceae, as further indication of their role in the diversification of the order. New effector genes were acquired in the Legionellaceae even up to recent speciation events, whereas Coxiellaceae have lost more protein-coding genes with time. These differences may be due to horizontal gene transfer in the case of gene gains in Legionellaceae and loss of selection in the case of gene losses in Coxiellaceae. Third, the early evolution of core gained effector proteins for the order was studied. Two of the eight investigated core effectors seem to have a connection to eukaryotes, the rest to other bacteria, indicating both inter-domain and within-bacteria horizontal gene transfer.
In particular, one effector protein with a eukaryotic motif, gained at the last common ancestor of Legionellales, was found in all the clades and is therefore an important evolutionary link that may have allowed Legionellales to utilize eukaryotic hosts.
|
3 |
The Gonium pectorale genome demonstrates co-option of cell cycle regulation during the evolution of multicellularity
Hanschen, Erik R., Marriage, Tara N., Ferris, Patrick J., Hamaji, Takashi, Toyoda, Atsushi, Fujiyama, Asao, Neme, Rafik, Noguchi, Hideki, Minakuchi, Yohei, Suzuki, Masahiro, Kawai-Toyooka, Hiroko, Smith, David R., Sparks, Halle, Anderson, Jaden, Bakarić, Robert, Luria, Victor, Karger, Amir, Kirschner, Marc W., Durand, Pierre M., Michod, Richard E., Nozaki, Hisayoshi, Olson, Bradley J. S. C. 22 April 2016
The transition to multicellularity has occurred numerous times in all domains of life, yet its initial steps are poorly understood. The volvocine green algae are a tractable system for understanding the genetic basis of multicellularity including the initial formation of cooperative cell groups. Here we report the genome sequence of the undifferentiated colonial alga, Gonium pectorale, where group formation evolved by co-option of the retinoblastoma cell cycle regulatory pathway. Significantly, expression of the Gonium retinoblastoma cell cycle regulator in unicellular Chlamydomonas causes it to become colonial. The presence of these changes in undifferentiated Gonium indicates extensive group-level adaptation during the initial step in the evolution of multicellularity. These results emphasize an early and formative step in the evolution of multicellularity, the evolution of cell cycle regulation, one that may shed light on the evolutionary history of other multicellular innovations and evolutionary transitions.
|
4 |
Uma abordagem integrada para a construção e utilização de HMMs de perfil para análises genômicas e metagenômicas / An integrated approach for the construction and application of profile HMMs for genomic and metagenomic analyses
Kashiwabara, Liliane Santana Oliveira 02 August 2019
Profile HMMs are a powerful way of modeling sequence diversity and constitute a very sensitive approach to detect remote orthologs. A potential application of such models is the detection of emerging viruses and novel mobile genetic elements. Our group has recently developed GenSeed-HMM, a tool that employs profile HMMs as seeds for gene-targeted progressive assembly using either genomic or metagenomic data. In this work we developed TABAJARA, a program for the rational design of profile HMMs. Starting from a multiple sequence alignment, TABAJARA is able to find blocks that are either (1) conserved across all sequences or (2) discriminative for two or more specific groups of sequences. The program uses different metrics to ascribe position-specific scores along the whole alignment and then uses a sliding window to find top-scoring regions. Selected alignment blocks are then extracted and used to build profile HMMs. To validate the method, we employed TABAJARA to construct models for viruses of the Flavivirus genus and phages of the Microviridae family. In both viral groups we were able to obtain wide-range models, able to detect all members of the respective taxonomic group, and models that are specific to particular Flavivirus species (e.g. DENV, ZIKV or YFV) or Microviridae subfamilies (e.g. Alpavirinae, Gokushovirinae and Pichovirinae).
In another validation, we used sequences of the endonuclease Cas1 to obtain models capable of differentiating CRISPRs from casposons, the latter elements representing a superfamily of self-synthesizing DNA transposons that gave rise to prokaryotic CRISPR-Cas immunity. TABAJARA succeeded in generating models specific to casposon-derived Cas1, enabling their differentiation from CRISPR orthologs. We also developed HMM-Prospector, a tool that can use a batch of profile HMMs to screen genomic or metagenomic sequencing data, reporting which profile HMMs are most recognized by the reads under user-defined score cutoff values, and how many reads are detected by each model. With this information, the most relevant models can be used as seeds in progressive assemblies with the GenSeed-HMM program, providing an integrated approach for model construction and application. Finally, we developed e-Finder, a generic application for detecting and extracting multigene elements from assembled genomes or metagenomes using profile HMMs. e-Finder runs similarity searches of profile HMMs against translated sequences of the assembled data and then checks if pre-defined syntenic criteria have been fulfilled, including minimum number of genes, gene order and intergenic distances. Element sequences are then extracted, their ORFs identified and conceptually translated into full-length protein sequences. To validate the tool, we employed two distinct case studies, prophages of the Microviridae family and casposons, using specific profile HMMs constructed by TABAJARA. In both cases, we executed e-Finder using the PATRIC database, a repository with over 135,000 bacterial and archaeal genomes. We identified in total 91 casposon-positive contigs from 79 distinct genomes. In the case of Microviridae, we found a total of 104 prophage candidates, extending the known range of bacterial hosts. In both cases, phylogenetic analyses confirmed the correct taxonomic assignment of the positive sequences.
The programs developed in this work can be used alone or in combination to detect and discriminate known or distantly related sequences. Together with GenSeed-HMM, these programs provide an integrated toolbox with potential application in the search for novel viruses and mobile genetic elements, as well as in any other task related to the detection and/or discrimination of subgroups of DNA or protein sequences.
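The block-finding idea described for TABAJARA (position-specific scores along an alignment, followed by a sliding window that picks top-scoring regions) can be sketched roughly as follows. This is not TABAJARA's code: the scoring metric (fraction of sequences sharing the most common residue in a column), the function names, and the toy alignment are all assumptions made for illustration, since the abstract does not name the metrics the program uses.

```python
from collections import Counter


def column_scores(alignment):
    """Assign one conservation score per alignment column.

    Assumed metric: fraction of sequences that carry the most common
    non-gap residue in that column (1.0 means fully conserved).
    """
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n_seqs)
    return scores


def best_window(scores, width):
    """Return (start, mean score) of the highest-scoring window of given width."""
    if width <= 0 or width > len(scores):
        raise ValueError("window width must be between 1 and the alignment length")
    starts = range(len(scores) - width + 1)
    # max() keeps the first window in case of ties
    start = max(starts, key=lambda s: sum(scores[s:s + width]))
    return start, sum(scores[start:start + width]) / width


if __name__ == "__main__":
    # Toy alignment; all rows must have the same length.
    alignment = ["MKVLAT", "MRVLAS", "QKVLAG"]
    start, mean_score = best_window(column_scores(alignment), width=3)
    block = [row[start:start + 3] for row in alignment]
    print(start, mean_score, block)
```

In a workflow like the one described, the extracted block would then be written out as its own alignment and passed to a profile HMM builder. A discriminative variant would need a different column score, one that rewards columns conserved within each group but different between groups.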
|
5 |
Function and Evolution of Putative Odorant Carriers in the Honey Bee (Apis mellifera)
Foret, Sylvain, sylvain.foret@anu.edu.au January 2007
The remarkable olfactory power of insect species is thought to be generated by a combinatorial action of G-protein-coupled olfactory receptors (ORs) and olfactory carriers. Two such carrier gene families are found in insects: the odorant binding proteins (OBPs) and the chemosensory proteins (CSPs). In olfactory sensilla, OBPs and CSPs are believed to deliver hydrophobic air-borne molecules to ORs, but their expression in non-olfactory tissues suggests that they also may function as general carriers in other developmental and physiological processes.
Bioinformatics and experimental approaches were used to characterise the OBP and CSP gene families in a highly social insect, the western honey bee (Apis mellifera). Comparison with other insects reveals that the honey bee has the smallest set of these genes, consisting of only 21 OBPs and 6 CSPs. These numbers stand in stark contrast to the 66 OBPs and 7 CSPs in the mosquito Anopheles gambiae and the 46 OBPs and 20 CSPs in the beetle Tribolium castaneum. The genes belonging to both families are often organised in clusters, and evolve by lineage-specific expansions. Positive selection has been found to play a role in generating a greater sequence diversification in the OBP family, in contrast to the CSP gene family, which is more conserved, especially in the binding pocket. Expression profiling under a wide range of conditions shows that, in the honey bee, only a minority of these genes are antenna-specific. The remaining genes are expressed either ubiquitously, or are tightly regulated in specialized tissues or during development. These findings support the view that OBPs and CSPs are not restricted to olfaction, and are likely to be involved in broader physiological functions.
Finally, the detailed expression study and the functional characterization of a member of the CSP family, uth (unable-to-hatch), is reported. This gene is expressed in a maternal-zygotic fashion, and is restricted to the egg and embryo. Blocking the zygotic expression of uth with double-stranded RNA causes abnormalities in all body parts where this gene is highly expressed.
The treated embryos are 'unable-to-hatch' and cannot progress to the larval stages. Our findings reveal a novel, essential role for this gene family and suggest that uth is an ectodermal gene involved in embryonic cuticle formation.
|
6 |
The Implementation and Evaluation of Bioinformatics Algorithms for the Classification of Arabinogalactan-Proteins in Arabidopsis thaliana
Yerardi, Jason T. 26 July 2011
No description available.
|