  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Fully automated methods for protein fold recognition using predicted secondary structure

McGuffin, Liam James January 2002
No description available.
2

Assembly and Automated Annotation of the Clostridium scatologenes Genome

Tiwari, Jitesh 01 May 2012
Clostridium scatologenes is an anaerobic bacterium that demonstrates some unusual metabolic traits, such as the production of 3-methylindole. Genome-level sequencing has enabled the exploration and elucidation of unique metabolic pathways in other organisms such as Clostridium botulinum. The Clostridium scatologenes genome, with an estimated length of 4.2 million bp, was sequenced with the Applied Biosystems SOLiD method and the Roche 454 pyrosequencing method. The resulting DNA sequences were combined and assembled with the Newbler Assembler program into 8267 contigs with an average length of 1250 bp. Comparison of the published subunits of the csd gene with the assembled contigs showed that one contig contained all three subunits. In addition, a gene with similarity to Clostridium carboxidivorans butyrate kinase was found next to the csd gene. An alignment of the contig and csd gene sequences identified three deletions in the contig within the 4066 bases of the alignment. This implies an error rate of about 0.07% in the sequencing itself, so further finishing is required. Even without finishing the assembly into a single contig, the contigs were annotated with the RAST pipeline, which predicted 2521 protein-encoding genes (PEGs). The PEGs were classified by metabolic function and compared with the classified PEGs of the closely related species Clostridium carboxidivorans and Clostridium ljungdahlii, which have similarly sized genomes. According to the RAST analysis, Clostridium scatologenes had 35% subsystem coverage of all known metabolic processes with its 2521 PEGs. This compares with 41% for Clostridium carboxidivorans with 4174 PEGs (29) and 42% for Clostridium ljungdahlii with 4184 PEGs (30), indicating that Clostridium scatologenes may still have more genes to be identified. The percentages of genes found in the metabolic subsystems were similar, except in motility and chemotaxis.
The contigs carrying the csd gene and the tryptophan-metabolizing genes were examined to see whether additional genes might support these metabolic pathways. Butyrate kinase was associated with the csd genes, but no other associations were found for the two tryptophan-metabolizing genes. The tryptophan biosynthesis operon genes were all found on one contig (contig 6771) and were syntenic with those of other bacterial species.
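The error-rate estimate above is simple arithmetic: three deletions over 4066 aligned bases. The Python sketch below shows one way to count contig deletions in a pairwise alignment and convert a count into a percentage. The function names and the toy sequences are hypothetical; only the figures 3 and 4066 come from the abstract.

```python
def count_deletions(reference: str, contig: str) -> int:
    """Count alignment columns where the reference has a base but the contig has a gap."""
    if len(reference) != len(contig):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for r, c in zip(reference, contig) if r != "-" and c == "-")


def error_rate_percent(errors: int, aligned_bases: int) -> float:
    """Express an error count as a percentage of the aligned length."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 100.0 * errors / aligned_bases


# Toy alignment (hypothetical sequences, not data from the thesis)
ref = "ATGCCGTAAGT"
contig = "ATG-CGTA-GT"
print(count_deletions(ref, contig))  # 2

# Figures reported in the abstract: 3 deletions within 4066 aligned bases
print(f"{error_rate_percent(3, 4066):.3f}%")  # 0.074%
```

The exact value, roughly 0.074%, is what the abstract rounds to "about 0.07%".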
3

Towards a Genome Reverse Compiler

Warren, Andrew S. 29 November 2007
The Genome Reverse Compiler (GRC) is an annotation tool for prokaryotic genomes. Its name and philosophy are based on an analogy with a high-level programming language compiler. In this analogy, the genome is a program in a certain low-level language that humans cannot understand. Given the sequence of any prokaryotic genome, GRC produces its corresponding "high-level program": its annotation. GRC works in a completely automatic manner, using standard input and output formats. The goal is to provide an open-source, easy-to-run, very efficient annotation program. / Master of Science
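GRC's internal algorithm is not described in this abstract. As a hedged illustration of the kind of low-level step that prokaryotic gene callers build on, the following Python sketch scans the forward strand for open reading frames that start with ATG and end at an in-frame stop codon. It is a deliberate simplification (no reverse strand, no alternative start codons such as GTG or TTG, no scoring), and the function name and `min_codons` threshold are assumptions, not part of GRC.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon (stop included)
    and must span at least `min_codons` codons. The first ATG after a stop is
    used, so nested start codons are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14)]
```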
4

Computational proteomics for genome annotation

Blakeley, Paul January 2013
The field of proteogenomics operates at the interface between proteomics and genomics and has emerged during the past decade to exploit the vast quantities of high-throughput sequence data. A range of proteogenomics approaches has been developed that integrate mass spectrometry data with genome sequence data to provide empirical evidence for protein-coding genes. However, current methods may not be optimal, as they do not fully consider the splicing complexity of eukaryotes, and there is currently no best-practice method. To address this, we investigate the level of proteomics support for Ensembl gene models in human and in a selection of model organisms. We find a disparity between the number of splice variants confirmed by extant data and the number that can theoretically be confirmed using current proteomics technologies. We then investigate EST-based proteogenomics methods, which enabled the discovery of novel peptide sequences in the chicken genome representing hitherto unannotated genes, amended gene models, polymorphisms, and genes missing from the genome assembly. Different approaches for searching mass spectrometry data against transcript sequences are explored, and we show that searching mass spectra against protein sequences predicted by the EORF and ESTScan2 translation tools gives the best sensitivity.
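To make the idea of searching spectra against translated transcripts concrete, the sketch below builds a candidate peptide set from a transcript: a three-frame translation with the standard genetic code, splitting at stop codons, followed by an in silico tryptic digest (cleavage after K or R, but not before P). This is a generic illustration under those assumptions, not the thesis's pipeline and not how EORF or ESTScan2 work; the function names and length limits are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(seq: str, frame: int) -> str:
    """Translate one forward reading frame; codons with non-ACGT characters become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def tryptic_digest(protein: str, min_len: int = 6, max_len: int = 30) -> list[str]:
    """Cleave after K or R unless the next residue is P; keep peptides within the length limits."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return [p for p in peptides if min_len <= len(p) <= max_len]


def candidate_peptides(transcript: str, min_len: int = 6, max_len: int = 30) -> set[str]:
    """Three-frame translation, split at stop codons, then tryptic digestion."""
    found = set()
    for frame in range(3):
        for segment in translate(transcript, frame).split("*"):
            found.update(tryptic_digest(segment, min_len, max_len))
    return found


print(tryptic_digest("MKPLRAGKSTEVR", min_len=1))  # ['MKPLR', 'AGK', 'STEVR']
```

A real search would then match observed spectra against the theoretical spectra of these peptides; that scoring step is outside this sketch.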
5

Contributions to In Silico Genome Annotation

Kalkatawi, Manal M. 30 November 2017
Genome annotation is an important topic, since it provides the foundation for downstream genomic and biological research. It can be regarded as a way of summarizing part of the existing knowledge about the genomic characteristics of an organism. Annotating the different regions of a genome sequence is known as structural annotation, while identifying the functions of these regions is known as functional annotation. In silico approaches can facilitate both tasks, which would otherwise be difficult and time-consuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature set able to characterize properties of the genomic region surrounding the PAS, enabling the development of highly accurate, optimized ML predictive models. DPS considerably outperformed state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no generic and automated method exists for this task that could facilitate the study of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameter calibration. DeepGSR, which was evaluated on the recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study than some other hand-tailored models, while producing highly accurate predictions.
Finally, we focus on deriving a model capable of facilitating the functional annotation of prokaryotes. As far as we know, there is no fully automated system for detailed comparison of functional annotations generated by different methods. Hence, we developed BEACON, a method and supporting system that compares gene annotations from various methods to produce a more reliable and comprehensive annotation. Overall, our research contributes to different aspects of genome annotation.
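DPS's actual feature set is not spelled out in this abstract. As a rough sketch of the first step such a predictor needs, the Python code below locates candidate polyadenylation signals, extracts a fixed-size window around each, and turns the window into a trinucleotide-count vector that a classifier could consume. Restricting the scan to AATAAA and ATTAAA (the two most common human PAS hexamers), the 50-nt flank, and k = 3 are illustrative assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Assumption: only the two most common human PAS hexamers are scanned.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_candidates(seq: str, flank: int = 50) -> list[tuple[int, str, str]]:
    """Return (position, hexamer, window) for each hexamer hit whose full window fits in seq."""
    seq = seq.upper()
    candidates = []
    for hexamer in PAS_HEXAMERS:
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={hexamer})", seq):
            pos = match.start()
            lo, hi = pos - flank, pos + len(hexamer) + flank
            if lo < 0 or hi > len(seq):
                continue  # skip hits too close to either end of the sequence
            candidates.append((pos, hexamer, seq[lo:hi]))
    return sorted(candidates)


def kmer_features(window: str, k: int = 3) -> list[int]:
    """Count every k-mer over ACGT in the window, in a fixed lexicographic order."""
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


seq = "G" * 10 + "AATAAA" + "C" * 10
for pos, hexamer, window in find_pas_candidates(seq, flank=5):
    print(pos, hexamer, window)  # 10 AATAAA GGGGGAATAAACCCCC
```

Every candidate gets a vector of the same length (64 for k = 3), which is what lets true and false PAS hits be compared by a downstream model.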
6

Improving structural and functional annotation of the chicken genome

Buza, Teresia 11 December 2009
Chicken is an important non-mammalian vertebrate model organism for biomedical research, especially for vaccine production and the study of embryology and development. Chicken is also an important agricultural species and a major source of high-quality protein worldwide. In addition, chicken is an important model organism for comparative and evolutionary genomics. Exploitation of this genome as a biomedical model is hindered by its incomplete structural and functional annotation, which makes it difficult for researchers to model their functional genomics datasets. Improving the structural and functional annotation of the chicken genome will allow researchers to derive biological meaning from these datasets. The objectives of this study were to identify proteins expressed in multiple chicken tissues, to functionally annotate experimentally confirmed proteins expressed in different chicken tissues, to quantify and assess the quality of Gene Ontology (GO) annotation, and to facilitate functional annotation of microarray data. The results of this research have proven to be a fundamental resource for improving the structural and functional annotation of the chicken genome. Specifically, we have improved the structural annotation of the chicken genome by adding experimental support for predicted proteins. In addition, we have improved its functional annotation by assigning useful biological information to proteomics datasets and to the whole-genome chicken array. The Gene Ontology Annotation Quality (GAQ) and Array GO Mapper (AGOM) tools developed in this study will continue to facilitate functional modeling of chicken arrays and of high-throughput experimental datasets from microarray and proteomics studies.
The ultimate impact of these results is to provide biomedical research with useful information for comparative biology, a better understanding of chicken biological systems and diseases, drug discovery, and eventually the development of therapies.
7

Automated detection of ncRNAs in the draft genome sequence of a colonial tunicate

Velandia-Huerto, Cristian A., Gittenberger, Adriaan A., Brown, Federico D., Stadler, Peter F., Bermúdez-Santana, Clara I. 05 September 2016
Background: The colonial ascidian Didemnum vexillum (sea carpet squirt) is not only a key marine organism for studying ancestral morphological patterns in chordate evolution, but is also of great ecological importance due to its status as a major invasive species. Non-coding RNAs, in particular microRNAs (miRNAs), are important regulatory genes that affect development and environmental adaptation. Beyond miRNAs, not much is known about tunicate ncRNAs. Results: We provide here a comprehensive homology-based annotation of non-coding RNAs in the recently sequenced genome of D. vexillum. To this end we employed a combination of several computational approaches, including BLAST searches with a wide range of parameters and a secondary-structure-centered survey with Infernal. The resulting candidate set was curated extensively to produce a high-quality ncRNA annotation of the first draft of the D. vexillum genome. It comprises 57 miRNA families, 4 families of ribosomal RNAs, 22 isoacceptor classes of tRNAs (of which more than 72% of loci are pseudogenes), 13 snRNAs, 12 snoRNAs, and 1 other RNA family. In addition, it includes 21 families of mitochondrial tRNAs, 2 of mitochondrial ribosomal RNAs, and 1 long non-coding RNA. Conclusions: The comprehensive annotation of the D. vexillum non-coding RNAs provides a starting point towards a better understanding of the restructuring of the small RNA system in ascidians. Furthermore, it provides a valuable resource for efforts to establish detailed non-coding RNA annotations for other recently published and recently sequenced tunicate genomes.
8

Square: a graphical and intuitive platform for annotation of bacterial genomes

Eslabão, Marcus Redü 29 February 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / DNA sequencing is a technique that provides a vast source of information on various organisms. Currently, new sequencing methods known as Next-Generation Sequencing are making this technique many times faster, more accurate and more affordable, making it popular and widespread in the scientific community. With the popularization of genome sequencing, laboratories that do not specialize in DNA sequencing are using this approach to complement their studies. However, the ease of obtaining a DNA sequence contrasts with the difficulty of processing, analyzing and annotating the genome in order to obtain relevant biological information.
To assist researchers who use this technique, several programs are available; however, they are generally not free, do not perform all the necessary analyses, or are difficult to use, mainly because many of them are run from the command line, which is not intuitive. The objective of this study was to create genome annotation software that is easy to use, has a user-friendly interface, is free, and provides all the information needed to submit the annotated genome to GenBank. The software, named Square, was implemented in the Python and Object Pascal programming languages. The Prodigal, NCBI BLAST and tRNAscan-SE programs were also integrated into the software. At the end of the development stage, Square was tested with three genomes and compared with two popular annotators: RAST and BASys. The results showed that Square is more accurate than the other two annotators, since its results are closer to the annotations deposited in NCBI, and that it produces results in a shorter time, since it runs locally. Square proved to be a good alternative for users who are not familiar with the Linux command terminal and is available at http://sourceforge.net/projects/sqgenome/.
9

ChlamyCyc: an integrative systems biology database and web-portal for Chlamydomonas reinhardtii

May, Patrick, Christian, Jan-Ole, Kempa, Stefan, Walther, Dirk January 2009
Background: The unicellular green alga Chlamydomonas reinhardtii is an important eukaryotic model organism for the study of photosynthesis and plant growth. In the era of modern high-throughput technologies, there is an imperative need to integrate large-scale data sets from high-throughput experimental techniques, using computational methods and database resources, to provide comprehensive information about the molecular and cellular organization of a single organism. Results: In the framework of the German Systems Biology initiative GoFORSYS, a pathway database and web-portal for Chlamydomonas (ChlamyCyc) was established, which currently features about 250 metabolic pathways with associated genes, enzymes, and compound information. ChlamyCyc was assembled using an integrative approach combining the recently published genome sequence, bioinformatics methods, and experimental data from metabolomics and proteomics experiments. We analyzed and integrated a combination of primary and secondary database resources, such as existing genome annotations from JGI, EST collections, orthology information, and MapMan classification. Conclusion: ChlamyCyc provides a curated and integrated systems biology repository that will enable and assist in systematic studies of fundamental cellular processes in Chlamydomonas. The ChlamyCyc database and web-portal are freely available at http://chlamycyc.mpimp-golm.mpg.de.
10

Molecular Morphology

Donath, Alexander 22 July 2011
A fundamental problem in biology is the reconstruction of the relatedness of all (extant) species. Traditionally, systematists employ visually recognizable characters of organisms for classification and evolutionary analysis. Recent developments in molecular and computational biology, however, have led to an entirely different perspective on how to infer relatedness. The discovery of molecules carrying genetic information, and the comparison of their primary structure, has in a rather short period of time revolutionized our understanding of the phylogenetic relationships of many organisms. These novel approaches, however, turned out to suffer from problems similar to those of previous techniques, and they also created new ones. Hence, taxonomists came to realize that even with this new type of data not all problematic relationships could be unambiguously resolved. The search for complementary approaches has led to the use of rare genomic changes and other characters that are largely independent of the primary structure of the underlying sequence(s). These “higher-order” characters are thought to be evolutionarily conserved in certain lineages and largely unaffected by the problems of primary sequence data, allowing for a better resolution of the Tree of Life. The central aim of this thesis is the use of higher-order molecular characters together with their consistent and comparable extraction from a given data set. Two novel methods are presented that allow such an inference. This is complemented by the search for and analysis of known and novel molecular characteristics to study the relationships among Metazoa, both intra- and interspecific. The first method tackles a common problem in phylogenetic analyses: the inference of a reliable data set. As part of this thesis, a pipeline was created for the automated annotation of metazoan mitochondrial genomes.
The data thus obtained constitute a reliable and standardized starting point for all downstream analyses, e.g. genome rearrangement studies. The second method uses a subclass of gaps, namely those that define an approximate split of a given data set. The definition and inference of such split-inducing indels (splids) are based on two basic principles. First, indels at the same position, i.e. sharing the same end points in two sequences, are likely homologous. Second, independent single-residue insertions and deletions tend to occur more frequently than multi-residue indels. It is shown that trees based on splids recover most of the undisputed monophyletic groups, while the influence of the underlying alignment algorithm is relatively small. Mitochondrial markers are a valuable tool for understanding small- and large-scale population structure. The non-coding control region of mitochondrial DNA (mtDNA) often contains more variability than genes encoding proteins and non-coding RNAs. A case study on small-scale population structure investigates the control region of the European Fire-bellied Toad in order to find highly variable parts that are of potential importance for developing informative genetic markers. A particular focus is placed on the evolutionary dynamics of the repetitive region at the inter- and intraspecific level. This includes understanding the mechanisms underlying its evolution, in particular by exploring the impact of secondary structure on slipped-strand mispairing during mtDNA replication. The 7SK RNA is a key player in the regulation of RNA polymerase II (Pol-II) transcription, interacting with at least three known proteins. It mediates the inhibition of the Positive Transcription Elongation Factor b (P-TEFb) by the HEXIM1/2 proteins, thereby repressing transcript elongation by Pol-II. A highly specific interaction with LARP7 (La-Related Protein 7), on the other hand, regulates its stability.
7SK RNA is capped at its 5’ end by the highly specific methyltransferase MePCE (Methylphosphate Capping Enzyme). Using sequence and structure similarity, it is shown that the 7SK RNA as well as its protein binding partners have a much earlier evolutionary origin than previously expected. Furthermore, this study provides a good illustration of the pitfalls of using higher-order markers for phylogenetic inference.
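The first splid principle can be illustrated with a simplified sketch: gaps in different rows of a multiple alignment that share exactly the same start and end columns are treated as one candidate homologous indel, and the taxa carrying it versus the remaining taxa form a candidate split. This Python code is a hypothetical illustration of that grouping step only. It ignores overlapping gaps of different lengths, does not apply the single- versus multi-residue frequency principle, and builds no trees, so it should not be read as the thesis's method; the function names and `min_side` parameter are assumptions.

```python
from collections import defaultdict


def gap_intervals(row: str) -> set[tuple[int, int]]:
    """Return maximal runs of '-' in an aligned row as half-open (start, end) columns."""
    intervals, start = set(), None
    for i, ch in enumerate(row):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            intervals.add((start, i))
            start = None
    if start is not None:
        intervals.add((start, len(row)))  # gap running to the end of the row
    return intervals


def candidate_splits(
    alignment: dict[str, str], min_side: int = 2
) -> dict[tuple[int, int], frozenset[str]]:
    """Group taxa sharing a gap with identical end points; keep groups that divide
    the taxa into two sides with at least `min_side` members each."""
    taxa = set(alignment)
    shared = defaultdict(set)
    for name, row in alignment.items():
        for interval in gap_intervals(row):
            shared[interval].add(name)
    return {
        interval: frozenset(members)
        for interval, members in shared.items()
        if min_side <= len(members) <= len(taxa) - min_side
    }


alignment = {
    "A": "ACGT--ACGT",
    "B": "ACGT--ACGT",
    "C": "ACGTTTACGT",
    "D": "ACGTTTAC-T",
}
# One candidate split: gap at columns (4, 6) shared by taxa A and B, versus C and D.
print(candidate_splits(alignment))
```

The single-taxon gap in row D is discarded because it cannot separate the taxa into two informative groups.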
