Global ETD Search

21	Techniques for construction of phylogenetic trees / Técnicas para construção de árvores filogenéticas Gerardo Valdíso Rodrigues Viana 27 April 2007 Fundação Cearense de Apoio ao Desenvolvimento Científico e Tecnológico / Phylogenetic trees, also known as evolutionary trees or phylogenies, are structures that express similarity, ancestry, and relationships between species or groups of species. Their leaves represent species (taxa), and their internal nodes correspond to hypothetical ancestors of those species. In this thesis we first present the elements needed to understand phylogenetic systematics, and then describe efficient algorithms for building phylogenetic trees. Concepts from molecular biology, the evolution of life, and biological classification are important for understanding phylogenies. Phylogenetic information can support biological research, for example in organ transplantation from animals and in drug toxicology tests performed on other species to predict effects in human beings. Solving a phylogeny problem means building a phylogenetic tree from known data about a group of species, according to an optimization criterion. The approach to this problem involves two main steps: the first addresses the discovery of perfect phylogenies; in the second, information extracted from perfect phylogenies is used to infer more general ones. The techniques used in the second step take advantage of evolutionary hypotheses. The problem becomes NP-hard for a number of interesting hypotheses, which justifies the use of inference methods based on heuristics, metaheuristics, and approximation algorithms. Our contribution is an innovative technique based on multi-start local search over a diversified neighborhood. Moreover, we used parallel programming to speed up the intensification stage of the search for the optimal solution. More precisely, we developed an efficient algorithm that obtains approximate solutions to a phylogeny problem, inferring an optimal phylogenetic tree from character matrices of various species. The designed data structures and the binary data manipulation in some routines accelerate the simulation and illustration of the experimental tests. Well-known instances were used to compare the results of the proposed algorithm with previously published ones. We hope that this work will arouse researchers' interest in the topic and contribute to the field of Bioinformatics.
/ Portuguese abstract, translated: Phylogenetic trees are structures that express the similarity, ancestry, and relationships between species or groups of species. Known as evolutionary trees or simply phylogenies, phylogenetic trees have leaves that represent species (taxa) and internal nodes that correspond to their hypothetical ancestors. In this work, besides the information needed to understand phylogenetic systematics as a whole, we present algorithmic techniques for constructing these trees. The basic concepts of molecular biology, the evolution of life, and biological classification described here make it possible to understand what a phylogeny is and why it matters to biology. Phylogenetic information provides, for example, important support for decisions about transplanting organs or tissues from other species into humans, and for carrying out immunological reaction or toxicity tests first in other biological systems similar to the human one. Solving a phylogeny problem corresponds to constructing a phylogenetic tree from known data about the species under study, subject to some optimization criterion. The approach to this problem involves two stages: the first concerns cases in which the phylogenies are perfect, and the procedures developed for it are used in the second stage, in which an inference technique must be created for phylogeny in the general case. These techniques treat the hypotheses about the evolutionary process in a particular way. For many hypotheses of interest the problem becomes NP-hard, which justifies the use of inference methods based on heuristics, metaheuristics, and approximation algorithms. Our contribution in this work is a technique for solving this problem based on multi-start local searches in diversified neighborhoods. Parallel programming was used to minimize execution time in the intensification phase of the search for the optimal solution. In this way, we developed an algorithm to obtain approximate solutions to a phylogeny problem, in this case to infer, from character matrices of several species, the phylogenetic tree that most closely approximates their evolutionary history. A suitably chosen data structure, combined with binary data manipulation in some routines, facilitated the simulation and illustration of the tests performed. Instances with results known from the literature were used to verify the algorithm's performance. With this work we hope to spark the interest of Computer Science researchers and thereby consolidate the growth of Bioinformatics. CIENCIA DA COMPUTACAO
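The abstract above mentions perfect phylogenies, character matrices, and binary data manipulation without giving details. As a rough illustration only, and not the thesis's own algorithm, the sketch below applies the well-known pairwise compatibility test for binary characters: assuming a rooted setting in which the ancestral state of every character is 0, a binary matrix admits a perfect phylogeny exactly when the sets of species carrying any two characters are either disjoint or nested. The function names and the bitmask encoding are assumptions made for this example.

```python
from itertools import combinations


def column_masks(matrix):
    """Encode each character (column) as an integer bitmask over species (rows).

    Hypothetical helper: bit i of a mask is set when species i has state 1.
    """
    if not matrix:
        return []
    masks = [0] * len(matrix[0])
    for row_idx, row in enumerate(matrix):
        for col_idx, state in enumerate(row):
            if state:
                masks[col_idx] |= 1 << row_idx
    return masks


def admits_perfect_phylogeny(matrix):
    """Return True if every pair of characters is disjoint or nested.

    Assumes binary characters with an all-zero ancestral state.
    """
    for a, b in combinations(column_masks(matrix), 2):
        common = a & b
        # A conflict exists when the two species sets overlap
        # but neither one contains the other.
        if common and common != a and common != b:
            return False
    return True


# Rows are species, columns are binary characters.
compatible = [[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]]
conflicting = [[1, 1],
               [1, 0],
               [0, 1]]
print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

Storing each character as one integer lets a pair of characters be compared with a single bitwise AND, which is one plausible reading of how binary manipulation can speed up such routines.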
22	Combining Prior Information for the Prediction of Transcription Factor Binding Sites Benner, Philipp 21 June 2018 Although every cell in an organism carries the same genetic information, cells can differ fundamentally in their function. The molecular basis for this functional diversity lies in biochemical processes that regulate the expression of genes. Key to this regulatory process are proteins called transcription factors, which recognize and bind specific DNA sequences of a few nucleotides. Here we tackle the problem of identifying the binding sites of a given transcription factor. Predicting binding preferences from the structure of a transcription factor is still an unsolved problem. For that reason, binding sites are commonly identified by searching for overrepresented sites in a given collection of nucleotide sequences. Such sequences might be known regulatory regions of genes that are assumed to be coregulated, or they may be obtained from so-called ChIP-seq experiments, which identify approximately the sites that were bound by a given transcription factor. In both cases, the observed nucleotide sequences are much longer than the actual binding sites, and computational tools are required to uncover the actual binding preferences of a factor. Because transcription factors recognize more than a single nucleotide sequence, the search for overrepresented patterns in a given collection of sequences has proven to be a challenging problem. Most computational methods rely only on the given set of sequences, but additional information is required to make reliable predictions. Here, this information is obtained by looking at the evolution of nucleotide sequences: each nucleotide sequence in the observed data is augmented by its orthologs, i.e. sequences from related species where the same transcription factor is present.
By constructing multiple sequence alignments of the orthologous sequences, it is possible to identify functional regions that are under selective pressure and therefore appear more conserved than others. The processing of the additional information provided by orthologous sequences relies on a phylogenetic tree equipped with a nucleotide substitution model, which carries information not only about ancestry but also about the expected similarity of functional sites. As a result, a Bayesian method for the identification of transcription factor binding sites is presented. The method relies on a phylogenetic tree that agrees with the assumptions of the nucleotide substitution process. Therefore, the problem of estimating phylogenetic trees is discussed first. The computation of point estimates relies on recent developments in Hadamard spaces. Second, the statistical model is presented that captures the enrichment and conservation of binding sites and other functional regions in the observed data. The performance of the method is evaluated on ChIP-seq data of transcription factors whose binding preferences have been estimated in previous studies. info:eu-repo/classification/ddc/500 ddc:500
23	Characterisation of new full-length HIV-1 subtype D viruses from South Africa Loxton, Andre Gareth, Janse van Rensburg, E., Engelbrecht, S. 12 1900 Thesis (MSc (Medical Virology))--University of Stellenbosch, 2004. / 150 leaves printed on single pages, preliminary pages i-vii and numbered pages 1-143. Includes bibliography and figures digitized at 300 dpi grayscale and 300 dpi 24-bit Color to pdf format (OCR), using a Hp Scanjet 8250 Scanner and digitized at 600 dpi grayscale to pdf format (OCR), using a Bizhub 250 Konica Minolta Scanner. / ENGLISH ABSTRACT: The first episode of HIV-1 in South Africa was documented in 1982. Homosexual transmission of the virus was the predominant mode of transmission in an epidemic of mainly HIV-1 subtype B and D infections. To date, no full-length sequences of subtype D strains from South Africa have been reported. Here we describe the characterization and some of the unique features of the Tygerberg HIV-1 subtype D strains. A near full-length 9 kb fragment was obtained through a one-step PCR using high molecular weight DNA. Cloning was done successfully with the pCR-XL-TOPO cloning kit. Large quantities of plasmid DNA were grown and sequenced on both strands of the DNA. ORF determination and subtyping were followed by standard phylogenetic methods to construct evolutionary phylogenetic trees. Subtyping and similarity plots revealed that the sequences from Tygerberg are pure subtype D. All the Tygerberg strains had intact genes with no premature stop codons. At the tip of the V3 loop, the Tygerberg strains have the GQGQ motif. R214 has a more variable vpu gene than the rest of the Tygerberg strains, but is still subtype D in this region. No premature stop codons have been observed in the tat gene, and the glycosylation of the strains is less than that of the subtype D consensus. We are the first to report full-length sequences of HIV-1 subtype D strains from South Africa. The sequences represent non-mosaic genomes of subtype D. Our results confirm that the subtype D sequences from the beginning of the HIV-1 epidemic differ from the subtype D sequences from recent isolates.
/ AFRIKAANS SUMMARY: The first episode of HIV-1 infection in South Africa was documented in 1982. The epidemic consisted mainly of subtypes B and D and was transmitted through homosexual contact. No full-length subtype D DNA sequences from South Africa have been described to date. Here we describe the characterisation of full-length subtype D strains as well as some of the unique features of the viruses. The full-length 9 kb genome sequence was obtained by performing a one-step PCR reaction with high molecular weight DNA. The 9 kb fragment was successfully cloned using the pCR-XL-TOPO cloning kit. Large quantities of plasmid DNA were grown and the nucleotide sequence was determined on both strands of the genome. The strains were subtyped and phylogenetic analysis was carried out with standard methods. The complete DNA sequences were determined, and subtyping indicated that the Tygerberg strains are pure subtype D. No premature stop codons were found in the nucleotide sequences of the Tygerberg strains. At the turn of the variable region (V3), all the Tygerberg strains had the GQGQ motif. R214 has a more variable vpu gene but still belongs to the subtype D group in that region. No premature stop codons were found in the tat gene, and the glycosylation of the strains is less than that of the consensus subtype D strain. We are the first group to characterise full-length subtype D strains from South Africa. The DNA sequences represent pure subtype D genomes. Our results confirm those of others that the nucleotide sequences of the older subtype D strains differ from those of the newer strains. HIV-1 Homosexual transmission Virusses Epidemics DNA Cloning Glycosilation Evolutionary phylogenetic trees Tygerberg HIV-1 subtype D strains DNA viruses -- South Africa
24	Análise de Similaridade de Sequências Genômicas Fonseca, Ítallo Costa 28 August 2013 Made available in DSpace on 2015-05-14T12:14:09Z (GMT). No. of bitstreams: 1 arquivototal.pdf: 3134384 bytes, checksum: 253c3fb1aaec508b89c44bcd7766a50c (MD5) Previous issue date: 2013-08-28 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / In this thesis, we investigate aspects of similarity between complete mitochondrial DNA sequences. This line of study falls within the analysis of statistical properties of DNA sequences, based on methods that seek to understand the information contained in these sequences, a topic of renewed interest in the context of so-called Complex Systems. Previous approaches obtained the frequencies of certain nucleotide segments, regarded as words of a given size, contained in the sequences. These methods, inspired by studies of the statistical properties of word distributions in linguistic texts and symbolic sequences, can be considered an alternative to sequence alignment techniques and algorithms. They have been successful in describing characteristics that allow one to infer similarity and possible criteria for grouping species, that is, biological affinity between DNA sequences. Previously, this methodology has been applied to evaluate the differences between coding and non-coding DNA sequences and to extract linguistic aspects of these sequences by detecting keywords that describe relevant information embedded in the sequences. In this dissertation, these studies are extended to directly compare the contents of pairs of complete mitochondrial DNA sequences, defining parameters that depend on the word frequency distribution of the sequences and that highlight both the relevance of certain words and the possibility of grouping species by estimating the distance between these sequences. Our results show that the best clusters of different species are obtained when we calculate the agglomeration rate considering only word frequencies. We have also observed that the larger the word size, the more consistent the clustering between sequences. The prospect of applying our results to analyze DNA sequences that also belong to a single biological species may be relevant to the construction of phylogenetic trees, which are appropriate structures for understanding the evolutionary history of organisms.
/ Portuguese abstract, translated: In this dissertation, we investigate aspects of the similarity between complete mitochondrial DNA sequences. This line of study belongs to the analysis of statistical properties of DNA sequences based on methods that seek to understand the information contained in these sequences, a topic of renewed interest in the context of so-called Complex Systems. Previous approaches were used to obtain the frequencies of certain nucleotide segments, regarded as words of a given size, contained in the sequences. Such methods, inspired by studies of the statistical properties of word distributions in linguistic texts and symbolic sequences, can be considered an alternative to sequence alignment techniques and algorithms, and have been successful in describing characteristics that allow one to infer similarity and possible criteria for grouping species, that is, biological affinity between DNA sequences. Previously, this methodology was applied to evaluate the differences between coding and non-coding DNA sequences and to extract linguistic aspects of these sequences by detecting keywords that describe relevant information embedded in the sequences.
In this dissertation, we extend these studies to directly compare the contents of pairs of complete mitochondrial DNA sequences, defining parameters that depend on the word frequency distribution of the sequences and that highlight both the relevance of certain words and the possibility of grouping species by estimating the distance between these sequences. Our results show that the best groupings of distinct species are obtained when we calculate the agglomeration rate taking into account only the word frequencies. We also note that the larger the word size, the more consistent the grouping between sequences. The prospect of applying our results to analyze DNA sequences belonging to a single biological species as well may be relevant to the construction of phylogenetic trees, which are suitable structures for understanding the evolutionary history of organisms. DNA mitocondrial Frequências de palavras de DNA Similaridade Árvores Filogenéticas Sistemas Complexos Mitochondrial DNA DNA frequencies of words Similarity Phylogenetic trees Complex systems CIENCIAS EXATAS E DA TERRA::FISICA
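The abstract above describes comparing sequences through the frequencies of fixed-size words rather than through alignment, but it does not define its parameters. The sketch below is therefore only a minimal illustration of the general idea: it counts overlapping words of size `k` in each sequence and compares the two frequency profiles with a Euclidean distance. The distance choice and the function names are assumptions for this example, not the "agglomeration rate" used in the dissertation.

```python
import math
from collections import Counter


def word_frequencies(sequence, k):
    """Relative frequencies of all overlapping words of size k in a sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def frequency_distance(seq_a, seq_b, k):
    """Euclidean distance between two word-frequency profiles.

    Assumption: Euclidean distance is used purely for illustration.
    """
    freq_a = word_frequencies(seq_a, k)
    freq_b = word_frequencies(seq_b, k)
    words = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2
                         for w in words))


# Toy sequences; real input would be complete mitochondrial genomes.
print(frequency_distance("ACGTACGT", "ACGTACGT", 2))
print(frequency_distance("ACGTACGT", "AAAACCCC", 2))
```

A matrix of such pairwise distances could then be passed to any standard clustering procedure to group species, which is the kind of use the abstract points to.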
25	Analyse des Vorkommens, der Morphologie und der genetischen Diversität von Biologischen Bodenkrusten extrazonaler Gebirgssteppenstandorte der nördlichen Mongolei / Analysis of appearance, morphology and genetic diversity of Biological Soil Crusts from extrazonal Mountain Dry Steppes in Northern Mongolia Kemmling, Anne 27 October 2010 No description available. 500 Naturwissenschaften Biology (incl. Psychology) Biologische Bodenkrusten Mongolei Kryptogamendiversität 16S-rDNA Standortgenbanken Stammbäume Biological Soil Crusts Mongolia Cryptogamic Diversity Bacterial Diversity 16S-rDNA Phylogenetic Trees 42.91
