Global ETD Search

301	Desenvolvimento de um algoritmo para identificação e caracterização de cavidades em regiões específicas de estruturas tridimensionais de proteínas / Development of an algorithm to identify and characterize cavities in specific regions of three-dimensional structures of proteins. Saulo Henrique Pires de Oliveira 25 May 2011 A identificação e caracterização geométrica e físico-química de espaços vazios na estrutura tridimensional de proteínas é capaz de agregar informações importantes para guiar o desenho racional de drogas e a caracterização funcional de sítios de ligação e sítios catalíticos. Dessa forma, algumas ferramentas computacionais foram desenvolvidas nas últimas duas décadas, visando efetuar essas caracterizações. Contudo, as ferramentas existentes lidam com uma série de limitações, das quais merecem destaque a falta de precisão, falta de capacidade de integração em protocolos de larga escala, falta de capacidade de customização e a falta de uma caracterização eletrostática. Tendo em mente estas limitações, desenvolvemos uma nova ferramenta, denominada KV-Finder, com o objetivo de estender as funcionalidades dos programas existentes, fornecendo assim uma caracterização sistemática mais eficiente e mais informativa dos espaços vazios da estrutura tridimensional de proteínas. Através de uma modelagem matricial baseada em um direcionamento realizado pelo usuário, nossa ferramenta identifica e caracteriza espaços vazios em topologias proteicas. O utilitário é capaz de quantificar o volume, a forma, a extensão de sua superfície, os resíduos proteicos que interagem com os espaços vazios e um mapa de cargas parciais da superfície encontrada. Nossa rotina foi integrada com ferramentas gráficas de modelagem molecular, fornecendo uma interação fácil e eficiente com o usuário.
A validação de nosso algoritmo foi realizada em um conjunto de proteínas cujos diversos tipos de espaços vazios englobam os mais variados sítios de ligação e sítios catalíticos. O cálculo do volume de cavidades enzimáticas foi efetuado em larga escala, acompanhando a evolução do tamanho de bolsões na superfamília ALDH. Com relação aos outros softwares existentes, nossa ferramenta apresenta uma série de vantagens das quais merecem destaque menor tempo de execução, maior precisão, maior acessibilidade e facilidade de integração com outros programas, além das características únicas de permitir que a busca ocorra em regiões específicas dentro da proteína e de realizar um mapeamento parcial de cargas da superfície encontrada. / The identification and characterization of the geometrical and physicochemical properties of empty spaces in protein structures provides important information for guiding rational drug design and the functional characterization of binding and catalytic sites. Therefore, several software tools have been developed during the past two decades to perform such characterization. Nevertheless, the existing tools still present a series of limitations, such as lack of precision, lack of integrability into large-scale protocols, lack of customization capacity, and lack of a proper electrostatic depiction. We developed a new software tool, dubbed KV-Finder, to complement and extend the functionality of existing software, providing a systematic and more descriptive portrayal of protein vacant spaces. By employing user-driven matrix modeling, our tool identifies and characterizes empty spaces in all sorts of protein topologies. The software quantifies the volume, the area and the shape of the surface, the residues that interact with the vacant spaces, and a partial charge map of the computed surface. Our routine was integrated with graphical molecular modeling software, providing the user with a simple and easy-to-use interface.
KV-Finder has been validated with a diverse set of proteins and binding sites. The volume computation was carried out on a large scale, tracking the evolution of the pocket volume in the ALDH superfamily. Compared with existing software, KV-Finder presents greater precision, greater accessibility, and ease of integration into large-scale protocols and visualization software. The software also has unique features, such as the ability to segment and subsegment the empty spaces, an electrostatic depiction, and a ligand interaction highlight feature. bioinformática biologia computacional biologia estrutural Cavidades proteicas Bioinformatics Computational Biology Protein Cavities Structural Biology
302	Construção de filogenias baseadas em genomas completos / Phylogenies construction based on whole genomes Oliveira, Karina Zupo de 03 May 2010 Orientador: João Meidanis / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-16T11:01:22Z (GMT). No. of bitstreams: 1 Oliveira_KarinaZupode_M.pdf: 15064313 bytes, checksum: a46cd0b3c6eebcfc48b81920aa2232db (MD5) Previous issue date: 2010 / Resumo: Contexto: A classificação de espécies começou sendo determinada pelas características fenotípicas dos organismos. Logo que o DNA foi descoberto, o sistema de classificação passou também a utilizar-se das características genotípicas. Ao longo dos últimos anos, avanços científicos permitiram que fossem sequenciados genomas completos. A cada ano, o número de genomas completamente sequenciados aumenta, e, com isso, é cada vez maior o número de trabalhos que tentam utilizar-se do maior número possível de genes para comparar dois ou mais organismos com o objetivo de melhor entender o relacionamento entre as diversas espécies. Experimento: Este trabalho executa comparações de pares de cromossomos de um grupo de 10 genomas completos da família Vibrionaceae e um genoma completo da bactéria Escherichia coli como externo ao grupo. As homologias entre as proteínas são determinadas através da base de famílias Protein Clusters (NCBI). A seguir, árvores ultramétricas e a classificação COG das proteínas são utilizadas para resolver as paralogias correspondentes. Após isto, as proteínas únicas, que representam os eventos de perda e ganho de genes, são eliminadas, de forma a igualar o conteúdo dos cromossomos. Tipicamente, 50% das proteínas originais dos pares de organismos de mesma família "sobrevivem" para serem utilizadas no cálculo da distância de rearranjo. Menos proteínas sobrevivem nas comparações com a bactéria externa ao grupo.
A distância total é calculada pela soma do número de proteínas eliminadas e da distância de ordenação, medida através da distância de rearranjo dos cromossomos. Resultados: As comparações produziram matrizes de distâncias utilizadas para inferir árvores filogenéticas através do algoritmo Neighbor-Joining (NJ). As árvores filogenéticas encontradas mostraram-se congruentes em topologia com a árvore produzida pelo gene 16S rRNA. Isto mostra que a comparação de genomas completos é uma proposta sensata. Os desafios agora são aperfeiçoar os detalhes. O material suplementar (Apêndice A) contém uma implementação computacional dos experimentos / Abstract: Context: Species classification was originally determined by phenotypic characteristics. With the advent of DNA sequencing, the classification system started using genotypes as well. Over the last decades, scientific progress has allowed the complete sequencing of genomes. Each year, the number of completely sequenced genomes increases, and with it the number of works that try to use as many genes as possible to compare two or more organisms, in order to better understand the relationships among species. Experiment: This work performs pairwise chromosome comparisons on a set of 10 complete genomes from the Vibrionaceae family and one complete Escherichia coli genome as an outgroup. The homologies between proteins are assessed using the Protein Clusters (NCBI) database. In the next step, paralogies are resolved using ultrametric trees and the COG classification. Next, the gene loss and gain events are treated: proteins present in only one chromosome of the pair are eliminated, in order to equalize the set of families in both chromosomes. Typically, 50% of the original proteins survive in comparisons between organisms of the same family (comparisons with the outgroup yield fewer survivors).
The total distance is calculated by adding the number of eliminated proteins to the order distance, which is measured by the rearrangement distance between the chromosomes. Results: The genome comparisons produce distance matrices that are used to infer phylogenetic trees with the Neighbor-Joining (NJ) algorithm. The phylogenetic trees generated are congruent in topology with the tree inferred using the 16S rRNA gene. Also, in order to investigate further, the experiment was run with some variations, such as not resolving the paralogies using ultrametric trees, or classifying proteins using only the COG database. Supplemental material (Appendix A) contains a computational implementation of the experiments / Mestrado / Biologia Computacional / Mestre em Ciência da Computação Biologia computacional Filogenia - Processamento de dados Genomas Homologia (Biologia) Vibrião Computational biology Phylogeny - Data processing Genomes Vibrio
303	Enumeração de traces e identificação de breakpoints = estudo de aspectos da evolução / Enumeration of traces and breakpoint identification : study of evolutionary aspects Baudet, Christian 17 August 2018 (has links) Orientador: Zanoni Dias / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-17T08:10:32Z (GMT). No. of bitstreams: 1 Baudet_Christian_D.pdf: 3490604 bytes, checksum: 7f0a8868574d06e11524e5a5de9d1fd0 (MD5) Previous issue date: 2010 / Resumo: O estudo de rearranjo de genomas tem o objetivo de auxiliar o entendimento da evolução. Através da análise dos eventos de mutação como inversões, transposições, fissões, fusões, entre outros, buscamos compreender as suas influências sobre o fenômeno da diferenciação das espécies. Dentro deste contexto, esta tese ataca dois temas distintos: a Enumeração de Traces e a Identificação de Breakpoints. Os algoritmos de ordenação de permutações por reversões orientadas produzem uma única solução ótima enquanto o conjunto de soluções é imenso. A enumeração de traces de soluções para este problema oferece um modo mais compacto de representar o conjunto completo de soluções ótimas. Dessa maneira, esta técnica fornece aos biólogos a possibilidade de análise de diversos cenários evolutivos. Neste trabalho, realizamos um estudo para melhora da eficiência do algoritmo de enumeração através da adoção de uma estrutura de dados mais simples. Devido ao caráter exponencial do problema, grandes permutações não podem ser processadas em um tempo satisfatório. Assim, com o objetivo de produzir cenários evolucionários alternativos para grandes permutações, propomos e avaliamos estratégias para a enumeração parcial de traces. Os pontos de quebra (ou breakpoints) são regiões que delimitam os segmentos conservados existentes nos cromossomos e denotam a ocorrência de rearranjos evolutivos. 
As técnicas de identificação de breakpoints têm a função de identificar tais pontos nas sequências dos cromossomos. Nesta tese, implementamos um método de detecção e refinamento de pontos de quebra proposto na literatura e o disponibilizamos como um pacote que pode ser utilizado por outros pesquisadores. Além disso, introduzimos uma nova metodologia de identificação de breakpoints baseada na análise da cobertura de hits observada nos alinhamentos de sequências intergênicas, provenientes dos genomas das espécies comparadas / Abstract: The study of genome rearrangements helps biologists understand the evolution of species. By analyzing mutational events (inversions, transpositions, fissions, fusions, etc.), we seek to understand their influence on the phenomenon of species differentiation. In this context, this thesis addresses two distinct subjects: Trace Enumeration and Breakpoint Identification. Algorithms that solve the problem of sorting oriented permutations by reversals output only one optimal solution, although the set of solutions can be huge. The enumeration of traces of solutions for this problem allows a compact representation of the set of all optimal solutions that sort a permutation. By using this technique, biologists can study many evolutionary scenarios. We carried out a study to improve the efficiency of the enumeration algorithm by adopting a simpler data structure. Due to the exponential nature of the problem, large permutations cannot be processed in a satisfactory time. Thus, in order to produce alternative evolutionary scenarios for large permutations, we proposed and evaluated strategies for partial enumeration of traces. Breakpoints are regions that border conserved segments in the chromosomes and reflect the occurrence of evolutionary rearrangements. The techniques for breakpoint identification are meant to identify such points in the chromosome sequences.
In this work, we implemented a method proposed in the literature, that performs detection and refinement of breakpoints. The implementation is available as a package to other researchers. Additionally, we introduced a new methodology for breakpoint identification based on the analysis of the hit coverage observed in the alignments of intergenic sequences / Doutorado / Ciência da Computação / Doutor em Ciência da Computação Evolução molecular Genomas Bioinformática Biologia computacional Molecular evolution Genomes Bioinformatics Computational biology
304	Analise algebrica de problemas de rearranjo em genomas : algoritmos e complexidade / Algebraic analysis of genome rearrangement problems : algorithms and complexity Gomes Mira, Cleber Valgas 19 October 2007 (has links) Orientador: João Meidanis / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-11T00:56:27Z (GMT). No. of bitstreams: 1 GomesMira_CleberValgas_D.pdf: 1443128 bytes, checksum: adcf8d553b49f20bbad0fc0f56cc2aba (MD5) Previous issue date: 2007 / Resumo: O sucesso na obtenção de cadeias completas de DNA de alguns organismos tem incentivado a busca de novas técnicas computacionais capazes de analisar esse montante de informação para aplicá-lo na descoberta de novos remédios, aumento da produção de alimentos e investigação do processo de evolução dos seres vivos, entre outras aplicações. A comparação de seqüências de DNA (ou RNA) de diferentes espécies é uma das técnicas importantes para desvendar novas propriedades biológicas. Uma das maneiras de se comparar dois genomas é analisar como os dois se distinguem com base em certas mutações chamadas eventos de rearranjo em genomas. Nessa técnica de comparação, um genoma é modelado como uma seqüência de regiões que são conservadas em um conjunto de genomas. O problema de rearranjos em genomas consiste genericamente em encontrar, dados dois genomas como entrada e um conjunto de tipos de eventos de rearranjo permitidos, uma seqüência mínima de tais eventos de rearranjo que transforme um genoma em outro. No formalismo clássico de rearranjos em genomas, um genoma tem sido modelado como um conjunto de seqüências de inteiros. Cada número inteiro representa um gene e o seu sinal representa a orientação do gene no genoma. O problema de rearranjos em genomas nesse modelo é analisado de forma geral por meio de diversos diagramas e grafos que representam certas propriedades do par de genomas na entrada do problema. 
Neste trabalho, usamos um novo modelo para rearranjos em genomas proposto por Meidanis e Dias [39]: o formalismo algébrico. Em vez de se basear na análise de diagramas, o formalismo algébrico usa permutações na modelagem de genomas e, principalmente, utiliza resultados de grupos de permutações para analisar as propriedades de genomas e os efeitos de eventos de rearranjo. A motivação para o desenvolvimento do formalismo algébrico é a possibilidade de formalização de argumentos sobre rearranjos por meio de métodos algébricos, em vez da utilização de recursos gráficos como é feito no formalismo clássico. Esperamos que, por meio do desenvolvimento de um novo formalismo para o tratamento de problemas de rearranjos em genomas, algoritmos mais eficientes para a resolução desses problemas, ou maneiras mais simples de demonstrar alguns dos resultados clássicos na área sejam encontrados com maior facilidade. Nesse trabalho, apresentamos duas soluções simples e eficientes derivadas diretamente do formalismo algébrico para dois problemas de rearranjos em genomas (o problema de rearranjos em genomas por intercâmbio de blocos e reversões com sinais e o problema de rearranjo em genomas por fissões, fusões e reversões com sinais). Também discutimos e propomos um algoritmo polinomial para o problema de rearranjos em genomas por transposições generalizadas. Acreditamos que o sucesso na solução desses problemas possa ser estendido para outros problemas de rearranjos em genomas com a consolidação dos conceitos fundamentais do formalismo algébrico. Esperamos com essa tese convencer o leitor de que o formalismo algébrico é um modelo representativo e poderoso para tratar genomas compostos por cromossomos circulares e ao lidar com a atribuição de pesos a eventos de rearranjo. Por outro lado, não defendemos que o formalismo clássico seja simplesmente substituído pelo formalismo algébrico. 
Ambos os formalismos podem ser beneficiados por um processo semelhante, porém em menor escala, ao sucesso do desenvolvimento da Geometria Analítica e da Geometria Tradicional / Abstract: The success in obtaining complete sequences of DNA of some species has encouraged the search for new computational techniques for the analysis of such huge amount of information. One hopes that the results of this research could be applied for the development of new medicines, increasing food crops productivity, better understanding of the evolutionary process in live beings, among other applications. One technique for the genome analysis is the comparison of DNA (or RNA) sequences from different species. Such a comparison may reveal the similarities and differences between the genomes, which could be used in phylogeny reconstruction for instance. Two genomes can be compared by the analysis of their differences based on mutational events called genome rearrangements. The genome rearrangement problem (also called a sorting problem) consists of finding a minimum sequence of rearrangement events that transforms one genome into another and the number of rearrangement events in the sequence is called the genomic distance. In the classical formalism for genome rearrangements, a genome is usually modeled by a set of sequences of integers. Each integer represents a gene and its sign stands for the orientation of the gene in the genome. The genome rearrangement problem in this model is analyzed generally with tools such as diagrams and graphs that convey the properties of the genomes in the problem input. We use instead a new model for genome rearrangements proposed by Meidanis and Dias [39]: the algebraic formalism. Instead of being based on the analysis of diagrams, the algebraic formalism uses permutations to model genomes and the results from permutation group theory for the analysis of the properties of genomes and the effects of rearrangement events. 
The motivation for the development of the algebraic formalism is the possibility of stating arguments more formally by means of algebraic methods than by using graphical resources as the classical formalism does. We hope that more efficient algorithms for genome rearrangement problems or simpler proofs for classical results in the area will be more easily found due to the development of a new formalism. We present a simple, efficient solution based on the algebraic formalism for two genome rearrangement problems (the problem of genome rearrangements by block-interchanges and signed reversals and the problem of genome rearrangements by fissions, fusions, and signed reversals). We also discuss and offer a solution for the problem of genome rearrangements by generalized transpositions. We believe that the success in solving those genome rearrangement problems could be extended to other problems by consolidating the fundamental concepts of the algebraic formalism. We hope that the reader will be convinced that the algebraic formalism is representative and powerful in dealing with circular chromosomes and modeling the assignment of weights to rearrangement events. On the other hand, we do not argue in favor of a substitution of the classical formalism by the algebraic formalism. Both of these formalisms could profit by a similar, even though on a smaller scale, success of the development of the Analytic Geometry and the Traditional Geometry / Doutorado / Doutor em Ciência da Computação Biologia computacional Permutações (Matemática) Ordenação (Computadores) Computational biology Permutations Sorting (Computer science)
305	Uma ferramenta de auditoria para algoritmos de rearranjo de genomas / An audit tool for genome rearrangement algorithms Galvão, Gustavo Rodrigues, 1988- 21 August 2018 Orientador: Zanoni Dias / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-21T23:02:27Z (GMT). No. of bitstreams: 1 Galvao_GustavoRodrigues_M.pdf: 1280667 bytes, checksum: 0809ad85a3b7f16ff5d7af5fc4124f0a (MD5) Previous issue date: 2012 / Resumo: Ao longo da evolução, mutações globais podem alterar a ordem dos genes de um genoma. Tais mutações são chamadas de eventos de rearranjo. Em Rearranjo de Genomas, estimamos a distância evolutiva entre dois genomas calculando-se a distância de rearranjo entre eles, que é o tamanho da menor sequência de eventos de rearranjo que transforma um genoma no outro. Representando genomas como permutações, nas quais os genes aparecem como elementos, a distância de rearranjo pode ser obtida resolvendo-se o problema combinatório de ordenar uma permutação utilizando o menor número de eventos de rearranjo. Este problema, que é referido como Problema da Ordenação por Rearranjo, varia de acordo com os tipos de eventos de rearranjo considerados. Nesta dissertação, focamos nosso estudo em dois tipos de eventos: reversões e transposições. Variações do Problema da Ordenação por Rearranjo que consideram esses eventos têm se mostrado difíceis de serem resolvidas otimamente, por isso a maior parte dos algoritmos propostos - os quais denominamos genericamente por algoritmos de rearranjo de genomas - são aproximados e é esperado que os próximos avanços ocorram nesse sentido. Em razão disso, desenvolvemos uma ferramenta que avalia as respostas desses algoritmos. Para ilustrar sua aplicação, nós a utilizamos para avaliar as respostas de 16 algoritmos de rearranjo de genomas aproximados relativos a 6 variações do Problema da Ordenação por Rearranjo.
Além da ferramenta, este trabalho traz outras contribuições. Desenvolvemos um algoritmo exato para calcular distâncias de rearranjo que é mais eficiente em termos de uso de memória do que qualquer outro algoritmo que encontramos na literatura. Apresentamos conjecturas que dizem respeito à forma como as distâncias de rearranjo se distribuem. Validamos conjecturas referentes ao diâmetro, que é o maior valor alcançável pela distância de rearranjo entre uma permutação qualquer e a identidade considerando-se todas as permutações com o mesmo número de elementos. Apresentamos demonstrações formais para o fator de aproximação de alguns dos algoritmos avaliados. Por fim, mostramos que os fatores de aproximação de 7 dos 16 algoritmos avaliados não podem ser melhorados, o que contradiz algumas hipóteses levantadas na literatura, e conjecturamos que os fatores de aproximação de outros 6 algoritmos também não possam / Abstract: During evolution, global mutations may modify the gene order in a genome and such mutations are called rearrangement events. In Genome Rearrangements, we estimate the evolutionary distance between two genomes by computing the rearrangement distance between them, which is the length of the shortest sequence of rearrangement events that transforms one genome into the other. Representing genomes as permutations, in which genes appear as elements, the rearrangement distance can be obtained by solving the combinatorial problem of sorting a permutation using a minimum number of rearrangement events. This problem is referred to as Rearrangement Sorting Problem and varies accordingly to the types of rearrangement events considered. In this dissertation, we focus on two types of rearrangement events: reversals and transpositions. 
Variants of the Rearrangement Sorting Problem involving these events have been shown to be difficult to solve optimally. Therefore, most of the proposed algorithms (which we refer to generically as genome rearrangement algorithms) are approximation algorithms, and further advances are expected to follow this direction. For this reason, we developed a tool that evaluates the results of these algorithms. To illustrate its application, we used it to evaluate the results of 16 approximate genome rearrangement algorithms for 6 variants of the Rearrangement Sorting Problem. Besides this tool, we developed an exact algorithm for computing rearrangement distances that is more memory-efficient than any algorithm we have found in the literature. Additionally, we presented conjectures on how rearrangement distances are distributed and validated conjectures regarding the diameter, which is the greatest value that the rearrangement distance between a permutation and the identity can reach, considering all permutations with the same number of elements. Moreover, we presented formal proofs of the approximation ratio of some of the evaluated algorithms and showed that the approximation ratios of 7 of the 16 evaluated algorithms cannot be improved, which contradicts some hypotheses raised in the literature. Lastly, we conjectured that the approximation ratios of another 6 algorithms cannot be improved either / Mestrado / Ciência da Computação / Mestre em Ciência da Computação Biologia computacional Algoritmos de aproximação Teoria de computação Computational biology Approximation algorithms Computer theory
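The abstract above describes auditing approximate algorithms against exact rearrangement distances. A minimal sketch of that idea for one variant only (unsigned reversals) follows; it is not the thesis's tool or its memory-efficient algorithm. The function names and the naive `greedy_sort_length` heuristic are hypothetical, introduced purely for illustration, and the exhaustive search is practical only for small permutations.

```python
from collections import deque
from itertools import combinations


def reversal_distances(n):
    """Exact unsigned reversal distance from the identity for every
    permutation of 1..n, found by breadth-first search.

    Visits all n! permutations, so it is only feasible for small n.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        # A reversal flips the segment perm[i..j] with i < j.
        for i, j in combinations(range(n), 2):
            nxt = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return dist


def greedy_sort_length(perm):
    """Hypothetical heuristic: put 1, 2, ..., n in place one at a time,
    using at most one reversal per element. Returns the number used."""
    p = list(perm)
    steps = 0
    for k in range(len(p)):
        pos = p.index(k + 1)
        if pos != k:
            p[k:pos + 1] = p[k:pos + 1][::-1]
            steps += 1
    return steps


def audit(approx_length, n):
    """Worst observed ratio between an algorithm's answer and the exact
    distance, over all non-identity permutations of size n."""
    exact = reversal_distances(n)
    return max(approx_length(p) / d for p, d in exact.items() if d > 0)


if __name__ == "__main__":
    dist = reversal_distances(4)
    print("diameter for n = 4:", max(dist.values()))
    print("worst ratio of the greedy heuristic:", audit(greedy_sort_length, 4))
```

Because every reversal is its own inverse, the distance from the identity to a permutation equals the number of reversals needed to sort it, which is why a single search from the identity suffices.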
306	Experimentos em reconstrução de árvores filogenéticas com a operação de rearranjo de genomas single-cut-or-join / Experiments with phylogenetic tree reconstruction using the genome rearrangement operation Single-Cut-or-Join Biller, Priscila do Nascimento, 1988- 21 August 2018 Orientador: João Meidanis / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-21T02:04:53Z (GMT). No. of bitstreams: 1 Biller_PrisciladoNascimento_M.pdf: 16893114 bytes, checksum: 4346a38c95f6bb748d840f487e61890b (MD5) Previous issue date: 2012 / Resumo: Os rearranjos são eventos evolutivos que alteram de diferentes formas a ordem de grandes segmentos do genoma. Explicar a história evolutiva de um conjunto de espécies com rearranjos pode ser visto como um problema de otimização computacional, chamado de Problema de Rearranjo de Múltiplos Genomas. Este problema consiste em encontrar uma árvore que relaciona o conjunto de genomas recebido, minimizando a soma dos pesos das arestas, sendo o peso de uma aresta o número de rearranjos que explica a evolução entre os genomas dos vértices incidentes. A qualidade da inferência e a complexidade do problema dependem do modelo de rearranjo utilizado, que define formalmente como os genomas podem ser modificados. Recentemente, um novo modelo de rearranjo foi proposto, o Single-Cut-or-Join (SCJ), que traz como grande vantagem a simplificação de muitos problemas, que sob outros modelos são NP-difíceis. Apesar da teoria do SCJ ser bem construída, havia dúvidas sobre sua relevância biológica. Neste trabalho contribuímos com o entendimento deste modelo, realizando um extenso estudo que aplica o SCJ sob diferentes condições evolutivas, com dados reais e simulados, analisando dois aspectos da reconstrução evolucionária: a estrutura da árvore e o genoma (ordem dos genes) das espécies ancestrais.
Na primeira análise, descobrimos que o SCJ é capaz de recuperar entre 60% e 80% da estrutura da árvore. Em relação à segunda questão, dada a estrutura da árvore, a reconstrução dos genomas ancestrais varia conforme a distância da espécie ancestral para as espécies conhecidas. No caso de espécies ancestrais mais próximas às folhas, cerca de 85% da ordem dos genes foi coberta enquanto, em espécies mais distantes, aproximadamente 50% da ordem dos genes foi coberta, usando conjuntos de genomas de 64 espécies. Em relação ao tempo, os métodos, que implementamos em Java, podem encontrar a topologia de 64 genomas com 2000 genes cada em cerca de 10,7 minutos e reconstruir seus genomas ancestrais em 0,05 minutos, ambos em um computador desktop padrão / Abstract: Rearrangements are evolutionary events that modify in different ways the order of large segments in genomes. Explaining the evolutionary history of a set of species with rearrangements can be seen as a computational optimization problem, called the Multiple Genome Rearrangement Problem. This problem consists of finding a tree that relates the given set of genomes while minimizing the sum of edge weights, where the weight of an edge is the number of rearrangements that explains the evolution between the genomes of its incident vertices. The quality of the inference and the complexity of the problem depend on the rearrangement model used, which formally defines how the genomes can be modified. Recently, a new rearrangement model was proposed, Single-Cut-or-Join (SCJ), whose major advantage is that it simplifies many problems that are NP-hard under other models. Although the SCJ theory is well constructed, there were doubts about its biological relevance.
In this work we contribute to the understanding of this model by performing an extensive study that applies SCJ under different evolutionary conditions, with real and simulated data, analyzing two aspects of evolutionary reconstruction: the tree structure and the genome (gene order) of the ancestral species. In the first analysis, we found that SCJ can recover between 60% and 80% of the tree structure. Regarding the second question, given a tree structure, the reconstruction of ancestral genomes varies according to the distance from the ancestral species to the known species. For ancestral species close to the leaves, about 85% of the gene order is recovered, while for more distant species about 50% of the gene order is recovered, using genome sets of 64 species. As far as time is concerned, the methods, which we implemented in Java, can find a topology for 64 genomes with 2000 genes each in about 10.7 minutes and reconstruct the ancestral genomes in about 0.05 minutes, both on a typical desktop computer / Mestrado / Ciência da Computação / Mestre em Ciência da Computação Biologia computacional Bioinformática Filogenia Otimização combinatória Algoritmos Computational biology Bioinformatics Phylogeny Combinatorial optimization Algorithms
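To illustrate why SCJ simplifies many problems, here is a minimal sketch of the pairwise SCJ distance. It relies on the standard result for this model, which the abstract does not state: for two genomes with the same gene content, the distance is the number of adjacencies present in one genome but not in the other. The representation (lists of linear chromosomes of signed integers) and the function names are illustrative assumptions, not taken from the dissertation's Java implementation.

```python
def adjacencies(genome):
    """Set of adjacencies of a genome given as a list of linear chromosomes.

    Each chromosome is a list of signed integers (a gene and its
    orientation). Gene g has a tail (g, 't') and a head (g, 'h'); a
    positive gene is read tail-to-head, a negative gene head-to-tail.
    """
    adj = set()
    for chrom in genome:
        for left, right in zip(chrom, chrom[1:]):
            # Extremity at the right end of the left gene.
            a = (abs(left), 'h' if left > 0 else 't')
            # Extremity at the left end of the right gene.
            b = (abs(right), 't' if right > 0 else 'h')
            adj.add(frozenset((a, b)))
    return adj


def scj_distance(genome_a, genome_b):
    """Cuts needed to remove adjacencies unique to genome_a, plus joins
    needed to create adjacencies unique to genome_b."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    return len(adj_a - adj_b) + len(adj_b - adj_a)


if __name__ == "__main__":
    # Reversing gene 2 breaks two adjacencies and creates two new ones.
    print(scj_distance([[1, 2, 3]], [[1, -2, 3]]))  # 4
    # Splitting one chromosome in two costs a single cut.
    print(scj_distance([[1, 2, 3]], [[1, 2], [3]]))  # 1
```

The whole computation is a set difference, so it runs in time linear in the number of genes, in contrast to the harder distance computations under several other rearrangement models.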
307	Identifying gene regulatory interactions using functional genomics data Johansson, Annelie January 2014 Previous studies used correlation of DNase I hypersensitive sites sequencing (DNase-seq) experiments to predict interactions between enhancers and their target promoters. We investigate two correlation measures, Pearson’s correlation and mutual information, using DNase-seq data for 100 cell types in regions on chromosome 1. To assess performance, we compared our correlation scores to Hi-C data from Jin et al. 2013. We show that performance is low when compared with the Hi-C data, and that improved correlation metrics are needed. We also demonstrate that the use of Hi-C data as a gold standard is limited because of its low resolution, and we suggest using another gold standard in further studies. bioinformatics gene regulation promoter enhancer correlation Hi-C data DNase-seq Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
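The two measures named in the abstract above can be sketched as follows for a single enhancer-promoter pair, where each vector holds one accessibility signal per cell type. This is a minimal illustration under stated assumptions: the equal-width binning used to estimate mutual information, the function names, and the example values are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors.
    Undefined (division by zero) if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def discretize(values, bins):
    """Map each value to an equal-width bin index in 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant vector
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in estimate of mutual information (in bits) after binning."""
    n = len(x)
    bx, by = discretize(x, bins), discretize(y, bins)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


if __name__ == "__main__":
    # Made-up signals across six cell types, for illustration only.
    enhancer = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]
    promoter = [0.2, 1.1, 0.9, 0.1, 0.8, 0.3]
    print("Pearson:", pearson(enhancer, promoter))
    print("MI (bits):", mutual_information(enhancer, promoter, bins=2))
```

Pearson's correlation captures only linear association, whereas mutual information can also detect non-linear dependence, at the cost of depending on the choice of bins.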
308	Haplotype Inference as a case of Maximum Satisfiability: A strategy for identifying multi-individual inversion points in computational phasing Bergman, Ebba January 2017 Phasing genotypes from sequence data is an important step between data gathering and downstream analysis in population genetics, disease studies, and multiple other fields. This determination of the sequences of markers corresponding to the individual chromosomes can be done on data where the markers are in low density across the chromosome, such as from single nucleotide polymorphism (SNP) microarrays, or on data with a higher local density of markers like in next generation sequencing (NGS). The sorted markers may then be used for many different analyses and data processing such as linkage analysis, or inference of missing genotypes in the process of imputation. cnF2freq is a haplotype phasing program that uses an uncommon approach allowing it to divide big groups of related individuals into smaller ones. It sets an initial haplotype phase and then iteratively changes it using estimations from Hidden Markov Models. If a marker is judged to have been placed in the wrong haplotype, a switch needs to be made so that it belongs to the correct phase. The objective of this project was to go from allowing only one individual within a group to be switched in an iteration to allowing multiple switches that are dependent on each other. The result of this project is a theoretical solution for allowing multiple dependent switches in cnF2freq, and an implemented solution using the max-SAT solver toulbar2. Haplotypes Maximum Satisfiability Weighted Maximum Satisfiability Logic Bioinformatics cnF2freq chaplink Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
309	Local adaptation of Grauer's gorilla gut microbiome Bebris, Kristaps January 2017 The availability of high-throughput sequencing technologies has enabled metagenomic investigations into complex bacterial communities with unprecedented resolution and throughput. The production of dedicated data sets for metagenomic analyses is, however, a costly process and, frequently, the first research questions focus on the study species itself. If the source material is represented by fecal samples, target capture of host-specific sequences is applied to enrich the complex DNA mixtures contained within a typical fecal DNA extract. Yet, even after this enrichment, the samples still contain a large amount of environmental DNA that is usually left unanalysed. In my study I investigate the possibility of using shotgun sequencing data that has been subjected to target enrichment for mtDNA from the host species, Grauer’s gorilla (Gorilla beringei graueri), for further analysis of the microbial community present in these samples. The purpose of these analyses is to study the differences in the bacterial communities present within a high-altitude Grauer’s gorilla, low-altitude Grauer’s gorilla, and a sympatric chimpanzee population. Additionally, I explore the adaptive potential of the gut microbiota within these great ape populations. I evaluated the impact that the enrichment process had on the microbial community by using pre- and post-capture museum-preserved samples. In addition to this, I also analysed the effect of two different extraction methods on the bacterial communities. My results show that the relative abundances of the bacterial taxa remain relatively unaffected by the enrichment process and the extraction methods. The overall number of taxa is, however, reduced by each additional capture round and is not consistent between the extraction methods. 
This means that both the enrichment and extraction processes introduce biases that require the usage of abundance-based distance measures for biological inferences. Additionally, even if the data cannot be used to study the bacterial communities in an unbiased manner, it provides useful comparative insights for samples that were treated in the same fashion. With this background, I used museum and fecal samples to perform cluster analysis to explore the relationships between the gut microbiota of the three great ape populations. I found that populations cluster by species first, and only then group according to habitat. I further found that a bacterial taxon that degrades plant matter is enriched in the gut microbiota of all three great ape populations, where it could help with the digestion of vegetative foods. Another bacterial taxon that consumes glucose is enriched in the gut microbiota of the low-altitude gorilla and chimpanzee populations, where it could help with the modulation of the host’s mucosal immune system, and could point to the availability of fruit in the animals' diet. In addition, I found a bacterial taxon that is linked with diarrhea in humans to be part of the gut microbiota of the habituated high-altitude gorilla population, which could indicate that this pathogen has been transmitted to the gorillas from their interaction with humans, or it could be indicative of the presence of a contaminated water source. Metagenomics Grauer's gorilla gut microbiome MALT Megan capture Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
310	Diffusion in fractal globules / På spaning efter onormal diffusion av biomolekyler i DNA med hjälp av stokastisk simulering Hariz, Jakob January 2016 Recent experiments suggest that the human genome (all of our DNA) is organised as a so-called fractal globule. The fractal globule is a knot-free dense polymer that easily folds and unfolds any genomic locus, for example a group of nearby genes. Proteins often need to locate specific target sites on the DNA, for instance to activate a gene. To understand how proteins move through the DNA polymer, we simulate diffusion of particles through a fractal globule. The fractal globule was generated on a cubic lattice as spheres connected by cylinders. With the structure in place, we simulate particle diffusion and measure how the particles' mean squared displacement ($\langle R^2(t)\rangle$) grows as a function of time $t$ for different particle radii. This quantity allows us to better understand how the three-dimensional structure of DNA affects the protein's motion. From our simulations we found that $\langle R^2(t)\rangle/t$ is a decaying function when the particle is sufficiently large. This means that the particles diffuse more slowly than if they were free. Assuming that $\langle R^2(t) \rangle \propto t^\alpha$ for long times, we calculated the growth exponent $\alpha$ as a function of particle radius $r_p$. When $r_p$ is small compared to the average distance $d$ between two polymer segments, we find that $\alpha \approx 1$. This means the polymer network does not affect the particle's motion. However, in the opposite limit $r_p\sim d$ we find that $\alpha<1$, which means that the polymer strongly slows down the particle's motion. This behaviour is indicative of sub-diffusive dynamics and has potentially far-reaching consequences for target-finding processes and biochemical reactions in the cell. Fractal Globule Anomalous Diffusion Diffusion DNA Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
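One common way to obtain the growth exponent $\alpha$ from a measured $\langle R^2(t)\rangle$ curve is a least-squares fit of a straight line in log-log space, since $\langle R^2(t)\rangle \propto t^\alpha$ implies $\log\langle R^2\rangle = \alpha \log t + \text{const}$. The sketch below shows that fit on synthetic data; it is an assumed procedure for illustration, and the function name is hypothetical rather than taken from the thesis.

```python
import math


def growth_exponent(times, msd):
    """Slope of log(msd) versus log(t), i.e. an estimate of alpha.

    Assumption: all times and msd values are strictly positive and the
    power law holds over the whole range passed in.
    """
    log_t = [math.log(t) for t in times]
    log_r2 = [math.log(m) for m in msd]
    n = len(log_t)
    mean_t, mean_r2 = sum(log_t) / n, sum(log_r2) / n
    sxx = sum((a - mean_t) ** 2 for a in log_t)
    sxy = sum((a - mean_t) * (b - mean_r2) for a, b in zip(log_t, log_r2))
    return sxy / sxx


if __name__ == "__main__":
    times = list(range(1, 101))
    free = [6.0 * t for t in times]              # synthetic, alpha = 1
    hindered = [2.0 * t ** 0.7 for t in times]   # synthetic, alpha = 0.7
    print(growth_exponent(times, free), growth_exponent(times, hindered))
```

On real simulation output the fit would normally be restricted to long times, where the power-law assumption stated in the abstract is expected to hold.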
