21 |
Efficient Algorithms for Comparing, Storing, and Sharing Large Collections of Phylogenetic Trees. Matthews, Suzanne. May 2012.
Evolutionary relationships among a group of organisms are commonly summarized in a phylogenetic (or evolutionary) tree. The goal of phylogenetic inference is to find the tree structure that best represents the relationships among a group of organisms, given a set of observations (e.g. molecular sequences). However, popular heuristics for inferring phylogenies output tens to hundreds of thousands of equally weighted candidate trees. Biologists summarize these trees into a single structure called the consensus tree. The central assumption is that the information discarded has less value than the information retained. But what if this assumption is not true?
In this dissertation, we demonstrate the value of retaining and studying tree collections. We also conduct an extensive literature search that highlights the rapid growth in the number of trees produced by phylogenetic analyses. High-performance algorithms are therefore needed to keep pace with this growing volume of data. We created several efficient algorithms that allow biologists to easily compare, store, and share collections of tens to hundreds of thousands of phylogenetic trees. Universal hashing is central to all of these approaches, allowing us to quickly identify the evolutionary relationships shared across a tree collection. Our algorithms MrsRF and Phlash are the fastest in the field for comparing large collections of trees. Our algorithm TreeZip is the most efficient way to store large tree collections. Lastly, we developed Noria, a novel version control system that allows biologists to seamlessly manage and share their phylogenetic analyses.
Our work has far-reaching implications for both the biological and computer science communities. We tested our algorithms on four large biological datasets, each consisting of 20,000 to 150,000 trees on 150 to 525 taxa. Our experimental results on these datasets indicate the long-term applicability of our algorithms to modern phylogenetic analysis and underscore their ability to help scientists easily exchange and analyze their large tree collections. In addition to contributing to the reproducibility of phylogenetic analysis, our work enables the creation of test beds for improving phylogenetic heuristics and applications. Lastly, our data structures and algorithms can be applied to managing other tree-like data (e.g. XML).
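The abstract says universal hashing lets these tools quickly identify the relationships (bipartitions) shared across a tree collection. The sketch below shows the underlying idea in Python under simplifying assumptions: trees are given as nested tuples rather than Newick strings, and Python's built-in hashing of `frozenset` stands in for the universal hash functions the dissertation uses, so this is not MrsRF, Phlash, or TreeZip. Each tree is reduced to its set of non-trivial splits, one hash table counts how many trees contain each split, and the Robinson-Foulds (RF) distance is the number of splits found in only one of two trees.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set, Union

# A tree is either a leaf name or a tuple of subtrees, e.g. ((("A", "B"), "C"), ("D", "E")).
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> Set[str]:
    """Collect the taxon names at the leaves of `tree`."""
    if isinstance(tree, str):
        return {tree}
    taxa: Set[str] = set()
    for child in tree:
        taxa |= leaf_set(child)
    return taxa


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions (splits) of `tree`, read as unrooted.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so the same split in two different trees maps to the same hashable key.
    """
    taxa = leaf_set(tree)
    ref, n = min(taxa), len(taxa)
    splits: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> Set[str]:
        if isinstance(node, str):
            return {node}
        below: Set[str] = set()
        for child in node:
            below |= walk(child)
        side = taxa - below if ref in below else below
        if 2 <= len(side) <= n - 2:  # skip trivial splits (one leaf vs. the rest, or the root)
            splits.add(frozenset(side))
        return below

    walk(tree)
    return splits


def split_frequencies(trees: Iterable[Tree]) -> Counter:
    """Hash every split of every tree into one table: split -> number of trees containing it."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(bipartitions(tree))
    return counts


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson-Foulds distance: number of splits found in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))
```

For example, `((("A", "B"), "C"), ("D", "E"))` and `((("A", "C"), "B"), ("D", "E"))` share the split ABC|DE but differ on AB|CDE versus AC|BDE, so `rf_distance` returns 2. Rooting the first tree differently, as `("A", ("B", ("C", ("D", "E"))))`, gives the same splits and an RF distance of 0.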
|
22 |
Regional evolutionary distinctiveness and endangerment as a means of prioritizing protection of endangered species. Brantner, Emily K. 12 November 2015.
Conservation is costly, and choices must be made about where best to allocate limited resources. I propose a regional evolutionary diversity and endangerment (RED-E) approach to prioritizing endangered species. It builds on the Evolutionarily Distinct and Globally Endangered (EDGE) approach but allows conservation agencies to focus their efforts on species in specific regions. As a practical application, I used the RED-E approach to prioritize mammal and bird species listed under the U.S. Endangered Species Act (ESA) and to rank species that lack ESA critical habitat (CH). Regional conservation approaches differ significantly from global approaches. The RED-E approach places high weight on a species' level of endangerment, but it also allows very distinct species to rise in priority on the RED-E list. Using the CH RED-E list, the U.S. government could begin focusing resources on endangered and genetically diverse species.
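The abstract does not give the RED-E formula, so the sketch below shows only the EDGE score it builds on, in its commonly published form: EDGE = ln(1 + ED) + GE × ln(2), where GE codes the IUCN Red List category from 0 (Least Concern) to 4 (Critically Endangered), and ED is computed here with the "fair proportion" rule, which shares each branch length equally among the species below it. The dictionary tree format and function names are hypothetical. A regional variant would presumably swap the global GE term for a regional endangerment status; that is an assumption, not the thesis's definition.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical rooted tree: internal node -> list of (child, branch length); leaves have no entry.
Tree = Dict[str, List[Tuple[str, float]]]


def leaves_below(tree: Tree, node: str) -> List[str]:
    """Return the leaves (species) descending from `node`."""
    if node not in tree:
        return [node]
    out: List[str] = []
    for child, _ in tree[node]:
        out.extend(leaves_below(tree, child))
    return out


def fair_proportion_ed(tree: Tree, root: str) -> Dict[str, float]:
    """Evolutionary distinctiveness: each branch length is split equally among the species below it."""
    ed = {leaf: 0.0 for leaf in leaves_below(tree, root)}

    def walk(node: str) -> None:
        for child, length in tree.get(node, []):
            below = leaves_below(tree, child)
            for leaf in below:
                ed[leaf] += length / len(below)
            walk(child)

    walk(root)
    return ed


def edge_score(ed: float, ge: int) -> float:
    """EDGE = ln(1 + ED) + GE * ln(2), GE coded 0 (Least Concern) to 4 (Critically Endangered)."""
    return math.log(1 + ed) + ge * math.log(2)
```

With `{"root": [("X", 2.0), ("c", 3.0)], "X": [("a", 1.0), ("b", 1.0)]}`, species a and b each receive half of the shared branch X (ED = 2.0), while the lone lineage c keeps its whole branch (ED = 3.0), which is how distinct species climb a ranking.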
|
23 |
Statistické vyhodnocení fylogeneze biologických sekvencí / Statistic evaluation of phylogeny of biological sequences. Zembol, Filip. January 2013.
The topic of my diploma thesis is the statistical evaluation of phylogenies inferred from biological sequences. The theoretical part reviews the literature on methods for estimating phylogeny from the similarity of biological sequences (DNA and proteins), focusing on the inaccuracies of these estimates, their causes, and the possibilities for eliminating them. It then compares methods for statistically evaluating the correctness of an inferred phylogeny. In the practical part, we propose algorithms for testing the correctness of phylogenetic trees based on bootstrapping, jackknifing, OTU jackknifing, and the PTP test; these algorithms build phylogenetic trees with the neighbor-joining method from biological sequences in FASTA format. The distance model and the substitution matrix can be changed. Before these algorithms can be used for the statistical support of phylogenetic trees, we have to verify that they work correctly; this verification is performed on theoretical amino acid sequences. To further check the algorithms, we carry out the individual statistical tests on 10 real sequences of mammalian ubiquitin. The results are analysed and discussed.
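Bootstrapping, jackknifing, and OTU jackknifing all work by perturbing the input alignment and re-inferring a tree from each replicate. A minimal Python sketch of the three resampling steps follows, assuming the alignment is a list of equal-length strings. The function names and the delete-half default for the column jackknife are illustrative choices, not the thesis's implementation, and the neighbor-joining step and the PTP test are not shown.

```python
import random
from typing import List, Tuple


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Nonparametric bootstrap: resample alignment columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def jackknife_alignment(alignment: List[str], rng: random.Random, fraction: float = 0.5) -> List[str]:
    """Column jackknife: delete a random `fraction` of columns without replacement, keeping the rest in order."""
    length = len(alignment[0])
    keep = sorted(rng.sample(range(length), length - int(length * fraction)))
    return ["".join(seq[c] for c in keep) for seq in alignment]


def otu_jackknife(names: List[str], alignment: List[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """OTU jackknife: drop one randomly chosen sequence (taxon) from the data set."""
    drop = rng.randrange(len(names))
    kept = [i for i in range(len(names)) if i != drop]
    return [names[i] for i in kept], [alignment[i] for i in kept]
```

Each replicate would then be passed to neighbor joining, and the support for a clade is the fraction of replicate trees that contain it. Passing a seeded `random.Random` keeps the replicates reproducible.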
|
24 |
Aplikace pro zpracování dat z oblasti evoluční biologie / Application for the Data Processing in the Area of Evolutionary Biology. Vogel, Ivan. January 2011.
Phylogenetic tree inference is a very common method for visualising evolutionary relationships among species. This work explains the mathematical theory behind molecular phylogenetics and designs a modified algorithm for phylogenetic tree inference based on intra-group analysis of nucleotide and amino acid sequences. It further describes the object-oriented design and Python implementation of the proposed methods, as well as their integration into a bioinformatics portal. The proposed modified algorithms give better results than standard methods, especially in clustering predefined groups. Finally, future work and applications of the proposed methods to other fields of information technology are discussed.
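Distance-based inference starts from a matrix of pairwise evolutionary distances. The abstract does not describe the modified algorithm itself, so the Python sketch below (Python being the language the thesis used) only shows a standard first step: the Jukes-Cantor (JC69) correction d = -3/4 ln(1 - 4p/3) applied to the proportion p of differing sites between aligned nucleotide sequences. Skipping gap positions is an assumption of this sketch; amino acid sequences would need a different correction.

```python
import math
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap positions."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(a: str, b: str) -> float:
    """JC69 distance d = -3/4 * ln(1 - 4p/3); infinite once p reaches saturation (p >= 3/4)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise JC69 distances, the input to methods such as neighbor joining."""
    return [[jukes_cantor(s, t) for t in seqs] for s in seqs]
```

One mismatch in four sites gives p = 0.25 but a corrected distance of about 0.304, because the correction accounts for multiple substitutions at the same site.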
|
25 |
Ancestral Reconstruction and Investigations of Genomics Recombination on Chloroplasts Genomes / Reconstruction ancestrale et investigation de recombinaison génomique sur chloroplastes génomes. Al-Nuaimi, Bashar. 13 October 2017.
Modern biology rests on the theory of evolution: every new species emerges from an existing species. As a result, different species share a common ancestry, as represented in phylogenetic classification. Common ancestry can explain the similarities among all living organisms, such as their general chemistry, cell structure, DNA as genetic material, and the genetic code. Individuals of one species share the same genes but (usually) different alleles of these genes, and an individual inherits its alleles from its ancestors, i.e., its parents. The goal of phylogenetic studies is to analyze the changes that occur in different organisms during evolution by identifying the relationships between genomic sequences and determining ancestral sequences and their descendants. A phylogenetic study can also estimate the time of divergence between groups of organisms that share a common ancestor. Phylogenetic trees are useful in fields of biology such as bioinformatics, for systematic and comparative phylogenetics. An evolutionary or phylogenetic tree is a branching diagram showing the evolutionary relationships among biological organisms or other entities, based on the differences and similarities in their genetic characteristics. Phylogenetic trees are built from molecular data such as DNA and protein sequences. In a phylogenetic tree, the nodes represent genomic sequences and are called taxonomic units, and each branch connects two adjacent nodes. Similar sequences are neighbors on the outer branches, and a common internal branch links them to a common ancestor. Internal nodes, which represent inferred ancestors, are called hypothetical taxonomic units. Thus, taxonomic units grouped together in the tree are implied to descend from a common ancestor. The research in this dissertation focuses on developing appropriate evolutionary models and robust algorithms to solve phylogenetic and ancestral inference problems on gene-order and DNA data in whole-genome evolution, together with their applications.
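Ancestral reconstruction assigns states to the internal nodes (the hypothetical taxonomic units) of a tree. As a textbook illustration, not the dissertation's method, which also handles gene-order data, the sketch below applies Fitch's small-parsimony algorithm to a single DNA site on a binary tree given as nested tuples: a bottom-up pass computes candidate state sets and the minimum number of changes, and a top-down pass picks one optimal ancestral state per node. Addressing nodes by their path from the root is a choice of this sketch.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A binary tree: a leaf name or a pair of subtrees, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, tuple]
Path = Tuple[int, ...]  # a node's address: child indices from the root; () is the root


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[int, Dict[Path, str]]:
    """Fitch small parsimony for one character: (minimum number of changes, one optimal labeling)."""
    candidates: Dict[Path, Set[str]] = {}
    cost = 0

    def up(node: Tree, path: Path) -> Set[str]:
        # Bottom-up: intersect the children's sets; if empty, take the union and count one change.
        nonlocal cost
        if isinstance(node, str):
            s = {states[node]}
        else:
            left = up(node[0], path + (0,))
            right = up(node[1], path + (1,))
            s = left & right
            if not s:
                s = left | right
                cost += 1
        candidates[path] = s
        return s

    labels: Dict[Path, str] = {}

    def down(node: Tree, path: Path, parent: Optional[str]) -> None:
        # Top-down: keep the parent's state when allowed, else pick a candidate (min() for determinism).
        s = candidates[path]
        labels[path] = parent if parent in s else min(s)
        if not isinstance(node, str):
            for i, child in enumerate(node):
                down(child, path + (i,), labels[path])

    up(tree, ())
    down(tree, (), None)
    return cost, labels
```

For `(("A", "B"), ("C", "D"))` with leaf states G, G, T, G, the parsimony score is 1, the root and both internal nodes are labeled G, and the single change falls on the branch leading to C.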
|
26 |
Classificação de tecidos da mama em massa e não-massa usando índice de diversidade taxonômico e máquina de vetores de suporte / Classification of breast tissues in mass and non-mass using index of Taxonomic diversity and support vector machine. OLIVEIRA, Fernando Soares Sérvulo de. 20 February 2013.
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Breast cancer is the second most common type of cancer in the world, and it is difficult to diagnose. Computer-Aided Detection and Diagnosis systems have been used to assist health professionals by indicating suspicious areas that are difficult for the human eye to perceive, thus aiding the detection and diagnosis of cancer. This dissertation proposes a methodology for discriminating and classifying regions extracted from the breast into mass and non-mass. Mammograms are taken from the Digital Database for Screening Mammography (DDSM), and mass and non-mass regions are extracted from them. The texture of each region of interest is described with the Taxonomic Diversity Index (∆) and Taxonomic Distinctness (∆*), indices that come from ecology. Both indices are computed from phylogenetic trees; in this work they are applied to describe patterns in breast image regions using two region-bounding approaches for texture analysis: a circle with rings, and internal and external masks. A Support Vector Machine (SVM) then classifies each region as mass or non-mass. The proposed methodology gives promising results for classifying masses and non-masses, reaching an average accuracy of 99.67%.
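The two indices can be written as Δ = Σ_{i<j} ω_ij x_i x_j / (n(n − 1)/2) and Δ* = Σ_{i<j} ω_ij x_i x_j / Σ_{i<j} x_i x_j, where x_i is the abundance of species i, n is the total number of individuals, and ω_ij is the taxonomic (path) distance between species i and j in the tree. In the image setting sketched below, each gray level is assumed to be a "species" and each pixel an "individual". The distance function passed as `omega` is a hypothetical stand-in (the example uses |i − j|), not the phylogenetic tree the dissertation builds over the region.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def gray_level_counts(region: List[List[int]]) -> Dict[int, int]:
    """Treat each gray level as a 'species' and each pixel as an 'individual'."""
    return dict(Counter(v for row in region for v in row))


def taxonomic_indices(abundance: Dict[int, int], omega: Callable[[int, int], float]) -> Tuple[float, float]:
    """Return (Delta, Delta*):
    Delta  = sum_{i<j} w_ij x_i x_j / (n(n-1)/2)
    Delta* = sum_{i<j} w_ij x_i x_j / sum_{i<j} x_i x_j
    """
    n = sum(abundance.values())
    weighted = 0.0
    plain = 0
    for i, j in combinations(sorted(abundance), 2):
        xx = abundance[i] * abundance[j]
        weighted += omega(i, j) * xx
        plain += xx
    if plain == 0:  # a single gray level: no distinct pairs, so both indices are taken as 0
        return 0.0, 0.0
    return weighted / (n * (n - 1) / 2), weighted / plain
```

For the 2 × 2 region `[[0, 0], [1, 3]]` with ω = |i − j|, the gray-level counts are {0: 2, 1: 1, 3: 1}, giving Δ = 10/6 and Δ* = 2.0. The two values per region would then form part of the feature vector passed to the SVM.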
|
27 |
Fauna stenica (Heteroptera) različitih ekosistema i molekularne karakteristike važnijih vrsta / Faunistic research of true bugs (Heteroptera) in different ecosystems and molecular analysis of certain species. Konjević, Aleksandra. 01 July 2015.
Faunistic research on true bugs (Heteroptera) in Vojvodina over the last several decades has focused mainly on the most important pest species in wheat, alfalfa, and soybean. Very few data exist on beneficial species or on species of little importance to these crops. The main aim of this work was therefore to investigate the whole true bug fauna of different ecosystems, including wheat and alfalfa crops as well as ruderal plants and field shelterbelts in and around the cultivated fields. The true bug fauna of spontaneous flora at higher-altitude localities, Fruška gora mountain and Divčibare, which serve as overwintering sites for certain species, was investigated as well. The list of registered species is a significant contribution to knowledge of the true bug fauna of Vojvodina and Serbia.
True bugs were sampled with a sweep net and by hand at more than 48 localities across the Bačka region (Vojvodina), as well as on Fruška gora mountain and at Divčibare. Specimens were identified by morphology using appropriate identification keys. In total, 59 species belonging to 14 terrestrial families were recorded. Most species were found on spontaneous flora (42 in total), followed by alfalfa fields (26 species) and wheat (only 17 species). Most species were phytophagous, mainly oligophagous, and only six were predaceous. The presence of zoophagous species is important as an indicator of the biological balance that persists in these environments despite human activity.
Molecular analysis of eight true bug species was performed as an additional identification method. Species were chosen for their importance in wheat fields (mainly grain bugs of the families Scutelleridae and Pentatomidae) and for their presence in every sampled ecosystem (three highly polyphagous species). The mitochondrial cytochrome c oxidase subunit I (COI) barcode fragment was analyzed and a phylogenetic tree was constructed. This is a preliminary molecular-level survey of true bugs in Vojvodina.
The list of recorded true bug species, one of the results of this work, contributes to the species lists for wheat and alfalfa, covering not only pest species but also beneficial and neutral ones. Knowing which true bugs inhabit the spontaneous plants around the fields matters for cultivated crops, given the bugs' proximity and their ability to live and shelter on different plants. The molecular analysis revealed similarities among some species and genera and at the same time confirmed the reliability of morphological characters for identifying true bugs. The most important characteristics of the recorded species are given as a contribution to the study of true bugs in Vojvodina and Serbia.
|
28 |
Recoloração convexa de grafos: algoritmos e poliedros / Convex recoloring of graphs: algorithms and polyhedra. Moura, Phablo Fernando Soares. 07 August 2013.
In this work we study the convex recoloring problem on graphs, denoted CR. A vertex coloring of a graph G is convex if, for each assigned color d, the vertices of G with color d induce a connected subgraph. In the CR problem, given a graph G and a coloring of its vertices, we want to find a convex recoloring that minimizes the number of recolored vertices. The motivation for this problem comes from the study of phylogenetic trees. The problem is known to be NP-hard even when G is a path. We show that CR parameterized by the number of color changes is W[2]-hard even if the initial coloring uses only two colors, and we prove some inapproximability results for the problem. We present an integer linear programming (ILP) formulation for the weighted version of the problem on arbitrary graphs and then specialize it to trees. We study the facial structure of the polytope defined as the convex hull of the integer points satisfying the constraints of the proposed ILP formulation, present several classes of facet-defining inequalities, and describe the corresponding separation algorithms. We also implemented a branch-and-cut algorithm for the special case of trees and report computational results on a large number of instances that are real phylogenetic trees. The experiments show that this approach can solve instances with up to 1500 vertices in 40 minutes, a performance considerably better than that of other algorithms proposed in the literature.
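To make the definition concrete, the Python sketch below checks whether a coloring is convex by running a breadth-first search inside each color class, and finds an optimal recoloring by exhaustive search. Because CR is NP-hard, this brute force only works for a handful of vertices, which is why the thesis turns to an ILP formulation and branch-and-cut. Trying only the colors already present is a simplifying assumption of the sketch, and the function names are illustrative.

```python
from collections import deque
from itertools import product
from typing import Dict, Hashable, List, Optional, Tuple

Graph = Dict[Hashable, List[Hashable]]  # adjacency lists
Coloring = Dict[Hashable, str]


def is_convex(graph: Graph, coloring: Coloring) -> bool:
    """True if every color class induces a connected subgraph (checked by BFS inside each class)."""
    classes: Dict[str, set] = {}
    for v, c in coloring.items():
        classes.setdefault(c, set()).add(v)
    for members in classes.values():
        start = next(iter(members))
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False
    return True


def min_convex_recoloring(graph: Graph, coloring: Coloring) -> Optional[Tuple[int, Coloring]]:
    """Exhaustive search for a convex recoloring with the fewest changed vertices.

    Only colors already present are tried; the search is exponential, so use tiny instances only.
    """
    vertices = list(graph)
    colors = sorted(set(coloring.values()))
    best: Optional[Tuple[int, Coloring]] = None
    for assignment in product(colors, repeat=len(vertices)):
        candidate = dict(zip(vertices, assignment))
        changes = sum(candidate[v] != coloring[v] for v in vertices)
        if best is not None and changes >= best[0]:
            continue  # cannot improve on the best solution found so far
        if is_convex(graph, candidate):
            best = (changes, candidate)
    return best
```

On the path 1-2-3-4 colored a, b, a, b, color a is split, so the coloring is not convex; recoloring a single vertex (for example vertex 3 to b) is enough.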
|
29 |
Stochastic Tree Models for Macroevolution. Keller-Schmidt, Stephanie. 24 September 2012.
Phylogenetic trees capture the relationships between species and can be inferred from morphological and/or molecular data. Macroevolution concerns the large-scale history of life, with evolutionary changes affecting anything from a single species to an entire clade, leading to the enormous diversity of species observed today. One major problem in biology is explaining this biodiversity. One may therefore ask which macroevolutionary processes have given rise to the observed tree shapes or patterns of species distribution, that is, to the order and timing of branching events. With an increasing number of known species in phylogenetic studies, testing hypotheses about evolution by analyzing the shapes of the resulting phylogenetic trees has become a matter of particular interest, and using reconstructed phylogenies to study evolutionary processes has attracted increasing attention over the last decades. Many paleontologists (Raup et al., 1973; Gould et al., 1977; Gilinsky and Good, 1989; Nee, 2004) tried to describe such patterns of macroevolution with models of growing trees, which describe stochastic processes that generate phylogenetic trees. Yule (1925) was the first to introduce such a model, the Equal Rate Markov (ERM) model, in the context of biological branching, based on a continuous-time branching process. In the last decades, further dynamical models were proposed (Yule, 1925; Aldous, 1996; Nee, 2006; Rosen, 1978; Ford, 2005; Hernández-García et al., 2010) to investigate tree shapes and thereby capture the rules of macroevolutionary forces. A common model is Aldous' Branching (AB) model, which is known for generating trees whose structure resembles that of "real" trees. To infer these macroevolutionary forces, estimated trees are analyzed and compared to trees simulated by the models.
Recent models have some drawbacks, such as a lack of biological motivation, or generated tree shapes that fit poorly to those observed in empirical trees.
The central aim of this thesis is the development and study of new biologically motivated approaches which might help to better understand or even discover biological forces which lead to the huge diversity of organisms.
The first approach, called the age model, is a stochastic procedure that grows binary trees by iteratively attaching leaves at random, similar to the ERM model. Unlike the ERM model, however, the branching rate of each clade is not constant but decreases with time, i.e., with age. Thus, species involved in recent speciation events tend to speciate again. The second model, called the innovation model, is a branching process that mimics the evolution of species driven by innovations. The process involves a separation of time scales: rare innovation events trigger rapid cascades of diversification in which a feature combines with previously existing features. Three data sets of estimated phylogenetic trees are used to analyze and compare the tree shapes produced by the new growth models. Tree shape statistics based on a variety of imbalance measures are computed. The results show that trees simulated by both growth models fit the tree shapes observed in real trees well. In a further study, a likelihood analysis ranks the models by their ability to explain observed tree shapes. The results show that the likelihoods of the age model and the AB model are clearly correlated for the trees in the databases when small and medium-sized trees with up to 19 leaves are considered. For a data set of phylogenetic trees of protein families, the age model outperforms the AB model, but for another data set, of phylogenetic trees of species, the AB model performs slightly better. Supporting this observation requires a further analysis using larger trees, but computing likelihoods exactly for large trees requires a huge computational effort. Therefore, an efficient method for likelihood estimation is proposed and compared to estimation using a naive sampling strategy.
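The difference between the ERM model and the age model lies in how the leaf that splits next is chosen. The Python sketch below grows trees both ways and scores their shape with the Colless index, one standard imbalance statistic. The abstract only says the age model's branching rate decreases with age, so the weight 1 / (1 + age) used here is a hypothetical choice, not the thesis's definition; the innovation model is not shown.

```python
import random
from typing import Dict, List


def grow_tree(n_leaves: int, rng: random.Random, age_dependent: bool = False) -> Dict[int, List[int]]:
    """Grow a binary tree by repeatedly splitting one leaf into two new leaves.

    ERM/Yule: every current leaf is equally likely to split.
    Age-dependent variant (hypothetical rate form): a leaf created at step b splits at
    step t with weight 1 / (1 + t - b), so recently created species tend to speciate again.
    Returns node -> children (empty list for leaves); node 0 is the root.
    """
    children: Dict[int, List[int]] = {0: []}
    birth = {0: 0}
    leaves = [0]
    for step in range(1, n_leaves):
        if age_dependent:
            weights = [1.0 / (1 + step - birth[v]) for v in leaves]
        else:
            weights = [1.0] * len(leaves)
        parent = rng.choices(leaves, weights=weights)[0]
        left, right = len(children), len(children) + 1
        children[parent] = [left, right]
        children[left], children[right] = [], []
        birth[left] = birth[right] = step
        leaves.remove(parent)
        leaves.extend([left, right])
    return children


def colless_index(children: Dict[int, List[int]], root: int = 0) -> int:
    """Sum over internal nodes of |leaves(left) - leaves(right)|; 0 for a perfectly balanced tree."""
    total = 0

    def size(v: int) -> int:
        nonlocal total
        if not children[v]:
            return 1
        a, b = size(children[v][0]), size(children[v][1])
        total += abs(a - b)
        return a + b

    size(root)
    return total
```

Averaging `colless_index` over many simulated trees of each size, for each model, and comparing the averages with those of estimated trees is one way to set up the kind of tree-shape comparison the abstract describes.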
Nevertheless, both models describe the tree generation process in a way which is easy to interpret biologically.
Another interesting field of research in biology is coevolution between species: the interaction of species across groups, such that the evolution of a species in one group can be triggered by a species in another group. The most prominent examples are systems of host species and their associated parasites. One problem is to reconcile the common history of both groups of species and to predict the associations between ancestral hosts and their parasites. Several algorithmic methods have been developed in recent years to solve this problem, but only a few host-parasite systems have been analyzed in sufficient detail, which makes evaluating these methods difficult. Within the scope of coevolution, the proposed age model is applied to generate cophylogenies for evaluating such host-parasite reconciliation methods.
Both the age model and the innovation model produce tree shapes similar to those of estimated trees. Both models describe evolutionary dynamics and might offer a further way to infer the macroevolutionary processes that led to the biodiversity observed today. Furthermore, applying the age model to coevolution, by generating a useful benchmark set of cophylogenies, is a first step toward systematic evaluation of reconciliation methods.
|
30 |
Investigating the Impact of Insertion Sequences on the Evolution of Prokaryotic Genomes / Etude de l’Impact des séquences d’Insertion sur l’évolution des génomes Procaryotes. Al-Nayyef, Huda. 15 December 2015.
The number of completely sequenced bacterial and archaeal genomes is rising steadily, making it possible to develop new kinds of large-scale approaches for understanding how genome structure evolves over time. Gene content prediction and genome comparison have both provided major new information and keys to understanding the evolution of prokaryotes. Sequences that are important for understanding rearrangement operations inside genomes during evolution are the so-called transposable elements (TEs): DNA fragments that can insert themselves into new chromosomal locations and often make duplicate copies of themselves during transposition. In prokaryotes, the transposable elements involved in such moves are the insertion sequences (ISs), which follow a cut-and-paste process inside the host DNA sequence. However, the tools that discover ISs efficiently and relate them to genome rearrangements are still few and not fully accurate. The aim of this thesis is to develop an accurate algorithmic method for discovering insertion sequences in bacterial genomes and to build a database of these discoveries. Using these data, we aim to deduce a model of the evolution of these transposable elements and relate it to the evolution of the host sequence (the prokaryotic genome). In particular, we ask whether insertion sequences and host genomes have evolved in a similar way, and whether ISs are responsible, at least to some extent, for genomic recombination such as inversions.
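The abstract does not describe the discovery algorithm. One signal that IS detection commonly relies on is the pair of terminal inverted repeats (TIRs) flanking many insertion sequences, so the Python sketch below naively reports spans whose first k bases are the exact reverse complement of their last k bases. The values of k and the length bounds are hypothetical, real TIRs are often imperfect, and a practical tool would need approximate matching and annotation; this is an illustration only.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_tir_pairs(genome: str, k: int = 12, min_len: int = 700, max_len: int = 2500) -> List[Tuple[int, int]]:
    """Return (start, end) spans, min_len <= end - start <= max_len, whose first k bases
    are the exact reverse complement of their last k bases (a naive, quadratic scan)."""
    hits: List[Tuple[int, int]] = []
    n = len(genome)
    for i in range(n - k + 1):
        target = reverse_complement(genome[i:i + k])
        first = max(i + min_len - k, i + k)  # right repeat must not overlap the left one
        last = min(i + max_len - k, n - k)
        for j in range(first, last + 1):
            if genome[j:j + k] == target:
                hits.append((i, j + k))
    return hits
```

In `"TTTT" + "ACGGT" + "CCCCCCCC" + "ACCGT" + "TTTT"` with k = 5 and lengths 10 to 30, the span (4, 22) is reported because ACCGT is the reverse complement of ACGGT. Candidate spans would still need to be compared across genomes before being treated as IS copies.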
|