1 |
EFFICIENT CONSTRUCTION OF ACCURATE MULTIPLE ALIGNMENTS AND LARGE-SCALE PHYLOGENIES
Wheeler, Travis John, January 2009
A central focus of computational biology is to organize and make use of vast stores of molecular sequence data. Two of the most studied and fundamental problems in the field are sequence alignment and phylogeny inference. The problem of multiple sequence alignment is to take a set of DNA, RNA, or protein sequences and identify related segments of these sequences. Perhaps the most common use of multiple sequence alignments is as input for methods designed to infer a phylogeny, a tree describing the evolutionary history of the sequences. The two problems are circularly related: standard phylogeny inference methods take a multiple sequence alignment as input, while computation of a rudimentary phylogeny is a step in the standard multiple sequence alignment method.
Efficient computation of high-quality alignments, and of high-quality phylogenies based on those alignments, are both open problems in computational biology. The first part of the dissertation details my efforts to identify a best-of-breed method for each stage of the standard form-and-polish heuristic for aligning multiple sequences; the result is a tool, called Opal, that achieves state-of-the-art 84.7% accuracy on the BAliBASE alignment benchmark. The second part describes a new algorithm that dramatically increases the speed and scalability of a common phylogeny inference method called neighbor-joining; this algorithm is implemented in a new tool, called NINJA, which is more than an order of magnitude faster than a very fast implementation of the canonical algorithm, for example building a tree on 218,000 sequences in under 6 days on a single-processor computer.
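For orientation, the canonical neighbor-joining procedure that the abstract refers to can be sketched as follows. This is a minimal, unoptimized O(n^3) version of the textbook algorithm, not NINJA's algorithm; the function name `neighbor_joining`, the `internalN` node labels, and the five-taxon distance matrix are illustrative assumptions and do not come from the dissertation.

```python
def neighbor_joining(names, dist):
    """Canonical neighbor-joining on a symmetric distance matrix.

    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    nodes = list(names)
    # Copy the matrix into a dict-of-dicts so merged nodes can be added by label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the nodes that are still active.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Select the pair that minimizes the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"internal{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d_uc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[u][c] = d_uc
                d[c][u] = d_uc
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last two nodes.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Illustrative five-taxon matrix (not data from the dissertation).
example_names = ["a", "b", "c", "d", "e"]
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
example_edges = neighbor_joining(example_names, example_dist)
```

Each iteration rescans all remaining pairs, which is what makes the canonical method expensive on hundreds of thousands of sequences.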
|
2 |
Méthodes de distance pour l'inférence phylogénomique / Distance Methods for Phylogenomic Inference
Criscuolo, Alexis, 05 December 2006
Phylogenomic inference seeks to combine the evolutionary signal carried by a set of genes in order to build a single phylogenetic tree. It can be divided into three broad methodological families: low-level combination, which relies on concatenating the different genes; high-level combination, which considers the set of trees inferred from each gene; and medium-level combination, which encodes the different phylogenetic signals and then combines these encodings. A tree inference method is then applied to the result of the combination.
This thesis develops new phylogenomic inference scenarios, mainly based on estimating evolutionary distances between each pair of taxa. It proposes a new medium-level combination method, named SDM, which takes the distance matrices estimated from each gene and combines them into a single distance supermatrix. Because this supermatrix may sometimes contain missing distances, the thesis also describes new algorithms, named NJ*, UNJ*, BioNJ* and MVR*, that very quickly infer a tree from a complete or incomplete distance matrix. Numerous simulations showed that these new distance methods perform well. Although initially developed for medium-level combination, they also significantly improve the results of some standard low-level combination approaches, and offer an efficient alternative to MRP, the most widely used high-level combination technique, in terms of reliability and speed. As phylogenomic data sets keep growing in size, the methods developed in this thesis are thus tools of choice for building the Tree of Life.
|
3 |
Bumblebees in a region of northwestern Scania: Is species number correlated to the number of flowering angiosperms and does gene flow occur between four locations?
Dahlgren, Linnea, January 2014
Pollination, one of our ecosystem services, is considered to be in critical condition due to a worldwide reduction in pollinators and their biodiversity. As agricultural land use becomes more and more intensive, pollinators lose important food and habitat resources. In temperate ecosystems, bumblebees (Bombus spp.) are an important group of wild pollinators, and as with pollinators in general, they are declining in both abundance and richness, in Sweden as well as in other countries. The purpose of this study was to examine, across eight locations in northwestern Scania, whether the number of bumblebee species at a location is linked to the location's number of flowering angiosperm species, and whether gene flow exists between four chosen locations. The results suggest that the number of flowering angiosperm species does not predict how many bumblebee species will be abundant, but that it might predict the number of bumblebee individuals. For the number of bumblebee species, the number of abundant Fabaceae species was more important than the total number of flowering angiosperms at the location. The number of abundant Fabaceae species was strongly correlated with the bumblebee diversity index of the locations, indicating that it is a group of flowers closely linked to bumblebees. To see whether gene flow occurred between the chosen locations, mtDNA sequences were compared in neighbor-joining trees. The results showed that, although some tendencies toward isolation existed, gene flow seemed in general to occur between the locations in the fragmented and human-dominated landscape of northwestern Scania.
|
4 |
A COMPARATIVE STUDY OF NEIGHBOR JOINING BASED APPROACHES FOR PHYLOGENETIC INFERENCE
Correa, Maria Fernanda, 01 December 2010
One of the most relevant issues in biology is unveiling the evolutionary history of different species and organisms. The evolutionary relationships of these species and organisms are explained by constructing phylogenetic trees whose leaves represent species and whose internal nodes represent hypothesized ancestors. The tree reconstruction process is known as phylogenetic inference. Phylogenies can be used not only to explain the evolutionary history of organisms but also for many other purposes, such as designing new drugs by tracking the evolution of diseases. In the last few years, the amount of genetic data collected from organisms and species has increased greatly. As a result, biologists have sought methods capable of computing phylogenies of small, medium, and even large datasets accurately and in a reasonable time. The neighbor-joining method is one of the most widely used for phylogenetic inference because of its computational efficiency. As datasets have grown, novel neighbor-joining-based approaches have been developed with the goal of efficiently computing accurate phylogenies of thousands of sequences. This study therefore compared the canonical neighbor-joining method, represented by the MEGA software, with two novel neighbor-joining-based approaches, the NINJA method and the FastTree method, to identify the most efficient and effective method in terms of computational performance, topological accuracy, and topological similarity as the number of sequences scales up. The study was carried out by running experiments on small, medium, and large sets of protein and nucleotide sequences. The FastTree method was the most successful at balancing the trade-off among computational performance, topological accuracy, and topological similarity when scaling up the number of sequences in this study.
|
5 |
An Extended Study On The Alu Insertion Polymorphisms In Anatolian Human Population
Sekeryapan, Ceran, 01 September 2005
In the present study, nine Alu insertion polymorphisms (ACE, PV92, FXIIIB, APO, A25, B65, TPA25, D1, HS4.32) were examined in 100 individuals from Anatolia in order to estimate the Central Asian contribution to Anatolia. Alu insertion frequencies for these loci were calculated as 0.410, 0.220, 0.579, 0.963, 0.067, 0.667, 0.390, 0.427, and 0.637 respectively, and they were found to be in Hardy-Weinberg equilibrium (p < 0.05). The observed insertion frequencies of each locus were compared with those of previous observations (Dinç, 2003; Comas et al., 2004), and the results of the present study were found not to differ from those obtained by Comas et al. (2004). The two data sets were therefore pooled (N = 143) and used to examine genetic relationships between populations from Eurasia and Africa.
Pairwise Fst statistics indicated higher genetic similarity between Anatolia and all of the Balkan populations and some of the Caucasian populations. A Neighbor Joining (NJ) tree based on Reynolds' genetic distances and Principal Component Analysis (PCA) both grouped the Anatolian populations with the Balkan and some of the Caucasian populations, and showed clear differentiation of Asian populations from the Anatolian population.
The relative genetic contribution of Central Asian genes to the current Anatolian gene pool was quantified using Admix analysis, considering for comparison populations of the Balkans (Greece, Romania, Albania and Hungary) and Central Asia (Uighurs, Uzbeks, Tajiks, Kazakhs, Kyrgyz, Dungans). Estimates suggest a roughly 28% contribution from Asia to Anatolia, in concordance with the previous estimate (Benedetto et al., 2001).
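The Hardy-Weinberg check mentioned in this abstract can be illustrated with a chi-square goodness-of-fit test on genotype counts at one biallelic locus. This is a minimal sketch under stated assumptions: the function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the thesis, which does not say which test was used. The value 3.841 is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Return (insertion allele frequency, chi-square statistic) for one locus.

    Assumes both alleles are present, so every expected count is positive.
    """
    n = n_ins_ins + n_ins_del + n_del_del
    # Each individual carries two alleles; heterozygotes carry one insertion.
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)
    q = 1.0 - p
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ins_ins, n_ins_del, n_del_del)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, chi2


# Hypothetical genotype counts for 100 individuals (not data from the study).
freq, chi2 = hwe_chi_square(17, 48, 35)
CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
consistent_with_hwe = chi2 < CRITICAL_1DF_005
```

With these made-up counts the insertion frequency is 0.41 and the statistic falls far below the critical value, so equilibrium would not be rejected.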
|
6 |
Statistické vyhodnocení fylogeneze biologických sekvencí / Statistic evaluation of phylogeny of biological sequences
Zembol, Filip, January 2013
The topic of my diploma thesis is the statistical evaluation of biological sequences with the help of phylogenetic trees. In the theoretical part, we review the literature on methods for estimating the course of phylogeny from the similarity of biological sequences (DNA and proteins), focusing on the inaccuracies of the estimation, their causes, and the possibilities for eliminating them. We then compare methods for statistically evaluating the correctness of the estimated phylogeny. In the practical part of the thesis, we propose algorithms for testing the correctness of phylogenetic trees on the basis of bootstrapping, jackknifing, OTU jackknifing, and the PTP test; these algorithms can construct a phylogenetic tree with the neighbor-joining method from biological sequences in FASTA format. It is also possible to change the distance model and the substitution matrix. To be able to use these algorithms for the statistical support of phylogenetic trees, we have to verify that they work correctly. This verification is carried out on theoretical amino acid sequences. To verify the correct function of the algorithms further, we carry out the individual statistical tests on 10 real sequences of mammalian ubiquitin. These results are analysed and discussed.
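The bootstrapping step named in this abstract can be sketched as resampling alignment columns with replacement; each pseudo-replicate is then used to rebuild a tree, and the fraction of replicates that recover a given clade is that clade's support. The function name `bootstrap_alignment` and the toy alignment are hypothetical; this is an illustration of the general technique, not the thesis's implementation.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap pseudo-replicate of a multiple sequence alignment.

    `alignment` maps sequence names to aligned strings of equal length.
    Columns are drawn with replacement, and the same column indices are
    applied to every sequence so that sites stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Toy alignment for illustration only.
toy_alignment = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "ACCTAC"}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))
```

Jackknifing differs only in that a subset of columns is drawn without replacement, and OTU jackknifing removes sequences rather than columns.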
|
7 |
Aplikace pro zpracování dat z oblasti evoluční biologie / Application for the Data Processing in the Area of Evolutionary Biology
Vogel, Ivan, January 2011
Phylogenetic tree inference is a very common method for visualising evolutionary relationships among species. This work explains the mathematical theory behind molecular phylogenetics and designs a modified algorithm for phylogenetic tree inference based on intra-group analysis of nucleotide and amino acid sequences. Furthermore, it describes the object design and implementation of the proposed methods in the Python language, as well as their integration into a powerful bioinformatics portal. The proposed modified algorithmic solutions give better results than standard methods, especially in the clustering of predefined groups. Finally, future work and the application of the proposed methods to other fields of information technology are discussed.
|
8 |
Alu Insertion Polymorphisms In Anatolian Turks
Dinc, Havva, 01 September 2003
In the present study, ten autosomal human-specific Alu insertion polymorphisms (ACE, APO, A25, B65, D1, FXIIIB, HS4.32, HS4.69, PV92 and TPA25) were analyzed in approximately 100 unrelated individuals from Anatolia. Alu insertion polymorphisms offer several advantages over other nuclear DNA polymorphisms for human evolution studies.
The frequencies of the ten biallelic Alu insertions in Anatolians were calculated and all systems were found to be in Hardy-Weinberg equilibrium (p > 0.05).
By combining the results of this study with results of previous studies on worldwide populations, the genetic distance (Nei's DA) between each pair of populations was calculated and neighbor-joining trees were constructed. In general, geographically closer populations were also found to be genetically similar. Principal component analysis (PCA) was performed and Anatolia was found to be in the European cluster. From the PCA, it was concluded that FXIIIB, PV92 and ACE were the variables contributing the most to explaining the variation between the populations. Additionally, canonical variates analysis (CVA) indicated that the most discriminative markers for the groups of populations were PV92, D1, ACE and HS4.32.
Pairwise Fst values were also calculated between Anatolians and some of the populations for which data were available. It was concluded that Anatolians have non-significant pairwise Fst values with Swiss and French Acadian populations.
Lastly, a graph of heterozygosity versus distance from the centroid was constructed, and it was found that Anatolians and India-Hindu had exactly the expected heterozygosity value predicted by the model of Harpending and Ward (1982).
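Nei's DA distance, which this abstract uses as input to the neighbor-joining trees, has a simple closed form for biallelic markers such as Alu insertions: for each locus, sum the square roots of the products of matching allele frequencies in the two populations, average over loci, and subtract from one. The sketch below assumes that standard definition; the function name `nei_da` and the frequencies are hypothetical and are not values from the thesis.

```python
import math


def nei_da(freqs_x, freqs_y):
    """Nei's D_A between two populations for biallelic loci.

    Each list holds the insertion-allele frequency per locus; the second
    allele (no insertion) has frequency 1 - f at the same locus.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("need the same non-zero number of loci in both populations")
    total = 0.0
    for x, y in zip(freqs_x, freqs_y):
        # Contribution of the insertion allele plus the non-insertion allele.
        total += math.sqrt(x * y) + math.sqrt((1.0 - x) * (1.0 - y))
    return 1.0 - total / len(freqs_x)


# Hypothetical insertion frequencies at two loci in two populations.
distance = nei_da([0.4, 0.9], [0.6, 0.8])
```

The distance is 0 when both populations have identical allele frequencies at every locus and 1 when they share no alleles.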
|
9 |
Applying a Molecular Genetics Approach to Shark Conservation and Management: Assessment of DNA Barcoding in Hammerhead Sharks and Global Population Genetic Structuring in the Gray Reef Shark, Carcharhinus amblyrhynchos.
Horn, Rebekah L., 01 February 2010
Chapter 1
DNA barcoding based on the mitochondrial cytochrome c oxidase subunit I (COI) gene sequence is emerging as a useful tool for identifying unknown, whole or partial organisms to species level. However, the application of only a single mitochondrial marker for robust species identification has also come under some criticism due to the possibility of erroneous identifications resulting from species hybridizations and/or the potential presence of nuclear-mitochondrial pseudogenes. The addition of a complementary nuclear DNA barcode has therefore been widely recommended to overcome these potential COI gene limitations, especially in wildlife law enforcement applications where greater confidence in the identifications is essential. In this study, we examined the comparative nucleotide sequence divergence and utility of the mitochondrial COI gene (N=182 animals) and nuclear ribosomal internal transcribed spacer 2 (ITS2) locus (N=190 animals) in the 8 known and 1 proposed cryptic species of globally widespread hammerhead sharks (family Sphyrnidae). Since hammerhead sharks are under intense fishing pressure for their valuable fins, with some species potentially set to receive CITES listing, tools for monitoring their fishery landings and tracking trade in their body parts are necessary to achieve effective management and conservation outcomes. Our results demonstrate that both COI and ITS2 loci function robustly as stand-alone barcodes for hammerhead shark species identification. Phylogenetic analyses of both loci, independently and together, accurately place each hammerhead species in reciprocally monophyletic groups with strong bootstrap support. The two barcodes differed notably in levels of intraspecific divergence, with average intraspecific K2P distance an order of magnitude lower in the ITS2 (0.297% for COI and 0.0967% for ITS2). The COI barcode also showed phylogeographic separation in Sphyrna zygaena, S. lewini and S. tiburo, potentially providing a useful option for assigning unknown specimens (e.g. market fins) to a broad geographic origin. We suggest that COI supplemented by ITS2 DNA barcoding can be used in an integrated and robust approach for species assignment of unknown hammerhead sharks and their body parts in fisheries and international trade.
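The K2P (Kimura two-parameter) distance cited above corrects the raw proportion of differing sites for multiple substitutions while treating transitions and transversions separately: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences. A minimal sketch follows; the function name `k2p_distance` and the choice to skip gap and ambiguity characters are assumptions for illustration, not details from the thesis.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a character other than A, C, G or T
    are skipped. Assumes at least one comparable site remains.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


# One transition in ten sites (toy sequences, not shark data).
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Multiplying the result by 100 gives the percentage form used in the abstract.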
Chapter 2
The gray reef shark (Carcharhinus amblyrhynchos) is an Indo-Pacific, coral reef associated species that likely plays an important role as apex predator in maintaining the integrity of coral reef ecosystems. Populations of this shark have declined substantially in some parts of its range due to over-fishing, with recent estimates suggesting a 17% decline per year on the Great Barrier Reef (GBR). Currently, there is no information on the population structure or genetic status of gray reef sharks to aid in their management and conservation. We assessed the genetic population structure and genetic diversity of this species by using complete mitochondrial control region sequences and 15 nuclear microsatellite markers. Gray reef shark samples (n=305) were obtained from 10 locations across the species’ known longitudinal Indo-Pacific range: western Indian Ocean (Madagascar), eastern Indian Ocean (Cocos [Keeling] Islands, Andaman Sea, Indonesia, and western Australia), central Pacific (Hawaii, Palmyra Atoll, and Fanning Atoll), and southwestern Pacific (eastern Australia – Great Barrier Reef). The mitochondrial and nuclear marker data were concordant in most cases with population-based analysis showing significant overall structure (FST = 0.27906 (pST = 0.071 ± 0.02), and significant pairwise genetic differentiation between nearly all of the putative populations sampled (i.e., 9 of the 10 for mitochondrial and 8 of the 10 for nuclear markers). Individual-based analysis of microsatellite genotypes identified at least 5 populations. The concordant mitochondrial and nuclear marker results are consistent with a scenario of very low to no appreciable connectivity (gene flow) among most of the sampled locations, suggesting that natural repopulation of overfished regions by sharks from distant reefs is unlikely. The results also indicate that conservation of genetic diversity in gray reef sharks will require management measures on relatively local scales. 
Our findings of extensive genetic structuring suggest that a high level of genetic isolation is also likely to be the case in unsampled populations of this species.
|
10 |
Detekce dynamických síťových aplikací / Detection of Dynamic Network Applications
Burián, Pavel, January 2013
This thesis deals with the detection of dynamic network applications. It describes some of the existing protocols and methods for identifying them from IP flows and packet contents. It presents the design of a detection system based on the automatic creation of regular expressions and describes its implementation. It presents the regular expressions created for the BitTorrent and eDonkey protocols and compares their quality with that of the L7-filter solution.
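To show what matching a protocol by packet contents can look like, the sketch below checks a payload against a hand-written regular expression for the BitTorrent peer handshake, which begins with a length byte of 19 followed by the string "BitTorrent protocol". This pattern is an illustrative assumption: it is neither one of the automatically generated expressions from the thesis nor the L7-filter pattern, and the function name `looks_like_bittorrent` is hypothetical.

```python
import re

# Peer handshake prefix: byte 0x13 (decimal 19) followed by the protocol name.
BITTORRENT_HANDSHAKE = re.compile(rb"^\x13BitTorrent protocol")


def looks_like_bittorrent(payload: bytes) -> bool:
    """Return True if the payload starts with a BitTorrent peer handshake."""
    return BITTORRENT_HANDSHAKE.match(payload) is not None


# Handshake-shaped payload: prefix, 8 reserved bytes, then 40 bytes that
# stand in for the info hash and peer id.
sample_handshake = b"\x13BitTorrent protocol" + bytes(8) + b"x" * 40
sample_http = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
matches = [looks_like_bittorrent(sample_handshake), looks_like_bittorrent(sample_http)]
```

A signature this simple only catches unencrypted peer connections at their first packet, which is one reason automatically derived expressions over many flows are attractive.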
|