1 |
DNA barcodes and meiofaunal identification
Mann, Jenna D. January 2010
In recent years there has been a desire to definitively catalogue the life on our planet. In light of the increasing extinction rates that are driven by human activities, it is unlikely that this will be achieved using traditional methods. Whilst most organisms with a body size of more than 1 cm have been described, the vast majority of animal life is smaller than this; these organisms, collectively known as meiofauna, are yet to be catalogued. Meiofaunal organisms present a range of problems for traditional taxonomy. Firstly, they are microscopic, meaning that morphological features are often difficult to resolve. Secondly, these creatures often exhibit cryptic diversity, meaning that different species often look the same. Thirdly, the organisms are often poorly described in the literature, making it very difficult to confirm an identification, assuming that someone has already described them. It is possible, however, to obtain DNA sequences from these organisms. DNA barcoding, the use of short sequences of DNA to identify individuals, is now commonly used in a wide range of applications. It has been proposed that a single target gene should be sufficient to describe all organisms this way. Barcodes can be acquired from individuals or from bulk extractions from environmental samples. In the latter case, many of the sequences obtained are novel and unlikely ever to have a type specimen associated with them. When this is the case, assessing the diversity of a sample becomes a computational exercise. However, as yet, there is no agreed standard method for analyzing the barcodes produced. Indeed, most methods currently employed lack objectivity. This thesis investigates the efficiency of a range of gene targets and analysis methods for DNA barcoding, with an emphasis on meiofaunal organisms (nematodes, tardigrades and thrips). DNA barcodes were generated for up to three genes for each specimen.
Sequences for each gene were analysed using two programs, MOTU_define.pl and DOTUR. These programs use different methods to assign sequences to operational taxonomic units (OTU), which were then compared. An objective method for analysing sequences such as MOTU_define.pl, which relies on only the information contained in the sequences, was found to be most suitable for designating taxa. It does not attempt to apply evolutionary models to the data, and then infer taxa from the derived data. In addition to barcoding, some samples were pre-processed using video capture and editing (VCE). This creates a virtual slide of a specimen so that a sequence can be linked to a morphological identification. VCE proved to be an efficient method to preserve morphological data from specimens.
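To make the idea of assigning sequences to operational taxonomic units from sequence information alone more concrete, here is a minimal sketch. It is not the algorithm of MOTU_define.pl or DOTUR; it assumes pre-aligned sequences of equal length, single-linkage grouping, and a hypothetical cutoff `max_diff` counted in mismatching bases.

```python
def count_differences(a, b):
    """Number of mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_into_otus(seqs, max_diff):
    """Single-linkage clustering: sequences within max_diff differences share an OTU.

    Returns a sorted list of clusters, each a list of indices into seqs.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if count_differences(seqs[i], seqs[j]) <= max_diff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The number of clusters returned depends directly on the chosen cutoff, which is one reason the choice of analysis method matters when no type specimens are available.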
|
2 |
Exploring Species Diversity and Molecular Evolution of Arachnida through DNA Barcodes
Young, Monica Rose 11 February 2013
This thesis investigates species diversity and patterns of molecular evolution in Arachnida through DNA barcoding. The first chapter assesses mite species richness through comprehensive sampling at a subarctic location in Canada. Barcode analysis of 6279 specimens revealed nearly 900 presumptive species with high rates of turnover between major habitat types, demonstrating the utility of DNA barcoding for biodiversity surveys of understudied taxa. The second chapter explores nucleotide composition, indel occurrence, and rates of amino acid evolution in Arachnida. The results suggest a significant shift in nucleotide composition in the arachnid subclasses of Pulmonata (GC = 37.0%) and Apulmonata (GC = 34.2%). Indels were detected in five apulmonate orders, with deletions being much more common than insertions. Finally, rates of amino acid evolution were detected among the orders, and were negatively correlated with generation length, suggesting that generation time is a significant contributor to variation in molecular rates of evolution in arachnids.
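The GC percentages quoted above are a simple composition statistic. As a hedged illustration of how such a value could be computed for a single barcode sequence, the sketch below counts G and C among unambiguous bases; the function name and the decision to ignore gaps and ambiguity codes are assumptions, not details taken from the thesis.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in counted if b in "GC")
    return 100.0 * gc / len(counted)
```

For example, `gc_content("ACGT")` gives 50.0, and gap or `N` characters are skipped rather than counted in the denominator.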
|
3 |
Bayesian classification of DNA barcodes
Anderson, Michael P. January 1900
Doctor of Philosophy / Department of Statistics / Suzanne Dubnicka / DNA barcodes are short strands of nucleotide bases taken from the cytochrome c oxidase subunit 1 (COI) of the mitochondrial DNA (mtDNA). A single barcode may have the form C C G G C A T A G T A G G C A C T G . . . and typically ranges in length from 255 to around 700 nucleotide bases. Unlike nuclear DNA (nDNA), mtDNA remains largely unchanged as it is passed from mother to offspring. It has been proposed that these barcodes may be used as a method of differentiating between biological species (Hebert, Ratnasingham, and deWaard 2003). While this proposal is sharply debated among some taxonomists (Will and Rubinoff 2004), it has gained momentum and attention from biologists. One issue at the heart of the controversy is the use of genetic distance measures as a tool for species differentiation. Current methods of species classification utilize these distance measures that are heavily dependent on both evolutionary model assumptions as well as a clearly defined "gap" between intra- and interspecies variation (Meyer and Paulay 2005). We point out the limitations of such distance measures and propose a character-based method of species classification which utilizes an application of Bayes' rule to overcome these deficiencies. The proposed method is shown to provide accurate species-level classification. The proposed methods also provide answers to important questions not addressable with current methods.
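A character-based use of Bayes' rule can be sketched as follows. This is only an illustration under strong assumptions that the abstract does not state: barcodes are aligned to equal length, positions are treated as independent, base probabilities are estimated with a pseudocount, and the prior is the share of training barcodes per species. The names `train` and `posterior` are hypothetical, and query sequences may contain only A, C, G and T.

```python
import math
from collections import Counter, defaultdict

BASES = "ACGT"


def train(labelled_barcodes, pseudocount=1.0):
    """Estimate, per species, a prior and per-position base probabilities."""
    by_species = defaultdict(list)
    for species, seq in labelled_barcodes:
        by_species[species].append(seq)
    total = len(labelled_barcodes)
    model = {}
    for species, seqs in by_species.items():
        position_probs = []
        for pos in range(len(seqs[0])):
            counts = Counter(s[pos] for s in seqs)
            denom = len(seqs) + pseudocount * len(BASES)
            position_probs.append(
                {b: (counts[b] + pseudocount) / denom for b in BASES}
            )
        model[species] = (len(seqs) / total, position_probs)
    return model


def posterior(model, query):
    """Posterior probability of each species given the query barcode (Bayes' rule)."""
    log_joint = {}
    for species, (prior, position_probs) in model.items():
        lp = math.log(prior)
        for pos, base in enumerate(query):
            lp += math.log(position_probs[pos][base])
        log_joint[species] = lp
    # Normalise in log space to avoid underflow on long barcodes.
    top = max(log_joint.values())
    unnorm = {s: math.exp(lp - top) for s, lp in log_joint.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}
```

Unlike a distance threshold, the output is a probability for every candidate species, so an ambiguous query shows up as a flat posterior rather than as a forced assignment.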
|
4 |
Tagging systems for sequencing large cohorts
Neiman, Mårten January 2010
Advances in sequencing technologies constantly improve the throughput and accuracy of sequencing instruments. Together with this development come new demands and opportunities to take full advantage of the massive amounts of data produced within a sequencing run. One way of doing this is to analyze a large set of samples in parallel by pooling them prior to sequencing and associating the reads with the corresponding samples using DNA sequence tags. Amplicon sequencing is a common application of this technique, enabling ultra-deep sequencing and identification of rare allelic variants. However, a common problem for amplicon sequencing projects is the formation of unspecific PCR products and primer dimers that occupy large portions of the data sets.
This thesis is based on two papers exploring these new possibilities and issues. In the first paper, a method for including thousands of samples in the same sequencing run without dramatically increasing the cost or sample-handling complexity is presented. The second paper presents how the amount of high-quality data from an amplicon sequencing run can be maximized.
The findings from the first paper show that a two-tagging system, where the first tag is introduced by PCR and the second tag is introduced by ligation, can be used to effectively sequence a cohort of 3500 samples using the 454 GS FLX Titanium chemistry. The tagging procedure allows for simple and easily scalable sample handling during sequence library preparation. The tags introduced in the first PCR, which are present at both ends of the fragments, enable detection of chimera formation and hence avoid false typing in the data set.
In the second paper, a FACS machine is used to sort and enrich emPCR beads covered with target DNA. This is facilitated by tagging quality beads through hybridization of a fluorescently labeled, target-specific DNA probe prior to sorting. The system was evaluated by sequencing two amplicon libraries, one FACS sorted and one standard enriched, on the 454, showing a three-fold increase in quality data obtained. / QC20100907
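The role of a tag present at both ends of a fragment can be illustrated with a small demultiplexing sketch. It is not the thesis's pipeline: it assumes, for simplicity, that the same tag appears verbatim at the start and end of each read, that tags have a fixed length `tag_len`, and that `tag_to_sample` is a hypothetical lookup table from tag to sample name.

```python
def assign_read(read, tag_to_sample, tag_len):
    """Return (sample, insert) for a read, or (None, None) if it cannot be assigned."""
    if len(read) < 2 * tag_len:
        return None, None
    head, tail = read[:tag_len], read[-tag_len:]
    if head != tail:
        # Different tags at the two ends suggest a chimeric molecule.
        return None, None
    sample = tag_to_sample.get(head)
    if sample is None:
        # The tag is not one of the known sample tags.
        return None, None
    return sample, read[tag_len:-tag_len]
```

Discarding reads whose two end tags disagree is what prevents a chimera from being typed under the wrong sample.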
|
6 |
Rhiniidae (Diptera: Oestroidea) diversity in South Africa. Taxonomic review and phylogenetic advances for the Afrotropical region
Thomas, Arianna 24 November 2020
The dipteran family Rhiniidae (Diptera: Oestroidea) is distributed mainly in the tropical and subtropical areas of the Afrotropical, Australian, Oriental and Palaearctic regions. It was traditionally treated as a subfamily of the family Calliphoridae. However, recent phylogenetic studies based on the analysis of morphological and molecular characters show that Calliphoridae is not a monophyletic group. This led to several systematic changes, including the treatment of rhiniids as an independent family. Currently, almost 400 species of Rhiniidae are recognised, grouped into two subfamilies and 30 genera. The Afrotropical region harbours the greatest diversity of rhiniids worldwide, with approximately 170 species in 5 genera of the subfamily Rhiniinae and 11 of Cosmininae. Very little information exists on the diversity, biology and geographical distribution of the family Rhiniidae. The life cycle, and in particular larval habits and morphology, is unknown for most species. Most of what is known is limited to a few species restricted to very specific geographical localities. In general, they are known to have a strong ecological association with natural environments; the adults frequent flowers and are therefore thought to be important pollinators; and some species appear to be closely associated with termites. As for the study of their diversity and taxonomy, very little research on Rhiniidae in the Afrotropical region has been carried out since the 1970s, so knowledge of the group is out of date. Moreover, morphological identification in many cases depends exclusively on the male terminalia, and many female specimens therefore remain unidentified or inadequately identified.
The general aim of this doctoral thesis is to contribute to and update knowledge of the family Rhiniidae in the Afrotropical region through the study of its diversity, taxonomy and phylogeny in the region, with special emphasis on South Africa. To this end, a taxonomic and diversity study of the family was first carried out in South Africa, a country regarded worldwide as a biodiversity hotspot. More than 4,000 Rhiniidae specimens deposited in entomological collections in Africa, Europe and the United States were examined. An updated checklist of the species present in the country was produced, and their taxonomic status was reviewed. In addition, historical distribution maps and high-resolution habitus photographs were produced for most of the species studied. The most important results include nine new records for the country, giving a total of 73 Rhiniidae species; around 15 new species to be described in future work; and the compilation of novel bionomic information for several species (Chapter I). We then looked more closely within the family through a taxonomic revision of the genus Fainia Zumpt, 1958, which is exclusive to the Afrotropical region. This genus includes seven described species, but the taxonomic status of some of them is controversial. A morphological study of the male terminalia of the described species was carried out, together with a review of the available type material, in order to clarify the taxonomic status of its species. This provides new identification tools for the genus, such as identification keys for both sexes, redescriptions, and high-resolution photographs of general adult morphology and male terminalia, as well as new synonymies.
This study was complemented by the standardisation and updating of the morphological nomenclature used for the family Rhiniidae, and by the proposal of possible synapomorphies for the diagnosis of the two current subfamilies, Cosmininae and Rhiniinae (Chapter II). Finally, we used molecular tools to corroborate the morphology-based identifications, associate female morphotypes with their conspecific males, explore the phylogenetic relationships among genera and species, and generate the first DNA barcode (CO1) library for Rhiniidae species. To do so, we generated DNA barcode (CO1) fragments from 138 Rhiniidae specimens. Bayesian Inference and Maximum Likelihood trees were used to infer species boundaries and monophyly. This was complemented by intraspecific and interspecific genetic variation, reconstructed from pairwise distances under the Kimura two-parameter (K2P) nucleotide substitution model, and by species delimitation using ABGD analysis. Most of the morphologically delimited species were recovered as monophyletic. Between 65 and 68 putative Rhiniidae species were identified in our study, and 31 female morphotypes were successfully linked to their conspecific males (Chapter III). This research demonstrates the importance of reviewing entomological collections to improve knowledge of diversity, and of using the information on specimen labels as a valuable data resource for interpreting temporal and spatial occurrence, environmental preferences, and associations with plants or other organisms such as termites, which in turn are relevant to studies of conservation biology, pollination and ecological interactions.
In addition, DNA barcodes proved efficient as a complementary tool for the taxonomic revision of Rhiniidae; however, they were less successful than expected among morphologically very similar species, which suggests possible recent evolutionary divergence and the need for further molecular studies. / Partially funded by the H2020 Research and Innovation Staff Exchange Programme of the European Commission (RISE), project 645636: ‘Insect-plant relationships: insights into biodiversity and new applications’ (FlyHigh).
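For reference, the Kimura two-parameter (K2P) pairwise distance mentioned in this abstract can be computed as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The sketch below is a minimal illustration, not the software used in the thesis; it assumes two aligned sequences of equal length and skips any column containing a gap or ambiguity code.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

Intraspecific and interspecific variation can then be summarised by computing this distance for every pair of barcodes within and between the delimited species.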
|
7 |
The Systematic Design and Application of Robust DNA Barcodes
Buschmann, Tilo 19 September 2016
High-throughput sequencing technologies are improving in quality, capacity, and cost, providing versatile applications in DNA and RNA research. For small genomes or fractions of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called multiplexing approach relies on a specific DNA tag, index, or barcode that is attached to the sequencing or amplification primer and hence accompanies every read. After sequencing, each sample read is identified on the basis of the respective barcode sequence.
Alterations of DNA barcodes during synthesis, primer ligation, DNA amplification, or sequencing may lead to incorrect sample identification unless the error is revealed and corrected. This can be accomplished by implementing error correcting algorithms and codes. This barcoding strategy increases the total number of correctly identified samples, thus improving overall sequencing efficiency. Two popular sets of error-correcting codes are Hamming codes and codes based on the Levenshtein distance.
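As a simple illustration of error-correcting barcode decoding, the sketch below uses the Hamming distance and the standard result that a code with minimum distance d can correct up to floor((d - 1) / 2) substitutions. It is a generic nearest-codeword decoder, not the software described in this thesis; the function names are hypothetical, and it handles substitutions only, not insertions or deletions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes):
    """Smallest pairwise Hamming distance within a barcode set (needs >= 2 barcodes)."""
    return min(
        hamming(a, b)
        for i, a in enumerate(barcodes)
        for b in barcodes[i + 1:]
    )


def decode(observed, barcodes):
    """Return the barcode within the guaranteed correction radius, else None."""
    radius = (min_distance(barcodes) - 1) // 2
    hits = [b for b in barcodes if hamming(observed, b) <= radius]
    return hits[0] if len(hits) == 1 else None
```

Reads whose observed barcode falls outside every correction radius are left unassigned rather than guessed, which trades a few lost reads for fewer misidentified samples.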
Levenshtein-based codes operate only on words of known length. Since a DNA sequence with an embedded barcode is essentially one continuous long word, application of the classical Levenshtein algorithm is problematic. In this thesis we demonstrate the decreased error correction capability of Levenshtein-based codes in a DNA context and suggest an adaptation of Levenshtein-based codes that is proven capable of efficiently correcting nucleotide errors in DNA sequences. In our adaptation, we take any DNA context into account and impose stricter rules for the selection of barcode sets. In simulations we show the superior error correction capability of the new method compared to traditional Levenshtein- and Hamming-based codes in the presence of multiple errors.
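The difficulty with an embedded barcode can be seen in a toy case. The sketch below implements the classical Levenshtein distance and then reads a fixed-length window from a read in which one barcode base was deleted; the barcode `ACGT` and the downstream bases are made-up values, and this illustrates one reading of the problem rather than the adapted distance proposed in the thesis.

```python
def levenshtein(a, b):
    """Classical edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


barcode = "ACGT"
read = "AGT" + "TTTT"          # barcode with 'C' deleted, followed by the insert
window = read[:len(barcode)]   # the decoder only sees a fixed-length prefix: "AGTT"
```

Here `levenshtein(barcode, "AGT")` is 1, but `levenshtein(barcode, window)` is 2, because the first base of the insert has shifted into the barcode window. A single deletion therefore looks like two edits when the word boundary is unknown.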
We present an adaptation of Levenshtein-based codes to DNA contexts capable of guaranteed correction of a pre-defined number of insertion, deletion, and substitution mutations. Our improved method is additionally capable of correcting on average more random mutations than traditional Levenshtein-based or Hamming codes. As part of this work we prepared software for the flexible generation of DNA codes based on our new approach. To adapt codes to specific experimental conditions, the user can customize sequence filtering, the number of correctable mutations and barcode length for highest performance.
However, not every platform is susceptible to a large number of both indel and substitution errors. The Illumina “Sequencing by Synthesis” platform shows a very large number of substitution errors as well as a very specific shift of the read that results in inserted and deleted bases at the 5’-end and the 3’-end (which we call phaseshifts). We argue in this scenario that the application of Sequence-Levenshtein-based codes is not efficient because it aims for a category of errors that barely occurs on this platform, which reduces the code size needlessly. As a solution, we propose the “Phaseshift distance” that exclusively supports the correction of substitutions and phaseshifts. Additionally, we enable the correction of arbitrary combinations of substitution and phaseshift errors. Thus, we address the lopsided number of substitutions compared to phaseshifts on the Illumina platform.
To compare codes based on the Phaseshift distance to Hamming codes as well as codes based on the Sequence-Levenshtein distance, we simulated an experimental scenario based on the error pattern we identified on the Illumina platform. Furthermore, we generated a large number of different sets of DNA barcodes using the Phaseshift distance and compared codes of different lengths and error correction capabilities. We found that codes based on the Phaseshift distance can correct a number of errors comparable to codes based on the Sequence-Levenshtein distance while offering a number of DNA barcodes comparable to Hamming codes. Thus, codes based on the Phaseshift distance show a higher efficiency in the targeted scenario.
In some cases (e.g., with PacBio SMRT in Continuous Long Read mode), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives.
For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads and suggest possible improvements.
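For readers unfamiliar with FDR control, the classical Benjamini-Hochberg step-up procedure is sketched below as background. It is explicitly not the adapted method of this thesis, which the abstract says was needed because existing FDR methods do not apply to barcode detection; the function name and the default `alpha` are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value passes its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

The procedure bounds the expected share of false positives among all rejections, instead of the chance of any single false positive, which is why it suits settings with very many simultaneous tests.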
|