21 |
The Systematic Design and Application of Robust DNA Barcodes. Buschmann, Tilo, 19 September 2016
High-throughput sequencing technologies are improving in quality, capacity, and cost, enabling versatile applications in DNA and RNA research. For small genomes or fractions of larger genomes, DNA samples can be mixed and loaded together onto the same sequencing lane. This so-called multiplexing approach relies on a specific DNA tag (index or barcode) that is attached to the sequencing or amplification primer and therefore accompanies every read. After sequencing, each read is assigned to its sample on the basis of its barcode sequence.
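As a minimal sketch of the demultiplexing step just described, the function below bins reads by an exact match of their first bases against a table of sample barcodes. The layout (equal-length barcodes at the very 5' start of each read), the function name `demultiplex_exact`, and the sample names are hypothetical illustrations, not the thesis's software.

```python
def demultiplex_exact(reads, barcodes):
    """Bin reads by an exact match of their 5' prefix against known barcodes.

    Hypothetical layout: every barcode has the same length and sits at the
    very start of the read. Real library layouts vary by protocol.
    """
    length = len(next(iter(barcodes.values())))
    lookup = {bc: sample for sample, bc in barcodes.items()}
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        sample = lookup.get(read[:length])
        if sample is None:
            # A single sequencing error in the barcode already lands here.
            bins["unassigned"].append(read)
        else:
            # Strip the barcode and keep the insert for downstream analysis.
            bins[sample].append(read[length:])
    return bins


bins = demultiplex_exact(
    ["ACGTGGGG", "TGCACCCC", "ACCTGGGG"],
    {"sample1": "ACGT", "sample2": "TGCA"},
)
print(bins)
# {'sample1': ['GGGG'], 'sample2': ['CCCC'], 'unassigned': ['ACCTGGGG']}
```

The third read carries one substitution in its barcode (ACCT instead of ACGT) and is lost under exact matching, which is the problem that error-correcting barcodes address.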
Alterations of DNA barcodes during synthesis, primer ligation, DNA amplification, or sequencing may lead to incorrect sample identification unless the errors are detected and corrected. This can be achieved with error-correcting algorithms and codes. Such a barcoding strategy increases the total number of correctly identified sample reads, thus improving overall sequencing efficiency. Two popular families of error-correcting codes are Hamming codes and codes based on the Levenshtein distance.
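The Hamming case can be illustrated with a short sketch: a barcode set whose minimum pairwise Hamming distance is d guarantees correction of up to t = (d - 1) // 2 substitutions by decoding each observed barcode to its nearest codeword. The toy barcode set and the function names below are hypothetical; Hamming distance only handles substitutions, not insertions or deletions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires words of equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(code):
    """Smallest pairwise Hamming distance within a barcode set."""
    return min(hamming(a, b) for a, b in combinations(code, 2))


def decode(observed, code):
    """Nearest-barcode decoding within the guaranteed correction radius.

    With minimum distance d, up to t = (d - 1) // 2 substitutions are
    correctable; inside that radius the nearest barcode is unique (triangle
    inequality), so no tie check is needed. Returns None otherwise.
    """
    t = (min_distance(code) - 1) // 2
    best = min(code, key=lambda c: hamming(observed, c))
    return best if hamming(observed, best) <= t else None


code = ["AAAAA", "CCCCC", "GGGGG", "TTTTT"]  # toy barcodes: d = 5, t = 2
print(decode("AACAA", code))  # one substitution -> AAAAA
print(decode("ACGTA", code))  # three away from every barcode -> None
```

Rejecting reads outside the radius, rather than forcing an assignment, trades a few unassigned reads for fewer misassigned ones.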
Levenshtein-based codes operate only on words of known length. Since a DNA sequence with an embedded barcode is essentially one continuous long word, applying the classical Levenshtein algorithm is problematic. In this thesis we demonstrate the reduced error-correction capability of Levenshtein-based codes in a DNA context and propose an adaptation of Levenshtein-based codes that provably corrects nucleotide errors efficiently in DNA sequences. Our adaptation takes the surrounding DNA context into account and imposes stricter rules on the selection of barcode sets. In simulations we show that the new method has superior error-correction capability compared with traditional Levenshtein- and Hamming-based codes in the presence of multiple errors.
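Why the DNA context matters can be shown concretely: when a base is deleted inside a barcode, the next downstream base slides into the barcode-length window, so the classical Levenshtein distance counts a deletion plus an insertion (2 edits) for a single error. The sketch below assumes a Sequence-Levenshtein-style distance defined as the minimum over the last row and last column of the standard Levenshtein dynamic-programming matrix, so that bases pushed into or out of the window cost nothing; treat that definition, and the example sequences, as assumptions to check against the thesis.

```python
def levenshtein_matrix(a, b):
    """Standard Levenshtein dynamic-programming matrix for words a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # substitute or match
    return d


def levenshtein(a, b):
    return levenshtein_matrix(a, b)[-1][-1]


def sequence_levenshtein(a, b):
    """Assumed definition: minimum over the matrix's last row and last column,
    so trailing bases of either word (context bases) are not charged."""
    d = levenshtein_matrix(a, b)
    return min(min(d[-1]), min(row[-1] for row in d))


barcode = "ACGTAC"
read_window = "ACTACG"  # G deleted; a downstream context base (G) slides in
print(levenshtein(barcode, read_window))           # 2
print(sequence_levenshtein(barcode, read_window))  # 1
```

With classical Levenshtein, a code designed to correct one error would see this single deletion as two edits and might fail to decode it.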
We present an adaptation of Levenshtein-based codes to DNA contexts that guarantees correction of a predefined number of insertion, deletion, and substitution mutations. Our improved method additionally corrects, on average, more random mutations than traditional Levenshtein-based or Hamming codes. As part of this work we developed software for the flexible generation of DNA codes based on our new approach. To adapt codes to specific experimental conditions, the user can customize sequence filtering, the number of correctable mutations, and the barcode length to achieve the best performance.
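One common way to generate such a barcode set, sketched here under stated assumptions, is greedy selection: enumerate all words of a given length, drop those that fail sequence filters, and keep each remaining word only if it is at least `min_dist` away from every word already chosen. The filters (GC content between 40% and 60%, no homopolymer run longer than 2) and all function names are hypothetical examples of "sequence filtering", not the thesis software's options. The `distance` parameter defaults to Hamming distance for speed; a Sequence-Levenshtein-style function could be passed instead.

```python
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def passes_filters(word, gc_min=0.4, gc_max=0.6, max_run=2):
    """Hypothetical filters: balanced GC content and short homopolymer runs."""
    gc = sum(base in "GC" for base in word) / len(word)
    if not gc_min <= gc <= gc_max:
        return False
    run = 1
    for prev, cur in zip(word, word[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    return True


def greedy_code(length, min_dist, distance=hamming, **filters):
    """Greedily collect filtered words with pairwise distance >= min_dist."""
    code = []
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        if passes_filters(word, **filters) and all(
            distance(word, c) >= min_dist for c in code
        ):
            code.append(word)
    return code


code = greedy_code(6, 3)  # min distance 3 -> corrects 1 substitution
print(len(code), code[:5])
```

Greedy selection is simple but not optimal; the resulting set size depends on enumeration order and on how strict the filters are.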
However, not every platform is prone to large numbers of both indel and substitution errors. The Illumina “Sequencing by Synthesis” platform shows a very large number of substitution errors as well as a very specific shift of the read that results in inserted and deleted bases at the 5’ end and the 3’ end (which we call phaseshifts). We argue that in this scenario the application of Sequence-Levenshtein-based codes is not efficient, because it targets a category of errors that barely occurs on this platform and thereby reduces the code size needlessly. As a solution, we propose the “Phaseshift distance”, which exclusively supports the correction of substitutions and phaseshifts. Additionally, we enable the correction of arbitrary combinations of substitution and phaseshift errors. Thus, we address the lopsided ratio of substitutions to phaseshifts on the Illumina platform.
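The abstract does not define the Phaseshift distance, so the block below is a deliberately simplified, hypothetical stand-in that only illustrates the idea of combining shifts with substitutions: the read window may be shifted by up to `max_shift` positions, each shifted position costs 1, and mismatches are counted over the overlapping bases. The function name and cost model are assumptions, not the thesis's definition.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def shift_substitution_distance(barcode, read, max_shift=2):
    """Hypothetical shift-plus-substitution distance (NOT the thesis's definition).

    k > 0: k extra bases precede the barcode in the read (5' insertion).
    k < 0: the first |k| barcode bases are missing (5' deletion).
    Cost = |k| + mismatches over the overlap. Assumes len(read) >= len(barcode).
    """
    n = len(barcode)
    best = None
    for k in range(-max_shift, max_shift + 1):
        if k >= 0:
            cost = k + hamming(barcode[:n - k], read[k:n])
        else:
            cost = -k + hamming(barcode[-k:], read[:n + k])
        best = cost if best is None else min(best, cost)
    return best


print(hamming("ACGTACGT", "TACGTACG"))                      # 8
print(shift_substitution_distance("ACGTACGT", "TACGTACG"))  # 1: one-base shift
print(shift_substitution_distance("ACGTACGT", "TACCTACG"))  # 2: shift + substitution
```

A single phaseshift looks like 8 substitutions to a pure Hamming decoder, whereas a shift-aware distance charges it as one error, and shifts can be mixed freely with substitutions.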
To compare codes based on the Phaseshift distance with Hamming codes and with codes based on the Sequence-Levenshtein distance, we simulated an experimental scenario based on the error pattern we identified on the Illumina platform. Furthermore, we generated a large number of different DNA barcode sets using the Phaseshift distance and compared codes of different lengths and error-correction capabilities. We found that codes based on the Phaseshift distance can correct a number of errors comparable to codes based on the Sequence-Levenshtein distance while offering a number of DNA barcodes comparable to Hamming codes. Thus, codes based on the Phaseshift distance are more efficient in the targeted scenario.
In some cases (e.g., with PacBio SMRT in Continuous Long Read mode), the positions of the barcode and its DNA context are not well defined. Many reads start inside the genomic insert, so adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required to detect barcoded reads while avoiding large numbers of false positives or false negatives.
For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied directly to this particular problem, we present an adapted FDR method suitable for detecting barcoded reads and suggest possible improvements.
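The thesis's adapted FDR method is not described in this abstract. As a swapped-in illustration of the general idea, the sketch below uses a target-decoy estimate (a technique borrowed from proteomics): each read is scored against the real barcodes and against a decoy set (for example, shuffled barcodes), and the loosest distance cutoff is chosen whose estimated FDR, decoy hits divided by target hits, stays at or below `alpha`. The function name and the toy distances are hypothetical.

```python
def decoy_fdr_threshold(target_dists, decoy_dists, alpha=0.05):
    """Largest distance cutoff whose estimated FDR is at most alpha.

    target_dists: best-match distance of each read to the real barcode set.
    decoy_dists: best-match distance of each read to a decoy barcode set,
    used to estimate how many matches arise by chance.
    Estimated FDR at cutoff t = (#decoy hits <= t) / (#target hits <= t).
    Returns None if no cutoff qualifies.
    """
    best = None
    for t in sorted(set(target_dists)):
        n_target = sum(d <= t for d in target_dists)  # >= 1 since t is observed
        n_decoy = sum(d <= t for d in decoy_dists)
        if n_decoy / n_target <= alpha:
            best = t
    return best


# Toy distances (not real data).
targets = [0, 0, 1, 1, 2, 3, 5, 6]
decoys = [3, 4, 5, 5, 6, 6, 6, 6]
print(decoy_fdr_threshold(targets, decoys, alpha=0.2))  # 3
```

At cutoff 3, six target hits face one decoy hit (estimated FDR about 0.17); at cutoff 5 the estimate jumps to 4/7, so 3 is the loosest acceptable cutoff.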
|
22 |
The Systematic Design and Application of Robust DNA Barcodes. Buschmann, Tilo, 02 September 2016
|
23 |
The role of amyloid beta 4-42 in the etiology of Alzheimer's disease. Bouter, Yvonne, 12 November 2014
No description available.
|
24 |
Identification of Factors Involved in 18S Nonfunctional Ribosomal RNA Decay and a Method for Detecting 8-oxoguanosine by RNA-Seq. Limoncelli, Kelly A., 18 December 2017
The translation of mRNA into functional proteins is essential for all life. In eukaryotes, aberrant RNAs containing sequence features that stall or severely slow down ribosomes are subject to translation-dependent quality control. Targets include mRNAs encoding a strong secondary structure (No-Go Decay; NGD) or stretches of positively charged amino acids (Peptide-dependent Translation Arrest/Ribosome Quality Control; PDTA/RQC), mRNAs lacking an in-frame stop codon (Non-Stop Decay; NSD), and defective 18S rRNAs (18S Nonfunctional rRNA Decay; 18S NRD). Previous work from our lab showed that the S. cerevisiae NGD factors DOM34 and HBS1 and the PDTA/RQC factor ASC1 all contribute to the kinetics of 18S NRD. Further investigation of 18S NRD revealed a critical role for ribosomal protein S3 (RPS3), adding to the emerging evidence that the ribosome senses its own translational status.
While the aberrant mRNAs mentioned above can arise endogenously, damaging agents such as oxidative stress or UV irradiation can compromise the chemical integrity of RNA. Such lesions could lead to translation errors and ribosome stalling. However, current tools for monitoring the fate of damaged RNA are quite limited and provide only a low-resolution picture. We therefore sought to develop a deep-sequencing method to detect damaged RNA that takes advantage of reverse transcriptase's tendency to introduce a mutation opposite a damaged site. Using oxidized RNA as a model of damaged RNA, our preliminary data showed an increase in G>T mutations in oxidized RNA. This method provides the foundation for future work aimed at understanding how cells deal with damaged RNA.
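The readout described above reduces to counting substitution types per reference base. 8-oxoguanosine can pair with adenine during reverse transcription, so in reads reported in the RNA sense orientation it appears as a G>T change (G>U in RNA terms). The sketch below assumes reads have already been aligned without gaps and are given as (reference, read) string pairs; the function names and toy sequences are hypothetical, and a real analysis would compare oxidized and untreated samples.

```python
from collections import Counter


def mismatch_spectrum(aligned_pairs):
    """Count substitution types (e.g. 'G>T') over gap-free (reference, read) pairs."""
    subs = Counter()
    ref_bases = Counter()
    for ref, read in aligned_pairs:
        for r, q in zip(ref, read):
            ref_bases[r] += 1
            if q != r and q != "N":  # ignore uncalled bases
                subs[f"{r}>{q}"] += 1
    return subs, ref_bases


def g_to_t_rate(aligned_pairs):
    """Fraction of reference G positions read as T."""
    subs, ref_bases = mismatch_spectrum(aligned_pairs)
    return subs["G>T"] / ref_bases["G"] if ref_bases["G"] else 0.0


pairs = [("GGAC", "GTAC"), ("GCGT", "GCGT")]  # toy alignments
print(g_to_t_rate(pairs))  # 0.25
```

Normalizing by the number of reference G positions, rather than by total bases, keeps the rate comparable between RNAs with different base composition.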
|
25 |
Echecs virologiques au sein de cohortes hospitalières de patients adultes infectés par le VIH : apport de l'ultra-deep sequencing et étude des charges virales de faible niveau persistantes / Virological failure in cohorts of HIV-infected patients: contribution of ultra-deep sequencing and impact of persistent low-level viremia. Vandenhende, Marie-Anne, 24 November 2015
The goal of antiretroviral therapy (ART) is to achieve an undetectable plasma HIV viral load in order to reduce HIV-related morbidity and mortality. ART-resistant virus can compromise treatment efficacy and is a risk factor for virological failure (VF). Standard genotypic resistance testing by Sanger sequencing (SS), as currently used in clinical practice, cannot detect low-frequency viral variants harbouring drug resistance-associated mutations (DRMs) that represent less than 20% of the plasma viral population. In our study (ANRS CO3 cohort), ultra-deep sequencing (UDS) detected 1.4-fold more DRMs before ART and 1.3-fold more DRMs at VF than SS, confirming the high sensitivity of UDS for detecting DRMs. The low-frequency DRMs detected only by UDS increased the genotypic resistance of the virus to the prescribed treatment in 4% of patients at ART initiation and in 21% of patients at VF.
The impact of persistent low-level viremia (LLV) between 50 and 200 copies/ml (LLV50-200) remains uncertain owing to a lack of controlled comparison data in the literature. In our cohort studies (ANRS CO3 and ART-CC cohorts), 4-9% of HIV-infected patients experienced at least one episode of LLV50-200. LLV50-200 was associated with a more than twofold higher risk of VF > 200 copies/ml, regardless of the duration of LLV, treatment history, or the type of ART regimen at the time of LLV (NNRTI-based or ritonavir-boosted PI-based regimens). LLV was not associated with AIDS-defining events or death, although the median follow-up was only 3 years.
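The sensitivity argument above comes down to detection limits: a mutation present in a given fraction of the viral population is reported only if that fraction reaches the method's limit. The sketch below uses the roughly 20% Sanger limit stated in the abstract and a hypothetical 1% UDS reporting threshold; the per-patient frequencies are invented for illustration and are not study data. A fold ratio of the kind reported can then be computed as the count of DRMs detected by UDS divided by the count detected by Sanger sequencing, assuming that is how the comparison is defined.

```python
SANGER_LIMIT = 0.20  # from the abstract: Sanger misses variants below ~20%
UDS_LIMIT = 0.01     # hypothetical UDS reporting threshold, for illustration


def detected(freqs, limit):
    """Mutations whose within-patient frequency reaches the detection limit."""
    return {mut for mut, f in freqs.items() if f >= limit}


def uds_only(freqs):
    """Minority DRMs that UDS reports but Sanger sequencing would miss."""
    return detected(freqs, UDS_LIMIT) - detected(freqs, SANGER_LIMIT)


# Hypothetical variant frequencies for one patient (not study data).
patient = {"K103N": 0.45, "M184V": 0.08, "Y181C": 0.02, "L90M": 0.004}
print(sorted(uds_only(patient)))  # ['M184V', 'Y181C']
fold = len(detected(patient, UDS_LIMIT)) / len(detected(patient, SANGER_LIMIT))
print(fold)  # 3.0
```

Variants below the UDS threshold (here L90M at 0.4%) remain invisible to both methods, so the reported fold increase also depends on the UDS threshold chosen.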
|