61 |
Algorithms for non-coding transcriptome analysis and their application to study the germ-layers development / Hita Ardiaca, Andrea 09 July 2024
Next-generation sequencing (NGS) enables the molecular profiling of cells with unprecedentedly high throughput. However, the focus is often restricted to protein-coding RNA, so the full diversity of the transcriptome is overlooked. Non-coding RNA molecules vary widely in their biogenesis, structure and function, which makes their unbiased inclusion in the analysis difficult. This doctoral work focuses on understanding non-coding RNA and navigates through three successive pillars of the analysis that turn observations into knowledge: data generation, quantification and interpretation. The three chapters of the dissertation address these pillars from a bioinformatics perspective, describing key challenges and presenting new solutions to improve whole-transcriptome analysis with NGS techniques. First, a fully automatic algorithm is presented that uses unsupervised learning to detect the various sources of artifacts arising from library preparation, which can then be used to optimize protocols for preparing total-RNA-seq libraries. Next, the primary challenges of total-RNA-seq quantification are addressed: processing reads that can be assigned to several, possibly overlapping loci, and the fact that some loci occur multiple times in the genome, so that a read can match all of them. Both cases can also occur at the same time, which complicates the analysis of non-coding RNA with standard methods. To tackle this problem, a new software tool named Multi-Graph count (MGcount) is presented. It assigns reads to transcripts hierarchically in order to account, among other things, for the discrepancy in locus length between small and long RNA. 
When reads consistently multi-map, MGcount groups loci into communities. Assessing expression at the community level is shown to enable more accurate quantification of biologically meaningful RNA entities (single transcripts or locus families). Finally, MGcount is applied to analyze non-coding RNA during the differentiation of induced pluripotent stem cells into the germ layers mesoderm, endoderm and ectoderm. In this dissertation, a multi-omics analysis is successfully applied both to characterize the expression trajectories of different RNA biotypes during cell commitment and to relate them to chromatin remodeling and DNA methylation at the respective loci. Ultimately, this dissertation serves as a guide for all researchers who want to gain new insights into the non-coding transcriptome. / Next-generation sequencing (NGS) techniques enable the molecular profiling of cells with unprecedented high throughput. Yet, in transcriptome analysis, the focus is often restricted to protein-coding RNA, overlooking the transcriptome in its entire diversity. Non-coding RNA molecules vary widely in biogenesis, structure and function, which challenges their unbiased inclusion in analyses. This doctoral research places the understanding of non-coding RNA at its center and navigates through the three workflow pillars that must align effectively to turn observations into knowledge: data generation, quantification, and interpretation. Throughout three chapters, this thesis addresses these pillars from a bioinformatics perspective, by outlining key challenges and introducing novel solutions to improve whole-transcriptome analysis through NGS techniques. 
First, we introduce a fully automatic algorithm that identifies sources of library preparation artifacts in an unsupervised manner, and we demonstrate its utility in the development and optimization of total-RNA-seq library preparation protocols. Secondly, we address a major challenge in total-RNA-seq quantification: processing reads that align to multiple loci that overlap within the same genomic region and/or to multiple loci that are present in high copy numbers. Such ambiguous alignments commonly arise due to the inherent characteristics of non-coding RNA. To tackle this, we introduce a novel software tool, named Multi-Graph count (MGcount), that hierarchically assigns reads to transcripts to account for the disparity in locus length between small RNA and long RNA, and subsequently collapses loci where reads consistently multi-map into communities defined in a data-driven fashion. We show that these cohesive communities allow the quantification of biologically meaningful RNA entities (single transcripts or locus families) and estimate their abundance more accurately. Finally, we apply the developed method to investigate non-coding RNA in early development, specifically during the differentiation of induced pluripotent stem cells into the three germ-layer lineages, namely mesoderm, endoderm, and ectoderm. In this study, we leverage a multi-omics analysis to characterize the expression trajectories of diverse RNA biotypes along cell commitment and their interplay with chromatin remodeling and DNA methylation patterns in the locus surroundings. Ultimately, this work is intended to serve as a guide for all those who want to gain new insights from the non-coding transcriptome.
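The idea of collapsing loci that share multi-mapping reads into communities can be illustrated with a minimal Python sketch. It is not MGcount's implementation: the function names, the `min_shared` threshold, and the use of plain connected components in place of the software's own community detection are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def collapse_into_communities(read_hits, min_shared=2):
    """Group loci that repeatedly share multi-mapping reads.

    read_hits: dict mapping a read ID to the set of loci it aligns to.
    min_shared: hypothetical threshold; two loci are linked only if at
        least this many reads align to both of them.
    Returns a list of sets of loci (singletons for unlinked loci).
    """
    # Count how many reads each pair of loci has in common.
    shared = defaultdict(int)
    loci = set()
    for hits in read_hits.values():
        loci.update(hits)
        for a, b in combinations(sorted(hits), 2):
            shared[(a, b)] += 1

    # Build an undirected graph from sufficiently supported pairs.
    neighbours = {locus: set() for locus in loci}
    for (a, b), n in shared.items():
        if n >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components stand in for data-driven communities
    # (an assumption; a real tool may use a different method).
    communities, seen = [], set()
    for start in sorted(loci):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            locus = stack.pop()
            if locus in component:
                continue
            component.add(locus)
            stack.extend(neighbours[locus] - component)
        seen |= component
        communities.append(component)
    return communities


def count_per_community(read_hits, communities):
    """Count a read once for a community if all its hits fall inside it."""
    owner = {locus: i for i, c in enumerate(communities) for locus in c}
    counts = defaultdict(int)
    for hits in read_hits.values():
        owners = {owner[locus] for locus in hits}
        if len(owners) == 1:  # unambiguous at the community level
            counts[owners.pop()] += 1
    return dict(counts)
```

In this toy version, a read that hits two copies of the same locus family is ambiguous at the locus level but counts unambiguously once those copies are grouped, which is the intuition behind quantifying at the community level.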
|
62 |
CoenzymeQ10-associated gene mutations in South African patients with respiratory chain deficiencies / Jonck, Lindi-Maryn January 2015
CoenzymeQ10 (CoQ10) functions as an electron carrier in mitochondria, transporting electrons from CI and CII to CIII in the respiratory chain (RC) for normal cellular energy (ATP) production. Mutations in genes of the complicated and not yet well understood CoQ10 biosynthesis pathway cause primary CoQ10 deficiency, a rare autosomal recessive mitochondrial disorder (MD) with diverse, heterogeneous clinical phenotypes. Although the major function of CoQ10 is to serve as an electron transfer molecule, it furthermore possesses multiple metabolic functions, which can result in secondary CoQ10 deficiency. Five main clinical phenotypes are associated with CoQ10 deficiency, although distinct genotype-phenotype associations are still absent because most reported CoQ10 deficiency cases lack a molecular genetic diagnosis. A correlation was found between reduced levels of CoQ10 in muscle tissue and deficient CII + III RC enzyme activities in a South African patient cohort; these are the current indicators for potential CoQ10 deficiency. The aim of the study was therefore to identify nuclear-encoded mutations in genes associated with CoQ10 deficiencies in a cohort of South African patients diagnosed with respiratory chain deficiencies (RCDs). A high-throughput target enrichment strategy was used to identify previously reported and/or possibly novel CoQ10-associated disease-causing variants using Ion Torrent next-generation sequencing (NGS) and an in-house developed bioinformatics pipeline. The data obtained were compared to the clinical presentations of the patients to interpret the identified variants considered to be possibly pathogenic. Targeted genes associated with primary and secondary CoQ10 deficiency were successfully sequenced in 24 patients, identifying 16 possible disease-causing variants. 
Of these variants, three compound heterozygous variants were identified in three patients in the genes ETFDH, COQ6 and COQ7, and were considered to be pathogenic according to the available data. Further validation of these three variants supported the pathogenicity of at least two of them (ETFDH and COQ6). In conclusion, this study contributed to a better understanding of the aetiology of a South African cohort of patients diagnosed with MDs. It also highlighted the valuable role of NGS for such investigations, and furthermore identified areas in the biochemical and molecular diagnostic strategy where improvements could be made in the future. / MSc (Biochemistry), North-West University, Potchefstroom Campus, 2015
|
64 |
From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes / Clarke, Andrew, Prost, Stefan, Stanton, Jo-Ann, White, W. T., Kaplan, Matthew, Matisoo-Smith, Elizabeth, The Genographic Consortium January 2014
BACKGROUND: Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. RESULTS: Here we present an 'A to Z' protocol for obtaining complete human mitochondrial (mtDNA) genomes - from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). CONCLUSIONS: All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual 'modules' can be swapped out to suit available resources.
|
65 |
The genomic signatures of adaptive evolution in Populus / Wang, Jing January 2016
Understanding the genetic basis of adaptive evolution, and how natural selection has shaped patterns of polymorphism and divergence within and between species are enduring goals of evolutionary genetics. In this thesis, I used whole genome re-sequencing data to characterize the genomic signatures of natural selection along different evolutionary timescales in three Populus species: Populus tremula, P. tremuloides and P. trichocarpa. First, our study shows multiple lines of evidence suggesting that natural selection, due to both positive and purifying selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. Second, we characterize the evolution of genomic divergence patterns between two recently diverged aspen species: P. tremula and P. tremuloides. Our findings indicate that the two species diverged ~2.2-3.1 million years ago, coinciding with the severing of the Bering land bridge and the onset of dramatic climatic oscillations during the Pleistocene. We further explore different mechanisms that may explain the heterogeneity of genomic divergence, and find that variation in linked selection and recombination likely plays a key role in generating the heterogeneous genomic landscape of differentiation between the two aspen species. Third, we link whole-genome polymorphic data with local environmental variables and phenotypic variation in an adaptive trait to investigate the genomic basis of local adaptation in P. tremula along a latitudinal gradient across Sweden. We find that a majority of single nucleotide polymorphisms (SNPs) (>90%) identified as being involved in local adaptation are tightly clustered in a single genomic region on chromosome 10. 
The signatures of selection at this region are more consistent with soft rather than hard selective sweeps, where multiple adaptive haplotypes derived from standing genetic variation sweep through the populations simultaneously, and where different haplotypes rise to high frequency in different latitudinal regions. In summary, this thesis uses phylogenetic comparative approaches to elucidate how various evolutionary forces have shaped genome-wide patterns of sequence evolution in Populus. / The research in this thesis was supported by the Swedish Research Council (to Pär K. Ingvarsson) and the JC Kempe Memorial Scholarship Foundation (to Jing Wang). The PhD study of Jing Wang in Sweden was funded by the State Scholarship from the China Scholarship Council.
|
66 |
Tagging systems for sequencing large cohorts / Neiman, Mårten January 2010
Advances in sequencing technologies constantly improve the throughput and accuracy of sequencing instruments. Together with this development come new demands and opportunities to take full advantage of the massive amounts of data produced within a sequencing run. One way of doing this is by analyzing a large set of samples in parallel by pooling them together prior to sequencing and associating the reads to the corresponding samples using DNA sequence tags. Amplicon sequencing is a common application for this technique, enabling ultra-deep sequencing and identification of rare allelic variants. However, a common problem for amplicon sequencing projects is the formation of unspecific PCR products and primer dimers occupying large portions of the data sets.
This thesis is based on two papers exploring these new kinds of possibilities and issues. In the first paper, a method for including thousands of samples in the same sequencing run without dramatically increasing the cost or sample handling complexity is presented. The second paper presents how the amount of high-quality data from an amplicon sequencing run can be maximized.
The findings from the first paper show that a two-tagging system, where the first tag is introduced by PCR and the second tag is introduced by ligation, can be used to effectively sequence a cohort of 3500 samples using the 454 GS FLX Titanium chemistry. The tagging procedure allows for simple and easily scalable sample handling during sequence library preparation. The tags introduced by the first PCR, which are present at both ends of the fragments, enable detection of chimera formation and hence avoid false typing in the data set.
In the second paper, a FACS machine is used to sort and enrich target-DNA-covered emPCR beads. This is facilitated by tagging quality beads using hybridization of a fluorescently labeled target-specific DNA probe prior to sorting. The system was evaluated by sequencing two amplicon libraries, one FACS sorted and one standard enriched, on the 454, showing a three-fold increase in quality data obtained. / QC20100907
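The two-tag assignment described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the read layout (ligation tag, then PCR tag, then insert, then a second copy of the PCR tag in the same orientation), the tag lengths, and the names `assign_sample` and `sample_sheet` are hypothetical and not taken from the thesis.

```python
def assign_sample(read, sample_sheet, pcr_len=6, lig_len=4):
    """Assign a read to a sample from its two tags, or return None.

    Assumed read layout (hypothetical, for illustration only):
        ligation tag | PCR tag | insert | PCR tag
    sample_sheet maps (pcr_tag, ligation_tag) -> sample name.
    """
    if len(read) < lig_len + 2 * pcr_len:
        return None  # too short to carry all three tags
    lig_tag = read[:lig_len]
    pcr_5 = read[lig_len:lig_len + pcr_len]
    pcr_3 = read[-pcr_len:]
    # Different PCR tags on the two ends suggest a chimeric fragment,
    # so the read is discarded rather than risk a false typing.
    if pcr_5 != pcr_3:
        return None
    return sample_sheet.get((pcr_5, lig_tag))


def demultiplex(reads, sample_sheet):
    """Count reads per sample; unassigned reads are tallied under None."""
    counts = {}
    for read in reads:
        sample = assign_sample(read, sample_sheet)
        counts[sample] = counts.get(sample, 0) + 1
    return counts
```

Combining two short tags multiplies the number of addressable samples: with, say, 96 PCR tags and 40 ligation tags, 3840 tag pairs are available, which is the kind of scaling that makes a cohort of several thousand samples fit in one run.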
|
67 |
Rule-Based Approaches for Large Biological Datasets Analysis : A Suite of Tools and Methods / Kruczyk, Marcin January 2013
This thesis is about new and improved computational methods to analyze complex biological data produced by advanced biotechnologies. Such data are not only very large but also characterized by very high numbers of features. To address these needs, we developed a set of methods and tools suitable for analyzing large data sets, including next-generation sequencing data, and built transparent models that can be interpreted by researchers who are not necessarily experts in computing. We focused on brain-related diseases. The first aim of the thesis was to employ the meta-server approach to finding peaks in ChIP-seq data. Taking existing peak finders, we created an algorithm that produces consensus results better than those of any single peak finder. The second aim was to use supervised machine learning to identify features that are significant in the predictive diagnosis of Alzheimer's disease in patients with mild cognitive impairment. This experience led to the development of a better feature selection method for rough sets, a machine learning method. The third aim was to deepen the understanding of the role that the STAT3 transcription factor plays in gliomas. Interestingly, we found that STAT3, in addition to being an activator, is also a repressor in certain rat and human glioma models. This was achieved by analyzing STAT3 binding sites in combination with epigenetic marks. STAT3 regulation was determined using expression data from untreated cells and cells after JAK2/STAT3 inhibition. The four papers constituting the thesis are preceded by an exposition of the biological, biotechnological and computational background that provides foundations for the papers. Overall, the results of this thesis bear witness to the mutually beneficial relationship between bioinformatics, the modern life sciences and computer science.
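One simple way to combine several peak finders into a consensus is a vote over genomic positions. The Python sketch below shows that idea only; the abstract does not specify the thesis's algorithm, so the majority-vote rule, the function name `consensus_peaks` and the `min_callers` parameter are assumptions for illustration.

```python
def consensus_peaks(peak_sets, min_callers=2):
    """Return regions covered by at least `min_callers` peak finders.

    peak_sets: one list per peak finder, each holding (start, end)
        half-open intervals on a single chromosome. Intervals from the
        same finder are assumed not to overlap each other.
    """
    # Turn every interval into a +1 event at its start and a -1 event
    # at its end, then sweep from left to right.
    events = []
    for peaks in peak_sets:
        for start, end in peaks:
            events.append((start, 1))
            events.append((end, -1))
    # Tuples sort by position, then by delta, so at equal positions
    # interval ends (-1) are processed before interval starts (+1).
    events.sort()

    consensus, depth, region_start = [], 0, None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_callers <= depth:
            region_start = pos  # support just reached the threshold
        elif depth < min_callers <= previous:
            if pos > region_start:
                consensus.append((region_start, pos))
    return consensus
```

Because interval ends are handled before starts at the same coordinate, two consensus regions that merely touch are reported separately rather than merged; a real pipeline might prefer to join them.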
|
68 |
Genome Evolution of Neurospora tetrasperma / Sun, Yu January 2013
In this thesis work, I have used a comparative genomics approach to study a fungal model organism, Neurospora tetrasperma. My specific focus has been on genomic introgression, intron evolution, chromosomal structural rearrangements and codon usage. All of the studies are based on large-scale datasets generated by next-generation sequencing (NGS) technology, combined with other techniques, such as Optical Mapping. In the introgression study, we detected large-scale introgression tracts in three N. tetrasperma lineages, and the introgression showed allele-specific and chromosome-specific patterns. In the study of introns, we found indications of mRNA-mediated intron loss and non-homologous end joining (NHEJ) mediated intron gain in N. tetrasperma. We found that selection is involved in shaping intron gains and losses, and is associated with intron position, intron phase and GC content. In the study of chromosomal structural rearrangements, we found a lineage-specific chromosomal inversion pattern in N. tetrasperma, which indicates that inversions are unlikely to be associated with the origin of the suppressed recombination and the mating system transition in N. tetrasperma. The result suggests inversions are the consequences, rather than the causes, of suppressed recombination on the mating-type chromosome of N. tetrasperma. In the final study, analyses of codon usage indicated that the region of suppressed recombination in N. tetrasperma is subject to genomic degeneration, and that selection efficiency has been much reduced in this region.
|
69 |
Identification, Validation and Characterization of the Mutation on Chromosome 18p which is Responsible for Causing Myoclonus-Dystonia / Vanstone, Megan 02 November 2012
Myoclonus-Dystonia (MD) is an inherited, rare, autosomal dominant movement disorder characterized by quick, involuntary muscle jerking or twitching (myoclonus) and involuntary muscle contractions that cause twisting and pulling movements, resulting in abnormal postures (dystonia). The first MD locus was mapped to 7q21-q31 and called DYT11; this locus corresponds to the SGCE gene. Our group previously identified a second MD locus (DYT15) which maps to a 3.18 Mb region on 18p11. Two patients were chosen to undergo next-generation sequencing, which identified 2,292 shared novel variants within the critical region. Analysis of these variants revealed a 3 bp duplication in a transcript referred to as CD108131, which is believed to be a long non-coding RNA. Characterization of this transcript determined that it is 863 bp in size, it is ubiquitously expressed, with high expression in the cerebellum, and it accounts for ~3% of MD cases.
|
70 |
Next generation sequencing identifies ‘interactome’ signatures in relapsed and refractory metastatic colorectal cancer / Johnson, Benny, Cooke, Laurence, Mahadevan, Daruka 02 1900
Background: In the management of metastatic colorectal cancer (mCRC), KRAS, NRAS and BRAF mutational status individualizes therapeutic options and identifies a cohort of patients (pts) with an aggressive clinical course. We hypothesized that relapsed and refractory mCRC pts develop unique mutational signatures that may guide therapy, predict response and highlight key signaling pathways important for clinical decision making. Methods: Relapsed and refractory mCRC pts (N=32) were molecularly profiled utilizing commercially available next generation sequencing (NGS) platforms. Web-based bioinformatics tools (Reactome/Enrichr) were utilized to elucidate mutational profile linked pathways-networks that have the potential to guide therapy. Results: Pts had progressed on fluoropyrimidines, oxaliplatin, irinotecan, bevacizumab, cetuximab and/or panitumumab. The most common histology was adenocarcinoma (colon N=29; rectal N=3). Of the mutations, TP53 was the most common, followed by APC, KRAS, PIK3CA, BRAF, SMAD4, SPTA1, FAT1, PDGFRA, ATM, ROS1, ALK, CDKN2A, FBXW7, TGFBR2, NOTCH1 and HER3. Pts had on average >= 5 unique mutations. The most frequently activated signaling pathways were: HER2, fibroblast growth factor receptor (FGFR), p38 through BRAF-MEK cascade via RIT and RIN, ARMS-mediated activation of MAPK cascade, and VEGFR2. Conclusions: Dominant driver oncogene mutations do not always equate to oncogenic dependence, hence understanding pathogenic 'interactome(s)' in individual pts is key both to identifying clinically relevant targets and to choosing the next best therapy. Mutational signatures derived from corresponding 'pathway-networks' represent a meaningful tool to (I) evaluate functional investigation in the laboratory; (II) predict response to drug therapy; and (III) guide rational drug combinations in relapsed and refractory mCRC pts.
|