  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Integrating regulatory and methylome data for the discovery of clear cell Renal Cell Carcinoma (ccRCC) variants

Calvert-Joshua, Tracey January 2015
Magister Scientiae - MSc / Kidney cancers, of which clear cell renal cell carcinoma comprises an estimated 70%, rank among the ten most common cancers in both males and females. With a mortality rate that exceeds 40%, kidney cancer is considered the most lethal cancer of the genitourinary system. Despite advances in its treatment, the mortality and incidence rates across all stages of the disease have continued to climb. Since the release of the Human Genome Project in the early 2000s, most genetics studies have focused on the protein-coding region of the human genome, which accounts for a mere 2% of the entire genome. It has been suggested that diverting our focus to the other 98% of the genome, which was previously dismissed as non-functional “junk DNA”, could contribute significantly to our understanding of the underlying mechanisms of complex diseases. In this study a whole genome sequencing somatic mutation data set from the International Cancer Genome Consortium was used. The non-coding somatic mutations within the promoter, intronic, 5-prime untranslated and 3-prime untranslated regions of clear cell renal cell carcinoma-implicated genes were extracted and submitted to RegulomeDB for functional annotation. As expected, most of the variants were located within the intronic regions and only a small subset of identified variants was predicted to be deleterious. Although the variants all belonged to a selected subset of kidney cancer-associated genes, the genes frequently mutated in the non-coding regions were not the same genes that were frequently mutated in the whole exome studies (where the focus is on the coding sequences). This indicates that whole genome sequencing studies could identify a new set of genes/variants previously unassociated with clear cell renal cell carcinoma. In addition, most of the non-coding somatic variants fell within multiple transcription factor binding sites.
Since many of these variants were also deleterious (as predicted by RegulomeDB), this suggests that mutations in the non-coding regions could contribute to disease by disrupting transcription factor binding sites and thereby affecting transcriptional regulation. The substantial overlap between the genes with the most aberrantly methylated variants and the genes with the most transcription factor binding site disruptions signifies a potential link between differential methylation and transcription factor binding site affinities. In contrast to the increased DNA methylation generally seen in promoter methylation studies, all of the significant hits in this study were hypomethylated, with subsequent up-regulation of the genes of interest, suggesting that in clear cell renal cell carcinoma aberrant methylation may play a role in activating proto-oncogenes rather than in silencing genes. When a cross-analysis was carried out between the gene expression patterns and the transcription factor binding site disruptions, the non-coding somatic variants and the differential methylation profiles, the genes affected again showed a clear overlap. Interestingly, most of the variants were not present in the 1000 Genomes data and thus represent novel mutations, which possibly occurred as a result of genomic instability. Identifying novel variants is nevertheless promising, since they raise the possibility of developing new ways to target diseases. The numerous detrimental effects a single non-coding mutation can have on other genomic processes have been demonstrated in this study and therefore validate the inclusion of non-coding regions of the genome in genetic studies of complex multifactorial diseases. / National Research Foundation (NRF) and DAAD
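The extraction step described in this abstract (assigning each somatic variant to a promoter, intronic, 5-prime or 3-prime untranslated region) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the region labels and the coordinates are hypothetical, and the thesis does not state how the assignment was implemented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical region record: (label, start, end), 1-based inclusive coordinates.
Region = Tuple[str, int, int]


def annotate_variant(pos: int, regions: List[Region]) -> Optional[str]:
    """Return the label of the first region containing pos, or None."""
    for label, start, end in regions:
        if start <= pos <= end:
            return label
    return None


def count_by_region(positions: List[int], regions: List[Region]) -> Dict[str, int]:
    """Tally variant positions per region; unmatched positions go to 'other'."""
    counts: Dict[str, int] = {}
    for pos in positions:
        label = annotate_variant(pos, regions) or "other"
        counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up coordinates for a single illustrative gene.
example_regions: List[Region] = [
    ("promoter", 1, 100),
    ("5'UTR", 101, 150),
    ("intron", 151, 900),
    ("3'UTR", 901, 1000),
]
example_counts = count_by_region([50, 200, 300, 950, 2000], example_regions)
```

A linear scan is enough for a handful of genes; a genome-wide analysis would more plausibly use an interval index or an existing annotation tool.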
2

Differential evolution of non-coding DNA across eukaryotes and its close relationship with complex multicellularity on Earth

Lozada Chávez, Irma 06 April 2023
Here, I elaborate on the hypothesis that complex multicellularity (CM, sensu Knoll) is a major evolutionary transition (sensu Szathmary), which has convergently evolved a few times in Eukarya only: within red and brown algae, plants, animals, and fungi. Paradoxically, CM seems to correlate with the expansion of non-coding DNA (ncDNA) in the genome rather than with genome size or the total number of genes. Thus, I investigated the correlation between genome and organismal complexities across 461 eukaryotes under a phylogenetically controlled framework. To that end, I introduce the first formal definitions and criteria to distinguish ‘unicellularity’, ‘simple’ (SM) and ‘complex’ multicellularity. Rather than using the limited available estimations of unique cell types, the 461 species were classified according to our criteria by reviewing their life cycle and body plan development from the literature. Then, I investigated the evolutionary association between genome size and 35 genome-wide features (introns and exons from protein-coding genes, repeats and intergenic regions) describing the coding and ncDNA complexities of the 461 genomes. To that end, I developed ‘GenomeContent’, a program that systematically retrieves massive multidimensional datasets from gene annotations and calculates over 100 genome-wide statistics. R scripts coupled to parallel computing were created to calculate >260,000 phylogenetically controlled pairwise correlations. As previously reported, both repetitive and non-repetitive DNA are found to scale strongly and positively with genome size across most eukaryotic lineages. In contrast to previous studies, I demonstrate that changes in the length and repeat composition of introns are only weakly or moderately associated with changes in genome size at the global phylogenetic scale, while changes in intron abundance (within and across genes) are either not or only very weakly associated with changes in genome size.
Our evolutionary correlations are robust to: different phylogenetic regression methods, uncertainties in the tree of eukaryotes, variations in genome size estimates, and randomly reduced datasets. Then, I investigated the correlation between the 35 genome-wide features and the cellular complexity of the 461 eukaryotes with phylogenetic Principal Component Analyses. Our results endorse a genetic distinction between SM and CM in Archaeplastida and Metazoa, but not so clearly in Fungi. Remarkably, complex multicellular organisms and their closest ancestral relatives are characterized by high intron-richness, regardless of genome size. Finally, I argue why and how a vast expansion of non-coding RNA (ncRNA) regulators rather than of novel protein regulators can promote the emergence of CM in Eukarya. As a proof of concept, I co-developed a novel ‘ceRNA-motif pipeline’ for the prediction of “competing endogenous” ncRNAs (ceRNAs) that regulate microRNAs in plants. We identified three candidate ceRNA motifs: MIM166, MIM171 and MIM159/319, which were found to be conserved across land plants and to be potentially involved in diverse developmental processes and stress responses. Collectively, the findings of this dissertation support our hypothesis that CM on Earth is a major evolutionary transition promoted by the expansion of two major ncDNA classes, introns and regulatory ncRNAs, which might have boosted the irreversible commitment of cell types in certain lineages by canalizing the timing and kinetics of the eukaryotic transcriptome.

Table of contents:
Cover page
Abstract
Acknowledgements
Index
1. The structure of this thesis
  1.1. Structure of this PhD dissertation
  1.2. Publications of this PhD dissertation
  1.3. Computational infrastructure and resources
  1.4. Disclosure of financial support and information use
  1.5. Acknowledgements
  1.6. Author contributions and use of impersonal and personal pronouns
2. Biological background
  2.1. The complexity of the eukaryotic genome
  2.2. The problem of counting and defining “genes” in eukaryotes
  2.3. The “function” concept for genes and “dark matter”
  2.4. Increases of organismal complexity on Earth through multicellularity
  2.5. Multicellularity is a “fitness transition” in individuality
  2.6. The complexity of cell differentiation in multicellularity
3. Technical background
  3.1. The Phylogenetic Comparative Method (PCM)
  3.2. RNA secondary structure prediction
  3.3. Some standards for genome and gene annotation
4. What is in a eukaryotic genome? GenomeContent provides a good answer
  4.1. Background
  4.2. Motivation: an interoperable tool for data retrieval of gene annotations
  4.3. Methods
  4.4. Results
  4.5. Discussion
5. The evolutionary correlation between genome size and ncDNA
  5.1. Background
  5.2. Motivation: estimating the relationship between genome size and ncDNA
  5.3. Methods
  5.4. Results
  5.5. Discussion
6. The relationship between non-coding DNA and Complex Multicellularity
  6.1. Background
  6.2. Motivation: How to define and measure complex multicellularity across eukaryotes?
  6.3. Methods
  6.4. Results
  6.5. Discussion
7. The ceRNA motif pipeline: regulation of microRNAs by target mimics
  7.1. Background
  7.2. A revisited protocol for the computational analysis of Target Mimics
  7.3. Motivation: a novel pipeline for ceRNA motif discovery
  7.4. Methods
  7.5. Results
  7.6. Discussion
8. Conclusions and outlook
  8.1. Contributions and lessons for the bioinformatics of large-scale comparative analyses
  8.2. Intron features are evolutionarily decoupled among themselves and from genome size throughout Eukarya
  8.3. “Complex multicellularity” is a major evolutionary transition
  8.4. Role of RNA throughout the evolution of life and complex multicellularity on Earth
9. Supplementary Data
Bibliography
Curriculum Scientiae
Selbständigkeitserklärung (declaration of authorship)
3

Birds as a Model for Comparative Genomic Studies

Künstner, Axel January 2011
Comparative genomics provides a tool to investigate large biological datasets, in particular genomic datasets. In my thesis I focused on inferring patterns of selection in coding and non-coding regions of avian genomes. Until recently, large comparative studies on selection were mainly restricted to model species with sequenced genomes. This limitation has been overcome with advances in sequencing technologies and it is now possible to gather large genomic data sets for non-model species. Next-generation sequencing data were used to study patterns of nucleotide substitutions, and from these we inferred how selection has acted in the genomes of 10 non-model bird species. In general, we found evidence for a negative correlation between neutral substitution rate and chromosome size in birds. In a follow-up study, we investigated two closely related bird species to study expression levels in different tissues and patterns of selection. We found that between 2% and 18% of all genes were differentially expressed between the two species. We showed that non-coding regions adjacent to genes are under evolutionary constraint in birds, which suggests that non-coding DNA plays an important functional role in the genome. Regions downstream of genes (3’) showed a particularly high level of constraint. The level of constraint in these regions was not correlated with the length of untranslated regions, which suggests that other causes also play a role in sequence conservation. We compared the rate of nonsynonymous substitutions to the rate of synonymous substitutions in order to infer levels of selection in protein-coding sequences. Synonymous substitutions are often assumed to evolve neutrally. We studied synonymous substitutions by estimating constraint on 4-fold degenerate sites of avian genes and found significant evolutionary constraint on this category of sites (between 24% and 43%).
These results call for a reappraisal of synonymous substitution rates being used as neutral standards in molecular evolutionary analysis (e.g. the dN/dS ratio to infer positive selection). Finally, the problem of sequencing errors in next-generation sequencing data was investigated. We developed a program that removes erroneous bases from the reads. We showed that low coverage sequencing projects and large genome sequencing projects will especially gain from trimming erroneous reads.
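The abstract does not say how the program identifies and removes erroneous bases. As a minimal sketch of one common approach, the Python below trims low-quality bases from the 3' end of a read using Phred scores. The function names, the Phred+33 encoding and the threshold of 20 are assumptions for illustration, not details of the author's program.

```python
from typing import List, Tuple


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq: str, quality_string: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q.

    Hypothetical helper: a simple threshold rule, not the thesis's algorithm.
    """
    scores = phred_scores(quality_string)
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality_string[:end]


# 'I' encodes Q40, '+' encodes Q10 and '&' encodes Q5 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTAC", "IIII+&", min_q=20)
```

A threshold rule like this only removes low-quality tails; sliding-window or error-probability methods handle isolated low-quality bases inside a read differently.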
4

Mutational dynamics and phylogenetic utility of plastid introns and spacers in early branching eudicots

Barniske, Anna-Magdalena 22 January 2010
Major progress has been made during the last twenty years towards a better understanding of the evolution of angiosperms. Early molecular-phylogenetic analyses revealed three major groups, with eudicots as well as monocots being monophyletic and having arisen from a paraphyletic group of dicotyledonous angiosperms (= basal angiosperms). Numerous phylogenetic studies based on sequence data have consistently recovered the eudicot clade and increased confidence in its existence. Furthermore, this clade, which contains about 75% of angiosperm species diversity, is characterized by the possession of tricolpate and tricolpate-derived pollen and has thus also been called the tricolpate clade. Based on molecular-phylogenetic investigations, several lineages, such as Ranunculales, Proteales (= Proteaceae, Nelumbonaceae, Platanaceae), Sabiaceae, Buxaceae plus Didymelaceae, and Trochodendraceae plus Tetracentraceae, were shown to belong to an early-diverging grade (early-diverging or “basal” eudicots), while larger groups like asterids, Caryophyllales, rosids, Santalales, and Saxifragales were identified as members of a highly supported core clade, the so-called “core eudicots”. Nevertheless, phylogenetic relationships among several lineages of the eudicots remained difficult to resolve. This thesis concentrates mainly on fully resolving the branching order among the different clades of the early-diverging eudicots, as well as on clarifying phylogenetic and systematic conditions within several lineages, based on phylogenetic reconstructions using sequence data of rapidly evolving and non-coding molecular regions, such as spacers and introns. Fast-evolving, non-coding DNA has commonly been used only to infer relationships among species and genera, as practised in chapter 3, because it was assumed to be inapplicable at deeper levels owing to putatively high levels of homoplasy from multiple substitutions and frequent microstructural changes that result in non-alignability.
However, during the last few years numerous molecular-phylogenetic studies were able to present well-resolved angiosperm trees on the basis of rapidly evolving and non-coding regions from the large single copy region of the chloroplast genome, comparable to multi-gene analyses in topology and statistical support. Mutational dynamics in spacers and introns was revealed to follow complex patterns related to structural constraints such as the introns' secondary structure. Extreme sequence variability was therefore always confined to mutational hotspots that could be excluded from calculations. Moreover, it became clear that combining these non-coding regions with the fast-evolving matK gene can lead to further resolved and statistically supported trees. Chapter 1 deals with the placement of Sabiales inside the early-diverging eudicot grade, while investigating mutational dynamics as well as the utility of different kinds of non-coding and rapidly evolving DNA within deep-level phylogenetics. This was done by analyzing a combination of nine regions from the large single copy region of the chloroplast genome, including spacers, the sole group I intron, three group II introns and the coding matK, for a sampling of 56 taxa. The presented topology is mainly in congruence with the hypothesis on phylogenetic relationships among early-branching eudicots that was gained through the application of a reduced set of five non-coding and fast-evolving molecular markers, including the plastid petD (petB-petD spacer, petD group II intron) plus the trnL-F (trnL group I intron, trnL-F spacer) region and the matK gene. That hypothesis showed a grade of Ranunculales, Sabiales, Proteales, Trochodendrales and Buxales. The current study differs in showing Sabiales as sister to Proteales in all phylogenetic analyses, in contrast to a second-branching position within the early-diverging eudicots and to a Bayesian tree displaying Sabiales branching after Proteales.
All three hypotheses were tested with respect to their likelihood. None of them could be significantly rejected. Thus, although the number of characters and informative sites was doubled in comparison to the five-region investigation, the exact position of the Sabiales remains to be resolved with confidence. However, the advanced analyses of the phylogenetic structure of the three different non-coding partitions in comparison to coding genes resulted in the recognition of a significantly higher mean phylogenetic signal per informative character within spacers and introns than in the frequently applied, slowly evolving rbcL gene. The fast-evolving and well-performing matK gene is shown to be nested within the non-coding partitions in this respect. Interestingly, the least constrained spacers displayed considerably less phylogenetic structure than both the group I intron and the group II introns. Molecular evolution is again shown to follow certain patterns in angiosperms, as indicated by the occurrence of mutational hotspots and their connection to structural and functional constraints. This is especially evident for the group II introns studied, where highly dynamic sequence parts were found in loops rather than stems. The aim of chapter 2 was to present a comprehensive reconstruction of the phylogenetic relationships inside the order Ranunculales, the first-branching clade of the early-diverging eudicots, with an emphasis on the evolution of growth forms within the group. Currently, the order comprises seven families (Ranunculaceae, Berberidaceae, Menispermaceae, Lardizabalaceae, Circaeasteraceae, Eupteleaceae, Papaveraceae; Circaeasteraceae was not included owing to a lack of plant material) containing predominantly herbaceous groups as well as trees and lianescent/shrubby forms. A surprising result that emerged from the increased use of molecular data within systematics during the last twenty years is the inclusion of the woody Eupteleaceae in Ranunculales.
Because of its adaptation to wind pollination it was previously placed next to Hamamelididae. Although phylogenetic hypotheses agreed in the exclusion of Eupteleaceae and the predominantly herbaceous Papaveraceae from a core clade, the branching order within early-diverging Ranunculales remained a question to be answered. Thus phylogenetic reconstructions based on molecular data of 50 taxa (including outgroup), applying the well-performing non-coding petD and trnL-F as well as the trnK/matK-psbA region including the coding matK, were carried out. The comprehensive sampling resulted in fully resolved and highly supported phylogenies in both maximum parsimony and model-based approaches, with family relations within the core clade being identical and Euptelea appearing as the first-branching lineage. However, the relationships among the early-diverging Ranunculales could not be resolved with confidence, a result in line with the finding made in chapter 1. The topology was further resolved with Lardizabalaceae being sister to the remaining members of the order, followed by Menispermaceae, Berberidaceae and Ranunculaceae, the latter two sharing a sister-group relationship. Inside the mainly lianescent Lardizabalaceae the shrubby Decaisnea was clearly depicted as first-branching. The systematically controversial Glaucidium and Hydrastis are shown to be early-diverging members of the Ranunculaceae. A central goal of chapter 3 was to test phylogenetic relationships among the members of the ranunculaceous tribe Anemoneae. Currently it consists of the subtribes Anemoninae, including Anemone, Hepatica, Pulsatilla and Knowltonia, and Clematidinae, consisting of Archiclematis, Clematis and Naravelia. Furthermore, the position and taxonomic rank of several lineages inside the subtribe Anemoninae were examined.
Since recent comprehensive molecular-phylogenetic investigations have been carried out for the members of Clematidinae or Anemoninae, 63 species representing all major lineages of the two subtribes were included in the analyses. Calculations were carried out on the basis of molecular data of the nuclear ribosomal ITS1&2 and the plastid atpB-rbcL intergenic spacer region. Phylogenetic reconstructions resulted in the recognition of two distinct clades within the tribe, thus corroborating the formation of the two subtribes. Within the subtribe Anemoninae the traditional genera Knowltonia, Pulsatilla and Hepatica are confidently shown to be nested within the genus Anemone. The preliminary classification of the genus, currently consisting of the two subgenera Anemone and Anemonidium, is complemented by the subgenus Hepatica.
5

Mutational dynamics and phylogenetic utility of plastid introns and spacers in early branching eudicots

Barniske, Anna-Magdalena 16 December 2009
