1 |
Exploration of the Gossypium raimondii Genome Using Bionano Genomics Physical Mapping Technology
Hanson, Christopher Jon (01 June 2018)
Cotton is a crop with a large global economic impact and a large, complex genome. Most industrial cotton production comes from two tetraploid species (Gossypium hirsutum L. and Gossypium barbadense L.), each of which contains two subgenomes, the AT and DT subgenomes. The DT subgenome is nearly half the size of the AT subgenome in tetraploid cotton and is closely related to an extant D-genome Gossypium species, G. raimondii Ulbr. Characterization of the structural variants present in the diploid D genome should provide greater insight into the evolution of the DT subgenome in tetraploid cotton. Bionano (BNG) optical mapping uses patterns of fluorescent labels inserted at specific endonuclease sites to create physical maps of genomes, which can then be examined for structural variation. To develop optical maps in G. raimondii, we first developed a de novo PacBio long-read sequence assembly of G. raimondii. This sequence assembly consisted of 2,379 contigs, with an average contig length of 413 Kb and a contig N50 of 4.9 Mb. Using BNG technology, we developed two optical maps of the diploid D genome of G. raimondii: one created using the Nt.BssSI endonuclease and one using the Nt.BspQI endonuclease. Using the BNG optical maps, the PacBio assembly was hybrid scaffolded into 100 scaffolds (plus 5 unscaffolded contigs) with an average scaffold length of 7.5 Mb and a scaffold N50 of 13.1 Mb. A comparison between the Nt.BssSI BNG optical map and the two sequence assemblies identified 3,195 structural variants. These were used to validate the accuracy of the reference sequence of G. raimondii, and structural variants were used to create a new phylogeny of nine major cotton species.
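The abstract reports both an average contig length (413 Kb) and a much larger contig N50 (4.9 Mb). A minimal sketch of how N50 is conventionally computed may help explain why the two can differ so much: N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths below are illustrative assumptions, not data from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths (arbitrary units), NOT the G. raimondii assembly.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("mean length:", sum(toy_lengths) / len(toy_lengths))
print("N50:", n50(toy_lengths))
```

In the toy input a few long contigs hold half of the total length, so the N50 sits well above the mean, which is the same pattern as in the figures quoted above.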
|
2 |
Assembly of Two CCDD Rice Genomes, Oryza grandiglumis and Oryza latifolia, and the Study of Their Evolutionary Changes
Alsantely, Aseel O. (01 1900)
Every day, more than half of the world's population consumes rice as a primary dietary resource, making rice one of the most important food crops in the world. Rice and its wild relatives are part of the genus Oryza. Studying the genome structure, function, and evolution of Oryza species in a comparative genomics framework can provide a wealth of knowledge for improving valuable agronomic traits. The Oryza genus includes 27 species, with 11 different genome types as identified by genetic and cytogenetic analyses. Six genome types, including that of the domesticated rice species O. sativa and O. glaberrima, are diploid, and the remaining 5 are tetraploid. Three of the tetraploid species contain the CCDD genome type (O. grandiglumis, O. latifolia, and O. alta), which arose less than 2 million years ago. Polyploidization is one of the major contributors to evolutionary divergence and can thereby lead to adaptation to new environmental niches. An important first step in the characterization of the polyploid Oryza species is the generation of a high-quality reference genome sequence. Until recently, generating such a fundamental resource for polyploid species was challenging, primarily because of their genome complexity and repetitive sequence content. In this project, I produced two high-quality genome assemblies, for O. grandiglumis and O. latifolia, using PacBio long-read sequencing technology and an assembly pipeline that employed 3 genome assemblers (i.e., Canu/2.0, Mecat2, and Flye/2.5) and multiple rounds of sequence polishing with both Arrow and Pilon/1.23. After the primary assembly, sequence contigs were arranged into pseudomolecules, and homoeologous chromosomes were assigned to their respective genome types (i.e., CC or DD). Finally, the assemblies were extensively edited manually to close as many gaps as possible. 
Both assemblies were then analyzed for transposable element and structural variant content between species and between homoeologous chromosomes. This enabled us to study the evolutionary divergence of these two genomes and to explore the possibility of neo-domesticating either species in future research for my PhD dissertation.
|
3 |
Hypothesis-free detection of genome-changing events in pedigree sequencing
Garimella, Kiran (January 2016)
In high-diversity populations, a complete accounting of de novo mutations can be difficult to obtain. Most analyses identify such mutations by sequencing pedigrees on second-generation sequencing platforms and aligning the short reads to a reference assembly, the genomic sequence of a canonical member (or members) of a species. Often, large regions of the genomes under study are greatly diverged from the reference sequence, or not represented at all (e.g. the HLA, antigenic genes, or other regions under balancing selective pressure). If the haplotypic background upon which a mutation occurs is absent, events can easily be missed (as reads have nowhere to align) and false positives may abound (as the software forces the reads to align elsewhere). This thesis presents a novel method for de novo mutation discovery and allele identification. Rather than relying on alignment, our method is based on the de novo assembly of short-read sequence data using a multi-color de Bruijn graph. In this data structure, each sample is assigned a unique index (or "color"), reads from each sample are decomposed into smaller subsequences of length k (or "kmers"), and color-specific adjacency information between kmers is recorded. Mutations can be discovered in the graph itself by searching for characteristic motifs (e.g. "bubble motifs", indicative of a SNP or indel, and "linear motifs", indicative of allelic and non-allelic recombination). De novo mutations differ from inherited mutations in that the kmers spanning the variant allele are absent in the parents; in a sense, they facilitate their own discovery by generating "novel" sequence. We exploit this fact to limit processing of the graph to only those regions containing these novel kmers. We verified our approach using simulations, validation, and visualization. 
For the simulations, we developed genome and read generation software driven by empirical distributions computed from real data to emit genomes with realistic features: recombinations, de novo variants, read fragment sizes, sequencing errors, and coverage profiles. In 20 artificial samples, we determined our sensitivity and specificity for novel kmer recovery to be approximately 98% and 100% at worst, respectively. Not every novel stretch can be reconstituted as a variant, owing to errors and homology in the graph. In simulations, our false discovery rate was 10% for "bubble" events and 12% for "linear" events. For validation, we obtained a high-quality draft assembly for a single P. falciparum child using a third-generation sequencing platform. We discovered three de novo events in the draft assembly, all three of which are recapitulated in our calls on the second-generation sequencing data for the same sample; no false positives are present. For visualization, we developed an interactive web application capable of rendering a multi-color subgraph that assists in visually distinguishing between true variation and sequencing artifacts. We applied our caller to real datasets: 115 progeny across four previously analyzed experimental crosses of Plasmodium falciparum. We demonstrate our ability to access subtelomeric compartments of the genome, regions that harbor antigenic genes under tremendous selective pressure and are therefore highly divergent between geographically distinct isolates and routinely masked and ignored in reference-based analyses. We also show our caller's ability to recover an important form of structural de novo variation: non-allelic homologous recombination (NAHR) events, an important mechanism for the pathogen to diversify its own antigenic repertoire. We demonstrate our ability to recover the few events in these samples known to exist, and overturn some previous findings indicating exchanges between "core" (non-subtelomeric) genes. 
We compute the SNP mutation rate to be approximately 2.91 per sample, insertion and deletion mutation rates to be 0.55 and 1.04 per sample, respectively, multi-nucleotide polymorphisms to be 0.72 per sample, and NAHR events to be 0.33 per sample. These findings are consistent across crosses. Finally, we investigated our method's scaling capabilities by processing a quintet of previously analyzed Pan troglodytes verus (western chimpanzee) samples. The genome of the chimpanzee is two orders of magnitude larger than the malaria parasite's (3,300 Mbp versus 23 Mbp), diploid rather than haploid, and poorly assembled, and the read dataset has lower coverage (20x versus 120x). Compared with Sequenom validation data as well as visual validation, our sensitivity is, as expected, low. However, this can be attributed to overaggressive data cleaning applied by the de novo assembler atop which our software is built. We discuss the precise changes that would likely need to be made in future work to adapt our method to low-coverage samples.
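The central idea of the abstract, that kmers spanning a de novo allele are present in the child but absent from both parents, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the thesis software: the function names (`kmers`, `build_colored_kmers`, `novel_kmers`), the sample names, the reads and the choice k = 3 are all hypothetical, and the sketch ignores reverse complements, sequencing errors and the adjacency information that a real multi-color de Bruijn graph records.

```python
def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def build_colored_kmers(samples, k):
    """Map each kmer to the set of sample names ("colors") that contain it."""
    colors = {}
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                colors.setdefault(kmer, set()).add(name)
    return colors


def novel_kmers(colors, child, parents):
    """Return kmers seen in the child but in none of the parents."""
    parent_set = set(parents)
    return {
        kmer
        for kmer, seen in colors.items()
        if child in seen and not (seen & parent_set)
    }


# Hypothetical toy reads: the child differs from both parents by one base
# (A -> T at the fifth position).
samples = {
    "mother": ["ACGTACGGA"],
    "father": ["ACGTACGGA"],
    "child": ["ACGTTCGGA"],
}
colors = build_colored_kmers(samples, k=3)
novel = novel_kmers(colors, "child", ["mother", "father"])
print(sorted(novel))
```

In this toy case the single substitution creates three child-only kmers, one for each window of length k that overlaps the changed base, which is why restricting attention to novel kmers narrows the search to the neighborhood of candidate de novo events.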
|
4 |
Identification of causal factors for recessive lethals in dairy cattle with special focus on large chromosomal deletions / Etude de délétions chromosomiques et de variants génétiques responsables de mortalité embryonnaire chez les bovins laitiers
Uddin, Md Mesbah (17 September 2019)
The overall objective of this thesis is to identify causal variants or, failing that, a set of predictive markers in high linkage disequilibrium with the causal variants, for fertility in dairy cows. We addressed this overall objective in five articles: (i) describes a systematic approach to mapping recessive lethal variants in French Normande cattle based on the search for homozygous haplotype deficiency (HHD). This study shows the influence of sample size, genotype quality, the quality of genotype phasing into haplotypes and of imputation, haplotype age and, finally, the definition of significance thresholds that account for multiple testing, on the discovery and reproducibility of HHD results. It also illustrates the importance of fine-mapping with pedigree and whole-genome sequence (WGS) data, of integrative (cross-species) annotation to prioritize candidate mutations and, finally, of large-scale genotyping of the candidate mutation to validate or invalidate the initial mutations. (ii) describes high-resolution mapping of large chromosomal deletions from genome sequences in a population of 175 animals belonging to three Nordic dairy breeds. This study uses three different approaches to validate the mapping results. The chapter describes the population genetic properties and the functional importance of the identified deletions. (iii) deals with three issues related to the imputation of structural variants, here large chromosomal deletions: the availability of deletion genotypes, the size of the haplotype reference panel and, finally, imputation itself. 
To address the first two issues, this study describes an approach based on a Gaussian mixture model in which read-depth data from variant call format (VCF) files are used to genotype a known deletion locus in the absence of raw sequence information. Finally, it presents a pipeline for the joint imputation of WGS variants and large chromosomal deletions. (iv) describes genome-wide association studies of female fertility in three Nordic dairy cattle breeds using imputed WGS variants and large chromosomal deletions. This study covers eight fertility traits and uses single-marker, conditional and joint association analyses. It shows that an overestimation, or "inflation", of test statistics can be observed even after correcting for population stratification using genomic principal components and for family structure using genomic relationship matrices. This bias was already known for highly polygenic traits. Finally, this study presents several new quantitative trait loci (QTL) and confirms several that were already known. It also underlines the importance of including (imputed) large deletions in association mapping of fertility traits. (v) describes the prediction of genomic breeding values for fertility (or fertility index) using SNP-chip genotypes, selected QTL and large chromosomal deletions. Using the genomic best linear unbiased prediction (GBLUP) method with one or several genomic relationship matrices derived from a set of selected markers, this study reports improved prediction accuracy. 
This study also highlights the influence of selecting the most predictive markers, particularly for a breed with a small training population, on the accuracy of genomic predictions. Finally, the results demonstrate that large deletions in general have high predictive power. / The overall aim of this PhD thesis is to identify causal variants for recessive lethal mutations and to select a set of predictive markers that are in high linkage disequilibrium with the causal variants for female fertility in dairy cattle. We addressed this broad aim in five articles: (i) describes a systematic approach to mapping recessive lethals in French Normande cattle using homozygous haplotype deficiency (HHD). This study shows the influence of sample size, quality of genotypes, quality of (genotype) phasing and imputation, age of the haplotype (of interest) and, last but not least, multiple-testing corrections on the discovery and replicability of HHD results. It also illustrates the importance of fine-mapping with pedigree and whole-genome sequence (WGS) data, (cross-species) integrative annotation to prioritize candidate mutations and, finally, large-scale genotyping of the candidate mutation to validate or invalidate initial results. (ii) describes a high-resolution, population-scale mapping of large chromosomal deletions from whole-genome sequences of 175 animals from three Nordic dairy breeds. This study employs three different approaches to validate the identified deletions. Next, it describes the population genetic properties and functional importance of these deletions. (iii) deals with three main issues related to the imputation of structural variants, in this case large chromosomal deletions, namely the availability of deletion genotypes, the size of the haplotype reference panel and, finally, imputation itself. 
To address the first two issues, this study describes a Gaussian mixture model-based approach in which read-depth data from the variant call format (VCF) file are used to genotype a known deletion locus, without the need for the raw sequence (BAM) file. Finally, it presents a pipeline for joint imputation of WGS variants along with large chromosomal deletions. (iv) describes genome-wide association studies for female fertility in three Nordic dairy cattle breeds using imputed WGS variants including large chromosomal deletions. This study is based on analyses of eight fertility-related traits using single-marker association, conditional and joint analyses. It illustrates that inflation in association test statistics can be seen even after correcting for population stratification using (genomic) principal components and for relatedness among the samples using genomic relationship matrices; however, this was already known for traits with strong polygenic effects, among other factors. Finally, the mapping of several new quantitative trait loci (QTL), along with previously known ones, is reported in this study. This study also highlights the importance of including (imputed) large deletions in association mapping of fertility traits. (v) describes the prediction of genomic breeding values for fertility using SNP array-chip genotypes, selected QTL and large chromosomal deletions. Using the genomic best linear unbiased prediction (GBLUP) method with one or several genomic relationship matrices derived from a set of selected markers, this study reports higher prediction accuracy than previous reports. This study also highlights the influence of selecting the markers with the best predictability, especially for a breed with a small training population, on the accuracy of genomic prediction. The results demonstrate that large deletions in general have high predictive performance.
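Article (iii) genotypes a known deletion locus by fitting a Gaussian mixture model to read-depth values. The abstract gives no model details, so the following is only a sketch of the general technique under explicit assumptions: depths are taken to be already extracted from the VCF and normalized so that a sample with two intact copies has depth near 1.0; three components are used for zero, one and two intact copies; and the starting means, starting standard deviation, variance floor and the names `fit_mixture` and `call_deletion_genotypes` are all hypothetical choices.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture(depths, means=(0.0, 0.5, 1.0), sd=0.15, iterations=50):
    """Fit a 1-D three-component Gaussian mixture by EM.

    Returns the per-sample responsibilities from the final E-step.
    """
    means = list(means)
    sds = [sd] * len(means)
    weights = [1.0 / len(means)] * len(means)
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in depths:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1.0  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(len(means)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(depths)
            if nj < 1e-8:
                continue  # empty component: keep its previous mean and sd
            means[j] = sum(r[j] * x for r, x in zip(resp, depths)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, depths)) / nj
            sds[j] = max(math.sqrt(var), 0.05)  # assumed variance floor
    return resp


def call_deletion_genotypes(depths):
    """Assign each sample to its most probable copy-number component."""
    labels = ("del/del", "del/ref", "ref/ref")
    resp = fit_mixture(depths)
    return [labels[max(range(3), key=lambda j: r[j])] for r in resp]


# Hypothetical normalized depths at one deletion locus (not thesis data).
toy_depths = [0.02, 0.05, 0.45, 0.52, 0.55, 0.48,
              0.95, 1.02, 1.10, 0.98, 1.05, 0.90]
calls = call_deletion_genotypes(toy_depths)
print(calls)
```

Working from normalized depth rather than aligned reads mirrors the point made in the abstract that genotyping can proceed from the VCF alone; a production implementation would more likely rely on an established mixture-model library and would need to handle loci where one genotype class is rare or absent.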
|
5 |
Análisis de la herencia epigenética en trastornos neurológicos [Analysis of Epigenetic Inheritance in Neurological Disorders]
Iraola Guzmán, Susana (03 December 2012)
Neurodegenerative diseases, such as Alzheimer's disease (AD) and Parkinson's disease (PD), represent a serious public health problem, especially in Western countries, where the growing aging of the population foreshadows a substantial increase in the prevalence of these pathologies. Although certain treatments reduce the clinical manifestations, the progression of the neurodegenerative process is irreversible. Identifying the mechanisms involved in the etiology and evolution of these pathologies, such as the interaction between genetic and environmental factors, is of capital importance. This thesis explores the role of genomic DNA methylation and genetic mosaicism in neurodegenerative diseases. DNA methylation profiling was performed using two methylation arrays, "HumanMethylation" (27K and 450K, Illumina), whose probes, strategically distributed throughout the genome, allow quantitative detection of the methylation state of about 27,000 and 450,000 CpG dinucleotides, respectively. The comparison of a total of 60 individuals (28 with Alzheimer's disease, 3 with Parkinson's disease and 29 controls) made it possible to identify the genome methylation profile of different areas of the central nervous system (CNS) (cortex, amygdala, hippocampus, hypothalamus, pons, substantia nigra and cerebellum), showing the existence of a differential pattern between men and women associated with X-chromosome inactivation, an independent pattern for the cerebellum, and a methylation pattern of a set of targets characteristic of Braak stages 3 and 4 of AD. In addition, significant methylation differences associated with AD (1,112 CpGs, p<0.01) were observed in the cerebellum, confirming its involvement in the disease. The analysis of somatic mosaicism in the brain was performed using the "SurePrint G3 human CGH array 400K" (Agilent). 
Taking the cerebellum as the reference area, gains or losses of genomic material were detected between brain areas of the same individual. Two cortex samples, belonging to two controls, showed a gain of genomic material in the WWOX gene, whereas only one sample showed a gain of genomic material in the ADAM5P3A gene. The high frequency of copy number variants in WWOX and its possible involvement in AD led us to genotype a larger number of individuals, although none showed somatic mosaicism. Analysis of the methylation state of the probes located in WWOX revealed a significant decrease in methylation in patients relative to controls in 14 probes (Student's t-test, p<0.05), suggesting that the epigenetic regulation of WWOX may be altered in AD. Taken together, these results show an alteration of CNS methylation profiles in relation to late AD (Braak stages 3 and 4), mainly in the cerebellum, one of the regions whose pathological involvement in AD has been most controversial. It is especially interesting to note that the characteristic lesions of the cerebellum appear at more advanced stages, which raises the possibility that the observed epigenetic alterations correspond to an early event in the progression of the pathology. / Neurodegenerative disorders, such as Alzheimer's disease (AD) and Parkinson's disease (PD), represent a major public health issue in developed countries, where the aging of the population is leading to a progressive increase in their prevalence. Currently, several therapeutic strategies help to palliate clinical symptoms, but the neurodegeneration is progressive and irreversible. Identifying the underlying mechanisms that lead to these disorders is essential to improve patients' life expectancy and quality of life. 
In this context, many efforts have focused on identifying genetic and environmental causes of these disorders, with little success, highlighting the need to evaluate new mechanisms and factors. The present thesis project has explored the involvement of new mechanisms, such as DNA methylation and somatic mosaicism, in AD and PD. The analysis of DNA methylation was performed with a new methylation array technology, 'HumanMethylation' (27K and 450K, Illumina), whose probes, strategically distributed along the human genome, make it possible to quantify the methylation state of around 27,000 and 450,000 CpG sites, respectively. The methylation pattern of 60 subjects (28 AD, 3 PD and 29 unaffected), with four to seven brain regions each (cortex, amygdala, hippocampus, hypothalamus, pons, substantia nigra and cerebellum), has been assessed. The study has shown three main clusters depending on gender (female/male), brain area (cerebellum vs others) and disease stage (AD3 vs AD4). In addition, a differential analysis performed on individual CpG sites revealed significant differences associated with the cerebellum of AD patients (1,112 CpG sites, p<0.01). Somatic mosaicism analysis has been carried out with a 'SurePrint G3 human CGH array 400K' (Agilent) to detect intra-individual genomic gains and losses compared to the cerebellum. A total of two cortex samples showed a genomic gain in the WWOX gene, whereas only one sample showed a gain in ADAM5P3A. WWOX has been considered a potential candidate gene in previous AD studies, and was further analyzed in a larger cohort of human brain samples. Genotyping assays did not confirm the presence of new somatic mosaicism cases, but it was possible to determine the genotype distribution and compare data between samples. A significant hypomethylation of the WWOX promoter region was observed in AD patients compared to control subjects (t-test, p<0.05) in 14 probes, suggesting a potential regulation of expression by methylation. 
Overall, these results highlight the involvement of epigenetic mechanisms in neurodegenerative disorders such as AD. In particular, the specific pattern of methylation in the cerebellum in intermediate stages of AD is remarkable, suggesting an overlap with early modifications, which could contribute to unraveling new mechanisms implicated in AD.
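The per-probe comparison between patients and controls is reported above as a t-test. As a rough sketch of the statistic involved, the classical two-sample Student's t with pooled variance can be computed as below. The function name `student_t` and the methylation values are illustrative assumptions, not thesis data, and the thesis may have used a different variant of the test or additional corrections.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical methylation levels (fraction methylated) at a single probe.
patients = [0.40, 0.42, 0.38, 0.41]
controls = [0.50, 0.52, 0.49, 0.51]
t_stat, dof = student_t(patients, controls)
print(f"t = {t_stat:.2f}, df = {dof}")
```

A negative statistic here corresponds to lower methylation in patients than in controls, the direction described for the WWOX probes; converting it to a p-value requires the t distribution with the returned degrees of freedom, and testing hundreds of thousands of probes would normally also call for a multiple-testing correction.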
|
6 |
Caracterización de reordenamientos cromosómicos asociados a fenotipo [Characterization of Chromosomal Rearrangements Associated with Phenotype]
Villa Marcos, Olaya (27 October 2009)
Establishing correlations between phenotype and genotype is one of the main objectives of genetics. Obtaining an accurate diagnosis facilitates the clinical management of the patient and makes it possible to offer proper genetic counselling, including reproductive advice, to the families of patients with genetic diseases. Identifying genes associated with pathology from cytogenetic alterations associated with a phenotype is one of the methods of positional cloning. In this work we relied on two types of models of cytogenetic anomalies: balanced and unbalanced (translocations and marker chromosomes). We characterized the cytogenetic alterations of five patients from each model, with diverse phenotypes, using a combination of cytogenetic and molecular techniques, with the aim of proposing candidate genes associated with each phenotype. / One of the main objectives of genetics is the establishment of phenotype-genotype correlations. A correct diagnosis facilitates the clinical management of the patient and makes it possible to offer genetic counselling, with reproductive assessment, to families with a patient with a genetic disease. The identification of genes associated with pathology from cytogenetic alterations associated with a phenotype is one of the methods of positional cloning. In this work we have relied on two different models of cytogenetic alterations: balanced and unbalanced anomalies (translocations and marker chromosomes). We have characterized five patients of each group with different phenotypes, using a combination of cytogenetic and molecular techniques, with the objective of establishing candidate genes associated with disease.
|