101

Genomic Data Augmentation with Variational Autoencoder

Thyrum, Emily 12 1900
In order to treat cancer effectively, medical practitioners must predict pathological stages accurately, and machine learning methods can be employed to make such predictions. However, biomedical datasets, including genomic datasets, often have disproportionately more samples from people of European ancestry than people of other ethnic or racial groups, which can cause machine learning methods to perform better on the European samples than on the people of the under-represented groups. Data augmentation can be employed as a potential solution in order to artificially increase the number of samples from people of under-represented racial groups, and can in turn improve pathological stage predictions for future patients from such under-represented groups. Genomic data augmentation has been explored previously, for example using a Generative Adversarial Network, but to the best of our knowledge, the use of the variational autoencoder for the purpose of genomic data augmentation remains largely unexplored. Here we utilize a geometry-based variational autoencoder that models the latent space as a Riemannian manifold so that samples can be generated without the use of a prior distribution to show that the variational autoencoder can indeed be used to reliably augment genomic data. Using TCGA prostate cancer genotype data, we show that our VAE-generated data can improve pathological stage predictions on a test set of European samples. Because we only had European samples that were labeled in terms of pathological stage, we were not able to validate the African generated samples in this way, but we still attempt to show how such samples may be realistic. / Computer and Information Science
102

Quantitative Analysis of Microbial Species in a Metagenome Based on Their Signature Sequences

Yadav, Pooja 26 July 2017
No description available.
103

Identifying The Structure Of Genomic Islands In Prokaryotes

Aldaihani, Reem A. A. H. S. 03 August 2022
Prokaryotic genomes evolve via horizontal gene transfer (HGT), mutations, and rearrangements. HGT plays a significant role in prokaryotic evolution and leads to biodiversity in nature. One important component of HGT is the genomic island (GI), a subsequence of the genome created by HGT. This research aims to identify the structures of prokaryotic GIs, which have a fundamental role in the adaptation of prokaryotes and in the impact of these species on the environment. Previous computational biology research has focused on developing tools that detect GIs in prokaryotic genomes, while little research has investigated GI structure. This research pursues an idea that has not yet been addressed intensively: identifying additional structures of the GIs in prokaryotes. The research follows two main directions, each studying prokaryotic GI structure from a different perspective. In the first direction, the aim is to investigate GI patterns and the existence of biological connections across bacterial phyla in terms of GIs on a large scale. This direction mainly pursues the novel idea of connecting GIs across prokaryotic and phage genomes via patterns of protein families across many species. A pattern is a sequence of protein families that occurs frequently in the genomes of a number of species. Here the large data set available from the IslandViewer4 database has been combined with protein families from the Pfam database. Furthermore, a comprehensive strategy to identify patterns has been implemented that makes use of HMMER, BLAST, and MUSCLE, along with Python programs that link the analysis into a single pipeline. The results demonstrate that related GIs often exist in multiple species that are not evolutionarily related and indeed may be from multiple bacterial phyla. 
Analysis of the discovered patterns led to the identification of biological connections among prokaryotes and phages through their GIs. A connection is an HGT relation represented as a pattern that exists in a phage and a number of prokaryotic species. These discovered connections suggest quite broad HGT connections across the bacterial kingdom and its associated phages. In addition, these connections provide the basis for additional analysis of the breadth of HGT and the identification of individual HGT events that span bacterial phyla. Moreover, these patterns can provide a basis for discovering the specific patterns in pathogenic GIs that could play a crucial role in antibiotic resistance. The second direction aims to identify the structure of the GIs in terms of their location within the genome. Prokaryotic GIs have been analyzed according to the structure of the genome in which they are located, whether circular or linear. The analysis studies the GIs' location in relation to the oriC, investigates the nature of the distances between the GIs, and determines the distribution of GIs in the genome. The analysis has been performed on all of the GIs in the data set. In addition, in order to avoid bias, it has been performed on the GIs in one genome from each species and on the GIs of the most frequent species in the data set. Overall, the results showed that there are preferred sites for the GIs in the genome. In the linear genomes, they are usually located in the origin of replication area and terminus, and in the circular genomes they are located in the terminus. / Doctor of Philosophy / Prokaryotes are among the most abundant organisms on earth and play an essential role in naturally shaping the planet and its life. This research aims to identify the structure of a component in these species that has a fundamental role in the adaptation of prokaryotes and in the impact of these species on the environment. 
This component is a part of the genome named the genomic island (GI). This dissertation aims to identify the structure of the GIs in two different ways that have not yet been addressed extensively. The first direction aims to discover patterns in the GIs and then use them to bring to light biological connections between prokaryotes and bacteriophages. In this direction, a comprehensive strategy has been used to identify patterns and connections. This strategy uses several tools such as BLAST, HMMER, and MUSCLE. Furthermore, Python programs that link the analysis into a single pipeline have been implemented. In the second direction, an investigation has been performed to understand the nature of the GIs' locations within the genome. This direction applies three different analyses: studying the GIs' location in relation to the origin of replication, investigating the nature of the distances between the GIs, and discovering the location distribution of GIs in the genome. The analysis is performed on linear genomes and circular genomes separately. In each group of GIs, the data set has been used to view the results from different perspectives. The overall analysis in both directions revealed several findings. In the first direction, the discovered patterns merit deep investigation because they may be related to diseases. In addition, in prokaryotic genomes, there are specific sites where GIs are frequently seen; further research is needed to understand the relation between a GI's location and its protein content.
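The abstract above defines a pattern as a sequence of protein families that occurs frequently in the genomes of a number of species. The following is only a minimal sketch of that idea, not the dissertation's pipeline (which relies on HMMER, BLAST, and MUSCLE). It assumes each GI has already been reduced to an ordered list of protein-family labels; the function name, the labels such as famA, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def frequent_patterns(islands_by_species, length=2, min_species=2):
    """Return contiguous runs of `length` protein families that occur
    in genomic islands of at least `min_species` different species."""
    species_with = defaultdict(set)
    for species, islands in islands_by_species.items():
        for families in islands:
            # Slide a window of `length` families along each island.
            for i in range(len(families) - length + 1):
                species_with[tuple(families[i:i + length])].add(species)
    return {
        pattern: sorted(found_in)
        for pattern, found_in in species_with.items()
        if len(found_in) >= min_species
    }


# Toy, made-up input: one island per genome, families as placeholder labels.
toy = {
    "species_A": [["famA", "famB", "famC"]],
    "species_B": [["famB", "famC", "famD"]],
    "phage_X": [["famA", "famB"]],
}
patterns = frequent_patterns(toy)
for pattern, found_in in patterns.items():
    print(pattern, found_in)
```

In this toy input, a pattern shared by a phage and a prokaryotic species would correspond to what the abstract calls a connection.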
104

Molecular phenotypes associated with heifer fertility

Marrella, Mackenzie Ann 16 May 2024
Doctor of Philosophy / Before the early 2000s, many producers were heavily selecting for production traits without accounting for the negative relationship between many production traits and fertility. The large decrease in fertility that occurred as a result of these selection practices put a heavy burden on producers. Replacement heifer development is the third largest cost incurred by producers, behind feed and labor. Consequently, the failure of a heifer to conceive in her first breeding season translates directly into financial loss for the producer and lasting consequences on the animal's longevity and performance in subsequent breeding seasons. To make improvements in a trait, or traits, of interest, producers often selectively breed two animals with desirable characteristics. However, the complex nature of fertility traits limits the effectiveness of this method. As a result, researchers have been attempting to identify biological molecules whose abundance differs between cattle of differing fertility potential, termed molecular markers, that could be used to identify superior cattle earlier and more accurately. While a good amount of research has been conducted in mature cows and in a laboratory setting, very few studies have attempted to identify molecular markers of fertility in heifers. Therefore, the objective of this work was to identify different biological molecules (genomic variants, genes, proteins, and metabolites) whose abundance differed between heifers with varying fertility potential. Investigation into DNA variations led to the identification of three variants that were associated with fertility, 16 variants that were associated with health, and 29 variants that were associated with both health and fertility. Given that some variants can impact a trait by changing gene expression, we attempted to identify variations in RNA that were having this effect on heifer fertility. 
Although some variants were found to influence gene expression, we were unable to correlate these changes with fertility differences. However, in a different study, we were able to identify two genes (APMAP and DNAI7), as well as one protein (alpha-ketoglutarate-dependent-dioxygenase FTO), that differed significantly between fertile and sub-fertile heifers. Importantly, the results of this study allowed us to create a biological profile that was capable of accurately distinguishing 21/22 heifers based on their fertility potential. Finally, investigation of the metabolite profile revealed one metabolite (2-dehydro-D-gluconate) that was differentially abundant between fertile and sub-fertile heifers. Overall, this work sheds light on the complex nature of heifer fertility and provides several potential molecular markers that could be used to distinguish between heifers of varying reproductive potential.
105

Genomic imprinting: support for the concept from a study of Prader-Willi Syndrome patients

Robinett, Sheldon J. (Sheldon Jay) 12 1900
In this study, nineteen cases of suspected or clinically diagnosed Prader-Willi Syndrome (PWS) were tested for molecular deletions by in situ hybridization with two DNA probes, IR4-3R and GABRB3. Both probes are specific for sequences within the chromosome region 15q11-13, with IR4-3R located within the putative PWS region and GABRB3 in the distal area associated with Angelman Syndrome.
106

Genomic selection in farm animals: accuracy of prediction and applications with imputed whole-genome sequencing data in chicken

Ni, Guiyan 10 February 2016
Methods for genomic prediction based on genotype information from single nucleotide polymorphism (SNP) arrays with varying numbers of markers are now firmly established in many breeding programs for farm animals. With the increasing availability of whole-genome sequence data, which also contain causal mutations, more and more studies are being published in which genomic predictions are based on sequence data. The main aim of this thesis was to investigate to what extent SNP array data can be extended to the sequence level using statistical methods (so-called "imputation") (Chapter 2) and whether genomic prediction can be improved with imputed sequence data and additional information on the genetic architecture of a trait (Chapter 3). In addition, to better understand the accuracy of genomic prediction and to derive a new method for approximating this accuracy, a simulation study was carried out that examined the degree to which several previously known approaches overestimate the accuracy of genomic prediction (Chapter 4). Technical progress over the last decade has made it possible to sequence millions of DNA segments in a relatively short time. Several software programs for finding sequence variants (so-called "variant calling"), based on different algorithms, have become established and have made it possible to detect SNPs in whole-genome sequence data. Often only a few individuals of a population are fully sequenced, and the genotypes of the other individuals, which were typed with a SNP array at a subset of these SNPs, are imputed. 
In Chapter 2, the genomic variants detected with three different variant calling programs (GATK, freebayes, and SAMtools) were therefore compared using 50 fully sequenced white and brown layer individuals, and the quality of the genotypes was assessed. On the chromosomes studied (3, 6, and 26), 1,741,573 SNPs were detected by all three variant callers, corresponding to 71.6% (81.6%, 88.0%) of the number of variants detected by GATK (SAMtools, freebayes). Genotype concordance, defined as the proportion of individuals whose array-based genotypes match the sequence-based genotypes at all SNPs that are also present on the array, was 0.98 with GATK, 0.98 with SAMtools, and 0.97 with freebayes (values averaged over SNPs on the chromosomes studied). Furthermore, with GATK (SAMtools, freebayes), 90% (88%, 75%) of the variants showed high values (>0.9) for other quality measures (non-reference sensitivity, non-reference genotype concordance, and precision). The performance of all variant calling programs examined was generally very good, particularly that of GATK and SAMtools. This study also investigated the quality of imputation from a high-density SNP array to the sequence level in a data set of approximately 1000 individuals from 6 generations. Imputation quality was described by the correlations between imputed and true genotypes per SNP or per individual and by the number of Mendelian conflicts in sire-offspring pairs. Three different imputation programs (Minimac, FImpute, and IMPUTE2) were validated in different scenarios. 
For all imputation programs, the correlation between true and imputed genotypes averaged more than 0.95 for 1000 randomly selected array SNPs whose genotypes were treated as unknown in the imputation process, and more than 0.85 in a leave-one-out cross-validation carried out with the sequenced individuals. In terms of genotype correlation, Minimac and IMPUTE2 performed slightly better than FImpute. This was especially true for SNPs with a low minor allele frequency. FImpute, however, showed the smallest number of Mendelian conflicts in the available sire-offspring pairs. The correlation between true and imputed genotypes remained high even when the individuals whose genotypes were imputed were several generations younger than the sequenced individuals. In summary, GATK showed the best performance among the variant calling programs tested in this study, while Minimac proved to be the best of the imputation programs examined. Building on the results of Chapter 2, studies on genomic prediction with imputed sequence data were carried out in Chapter 3. Data from 892 individuals from 6 generations of a commercial brown layer line were available for this purpose. These animals had all been genotyped with a high-density SNP array. Using data from 25 fully sequenced individuals, these animals were imputed from the array genotypes up to the sequence level. Imputation was performed with Minimac3, which requires pre-phased data (generated in this study with Beagle4) as input. The accuracy of genomic prediction was measured as the correlation between de-regressed conventional breeding values and direct genomic breeding values for the traits breaking strength, feed intake, and laying rate. 
In addition to comparing the accuracy of genomic prediction based on SNP array data and on sequence data, this study also examined how the use of different genomic relationship matrices that account for genetic architecture affects prediction accuracy. Besides the baseline scenario with equally weighted SNPs, scenarios with weighting factors were examined, namely the -log10(P) values of a t-test based on a genome-wide association study and the squared estimated SNP effects from a random regression BLUP model, as well as the BLUP|GA method ("best linear unbiased prediction given genetic architecture"). The GBLUP scenario with equally weighted SNPs was tested with a relationship matrix built either from all available SNPs or only from those in genic regions, in each case starting from either all imputed SNPs in the sequence or the array SNPs. Averaged over all traits studied, prediction accuracy was highest, at 0.366 ± 0.075, with SNPs from genic regions extracted from the imputed sequence data. The second-highest value was obtained by genomic prediction with SNPs from genic regions contained on the SNP array (0.361 ± 0.072). Neither the use of weighted genomic relationship matrices nor the application of BLUP|GA led to higher prediction accuracies than the standard GBLUP approach. This observation held regardless of whether SNP array data or imputed sequence data were used. The results of this study showed that little or no additional benefit can be gained from using imputed sequence data. An increase in prediction accuracy could, however, be achieved when the relationship matrix was built only from the SNPs in genic regions extracted from the sequence data. 
In genomic selection programs, selection candidates are chosen on the basis of estimated genomic breeding values (GBVs). The accuracy of the GBV is a relevant parameter here because it describes the stability of the estimated breeding values and can indicate how the GBV may change as more information becomes available. It is also one of the decisive factors in the expected genetic gain (also described by the so-called "breeder's equation"). This accuracy of genomic prediction is difficult to quantify in real data, however, because the true breeding values (TBVs) are not available. Earlier studies proposed several methods for approximating the accuracy of GBVs from population and trait parameters (e.g., effective population size, reliability of the quasi-phenotypes used, number of independent chromosome segments). In addition, when mixed models are used, the accuracy can be derived from the prediction error variance. In practice, most of these approaches overestimated the prediction accuracy. Therefore, in Chapter 4, several methodological approaches from earlier work were tested on simulated data with different parameters, which were used to represent different animal breeding programs (a cattle and a pig breeding scheme in addition to a baseline scenario), and the extent of the overestimation was measured. This chapter also presented a new and easily computed method for approximating the accuracy. The comparison of methodological approaches in Chapter 4 showed that the accuracy of GBVs can be predicted better with the new approach. 
The proposed approach still has one unknown parameter, but it can be approximated when a suitable data set contains results of breeding value estimations at two different time points. In summary, this new method improves the approximation of GBV accuracy in many cases.
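The abstract above describes imputation quality through the per-SNP correlation between true and imputed genotypes and through the number of Mendelian conflicts in sire-offspring pairs. The sketch below shows one way these two measures could be computed. It assumes genotypes are coded as 0/1/2 allele counts in NumPy arrays (rows are individuals, columns are SNPs) and treats opposite homozygotes in a sire-offspring pair as a conflict; the function names and the toy data are illustrative and do not come from the thesis.

```python
import numpy as np


def per_snp_correlation(true_g, imputed_g):
    """Pearson correlation between true and imputed genotypes for each SNP.

    Rows are individuals, columns are SNPs. SNPs with no variation in
    either matrix get NaN because the correlation is undefined there.
    """
    t = true_g - true_g.mean(axis=0)
    m = imputed_g - imputed_g.mean(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, (t * m).sum(axis=0) / denom, np.nan)


def mendelian_conflicts(sire, offspring):
    """Count SNPs where sire and offspring are opposite homozygotes (0 vs 2)."""
    return int(np.sum(np.abs(sire - offspring) == 2))


# Toy, made-up genotypes: 4 individuals x 3 SNPs; the middle SNP is monomorphic.
true_g = np.array([[0, 1, 2], [1, 1, 0], [2, 1, 1], [0, 1, 2]])
imputed_g = true_g.copy()
r = per_snp_correlation(true_g, imputed_g)

sire = np.array([0, 1, 2, 2])
offspring = np.array([2, 1, 1, 0])
n_conflicts = mendelian_conflicts(sire, offspring)
```

Averaging `r` over SNPs while ignoring undefined values (for example with `np.nanmean`) gives a single summary figure comparable in spirit to the averages quoted in the abstract.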
107

Pesquisa de mercado para utilização da tecnologia de testes genômicos em bovinos leiteiros no Triângulo Mineiro / Market research of genomic tests in dairy cattle in the region of Triângulo Mineiro

Rodrigues, Ana Carolina Marques 27 April 2018
Advances in science have brought new discoveries about cellular mechanisms, allowing the development of new products and services for animal genetic improvement. Genomic technology, although recent, offers dairy farming the possibility of increasing gains from genetic selection and economic profitability. In Brazil, genomic tests are already sold that make it possible, through the collection of genetic material, to predict the genetic characteristics of young cattle and identify the animals' productive traits before they are expressed. However, because the technology is new, it is not yet known what cattle farmers think about it or about genetic improvement, nor what the producers' level of knowledge and technification is. Therefore, this research gathered information on the profile of dairy farms that could use genomic tests, identifying their main needs regarding use of the service. Data were collected with quantitative and qualitative questionnaires, applied at random to 100 milk-producing farms distributed across the Triângulo Mineiro region. After collection, descriptive statistical analyses with contingency tables and correlations were performed on the non-parametric data obtained. These showed that the farms are heterogeneous, with increasing levels of technification, and that they increasingly rely on technical assistance. 
Some of the farmers still do not use tools to improve genetic selection, such as artificial insemination, fixed-time artificial insemination (IATF), embryo transfer (ET), or others, and although genomic tests are already commercially available, only 25% of the interviewees say they have basic knowledge of the technique. For genomic tests to meet the needs of the end consumer and be marketed, it is important that, in addition to wider dissemination of knowledge about the product, its price be affordable, the producer feel confident in the result, and the group of animals to be evaluated meet the minimum requirements of the technology, so that the product can be used with the highest accuracy.
108

Detection of QTLs associated to DBH in a Eucalyptus grandis x Eucalyptus Globulus monoprogeny / Detecção de QTL associado a DAP em Eucaliptus grandis x Eucaliptus Globulus monoprogênie

Torres-Dini, Diego Gabriel [UNESP] 03 February 2017
In Uruguay, reforestation with Eucalyptus sp. is of fundamental importance for producing paper, pulp, and wood. Productivity continues to grow owing to the application of breeding techniques such as hybridization. This study aimed to investigate genetic parameters, productivity, stability, and adaptability, and to identify SNP markers associated with diameter at breast height (DBH), in order to select Eucalyptus grandis x Eucalyptus globulus full-sib hybrid clones. The study was conducted in a clonal test, replicated on two different soils, in the state of Rio Negro, Uruguay. The population was phenotypically characterized for DBH at 48 months of age, and cambium tissue from each individual was sampled for genotyping with the EuCHIP60K chip. Mean growth in DBH was similar at both sites. The genotype-environment interaction was of the simple type, with a high genotypic correlation in clone performance between environments (0.708), indicating that the same clones could be selected for both sites. The mean heritability between clones (0.724), the coefficient of individual genetic variation (10.9%), and the relative variation (0.916) showed that gains can be obtained by selecting clones with higher growth; the gain was estimated at 3.1% for both sites together. 
A total of 15,196 SNP markers were used in the genomic selection for DBH; this was the number (23.5%) that remained after cleaning of the SNP data. We used the rrBLUP model with jackknife validation. As expected given the population size (78 individuals), the predictive ability for this population was low, in fact negative (-0.15), and the model did not predict DBH accurately. These results are consistent with theoretical expectations, which indicate that a breeding population of at least 1,000 phenotyped and genotyped individuals is necessary. DBH is the most important trait in the breeding of the genus Eucalyptus. However, its quantitative nature, together with the time this phenotype needs to develop, makes early detection of the trait difficult. Identifying molecular markers associated with quantitative phenotypes is a good way to identify QTLs that will help in the early detection of individuals with high DBH. Significant markers associated with DBH were identified on chromosome 6, suggesting the presence of a QTL on this chromosome. Since the clones originate from vegetative propagation and from a single full-sib progeny, they should preferably be used for reforestation through cloning, since mating between clones can lead to biparental inbreeding. The use of SNPs helped to confirm the degree of relatedness between the clones and to verify clonal identity.
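The abstract above reports a predictive ability obtained with an rrBLUP model and jackknife validation. The sketch below illustrates the general idea under simplifying assumptions: marker effects are estimated by ridge regression with a fixed, arbitrarily chosen shrinkage parameter `lam` (rrBLUP software normally estimates it from variance components), validation is leave-one-out, and predictive ability is taken as the correlation between observed and predicted phenotypes. All names and the simulated data are hypothetical and unrelated to the thesis data.

```python
import numpy as np


def ridge_marker_effects(Z, y, lam):
    """Estimate marker effects by ridge regression on centered genotypes."""
    col_means = Z.mean(axis=0)
    Zc = Z - col_means
    mu = y.mean()
    n_markers = Z.shape[1]
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(n_markers), Zc.T @ (y - mu))
    return mu, col_means, u


def jackknife_predictive_ability(Z, y, lam=1.0):
    """Leave one individual out, predict it, and correlate with observations."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        mu, col_means, u = ridge_marker_effects(Z[keep], y[keep], lam)
        preds[i] = mu + (Z[i] - col_means) @ u
    return float(np.corrcoef(y, preds)[0, 1])


# Simulated toy data (assumption): 40 individuals, 10 markers coded 0/1/2,
# a phenotype driven by the markers plus a little noise.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(40, 10)).astype(float)
beta = rng.normal(size=10)
y = Z @ beta + rng.normal(scale=0.1, size=40)
ability = jackknife_predictive_ability(Z, y)
```

With many more markers than individuals, as in the study, solving in the marker dimension becomes expensive; an equivalent formulation through a genomic relationship matrix is commonly used instead.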
109

Investigation of Mechanics of Mutation and Selection by Comparative Sequencing

Zody, Michael C., January 2009 (has links)
Diss. (summary) Uppsala : Uppsala universitet, 2009.
110

Pan-génome du riz africain cultivé Oryza glaberrima et son ancêtre sauvage Oryza barthii / Pan-genome of cultivated african rice Oryza glaberrima and his wild ancestor Oryza barthii

Monat, Cécile 10 November 2016
The diversity of a species is the sum of the diversity of the individuals that compose it. It can be observed at different scales: individual, organ, tissue, cell, genome, gene, or even single nucleotide. Studying the diversity of a species helps us understand it better, retrace its evolutionary history, and compare it with other species, in particular wild with cultivated ones. We focus on domestication processes, and particularly on their impact on the structure of the pan-genome. The pan-genome is divided into three compartments: (i) the core-genome, which contains the genes present in all individuals of the species; (ii) the dispensable genome, which contains the genes absent from at least one individual; and (iii) the individual-specific genome, which contains the genes present in only one individual.

The objective of this thesis was to develop a new method of pan-genomic analysis applicable to a large number of individuals. To this end, we worked on a massive resequencing data set of cultivated African rice, O. glaberrima, and its wild ancestor, O. barthii. We first verified that a pan-genomic structure exists in our model by working at a small scale with three accessions of the cultivated species. They were sequenced, assembled, and annotated, and we then searched for sequences specific to each accession.

We then developed our method using nearly 200 genomes from the two species. These genomes were sequenced using NGS (Illumina) technology and mapped directly to an external reference genome, that of Asian rice. We applied our pan-genomic analysis method, which is based on the deviation of sequencing depth for each gene. We next compared ontology enrichment by compartment and by species in order to identify differences related to the domestication process. Finally, we examined more closely the pan-genomic membership of the members of gene families.

Because the pan-genome of the cultivated species is smaller than the core-genome of the wild species, we confirmed a loss of diversity, in terms of gene presence/absence, in African rice during domestication. Curiously, we also highlighted an increase in the number of dispensable genes in the cultivated species compared with its wild relative. Thus, despite a strong reduction of the pan-genome of the cultivated species during the "first" selection, the 1,000 generations of domestication were enough to reintroduce a form of diversity through an increase in the number of dispensable genes.

To automate much of the handling of NGS data analyses, we also developed a tool for generating analysis pipelines. Because it is generic and robust, it can be used in different fields and for several types of data. Thanks to the many software packages integrated into it and the maintenance that the development team intends to continue, it could be used to characterize a growing range of features, for example structural variations, genotype-phenotype associations, epigenetics, and possibly metagenomics.

This work led to a new method for analyzing pan-genomic data that is fast because it takes a global view rather than relying on pairwise comparisons. The method is aimed at large and complex genomes such as those of plants, and also at massive data sets.
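
The three compartments defined in the abstract, and the idea of calling a gene present or absent from its sequencing depth, can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the use of the per-individual median depth as a baseline, the 0.2 ratio threshold, and the toy depth values are all assumptions made for illustration. Genes present in exactly one individual are reported as individual-specific rather than dispensable, although they also satisfy the "absent from at least one individual" definition.

```python
from statistics import median


def call_presence(gene_depths, min_ratio=0.2):
    """Return the set of genes called present in one individual.

    gene_depths maps gene name -> mean read depth over that gene.
    Assumed rule (hypothetical): a gene is present if its depth is at
    least min_ratio times the individual's median gene depth.
    """
    typical = median(gene_depths.values())
    if typical <= 0:
        return set()
    return {g for g, d in gene_depths.items() if d >= min_ratio * typical}


def classify_pangenome(presence_by_individual, all_genes):
    """Assign each gene to a pan-genome compartment.

    presence_by_individual maps individual name -> set of present genes.
    """
    n = len(presence_by_individual)
    compartments = {}
    for gene in all_genes:
        count = sum(gene in present for present in presence_by_individual.values())
        if count == n:
            compartments[gene] = "core"
        elif count == 1:
            compartments[gene] = "individual-specific"
        elif count == 0:
            compartments[gene] = "absent"
        else:
            compartments[gene] = "dispensable"
    return compartments


# Toy depths for three hypothetical accessions (values are invented).
depths = {
    "acc_A": {"g1": 30, "g2": 28, "g3": 32, "g4": 29, "g5": 31},
    "acc_B": {"g1": 25, "g2": 24, "g3": 1, "g4": 26, "g5": 23},
    "acc_C": {"g1": 40, "g2": 38, "g3": 2, "g4": 0, "g5": 41},
}
presence = {name: call_presence(d) for name, d in depths.items()}
compartments = classify_pangenome(presence, ["g1", "g2", "g3", "g4", "g5"])
print(compartments)
```

Because every individual is compared once against a shared reference gene set, the cost grows linearly with the number of individuals, which is the kind of global view the abstract contrasts with pairwise comparisons.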
