Global ETD Search

1	Coeficientes de determinação, predição intrinsicamente multivariada e genética / Coefficient of determination, intrinsically multivariate and genetic prediction
Higa, Carlos Henrique Aguena. 21 December 2006.
This Master's dissertation describes research on the analysis of gene expression data from microarray experiments, aimed at finding genes that are important in a given organism or disease, such as cancer. We believe that discovering these genes, which we call intrinsically multivariately predictive genes (IMP genes), can lead to the discovery of important biological processes not yet described in the literature. The search for IMP genes was carried out alongside the study of mathematical and statistical models and concepts such as Boolean networks, Markov chains, the coefficient of determination (CoD), classification in gene expression analysis, and error estimation methods. In the Boolean network model, introduced into biology by Kauffman, gene expression is quantized into only two levels: ON and OFF. The expression level (state) of each gene is related to the states of some other genes through a logical function. Adding a random perturbation to this model yields a more general model, the Boolean network with perturbation. The dynamical system represented by this network is an ergodic Markov chain and therefore has a stationary (steady-state) probability distribution. We hypothesize that the microarray experiments follow this steady-state distribution. The CoD is a normalized measure of how much better the expression of a target gene can be predicted by observing the expression of a set of predictor genes. A certain configuration of CoDs characterizes a target gene as an IMP gene. We can work not only with target genes but also with target phenotypes, where the phenotype of a biological system is represented by a binary random variable; for example, we may want to know which genes are related to the life/death phenotype of a cell. Since the joint probability distribution of the gene expressions is unknown, the CoDs must be estimated from the data. The error estimation methods studied for this purpose include holdout, resubstitution, cross-validation, bootstrap and the .632 bootstrap. These methods were implemented in software that computes the CoDs and thus allows the search for IMP genes. The software was used in a collaboration with research by Professor Dr. Hugo A. Armelin of the Instituto de Química, University of São Paulo, which searches for important genes related to the death of tumorigenic mouse cells triggered by FGF2 (Fibroblast Growth Factor 2). From this collaboration, we built gene subnetworks involved in the biological process under study and found genes that may be related to the death phenotype of the mouse cells, or that do in fact take part in some pathway triggered by FGF2. This approach to gene expression analysis, together with Professor Armelin's research, results in a methodology for finding genes involved in new FGF2-activated mechanisms in tumorigenic cells. More generally, the methodology can be applied to any biological process of scientific interest, provided the problem can be modeled in terms of Boolean networks, coefficients of determination and IMP genes.
Keywords: coefficient of determination; gene regulatory networks; microarray
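As a minimal sketch of the CoD described in this abstract, assuming a binary target and discrete predictors: the CoD is commonly written as CoD = (ε₀ − ε)/ε₀, where ε₀ is the error of the best prediction of the target made without observing any predictor, and ε is the error of the best predictor that observes the predictor set. The Python below estimates both errors by resubstitution (one of the error estimation methods the abstract lists), using a majority vote per predictor configuration. The function names and the XOR toy data are illustrative assumptions, not the dissertation's software or data. The XOR case shows one configuration of CoDs consistent with intrinsically multivariate prediction: each predictor alone has CoD 0, while the pair has CoD 1.

```python
from collections import Counter, defaultdict
from itertools import product


def resubstitution_error(xs, ys):
    """Error of the majority-vote predictor of ys given xs, measured on the same sample.

    xs: hashable predictor configurations (e.g. tuples of 0/1 values)
    ys: target values (0/1)
    """
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    # For each configuration the best guess is its most frequent target value;
    # every sample with a different value counts as an error.
    errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
    return errors / len(ys)


def cod(samples, target, predictors):
    """Coefficient of determination of `target` given the genes in `predictors`.

    samples: list of dicts mapping gene name -> 0/1 expression
    """
    ys = [s[target] for s in samples]
    # eps0: error when the target is predicted without observing any gene
    eps0 = resubstitution_error([()] * len(ys), ys)
    xs = [tuple(s[g] for g in predictors) for s in samples]
    eps = resubstitution_error(xs, ys)
    if eps0 == 0:
        return 0.0  # constant target: CoD is undefined, 0 is used here by convention
    return (eps0 - eps) / eps0


# Hypothetical toy data: Y = X1 XOR X2, each input combination observed twice.
samples = [
    {"X1": a, "X2": b, "Y": a ^ b}
    for a, b in product([0, 1], repeat=2)
    for _ in range(2)
]

print(cod(samples, "Y", ["X1"]))        # 0.0
print(cod(samples, "Y", ["X2"]))        # 0.0
print(cod(samples, "Y", ["X1", "X2"]))  # 1.0
```

Resubstitution is optimistic on small samples; replacing `resubstitution_error` with a holdout, cross-validation or bootstrap estimate changes only how ε₀ and ε are measured, not the CoD formula.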
2	The Correlation Research of Wind Field and Ocean Ambient Noise of Mien-Hua Submarine Canyon
Hsu, Hsiu-Wei. 26 December 2011.
Ocean ambient noise is one of the important parameters in the sonar equation. It comes from diverse and complex sources such as waves, marine life and ships, so different methods of analysis are needed to understand its complicated properties. An empirical equation obtained from linear regression of ambient noise on wind speed is a common method for predicting noise level. In this study, ambient noise data were collected in experiments in the sea northeast of Taiwan in 2007, 2008 and 2009. The observed noise level time series were paired with the corresponding wind speed data, and the coefficient of determination was used to estimate how well the noise fits the wind speed regression. The K-S test and sea states were used to determine the wind speed threshold. Although all three years of data come from the same sea area, the ambient noise still varies with time and with changes in the sound sources, so these variations are worth investigating. This study compares the statistical properties and distribution of ambient noise levels and frequencies with the corresponding wind speeds in the same season.
Keywords: K-S test; ocean ambient noise; sea state; prediction equation; linear regression; coefficient of determination
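The abstract does not say how the K-S test was set up. As one hypothetical illustration, the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between two empirical CDFs, can compare noise levels recorded at low and at high wind speed. The helper `ks_by_wind_threshold`, the idea of scanning candidate thresholds, and the data are assumptions for illustration only. Turning D into a p-value requires the K-S distribution (for example via `scipy.stats.ks_2samp`), which this sketch does not reproduce.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so the maximum gap is reached at one of those values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of sample a that is <= x
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_by_wind_threshold(wind, noise, thresholds):
    """Hypothetical helper: D between noise levels at wind <= t and at wind > t, per threshold t."""
    result = {}
    for t in thresholds:
        low = [n for w, n in zip(wind, noise) if w <= t]
        high = [n for w, n in zip(wind, noise) if w > t]
        if low and high:  # skip thresholds that leave one group empty
            result[t] = ks_statistic(low, high)
    return result


# Made-up wind speeds (m/s) and noise levels (dB), for illustration only.
wind = [2, 3, 4, 6, 7, 9, 11, 12]
noise = [60, 64, 61, 63, 66, 65, 70, 71]
print(ks_by_wind_threshold(wind, noise, [4, 8]))
```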
3	The Credibility Study of Ocean Ambient Noise Prediction Equation
Wang, Chien-Jen. 09 September 2009.
Ocean ambient noise covers the wide range of sound in the sonar equation other than the target signal, and it strongly influences sonar performance. An empirical equation obtained from linear regression of ambient noise on wind speed is a common method for predicting the noise level. Ambient noise and wind speed data collected in experiments in the seas southwest and northeast of Taiwan were analyzed statistically and as time series, and were also used to derive prediction equations for further analysis. The coefficient of determination (r²) and an F-test for the slope of the regression line were used to estimate how well the noise fits the wind speed data and how credible the regression is. The analysis showed that the distribution of r² varies by region. The r² values calculated from the northeast experiment data are higher than those from the southwest because of the higher proportion of high wind speeds, so the northeast data are considered more appropriate for predicting noise level. All F-test results showed a statistically significant correlation with wind speed, except for the winter data of the southwest experiment. Using these two indicators, the credibility of the prediction equation can be assessed and sonar performance prediction improved.
Keywords: ocean ambient noise; coefficient of determination; empirical equation; linear regression; statistical properties
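A minimal sketch of the two indicators named in this abstract, assuming a simple least-squares line of noise level on wind speed (whether the study regressed on wind speed itself or on a transform of it is not stated here). r² is the share of the noise-level variance explained by the line, and the F statistic for H₀: slope = 0 is SSR/(SSE/(n − 2)) with (1, n − 2) degrees of freedom, which for a single regressor equals r²(n − 2)/(1 − r²). The function name and the numbers are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x with r^2 and the F statistic for H0: b = 0."""
    n = len(x)
    if n < 3:
        raise ValueError("the F-test needs at least 3 points (n - 2 residual degrees of freedom)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = my - b * mx               # intercept
    sst = sum((yi - my) ** 2 for yi in y)                        # total sum of squares
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    ssr = sst - sse                                              # explained sum of squares
    r2 = ssr / sst
    f = ssr / (sse / (n - 2)) if sse > 0 else float("inf")       # F with (1, n - 2) df
    return a, b, r2, f


# Made-up example: wind speed (m/s) vs. noise level (dB).
wind = [1, 2, 3, 4, 5]
noise = [62, 64, 65, 64, 65]
a, b, r2, f = fit_line(wind, noise)
print(f"noise = {a:.2f} + {b:.2f} * wind, r^2 = {r2:.2f}, F = {f:.2f}")
# noise = 62.20 + 0.60 * wind, r^2 = 0.60, F = 4.50
```

The F value is then compared with the upper critical value of F(1, n − 2) at the chosen significance level; with only five points, F = 4.5 would not be significant at 0.05.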
4	Lietuvos geriamojo vandens suvartojimo ir jo pokyčių įvertinimas didžiuosiuose miestuose / The drinking water consumption and assessment of its alteration in major Lithuanian cities
Sirtautas, Danius. 17 June 2014.
This master's thesis examines trends in drinking water consumption and its changes over the period 1996 to 2012. The major Lithuanian cities of Vilnius, Kaunas and Klaipėda were chosen as research objects. The aim of the research is to analyze data from the Department of Statistics on water consumption and its trends, and to analyze the factors that determine water consumption. The thesis examines factors that could influence the amount of groundwater extracted and consumed: population decline, water supply and wastewater collection prices, reduced industrial production, changes in GDP, electricity prices and housing consumption expenditure. Groundwater extraction decreased by 45% from 1996 to 2012, which suggests that the designed water intakes and water supply networks operate at reduced capacity; this can degrade water quality because water stagnates in the pipes. Using the Department of Statistics data, relationships were sought between the amount of water consumed and changes in population, electricity prices, housing consumption expenditure per person and GDP. The data showed that over 1996 to 2012 one person in Lithuania consumed on average about 106 liters of water per day. The regression of water consumption on the amount of money allocated to consumption expenditure gave a high coefficient of determination in Kaunas (R² = 0.79) and Vilnius (R² = 0.86), and a much lower one in Klaipėda (R² = 0.45).
Subject: Civil Engineering
Keywords: water consumption; water supply networks; coefficient of determination
5	Zdanění a doprava / Taxation and transport
Kocsisová, Tereza. January 2016.
The aim of this diploma thesis is to find suitable regression models relating selected transport statistics to GDP per capita and to determine whether these models are statistically significant. The first part is a theoretical introduction to transport from an economic perspective, together with a description of the regression analysis methods used in the practical part. The practical part draws on data from the Eurostat website, which provide a sufficient statistical basis for the thesis. The data are plotted as scatter charts, from which the regression equations are determined. The choice of a suitable regression model is based on the coefficient of determination, and the significance level is α = 0.05.
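The abstract says the regression form was chosen by its coefficient of determination but does not list the candidate forms. A hedged sketch, assuming they are the usual spreadsheet trendline types (linear, logarithmic, exponential, power): each form is linearized with a log transform and fitted by least squares, and the form with the highest R² is reported. For the exponential and power forms this R² is computed on the log scale, so it is not strictly comparable with the linear R²; the significance check at α = 0.05 would be a separate step. Function names and data are illustrative.

```python
import math


def linear_fit_r2(x, y):
    """Least-squares line y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sst = sum((yi - my) ** 2 for yi in y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - sse / sst


# Each candidate form becomes a straight line after transforming x and/or y.
# The log forms require strictly positive data.
FORMS = {
    "linear":      (lambda v: v, lambda v: v),  # y = a + b*x
    "logarithmic": (math.log,    lambda v: v),  # y = a + b*ln(x)
    "exponential": (lambda v: v, math.log),     # ln(y) = a + b*x, i.e. y = e^a * e^(b*x)
    "power":       (math.log,    math.log),     # ln(y) = a + b*ln(x), i.e. y = e^a * x^b
}


def compare_forms(x, y):
    """Return {form name: R^2}; log-form R^2 values are on the transformed scale."""
    scores = {}
    for name, (fx, fy) in FORMS.items():
        _, _, r2 = linear_fit_r2([fx(v) for v in x], [fy(v) for v in y])
        scores[name] = r2
    return scores


# Made-up data that grow exponentially (y = 2^x).
x = [1, 2, 3, 4, 5]
y = [2, 4, 8, 16, 32]
scores = compare_forms(x, y)
best = max(scores, key=scores.get)
print(best, {k: round(v, 3) for k, v in scores.items()})
```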
7	Um algoritmo eficiente para o crescimento de redes sobre o grafo probabilístico completo do sistema de regulação gênica considerado / An efficient algorithm for growing networks on the regulatory gene system complete random graph
Lima, Leandro de Araujo. 10 August 2009.
It is known from biology that gene expression levels are among the factors that can indicate how active genes are at a given moment. Advances in microarray technology have made it possible to measure the expression levels of thousands of genes at the same time. These measurements can be arranged as a time series, which can be treated statistically to obtain information about the relationships between genes. Many models have been proposed to treat gene networks mathematically, and they have evolved to incorporate more and more features of real networks. This work reviews discrete models of gene regulatory networks: first Boolean networks, a deterministic model, and then probabilistic Boolean networks and probabilistic genetic networks, models that treat the problem stochastically. Using the last of these models, two methods for estimating the level of prediction among genes are shown: the coefficient of determination and mutual information. Besides estimating these relationships, some techniques have been developed to construct networks from specific genes, called seeds. Two such network growth methods are presented, together with a third method, based on them, that was developed in this work. The new algorithm grows the network by changing the seeds at each iteration and grouping the genes into groups with different levels of confidence, called layers. The algorithm also uses other criteria to add new genes to the network. After these methods are explained, a software tool is presented that estimates dependencies among genes from time series gene expression data and runs the network growth process around the genes one wishes to study. The improvements made to the program are also described. Finally, some tests using data from Plasmodium falciparum, the malaria parasite, are shown.
Keywords: coefficient of determination; network growth; mean mutual information; probabilistic genetic networks
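The abstract does not give the growth algorithm's details, so the following is only a hypothetical sketch of two ingredients it names: a plug-in estimate of mutual information between discretized expression profiles, measured with a one-step time lag (an assumption suited to time series data), and a layered growth loop in which the genes linked to the current seeds form the next layer and then become the new seeds. Here layers simply reflect distance from the original seeds, which is an assumed stand-in for the thesis's confidence levels; the function names, threshold rule and toy data are illustrative, not the thesis's software.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b))), using relative frequencies.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def lagged_mi(series, predictor, target, lag=1):
    """MI between `predictor` at time t and `target` at time t + lag (assumed temporal setup)."""
    return mutual_information(series[predictor][:-lag], series[target][lag:])


def grow_network(series, seeds, threshold, max_layers=3):
    """Hypothetical layered growth around seed genes.

    Each new layer holds the genes whose lagged MI with any gene of the previous
    layer (in either direction) reaches `threshold`; that layer becomes the next seeds.
    """
    layers = [set(seeds)]
    in_network = set(seeds)
    current = set(seeds)
    for _ in range(max_layers):
        new = {
            gene
            for gene in series
            if gene not in in_network
            and any(
                max(lagged_mi(series, gene, s), lagged_mi(series, s, gene)) >= threshold
                for s in current
            )
        }
        if not new:
            break
        layers.append(new)
        in_network |= new
        current = new
    return layers


# Toy binary time series: B copies A with a one-step delay; C never changes.
series = {
    "A": [0, 1, 0, 1, 0, 1],
    "B": [1, 0, 1, 0, 1, 0],
    "C": [0, 0, 0, 0, 0, 0],
}
print(grow_network(series, seeds=["A"], threshold=0.5))  # [{'A'}, {'B'}]
```

The same loop could use a CoD estimate instead of mutual information as the link score; only the scoring call inside `any(...)` would change.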
9	Contributions to genomic selection and association mapping in structured and admixed populations : application to maize / Contributions à la sélection génomique et à la génétique d'association en populations structurées et admixées : application au maïs Rio, Simon 26 April 2019 (has links) L'essor des marqueurs moléculaires (SNPs) a révolutionné les méthodes de génétique quantitative en permettant l'identification de régions impliquées dans le déterminisme génétique des caractères (QTLs) via la génétique d'association (GWAS), ou encore la prédiction des performances d'individus sur la base de leur information génomique (GS). La stratification des populations en groupes génétiques est courante en sélection animale et végétale. Cette structure peut impacter les méthodes de GWAS et de GS via des différences de fréquence et d'effets des allèles des QTL, ainsi que par des différences de déséquilibre de liaison (LD) entre SNP et QTL selon les groupes.Pendant cette thèse, deux panels de diversité de maïs ont été utilisés, présentant des niveaux différents de structuration: le panel “Amaizing Dent” représentant les lignées dentées utilisées en Europe et le panel “Flint-Dent” incluant des lignées dentées, cornées européennes, ainsi que des lignées admixées entre ces deux groupes.En GS, l'impact de la structure génétique sur la qualité des prédictions a été évalué au sein du premier panel pour des caractères de productivité et de phénologie. Cette étude a mis en évidence l'intérêt d'une population d'entraînement (TS) dont la constitution en matière de groupes génétiques est similaire à celle de la population à prédire. Assembler les différents groupes au sein d'un TS multi-groupe apparaît comme une solution efficace pour prédire un large spectre de diversité génétique. 
Des indicateurs a priori de la précision des prédictions génomiques, basés sur le coefficient de détermination, ont également été évalués, mettant en évidence une efficacité variable selon le groupe et le caractère étudié.Une nouvelle méthodologie GWAS a ensuite été développée pour étudier l'hétérogénéité des effets capturés par les SNPs selon les groupes. L'intégration des individus admixés à l'analyse permet de séparer les effets des facteurs responsables de l'hétérogénéité des effets alléliques: différence génomique locale (liée au LD ou à une mutation spécifique d'un groupe) ou interactions épistatiques entre le QTL et le fonds génétique. Cette méthodologie a été appliquée au panel “Flint-Dent” pour la précocité de floraison. Des QTL ont été détéctés comme présentant des effets groupe-spécifiques interagissant ou non avec le fonds génétique. De nombreux QTL présentant un profil original ont pu être mis en évidence, incluant des locus connus tels que Vgt1, Vgt2 ou Vgt3. Une importante épistasie directionnelle a aussi été mise en évidence grâce aux individus admixés, confortant l'existence d'interactions épistatiques avec le fonds génétique pour ce caractère.Sachant l'existence de cette hétérogénéité d’effets alléliques, nous avons développé deux modèles de prédictions génomiques nommées Multi-group Admixed GBLUP (MAGBLUP). Ceux-ci modélisent des effets groupe-spécifiques aux QTLs et sont adaptés à la prédiction d'individus admixés. Le premier permet d'identifier la variance génétique additionnelle créée par l'admixture (variance de ségrégation), alors que le second permet d'évaluer le degré de conservation des effets alléliques entre groupes. Ces deux modèles ont montré un intérêt certain par rapport à des modèles standards pour prédire des caractères simulés, mais plus limité sur des caractères réels.Enfin, l'intérêt des individus admixés dans la constitution de TS multi-groupes a été évalué à l'aide du second panel. 
Si leur intérêt a clairement été mis en évidence pour des caractères simulés, des résultats plus variables ont été observés avec les caractères réels, pouvant s'expliquer par la présence d'interactions avec le fonds génétique. Les nouvelles méthodes et l'utilisation d'individus admixés ouvrent des pistes de recherches intéressantes pour les études de génétique quantitative en population structurée. / The advent of molecular markers (SNPs) has revolutionized quantitative genetics by enabling both the identification of regions involved in the genetic determinism of traits (QTLs) through genome-wide association studies (GWAS) and the prediction of individual performance from genomic information (genomic selection, GS). The stratification of populations into genetic groups is common in animal and plant breeding. This structure can affect GWAS and GS methods through group differences in QTL allele frequencies and effects, as well as in linkage disequilibrium (LD) between SNPs and QTLs. During this thesis, two maize diversity panels with different levels of structure were used: the "Amaizing Dent" panel, representing the diversity of dent lines used in Europe, and the "Flint-Dent" panel, including dent lines, European flint lines, and lines admixed between these two groups. In GS, the impact of genetic structure on genomic prediction accuracy was evaluated in the first panel for productivity and phenology traits. This study highlighted the value of a training set (TS) whose composition in terms of genetic groups is similar to that of the population to be predicted. Assembling the different groups within a multi-group TS appears to be an effective way to predict a broad spectrum of genetic diversity.
A priori indicators of genomic prediction accuracy based on the coefficient of determination (CD) were also evaluated; their efficiency varied with the group and the trait. A new GWAS methodology was then developed to study how the allele effects captured by SNPs vary across groups. Including admixed individuals in such analyses makes it possible to disentangle the factors causing heterogeneity of allele effects across groups: local genomic differences (related to LD or to a group-specific mutation) or epistatic interactions between the QTL and the genetic background. This methodology was applied to the "Flint-Dent" panel for flowering time. QTLs were detected with group-specific effects, some interacting with the genetic background and others not. Many QTLs with an original profile were identified, including known loci such as Vgt1, Vgt2, and Vgt3. Significant directional epistasis was also demonstrated using the admixed individuals, supporting the existence of epistatic interactions with the genetic background for this trait. Given this heterogeneity of allele effects, we developed two genomic prediction models named Multi-group Admixed GBLUP (MAGBLUP). Both model group-specific QTL effects and are suited to predicting admixed individuals. The first makes it possible to identify the additional genetic variance created by admixture (segregation variance), while the second makes it possible to evaluate the degree of conservation of SNP allele effects across groups. Both models showed a clear advantage over standard models for predicting simulated traits, but the advantage was more limited for real traits. Finally, the value of admixed individuals in multi-group TSs was evaluated using the second panel.
Although their value was clearly demonstrated for simulated traits, results with real traits were more variable, which may be explained by interactions with the genetic background. The new methods and the use of admixed individuals open interesting lines of research for quantitative genetics studies in structured populations. Admixture Coefficient de Détermination (CD) Prédiction génomique Structure génétique GWAS Epistasie Admixture Coefficient of Determination Genomic Prediction Genetic Structure GWAS Epistasis
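The Rio (2019) abstract mentions a priori indicators of genomic prediction accuracy based on the coefficient of determination (CD) but does not define them. What follows is a minimal sketch of one common form. It assumes a GBLUP-type mixed model y = Xb + Zg + e with g ~ N(0, G·σ²_g), e ~ N(0, I·σ²_e) and a known ratio λ = σ²_e/σ²_g. Under that model, the CD of a contrast c between genomic values is c'(G − λC²²)c / c'Gc, where C²² is the random-effect block of the inverted mixed-model equations after absorbing the fixed effects. The function names `cd_contrasts` and `contrasts_vs_mean` are hypothetical. This may not be the exact indicator evaluated in the thesis.

```python
import numpy as np


def cd_contrasts(G, Z, X, lam, C):
    """CD of each contrast (column of C) under the assumed model
    y = X b + Z g + e, g ~ N(0, G*s2g), e ~ N(0, I*s2e), lam = s2e / s2g.

    CD(c) = c'(G - lam*C22)c / c'Gc, i.e. 1 - PEV(c'g) / Var(c'g).
    """
    n_obs = X.shape[0]
    # Absorb the fixed effects: M = I - X (X'X)^-1 X'
    M = np.eye(n_obs) - X @ np.linalg.solve(X.T @ X, X.T)
    # Random-effect block of the inverted mixed-model equations
    C22 = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(G))
    # Quadratic forms c_i' A c_i for every column c_i of C
    num = np.einsum("ij,jk,ki->i", C.T, G - lam * C22, C)
    den = np.einsum("ij,jk,ki->i", C.T, G, C)
    return num / den


def contrasts_vs_mean(n):
    # Column i contrasts individual i with the mean of all n individuals
    return np.eye(n) - np.ones((n, n)) / n


# Toy example: two unrelated individuals, each phenotyped once, intercept only
G = np.eye(2)
Z = np.eye(2)
X = np.ones((2, 1))
print(cd_contrasts(G, Z, X, lam=1.0, C=contrasts_vs_mean(2)))  # [0.5 0.5]
```

Z maps phenotypic records to individuals, so an individual with an all-zero column in Z is predicted but not phenotyped. That individual still gets a CD through its relationships in G. For this reason, criteria of this kind can score a candidate training set before any phenotyping is done.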
10	Comprehension and Interpretation of Common Language Effect Size Displays Moracz, Kelle 27 November 2019 (has links) No description available. Statistics Effect sizes
