Global ETD Search

41	Modélisations de la dispersion du pollen et estimation à partir de marqueurs génétiques. / Modellings of pollen dispersal and estimation from genetic markers Carpentier, Florence 29 June 2010 La dispersion du pollen est une composante majeure des flux de gènes chez les plantes, contribuant à la diversité génétique et à sa structure spatiale. Son étude à l'échelle d'un épisode de reproduction permet de comprendre l'impact des changements actuels (fragmentation, anthropisation...) et de proposer des politiques de conservation. Deux types de méthodes basées sur les marqueurs microsatellites estiment la fonction de dispersion du pollen: (i) les méthodes directes (e.g. mating model) basées sur l'assignation de paternité et nécessitant un échantillonnage exhaustif (position et génotype des individus du site étudié, génotypes de graines échantillonnées sur des mères); (ii) les méthodes indirectes (e.g. TwoGener), nécessitant un échantillonnage réduit (génotypes des graines, génotypes et positions des mères) et résumant les données en indices génétiques. Nous proposons la formalisation statistique de ces deux types de méthodes et montrons qu'elles utilisent des fonctions de dispersion différentes: les méthodes directes estiment une fonction forward potentielle (déplacement du pollen depuis le père), les méthodes indirectes une fonction backward intégrative (de la fécondation jusqu'à l'existence du père). Nous explicitons le lien entre fonctions backward et forward, des hypothèses menant à leur équivalence, et des contraintes affectant les fonctions backward. Nous développons enfin une méthode de calcul bayésien approché qui permet (i) une estimation forward, (ii) avec des intervalles de crédibilité, (iii) à partir d'un jeu de données non exhaustif et d'informations partielles (e.g. positions sans génotype) et (iv) l'utilisation de différents modèles de dispersion. / Pollen dispersal is a major component of gene flow in plants.
It contributes to genetic diversity and to spatial genetic structure. Studying it at the scale of a single reproduction event makes it possible to understand the impact of current changes (fragmentation, anthropization, etc.) and to propose conservation policies. Two types of methods, based on microsatellite markers, estimate the pollen dispersal function: (i) direct methods (e.g. the mating model), based on paternity assignment, require exhaustive sampling (position and genotype of individuals in the study plot, genotypes of seeds harvested from mothers); (ii) indirect methods (e.g. TwoGener) require reduced sampling (seed genotypes, genotypes and positions of their mothers) and summarize the data through genetic indices. We propose a statistical formalization of both types of methods and show that they rely on different dispersal functions: the direct methods estimate a potential forward function (pollen transfer from the father), whereas the indirect methods estimate an integrative backward one (from fertilization back to the existence of the father). We make explicit the link between forward and backward functions, the assumptions leading to their equivalence, and the constraints affecting the backward functions. Finally, we develop an Approximate Bayesian Computation method, which enables (i) forward estimation, (ii) with credible intervals, (iii) from a non-exhaustive dataset and partial information (e.g. positions without genotypes) and (iv) the use of different dispersal models. Flux de gènes Noyau de dispersion Fonction backward Fonction forward Calcul bayésien approché Méthodes indirectes Gene flow Dispersal kernel Backward function Forward function Approximate bayesian computation Indirect methods
42	Estimação do índice de memória em processos estocásticos com memória longa: uma abordagem via ABC / Estimation of the memory index of stochastic processes with long memory: an ABC approach Andrade, Plinio Lucas Dias 28 March 2016 Neste trabalho propomos o uso de um método Bayesiano para estimar o parâmetro de memória de um processo estocástico com memória longa quando sua função de verossimilhança é intratável ou não está disponível. Esta abordagem fornece uma aproximação para a distribuição a posteriori sobre a memória e outros parâmetros e é baseada numa aplicação simples do método conhecido como computação Bayesiana aproximada (ABC). Alguns estimadores populares para o parâmetro de memória serão revisados e comparados com esta abordagem. O emprego de nossa proposta viabiliza a solução de problemas complexos sob o ponto de vista Bayesiano e, embora aproximativa, possui um desempenho muito satisfatório quando comparada com métodos clássicos. / In this work we propose the use of a Bayesian method for estimating the memory parameter of a long-memory stochastic process when its likelihood function is intractable or unavailable. This approach provides an approximation to the posterior distribution of the memory and other parameters, and is based on a simple application of approximate Bayesian computation (ABC). Some popular existing estimators of the memory parameter are reviewed and compared with this method. Our proposal makes it possible to solve complex problems from a Bayesian point of view and, although approximate, performs very satisfactorily when compared with classical methods. ABC Approximate Bayesian computation Bayesian inference Computação bayesiana aproximada Inferência bayesiana Long memory stochastic process Processo estocástico com memória longa
43	Applied State Space Modelling of Non-Gaussian Time Series using Integration-based Kalman-filtering Frühwirth-Schnatter, Sylvia January 1993 The main topic of the paper is on-line filtering for non-Gaussian dynamic (state space) models by approximate computation of the first two posterior moments using efficient numerical integration. Based on approximating the prior of the state vector by a normal density, we prove that the posterior moments of the state vector are related to the posterior moments of the linear predictor in a simple way. For the linear predictor Gauss-Hermite integration is carried out with automatic reparametrization based on an approximate posterior mode filter. We illustrate how further topics in applied state space modelling such as estimating hyperparameters, computing model likelihoods and predictive residuals, are managed by integration-based Kalman-filtering. The methodology derived in the paper is applied to on-line monitoring of ecological time series and filtering for small count data. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
44	Integration-based Kalman-filtering for a Dynamic Generalized Linear Trend Model Schnatter, Sylvia January 1991 The topic of the paper is filtering for non-Gaussian dynamic (state space) models by approximate computation of posterior moments using numerical integration. A Gauss-Hermite procedure is implemented based on the approximate posterior mode estimator and curvature recently proposed in [12]. This integration-based filtering method is illustrated by a dynamic trend model for non-Gaussian time series. The proposed method is compared with other approximations ([15], [2]) in simulation experiments for time series from Poisson, exponential and Gamma distributions. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
45	Méthodes d'inférence statistique pour champs de Gibbs / Statistical inference methods for Gibbs random fields Stoehr, Julien 29 October 2015 La constante de normalisation des champs de Markov se présente sous la forme d'une intégrale hautement multidimensionnelle et ne peut être calculée par des méthodes analytiques ou numériques standard. Cela constitue une difficulté majeure pour l'estimation des paramètres ou la sélection de modèle. Pour approcher la loi a posteriori des paramètres lorsque le champ de Markov est observé, nous remplaçons la vraisemblance par une vraisemblance composite, c'est à dire un produit de lois marginales ou conditionnelles du modèle, peu coûteuses à calculer. Nous proposons une correction de la vraisemblance composite basée sur une modification de la courbure au maximum afin de ne pas sous-estimer la variance de la loi a posteriori. Ensuite, nous proposons de choisir entre différents modèles de champs de Markov cachés avec des méthodes bayésiennes approchées (ABC, Approximate Bayesian Computation), qui comparent les données observées à de nombreuses simulations de Monte-Carlo au travers de statistiques résumées. Afin de pallier l'absence de statistiques exhaustives pour ce choix de modèle, des statistiques résumées basées sur les composantes connexes des graphes de dépendance des modèles en compétition sont introduites. Leur efficacité est étudiée à l'aide d'un taux d'erreur conditionnel original mesurant la puissance locale de ces statistiques à discriminer les modèles. Nous montrons alors que nous pouvons diminuer sensiblement le nombre de simulations requises tout en améliorant la qualité de décision, et utilisons cette erreur locale pour construire une procédure ABC qui adapte le vecteur de statistiques résumées aux données observées.
Enfin, pour contourner le calcul impossible de la vraisemblance dans le critère BIC (Bayesian Information Criterion) de choix de modèle, nous étendons les approches champs moyens en substituant la vraisemblance par des produits de distributions de vecteurs aléatoires, à savoir des blocs du champ. Le critère BLIC (Block Likelihood Information Criterion), que nous en déduisons, permet de répondre à des questions de choix de modèle plus large que les méthodes ABC, en particulier le choix conjoint de la structure de dépendance et du nombre d'états latents. Nous étudions donc les performances de BLIC dans une optique de segmentation d'images. / Due to the Markovian dependence structure, the normalizing constant of Markov random fields cannot be computed with standard analytical or numerical methods. This is a central issue for parameter inference and model selection, since computing the likelihood is an integral part of the procedure. When the Markov random field is directly observed, we propose to estimate the posterior distribution of the model parameters by replacing the likelihood with a composite likelihood, that is, a product of marginal or conditional distributions of the model that are easy to compute. Our first contribution is to correct the posterior distribution resulting from the use of a misspecified likelihood function by modifying the curvature at the mode, in order to avoid underestimating the posterior variance. In the second part, we propose performing model selection between hidden Markov random fields with approximate Bayesian computation (ABC) algorithms, which compare the observed data with many Monte-Carlo simulations through summary statistics. To make up for the absence of sufficient statistics for this model choice, we introduce summary statistics based on the connected components of the dependency graph of each competing model.
We assess their efficiency using a novel conditional misclassification rate that evaluates their local power to discriminate between models. We set up an efficient procedure that reduces the computational cost while improving the quality of the decision and, using this local error rate, we build an ABC procedure that adapts the summary statistics to the observed data. In the last part, in order to circumvent the computation of the intractable likelihood in the Bayesian Information Criterion (BIC), we extend the mean field approaches by replacing the likelihood with a product of distributions of random vectors, namely blocks of the lattice. On that basis, we derive BLIC (Block Likelihood Information Criterion), which answers model choice questions of a wider scope than ABC, such as the joint selection of the dependency structure and the number of latent states. We study the performance of BLIC for image segmentation. Méthodes de Monte-Carlo Champs de Markov Statistique bayésienne Sélection de modèle Méthodes ABC Vraisemblances composites Monte-Carlo methods Markov random fields Bayesian statistics Model selection Approximate Bayesian computation Composite likelihood
46	Estimação do índice de memória em processos estocásticos com memória longa: uma abordagem via ABC / Estimation of the memory index of stochastic processes with long memory: an ABC approach Plinio Lucas Dias Andrade 28 March 2016 Neste trabalho propomos o uso de um método Bayesiano para estimar o parâmetro de memória de um processo estocástico com memória longa quando sua função de verossimilhança é intratável ou não está disponível. Esta abordagem fornece uma aproximação para a distribuição a posteriori sobre a memória e outros parâmetros e é baseada numa aplicação simples do método conhecido como computação Bayesiana aproximada (ABC). Alguns estimadores populares para o parâmetro de memória serão revisados e comparados com esta abordagem. O emprego de nossa proposta viabiliza a solução de problemas complexos sob o ponto de vista Bayesiano e, embora aproximativa, possui um desempenho muito satisfatório quando comparada com métodos clássicos. / In this work we propose the use of a Bayesian method for estimating the memory parameter of a long-memory stochastic process when its likelihood function is intractable or unavailable. This approach provides an approximation to the posterior distribution of the memory and other parameters, and is based on a simple application of approximate Bayesian computation (ABC). Some popular existing estimators of the memory parameter are reviewed and compared with this method. Our proposal makes it possible to solve complex problems from a Bayesian point of view and, although approximate, performs very satisfactorily when compared with classical methods. ABC Computação bayesiana aproximada Inferência bayesiana Processo estocástico com memória longa ABC Approximate Bayesian computation Bayesian inference Long memory stochastic process
47	The Population Ecology, Molecular Ecology, and Phylogeography of the Diamondback Terrapin (Malaclemys terrapin) Converse, Paul E. 19 September 2016 No description available. Genetics Biology population genetics Malaclemys terrapin conservation genetics phylogeography turtle gene flow population structure approximate Bayesian computation mutation rate population divergence Chesapeake Bay coalescent theory
48	Scalable Estimation and Testing for Complex, High-Dimensional Data Lu, Ruijin 22 August 2019 With modern high-throughput technologies, scientists can now collect high-dimensional data of various forms, including brain images, medical spectrum curves, engineering signals, etc. These data provide a rich source of information on disease development, cell evolvement, engineering systems, and many other scientific phenomena. To achieve a clearer understanding of the underlying mechanism, one needs a fast and reliable analytical approach to extract useful information from the wealth of data. The goal of this dissertation is to develop novel methods that enable scalable estimation, testing, and analysis of complex, high-dimensional data. It contains three parts: parameter estimation based on complex data, powerful testing of functional data, and the analysis of functional data supported on manifolds. The first part focuses on a family of parameter estimation problems in which the relationship between data and the underlying parameters cannot be explicitly specified using a likelihood function. We introduce a wavelet-based approximate Bayesian computation approach that is likelihood-free and computationally scalable. This approach is applied to two problems: estimating mutation rates of a generalized birth-death process based on fluctuation experimental data and estimating the parameters of targets based on foliage echoes. The second part focuses on functional testing. We consider multiple testing in basis space via p-value guided compression. Our theoretical results demonstrate that, under regularity conditions, the Westfall-Young randomization test in basis space achieves strong control of the family-wise error rate and asymptotic optimality. Furthermore, appropriate compression in basis space leads to improved power compared with point-wise testing in the data domain or basis-space testing without compression.
The effectiveness of the proposed procedure is demonstrated through two applications: the detection of regions of spectral curves associated with pre-cancer using 1-dimensional fluorescence spectroscopy data and the detection of disease-related regions using 3-dimensional Alzheimer's Disease neuroimaging data. The third part focuses on analyzing data measured on the cortical surfaces of monkeys' brains during their early development, where subjects are measured on misaligned time markers. In this analysis, we examine the asymmetric patterns and the increasing or decreasing trends in the monkeys' brains across time. / Doctor of Philosophy / With modern high-throughput technologies, scientists can now collect high-dimensional data of various forms, including brain images, medical spectrum curves, engineering signals, and biological measurements. These data provide a rich source of information on disease development, engineering systems, and many other scientific phenomena. The goal of this dissertation is to develop novel methods that enable scalable estimation, testing, and analysis of complex, high-dimensional data. It contains three parts: parameter estimation based on complex biological and engineering data, powerful testing of high-dimensional functional data, and the analysis of functional data supported on manifolds. The first part focuses on a family of parameter estimation problems in which the relationship between data and the underlying parameters cannot be explicitly specified using a likelihood function. We introduce a computation-based statistical approach that achieves efficient parameter estimation scalable to high-dimensional functional data. The second part focuses on developing a powerful testing method for functional data that can be used to detect important regions. We show that this approach has desirable theoretical properties.
The effectiveness of this testing approach is demonstrated using two applications: the detection of regions of the spectrum that are related to pre-cancer using fluorescence spectroscopy data and the detection of disease-related regions using brain image data. The third part focuses on analyzing brain cortical thickness data, measured on the cortical surfaces of monkeys’ brains during early development. Subjects are measured on misaligned time markers. Using a functional data estimation and testing approach, we are able to (1) identify asymmetric regions between the right and left brains across time, and (2) identify spatial regions on the cortical surface that show an increase or decrease in cortical measurements over time. Functional data testing randomization method basis decomposition approximate Bayesian computation (ABC) wavelet decomposition Gaussian Process surrogate model fluctuation analysis mutation probability estimation birth-death process model reg
49	Sélection bayésienne de variables et méthodes de type Parallel Tempering avec et sans vraisemblance / Bayesian variable selection and Parallel Tempering methods with and without likelihood Baragatti, Meïli 10 November 2011 Cette thèse se décompose en deux parties. Dans un premier temps nous nous intéressons à la sélection bayésienne de variables dans un modèle probit mixte. L'objectif est de développer une méthode pour sélectionner quelques variables pertinentes parmi plusieurs dizaines de milliers tout en prenant en compte le design d'une étude, et en particulier le fait que plusieurs jeux de données soient fusionnés. Le modèle de régression probit mixte utilisé fait partie d'un modèle bayésien hiérarchique plus large et le jeu de données est considéré comme un effet aléatoire. Cette méthode est une extension de la méthode de Lee et al. (2003). La première étape consiste à spécifier le modèle ainsi que les distributions a priori, avec notamment l'utilisation de l'a priori conventionnel de Zellner (g-prior) pour le vecteur des coefficients associé aux effets fixes (Zellner, 1986). Dans une seconde étape, nous utilisons un algorithme Metropolis-within-Gibbs couplé à la grouping (ou blocking) technique de Liu (1994) afin de surmonter certaines difficultés d'échantillonnage. Ce choix a des avantages théoriques et computationnels. La méthode développée est appliquée à des jeux de données microarray sur le cancer du sein. Cependant elle a une limite : la matrice de covariance utilisée dans le g-prior doit nécessairement être inversible. Or il y a deux cas pour lesquels cette matrice est singulière : lorsque le nombre de variables sélectionnées dépasse le nombre d'observations, ou lorsque des variables sont combinaisons linéaires d'autres variables. Nous proposons donc une modification de l'a priori de Zellner en y introduisant un paramètre de type ridge, ainsi qu'une manière de choisir les hyper-paramètres associés.
L'a priori obtenu est un compromis entre le g-prior classique et l'a priori supposant l'indépendance des coefficients de régression, et se rapproche d'un a priori précédemment proposé par Gupta et Ibrahim (2007). Dans une seconde partie nous développons deux nouvelles méthodes MCMC basées sur des populations de chaînes. Dans le cas de modèles complexes ayant de nombreux paramètres, mais où la vraisemblance des données peut se calculer, l'algorithme Equi-Energy Sampler (EES) introduit par Kou et al. (2006) est apparemment plus efficace que l'algorithme classique du Parallel Tempering (PT) introduit par Geyer (1991). Cependant, il est difficile d'utilisation lorsqu'il est couplé avec un échantillonneur de Gibbs, et nécessite un stockage important de valeurs. Nous proposons un algorithme combinant le PT avec le principe d'échanges entre chaînes ayant des niveaux d'énergie similaires dans le même esprit que l'EES. Cette adaptation appelée Parallel Tempering with Equi-Energy Moves (PTEEM) conserve l'idée originale qui fait la force de l'algorithme EES tout en assurant de bonnes propriétés théoriques et une utilisation facile avec un échantillonneur de Gibbs. Enfin, dans certains cas complexes l'inférence peut être difficile car le calcul de la vraisemblance des données s'avère trop coûteux, voire impossible. De nombreuses méthodes sans vraisemblance ont été développées. Par analogie avec le Parallel Tempering, nous proposons une méthode appelée ABC-Parallel Tempering, basée sur la théorie des MCMC, utilisant une population de chaînes et permettant des échanges entre elles. / This thesis is divided into two main parts. In the first part, we propose a Bayesian variable selection method for probit mixed models. The objective is to select a few relevant variables among tens of thousands while taking into account the design of a study, and in particular the fact that several datasets are merged together.
The probit mixed model used is considered as part of a larger hierarchical Bayesian model, and the dataset is introduced as a random effect. The proposed method extends the work of Lee et al. (2003). The first step is to specify the model and prior distributions. In particular, we use the g-prior of Zellner (1986) for the fixed regression coefficients. In a second step, we use a Metropolis-within-Gibbs algorithm combined with the grouping (or blocking) technique of Liu (1994). This choice has both theoretical and practical advantages. The method developed is applied to merged microarray datasets of patients with breast cancer. However, this method has a limitation: the covariance matrix involved in the g-prior must not be singular. There are two standard cases in which it is singular: if the number of observations is lower than the number of variables, or if some variables are linear combinations of others. In such situations we propose to modify the g-prior by introducing a ridge parameter, together with a simple way to choose the associated hyper-parameters. The prior obtained is a compromise between the conditionally independent case for the regression coefficients and the automatic scaling advantage offered by the g-prior, and can be linked to the work of Gupta and Ibrahim (2007). In the second part, we develop two new population-based MCMC methods. For complex models with many parameters, but whose likelihood can be computed, the Equi-Energy Sampler (EES) of Kou et al. (2006) seems to be more efficient than the Parallel Tempering (PT) algorithm introduced by Geyer (1991). However, it is difficult to use in combination with a Gibbs sampler, and it requires a large amount of storage. We propose an algorithm combining PT with the principle of exchange moves between chains with similar energy levels, in the spirit of the EES.
This adaptation, which we call Parallel Tempering with Equi-Energy Moves (PTEEM), keeps the original idea of the EES method while ensuring good theoretical properties and practical use in combination with a Gibbs sampler. Finally, in some complex models whose likelihood is analytically or computationally intractable, inference can be difficult. Several likelihood-free methods (or Approximate Bayesian Computation methods) have been developed. We propose a new algorithm, Likelihood Free-Parallel Tempering, based on MCMC theory and on a population of chains, by analogy with the Parallel Tempering algorithm. Sélection bayésienne de variables Modèle probit mixte A priori de Zellner Paramètre ridge Monte Carlo Markov Chains Parallel Tempering Equi-Energy Sampler Approximate Bayesian Computation Méthodes sans vraisemblance Bayesian variable selection Probit mixed model Zellner g-prior Ridge parameter Monte Carlo Markov Chains Parallel Tempering Equi-Energy Sampler Approximate Bayesian Computation Likelihood-Free methods
50	Dynamique évolutive de la durée du cycle de mil : effet des flux de gènes et des pratiques paysannes / Dynamic evolution of pearl millet cycle length: effect of gene flow and farmers’ practices Lakis, Ghayas 17 September 2012 La domestication du mil (Pennisetum glaucum), dans le Sahel, a engendré une large gamme de variétés, très diversifiées pour de nombreuses caractéristiques agronomiques. En particulier, la diversité de la durée du cycle des variétés locales de mil est une composante essentielle des stratégies mises en œuvre par les agriculteurs pour faire face aux fluctuations des précipitations et assurer une certaine stabilité de la production. Au cours des dernières décennies, des évolutions dans les pratiques agricoles ont été observées, en réponse à des changements écologiques et sociaux. Une des conséquences de ces évolutions pourrait être l’existence de flux de gènes entre variétés à cycle court et variétés à cycle long du fait de l’émergence de situations de parapatrie entre ces deux types de variétés, naguère isolées. Par ailleurs, l’existence de recouvrement des périodes des floraisons de ces deux types variétaux a déjà été préalablement observée. Une telle situation amène donc à s’interroger sur la dynamique évolutive passée et actuelle de la diversité de la longueur du cycle du mil dans le Sahel. Dans la première partie de ma thèse, j’ai évalué les possibilités d’occurrence de flux de gènes entre variétés précoces et tardives de mil dans le Sud-ouest du Niger, en utilisant une approche comparative entre situations contrastées pour la distribution spatiale de ces deux types de variétés. J’ai réalisé : 1) une étude des périodes de floraison de deux variétés de mil (précoce (Haïni Kiré) : 75 à 95 jours entre le semis et la maturité et tardive (Somno) : 105 à 125 jours de durée de cycle) dans plusieurs champs paysans, et dans deux villages.
2) une analyse moléculaire à l’aide de 15 marqueurs microsatellites qui a permis l’estimation des niveaux de différenciation génétique entre populations de mils précoces et tardifs échantillonnés dans 4 villages (incluant les deux villages déjà cités) de la même région. Les résultats ont montré la possibilité effective de flux de pollen et l’existence d’introgressions génétiques entre variétés précoces et tardives. Les mécanismes qui pourraient permettre un maintien sur le long terme d’une différenciation phénologique entre les deux types variétaux malgré l’existence de ces flux de gènes, sont discutés. Dans la deuxième partie, j’ai utilisé une approche « gène candidat » combinée à une démarche de génétique des populations, pour tenter d’identifier des gènes qui auraient pu contribuer à la diversité de la durée de cycle chez le mil. Je me suis focalisé sur trois gènes du contrôle de la transition florale PgHd3a, PgDwarf8 et PgPHYC. Leur implication dans la diversité de la durée de cycle chez plusieurs espèces a déjà été montrée. J’ai estimé les niveaux de différenciation génétique entre les mils domestiques et sauvages, précoces et tardifs pour ces trois gènes. J'ai aussi cherché à mettre en évidence, au sein de ces gènes, les empreintes éventuelles d’évènements sélectifs passés. Afin de prendre en compte l’histoire démographique des mils dans les tests de neutralité sélective, j’ai utilisé les données de polymorphisme nucléotidiques de 8 séquences témoins dans le cadre d’une approche Bayésienne. Les résultats obtenus suggèrent fortement que PgHd3a et PgDwarf8 ont été ciblés par la sélection durant la domestication. Cependant, les données ne soutiennent pas l’hypothèse d’un rôle potentiel des trois gènes candidats dans la différenciation de la durée de cycle entre les variétés locales précoces et tardives. L’approc / Domestication of pearl millet (Pennisetum glaucum) in the Sahel of Africa has produced a wide range of diversity in the cycle duration of landraces.
This diversity allows Sahelian farmers to cope with fluctuations in precipitation and to ensure regular grain production. Owing to recent ecological and social changes, modifications of farmers’ practices could promote gene flow between the early and late flowering varieties by increasing the opportunities for them to grow in proximity and to overlap in flowering. Such a situation raises questions about the past and current evolutionary dynamics of phenological diversity in this crop. In the first part of my thesis I evaluated the possibility of gene flow between pearl millet varieties in South-West Niger, through a comparative approach among contrasting situations pertaining to the spatial distribution of early and late landraces. I therefore conducted: 1) a field study in which flowering periods were observed for two types of varieties (early type (Haïni Kiré): 75 to 95 days and late type (Somno): 105 to 125 days of cycle length) in several pearl millet fields in two villages; 2) a molecular study, using microsatellite markers, that allowed the level of genetic differentiation to be assessed between late and early flowering populations sampled from four villages (including the two where the field study was conducted) of the same region (Dallol Bosso). I was able to demonstrate the occurrence of pollen flow between the two types of landraces and I also showed evidence of genetic introgression between early and semi-late landraces. Potential mechanisms that would allow the phenological differentiation between these two varieties to be maintained despite the gene flow are discussed. In the second part of this work I used a candidate gene approach combined with a population genetics approach to try to identify genes that may have contributed to the cycle length diversity in pearl millet.
I focused on three flowering candidate genes, PgHd3a, PgDwarf8 and PgPHYC, which have been shown to be involved in the genetic diversity of cycle length in several species, in order to estimate the differentiation between wild and domestic pearl millets and between early and late landraces on the basis of these candidate genes. I also looked for the signature of possible past selective events within these candidate genes. To be able to distinguish the effects of selection from the effects of demographic events that occurred during the domestication process, I used 8 neutral STS loci and an Approximate Bayesian Computation approach. My results strongly suggest that PgHd3a and PgDwarf8 were targeted by selection during domestication. However, a potential role of any of the three candidate genes in the phenological differentiation between early and late landraces was not supported by our data. The Bayesian approach confirmed the idea, suggested by many authors, that gene flow from the wild to the domestic gene pool has contributed significantly to the genetic diversity of domestic pearl millet. Mil Domestication Flux de gènes Gènes de floraison Durée du cycle Variétés locales Pratiques paysannes et savoirs locaux Histoire démographique Agro-biodiversité Ressources génétiques Diversité nucléotidique Balayage sélectif Coalescence Approximate Bayesian Computation Pearl millet Domestication Gene flow Flowering genes Cycle length Local landraces Farmers’ practices and local knowledge Demographic history Agro-biodiversity Genetic resources Nucleotide diversity Selective sweep Coalescence Approximate Bayesian Computation
