11

Análise bayesiana do modelo fatorial dinâmico para um vetor de séries temporais usando distribuições elípticas / Bayesian analysis of the dynamic factor model for a vector of time series using elliptical distributions

Livia Costa Borges 27 May 2008 (has links)
Factor analysis is an important statistical tool with wide practical application: it explains the correlation among a large number of observable variables in terms of a small number of unobservable variables, known as latent variables. This work carries out a Bayesian analysis, which incorporates prior knowledge about the parameters into the analysis, of the dynamic factor model within the class of multivariate elliptical models. A vector of q time series is fitted with a factor model with k < q factors plus white noise, and the latent part follows a vector autoregressive model. The elliptical class is rich in symmetric distributions with heavier tails than the normal, an important feature in the analysis of financial series; it includes the Student's t, power exponential, and contaminated normal distributions, among others. Inference on the parameters was carried out with Markov chain Monte Carlo methods, using the Metropolis-Hastings and Griddy-Gibbs algorithms to obtain the posterior distributions of the parameters and the factors. Convergence was assessed by graphical techniques and by the methods of Geweke (1992), Heidelberger and Welch (1983), and the half-width test. The method is illustrated using real and simulated data.
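The Metropolis-Hastings machinery named in this abstract can be sketched generically. The sampler below targets a stand-in heavy-tailed log-density (a Student-t-like curve, not the thesis's factor model); the target, step size, and iteration count are illustrative assumptions only:

```python
import math
import random

def log_target(theta):
    # Stand-in log-posterior, known only up to an additive constant:
    # proportional to a Student-t density with 5 degrees of freedom.
    return -3.0 * math.log(1.0 + theta * theta / 5.0)

def metropolis_hastings(log_p, theta0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose theta' = theta + N(0, step^2),
    accept with probability min(1, p(theta') / p(theta))."""
    rng = random.Random(seed)
    theta, lp = theta0, log_p(theta0)
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_p(prop)
        if math.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

samples = metropolis_hastings(log_target, 0.0, 5000)
```

The same accept/reject skeleton underlies the more elaborate samplers (Griddy-Gibbs, hybrid kernels) mentioned throughout this listing; only the target density and proposal change.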
12

Abordagem clássica e Bayesiana em modelos simétricos transformados aplicados à estimativa de crescimento em altura de Eucalyptus urophylla no Polo Gesseiro do Araripe-PE / Classical and Bayesian approaches to transformed symmetric models applied to height-growth estimation of Eucalyptus urophylla in the Gypsum Pole of Araripe, Pernambuco

BARROS, Kleber Napoleão Nunes de Oliveira 22 February 2010 (has links)
This work presents the nonlinear Chapman-Richards growth model with errors following the new class of transformed symmetric models, with Bayesian inference for the parameters. The objective was to apply this framework, via the Metropolis-Hastings algorithm, to select the equation that best predicts the heights of Eucalyptus urophylla clones from an experiment established at the Agronomic Institute of Pernambuco (IPA) in the city of Araripina. The Gypsum Pole of Araripe is an industrial zone in the upper interior of Pernambuco that consumes a large amount of wood from the native vegetation (caatinga) for the calcination of gypsum. In this scenario there is great need for an economically and environmentally feasible solution that minimizes the pressure on the native vegetation. The genus Eucalyptus presents itself as an alternative because of its rapid development and versatility. Height has proven to be an important factor in productivity prognosis and in the selection of the best-adapted clones. One of the main growth curves is the Chapman-Richards model with normally distributed errors; however, alternatives have been proposed to reduce the influence of atypical observations on this model. The data were taken from a 72-month-old plantation. Inference and diagnostics were performed for the transformed and untransformed models under several symmetric distributions. After selecting the best equation, convergence plots and other diagnostics were shown confirming the fit of the transformed symmetric Student's t model with 5 degrees of freedom, using Bayesian inference on the parameters.
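The Chapman-Richards curve at the heart of this abstract has the standard closed form H(t) = a(1 − e^(−bt))^c. A minimal sketch follows; the parameter values are hypothetical, chosen only to illustrate the curve's shape, not taken from the thesis:

```python
import math

def chapman_richards(t, a, b, c):
    """Chapman-Richards growth curve: asymptote a, rate b, shape c."""
    return a * (1.0 - math.exp(-b * t)) ** c

# Hypothetical parameters: a 25 m asymptotic height for a fast-growing clone.
a, b, c = 25.0, 0.045, 1.7
heights = [chapman_richards(t, a, b, c) for t in (12, 24, 48, 72)]
```

The curve rises monotonically toward the asymptote a, which is why it is a common choice for height-growth prognosis.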
13

Régression bayésienne sous contraintes de régularité et de forme / Bayesian regression under shape and smoothness constraints

Khadraoui, Khader 08 December 2011 (has links)
We investigate Bayesian regression under shape and smoothness constraints. The regression function is built from a B-spline basis, which controls its smoothness, and we show that its shape can be controlled simply through its coefficients in that basis, via the control polygon, whose definition and properties are given. Several types of shape constraints are considered (monotonicity, unimodality, convexity, etc.), imposed through the prior distribution, and the posterior is derived in explicit form up to a normalizing constant. The regression function is estimated by the posterior mode, computed with a simulated annealing algorithm whose proposal distribution takes the shape constraints into account. A credible interval is obtained from simulations using a hybrid Metropolis-Hastings algorithm with a Gibbs step and the same proposal distribution, sampling from the truncated posterior. Convergence of the simulation algorithms and of the estimator's computation is proved. In particular, when the B-spline knots are free, the Bayesian analysis of constrained regression becomes complex; we propose original simulation schemes that sample from the truncated posterior when the density of the regression coefficients has varying dimension.
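The posterior-mode computation via simulated annealing described above can be caricatured in miniature. Here the "shape constraint" is monotonicity of a small coefficient vector and the log-posterior is a simple Gaussian stand-in, not the thesis's B-spline model; proposals violating the constraint are rejected outright, mimicking a proposal distribution truncated to the constrained set:

```python
import math
import random

TARGET = [0.0, 0.5, 1.0, 1.5]   # hypothetical data-fit optimum (monotone)

def log_post(beta):
    # Stand-in log-posterior: Gaussian fit of coefficients to TARGET.
    return -sum((b - t) ** 2 for b, t in zip(beta, TARGET))

def is_monotone(beta):
    return all(x <= y for x, y in zip(beta, beta[1:]))

def simulated_annealing(n_iter=20000, seed=1):
    """Maximize log_post over nondecreasing coefficient vectors."""
    rng = random.Random(seed)
    beta = [0.0] * len(TARGET)
    lp = log_post(beta)
    best, best_lp = list(beta), lp
    for i in range(1, n_iter + 1):
        temp = 1.0 / math.log(i + 1)          # slow logarithmic cooling
        prop = list(beta)
        j = rng.randrange(len(prop))
        prop[j] += rng.gauss(0.0, 0.2)
        if not is_monotone(prop):
            continue                           # truncated proposal: reject
        lp_prop = log_post(prop)
        if lp_prop > lp or math.log(rng.random()) < (lp_prop - lp) / temp:
            beta, lp = prop, lp_prop
            if lp > best_lp:
                best, best_lp = list(beta), lp
    return best

mode = simulated_annealing()
```

The annealer keeps the best constrained point visited, which plays the role of the posterior-mode estimate in this toy setting.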
14

Fully Bayesian structure learning of Bayesian networks and their hypergraph extensions / Estimation bayésienne de la structure des réseaux bayésiens puis d'hypergraphes

Datta, Sagnik 07 July 2016 (has links)
In this thesis I address the important problem of determining the structure of complex networks, using the widely used class of Bayesian network models as a concrete vehicle for my ideas. The structure of a Bayesian network represents a set of conditional independence relations that hold in the domain, and learning it can reveal insights into the domain's underlying causal structure. It can also be used for predicting quantities that are difficult, expensive, or unethical to measure, such as the probability of cancer, from other quantities that are easier to obtain. The contributions of this thesis are: (A) software developed in the C language for structure learning of Bayesian networks; (B) the introduction of a new jumping kernel in the Metropolis-Hastings algorithm for faster sampling of networks; (C) an extension of the notion of Bayesian networks to structures involving loops; and (D) software developed specifically to learn cyclic structures. Our primary objective is structure learning, so the graph structure is our parameter of interest; all other parameters appearing in the mathematical model are treated as nuisance parameters, and we do not estimate them.
15

Bayesian Estimation of Sea Clutter Parameters for Radar - A Stochastic Approach / Bayesiansk estimering av sjöklutterparametrar för radar - en stokastisk approach

Öijar Jansson, Emma January 2023 (has links)
Radars operating at sea encounter a common phenomenon known as sea clutter: undesired reflections originating from the sea surface. This phenomenon can significantly impair the radar's capacity to detect small, slow-moving targets, so it is crucial to gain a comprehensive understanding of the statistical attributes that describe the sea clutter; this understanding is pivotal for the development of efficient signal-processing strategies. The core of this work is the need for accurate statistical models of sea clutter, and in particular the application of Field's model, which describes the sea-clutter process by three stochastic differential equations forming the dynamical process of the complex reflectivity of the sea surface. One equation describes the radar cross section, given by a Cox-Ingersoll-Ross process parameterized by A and α; the other equations describe the speckle process, a complex Ornstein-Uhlenbeck process parameterized by B. The aim of this thesis is to explore the possibility of estimating the parameters A, α, and B in Field's model through Bayesian inference, employing Metropolis-Hastings and sequential Monte Carlo methods. The clutter data, represented by the complex reflectivity, are generated synthetically using the Euler-Maruyama and Milstein schemes. Three algorithms are designed for estimating the sea-clutter parameters. Two require 300 seconds of data and are based on the approach suggested by Clement Roussel in his PhD thesis [1]: they employ the Metropolis-Hastings method for estimating B, and A and α, respectively, taking as input estimators of the Cox-Ingersoll-Ross process and of the real part of the Ornstein-Uhlenbeck process. The last algorithm instead uses only 3 seconds of data: a Metropolis-Hastings method that incorporates a particle filter to approximate likelihoods. For evaluation, two distinct parameter sets are considered, leading to different characteristics of the complex reflectivity. The two 300-second algorithms are executed ten times for each parameter set; the algorithm designed for estimating B generates values that closely align with the true values, while the algorithm for A and α does not yield results as satisfactory. Because of time constraints and the computational demands of the simulations, the 3-second algorithm is executed only twice per parameter set. Remarkably, it generates estimates that agree with the true values, indicating strong performance, though additional simulations are required to conclusively confirm its robustness. To conclude, it is possible to estimate the sea-clutter parameters of Field's model with the applied methods of Bayesian inference. However, the applicability of these methods to a large quantity of diverse clutter data remains to be analyzed, and their computational demands pose challenges in real-world applications; future research should address the need for more computationally efficient methods.
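The Euler-Maruyama discretization used above to generate synthetic clutter can be sketched for a generic Cox-Ingersoll-Ross process dx = κ(θ − x)dt + σ√x dW; the mapping of (κ, θ, σ) onto Field's (A, α) parameterization, and the parameter values below, are illustrative assumptions, not the thesis's calibration:

```python
import math
import random

def simulate_cir(x0, kappa, theta, sigma, dt, n_steps, seed=2):
    """Euler-Maruyama scheme for dx = kappa*(theta - x) dt + sigma*sqrt(x) dW."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        x = x + kappa * (theta - x) * dt + sigma * math.sqrt(max(x, 0.0)) * dw
        x = max(x, 0.0)   # clip the scheme's tiny negative excursions
        path.append(x)
    return path

path = simulate_cir(x0=1.0, kappa=2.0, theta=1.0, sigma=0.3, dt=0.001, n_steps=5000)
```

The clipping step is a pragmatic fix: the exact CIR process stays nonnegative, but its Euler-Maruyama discretization can briefly dip below zero.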
16

Exploring the Optimal Transformation for Volatility

Volfson, Alexander 29 April 2010 (has links)
This paper explores the fit of a stochastic volatility model, in which the Box-Cox transformation of the squared volatility follows an autoregressive Gaussian distribution, to the continuously compounded daily returns of the Australian stock index. Estimation was difficult, and over-fitting likely, because the model has more unknowns than data points. We developed a revised model that held a couple of these unknowns fixed, and then a further model that reduced their number significantly by grouping trading days. A Metropolis-Hastings algorithm was used to simulate the joint density and derive estimated volatilities. Although autocorrelations were higher with a smaller Box-Cox transformation parameter, the fit of the distribution was much better.
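The Box-Cox transformation applied here to the squared volatility has the standard form (y^λ − 1)/λ, with the λ → 0 limit equal to log y; the λ values below are illustrative, not those estimated in the paper:

```python
import math

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) in the limit."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# A smaller lam compresses large squared volatilities more aggressively.
vals = [box_cox(4.0, lam) for lam in (1.0, 0.5, 0.0)]
```

This compression of large values is why a smaller transformation parameter can tame the heavy right tail of squared volatility.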
17

Bayesian Logistic Regression Model with Integrated Multivariate Normal Approximation for Big Data

Fu, Shuting 28 April 2016 (has links)
The analysis of big data is of great interest today, and this comes with challenges of improving precision and efficiency in estimation and prediction. We study binary data with covariates from numerous small areas, where direct estimation is not reliable, and there is a need to borrow strength from the ensemble. This is generally done using Bayesian logistic regression, but because there are numerous small areas, the exact computation for the logistic regression model becomes challenging. Therefore, we develop an integrated multivariate normal approximation (IMNA) method for binary data with covariates within the Bayesian paradigm, and this procedure is assisted by the empirical logistic transform. Our main goal is to provide the theory of IMNA and to show that it is many times faster than the exact logistic regression method with almost the same accuracy. We apply the IMNA method to the health status binary data (excellent health or otherwise) from the Nepal Living Standards Survey with more than 60,000 households (small areas). We estimate the proportion of Nepalese in excellent health condition for each household. For these data IMNA gives estimates of the household proportions as precise as those from the logistic regression model and it is more than fifty times faster (20 seconds versus 1,066 seconds), and clearly this gain is transferable to bigger data problems.
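The empirical logistic transform that assists the IMNA procedure has a standard form for binomial counts. A minimal sketch, assuming the usual 0.5 continuity correction and the standard delta-method variance approximation (the abstract does not spell out which variant the thesis uses):

```python
import math

def empirical_logit(y, n):
    """Empirical logistic transform of y successes out of n trials,
    with the 0.5 continuity correction, and its approximate variance."""
    z = math.log((y + 0.5) / (n - y + 0.5))
    v = 1.0 / (y + 0.5) + 1.0 / (n - y + 0.5)
    return z, v

# E.g. 7 of 10 household members in excellent health.
z, v = empirical_logit(7, 10)
```

The transform turns each small area's binomial count into an approximately normal quantity, which is what makes a multivariate normal approximation to the logistic model tractable.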
18

Algoritmos para o encaixe de moldes com formato irregular em tecidos listrados / Algorithms for packing irregularly shaped patterns on striped fabrics

Alves, Andressa Schneider January 2016 (has links)
This thesis proposes a solution to the problem of packing patterns on striped fabric in the clothing industry. The patterns are irregularly shaped pieces that must be placed on the raw material, in this case the fabric, which is then cut. In the specific problem of packing on striped fabric, the positions of the patterns must ensure that, after the garment is sewn, the stripes are continuous. The theoretical foundation therefore covers topics in fashion and clothing design, such as the types and rapports of striped fabric and the allowable rotations and placements of patterns on it, as well as topics in combinatorial optimization: the characteristics of two-dimensional packing and cutting problems and the algorithms several authors have used to solve them. The Markov chain Monte Carlo method and the Metropolis-Hastings algorithm are described at the end of the theoretical foundation. Based on this research, two algorithms for the packing problem on striped fabric are proposed: an algorithm with a pre-processing step, and a best-packing search algorithm based on Metropolis-Hastings. Both are implemented in the Striped Riscare software, a continuation of the Riscare software for plain fabrics developed in Alves (2010). Their performance is tested on six benchmark problems from the literature and on a new problem proposed here, called the "male shirt". The benchmark problems were originally proposed for plain raw material; the male shirt problem is specific to striped fabrics. Of the two algorithms, the best-packing search achieved better fabric-usage efficiency on all problems tested. Compared with the best results published in the literature for plain raw material, it produced packings with lower efficiencies, but its results still exceed what the fashion-design literature recommends for patterned fabrics.
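The best-packing search via Metropolis-Hastings can be caricatured as a walk over placement orders, accepting swaps by a Boltzmann-type rule on fabric-usage efficiency. The efficiency function below is a stand-in (piece widths packed greedily into strip rows), not the thesis's striped-fabric evaluator, and all numbers are hypothetical:

```python
import math
import random

def efficiency(order, widths, strip=10.0):
    """Stand-in objective: total piece width divided by the strip area used
    when pieces are placed greedily in the given order (higher is better)."""
    rows, row = 1, 0.0
    for w in (widths[i] for i in order):
        if row + w > strip:
            rows, row = rows + 1, 0.0
        row += w
    return sum(widths) / (rows * strip)

def mh_packing(widths, n_iter=2000, temp=0.02, seed=3):
    """Metropolis-Hastings over orderings: propose a swap of two pieces,
    accept with probability min(1, exp((eff_new - eff_old) / temp))."""
    rng = random.Random(seed)
    order = list(range(len(widths)))
    eff = efficiency(order, widths)
    best, best_eff = list(order), eff
    for _ in range(n_iter):
        i, j = rng.randrange(len(order)), rng.randrange(len(order))
        prop = list(order)
        prop[i], prop[j] = prop[j], prop[i]
        e = efficiency(prop, widths)
        if e > eff or rng.random() < math.exp((e - eff) / temp):
            order, eff = prop, e
            if eff > best_eff:
                best, best_eff = list(order), eff
    return best, best_eff

widths = [8.0, 6.0, 4.0, 5.0, 5.0, 7.0, 3.0, 2.0]
best_order, best_eff = mh_packing(widths)
```

Tracking the best layout visited, rather than the chain's final state, is what turns the sampler into a search procedure.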
19

Méthodes quasi-Monte Carlo et Monte Carlo : application aux calculs des estimateurs Lasso et Lasso bayésien / Monte Carlo and quasi-Monte Carlo methods: application to the computation of the Lasso and Bayesian Lasso estimators

Ounaissi, Daoud 02 June 2016 (has links)
The thesis contains six chapters. Chapter 1 introduces linear regression and the Lasso and Bayesian Lasso problems. Chapter 2 recalls convex optimization algorithms and presents the FISTA algorithm for computing the Lasso estimator; the convergence properties of this algorithm are also given, using entropy and the Pitman-Yor estimator. Chapter 3 is devoted to comparing quasi-Monte Carlo and Monte Carlo methods in the numerical computation of the Bayesian Lasso; the comparison shows that Hammersley points give the best results. Chapter 4 gives a geometric interpretation of the partition function of the Bayesian Lasso, expressed in terms of the incomplete Gamma function, which yields a convergence criterion for the Metropolis-Hastings algorithm. Chapter 5 presents the Bayesian estimator as the limiting law of a multivariate stochastic differential equation, allowing the Bayesian Lasso to be computed using semi-implicit and explicit Euler schemes together with Monte Carlo, multilevel Monte Carlo (MLMC), and the Metropolis-Hastings algorithm; a comparison of computational costs shows that the pair (semi-implicit Euler scheme, MLMC) beats the other (scheme, method) pairs. Finally, in Chapter 6 we establish the rate of convergence of the Bayesian Lasso to the Lasso when the signal-to-noise ratio is constant and the noise tends to 0, which provides new criteria for the convergence of the Metropolis-Hastings algorithm.
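The FISTA iteration recalled in Chapter 2 combines a gradient step on the least-squares term, coordinatewise soft-thresholding, and a momentum extrapolation. A minimal dense-matrix sketch on toy data (the matrix, observations, and regularization weight are illustrative; step must be at most 1/L for L the largest eigenvalue of AᵀA):

```python
def soft_threshold(v, t):
    """Proximal operator of t*||.||_1, applied coordinatewise."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def fista_lasso(A, y, lam, step, n_iter=500):
    """FISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    n = len(A[0])
    x = [0.0] * n
    z, t = list(x), 1.0
    At = [list(col) for col in zip(*A)]
    for _ in range(n_iter):
        r = [u - v for u, v in zip(matvec(A, z), y)]       # residual A z - y
        grad = matvec(At, r)                                # gradient A^T (A z - y)
        x_new = soft_threshold([zi - step * g for zi, g in zip(z, grad)],
                               step * lam)
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0    # momentum schedule
        z = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Toy problem: y is generated exactly by the sparse vector (3, 0, -2).
A = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.3], [0.0, 0.2, 1.0], [0.3, 0.0, 0.1]]
y = [3.0, -0.3, -2.0, 0.7]
x_hat = fista_lasso(A, y, lam=0.05, step=0.5)
```

With a small lam, the recovered coefficients sit near the sparse generating vector, with the familiar slight shrinkage toward zero.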
20

Inférences sur l'histoire des populations à partir de leur diversité génétique : étude de séquences démographiques de type fondation-explosion / Inferring population history from genetic diversity: a study of foundation-then-expansion demographic sequences

Calmet, Claire 16 December 2002 (has links) (PDF)
Studying demography in a historical perspective contributes to the understanding of evolutionary processes. Genetic diversity data are potentially informative about the demographic past of populations: this past is recorded, with some loss of information, by molecular markers through their genealogical and mutational history. Genetic diversity data are increasingly quick and easy to acquire, for potentially any organism of interest; hence the effort over the last decade to develop statistical tools for extracting demographic information from genotyping data.

This thesis proposes an extension of the Bayesian inference method developed in 1999 by M. Beaumont. Like the original method, (i) it is based on the Kingman coalescent with population-size changes, (ii) it uses the Metropolis-Hastings algorithm to sample from the posterior distribution of the parameters of interest, and (iii) it handles data from one or several independent microsatellite markers. The extended version generalizes the demographic and mutational models assumed in the original method: it infers the parameters of a foundation-then-expansion model for the sampled population and of a two-phase mutation model for the typed microsatellite markers. This is the first time an exact probabilistic method incorporates a microsatellite mutation model that allows multi-step jumps.

The demographic and mutational model is explored. Analysis of simulated data sets illustrates and compares the posterior distributions of the parameters under various historical scenarios, for example demographic stability, exponential growth, and foundation followed by expansion. A typology of posterior distributions is proposed. Recommendations on typing effort in empirical studies are given: a single microsatellite marker can already lead to a highly structured posterior, but its regions of high posterior density represent scenarios of different types, whereas 50 haploid genomes typed at 5 microsatellite markers suffice to detect with near certainty (99% of the posterior probability) a clear-cut foundation-then-expansion history. The consequences of violating the assumptions of the demographic model are discussed, as are the interactions between the mutational process and the mutational model. In particular, it is shown that assuming a mutation process of the SMM (stepwise) type when the true process is of the TPM (two-phase) type can generate a false signal of genetic disequilibrium; modeling mutational jumps removes this false signal.

The method is briefly applied to two foundation-then-expansion histories: the introduction of the cat Felis catus on the Kerguelen Islands and of the brown rat Rattus norvegicus on islands off the coast of Brittany. It is first shown that the frequentist method of Cornuet and Luikart (1996) fails to detect the recent, drastic foundations these populations underwent, presumably because the foundation and the expansion have opposite effects on the statistics that method uses. The Bayesian method likewise fails to detect the foundation when a step-change demographic history is imposed, for the same reason. The foundation and the expansion become detectable once the demographic model allows for them. However, dependencies among the model parameters prevent precise marginal inference: any prior information on one parameter strongly constrains the values of the others. This confirms the potential of populations with documented histories for indirect estimation of the parameters of a marker mutation model.
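The two-phase mutation model (TPM) that this extension adds on top of the stepwise model (SMM) can be sketched as a single mutation draw: with some probability the allele moves by one repeat unit, otherwise it jumps by a geometrically distributed number of repeats, in a symmetric direction. The probabilities below are illustrative assumptions, not the thesis's estimates:

```python
import random

def tpm_mutation(allele, p_single=0.9, q_geom=0.5, rng=random):
    """One two-phase-model mutation on a microsatellite repeat count:
    single-step (SMM-like) with prob. p_single, else a geometric jump."""
    direction = rng.choice((-1, 1))
    if rng.random() < p_single:
        size = 1                          # single repeat-unit step
    else:
        size = 1
        while rng.random() > q_geom:      # geometric(q_geom) jump length
            size += 1
    return allele + direction * size

rng = random.Random(4)
alleles = [tpm_mutation(20, rng=rng) for _ in range(1000)]
```

Setting p_single to 1 recovers the pure SMM, which is how this parameterization nests the two models the abstract compares.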
