Global ETD Search

1	Estimation and Testing of Higher-Order Spatial Autoregressive Panel Data Error Component Models Badinger, Harald, Egger, Peter 10 1900 (has links) (PDF) This paper develops an estimator for higher-order spatial autoregressive panel data error component models with spatial autoregressive disturbances, SARAR(R,S). We derive the moment conditions and optimal weighting matrix without distributional assumptions for a generalized moments (GM) estimation procedure of the spatial autoregressive parameters of the disturbance process and define a generalized two-stage least squares estimator for the regression parameters of the model. We prove consistency of the proposed estimators, derive their joint asymptotic distribution, and provide Monte Carlo evidence on their small sample performance. JEL C13, C21, C23
2	Fixed Effects and Random Effects Estimation of Higher-Order Spatial Autoregressive Models with Spatial Autoregressive and Heteroskedastic Disturbances Badinger, Harald, Egger, Peter 04 1900 (has links) (PDF) This paper develops a unified framework for fixed and random effects estimation of higher-order spatial autoregressive panel data models with spatial autoregressive disturbances and heteroskedasticity of unknown form in the idiosyncratic error component. We derive the moment conditions and optimal weighting matrix without distributional assumptions for a generalized moments (GM) estimation procedure of the spatial autoregressive parameters of the disturbance process and define both a random effects and a fixed effects spatial generalized two-stage least squares estimator for the regression parameters of the model. We prove consistency of the proposed estimators and derive their joint asymptotic distribution, which is robust to heteroskedasticity of unknown form in the idiosyncratic error component. Finally, we derive a robust Hausman-test of the spatial random against the spatial fixed effects model. (authors' abstract) / Series: Department of Economics Working Paper Series JEL C13, C21, C23
3	Estimation par tests / Estimation via testing Sart, Mathieu 25 November 2013 (has links) This thesis deals with the estimation of functions by means of tests in three statistical settings. We begin by studying the problem of estimating the intensities of Poisson processes with covariates. We prove a general model selection theorem from which we derive non-asymptotic risk bounds under various assumptions on the target function. We then propose two procedures to estimate the transition density of a homogeneous Markov chain. The first one selects an estimator among a collection of piecewise constant estimators. The selected estimator is shown to satisfy an oracle-type inequality under minimal assumptions on the Markov chain, which allows us to deduce uniform rates of convergence over balls of inhomogeneous Besov spaces. Moreover, the estimator is adaptive with respect to the smoothness of the transition density. We also evaluate the performance of the estimator in practice by carrying out numerical simulations. The second procedure is difficult to implement in practice and is mainly of theoretical interest, but it yields a general model selection theorem from which we derive rates of convergence under more general assumptions on the transition density. Finally, we propose a new parametric estimator of a density. We upper-bound its risk under assumptions for which the maximum likelihood method may not work. The simulations show that the new estimator and the maximum likelihood estimator are very close when the model is true and regular enough. However, unlike the maximum likelihood estimator, the new estimator is robust. Parametric estimation, Markov chain, Model selection, Non-asymptotic statistics, Poisson processes, Robustness, Estimator selection, T-estimator
4	Modèles de mélange et de Markov caché non-paramétriques : propriétés asymptotiques de la loi a posteriori et efficacité / Non Parametric Mixture Models and Hidden Markov Models : Asymptotic Behaviour of the Posterior Distribution and Efficiency Vernet, Elodie, Edith 15 November 2016 (has links) Latent models are widely used in diverse fields such as speech recognition, genomics, and econometrics. Because parametric modeling of emission distributions, that is, the distributions of an observation given the latent state, may lead to poor results in practice, in particular for clustering purposes, interest in nonparametric latent models has recently grown in applications. Yet little thought has been given to the theory of these models. In this thesis, I study the asymptotic behaviour of estimators (in the frequentist case) and of the posterior distribution (in the Bayesian case) in two particular nonparametric latent models: hidden Markov models and mixture models. I first study the concentration of the posterior distribution in nonparametric hidden Markov models. More precisely, I consider posterior consistency and then posterior concentration rates. Finally, I address efficient estimation of the mixture parameter in semiparametric mixture models. Asymptotic statistics, Hidden Markov model, Mixture model, Nonparametric statistics, Bayesian statistics
5	GENERAL-PURPOSE STATISTICAL INFERENCE WITH DIFFERENTIAL PRIVACY GUARANTEES Zhanyu Wang (13893375) 06 December 2023 (has links) Differential privacy (DP) uses a probabilistic framework to measure the level of privacy protection of a mechanism that releases data analysis results to the public. Although DP is widely used by both government and industry, there is still a lack of research on statistical inference under DP guarantees. On the one hand, existing DP mechanisms mainly aim to extract dataset-level information instead of population-level information. On the other hand, DP mechanisms introduce calibrated noise into the released statistics, which often results in sampling distributions that are more complex and less tractable than the non-private ones. This dissertation aims to provide general-purpose methods for statistical inference, such as confidence intervals (CIs) and hypothesis tests (HTs), that satisfy the DP guarantees. In the first part of the dissertation, we examine a DP bootstrap procedure that releases multiple private bootstrap estimates to construct DP CIs. We present new DP guarantees for this procedure and propose to use deconvolution with DP bootstrap estimates to derive CIs for inference tasks such as population mean, logistic regression, and quantile regression. Our method achieves the nominal coverage level in both simulations and real-world experiments and offers the first approach to private inference for quantile regression. In the second part of the dissertation, we propose to use the simulation-based "repro sample" approach to produce CIs and HTs based on DP statistics. Our methodology has finite-sample guarantees and can be applied to a wide variety of private inference problems. It appropriately accounts for biases introduced by DP mechanisms (such as by clamping) and improves over other state-of-the-art inference methods in terms of the coverage and type I error of the private inference.
In the third part of the dissertation, we design a debiased parametric bootstrap framework for DP statistical inference. We propose the adaptive indirect estimator, a novel simulation-based estimator that is consistent and corrects the clamping bias in the DP mechanisms. We also prove that our estimator has the optimal asymptotic variance among all well-behaved consistent estimators, and that the parametric bootstrap results based on our estimator are consistent. Simulation studies show that our framework produces valid DP CIs and HTs in finite-sample settings, and that it is more efficient than other state-of-the-art methods. Data and information privacy, Applied statistics, Statistical theory, differential privacy, confidence intervals, hypothesis tests, simulation-based inference, asymptotic statistics, Gaussian differential privacy, resampling, distribution-free inference, indirect inference
6	Optimization tools for non-asymptotic statistics in exponential families Le Priol, Rémi 04 1900 (has links) Exponential families are a ubiquitous class of models in statistics. On the one hand, they can model any data type. In fact, most common distributions are exponential families: Gaussian, categorical, Poisson, Gamma, Wishart, and Dirichlet. On the other hand, they sit at the core of generalized linear models (GLMs), a foundational class of models in machine learning. They are also supported by beautiful mathematics thanks to their connection with convex duality and the Laplace transform. This beauty frequently motivated the author of this thesis. In this manuscript, we make three contributions at the intersection of optimization and statistics, all revolving around exponential families. The first contribution adapts and improves a variance-reduced optimization algorithm called stochastic dual coordinate ascent (SDCA) to train a particular class of GLMs called conditional random fields (CRFs). CRFs are one of the cornerstones of structured prediction. They were notoriously hard to train until the advent of variance reduction techniques, and our improved version of SDCA performs favorably compared to the previous state of the art. The second contribution focuses on causal discovery. Exponential families are widely used in graphical models, and in particular in causal graphical models. This contribution investigates a specific conjecture that gained some traction in previous work: causal models adapt faster to perturbations of the environment. Using results from optimization, we find strong support for this conjecture when the perturbation comes from an intervention on a cause, and evidence against it when the perturbation comes from an intervention on an effect. These findings call for a refinement of the conjecture. The third contribution addresses a fundamental property of exponential families. One of their most appealing properties is the closed form of the maximum likelihood estimate (MLE) and of the maximum a posteriori (MAP) estimate for a natural choice of conjugate prior. These two estimators are used almost everywhere, often unknowingly: how often are a mean and a variance computed for bell-shaped data without thinking about the Gaussian model that underlies them? Nevertheless, the literature to date lacks results on the finite-sample convergence of the information (Kullback-Leibler) divergence between these estimators and the true distribution. Drawing a parallel with optimization, we take some steps towards such a result, and we highlight directions for progress in both statistics and optimization. All three contributions put tools from optimization at the service of statistics in exponential families: improving an algorithm for training GLMs, characterizing the adaptation speed of causal models, and estimating the learning speed of ubiquitous models. By tying together optimization and statistics, this thesis takes a step towards a better understanding of the fundamentals of machine learning. Machine learning, exponential families, Bregman divergence, non-asymptotic statistics, convergence rate, sample complexity, convex duality, stochastic optimization, variance reduction, structured prediction, causality
7	Extremes of log-correlated random fields and the Riemann zeta function, and some asymptotic results for various estimators in statistics Ouimet, Frédéric 05 1900 (has links) No description available. extreme value theory, Gaussian free field, branching random walk, inhomogeneous environment, Riemann zeta function, Gibbs measure, Ghirlanda-Guerra identities, ultrametricity, large deviations, asymptotic statistics, complete monotonicity, multinomial probabilities, Bernstein estimators, uniform law of large numbers, Laplace distribution, goodness-of-fit tests, probability, statistics, log-correlated fields, mathematics, Gaussian fields
