Global ETD Search

11	Properties of tests for mis-specification in non-stationary autoregressions Sohkanen, Jouni S. January 2012 We are interested in the stochastic properties, individual and joint, of mis-specification testing when the data are generated by an autoregressive process. Good mis-specification tests are invariant to the dynamic properties of the process, summarized by its characteristic roots, and to irrelevant mis-specifications. Invariance in parameter space obviates inference prior to mis-specification testing. This is important because the latter is used to validate the former. Mutual independence of the tests allows calibration of the overall significance level. Establishing such results requires work on individual tests and on their stochastic interactions. In Chapter 2, we derive the asymptotic distribution of two types of CUSUM of squares test, one implemented with standardized one-step-ahead OLS prediction errors and the other with OLS residuals. The latter is found to be valid in all but singular explosive cases, but the former only in purely non-explosive cases or in regular explosive cases with all roots in the explosive region of the parameter space; in Chapter 3, we show that a nuisance term arises in the mixed case. In Chapter 4, we numerically derive a finite-sample correction that makes the tests implementable in software, and Chapter 1 contains two examples of applications. In Chapter 5, we consider inference on the parameters associated with the stationary part of the process, together with tests for a unit root, lag length, variance constancy, and normality of the regression innovations. In characterizing the joint distribution of these tests, we rely on asymptotic theory and show independence in the limit. A simulation experiment suggests that finite-sample correlations between some of the tests are statistically significant but small.
Asymptotically, then, control of the overall significance level of the test procedure is feasible, and there is no reason to discount inference for the use of these mis-specification tests in model selection. 519.536
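A minimal sketch of a CUSUM of squares statistic computed from OLS residuals, as discussed in record 11. It assumes the simplest setting, an AR(1) model without intercept fitted by OLS. It uses the classical maximum-deviation form max_k |S_k - k/T|, where S_k is the share of the residual sum of squares accumulated up to observation k. The function names are hypothetical. Neither the thesis's standardization nor its finite-sample correction is reproduced.

```python
import random


def ols_ar1(y):
    """Fit y_t = rho * y_{t-1} + e_t by OLS (no intercept); return (rho_hat, residuals)."""
    lagged, current = y[:-1], y[1:]
    sxx = sum(v * v for v in lagged)
    sxy = sum(a * b for a, b in zip(lagged, current))
    rho = sxy / sxx
    residuals = [b - rho * a for a, b in zip(lagged, current)]
    return rho, residuals


def cusum_of_squares(residuals):
    """Largest gap between the cumulative share of squared residuals and the line k/T."""
    total = sum(e * e for e in residuals)
    if total == 0:
        raise ValueError("residuals are identically zero")
    n = len(residuals)
    running, stat = 0.0, 0.0
    for k, e in enumerate(residuals, start=1):
        running += e * e
        stat = max(stat, abs(running / total - k / n))
    return stat


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated random walk (unit root, rho = 1) with constant innovation variance.
    y = [0.0]
    for _ in range(500):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    rho_hat, resid = ols_ar1(y)
    print(f"rho_hat = {rho_hat:.3f}, CUSUM of squares = {cusum_of_squares(resid):.3f}")
```

For Gaussian errors in the stable case, sqrt(T/2) times this statistic is commonly compared with the supremum of the absolute value of a Brownian bridge. The thesis studies when a limit of this kind remains valid once non-stationary and explosive roots are allowed.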
12	Μοντέλα παλινδρόμησης του Mincer καθώς και επεκτάσεις αυτών, για την εκτίμηση του εισοδήματος από απασχόληση στην Ελλάδα / Mincer regression models and their extensions for estimating employment income in Greece Καϊμάκη, Αθανασία 01 September 2010 519.536 Regression analysis
13	Model-based covariable decorrelation in linear regression (CorReg) : application to missing data and to steel industry / Décorrélation de covariables à base de modèles en régression linéaire (CorReg) : application aux données manquantes et à l’industrie sidérurgique Théry, Clément 08 July 2015 Les travaux effectués durant cette thèse ont pour but de pallier le problème des corrélations au sein des bases de données, particulièrement fréquentes dans le cadre industriel. Une modélisation explicite des corrélations par un système de sous-régressions entre covariables permet de pointer les sources des corrélations et d'isoler certaines variables redondantes. Il en découle une pré-sélection de variables sans perte significative d'information et avec un fort potentiel explicatif (la structure de sous-régression est explicite et simple). Un algorithme MCMC (Monte-Carlo Markov Chain) de recherche de structure de sous-régressions est proposé, basé sur un modèle génératif complet sur les données. Ce prétraitement ne dépend pas de la variable réponse et peut donc être utilisé de manière générale pour toute problématique de corrélations. Par la suite, un estimateur plug-in pour la régression linéaire est proposé pour ré-injecter l'information résiduelle de manière séquentielle sans souffrir des corrélations entre covariables. Enfin, le modèle génératif complet peut être utilisé pour gérer des valeurs manquantes dans les données. Cela permet l'imputation multiple des données manquantes, préalable à l'utilisation de méthodes classiques incompatibles avec la présence de valeurs manquantes. Le package R intitulé CorReg implémente les méthodes développées durant cette thèse. / This thesis was motivated by correlation issues in real datasets, in particular industrial datasets. The main idea lies in explicitly modeling the correlations between covariates by a structure of sub-regressions, which is simply a system of linear regressions between the covariates.
This structure points out redundant covariates that can be deleted in a pre-selection step to improve matrix conditioning, without significant loss of information and with strong explanatory potential, because the pre-selection is explained by the structure of sub-regressions, which is itself easy to interpret. An algorithm for finding the sub-regression structure inherent in the dataset is provided; it is based on a full generative model and uses a Markov chain Monte Carlo (MCMC) method. This pre-treatment does not depend on a response variable and can therefore be used more generally with any correlated dataset. In a second part, a plug-in estimator is defined to reintroduce the redundant covariates sequentially. All covariates are then used, but the sequential approach protects against correlations. Finally, as a perspective, the generative model defined here can handle missing values, both during the MCMC and for subsequent imputation. This makes it possible to use classical methods that cannot handle missing values. Linear regression is again used to illustrate the benefits of the method, but it remains a pre-treatment that can be used in other contexts, such as clustering. The R package CorReg implements the methods developed during this thesis. Generative models Redundant variables 519.536
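The sub-regression idea in record 13 can be illustrated without the MCMC structure search. The sketch fits one covariate on others by OLS and reads off the fit. It is an illustrative stand-in and not the CorReg algorithm. The function names and the toy data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sub_regression(target, predictors):
    """OLS fit (with intercept) of one covariate on a list of other covariates; returns (coefs, R^2)."""
    rows = [[1.0] + list(obs) for obs in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean = sum(target) / len(target)
    ss_tot = sum((t - mean) ** 2 for t in target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return coefs, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(100)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(100)]
    # Hypothetical redundant covariate: almost a linear combination of x1 and x2.
    x3 = [a - 0.5 * b + rng.gauss(0.0, 0.05) for a, b in zip(x1, x2)]
    coefs, r2 = sub_regression(x3, [x1, x2])
    print([round(c, 2) for c in coefs], round(r2, 4))
```

A covariate whose sub-regression R^2 is close to 1 is a natural candidate for removal in a pre-selection step. CorReg itself searches over whole sub-regression structures with MCMC rather than testing one candidate at a time.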
14	Étude du compromis précision statistique-temps de calcul / Study of the trade-off between statistic accuracy and computation time Brunin, Maxime 16 January 2018 Dans le contexte actuel, il est nécessaire de concevoir des algorithmes capables de traiter des données volumineuses en un minimum de temps de calcul. Par exemple, la programmation dynamique appliquée au problème de détection de ruptures ne permet pas de traiter rapidement des données ayant une taille d'échantillon supérieure à $10^{6}$. Les algorithmes itératifs fournissent une famille ordonnée d'estimateurs indexée par le nombre d'itérations. Dans cette thèse, nous avons étudié statistiquement cette famille d'estimateurs afin de sélectionner un estimateur ayant de bonnes performances statistiques et peu coûteux en temps de calcul. Pour cela, nous avons suivi l'approche utilisant les règles d'arrêt pour proposer un tel estimateur dans le cadre du problème de détection de ruptures dans la distribution et du problème de régression linéaire. Il est d'usage de faire un grand nombre d'itérations pour calculer un estimateur usuel. Une règle d'arrêt est l'itération à laquelle nous stoppons l'algorithme afin de limiter le phénomène de surapprentissage dont souffrent ces estimateurs usuels. En stoppant l'algorithme plus tôt, les règles d'arrêt permettent aussi d'économiser du temps de calcul. Lorsque le budget de temps est limité, il se peut que nous n'ayons pas le temps d'itérer jusqu'à la règle d'arrêt. Dans ce contexte, nous avons étudié le choix optimal du nombre d'itérations et de la taille d'échantillon pour atteindre une précision statistique optimale. Des simulations ont mis en évidence un compromis entre le nombre d'itérations et la taille d'échantillon pour atteindre une précision statistique optimale à budget de temps limité. / In the current context, we need algorithms that can process large volumes of data in a short computation time.
For instance, dynamic programming applied to the change-point detection problem in the distribution cannot quickly process data with a sample size greater than $10^{6}$. Iterative algorithms provide an ordered family of estimators indexed by the number of iterations. In this thesis, we have studied this family of estimators statistically in order to select one with good statistical performance and a low computation cost. To this end, we have followed the stopping-rule approach to propose such an estimator for the problem of change-point detection in the distribution and for the linear regression problem. Computing a standard estimator usually requires a large number of iterations. A stopping rule is the iteration at which we stop the algorithm in order to limit the overfitting from which these standard estimators suffer. By stopping the algorithm earlier, stopping rules also save computation time. Under a time constraint, there may not be enough time to iterate until the stopping rule. In this context, we have studied the optimal choice of the number of iterations and of the sample size to reach optimal accuracy. Simulations highlight the trade-off between the number of iterations and the sample size in order to reach optimal accuracy under a time constraint. Stopping rule Estimator under time constraint 519.536
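Record 14 treats the iterates of an algorithm as an ordered family of estimators indexed by the iteration count. The sketch below shows this with gradient descent on a one-feature, no-intercept least-squares problem. Its stopping rule halts as soon as the error on a held-out sample stops decreasing. That rule is an assumption chosen for simplicity and is not necessarily one of the rules studied in the thesis. All names are hypothetical.

```python
import random


def mse(w, xs, ys):
    """Mean squared error of the no-intercept model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def early_stopped_gd(train, valid, step=0.1, max_iter=1000):
    """Gradient descent on the training MSE, stopped when the validation MSE stops improving.

    Returns (number of iterations performed, estimate at that iteration).
    """
    xs, ys = train
    n = len(xs)
    w = 0.0
    best = mse(w, *valid)
    for t in range(1, max_iter + 1):
        grad = -2.0 / n * sum(x * (y - w * x) for x, y in zip(xs, ys))
        w_next = w - step * grad
        err = mse(w_next, *valid)
        if err >= best:
            return t - 1, w  # keep the previous iterate as the stopped estimator
        w, best = w_next, err
    return max_iter, w


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
    ys = [1.5 * x + rng.gauss(0.0, 0.5) for x in xs]
    train, valid = (xs[:150], ys[:150]), (xs[150:], ys[150:])
    stop, w_hat = early_stopped_gd(train, valid, step=0.5)
    print(stop, round(w_hat, 3))
```

Each iteration here costs one pass over the training sample. Under a fixed time budget, more iterations therefore leave room for a smaller sample, which is the kind of trade-off the thesis studies.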
15	Contribution à la classification de variables dans les modèles de régression en grande dimension / Contribution to variable clustering in high dimensional linear regression models Yengo, Loïc 28 May 2014 Cette thèse propose une contribution originale au domaine de la classification de variables en régression linéaire. Cette contribution se base sur une modélisation hiérarchique des coefficients de régression. Cette modélisation permet de considérer ces derniers comme des variables aléatoires distribuées selon un mélange de lois Gaussiennes ayant des centres différents mais des variances égales. Nous montrons dans cette thèse que l'algorithme EM, communément utilisé pour estimer les paramètres d'un modèle hiérarchique, ne peut s'appliquer. En effet, l'étape E de l'algorithme n'est pas explicite pour notre modèle. Nous avons donc proposé une approche plus efficace pour l'estimation des paramètres grâce à l'utilisation de l'algorithme SEM-Gibbs. En plus de cette amélioration computationnelle, nous avons introduit une contrainte dans le modèle pour permettre d'effectuer une sélection de variables simultanément. Notre modèle présente de très bonnes qualités prédictives relativement aux approches classiques pour la réduction de la dimension en régression linéaire. Cette thèse présente aussi une extension de notre méthodologie dans le cadre de la régression Probit pour données binaires. Notre modèle a de plus été généralisé en relâchant l'hypothèse de l'égalité des variances pour les composantes du mélange Gaussien. Les performances de ce modèle généralisé ont été comparées à celles du modèle initial à travers différents scénarios de simulations. Ce travail de recherche a conduit au développement du package R clere. Ce dernier package met en œuvre tous les algorithmes décrits dans cette thèse. / In this thesis, we propose an original contribution to the field of variable clustering in linear regression through a model-based approach.
This contribution relies on a hierarchical model of the regression coefficients as random variables drawn from a mixture of Gaussian distributions with different means but equal variances. Parameter estimation in the proposed model was shown to be challenging because the classical EM algorithm cannot be applied: its E-step is not explicit for this model. We therefore developed a more efficient algorithm for parameter estimation based on the SEM-Gibbs algorithm. Along with this computational improvement, we also extended our model to allow simultaneous variable selection. Given the good predictive performance of the CLERE method compared with standard dimension-reduction techniques, we considered an extension of the method to binary response data, studied in the context of Probit regression. We also generalized our model by relaxing the assumption of equal variances for the components of the Gaussian mixture. The performance of this generalization was compared with that of the initial model under different scenarios on simulated data. This research led to the development of the R package clere, which implements most of the algorithms described in this thesis. Gaussian mixture models Binary regression 519.536
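The hierarchical model behind CLERE in record 15 can be illustrated by simulation. Each coefficient is drawn from a mixture of Gaussians with different centres and one shared variance. With zero within-group spread, the linear predictor collapses to one term per group, which is the dimension reduction the method exploits. This sketch only simulates from the model and does not implement the SEM-Gibbs estimation. The names are hypothetical.

```python
import random


def sample_coefficients(p, centers, weights, gamma, rng):
    """Draw p coefficients: choose a group by its weight, then add N(0, gamma^2) noise to its center."""
    labels = rng.choices(range(len(centers)), weights=weights, k=p)
    betas = [rng.gauss(centers[z], gamma) for z in labels]
    return labels, betas


def linear_predictor(x_row, betas):
    """Ordinary predictor sum_j x_j * beta_j."""
    return sum(x * b for x, b in zip(x_row, betas))


def grouped_predictor(x_row, labels, centers):
    """Predictor with one coefficient per group: sum_k b_k * (sum of the x_j assigned to group k)."""
    group_sums = [0.0] * len(centers)
    for x, z in zip(x_row, labels):
        group_sums[z] += x
    return sum(b * s for b, s in zip(centers, group_sums))


if __name__ == "__main__":
    rng = random.Random(1)
    centers, weights = [0.0, 1.0, -2.0], [0.6, 0.3, 0.1]
    labels, betas = sample_coefficients(50, centers, weights, gamma=0.0, rng=rng)
    x_row = [rng.gauss(0.0, 1.0) for _ in range(50)]
    # With gamma = 0 both predictors agree: 50 coefficients reduce to 3 group values.
    print(linear_predictor(x_row, betas), grouped_predictor(x_row, labels, centers))
```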
16	Contribution à l’économétrie spatiale et l’analyse de données fonctionnelles / Contribution to spatial econometric and functional data analysis Gharbi, Zied 24 June 2019 Ce mémoire de thèse touche deux champs de recherche importants en statistique inférentielle, notamment l’économétrie spatiale et l’analyse de données fonctionnelles. Plus précisément, nous nous sommes intéressés à l’analyse de données réelles spatiales ou spatio-fonctionnelles en étendant certaines méthodes inférentielles pour prendre en compte une éventuelle dépendance spatiale. Nous avons d’abord considéré l’estimation d’un modèle autorégressif spatial (SAR) ayant une variable explicative fonctionnelle et une variable réponse réelle à l’aide d’observations sur une unité géographique donnée. Il s’agit d’un modèle de régression avec la spécificité que chaque observation de la variable dépendante collectée dans un emplacement géographique dépend d’observations de la même variable dans des emplacements voisins. Cette relation entre voisins est généralement mesurée par une matrice carrée nommée matrice de pondération spatiale et qui mesure l’effet d’interaction entre les unités spatiales voisines. Cette matrice est supposée exogène, c’est-à-dire que la métrique utilisée pour la construire ne dépend pas des mesures de variables explicatives du modèle. L’apport de cette thèse sur ce modèle réside dans le fait que la variable explicative est de nature fonctionnelle, à valeurs dans un espace de dimension infinie. Notre méthodologie d’estimation est basée sur une réduction de la dimension de la variable explicative fonctionnelle, par l’analyse en composantes principales fonctionnelles suivie d’une maximisation de la vraisemblance tronquée du modèle. Des propriétés asymptotiques des estimateurs, des illustrations des performances des estimateurs via une étude de Monte Carlo et une application à des données réelles environnementales ont été considérées.
Dans la deuxième contribution, nous reprenons le modèle SAR fonctionnel étudié dans la première partie en considérant une structure endogène de la matrice de pondération spatiale. Au lieu de se baser sur un critère géographique pour calculer les dépendances entre localisations voisines, nous calculons ces dernières via un processus endogène, c’est-à-dire qui dépend des variables à expliquer. Nous appliquons la même approche d’estimation à deux étapes décrite ci-dessus, nous étudions aussi les performances de l’estimateur proposé pour des échantillons de taille finie et discutons le cadre asymptotique. Dans la troisième partie de cette contribution, nous nous intéressons à l’hétéroscédasticité dans les modèles partiellement linéaires pour variables exogènes réelles et variable réponse binaire. Nous proposons un modèle Probit spatial contenant une partie non-paramétrique. La dépendance spatiale est introduite au niveau des erreurs (perturbations) du modèle considéré. L’estimation des parties paramétrique et non paramétrique du modèle est récursive et consiste à fixer d’abord les composants paramétriques et à estimer la partie non paramétrique à l’aide de la méthode de vraisemblance pondérée puis utiliser cette dernière estimation pour construire un profil de la vraisemblance pour estimer la partie paramétrique. La performance de la méthode proposée est étudiée via une étude Monte Carlo. La contribution finit par une étude empirique sur la relation entre la croissance économique et la qualité environnementale en Suède à l’aide d’outils de l’économétrie spatiale. / This thesis covers two important fields of research in inferential statistics, namely spatial econometrics and functional data analysis. More precisely, we have focused on the analysis of real spatial or spatio-functional data by extending certain inferential methods to take into account a possible spatial dependence.
We first considered the estimation of a spatial autoregressive (SAR) model with a functional explanatory variable and a real response variable, using observations on a given geographical unit. This is a regression model with the specificity that each observation of the dependent variable collected at a geographical location depends on observations of the same variable at neighboring locations. This relationship between neighbors is generally measured by a square matrix called the spatial weighting matrix, which measures the interaction effect between neighboring spatial units. This matrix is assumed to be exogenous, i.e. the metric used to construct it does not depend on the explanatory variables. The contribution of this thesis to this model lies in the fact that the explanatory variable is functional, taking values in an infinite-dimensional space. Our estimation methodology reduces the dimension of the functional explanatory variable through functional principal component analysis and then maximizes the truncated likelihood of the model. We study asymptotic properties of the estimators, illustrate their performance in a Monte Carlo study, and apply them to real environmental data. In the second contribution, we revisit the functional SAR model studied in the first part with an endogenous spatial weighting matrix. Instead of using a geographical criterion to calculate the dependencies between neighboring locations, we calculate them via an endogenous process, i.e. one that depends on explanatory variables. We apply the same two-step estimation approach described above, study the finite-sample performance of the proposed estimator, and discuss the asymptotic framework. In the third part of this thesis, we focus on heteroskedasticity in partially linear models with real exogenous variables and a binary response variable. We propose a spatial Probit model containing a nonparametric part.
Spatial dependence is introduced at the level of the errors (disturbances) of the model. The estimation of the parametric and nonparametric parts of the model is recursive. The parametric components are first held fixed and the nonparametric part is estimated by weighted likelihood. This estimate is then used to construct a profile likelihood for estimating the parametric part. The performance of the proposed method is investigated via a Monte Carlo study. The document concludes with an empirical study of the relationship between economic growth and environmental quality in Sweden using spatial econometric tools. Functional data Spatial weighting matrix Kernel estimator Spatial heteroskedasticity 519.536
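The spatial autoregressive structure in record 16 can be made concrete with a small sketch. It builds a row-standardized neighbor matrix W and solves the model y = rho W y + u for y by fixed-point iteration. Here u stands in for X beta + epsilon, with a scalar covariate instead of the functional one; in the thesis, the functional covariate is first reduced by functional principal component analysis. The function names and the toy neighborhood are assumptions.

```python
def row_standardize(w):
    """Divide each row of a neighbor matrix by its row sum (rows without neighbors stay zero)."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def spatial_lag(w, y):
    """Compute W y: for each location, the weighted average of its neighbors' values."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


def sar_solve(w, rho, u, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + u by fixed-point iteration.

    With W row-standardized and |rho| < 1 the map is a contraction, so the
    iteration converges to the reduced form y = (I - rho W)^{-1} u.
    """
    y = list(u)
    for _ in range(max_iter):
        y_new = [rho * lag + ui for lag, ui in zip(spatial_lag(w, y), u)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Hypothetical toy map: five locations on a line, neighbors are adjacent locations.
    adjacency = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
    w = row_standardize(adjacency)
    beta = 2.0
    x = [0.5, 1.0, 1.5, 2.0, 2.5]  # scalar stand-in for the reduced functional covariate
    u = [beta * xi for xi in x]     # noise-free X beta, for readability
    y = sar_solve(w, rho=0.4, u=u)
    print([round(v, 3) for v in y])
```

Iteration is used instead of a matrix inverse to keep the sketch dependency-free. For realistic sample sizes, a sparse linear solver would be the usual choice.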
17	Méthodes quasi-Monte Carlo et Monte Carlo : application aux calculs des estimateurs Lasso et Lasso bayésien / Monte Carlo and quasi-Monte Carlo methods : application to calculations the Lasso estimator and the Bayesian Lasso estimator Ounaissi, Daoud 02 June 2016 La thèse contient 6 chapitres. Le premier chapitre contient une introduction à la régression linéaire et aux problèmes Lasso et Lasso bayésien. Le chapitre 2 rappelle les algorithmes d’optimisation convexe et présente l’algorithme FISTA pour calculer l’estimateur Lasso. La statistique de la convergence de cet algorithme est aussi donnée dans ce chapitre en utilisant l’entropie et l’estimateur de Pitman-Yor. Le chapitre 3 est consacré à la comparaison des méthodes quasi-Monte Carlo et Monte Carlo dans les calculs numériques du Lasso bayésien. Il sort de cette comparaison que les points de Hammersley donnent les meilleurs résultats. Le chapitre 4 donne une interprétation géométrique de la fonction de partition du Lasso bayésien et l’exprime en fonction de la fonction Gamma incomplète. Ceci nous a permis de donner un critère de convergence pour l’algorithme de Metropolis Hastings. Le chapitre 5 présente l’estimateur bayésien comme la loi limite d’une équation différentielle stochastique multivariée. Ceci nous a permis de calculer le Lasso bayésien en utilisant les schémas numériques semi implicite et explicite d’Euler et les méthodes de Monte Carlo, Monte Carlo à plusieurs couches (MLMC) et l’algorithme de Metropolis Hastings. La comparaison des coûts de calcul montre que le couple (schéma semi-implicite d’Euler, MLMC) gagne contre les autres couples (schéma, méthode). Finalement dans le chapitre 6 nous avons trouvé la vitesse de convergence du Lasso bayésien vers le Lasso lorsque le rapport signal/bruit est constant et le bruit tend vers 0. Ceci nous a permis de donner de nouveaux critères pour la convergence de l’algorithme de Metropolis Hastings. / The thesis contains 6 chapters.
The first chapter contains an introduction to linear regression and to the Lasso and Bayesian Lasso problems. Chapter 2 recalls convex optimization algorithms and presents the FISTA algorithm for computing the Lasso estimator. Convergence statistics for this algorithm are also given in this chapter, using the entropy estimator and the Pitman-Yor estimator. Chapter 3 is devoted to the comparison of Monte Carlo and quasi-Monte Carlo methods in numerical computations of the Bayesian Lasso. This comparison shows that Hammersley points give the best results. Chapter 4 gives a geometric interpretation of the partition function of the Bayesian Lasso and expresses it in terms of the incomplete Gamma function. This allowed us to give a convergence criterion for the Metropolis-Hastings algorithm. Chapter 5 presents the Bayesian estimator as the limiting distribution of a multivariate stochastic differential equation. This allowed us to compute the Bayesian Lasso using the semi-implicit and explicit Euler schemes together with Monte Carlo, multilevel Monte Carlo (MLMC) and the Metropolis-Hastings algorithm. A comparison of computational costs shows that the pair (semi-implicit Euler scheme, MLMC) outperforms the other (scheme, method) pairs. Finally, in Chapter 6 we obtain the rate of convergence of the Bayesian Lasso to the Lasso when the signal-to-noise ratio is constant and the noise tends to 0. This allowed us to provide new criteria for the convergence of the Metropolis-Hastings algorithm. Lasso and Bayesian Lasso Quasi-Monte Carlo method Metropolis-Hastings algorithm FISTA algorithm Numerical schemes 519.536
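Record 17 uses FISTA to compute the Lasso estimator. FISTA alternates a gradient step on the least-squares term with soft-thresholding, and adds a momentum step between iterates. The sketch below minimizes (1/2)||y - Xb||^2 + lam * ||b||_1 in pure Python. Its step-size constant is the squared Frobenius norm of X, a simplifying assumption. Because this upper-bounds the true Lipschitz constant of the gradient, the iteration still converges, only more slowly. The function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def fista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 with FISTA (constant step size)."""
    n, p = len(X), len(X[0])
    lip = sum(v * v for row in X for v in row)  # ||X||_F^2 >= largest eigenvalue of X^T X
    b = [0.0] * p
    z = b[:]
    t = 1.0
    for _ in range(n_iter):
        # Gradient of the smooth part at the extrapolated point z: X^T (X z - y).
        resid = [sum(xij * zj for xij, zj in zip(row, z)) - yi for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        b_next = soft_threshold([zj - gj / lip for zj, gj in zip(z, grad)], lam / lip)
        # Nesterov momentum update.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [bn + (t - 1.0) / t_next * (bn - bo) for bn, bo in zip(b_next, b)]
        b, t = b_next, t_next
    return b


if __name__ == "__main__":
    # Orthonormal design: the exact Lasso solution is soft-thresholding of X^T y, here [2, 0].
    X = [[1.0, 0.0], [0.0, 1.0]]
    print(fista_lasso(X, [3.0, 0.5], lam=1.0))
```

The orthonormal case is chosen because its exact solution is known, which makes the iteration easy to check.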
18	Intermittent demand forecasting with integer autoregressive moving average models Mohammadipour, Maryam January 2009 This PhD thesis focuses on using time series models for counts to model and forecast a special type of count series called intermittent series. An intermittent series is a series of non-negative integer values with some zero values. Such series occur in many areas, including inventory control of spare parts. Various methods have been developed for intermittent demand forecasting, with Croston’s method being the most widely used. Some studies focus on finding a model underlying Croston’s method. Since none of these studies has succeeded in demonstrating an underlying model for which Croston’s method is optimal, the focus should now shift towards stationary models for intermittent demand forecasting. This thesis explores the application of a class of models for count data called Integer Autoregressive Moving Average (INARMA) models. INARMA models have been applied in different areas such as medical science and economics, but this is the first attempt to use such a model-based method to forecast intermittent demand. In this research, we first fill some gaps in the INARMA literature by deriving the unconditional variance and the autocorrelation function of the general INARMA(p,q) model. The conditional expected value of the aggregated process over the lead time is also obtained, for use as a lead-time forecast. The accuracy of h-step-ahead and lead-time INARMA forecasts is then compared with that of the benchmark methods of Croston, the Syntetos-Boylan Approximation (SBA) and Shale-Boylan-Johnston (SBJ). The simulation results suggest that, when the data are highly autocorrelated, INARMA yields much more accurate one-step-ahead forecasts than the benchmark methods. The degree of improvement increases with longer data histories.
It is also shown that, instead of identifying the autoregressive and moving-average orders of the INARMA model, the most general of the candidate models can be used for forecasting. This is especially useful for short histories and highly autocorrelated data. The findings of the thesis have been tested on two real data sets: (i) Royal Air Force (RAF) demand history of 16,000 SKUs and (ii) 3,000 series of intermittent demand from the automotive industry. The results show that for sparse data with long histories, INARMA substantially improves on the benchmarks in terms of Mean Square Error (MSE) and Mean Absolute Scaled Error (MASE) for one-step-ahead forecasts. For series with short histories, however, the improvement is smaller. The improvement is greater for h-step-ahead forecasts. The results also confirm the superiority of INARMA over the benchmark methods for lead-time forecasts. 519.536
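Record 18 compares INARMA forecasts with Croston's method. The sketch below covers the simplest member of the INARMA family: INAR(1) with binomial thinning and Poisson innovations. It also implements Croston's method and the SBA correction factor (1 - smoothing / 2). Restricting to INAR(1) with Poisson innovations is an assumption made for brevity. The smoothing constant 0.1 and all function names are illustrative.

```python
import math
import random


def thin(alpha, x, rng):
    """Binomial thinning alpha o x: each of the x units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1(alpha, lam, n, rng):
    """X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam), started at X_0 = 0."""
    x, path = 0, []
    for _ in range(n):
        x = thin(alpha, x, rng) + poisson(lam, rng)
        path.append(x)
    return path


def inar1_forecast(alpha, lam, last, h=1):
    """Conditional mean E[X_{t+h} | X_t = last] = alpha^h * last + lam * (1 - alpha^h) / (1 - alpha)."""
    a_h = alpha ** h
    return a_h * last + lam * (1.0 - a_h) / (1.0 - alpha)


def croston(series, smoothing=0.1):
    """Croston's method: smooth nonzero demand sizes and inter-demand intervals separately."""
    size = interval = None
    gap = 1
    for d in series:
        if d > 0:
            if size is None:
                size, interval = float(d), float(gap)
            else:
                size += smoothing * (d - size)
                interval += smoothing * (gap - interval)
            gap = 1
        else:
            gap += 1
    return 0.0 if size is None else size / interval


def sba(series, smoothing=0.1):
    """Syntetos-Boylan Approximation: Croston's forecast times (1 - smoothing / 2)."""
    return (1.0 - smoothing / 2.0) * croston(series, smoothing)


if __name__ == "__main__":
    rng = random.Random(42)
    alpha, lam = 0.5, 0.3  # illustrative parameters, treated as known here
    demand = simulate_inar1(alpha, lam, 200, rng)
    print("INAR(1) one-step mean:", inar1_forecast(alpha, lam, demand[-1]))
    print("Croston:", croston(demand), "SBA:", sba(demand))
```

In practice alpha and lam would be estimated from the data rather than assumed known. Unlike Croston's method, the INAR(1) forecast uses the most recent observation, which is how it exploits autocorrelation.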
19	Γραμμικά μοντέλα παλινδρόμησης και μοντέλα συσχέτισης / Linear regression models and correlation models Αθανασοπούλου, Ανδριάνα 12 June 2015 Regression models are widely used today in business administration, economics, engineering, health, biology and the social sciences. In statistics, regression analysis is a statistical procedure for estimating the relationships among variables. It includes many techniques for modeling and analyzing these variables and usually focuses on the relationship between a dependent variable and one or more independent variables. This thesis presents the theoretical framework of regression analysis, starting from the simple model and extending the analysis to the multiple model, and finally focuses on correlation models, specifically on correlation coefficients and hypothesis tests on them. / Correlation models are widely used in the social sciences, biology and engineering. In this dissertation we present the theoretical framework of regression analysis and correlation models, and finally we present results on real problems and applications. 519.536 Regression models Correlation models Correlation coefficient
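Record 19 ends with correlation coefficients and hypothesis tests on them. The sketch below implements the standard test of zero Pearson correlation. It computes r and the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), which follows a Student t distribution with n - 2 degrees of freedom under bivariate normality and zero correlation. The function name is hypothetical, and the thesis may cover further tests.

```python
import math


def pearson_t(x, y):
    """Return (r, t) where t = r * sqrt(n - 2) / sqrt(1 - r^2) tests H0: rho = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r)  # perfect linear relation
    return r, r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    r, t = pearson_t([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
    print(r, t)  # compare |t| with Student t quantiles on n - 2 = 2 degrees of freedom
```

If SciPy is available, a two-sided p-value can be obtained as 2 * scipy.stats.t.sf(abs(t), n - 2).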
20	Θεωρητικό υπόβαθρο & εφαρμοσμένη ανάλυση παλινδρόμησης (Οικονομετρία) με χρήση SPSS 12.0 / Theoretical background and applied regression analysis (econometrics) using SPSS 12.0 Πλαγιανάκος, Κυριάκος 20 October 2010 519.536 Regression analysis Computers Programs (SPSS)
