Global ETD Search

91	High-dimensional VAR analysis of regional house prices in United States / Analýza regionálních cen nemovitostí ve Spojených státech pomocí vysokodimenzionálního VAR modelu Krčál, Adam January 2015 (has links) In this thesis, the heterogeneity of regional real estate prices in the United States is investigated. A high-dimensional VAR model with additional exogenous predictors, originally introduced by \cite{fan11}, is adopted. In this framework, the common factor in regional house price dynamics is explained by exogenous predictors, and the spatial dependencies are captured by lagged house prices in other regions. For estimation and variable selection in the high-dimensional setting, the concept of Penalized Least Squares (PLS) with different penalty functions (e.g. the LASSO penalty) is studied in detail and implemented. Moreover, clustering methods are employed to identify subsets of statistical regions with similar house price dynamics. It is demonstrated that these clusters are geographically well defined and contribute to a better interpretation of the VAR model. Next, we use the variable selection property of the LASSO to construct impulse response functions and to simulate price behavior when a shock occurs. Finally, one-period-ahead forecasts from the VAR model are compared with those from the Diffusion Index Factor Model by \cite{stock02}, a commonly used forecasting model.
92	Model independent searches for New Physics using Machine Learning at the ATLAS experiment / Recherche de Nouvelle Physique indépendante d'un modèle en utilisant l’apprentissage automatique sur l’experience ATLAS Jimenez, Fabricio 16 September 2019 (has links) Nous abordons le problème de la recherche indépendante du modèle pour la Nouvelle Physique (NP), au Grand Collisionneur de Hadrons (LHC) en utilisant le détecteur ATLAS. Une attention particulière est accordée au développement et à la mise à l'essai de nouvelles techniques d'apprentissage automatique à cette fin. Le présent ouvrage présente trois résultats principaux. Tout d'abord, nous avons mis en place un système de surveillance automatique des signatures génériques au sein de TADA, un outil logiciel d'ATLAS. Nous avons exploré plus de 30 signatures au cours de la période de collecte des données de 2017 et aucune anomalie particulière n'a été observée par rapport aux simulations des processus du modèle standard. Deuxièmement, nous proposons une méthode collective de détection des anomalies pour les recherches de NP indépendantes du modèle au LHC. Nous proposons l'approche paramétrique qui utilise un algorithme d'apprentissage semi-supervisé. Cette approche utilise une probabilité pénalisée et est capable d'effectuer simultanément une sélection appropriée des variables et de détecter un comportement anormal collectif possible dans les données par rapport à un échantillon de fond donné. Troisièmement, nous présentons des études préliminaires sur la modélisation du bruit de fond et la détection de signaux génériques dans des spectres de masse invariants à l'aide de processus gaussiens (GPs) sans information préalable moyenne. 
Deux méthodes ont été testées dans deux ensembles de données : une procédure en deux étapes dans un ensemble de données tiré des simulations du modèle standard utilisé pour ATLAS General Search, dans le canal contenant deux jets à l'état final, et une procédure en trois étapes dans un ensemble de données simulées pour le signal (Z′) et le fond (modèle standard) dans la recherche de résonances dans le cas du spectre de masse invariant de paire supérieure. Notre étude est une première étape vers une méthode qui utilise les GPs comme outil de modélisation qui peut être appliqué à plusieurs signatures dans une configuration plus indépendante du modèle. / We address the problem of model-independent searches for New Physics (NP), at the Large Hadron Collider (LHC) using the ATLAS detector. Particular attention is paid to the development and testing of novel Machine Learning techniques for that purpose. The present work presents three main results. Firstly, we put in place a system for automatic generic signature monitoring within TADA, a software tool from ATLAS. We explored over 30 signatures in the data taking period of 2017 and no particular discrepancy was observed with respect to the Standard Model processes simulations. Secondly, we propose a collective anomaly detection method for model-independent searches for NP at the LHC. We propose the parametric approach that uses a semi-supervised learning algorithm. This approach uses penalized likelihood and is able to simultaneously perform appropriate variable selection and detect possible collective anomalous behavior in data with respect to a given background sample. Thirdly, we present preliminary studies on modeling background and detecting generic signals in invariant mass spectra using Gaussian processes (GPs) with no mean prior information. 
Two methods were tested on two datasets: a two-step procedure on a dataset taken from the Standard Model simulations used for the ATLAS General Search, in the channel containing two jets in the final state, and a three-step procedure on a simulated dataset for signal (Z′) and background (Standard Model) in the search for resonances in the top-pair invariant mass spectrum. Our study is a first step towards a method that uses GPs as a modeling tool applicable to several signatures in a more model-independent setup. Grand collisionneur de hadrons ATLAS Large Hadron Collider Standard Model Beyond the Standard Model ATLAS New Physics Machine Learning Anomaly Detection Semi-supervised Penalized likelihood Gaussian Processes
93	Estimation et sélection pour les modèles additifs et application à la prévision de la consommation électrique / Estimation and selection in additive models and application to load demand forecasting Thouvenot, Vincent 17 December 2015 (has links) L'électricité ne se stockant pas aisément, EDF a besoin d'outils de prévision de consommation et de production efficaces. Le développement de nouvelles méthodes automatiques de sélection et d'estimation de modèles de prévision est nécessaire. En effet, grâce au développement de nouvelles technologies, EDF peut étudier les mailles locales du réseau électrique, ce qui amène à un nombre important de séries chronologiques à étudier. De plus, avec les changements d'habitude de consommation et la crise économique, la consommation électrique en France évolue. Pour cette prévision, nous adoptons ici une méthode semi-paramétrique à base de modèles additifs. L'objectif de ce travail est de présenter des procédures automatiques de sélection et d'estimation de composantes d'un modèle additif avec des estimateurs en plusieurs étapes. Nous utilisons du Group LASSO, qui est, sous certaines conditions, consistant en sélection, et des P-Splines, qui sont consistantes en estimation. Nos résultats théoriques de consistance en sélection et en estimation sont obtenus sans nécessiter l'hypothèse classique que les normes des composantes non nulles du modèle additif soient bornées par une constante non nulle. En effet, nous autorisons cette norme à pouvoir converger vers 0 à une certaine vitesse. Les procédures sont illustrées sur des applications pratiques de prévision de consommation électrique nationale et locale. Mots-clés: Group LASSO, Estimateurs en plusieurs étapes, Modèle Additif, Prévision de charge électrique, P-Splines, Sélection de variables / French electricity load forecasting has undergone major changes over the past decade.
These changes are due, among other things, to the opening of the electricity market (and the economic crisis), which calls for the development of new automatic, time-adaptive prediction methods. The advent of innovative technologies also requires automatic methods, because thousands or tens of thousands of time series have to be studied. For prediction, we adopt a semi-parametric approach based on additive models. We present an automatic procedure for covariate selection in an additive model. We combine Group LASSO, which is selection consistent under certain conditions, with P-Splines, which are estimation consistent. Our estimation and model selection results hold without assuming that the norm of each true non-zero component is bounded away from zero; they allow these norms to converge to zero at a certain rate. Real applications to local and aggregate load forecasting are provided. Keywords: Additive Model, Group LASSO, Load Forecasting, Multi-stage estimator, P-Splines, Variables selection Statistique Modèle additif Méthode pénalisée Estimateurs en plusieurs étapes Prévision de consommation électrique Selection Statistic Additive model Penalized method Multi-Step estimator Electricity load forecasting Selection
94	Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data Kusiak, Caroline 25 October 2018 (has links) Dengue fever affects over 390 million people annually worldwide and is of particular concern in Southeast Asia, where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to public health officials; however, many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google Search trends can improve disease predictions in settings with severely underreported data. We compare penalized regression approaches to seasonal baseline models and illustrate that incorporation of search data can improve prediction error. This builds on previous research showing that search data and recent surveillance data together can be used to create accurate forecasts for diseases such as influenza and dengue fever. This work shows that even in settings where timely surveillance data is not available, using search data in real time can produce more accurate short-term forecasts than a seasonal baseline prediction. However, forecast accuracy degrades the further into the future the forecasts go. The relative accuracy of these forecasts compared to a seasonal average forecast varies depending on location. Overall, these data and models can improve short-term public health situational awareness and should be incorporated into larger real-time forecasting efforts. Forecasting Dengue Penalized Google LASSO Prediction Applied Statistics Bioinformatics Biostatistics Disease Modeling Other Mathematics Statistical Methodology Statistical Models Survival Analysis Vital and Health Statistics
95	Prediction with Penalized Logistic Regression : An Application on COVID-19 Patient Gender based on Case Series Data Schwarz, Patrick January 2021 (has links) The aim of the study was to evaluate different types of logistic regression to find the optimal model for predicting the gender of hospitalized COVID-19 patients. The models were based on COVID-19 case series data from Pakistan, using a set of 18 explanatory variables, of which patient age and BMI were numerical and the rest were categorical variables expressing symptoms and previous health issues. The models compared were a logistic regression using all variables, a logistic regression that used stepwise variable selection with 4 explanatory variables, a logistic Ridge regression model, a logistic Lasso regression model and a logistic Elastic Net regression model. Based on several metrics assessing the goodness of fit of the models and the evaluation of predictive power using the area under the ROC curve, the Elastic Net that used only the Lasso penalty had the best result and was able to predict 82.5% of the test cases correctly. Covid-19 Logistic Regression Penalized Regression Ridge Regression Elastic Net Classification Predictive Modeling Statistical Modeling glmnet Probability Theory and Statistics Sannolikhetsteori och statistik
96	Identification de biomarqueurs prédictifs de la survie et de l'effet du traitement dans un contexte de données de grande dimension / Identification of biomarkers predicting the outcome and the treatment effect in presence of high-dimensional data Ternes, Nils 05 October 2016 (has links) Avec la révolution récente de la génomique et la médecine stratifiée, le développement de signatures moléculaires devient de plus en plus important pour prédire le pronostic (biomarqueurs pronostiques) ou l’effet d’un traitement (biomarqueurs prédictifs) de chaque patient. Cependant, la grande quantité d’information disponible rend la découverte de faux positifs de plus en plus fréquente dans la recherche biomédicale. La présence de données de grande dimension (nombre de biomarqueurs ≫ taille d’échantillon) soulève de nombreux défis statistiques tels que la non-identifiabilité des modèles, l’instabilité des biomarqueurs sélectionnés ou encore la multiplicité des tests. L’objectif de cette thèse a été de proposer et d’évaluer des méthodes statistiques pour l’identification de ces biomarqueurs et l’élaboration d’une prédiction individuelle des probabilités de survie pour des nouveaux patients à partir d’un modèle de régression de Cox. Pour l’identification de biomarqueurs en présence de données de grande dimension, la régression pénalisée lasso est très largement utilisée. Dans le cas de biomarqueurs pronostiques, une extension empirique de cette pénalisation a été proposée permettant d’être plus restrictif sur le choix du paramètre λ dans le but de sélectionner moins de faux positifs. Pour les biomarqueurs prédictifs, l’intérêt s’est porté sur les interactions entre le traitement et les biomarqueurs dans le contexte d’un essai clinique randomisé.
Douze approches permettant de les identifier ont été évaluées telles que le lasso (standard, adaptatif, groupé ou encore ridge+lasso), le boosting, la réduction de dimension des effets propres et un modèle implémentant les effets pronostiques par bras. Enfin, à partir d’un modèle de prédiction pénalisé, différentes stratégies ont été évaluées pour obtenir une prédiction individuelle pour un nouveau patient accompagnée d’un intervalle de confiance, tout en évitant un éventuel surapprentissage du modèle. La performance des approches ont été évaluées au travers d’études de simulation proposant des scénarios nuls et alternatifs. Ces méthodes ont également été illustrées sur différents jeux de données, contenant des données d’expression de gènes dans le cancer du sein. / With the recent revolution in genomics and in stratified medicine, the development of molecular signatures is becoming more and more important for predicting the prognosis (prognostic biomarkers) and the treatment effect (predictive biomarkers) of each patient. However, the large quantity of available information has made false positives more and more frequent in biomedical research. The high-dimensional setting (i.e. number of biomarkers ≫ sample size) leads to several statistical challenges such as the non-identifiability of the models, the instability of the selected biomarkers and the multiple testing issue. The aim of this thesis was to propose and evaluate statistical methods for the identification of these biomarkers and for the individual prediction of survival probabilities for new patients, in the context of the Cox regression model. For variable selection in a high-dimensional setting, the lasso penalty is commonly used. In the prognostic setting, an empirical extension of the lasso penalty has been proposed that is more stringent in the choice of the tuning parameter λ in order to select fewer false positives.
In the predictive setting, the focus was on biomarker-by-treatment interactions in the setting of a randomized clinical trial. Twelve approaches for selecting these interactions were evaluated, such as the lasso (standard, adaptive, grouped or ridge+lasso), boosting, dimension reduction of the main effects and a model incorporating arm-specific biomarker effects. Finally, several strategies were studied to obtain an individual survival prediction with a corresponding confidence interval for a future patient from a penalized regression model, while limiting potential overfitting. The performance of the approaches was evaluated through simulation studies combining null and alternative scenarios. The methods were also illustrated on several data sets containing gene expression data in breast cancer. Médecine stratifiée Données de grande dimension Régression pénalisée Biomarqueurs pronostiques Biomarqueurs prédictifs Prédiction individuelle Stratified medicine High-Dimensional data Penalized regression Prognostic biomarkers Predictive biomarkers Individual prediction
97	Approches nouvelles des modèles GARCH multivariés en grande dimension / New approaches for high-dimensional multivariate GARCH models Poignard, Benjamin 15 June 2017 (has links) Ce document traite du problème de la grande dimension dans des processus GARCH multivariés. L'auteur propose une nouvelle dynamique vine-GARCH pour des processus de corrélation paramétrisés par un graphe non dirigé appelé "vine". Cette approche génère directement des matrices définies-positives et encourage la parcimonie. Après avoir établi des résultats d'existence et d'unicité pour les solutions stationnaires du modèle vine-GARCH, l'auteur analyse les propriétés asymptotiques du modèle. Il propose ensuite un cadre général de M-estimateurs pénalisés pour des processus dépendants et se concentre sur les propriétés asymptotiques de l'estimateur "adaptive Sparse Group Lasso". La grande dimension est traitée en considérant le cas où le nombre de paramètres diverge avec la taille de l'échantillon. Les résultats asymptotiques sont illustrés par des expériences simulées. Enfin dans ce cadre l'auteur propose de générer la sparsité pour des dynamiques de matrices de variance covariance. Pour ce faire, la classe des modèles ARCH multivariés est utilisée et les processus correspondants à celle-ci sont estimés par moindres carrés ordinaires pénalisés. / This document contributes to high-dimensional statistics for multivariate GARCH processes. First, the author proposes a new dynamic called vine-GARCH for correlation processes parameterized by an undirected graph called vine. The proposed approach directly specifies positive definite matrices and fosters parsimony. The author provides results for the existence and uniqueness of stationary solution of the vine-GARCH model and studies its asymptotic properties. He then proposes a general framework for penalized M-estimators with dependent processes and focuses on the asymptotic properties of the adaptive Sparse Group Lasso regularizer. 
The high-dimensional setting is studied by considering the case where the number of parameters diverges with the sample size. The asymptotic properties are illustrated through simulation experiments. Finally, the author proposes to foster sparsity for multivariate variance-covariance matrix processes within the latter framework. To do so, the multivariate ARCH family is considered and the corresponding processes are estimated by penalized ordinary least squares. Corrélations partielles Estimateur du QMV M-Estimateurs pénalisés Propriété oracle Stationnarité Vine régulière Oracle property Partial correlations Penalized M-Estimators QML estimator Regular vine Stationarity 519.5
98	Evaluating Time-varying Effect in Single-type and Multi-type Semi-parametric Recurrent Event Models Chen, Chen 11 December 2015 (has links) This dissertation aims to develop statistical methodologies for estimating the effects of time-fixed and time-varying factors in the context of recurrent event modeling. The research is motivated by the traffic safety research question of evaluating the influence of a crash on driving risk and driver behavior. The methodologies developed, however, are general and can be applied to other fields. Four alternative approaches based on various data settings are elaborated and applied to the 100-Car Naturalistic Driving Study in the following chapters. Chapter 1 provides a general introduction and background for each method, with a sketch of the 100-Car Naturalistic Driving Study. In Chapter 2, I assessed the impact of a crash on driving behavior by comparing the frequency of distraction events in pre-defined windows. A count-based approach based on mixed-effect binomial regression models was used. In Chapter 3, I introduced intensity-based recurrent event models by treating the number of Safety Critical Incidents and Near Crash over time as a counting process. Recurrent event models fit the natural generation scheme of the data in this study. Four semi-parametric models are explored: the Andersen-Gill model, the Andersen-Gill model with stratified baseline functions, the frailty model, and the frailty model with stratified baseline functions. I derived the model estimation procedures and conducted model comparison via simulation and application. The recurrent event models in Chapter 3 are all based on the proportional assumption, under which effects are constant. However, the change of effects over time is often of primary interest. In Chapter 4, I developed a time-varying coefficient model using penalized B-spline functions to approximate the varying coefficients. Shared frailty terms were used to incorporate correlation within subjects.
Inference and statistical tests are also provided. A frailty representation was proposed to link the time-varying coefficient model with the regular frailty model. In Chapter 5, I further extended the framework to accommodate multi-type recurrent events with time-varying coefficients. Two types of recurrent-event models were developed. These models incorporate correlation among intensity functions from different types of events through correlated frailty terms. Chapter 6 gives a general review of the contributions of this dissertation and a discussion of future research directions. / Ph. D. Frailty Model Generalized Linear Mixed Model Multi-type Recurrent Event Naturalistic Driving Study Penalized B-Spline Proportional Intensity Function Stratification Time-varying Coefficient Transportation Safety
99	Estimation and Inference in Special Nonparametric Models with Applications to Topics in Development Economics / Schätzung und Inferenz in speziellen nichtparametrischen Modellen mit Andwendungen in der Entwicklungsökonomie Wiesenfarth, Manuel 11 May 2012 (has links) No description available. 310 Statistik EGCG 080 Economics adaptive Glättung additive Modelle bayesianische P-splines Instrumentalvariablen Penalized Splines Simultane Konfidenzbänder Spezifikationstest Unterernährung adaptive smoothing additive model Bayesian P-splines instrumental variables penalized splines sample selection model simultaneous confidence bands specification test undernutrition 31.73 83.03 83.46
100	Advances on the Birnbaum-Saunders distribution / Avanços na distribuição Birnbaum-Saunders Nakamura, Luiz Ricardo 26 August 2016 (has links) The Birnbaum-Saunders (BS) distribution is the most popular model used to describe lifetime processes under fatigue. Over the years, this distribution has found a wide range of applications, creating a demand for more flexible extensions to solve more complex problems. One of the best-known extensions of the BS distribution is the generalized Birnbaum-Saunders (GBS) family of distributions, which includes the Birnbaum-Saunders special-case (BS-SC) and the Birnbaum-Saunders generalized t (BSGT) models as special cases. Although the BS-SC distribution was previously developed in the literature, it was never studied in depth; hence, in this thesis, we provide a full Bayesian study and develop a tool to generate random numbers from this distribution. Further, we develop a very flexible regression model, which admits different degrees of skewness and kurtosis, based on the BSGT distribution using the generalized additive models for location, scale and shape (GAMLSS) framework. We also introduce a new extension of the BS distribution called the Birnbaum-Saunders power (BSP) family of distributions, which contains several special or limiting cases already published in the literature, including the GBS family. The main feature of the new family is that it can produce both unimodal and bimodal shapes depending on its parameter values. We also introduce this new family of distributions into the GAMLSS framework, in order to model any or all of the parameters of the distribution using parametric linear and/or nonparametric smooth functions of explanatory variables. Throughout this thesis we present five different applications to real data sets in order to illustrate the theoretical results developed. / A distribuição Birnbaum-Saunders (BS) é o modelo mais popular utilizado para descrever processos de fadiga.
Ao longo dos anos, essa distribuição vem recebendo aplicações nas mais diversas áreas, demandando assim algumas extensões mais flexíveis para resolver problemas mais complexos. Uma das extensões mais conhecidas na literatura é a família de distribuições Birnbaum-Saunders generalizada (GBS), que inclui as distribuições Birnbaum-Saunders caso-especial (BS-SC) e Birnbaum-Saunders t generalizada (BSGT) como modelos especiais. Embora a distribuição BS-SC tenha sido previamente desenvolvida na literatura, nunca foi estudada mais profundamente e, assim, nesta tese, um estudo bayesiano é desenvolvido acerca da mesma além de um novo gerador de números aleatórios dessa distribuição ser apresentado. Adicionalmente, um modelo de regressão baseado na distribuição BSGT é desenvolvido utilizando-se os modelos aditivos generalizados para locação, escala e forma (GAMLSS), os quais apresentam grande flexibilidade tanto para a assimetria como para a curtose. Uma nova extensão da distribuição BS também é apresentada, denominada família de distribuições Birnbaum-Saunders potência (BSP), que contém inúmeros casos especiais ou limites já publicados na literatura, incluindo a família GBS. A principal característica desta nova família é que ela é capaz de produzir formas tanto uni como bimodais dependendo do valor de seus parâmetros. Esta nova família também é introduzida na estrutura dos modelos GAMLSS para fornecer uma ferramenta capaz de modelar todos os parâmetros da distribuição como funções lineares e/ou não-lineares suavizadas de variáveis explicativas. Ao longo desta tese são apresentadas cinco diferentes aplicações em conjuntos de dados reais para ilustrar os resultados teóricos obtidos. GAMLSS GAMLSS Generalized additive models Modelos aditivos generalizados Non-parametric regression Penalized splines R software Regressão não-paramétrica Software R Splines penalizados
