41

Feature selection for spatial point processes

Choiruddin, Achmad 15 September 2017
Recent applications such as forestry datasets involve observations of spatial point pattern data combined with observations of many spatial covariates. We consider in this thesis the problem of estimating a parametric form of the intensity function in such a context. This thesis develops feature selection procedures and gives some guarantees on their validity. In particular, we propose two different feature selection approaches: lasso-type methods and Dantzig selector-type procedures. For the lasso-type techniques, we derive asymptotic properties of the estimates obtained from estimating functions derived from Poisson and logistic regression likelihoods penalized by a large class of penalties. We prove that the estimates obtained from such procedures satisfy consistency, sparsity, and asymptotic normality. For the Dantzig selector part, we develop a modified version of the Dantzig selector, which we call the adaptive linearized Dantzig selector (ALDS), to obtain the intensity estimates. More precisely, the ALDS estimates are defined as the solution to an optimization problem which minimizes the sum of the coefficients of the estimates subject to a linear approximation of the score vector as a constraint. We find that the estimates obtained from such methods have asymptotic properties similar to those obtained previously using an adaptive lasso regularization term. We investigate the computational aspects of the methods developed using either lasso-type procedures or Dantzig selector-type approaches. We make links between spatial point process intensity estimation and generalized linear models (GLMs), so that we only have to deal with feature selection procedures for GLMs. Thus, easier computational procedures are implemented and a computationally fast algorithm is proposed. Simulation experiments are conducted to highlight the finite-sample performance of the estimates from each of the two proposed approaches. Finally, our methods are applied to model the spatial locations of a species of tree in a forest observed with a large number of environmental factors.
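The GLM link mentioned above means that a lasso-penalized intensity fit reduces to penalized Poisson regression. Below is a minimal numpy sketch of that reduction via proximal gradient (ISTA) on simulated data; all names, sizes, and tuning values are illustrative assumptions, not the author's code.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of the L1 penalty
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_poisson(X, y, lam, step=0.05, n_iter=2000):
    """Minimize mean(exp(X@b) - y*(X@b)) + lam*||b||_1 by ISTA.

    step must be below 1/L for the Poisson loss; the fixed value
    here is an assumption that works for this toy problem.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ beta) - y) / n   # Poisson score
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                # covariates at 500 cells
beta_true = np.r_[0.5, -0.5, np.zeros(8)]     # sparse truth
y = rng.poisson(np.exp(X @ beta_true))        # point counts per cell
print(np.round(lasso_poisson(X, y, lam=0.05), 3))  # noise coefs shrink to 0
```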
42

Contributions to Structured Variable Selection Towards Enhancing Model Interpretation and Computation Efficiency

Shen, Sumin 07 February 2020
Advances in data-collecting technologies provide great opportunities to access large sample-size data sets with high dimensionality. Variable selection is an important procedure for extracting useful knowledge from such complex data. In many real-data applications, appropriate selection of variables should facilitate model interpretation and computation efficiency. It is thus important to incorporate domain knowledge of the underlying data generation mechanism to select key variables for improving model performance. However, general variable selection techniques, such as best subset selection and the Lasso, often do not take the underlying data generation mechanism into consideration. This thesis aims to develop statistical modeling methodologies with a focus on structured variable selection towards better model interpretation and computation efficiency. Specifically, it consists of three parts: an additive heredity model with coefficients incorporating multi-level data, a regularized dynamic generalized linear model with piecewise constant functional coefficients, and a structured variable selection method within the best subset selection framework.

In Chapter 2, an additive heredity model is proposed for analyzing mixture-of-mixtures (MoM) experiments. The MoM experiment differs from the classical mixture experiment in that a mixture component in MoM experiments, known as a major component, is made up of sub-components, known as minor components. The proposed model considers an additive structure to inherently connect the major components with the minor components. To enable a meaningful interpretation of the estimated model, we apply the hierarchical and heredity principles by using the nonnegative garrote technique for model selection. The performance of the additive heredity model was compared to several conventional methods in both unconstrained and constrained MoM experiments. The additive heredity model was then successfully applied to a real problem of optimizing the Pringles® potato crisp studied previously in the literature.

In Chapter 3, we consider the dynamic effects of variables in generalized linear models such as logistic regression. This work is motivated by an engineering problem in which the effects of process variables on product quality vary as equipment degrades. To address this challenge, we propose a penalized dynamic regression model which is flexible enough to estimate the dynamic coefficient structure. The proposed method models the functional coefficients as piecewise constant functions. Specifically, under the penalized regression framework, the fused lasso penalty is adopted to detect changes in the dynamic coefficients, and the group lasso penalty is applied to enable a sparse selection of variables. Moreover, an efficient parameter estimation algorithm is developed based on the alternating direction method of multipliers. The performance of the dynamic coefficient model is evaluated in numerical studies and three real-data examples. A code sketch of the fused lasso idea follows this abstract.

In Chapter 4, we develop a structured variable selection method within the best subset selection framework. In the literature, many techniques within the LASSO framework have been developed to address structured variable selection issues, but less attention has been paid to structured best subset selection problems. In this work, we propose a sparse ridge regression method to address structured variable selection issues. The key idea of the proposed method is to reconstruct the regression matrix from the angle of experimental designs. We employ the estimation-maximization algorithm to formulate the best subset selection problem as an iterative linear integer optimization (LIO) problem, with a mixed integer optimization algorithm as the selection step. We demonstrate the power of the proposed method in various structured variable selection problems. Moreover, the proposed method can be extended to ridge-penalized best subset selection problems. The performance of the proposed method is evaluated in numerical studies. / Doctor of Philosophy / Advances in data-collecting technologies provide great opportunities to access large sample-size data sets with high dimensionality. Variable selection is an important procedure for extracting useful knowledge from such complex data, and in many real-data applications appropriate selection of variables should facilitate model interpretation and computation efficiency. General variable selection techniques, however, often do not take the underlying data generation mechanism into consideration. This thesis develops statistical modeling methodologies with a focus on structured variable selection towards better model interpretation and computation efficiency. The proposed approaches have been applied to real-world problems to demonstrate their performance.
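As a hedged illustration of the fused lasso penalty from Chapter 3, the sketch below recovers a piecewise constant coefficient in a simple Gaussian (not logistic) model, solved with a generic CVXPY formulation rather than the ADMM algorithm the thesis develops; the data and penalty values are made up.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
T = 200
# time-varying coefficient with one change point at t = 80
beta_true = np.r_[np.full(80, 1.0), np.full(120, -0.5)]
x = rng.normal(size=T)
y = beta_true * x + 0.3 * rng.normal(size=T)

beta = cp.Variable(T)
lam = 2.0
# squared loss + fused lasso penalty on successive coefficient differences,
# which drives the estimate toward a piecewise constant function
obj = cp.Minimize(cp.sum_squares(y - cp.multiply(x, beta))
                  + lam * cp.norm1(beta[1:] - beta[:-1]))
cp.Problem(obj).solve()
print(np.round(beta.value[:5], 2), np.round(beta.value[-5:], 2))
```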
43

Predictor Selection in Linear Regression: L1 regularization of a subset of parameters and Comparison of L1 regularization and stepwise selection

Hu, Qing 11 May 2007
Background: Feature selection, also known as variable selection, is a technique that selects a subset from a large collection of possible predictors to improve the prediction accuracy of a regression model. The first objective of this project is to investigate for which data structures LASSO outperforms the forward stepwise method. The second objective is to develop a feature selection method, Feature Selection by L1 Regularization of a Subset of Parameters (LRSP), which selects the model by combining prior knowledge about the inclusion of some covariates, if any, with the information collected from the data. Mathematically, LRSP minimizes the residual sum of squares subject to the sum of the absolute values of a subset of the coefficients being less than a constant. In this project, LRSP is compared with LASSO, forward selection, and ordinary least squares to investigate their relative performance for different data structures. Results: Simulation results indicate that for a moderate number of small-sized effects, forward selection outperforms LASSO in both prediction accuracy and variable selection performance when the variance of the model error term is smaller, regardless of the correlations among the covariates; forward selection also performs better at variable selection when the variance of the error term is larger but the correlations among the covariates are smaller. LRSP was shown to be an efficient method for problems where prior knowledge about the inclusion of covariates is available, and it can also be applied to problems with nuisance parameters, such as linear discriminant analysis.
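A minimal sketch of the LRSP idea, assuming a proximal gradient solver: the soft-thresholding step is applied only to the penalized subset of coefficients, so covariates with prior support stay unpenalized. All data and tuning values below are illustrative, not the project's code.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lrsp(X, y, penalized, lam, n_iter=5000):
    """Least squares with an L1 penalty on only a subset of coefficients."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1/L for the LS gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = beta - step * X.T @ (X @ beta - y)      # gradient step
        # shrink only the penalized coordinates; the rest are free
        beta[penalized] = soft_threshold(beta[penalized], step * lam)
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
beta_true = np.array([2.0, 0.0, 0.0, 1.5, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=100)
# prior knowledge: keep columns 0 and 5 unpenalized, shrink the rest
print(np.round(lrsp(X, y, penalized=np.arange(1, 5), lam=20.0), 2))
```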
44

Room Correction for Smart Speakers

Mårtensson, Simon January 2019
Portable smart speakers with wireless connections have become more popular in recent years. These speakers are often moved and placed in different positions in different rooms, which affects the sound a listener hears from the speaker. Such speakers usually carry microphones, typically used for voice recording. This thesis aims to provide a way to compensate for the effect of the speaker's position on the sound (so-called room correction) using the microphones on the speaker and the speaker itself. Firstly, the room frequency response, i.e. the frequency response between the speaker and the listener, is estimated for several different speaker positions in a room. From these estimates, the relationship between the speaker's position and the room frequency response is modeled. Secondly, an algorithm that estimates the speaker's position is developed. The algorithm estimates the position by detecting reflections from nearby walls using the microphones on the speaker. The acquired position estimates are used as input to the room frequency response model, which makes it possible to apply room correction automatically when the speaker is placed in a new position. The room correction is shown to correct the room frequency response so that the bass has the same power as the mid- and high-frequency sounds from the speaker, in line with the research aim. The room correction is also shown to make the room frequency response vary less with respect to the speaker's position.
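The frequency response estimation step could, for instance, be carried out with Welch-type spectral estimates (the classical H1 estimator Pxy/Pxx). The sketch below uses a toy echo filter in place of a measured room and is an assumption about the approach, not the thesis's implementation.

```python
import numpy as np
from scipy import signal

fs = 16000
rng = np.random.default_rng(3)
x = rng.normal(size=10 * fs)                  # broadband test signal
room = np.zeros(2000)
room[0], room[800] = 1.0, 0.5                 # direct path plus one echo
y = signal.fftconvolve(x, room)[:len(x)] + 0.01 * rng.normal(size=len(x))

f, Pxy = signal.csd(x, y, fs=fs, nperseg=4096)   # cross-spectral density
_, Pxx = signal.welch(x, fs=fs, nperseg=4096)    # input power spectrum
H = Pxy / Pxx                                    # H1 frequency response
print(f[:3], np.round(np.abs(H[:3]), 3))         # comb pattern from the echo
```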
45

Analysis and comparison of some alternative methods of selection of predictor variables in linear regression models

Marques, Matheus Augustus Pumputis 04 June 2018
In this work, some new variable selection methods that have appeared over the last 15 years in the context of linear regression are studied: LARS (Least Angle Regression), NAMS (Noise Addition Model Selection), the False Selection Rate (FSR), the Bayesian LASSO, and the Spike-and-Slab LASSO. The methodology was the analysis and comparison of the studied methods. Following this study, applications to real databases are made, as well as a simulation study, in which all methods proved promising, with the Bayesian methods showing the best results.
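For a concrete feel of LARS, scikit-learn's lars_path traces the order in which predictors enter the model; this toy example is ours, not the thesis's code.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 8))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=80)

# alphas: correlation thresholds; active: variables in order of entry;
# coefs: coefficient path, one column per step
alphas, active, coefs = lars_path(X, y, method="lar")
print("order of entry:", active)             # strong predictors enter first
print("final coefs:", np.round(coefs[:, -1], 2))
```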
46

Penalised regression for high-dimensional data : an empirical investigation and improvements via ensemble learning

Wang, Fan January 2019
In a wide range of applications, datasets are generated for which the number of variables p exceeds the sample size n. Penalised likelihood methods are widely used to tackle regression problems in these high-dimensional settings. In this thesis, we carry out an extensive empirical comparison of the performance of popular penalised regression methods in high-dimensional settings and propose new methodology that uses ensemble learning to enhance their performance. The relative efficacy of different penalised regression methods in finite-sample settings remains incompletely understood. Through a large-scale simulation study, consisting of more than 1,800 data-generating scenarios, we systematically consider the influence of various factors (for example, sample size and sparsity) on method performance. We focus on three related goals (prediction, variable selection, and variable ranking) and consider six widely used methods. The results are supported by a semi-synthetic data example. Our empirical results complement existing theory and provide a resource for comparing performance across a range of settings and metrics.

We then propose a new ensemble learning approach for improving the performance of penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, which builds on and improves the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeated subsampling of variables. We apply a penalised regression method to each subsampled dataset and average the results. In the first step, subsampling is informed by the variable correlation structure, and in the second step, by the variable importance measures from the first step. STRANDS can be used with any sparse penalised regression approach as the "base learner". In simulations, we show that STRANDS typically improves upon its base learner, and demonstrate that taking account of the correlation structure in the first step helps to improve the efficiency with which the model space is explored. We also propose another ensemble learning method to improve the prediction performance of ridge regression in sparse settings. Specifically, we combine Bayesian ridge regression with a probabilistic forward selection procedure, where inclusion of a variable at each stage is determined probabilistically by a Bayes factor. We compare the prediction performance of the proposed method to penalised regression methods using simulated data.
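A deliberately simplified sketch of the subsample-and-average idea behind STRANDS follows. It is not the published algorithm: it omits the correlation-informed sampling of step one and the importance-weighted second step, and simply averages lasso fits over uniformly random variable subsets.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def subspace_lasso(X, y, n_models=50, subset_frac=0.5, seed=0):
    """Average lasso coefficients over random variable subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = int(subset_frac * p)
    avg_coef = np.zeros(p)
    for _ in range(n_models):
        idx = rng.choice(p, size=k, replace=False)  # subsample variables
        fit = LassoCV(cv=5).fit(X[:, idx], y)       # base learner
        coef = np.zeros(p)
        coef[idx] = fit.coef_
        avg_coef += coef / n_models                 # average (zeros included)
    return avg_coef

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 40))
beta_true = np.zeros(40)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(size=100)
print(np.round(subspace_lasso(X, y)[:6], 2))  # signals dominate the average
```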
47

Approaches to modelling functional time series with an application to electricity generation data

Jin, Zehui January 2018
We study the half-hourly electricity generation by coal and by gas in the UK over a period of three years from 2012 to 2014. As highly frequent time series, the data for each fuel show daily cycles along with seasonality and trend across days. Taylor (2003), Taylor et al. (2006), and Taylor (2008) studied time series with similar features by introducing double seasonality into methods for a single univariate time series. As we are interested in the continuous variation in generation within a day, the half-hourly observations within a day are considered as a continuous function; in this way, a time series of half-hourly discrete observations is transformed into a time series of daily functions. The idea of a time series of functions can also be seen in Shang (2013), Shang and Hyndman (2011), and Hyndman and Ullah (2007). We improve their methods in a few ways. Firstly, we identify the systematic effect due to factors that act over the long term, such as weather and fuel prices, and the intrinsic differences between the days of the week. The systematic effect is modeled and removed before we study the day-by-day random variation in the functions. Secondly, we extend functional principal component analysis (PCA), which was applied to one group of functions in Shang (2013), Shang and Hyndman (2011), and Hyndman and Ullah (2007), to partial common PCA, in order to consider the covariance structures of two groups of functions and their similarities. A test of the goodness of the approximation to the functions given by the common eigenfunctions is also proposed. The idea of bootstrapping residuals from the approximation, seen in Shang (2014), is employed but improved with non-overlapping blocks and moving blocks of residuals. Thirdly, we use a vector autoregressive (VAR) model, which is a multivariate approach, to model the scores on the common eigenfunctions of a group, so that the cross-correlation between the scores can be considered. We include lasso penalties in the VAR model to select the significant covariates and refit the selection with ordinary least squares to reduce the bias. Our method is compared with the stepwise procedure of Pfaff (2007) and is shown to be less variable and more accurate in estimation and prediction. Finally, we propose a method to give point forecasts of the daily functions. It is more complicated than the methods of Shang (2013), Shang and Hyndman (2011), and Hyndman and Ullah (2007), as the systematic effect needs to be included. An adjustment interval is also given along with each point forecast, representing the range within which the true function might vary. Our methods for the point forecast and the adjustment interval include information updating after the training period, which is not considered in the classical prediction equations of VAR and GARCH seen in Tsay (2013) and Engle and Bollerslev (1986).
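The decomposition of daily curves into eigenfunctions and scores can be sketched with a plain SVD of the centered day-by-time matrix. The toy curves below stand in for the generation data, and the lasso-VAR on the scores is not reproduced; everything here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 48)                    # 48 half-hours within a day
days = 365
scores_true = rng.normal(size=(days, 2)) * [3.0, 1.0]
# each day = mix of two smooth shapes plus noise, mimicking daily curves
curves = (scores_true[:, [0]] * np.sin(2 * np.pi * t)
          + scores_true[:, [1]] * np.cos(2 * np.pi * t)
          + 0.1 * rng.normal(size=(days, 48)))

mean_curve = curves.mean(axis=0)
U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
eigenfunctions = Vt[:2]                      # first two eigenfunctions
scores = (curves - mean_curve) @ eigenfunctions.T   # day-by-day scores
var_explained = s[:2] ** 2 / np.sum(s ** 2)
print(np.round(var_explained, 3))            # two components explain ~all
```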
49

High-Dimensional Analysis of Convex Optimization-Based Massive MIMO Decoders

Ben Atitallah, Ismail 04 1900
A wide range of modern large-scale systems relies on recovering a signal from noisy linear measurements. In many applications, the useful signal has inherent properties, such as sparsity, low-rankness, or boundedness, and making use of these properties and structures allows a more efficient recovery. Hence, a significant amount of work has been dedicated to developing and analyzing algorithms that can take advantage of the signal structure. In particular, since the advent of Compressed Sensing (CS), there has been significant progress in this direction. Generally speaking, the signal structure can be harnessed by solving an appropriate regularized or constrained M-estimator. In modern Multi-input Multi-output (MIMO) communication systems, all transmitted signals are drawn from finite constellations and are thus bounded. Besides, recent modulation schemes such as Generalized Space Shift Keying (GSSK) or Generalized Spatial Modulation (GSM) yield signals that are inherently sparse. In the recovery procedure, sparsity and boundedness can be promoted by using ℓ1 norm regularization and by imposing an ℓ∞ norm constraint, respectively. In this thesis, we propose novel optimization algorithms to recover certain classes of structured signals, with emphasis on MIMO communication systems. The exact analysis permits a clear characterization of how well these systems perform and allows an automatic tuning of the parameters. In each context, we define the appropriate performance metrics and analyze them exactly in the High Dimensional Regime (HDR). The framework we use for the analysis is based on Gaussian process inequalities; in particular, on a new strong and tight version of a classical comparison inequality (due to Gordon, 1988) in the presence of additional convexity assumptions. The framework that emerged from this inequality is coined the Convex Gaussian Min-max Theorem (CGMT).
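A minimal sketch of the box-constrained least-squares decoder that an ℓ∞ constraint leads to, solved here by projected gradient descent; the antenna counts, noise level, and final sign slicing are illustrative assumptions, not the thesis's algorithm or analysis.

```python
import numpy as np

def box_ls(A, y, n_iter=500):
    """Minimize ||y - A x||^2 subject to ||x||_inf <= 1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the LS gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - step * A.T @ (A @ x - y)     # gradient step
        x = np.clip(x, -1.0, 1.0)            # project onto the box
    return x

rng = np.random.default_rng(7)
n, m = 128, 64                               # receive/transmit dimensions
A = rng.normal(size=(n, m)) / np.sqrt(n)     # toy Gaussian channel
x_true = rng.choice([-1.0, 1.0], size=m)     # BPSK symbols (bounded)
y = A @ x_true + 0.1 * rng.normal(size=n)
x_hat = np.sign(box_ls(A, y))                # slice to the constellation
print("symbol errors:", int(np.sum(x_hat != x_true)))
```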
50

Regularization for Sparseness and Smoothness : Applications in System Identification and Signal Processing

Ohlsson, Henrik January 2010
In system identification, the Akaike Information Criterion (AIC) is a well known method to balance model fit against model complexity; regularization here acts as a price on model complexity. In statistics and machine learning, regularization has gained popularity through modeling methods such as Support Vector Machines (SVM), ridge regression, and the lasso. Regularization also shows up implicitly when taking a Bayesian approach to modeling, where it can be associated with prior knowledge. Regularization has had a great impact on many applications, very much so in clinical imaging. In breast cancer imaging, for example, the number of sensors is physically restricted, which leads to long scan times; regularization and sparsity can be used to reduce these. In Magnetic Resonance Imaging (MRI), the number of scans is physically limited, and regularization plays an important role in obtaining high-resolution images. Regularization shows up in a variety of situations and is a well known technique for handling ill-posed problems and controlling overfitting. We focus on the use of regularization to obtain sparseness and smoothness, and discuss novel developments relevant to system identification and signal processing. In regularization for sparsity, a quantity is forced to contain elements equal to zero, i.e. to be sparse. The quantity could, for example, be the regression parameter vector of a linear regression model, in which case regularization becomes a tool for variable selection. Sparsity has had a huge impact on neighboring disciplines, such as machine learning and signal processing, but a rather limited effect on system identification. One of the major contributions of this thesis is therefore the new developments in system identification using sparsity. In particular, a novel method for the estimation of segmented ARX models using regularization for sparsity is presented; a technique for piecewise-affine system identification is also elaborated on, as well as several novel applications in signal processing. Another property that regularization can be used to impose is smoothness. Requiring the relation between regressors and predictions to be a smooth function is a way to control overfitting. We are here particularly interested in regression problems whose regressors are constrained to limited regions of the regressor space, e.g. a manifold. For this type of system we develop a new regression technique, Weight Determination by Manifold Regularization (WDMR). WDMR is inspired by applications in biology and developments in manifold learning, and uses regularization for smoothness to obtain smooth estimates. The use of regularization for smoothness in linear system identification is also discussed. The thesis also presents a real-time functional Magnetic Resonance Imaging (fMRI) bio-feedback setup, which has served as proof of concept and been the foundation for several real-time fMRI studies.
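Regularization for smoothness has a particularly transparent form when the penalty sits on second differences of the fit: beta_hat = argmin ||y - b||^2 + lam*||D b||^2 has the closed form (I + lam*D'D)^{-1} y. A small sketch with illustrative values follows; it demonstrates the generic idea, not WDMR itself.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
t = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)   # noisy observations

D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference operator
lam = 50.0
# ridge-type smoothness penalty: solve (I + lam*D'D) b = y
beta_hat = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
print(np.round(beta_hat[:5], 3))      # a smoothed version of the sine wave
```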
