101

Machine Learning Techniques for Large-Scale System Modeling

Lv, Jiaqing 31 August 2011 (has links)
This thesis addresses several issues in system modeling. The first is a parsimonious representation of the MISO Hammerstein system, obtained by projecting the multivariate linear function onto a univariate input function space. This leads to the so-called semiparametric Hammerstein model, which overcomes the well-known "curse of dimensionality" of nonparametric estimation for MISO systems. The second issue is orthogonal expansion analysis of the univariate Hammerstein model and hypothesis testing for the structure of the nonlinear subsystem. A generalization of this technique can be used to test the validity of parametric assumptions on the nonlinear function in Hammerstein models; it can also be applied to approximate a general nonlinear function by a certain class of parametric functions. These techniques extend, with slight modification, to other block-oriented systems, e.g., Wiener systems. The third issue is the application of machine learning and system modeling techniques to transient stability studies in power engineering. Simultaneous variable selection and estimation leads to substantially reduced complexity and yet stronger predictive power than techniques known so far in the power engineering literature.
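The Hammerstein structure mentioned in this abstract (a static nonlinearity feeding a linear dynamic block) can be made concrete with a small identification sketch. The example below is only illustrative: it simulates a SISO Hammerstein system and recovers both blocks via the classical over-parameterization plus rank-1 factorization approach, not the semiparametric MISO estimator developed in the thesis; the model orders, polynomial basis, and noise level are assumptions.

```python
# Minimal sketch: identify a SISO Hammerstein system (polynomial nonlinearity
# followed by an FIR block) by over-parameterized least squares + rank-1 SVD.
import numpy as np

rng = np.random.default_rng(0)

# --- simulate: y_t = sum_k b_k * f(u_{t-k}) + noise ---
n, p, deg = 2000, 4, 3                       # samples, FIR order, polynomial degree
b_true = np.array([1.0, 0.6, 0.3, 0.1])      # linear (FIR) block
f = lambda u: u + 0.5 * u**2 - 0.2 * u**3    # static nonlinearity
u = rng.uniform(-1, 1, n)
y = np.convolve(f(u), b_true)[:n] + 0.05 * rng.standard_normal(n)

# --- over-parameterized least squares: regress y on all lagged monomials u_{t-k}^d ---
def lagged(x, k):
    out = np.zeros_like(x)
    out[k:] = x[:len(x) - k]
    return out

Phi = np.column_stack([lagged(u, k) ** d for k in range(p) for d in range(1, deg + 1)])
theta, *_ = np.linalg.lstsq(Phi[p:], y[p:], rcond=None)

# --- rank-1 factorization: theta_{k,d} ~ b_k * c_d, so recover b and c from the SVD ---
Theta = theta.reshape(p, deg)
U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
b_hat, c_hat = U[:, 0] * np.sqrt(s[0]), Vt[0] * np.sqrt(s[0])
b_hat, c_hat = b_hat / b_hat[0], c_hat * b_hat[0]   # fix the scale/sign ambiguity

print("estimated FIR block  :", np.round(b_hat, 3))
print("estimated poly coeffs:", np.round(c_hat, 3))
```

The scale ambiguity (any constant can be moved between the two blocks) is resolved here by fixing the first FIR coefficient to one, a common normalization in block-oriented identification.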
102

Topics on Regularization of Parameters in Multivariate Linear Regression

Chen, Lianfu 2011 December 1900 (has links)
My dissertation mainly focuses on the regularization of parameters in multivariate linear regression under different assumptions on the distribution of the errors. It consists of two topics in which we develop iterative procedures to construct sparse estimators of both the regression coefficient and scale matrices simultaneously, and a third topic in which we develop a method for testing whether the skewness parameter in the skew-normal distribution is parallel to one of the eigenvectors of the scale matrix. In the first project, we propose a robust procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for the correlations of the response variables. Robustness to outliers is achieved using heavy-tailed t distributions for the multivariate response, and shrinkage is introduced by adding to the negative log-likelihood l1 penalties on the entries of both the regression coefficient matrix and the precision matrix of the responses. Taking advantage of the hierarchical representation of a multivariate t distribution as a scale mixture of normal distributions and of the EM algorithm, the optimization problem is solved iteratively, where at each EM iteration suitably modified multivariate regression with covariance estimation (MRCE) algorithms proposed by Rothman, Levina and Zhu are used. We propose two new optimization algorithms for the penalized likelihood, called MRCEI and MRCEII, which differ from MRCE in the way the tuning parameters for the two matrices are selected. Estimating the degrees of freedom when penalizing the entries of the matrices presents new computational challenges. A simulation study and a real data analysis demonstrate that MRCEII, which selects the tuning parameter of the precision matrix of the multiple responses using the Cp criterion, generally performs best among all methods considered in terms of prediction error, and that MRCEI outperforms the MRCE methods when the regression coefficient matrix is less sparse. The second project is motivated by the existence of skewness in data for which the symmetric-distribution assumption on the errors does not hold. We extend the proposed procedure to the case where the errors in the multivariate linear regression follow a multivariate skew-normal or skew-t distribution. Based on convenient representations of the skew-normal and skew-t distributions, as well as the EM algorithm, we develop an optimization algorithm, called MRST, to iteratively minimize the negative penalized log-likelihood. We also carry out a simulation study to assess the performance of the method and illustrate its application with a real data example. In the third project, we discuss the asymptotic distributions of the eigenvalues and eigenvectors of the MLE of the scale matrix in a multivariate skew-normal distribution. We propose a likelihood-ratio statistic for testing whether the skewness vector is proportional to one of the eigenvectors of the scale matrix. Under the alternative, the likelihood is maximized numerically with two different parametrizations of the scale matrix: the modified Cholesky decomposition (MCD) and Givens angles. We conduct a simulation study and show that the statistic obtained using the Givens-angle parametrization performs well and is more reliable than that obtained using the MCD.
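To convey the flavour of the first project, here is a heavily simplified sketch of EM-style reweighting for heavy-tailed (multivariate t) errors combined with l1 shrinkage on both the coefficient matrix and the residual precision matrix. It is a stand-in built from per-response weighted Lasso fits and a graphical lasso, not the MRCE/MRCEI/MRCEII updates or tuning-parameter rules from the dissertation; the dimensions, penalty levels, and degrees of freedom are assumed for illustration.

```python
# Simplified EM sketch: t-errors handled via the normal scale-mixture weights,
# sparse B via weighted Lasso per response, sparse precision via graphical lasso.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
n, p, q, nu = 200, 10, 4, 5                      # samples, predictors, responses, t d.o.f.
X = rng.standard_normal((n, p))
B_true = np.zeros((p, q)); B_true[:3, :] = 1.0   # sparse coefficient matrix
E = rng.standard_normal((n, q)) / np.sqrt(rng.chisquare(nu, n)[:, None] / nu)  # t errors
Y = X @ B_true + E

B = np.zeros((p, q))
w = np.ones(n)                                   # EM weights from the scale-mixture form
for it in range(10):
    # M-step (coefficients): weighted Lasso, one response column at a time
    sw = np.sqrt(w)[:, None]
    for j in range(q):
        B[:, j] = Lasso(alpha=0.05, fit_intercept=False).fit(X * sw, Y[:, j] * sw[:, 0]).coef_
    # M-step (precision): graphical lasso on the weighted residuals
    R = (Y - X @ B) * sw
    Omega = GraphicalLasso(alpha=0.05).fit(R).precision_
    # E-step: update the t-mixture weights from Mahalanobis distances
    d2 = np.einsum("ij,jk,ik->i", Y - X @ B, Omega, Y - X @ B)
    w = (nu + q) / (nu + d2)

print("nonzero coefficient rows recovered:", np.where(np.abs(B).sum(axis=1) > 1e-6)[0])
```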
104

Sparse Learning Package with Stability Selection and Application to Alzheimer's Disease

January 2011 (has links)
abstract: Sparse learning is a machine learning technique for feature selection and dimensionality reduction that finds a sparse set of the most relevant features. In any machine learning problem there is a considerable amount of irrelevant information, and separating the relevant from the irrelevant has been a topic of focus. In supervised learning such as regression, the data consist of many features, and only a subset of them may be responsible for the result. The features may also carry structural requirements, which introduces additional complexity into feature selection. The sparse learning package provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features, which introduce additional requirements, are also supported by the package: features may be grouped, the groups may be hierarchical or overlapping, and the most relevant groups may need to be selected. Despite yielding sparse solutions, these solutions are not guaranteed to be robust. For the selection to be robust, certain techniques provide theoretical justification for why particular features are selected. Stability selection is one such method: it allows an existing sparse learning method to be used to select a stable set of features for a given training sample. This is done by assigning selection probabilities to the features: the training data are sub-sampled, a specific sparse learning technique is used to learn the relevant features, the process is repeated a large number of times, and the probability is the fraction of runs in which a feature is selected. Cross-validation, which evaluates a range of parameter values, is further used to choose the parameter value that gives the maximum accuracy score. With such a combination of algorithms, good convergence guarantees, stable feature selection properties, and support for various structural dependencies among features, the sparse learning package is a powerful tool for machine learning research. Its modular structure, C implementation, and ATLAS integration for fast linear-algebra subroutines make it one of the best tools for large sparse settings. The varied collection of algorithms, support for group sparsity, and batch algorithms are a few of the notable functionalities of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. Alzheimer's disease (AD) is a neurodegenerative disease that gradually leads to dementia. The SLEP package is used for feature selection to obtain the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features is required to gain valuable insights. / Dissertation/Thesis / M.S. Computer Science 2011
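As a rough illustration of the stability selection procedure described above (repeated subsampling, a sparse base learner, and per-feature selection frequencies), here is a minimal sketch. It uses scikit-learn's Lasso as the base sparse learner on simulated data rather than the SLEP package itself; the subsampling ratio, number of runs, and probability threshold are assumed values.

```python
# Stability selection sketch: count how often each feature survives an l1 fit
# across many random subsamples of the training data.
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.05, n_runs=200, subsample=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx]).coef_
        counts += np.abs(coef) > 1e-8            # 1 if the feature was selected this run
    return counts / n_runs                       # selection probability per feature

# toy example: 5 informative features out of 50
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 50))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, -0.6]) + 0.5 * rng.standard_normal(300)
probs = stability_selection(X, y)
print("stable features (prob >= 0.8):", np.where(probs >= 0.8)[0])
```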
105

A new application for the Lasso method: index tracking in the Brazilian market

Monteiro, Lucas da Silva January 2017 (has links)
Given the evidence in the literature that, in general, actively managed funds have not succeeded in beating their benchmarks, passive funds, which seek to reproduce the risk and return characteristics of a given market index, have been gaining ground as an investment alternative in investors' portfolios. The strategy of reproducing an index is called index tracking. In this context, the objective of this work is to introduce the LASSO technique as an endogenous method for selecting and optimizing assets when executing an index-tracking strategy in the Brazilian market, and to compare it with index tracking based on selecting assets by their weight in the benchmark index (optimized by cointegration). The LASSO technique, as proposed here, is a novel application to the Brazilian financial market. The comparative tests were carried out with the stocks of the Ibovespa index between 2010 and 2016. Acknowledging the limitations of the analysis period, the results suggest, among other points, that the LASSO method generates more volatile tracking errors than the traditional ad hoc method and thus a replicating portfolio that adheres less closely to the benchmark over time.
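The LASSO index-tracking idea described above can be sketched in a few lines: regress the index returns on the constituents' returns with an l1 penalty so that only a sparse subset of stocks enters the replicating portfolio. The example below uses simulated returns; the penalty level, the long-only constraint, and the renormalization step are illustrative assumptions, not the exact specification of the dissertation.

```python
# Sparse index replication via Lasso on simulated constituent returns.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
T, N = 500, 60                                   # trading days, index constituents
R = 0.02 * rng.standard_normal((T, N))           # constituent returns (simulated)
w_index = rng.dirichlet(np.ones(N))              # unknown index weights
r_index = R @ w_index                            # index returns to be tracked

# positive=True keeps the replica long-only; alpha controls how sparse it is
model = Lasso(alpha=1e-5, positive=True, fit_intercept=False).fit(R, r_index)
w = model.coef_ / model.coef_.sum()              # renormalize to a fully invested portfolio

tracking_error = np.std(r_index - R @ w)
print(f"stocks held: {np.count_nonzero(w)} / {N}, daily tracking error: {tracking_error:.5f}")
```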
106

Lasso-type penalties for covariate selection in time series

Konzen, Evandro January 2014 (has links)
This dissertation applies several forms of LASSO-type penalties to the coefficients in order to reduce the dimensionality of the parameter space in time series models and thereby improve out-of-sample forecasts. In particular, the method referred to here as WLadaLASSO assigns a different weight to each coefficient and each lag. In the Monte Carlo experiments of this study, when compared with other shrinkage methods, especially in small samples, WLadaLASSO shows superior covariate selection, parameter estimation, and forecasting. An application to Brazilian macroeconomic series also shows that this approach delivers the best forecasting performance for Brazilian GDP compared with the other approaches considered.
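A rough sketch of a lag-weighted adaptive LASSO in the spirit of the WLadaLASSO described above: each lagged regressor receives its own penalty weight, built from a pilot estimate and inflated geometrically with the lag, so distant lags are shrunk harder. The AR specification, pilot estimator, weighting constants, and penalty level below are illustrative assumptions, not the exact scheme from the dissertation.

```python
# Lag-weighted adaptive Lasso on a simulated AR(p) series.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
T, p = 300, 8                                     # sample size, maximum lag
phi_true = np.array([0.5, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
y = np.zeros(T + p)
for t in range(p, T + p):                         # simulate an AR(p) with two active lags
    y[t] = phi_true @ y[t - 1:t - p - 1:-1] + 0.5 * rng.standard_normal()
y = y[p:]

# design matrix of lags 1..p (first p rows dropped to avoid zero-padded edges)
X = np.column_stack([np.r_[np.zeros(k), y[:-k]] for k in range(1, p + 1)])[p:]
target = y[p:]

# pilot estimate (ridge), then adaptive weights that grow with the lag
pilot = Ridge(alpha=1.0, fit_intercept=False).fit(X, target).coef_
lags = np.arange(1, p + 1)
w = (1.0 / (np.abs(pilot) + 1e-6)) * (1.05 ** lags)   # heavier penalty on distant lags

# weighted lasso = ordinary lasso on rescaled columns, coefficients mapped back
fit = Lasso(alpha=0.01, fit_intercept=False).fit(X / w, target)
phi_hat = fit.coef_ / w
print("selected lags:", lags[np.abs(phi_hat) > 1e-8])
```

Rescaling each column by the inverse of its weight turns the weighted l1 penalty into an ordinary one, which is why a standard Lasso solver can be reused here.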
107

An Exploration of Statistical Modelling Methods on Simulation Data Case Study: Biomechanical Predator–Prey Simulations

January 2018 (has links)
abstract: Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology, in large part because of the "big data" demands of various kinds of "-omics" (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decreasing cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called "in-silicomics". As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate which biomechanical parameters relating to the agility of animals in predator–prey pairs best predict pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, newer modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and the instances in the leaves of classification trees provide the basis for future simulations with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques, combined with methods from industrial Design of Experiments, will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics. / Dissertation/Thesis / Masters Thesis Industrial Engineering 2018
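A small sketch of the workflow outlined above, under assumed simulated data: an l1-regularized model with a cross-validated penalty replaces exhaustive stepwise selection over many parameters and interactions, and a shallow classification tree exposes splitting rules that could seed follow-up simulation runs. The data-generating process, feature names, and model settings are illustrative assumptions, not those of the thesis.

```python
# LASSO-penalized logistic regression over main effects + interactions,
# plus a shallow decision tree whose rules suggest regions for extra runs.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
n, p = 1000, 12                                   # simulated runs, biomechanical parameters
X = rng.uniform(-1, 1, (n, p))                    # e.g. speed, turning radius, terrain density
logit = 2 * X[:, 0] - 1.5 * X[:, 1] + 2 * X[:, 0] * X[:, 2]   # a few effects + one interaction
outcome = (logit + rng.logistic(size=n) > 0).astype(int)      # 1 = prey escapes (say)

# l1-penalized logistic regression over all main effects and pairwise interactions
Xint = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False).fit_transform(X)
lasso = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, max_iter=5000).fit(Xint, outcome)
print("terms kept:", np.count_nonzero(lasso.coef_), "of", Xint.shape[1])

# shallow classification tree: splitting rules hint at where to place follow-up simulations
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, outcome)
print(export_text(tree, feature_names=[f"param_{j}" for j in range(p)]))
```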
108

An analysis of the impact of fuel fractions on a power plant's emissions

Wilhelmsson, Kasper, Kroon, Ludvig January 2018 (has links)
Tekniska verken is a regional group active in many areas. This report focuses on waste management and the resulting emissions. Tekniska verken aims to become as environmentally friendly as possible and, with the help of this report, to gain better insight into which waste types are better or worse for the environment. The report uses statistical methods to show which waste or fuel types give rise to high or low levels of hazardous emissions, and which of them have high or low energy content. The methods used are Lasso regression and cross-validation for variable selection, and multiple linear regression for interpretation and explanatory power. Cook's distance and the Durbin-Watson test were used to check for outliers and autocorrelation, respectively. One of the results generated by these methods is that the imported fuel fraction RDFBAL gives rise to high hydrogen chloride levels. During the spring a planned overhaul was carried out, i.e. a deliberate shutdown during which the power plant was cleaned and repaired; this turned out to affect emissions both positively and negatively.
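A hedged sketch of the analysis pipeline described above: cross-validated Lasso for variable selection over the fuel fractions, an ordinary multiple linear regression on the selected fractions for interpretation, and Cook's distance / Durbin-Watson diagnostics. The fuel-share data and the emission response are simulated stand-ins; the real data are Tekniska verken's operational measurements.

```python
# LassoCV selection -> OLS interpretation -> influence and autocorrelation diagnostics.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n, p = 400, 15
fuel_shares = rng.dirichlet(np.ones(p), size=n)            # daily fuel-fraction mix (simulated)
hcl = 3.0 * fuel_shares[:, 2] + 1.5 * fuel_shares[:, 7] + 0.1 * rng.standard_normal(n)

# step 1: Lasso with cross-validation selects the fractions that drive HCl emissions
lasso = LassoCV(cv=5).fit(fuel_shares, hcl)
selected = np.where(np.abs(lasso.coef_) > 1e-8)[0]
print("selected fuel fractions:", selected)

# step 2: OLS on the selected fractions for interpretable coefficients and R^2
ols = sm.OLS(hcl, sm.add_constant(fuel_shares[:, selected])).fit()
print("R-squared:", round(ols.rsquared, 3))
print("coefficients:", np.round(ols.params, 3))

# step 3: diagnostics -- influential observations and residual autocorrelation
cooks_d = ols.get_influence().cooks_distance[0]
print("largest Cook's distance:", cooks_d.max())
print("Durbin-Watson statistic:", durbin_watson(ols.resid))
```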
