51

Comparação de métodos de estimação para problemas com colinearidade e/ou alta dimensionalidade (p > n) / Comparison of estimation methods for problems with collinearity and/or high dimensionality (p > n)

Casagrande, Marcelo Henrique 29 April 2016 (has links)
This dissertation presents a comparative study of the predictive power of four regression methods suited to situations in which the data in the design matrix suffer from severe multicollinearity and/or high dimensionality, that is, where the number of covariates exceeds the number of observations. The methods considered are principal component regression, partial least squares regression, ridge regression and the LASSO. The work includes simulations in which the predictive power of each technique is evaluated across scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), highlighting the main differences between the methods and providing a guide to help users choose a methodology based on whatever prior knowledge they may have. An application to real (non-simulated) data is also presented.
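A minimal sketch (not taken from the dissertation) of the kind of simulation comparison it describes: the four methods are fitted to collinear, p > n data generated from a handful of strong effects, and their cross-validated prediction error is compared with scikit-learn. The scenario sizes, effect strengths and tuning grids are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p, n_signal = 50, 200, 10                       # a p > n scenario
latent = rng.normal(size=(n, 5))
X = latent @ rng.normal(size=(5, p)) + 0.1 * rng.normal(size=(n, p))  # collinear design
beta = np.zeros(p)
beta[:n_signal] = 1.0                              # a few strong effects
y = X @ beta + rng.normal(size=n)

models = {
    "PCR":   make_pipeline(PCA(n_components=5), LinearRegression()),
    "PLS":   PLSRegression(n_components=5),
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 25)),
    "LASSO": LassoCV(cv=5),
}
for name, model in models.items():
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```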
52

Model-based calibration of a non-invasive blood glucose monitor

Shulga, Yelena A 11 January 2006 (has links)
This project was dedicated to improving a non-invasive blood glucose monitor being developed by the VivaScan Corporation. The company had made some progress in developing the non-invasive blood glucose device and approached WPI for statistical assistance in improving its model so that it could predict glucose levels more accurately. The main goal of this project was to improve the monitor's ability to predict glucose values more precisely, and it was achieved by finding and implementing the best regression model. The methods included ordinary least squares regression, partial least squares regression, robust regression, weighted least squares regression, local regression, and ridge regression. VivaScan calibration data for seven patients were analyzed in this project. For each of these patients, individual regression models were built and compared on two criteria that evaluate a model's prediction ability. It was determined that partial least squares and ridge regression were the two best of the methods considered, and using them gave better glucose prediction. The additional problem of reducing the data to minimize data collection time was also considered in this work.
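As a rough illustration of the per-patient calibration comparison described above, the sketch below fits partial least squares and ridge models to a hypothetical spectral matrix X and glucose vector for one patient and reports two prediction criteria per model; the component count and penalty grid are placeholders, not values from the project.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error, r2_score

def evaluate_patient(X, glucose, n_components=8):
    """Return {model name: (RMSEP, R^2)} for one patient's calibration data."""
    models = {"PLS": PLSRegression(n_components=n_components),
              "Ridge": RidgeCV(alphas=np.logspace(-4, 4, 30))}
    results = {}
    for name, model in models.items():
        pred = cross_val_predict(model, X, glucose, cv=5).ravel()
        results[name] = (mean_squared_error(glucose, pred) ** 0.5,  # RMSEP
                         r2_score(glucose, pred))
    return results

# Usage idea: loop evaluate_patient over the seven patients' (X, glucose) pairs.
```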
53

Testing new genetic and genomic approaches for trait mapping and prediction in wheat (Triticum aestivum) and rice (Oryza spp)

Ladejobi, Olufunmilayo Olubukola January 2018 (has links)
Advances in molecular marker technologies have led to the development of high-throughput genotyping techniques such as genotyping by sequencing (GBS), driving the application of genomics in crop research and breeding. They have also supported the use of novel mapping approaches, including Multi-parent Advanced Generation Inter-Cross (MAGIC) populations, which have increased precision in identifying markers to inform plant breeding practices. In the first part of this thesis, a high-density physical map derived from GBS was used to identify QTLs controlling key agronomic traits of wheat in a genome-wide association study (GWAS) and to demonstrate the practicability of genomic selection for predicting the trait values. The results from GBS were compared with a previous study conducted on the same association mapping panel using a less dense physical map derived from diversity arrays technology (DArT) markers. GBS detected more QTLs than the DArT markers, although some QTLs were detected by the DArT markers alone. Prediction accuracies from the two marker platforms were mostly similar and largely dependent on trait genetic architecture. The second part of this thesis focused on MAGIC populations, which incorporate diversity and novel allelic combinations from several generations of recombination. Pedigrees representing a wild rice MAGIC population were used to model MAGIC populations by simulation to assess the level of recombination and the creation of novel haplotypes. The wild rice species are an important reservoir of beneficial genes that have been variously introgressed into rice varieties using bi-parental population approaches. The level of recombination was found to be highly dependent on the number of crosses made and on the resulting population size; creating MAGIC populations therefore requires adequate planning to make a sufficient number of crosses that capture optimal haplotype diversity. The third part of the thesis considers models that have been proposed for genomic prediction. Ridge regression best linear unbiased prediction (RR-BLUP) is based on the assumption that all genotyped molecular markers contribute equally to the variation of a phenotype. Information from underlying candidate molecular markers is, however, of greater significance and can be used to improve prediction accuracy. Here, an existing Differentially Penalized Regression (DiPR) model, which modifies a standard RR-BLUP package to allow two or more marker sets from different platforms to be weighted independently, was used. The DiPR model performed better than single or combined marker sets for predicting most of the traits in both a MAGIC population and an association mapping panel. Overall, the work presented in this thesis shows that while these techniques hold great promise, they should be carefully evaluated before introduction into breeding programmes.
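The DiPR model described above modifies an existing RR-BLUP package (in R); the sketch below is only a hedged illustration of the underlying idea in Python/numpy: two marker sets receive separate ridge penalties, whereas standard RR-BLUP shrinks every marker equally. It assumes the phenotype vector and marker columns are centred, so no intercept is included.

```python
import numpy as np

def diff_penalised_ridge(X1, X2, y, lam1, lam2):
    """Solve min ||y - X1 b1 - X2 b2||^2 + lam1*||b1||^2 + lam2*||b2||^2.

    X1, X2: centred marker matrices from two platforms; y: centred phenotypes.
    """
    X = np.hstack([X1, X2])
    penalties = np.concatenate([np.full(X1.shape[1], lam1),
                                np.full(X2.shape[1], lam2)])
    beta = np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)
    return beta[:X1.shape[1]], beta[X1.shape[1]:]

# RR-BLUP-style equal shrinkage is recovered when lam1 == lam2; giving a
# candidate-marker set a smaller penalty weights its markers more heavily.
```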
54

Data-driven estimation for Aalen's additive risk model

Boruvka, Audrey 02 August 2007 (has links)
The proportional hazards model developed by Cox (1972) is by far the most widely used method for regression analysis of censored survival data. Application of the Cox model to more general event history data has become possible through extensions using counting process theory (e.g., Andersen and Borgan (1985), Therneau and Grambsch (2000)). With its development based entirely on counting processes, Aalen’s additive risk model offers a flexible, nonparametric alternative. Ordinary least squares, weighted least squares and ridge regression have been proposed in the literature as estimation schemes for Aalen’s model (Aalen (1989), Huffer and McKeague (1991), Aalen et al. (2004)). This thesis develops data-driven parameter selection criteria for the weighted least squares and ridge estimators. Using simulated survival data, these new methods are evaluated against existing approaches. A survey of the literature on the additive risk model and a demonstration of its application to real data sets are also provided. / Thesis (Master, Mathematics & Statistics), Queen's University, 2007.
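A rough numpy sketch, under the simplifying assumption of right-censored data with no tied event times, of the ridge-type estimator for the increments of Aalen's cumulative regression functions, dB_hat(t_k) = (X(t_k)'X(t_k) + lam*I)^(-1) X(t_k)' dN(t_k), where row i of X(t) is the at-risk indicator Y_i(t) times (1, Z_i) and lam = 0 recovers the ordinary least squares estimator. This is not the thesis's code, and its data-driven criteria for choosing the weights or the ridge parameter are not reproduced here.

```python
import numpy as np

def aalen_ridge(times, events, Z, lam):
    """Ridge estimates of Aalen's cumulative regression functions B(t).

    times, events: follow-up times and event indicators (0/1);
    Z: (n, q) covariate matrix; lam: ridge penalty (lam = 0 gives OLS).
    Assumes no tied event times.
    """
    n, q = Z.shape
    design = np.hstack([np.ones((n, 1)), Z])            # row i holds (1, Z_i)
    event_times = np.sort(times[events == 1])
    B = np.zeros((len(event_times), q + 1))
    cum = np.zeros(q + 1)
    for k, t in enumerate(event_times):
        at_risk = (times >= t).astype(float)            # Y_i(t), at-risk indicator
        X = design * at_risk[:, None]
        dN = ((times == t) & (events == 1)).astype(float)  # counting-process jump
        cum = cum + np.linalg.solve(X.T @ X + lam * np.eye(q + 1), X.T @ dN)
        B[k] = cum
    return event_times, B
```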
55

Comparação de métodos de estimação para problemas com colinearidade e/ou alta dimensionalidade (p > n) / Comparison of estimation methods for problems with collinearity and/or high dimensionality (p > n)

Casagrande, Marcelo Henrique 29 April 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / This dissertation presents a comparative study of the predictive power of four regression methods suited to situations in which the data in the design matrix suffer from severe multicollinearity and/or high dimensionality, that is, where the number of covariates exceeds the number of observations. The methods considered are principal component regression, partial least squares regression, ridge regression and the LASSO. The work includes simulations in which the predictive power of each technique is evaluated across scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), highlighting the main differences between the methods and providing a guide to help users choose a methodology based on whatever prior knowledge they may have. An application to real (non-simulated) data is also presented.
56

Comparison of different models for forecasting of Czech electricity market

Kunc, Vladimír January 2017 (has links)
There is a demand for decision-support tools that can model electricity markets and forecast hourly electricity prices. Many different approaches, such as artificial neural networks or support vector regression, are used in the literature. This thesis compares several different estimators under one setting using available data from the Czech electricity market. The resulting comparison of over 5000 different estimators led to a selection of several best-performing models. The role of historical weather data (temperature, dew point and humidity) is also assessed within the comparison; it was found that while the inclusion of weather data might lead to overfitting, it is beneficial under the right circumstances. The best-performing approach was Lasso regression estimated using modified LARS.
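Purely as an illustration of the kind of estimator that came out on top, the sketch below fits a LARS-based Lasso (scikit-learn's LassoLarsCV) to hourly prices with lagged-price and weather features. The file name, column names and lag choices are assumptions, not details from the thesis.

```python
import pandas as pd
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_absolute_error

# Hypothetical hourly data set with price and weather columns.
df = pd.read_csv("cz_hourly_prices.csv", parse_dates=["timestamp"])
df["price_lag24"] = df["price"].shift(24)     # same hour, previous day
df["price_lag168"] = df["price"].shift(168)   # same hour, previous week
df = df.dropna()

features = ["price_lag24", "price_lag168", "temperature", "dew_point", "humidity"]
train, test = df.iloc[:-24 * 30], df.iloc[-24 * 30:]   # hold out the last 30 days

model = LassoLarsCV(cv=5).fit(train[features], train["price"])
pred = model.predict(test[features])
print("hold-out MAE:", mean_absolute_error(test["price"], pred))
```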
57

Extending covariance structure analysis for multivariate and functional data

Sheppard, Therese January 2010 (has links)
For multivariate data, when testing homogeneity of covariance matrices arising from two or more groups, Bartlett's (1937) modified likelihood ratio test statistic is appropriate under the null hypothesis of equal covariance matrices, but the null distribution of the test statistic rests on the restrictive assumption of normality. Zhang and Boos (1992) provide a pooled bootstrap approach for when the data cannot be assumed to be normally distributed. We give three alternative bootstrap techniques for testing homogeneity of covariance matrices when it is both inappropriate to pool the data into one single population, as in the pooled bootstrap procedure, and the data are not normally distributed. We further show that our alternative bootstrap methodology can be extended to testing Flury's (1988) hierarchy of covariance structure models. Where deviations from normality exist, we show by simulation that the normal-theory log-likelihood ratio test statistic is less viable than our bootstrap methodology. For functional data, Ramsay and Silverman (2005) and Lee et al. (2002) together provide four computational techniques for functional principal component analysis (PCA) followed by covariance structure estimation. When individual profiles are smoothed using least squares cubic B-splines or regression splines, we find that the ensuing covariance matrix estimate suffers from loss of dimensionality. We show that ridge regression can be used to resolve this problem, but only for the discretisation and numerical quadrature approaches to estimation, and that the choice of a suitable ridge parameter is not arbitrary. We further show the unsuitability of regression splines when deciding on the optimal degree of smoothing to apply to individual profiles. To gain insight into smoothing parameter choice for functional data, we compare kernel and spline approaches to smoothing individual profiles in a nonparametric regression context. Our simulation results justify a kernel approach using a new criterion based on predicted squared error. We also show by simulation that, when taking account of correlation, a kernel approach using a generalized cross-validatory type criterion performs well. These data-based methods for selecting the smoothing parameter are illustrated prior to a functional PCA on a real data set.
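For orientation, the sketch below computes Bartlett's modified likelihood-ratio (Box's M) statistic and a pooled bootstrap p-value in the spirit of Zhang and Boos (1992), assuming each group has more observations than variables; it does not reproduce the thesis's three alternative bootstrap techniques, and details may differ from the original paper.

```python
import numpy as np

def bartlett_stat(groups):
    """Box's M / modified LR statistic for equality of covariance matrices."""
    ns = np.array([len(g) for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (ns.sum() - len(groups))
    return ((ns.sum() - len(groups)) * np.log(np.linalg.det(pooled))
            - sum((n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs)))

def pooled_bootstrap_pvalue(groups, n_boot=999, seed=0):
    """Pooled bootstrap: centre each group, pool, resample group-sized samples."""
    rng = np.random.default_rng(seed)
    observed = bartlett_stat(groups)
    centred = np.vstack([g - g.mean(axis=0) for g in groups])  # impose the null
    ns = [len(g) for g in groups]
    count = 0
    for _ in range(n_boot):
        boot = [centred[rng.integers(0, len(centred), size=n)] for n in ns]
        count += bartlett_stat(boot) >= observed
    return (count + 1) / (n_boot + 1)
```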
58

Prediction with Penalized Logistic Regression: An Application on COVID-19 Patient Gender based on Case Series Data

Schwarz, Patrick January 2021 (has links)
The aim of the study was to evaluate different types of logistic regression to find the optimal model for predicting the gender of hospitalized COVID-19 patients. The models were based on COVID-19 case series data from Pakistan using a set of 18 explanatory variables, of which patient age and BMI were numerical and the rest were categorical variables expressing symptoms and previous health issues. Compared were a logistic regression using all variables, a logistic regression using stepwise variable selection with 4 explanatory variables, a logistic ridge regression model, a logistic Lasso regression model and a logistic Elastic Net regression model. Based on several metrics assessing the goodness of fit of the models and on predictive power evaluated by the area under the ROC curve, the Elastic Net model that effectively used only the Lasso penalty had the best result and was able to predict 82.5% of the test cases correctly.
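A hedged sketch of the kind of penalised logistic fits compared above, using scikit-learn; the synthetic data below merely mimic the described variable types (age, BMI and 16 binary indicators) and are not the Pakistani case series, so the printed metrics are meaningless placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(1)
n = 300
# Placeholder data shaped like the description: age, BMI, 16 binary indicators.
X = np.column_stack([rng.normal(45, 15, n), rng.normal(26, 5, n),
                     rng.integers(0, 2, (n, 16))])
y = rng.integers(0, 2, n)                 # placeholder gender labels (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

fits = {
    "ridge":       LogisticRegression(penalty="l2", C=1.0, max_iter=5000),
    "lasso":       LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000),
    "elastic net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=1.0, max_iter=5000),
}
for name, clf in fits.items():
    model = make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name:11s} AUC={roc_auc_score(y_test, proba):.3f} "
          f"accuracy={accuracy_score(y_test, model.predict(X_test)):.3f}")
```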
59

Machine Learning of Crystal Formation Energies with Novel Structural Descriptors / Maskininlärning av kristallers formationsenergier

Bratu, Claudia January 2017 (has links)
To assist technology advancements, it is important to continue the search for new materials. The stability of a crystal structure is closely connected to its formation energy, so by calculating the formation energies of theoretical crystal structures it is possible to find new stable materials. However, the number of possible structures is so large that traditional methods relying on quantum mechanics, such as Density Functional Theory (DFT), require too much computational time to be viable in such a project. An alternative to such calculations is machine learning, an umbrella term for algorithms that can use information gained from one set of data to predict properties of new, similar data. Feature vector representations (descriptors) are used to present the data to the machine in an appropriate manner. Thus far, no combination of machine learning method and feature vector representation has been established as general and accurate enough to be of practical use for accelerating the phase diagram calculations necessary for predicting material stability. It is important that the method predicts all types of structures equally well, regardless of stability, composition, or geometrical structure. In this thesis, the performances of different feature vector representations were compared to each other. The machine learning method used was primarily Kernel Ridge Regression, implemented in Python. The training and validation were performed on two different datasets and subsets of these. The representation which consistently yielded the lowest cross-validated error was one based on the Voronoi tessellation of the structure, by Ward et al. [Phys. Rev. B 96, 024104 (2017)]. The runner-up was an experimental representation, SLATM, presented by Huang and von Lilienfeld [arXiv:1707.04146], which is partially based on the radial distribution function. The Voronoi representation achieved an MAE of 0.16 eV/atom at a training set size of 3534 for one of the sets, and 0.28 eV/atom at a training set size of 10086 for the other set. The effect of separating linear and non-linear energy contributions was evaluated using the sinusoidal and Coulomb representations; separating them improved the error for small training set sizes, but the effect diminishes as the training set size increases. The results of this thesis indicate that further work is still required before machine learning can be used effectively in the search for new materials.
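Not the thesis code: a minimal Kernel Ridge Regression setup of the kind described, assuming precomputed descriptor vectors (e.g. Voronoi- or SLATM-based) and DFT formation energies are available in the hypothetical files named below; the kernel choice and hyperparameter grids are illustrative.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, cross_val_score

X = np.load("descriptors.npy")          # assumed: one descriptor vector per structure
y = np.load("formation_energies.npy")   # assumed: DFT formation energies in eV/atom

# Tune the regularisation and kernel width inside each outer CV fold.
search = GridSearchCV(
    KernelRidge(kernel="laplacian"),
    {"alpha": np.logspace(-8, 0, 9), "gamma": np.logspace(-6, 0, 7)},
    scoring="neg_mean_absolute_error", cv=5,
)
mae = -cross_val_score(search, X, y, cv=5, scoring="neg_mean_absolute_error")
print(f"cross-validated MAE: {mae.mean():.3f} +/- {mae.std():.3f} eV/atom")
```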
60

Predicting deliveries from suppliers : A comparison of predictive models

Sawert, Marcus January 2020 (has links)
In the highly competitive environment that companies find themselves in today, a well-functioning supply chain is key. For manufacturing companies, a good supply chain depends on functioning production planning, which tries to fulfill demand while considering the resources available. This is complicated by the uncertainties that exist in demand, in manufacturing and in supply. Several methods and models have been created to deal with production planning under uncertainty, but they often overlook the complexity of supply uncertainty by treating it as purely stochastic. To improve these models, a prediction based on earlier data about the supplier or item could be used to estimate when a delivery is likely to arrive. This study compared different predictive models to see which one is best suited for this purpose. Historical data on earlier deliveries were gathered from a large international manufacturing company and preprocessed before being used in the models. The target value the models were to predict was the actual delivery time from the supplier. The data were then tested with the following four regression models in Python: linear regression, ridge regression, Lasso and Elastic Net. The results were calculated by cross-validation and presented as the mean absolute error together with its standard deviation. The results showed that the Elastic Net was the overall best-performing model and that linear regression performed worst.
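A sketch of the comparison described above (not the author's code): the four scikit-learn regressors evaluated by cross-validated mean absolute error. The placeholder data and penalty values stand in for the company's preprocessed delivery features and actual delivery times.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                          # placeholder delivery features
y = 5 + X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)    # placeholder delivery times (days)

models = {
    "linear":      LinearRegression(),
    "ridge":       Ridge(alpha=1.0),
    "lasso":       Lasso(alpha=0.1),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, reg in models.items():
    mae = -cross_val_score(make_pipeline(StandardScaler(), reg), X, y,
                           cv=10, scoring="neg_mean_absolute_error")
    print(f"{name:11s} MAE = {mae.mean():.2f} +/- {mae.std():.2f} days")
```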
