41

Data-driven estimation for Aalen's additive risk model

Boruvka, Audrey 02 August 2007 (has links)
The proportional hazards model developed by Cox (1972) is by far the most widely used method for regression analysis of censored survival data. Application of the Cox model to more general event history data has become possible through extensions using counting process theory (e.g., Andersen and Borgan (1985), Therneau and Grambsch (2000)). With its development based entirely on counting processes, Aalen’s additive risk model offers a flexible, nonparametric alternative. Ordinary least squares, weighted least squares and ridge regression have been proposed in the literature as estimation schemes for Aalen’s model (Aalen (1989), Huffer and McKeague (1991), Aalen et al. (2004)). This thesis develops data-driven parameter selection criteria for the weighted least squares and ridge estimators. Using simulated survival data, these new methods are evaluated against existing approaches. A survey of the literature on the additive risk model and a demonstration of its application to real data sets are also provided. / Thesis (Master, Mathematics & Statistics) -- Queen's University, 2007-07-18 22:13:13.243
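The ordinary least squares, weighted least squares and ridge estimators for Aalen's model all act on the increments of the cumulative regression functions at the observed event times. The following numpy sketch illustrates the ridge variant under simplifying assumptions (time-fixed covariates, no tied event times); the function and the toy data are invented for illustration and are not the estimation code from the thesis.

```python
import numpy as np

def aalen_ridge_path(times, events, X, lam=0.0):
    """Estimate the cumulative regression functions B(t) of Aalen's additive
    risk model.  lam = 0 gives the ordinary least squares estimator; lam > 0
    adds a ridge penalty to each increment.  X should include a column of
    ones for the baseline hazard."""
    order = np.argsort(times)
    times, events, X = times[order], events[order], X[order]
    n, p = X.shape
    B = np.zeros(p)
    path = []                              # (event time, cumulative B estimate)
    for k in range(n):
        if not events[k]:
            continue                       # censored subjects contribute no jump
        Xr = X[k:]                         # covariates of subjects still at risk
        dN = np.zeros(n - k)
        dN[0] = 1.0                        # subject k experiences the event now
        G = Xr.T @ Xr + lam * np.eye(p)    # ridge-regularized normal equations
        B = B + np.linalg.solve(G, Xr.T @ dN)
        path.append((times[k], B.copy()))
    return path

# Toy usage on simulated survival data (purely illustrative):
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])
times = rng.exponential(1.0 / (0.5 + 0.5 * X[:, 1]))
events = rng.uniform(size=n) < 0.8
path = aalen_ridge_path(times, events, X, lam=0.5)
print("final cumulative estimates:", path[-1][1])
```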
42

Comparação de métodos de estimação para problemas com colinearidade e/ou alta dimensionalidade (p > n) / Comparison of estimation methods for problems with collinearity and/or high dimensionality (p > n)

Casagrande, Marcelo Henrique 29 April 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / This thesis presents a comparative study of the predictive power of four regression methods suited to situations in which the data in the design matrix exhibit severe multicollinearity and/or high dimensionality, where the number of covariates is greater than the number of observations. The methods considered are principal component regression, partial least squares regression, ridge regression and the LASSO. The work includes simulations in which the predictive power of each technique is evaluated across scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), highlighting the main differences between the methods and providing a guide for choosing a method based on whatever prior knowledge the user may have. An application to real (non-simulated) data is also presented.
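All four methods compared in the thesis are available in scikit-learn. A minimal sketch of such a comparison on simulated p > n data (the component counts, penalty values and RMSE scoring are assumptions for illustration, not the thesis configuration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Simulated high-dimensional data: more covariates than observations.
X, y = make_regression(n_samples=50, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

models = {
    "PCR (10 components)": make_pipeline(PCA(n_components=10), LinearRegression()),
    "PLS (10 components)": PLSRegression(n_components=10),
    "Ridge": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=0.1),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE = {-scores.mean():.2f}")
```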
43

Comparison of different models for forecasting of Czech electricity market

Kunc, Vladimír January 2017 (has links)
There is a demand for decision support tools that can model electricity markets and forecast the hourly electricity price. Many different approaches, such as artificial neural networks or support vector regression, are used in the literature. This thesis compares several different estimators under one setting using available data from the Czech electricity market. The resulting comparison of over 5000 different estimators led to a selection of several best-performing models. The role of historical weather data (temperature, dew point and humidity) is also assessed within the comparison; it was found that while the inclusion of weather data might lead to overfitting, it is beneficial under the right circumstances. The best-performing approach was Lasso regression estimated using a modified LARS algorithm.
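The winning approach, Lasso with its regularization path computed by LARS, corresponds to scikit-learn's LassoLars family. An illustrative sketch on synthetic hourly prices (the lag features and series are invented placeholders, not the thesis feature set):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoLarsCV

# Hypothetical hourly price series; in the thesis the data come from the
# Czech electricity market and include weather covariates.
rng = np.random.default_rng(1)
price = pd.Series(50 + 10 * np.sin(np.arange(2000) * 2 * np.pi / 24)
                  + rng.normal(0, 2, 2000))

# Simple autoregressive features: prices 1, 24 and 168 hours earlier.
features = pd.DataFrame({
    "lag_1": price.shift(1),
    "lag_24": price.shift(24),
    "lag_168": price.shift(168),
}).dropna()
target = price.loc[features.index]

# Lasso with the path computed by LARS and the penalty chosen by cross-validation.
model = LassoLarsCV(cv=5).fit(features.values, target.values)
print("selected alpha:", model.alpha_)
print("coefficients:", dict(zip(features.columns, model.coef_)))
```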
44

Extending covariance structure analysis for multivariate and functional data

Sheppard, Therese January 2010 (has links)
For multivariate data, when testing homogeneity of covariance matrices arising from two or more groups, Bartlett's (1937) modified likelihood ratio test statistic is appropriate to use under the null hypothesis of equal covariance matrices, where the null distribution of the test statistic is based on the restrictive assumption of normality. Zhang and Boos (1992) provide a pooled bootstrap approach when the data cannot be assumed to be normally distributed. We give three alternative bootstrap techniques for testing homogeneity of covariance matrices when it is both inappropriate to pool the data into one single population, as in the pooled bootstrap procedure, and when the data are not normally distributed. We further show that our alternative bootstrap methodology can be extended to testing Flury's (1988) hierarchy of covariance structure models. Where deviations from normality exist, we show, by simulation, that the normal theory log-likelihood ratio test statistic is less viable compared with our bootstrap methodology.

For functional data, Ramsay and Silverman (2005) and Lee et al. (2002) together provide four computational techniques for functional principal component analysis (PCA) followed by covariance structure estimation. When the smoothing method for individual profiles is based on least squares cubic B-splines or regression splines, we find that the ensuing covariance matrix estimate suffers from loss of dimensionality. We show that ridge regression can be used to resolve this problem, but only for the discretisation and numerical quadrature approaches to estimation, and that the choice of a suitable ridge parameter is not arbitrary. We further show the unsuitability of regression splines when deciding on the optimal degree of smoothing to apply to individual profiles. To gain insight into smoothing parameter choice for functional data, we compare kernel and spline approaches to smoothing individual profiles in a nonparametric regression context. Our simulation results justify a kernel approach using a new criterion based on predicted squared error. We also show by simulation that, when taking account of correlation, a kernel approach using a generalized cross-validatory type criterion performs well. These data-based methods for selecting the smoothing parameter are illustrated prior to a functional PCA on a real data set.
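For the multivariate part, the test statistic and a bootstrap calibration can be illustrated in a few lines. This is a generic pooled-bootstrap sketch in the spirit of Zhang and Boos (1992), kept deliberately simple; the three separate-group bootstrap procedures developed in the thesis are not reproduced here, and the function names and data are illustrative.

```python
import numpy as np

def bartlett_stat(groups):
    """Bartlett's modified likelihood-ratio statistic for equality of
    covariance matrices across groups (equivalent to Box's M up to scaling)."""
    ni = np.array([len(g) for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]
    pooled = sum((n - 1) * c for n, c in zip(ni, covs)) / (ni.sum() - len(groups))
    logdet_pooled = np.linalg.slogdet(pooled)[1]
    logdets = [np.linalg.slogdet(c)[1] for c in covs]
    return (ni.sum() - len(groups)) * logdet_pooled - sum(
        (n - 1) * d for n, d in zip(ni, logdets))

def pooled_bootstrap_pvalue(groups, n_boot=999, seed=None):
    """Pooled-bootstrap p-value: every bootstrap group is resampled from the
    centred, pooled data, so the null hypothesis of a common covariance
    matrix holds by construction."""
    rng = np.random.default_rng(seed)
    observed = bartlett_stat(groups)
    pooled = np.vstack([g - g.mean(axis=0) for g in groups])
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_boot):
        boot = [pooled[rng.integers(0, len(pooled), size=n)] for n in sizes]
        exceed += bartlett_stat(boot) >= observed
    return (exceed + 1) / (n_boot + 1)

# Toy usage: two groups with genuinely different covariance matrices.
rng = np.random.default_rng(0)
g1 = rng.multivariate_normal(np.zeros(3), np.eye(3), size=40)
g2 = rng.multivariate_normal(np.zeros(3), 2.0 * np.eye(3), size=50)
print("bootstrap p-value:", pooled_bootstrap_pvalue([g1, g2], n_boot=499, seed=1))
```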
45

Prediction with Penalized Logistic Regression : An Application on COVID-19 Patient Gender based on Case Series Data

Schwarz, Patrick January 2021 (has links)
The aim of the study was to evaluate different types of logistic regression to find the optimal model for predicting the gender of hospitalized COVID-19 patients. The models were based on COVID-19 case-series data from Pakistan using a set of 18 explanatory variables, of which patient age and BMI were numerical and the rest were categorical variables expressing symptoms and previous health issues.  Compared were a logistic regression using all variables, a logistic regression using stepwise selection of 4 explanatory variables, a logistic Ridge regression model, a logistic Lasso regression model and a logistic Elastic Net regression model.  Based on several metrics assessing the goodness of fit of the models and on the predictive power evaluated by the area under the ROC curve, the Elastic Net model that used only the Lasso penalty had the best result and was able to predict 82.5% of the test cases correctly.
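In scikit-learn terms, the penalized logistic models compared above map onto LogisticRegression with different penalty settings; setting l1_ratio = 1 in the elastic net recovers a pure Lasso penalty. A minimal sketch on synthetic data (the hyperparameter values and data are illustrative assumptions, not the thesis configuration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the case-series data: 18 explanatory variables.
X, y = make_classification(n_samples=500, n_features=18, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

models = {
    "ridge (L2)": LogisticRegression(penalty="l2", C=1.0, max_iter=5000),
    "lasso (L1)": LogisticRegression(penalty="l1", solver="saga",
                                     C=1.0, max_iter=5000),
    "elastic net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=1.0, max_iter=5000),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```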
46

Machine Learning of Crystal Formation Energies with Novel Structural Descriptors / Maskininlärning av kristallers formationsenergier

Bratu, Claudia January 2017 (has links)
To assist technology advancements, it is important to continue the search for new materials. The stability of a crystal structure is closely connected to its formation energy; by calculating the formation energies of theoretical crystal structures it is possible to find new stable materials. However, the number of possible structures is so large that traditional methods relying on quantum mechanics, such as Density Functional Theory (DFT), require too much computational time to be viable for such a project. An alternative to such calculations is machine learning, an umbrella term for algorithms that use information gained from one set of data to predict properties of new, similar data. Feature vector representations (descriptors) are used to present the data to the machine in an appropriate manner. Thus far, no combination of machine learning method and feature vector representation has been established as general and accurate enough to be of practical use for accelerating the phase diagram calculations necessary for predicting material stability. It is important that the method predicts all types of structures equally well, regardless of stability, composition, or geometrical structure. In this thesis, the performance of different feature vector representations was compared. The machine learning method used was primarily Kernel Ridge Regression, implemented in Python. Training and validation were performed on two different datasets and subsets of these. The representation which consistently yielded the lowest cross-validated error was one using the Voronoi tessellation of the structure, by Ward et al. [Phys. Rev. B 96, 024104 (2017)], followed by an experimental representation called SLATM presented by Huang and von Lilienfeld [arXiv:1707.04146], which is partially based on the radial distribution function. The Voronoi representation achieved an MAE of 0.16 eV/atom with a training set of 3534 structures on one dataset, and 0.28 eV/atom with a training set of 10086 structures on the other. The effect of separating linear and non-linear energy contributions was evaluated using the sinusoidal and Coulomb representations; separating these improved the error for small training set sizes, but the effect diminishes as the training set size increases. The results of this thesis indicate that further work is still required for machine learning to be used effectively in the search for new materials.
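Kernel Ridge Regression itself is available in scikit-learn; computing the structural descriptors (Voronoi, SLATM, Coulomb, sinusoidal) requires separate code and is not shown. A minimal sketch with random placeholder descriptors (the Laplacian kernel and the hyperparameter grid are assumptions for illustration, not the thesis settings):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

# Placeholder data: rows would be descriptor vectors of crystal structures,
# y the DFT formation energies in eV/atom.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=500)

# Regularization strength and kernel width tuned by cross-validated MAE.
grid = GridSearchCV(
    KernelRidge(kernel="laplacian"),
    param_grid={"alpha": np.logspace(-6, 0, 7), "gamma": np.logspace(-4, 0, 5)},
    scoring="neg_mean_absolute_error",
    cv=5,
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("cross-validated MAE:", -grid.best_score_)
```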
47

Predicting deliveries from suppliers : A comparison of predictive models

Sawert, Marcus January 2020 (has links)
In the highly competitive environment that companies find themselves in today, it is key to have a well-functioning supply chain. For manufacturing companies, a good supply chain depends on a functioning production planning, which tries to fulfil demand while considering the resources available. This is complicated by the uncertainties that exist, such as the uncertainty in demand, in manufacturing and in supply. Several methods and models have been created to deal with production planning under uncertainty, but they often overlook the complexity of supply uncertainty by treating it as purely stochastic. To improve these models, a prediction based on earlier data regarding the supplier or item could be used to estimate when a delivery is likely to arrive. This study compared different predictive models to see which is best suited for this purpose. Historic data regarding earlier deliveries was gathered from a large international manufacturing company and preprocessed before being used in the models. The target value the models were to predict was the actual delivery time from the supplier. The data was then tested with the following four regression models in Python: linear regression, ridge regression, Lasso and Elastic net. The results were calculated by cross-validation and presented as the mean absolute error together with its standard deviation. The results showed that the Elastic net was the overall best performing model, and that linear regression performed worst.
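The core comparison, cross-validated mean absolute error for the four regression models, can be set up compactly in scikit-learn. An illustrative sketch in which the preprocessed delivery data are replaced by a synthetic placeholder; the penalty values are assumptions, not the study's tuned settings:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

# Placeholder for the preprocessed delivery data; the target would be the
# actual delivery time from the supplier.
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

for name, model in models.items():
    scores = -cross_val_score(model, X, y, cv=10,
                              scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {scores.mean():.2f} (std {scores.std():.2f})")
```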
48

Forecasting hourly electricity consumption for sets of households using machine learning algorithms

Linton, Thomas January 2015 (has links)
To address inefficiency, waste, and the negative consequences of electricity generation, companies and government entities are looking to behavioural change among residential consumers. To drive behavioural change, consumers need better feedback about their electricity consumption. A monthly or quarterly bill provides the consumer with almost no useful information about the relationship between their behaviours and their electricity consumption. Smart meters are now widely deployed in developed countries and are capable of providing electricity consumption readings at an hourly resolution, but this data is mostly used as a basis for billing and not as a tool to assist the consumer in reducing their consumption. One component required to deliver innovative feedback mechanisms is the capability to forecast hourly electricity consumption at the household scale. The work presented in this thesis is an evaluation of the effectiveness of a selection of kernel-based machine learning methods at forecasting the hourly aggregate electricity consumption for different-sized sets of households. The work demonstrates that k-Nearest Neighbour Regression and Gaussian Process Regression are the most accurate methods within the constraints of the problem considered. In addition to accuracy, the advantages and disadvantages of each machine learning method are evaluated, and a simple comparison of each algorithm's computational performance is made.
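The two methods found most accurate, k-Nearest Neighbour Regression and Gaussian Process Regression, are both available in scikit-learn. A small illustrative sketch on a synthetic hourly load curve (the kernel choice, lag features and hold-out scheme are assumptions, not the thesis setup):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import mean_absolute_error

# Synthetic hourly consumption for a set of households: daily cycle + noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 30)
load = 5 + 2 * np.sin(hours * 2 * np.pi / 24) + rng.normal(0, 0.3, hours.size)

# Features: consumption 1 hour and 24 hours earlier; target: current hour.
X = np.column_stack([load[23:-1], load[:-24]])
y = load[24:]
split = len(y) - 24                      # hold out the final day
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

knn = KNeighborsRegressor(n_neighbors=10).fit(X_train, y_train)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                              normalize_y=True).fit(X_train, y_train)

for name, model in [("kNN", knn), ("GP", gp)]:
    print(name, "MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```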
49

House Price Prediction

Aghi, Nawar, Abdulal, Ahmad January 2020 (has links)
This study proposes a performance comparison between machine learning regression algorithms and an Artificial Neural Network (ANN). The regression algorithms used in this study are multiple linear regression, the Least Absolute Shrinkage and Selection Operator (Lasso), Ridge regression and Random Forest. Moreover, the study analyses the correlation between variables to determine the most important factors that affect house prices in Malmö, Sweden. Two datasets are used, called public and local; they contain house prices from Ames, Iowa, United States and from Malmö, Sweden, respectively. The accuracy of the prediction is evaluated by checking the R-squared and root mean square error scores of the trained model. The test is performed after applying the required pre-processing methods and splitting the data into two parts, one used for training and the other for the test phase. A binning strategy that improved the accuracy of the models is also presented. The results show that Lasso gives the best score among the algorithms when the public dataset is used for training. The correlation graphs show the variables' level of dependency. In addition, the empirical results show that crime, deposit, lending, and repo rates influence house prices negatively, while inflation, year, and unemployment rate influence house prices positively.
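A compressed sketch of the modelling setup described above, including a simple binning step, using pandas and scikit-learn (the column names, bin count and data are invented for illustration; the Ames and Malmö datasets are not reproduced here):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Placeholder housing data; real features would include rates, year, area, etc.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "living_area": rng.uniform(30, 250, 1000),
    "year_built": rng.integers(1900, 2020, 1000),
    "repo_rate": rng.normal(0.5, 0.3, 1000),
})
df["price"] = (30000 * df["living_area"] - 50000 * df["repo_rate"]
               + 1000 * (df["year_built"] - 1900) + rng.normal(0, 1e5, 1000))

# A simple binning strategy: discretize a continuous feature into categories.
df["area_bin"] = pd.cut(df["living_area"], bins=5, labels=False)

X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

models = {
    "Multiple linear": LinearRegression(),
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    print(f"{name}: R2 = {r2_score(y_test, pred):.3f}, RMSE = {rmse:.0f}")
```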
50

Genomic selection can replace phenotypic selection in early generation wheat breeding

Borrenpohl, Daniel January 2019 (has links)
No description available.
