11

Optimal estimation of head scan data with generalized cross validation

Fang, Haian January 1995 (has links)
No description available.
12

Case and covariate influence: implications for model assessment

Duncan, Kristin A. 12 October 2004 (has links)
No description available.
13

Bias reduction studies in nonparametric regression with applications : an empirical approach / Marike Krugell

Krugell, Marike January 2014 (has links)
The purpose of this study is to determine the effect of three improvement methods on nonparametric kernel regression estimators. The improvement methods are applied to the Nadaraya-Watson estimator with cross-validation bandwidth selection, the Nadaraya-Watson estimator with plug-in bandwidth selection, the local linear estimator with plug-in bandwidth selection and a bias-corrected nonparametric estimator proposed by Yao (2012). The different resulting regression estimates are evaluated by minimising a global discrepancy measure, i.e. the mean integrated squared error (MISE). In the machine learning context various improvement methods, in terms of the precision and accuracy of an estimator, exist. The first two improvement methods introduced in this study are bootstrap-based. Bagging is an acronym for bootstrap aggregating and was introduced by Breiman (1996a) from a machine learning viewpoint and by Swanepoel (1988, 1990) in a functional context. Bagging is primarily a variance reduction tool, i.e. bagging is implemented to reduce the variance of an estimator and in this way improve the precision of the estimation process. Bagging is performed by drawing repeated bootstrap samples from the original sample and generating multiple versions of an estimator. These replicates of the estimator are then used to obtain an aggregated estimator. Bragging stands for bootstrap robust aggregating. A robust estimator is obtained by using the sample median over the B bootstrap estimates instead of the sample mean as in bagging. The third improvement method aims to reduce the bias component of the estimator and is referred to as boosting. Boosting is a general method for improving the accuracy of any given learning algorithm. The method starts off with a sensible estimator and improves it iteratively, based on its performance on a training dataset. Results and conclusions verifying existing literature are provided, as well as new results for the new methods.
/ MSc (Statistics), North-West University, Potchefstroom Campus, 2015
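The bagging and bragging steps described in the abstract can be sketched roughly as follows for a Nadaraya-Watson estimator. This is a minimal illustration on synthetic data; the function names, bandwidth, sample sizes and number of bootstrap replicates are assumptions, not the thesis's own code.

```python
# Sketch of bagging/bragging a Nadaraya-Watson kernel regression estimator.
# Toy data and all parameter choices are illustrative assumptions.
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Gaussian-kernel Nadaraya-Watson estimate at the point x0."""
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def bagged_nw(x0, x, y, h, B=100, rng=None):
    """Average the NW estimate over B bootstrap resamples (bagging).
    Replacing the mean over resamples with the median gives 'bragging'."""
    rng = np.random.default_rng(rng)
    n = len(x)
    ests = []
    for _ in range(B):
        idx = rng.integers(0, n, n)      # bootstrap sample with replacement
        ests.append(nadaraya_watson(x0, x[idx], y[idx], h))
    return np.mean(ests)                 # np.median(ests) -> bragging

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
print(bagged_nw(0.5, x, y, h=0.05, B=50, rng=1))
```

Averaging over resamples targets the variance of the estimator, matching the abstract's description of bagging as a precision-improving tool; the one-line swap to the median shows why bragging is described as its robust counterpart.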
15

Systematic ensemble learning and extensions for regression

Aldave, Roberto January 2015 (has links)
Abstract: The objective is to provide methods to improve the performance, or prediction accuracy, of the standard stacking approach, an ensemble method composed of simple, heterogeneous base models, through the integration of the diversity generation, combination and/or selection stages for regression problems. In Chapter 1, we propose to combine a set of level-1 learners into a level-2 learner, or ensemble. We also propose to inject a diversity generation mechanism into the initial cross-validation partition, from which new cross-validation partitions are generated and subsequent ensembles are trained. We then propose an algorithm to select the best partition, or corresponding ensemble. In Chapter 2, we formulate the partition selection as a Pareto-based multi-criteria optimization problem, and give an algorithm that makes the partition selection iterative with the aim of further improving the ensemble's prediction accuracy. In Chapter 3, we propose to generate multiple populations, or partitions, by injecting a diversity mechanism into the original dataset. An algorithm is then proposed to select the best partition among all partitions generated by the multiple populations. All methods designed and implemented in this thesis achieve encouraging and favorable results across different datasets against both state-of-the-art models and ensembles for regression.
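The core Chapter 1 idea — generate several random cross-validation partitions, train a stacked ensemble on each, and keep the best one — can be sketched like this. The base learners, level-2 combiner, selection criterion and data are illustrative assumptions, not the thesis's actual algorithm.

```python
# Sketch of stacking with multiple random CV partitions (diversity
# generation) and selection of the best partition. All choices here
# are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def stack_predictions(X, y, base_models, seed):
    """Out-of-fold level-1 predictions for one random CV partition."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    Z = np.zeros((len(y), len(base_models)))
    for tr, te in kf.split(X):
        for j, m in enumerate(base_models):
            Z[te, j] = m.fit(X[tr], y[tr]).predict(X[te])
    return Z

X, y = make_regression(n_samples=300, n_features=8, n_informative=5,
                       noise=10.0, random_state=0)
base = [LinearRegression(), Ridge(alpha=1.0),
        DecisionTreeRegressor(max_depth=4, random_state=0)]

# Try several partitions; keep the one whose level-2 combiner
# achieves the lowest MSE on the stacked features.
best_seed, best_mse = None, np.inf
for seed in range(5):
    Z = stack_predictions(X, y, base, seed)
    combiner = LinearRegression().fit(Z, y)
    mse = mean_squared_error(y, combiner.predict(Z))
    if mse < best_mse:
        best_seed, best_mse = seed, mse

print("best partition seed:", best_seed, "stacked MSE:", round(best_mse, 2))
```

Each random seed plays the role of one diversity-injected partition; the Pareto-based and multi-population refinements of Chapters 2 and 3 would replace the single-criterion `min` selection above.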
16

Supervised Learning Techniques : A comparison of the Random Forest and the Support Vector Machine

Arnroth, Lukas, Fiddler, Dennis Jonni January 2016 (has links)
This thesis examines the performance of the support vector machine and the random forest models in the context of binary classification. The two techniques are compared and the better-performing one is used to construct a final parsimonious model. The data set consists of 33 observations and 89 biomarkers as features, with no known dependent variable. The dependent variable is generated through k-means clustering, with a predefined final solution of two clusters. The training of the algorithms is performed using five-fold cross-validation repeated twenty times. The training process reveals that the best performing versions of the models are a linear support vector machine and a random forest with six randomly selected features at each split. The final comparison of these optimally tuned algorithms on the test set shows that the random forest outperforms the linear kernel support vector machine: the former classifies all observations in the test set correctly whilst the latter classifies all but one correctly. Hence, a parsimonious random forest model using the top five features is constructed, which performs equally well on the test set compared to the original random forest model using all features.
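The evaluation pipeline described above — labels from two-cluster k-means, then five-fold cross-validation repeated twenty times for both classifiers — can be sketched as follows on synthetic data. The data and random seeds are assumptions; only the dimensions (33 observations, 89 features, 6 features per split) follow the abstract.

```python
# Sketch of the comparison pipeline: k-means labels, then repeated
# five-fold CV for a random forest and a linear SVM. Synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(33, 89))            # 33 observations, 89 biomarkers
y = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

cv = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)
rf = RandomForestClassifier(max_features=6, random_state=0)  # 6 features per split
svm = SVC(kernel="linear")

rf_acc = cross_val_score(rf, X, y, cv=cv).mean()
svm_acc = cross_val_score(svm, X, y, cv=cv).mean()
print("RF mean accuracy: ", rf_acc)
print("SVM mean accuracy:", svm_acc)
```

With so few observations, the repeated CV averages over many random fold assignments, which is presumably why the thesis repeats the five-fold split twenty times rather than relying on a single partition.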
17

Tuning Parameter Selection in L1 Regularized Logistic Regression

Shi, Shujing 05 December 2012 (has links)
Variable selection is an important topic in regression analysis and is intended to select the best subset of predictors. The least absolute shrinkage and selection operator (Lasso) was introduced by Tibshirani in 1996. This method can serve as a tool for variable selection because it shrinks some coefficients to exactly zero through a constraint on the sum of the absolute values of the regression coefficients. For logistic regression, Lasso modifies the traditional parameter estimation method, maximum log-likelihood, by adding the L1 norm of the parameters to the negative log-likelihood function, turning a maximization problem into a minimization one. To solve this problem, we first need to set the value of the parameter on the L1 norm, called the tuning parameter. Since the tuning parameter affects coefficient estimation and variable selection, we want to find its optimal value so as to obtain the most accurate coefficient estimates and the best subset of predictors in the L1 regularized regression model. Two popular methods for selecting the optimal value of the tuning parameter are the Bayesian information criterion (BIC) and cross-validation (CV). The objective of this paper is to evaluate and compare these two methods in terms of coefficient estimation accuracy and variable selection through simulation studies.
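The two selection rules compared in the abstract can be sketched as follows. The data, the tuning-parameter grid, and the degrees-of-freedom convention (counting nonzero coefficients) are illustrative assumptions; note that scikit-learn parameterises the penalty as C = 1/lambda.

```python
# Sketch: choosing the L1 tuning parameter for logistic regression
# by BIC and by cross-validation. Grid, data, and the df convention
# (number of nonzero coefficients) are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
n = len(y)
grid = np.logspace(-2, 2, 10)            # C = 1/lambda in scikit-learn

def bic(C):
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    loglik = -log_loss(y, model.predict_proba(X), normalize=False)
    df = np.count_nonzero(model.coef_)   # nonzero coefficients as df
    return -2 * loglik + df * np.log(n)

def cv_accuracy(C):
    m = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    return cross_val_score(m, X, y, cv=5).mean()

C_bic = min(grid, key=bic)               # BIC: minimise penalised deviance
C_cv = max(grid, key=cv_accuracy)        # CV: maximise held-out accuracy
print("C chosen by BIC:", C_bic, "| C chosen by CV:", C_cv)
```

BIC penalises model size explicitly and so tends to pick sparser models, while CV optimises predictive performance directly; that contrast is exactly what the thesis's simulation studies evaluate.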
18

Image-Based Non-Contact Conductivity Prediction for Inkjet Printed Electrodes and Follow-Up Work of Toner Usage Prediction for Laser Electro-Photographic Printers

Yang Yan (6861362) 16 August 2019 (has links)
This thesis includes two parts. The main part addresses conductivity prediction for inkjet-printed silver electrodes; the second part covers follow-up work on toner usage prediction for laser electro-photographic printers.

For the conductivity prediction part: electronic devices made with inkjet printing techniques and flexible thin films have recently attracted great attention due to their potential applications in sensor manufacturing. Imaging has become a valuable tool for monitoring the quality of inkjet-printed electrodes, because most thickness- or resistance-measuring devices can destroy the surface of a printed electrode, or even the whole electrode. Thus, a non-contact, image-based approach to estimating the sheet resistance of inkjet-printed electrodes is developed. The approach has two stages. First, strip-shaped electrodes are systematically printed with various printing parameters, and sheet resistance measurements as well as images of the electrodes are acquired. Then, based on the experimental data, a fitting model is constructed and used to predict the sheet resistance of inkjet-printed silver electrodes.

For the toner usage prediction part: with the widespread use of laser electro-photographic printers in both industry and households, estimating toner usage is significant for ensuring the full utilization of each cartridge. The follow-up work focuses on testing and improving the feasibility, reliability, and adaptability of the Black Box Model (BBM) based two-stage strategy for estimating toner usage. Compared with previous methods, the training process for the first stage requires less time and disk storage, all while maintaining high accuracy. For the second stage, experiments are performed on various models of printers, with cyan (C), magenta (M), yellow (Y), and black (K) color cartridges.
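The two-stage calibrate-then-predict idea for the electrodes can be sketched as follows. All numbers below are invented placeholders, as is the choice of a quadratic fit on log-resistance; the thesis's actual fitting model and image features may differ.

```python
# Illustrative sketch of the two-stage approach: calibrate a model
# from an image feature (mean gray level) to measured sheet
# resistance, then predict without contact. All values are made up.
import numpy as np

# Stage 1: calibration electrodes printed with varying parameters ->
# mean gray level from the image and measured sheet resistance.
gray = np.array([60, 75, 90, 110, 130, 150], dtype=float)
sheet_r = np.array([0.08, 0.12, 0.21, 0.38, 0.70, 1.25])  # ohms/square

# Fit log sheet resistance as a quadratic in gray level.
coeffs = np.polyfit(gray, np.log(sheet_r), deg=2)

# Stage 2: non-contact prediction from an image of a new electrode.
def predict_sheet_resistance(mean_gray):
    return float(np.exp(np.polyval(coeffs, mean_gray)))

print(predict_sheet_resistance(100.0))
```

Fitting in log space keeps the predicted resistance positive and handles the wide dynamic range of sheet resistance across printing parameters; the real model would be fit to the thesis's measured calibration data rather than these placeholders.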
19

Predicting rifle shooting accuracy from context and sensor data : A study of how to perform data mining and knowledge discovery in the target shooting domain

Pettersson, Max, Jansson, Viktor January 2019 (has links)
The purpose of this thesis is to develop an interpretable model that gives predictions of which factors impacted a shooter's results. Experiment is our chosen research method. Our three independent variables are weapon movement, trigger pull force and heart rate; our dependent variable is shooting accuracy. A random forest regression model is trained with the experiment data to produce predictions of shooting accuracy and to show the correlation between the independent and dependent variables. Our method shows that an increase in weapon movement, trigger pull force or heart rate decreases the predicted accuracy score. Weapon movement impacted shooting results the most, at 53.61%, while trigger pull force and heart rate impacted shooting results at 22.20% and 24.18% respectively. We have also shown that LIME can be a viable method for explaining how the measured factors impacted shooting results. The results from this thesis lay the groundwork for better target shooting training tools using explainable prediction models with sensors.
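The modeling setup described above can be sketched on synthetic data as follows. The sensor values and their effect sizes are invented, and random forest feature importances stand in as a rough analogue of the per-factor impact percentages; the study itself used LIME for the per-prediction explanations.

```python
# Sketch of the random forest regression setup on synthetic sensor
# data. Effect sizes and distributions are invented assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
weapon_movement = rng.uniform(0, 1, n)
trigger_force = rng.uniform(0, 1, n)
heart_rate = rng.uniform(60, 160, n)

# Higher movement, force and heart rate -> lower accuracy (plus noise).
accuracy = (10 - 5 * weapon_movement - 2 * trigger_force
            - 0.02 * (heart_rate - 60) + rng.normal(0, 0.5, n))

X = np.column_stack([weapon_movement, trigger_force, heart_rate])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, accuracy)

for name, imp in zip(["weapon movement", "trigger force", "heart rate"],
                     rf.feature_importances_):
    print(f"{name}: {imp:.1%}")
```

Feature importances give a global ranking of the factors; LIME, as used in the thesis, instead explains individual predictions, which is what makes the model's output actionable for a single shooter.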
20

Time Series Forecasting of House Prices: An evaluation of a Support Vector Machine and a Recurrent Neural Network with LSTM cells

Rostami, Jako, Hansson, Fredrik January 2019 (has links)
In this thesis, we examine the performance of different forecasting methods. We use data of monthly house prices from the larger Stockholm area and the municipality of Uppsala between 2005 and early 2019 as the time series to be forecast. Firstly, we compare the performance of two machine learning methods, the Long Short-Term Memory and the Support Vector Machine methods. The two methods' forecasts are compared, and the model with the lowest forecasting error measured by three metrics is chosen to be compared with a classic seasonal ARIMA model. We find that the Long Short-Term Memory method is the better performing machine learning method for a twelve-month forecast, but that it still does not forecast as well as the ARIMA model for the same forecast period.
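A comparison by multiple error metrics on a twelve-month horizon, as described above, can be sketched like this. The series and both forecasts are synthetic placeholders, and the abstract does not name the three metrics, so the common MAE/RMSE/MAPE trio is assumed here.

```python
# Sketch of a three-metric forecast comparison over twelve months.
# The series, both forecasts, and the metric choice are assumptions.
import numpy as np

def mae(a, f):  return np.mean(np.abs(a - f))
def rmse(a, f): return np.sqrt(np.mean((a - f) ** 2))
def mape(a, f): return np.mean(np.abs((a - f) / a)) * 100

actual = np.array([100, 102, 105, 103, 108, 110, 112, 111, 115, 117, 120, 118.0])
forecast_lstm  = actual + np.array([ 2, -1,  3, -2,  1,  2, -3,  1,  2, -1,  3, -2.0])
forecast_arima = actual + np.array([ 1, -1,  1, -1,  1,  1, -1,  1,  1, -1,  1, -1.0])

for name, f in [("LSTM", forecast_lstm), ("SARIMA", forecast_arima)]:
    print(f"{name}: MAE={mae(actual, f):.2f}  RMSE={rmse(actual, f):.2f}  "
          f"MAPE={mape(actual, f):.2f}%")
```

Reporting several metrics guards against a model that looks good on one criterion only: RMSE penalises large misses more heavily than MAE, while MAPE expresses error relative to the price level.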
