1

Bagging Regularizes

Poggio, Tomaso, Rifkin, Ryan, Mukherjee, Sayan, Rakhlin, Alex 01 March 2002 (has links)
Intuitively, we expect that averaging --- or bagging --- different regressors with low correlation should smooth their behavior and be somewhat similar to regularization. In this note we make this intuition precise. Using an almost classical definition of stability, we prove that a certain form of averaging provides generalization bounds with a rate of convergence of the same order as Tikhonov regularization --- similar to fashionable RKHS-based learning algorithms.
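As a quick empirical companion to the abstract's claim (not the paper's proof), the hedged Python sketch below bags unregularized least-squares fits and compares the variance of their averaged predictions with an explicitly Tikhonov-regularized (ridge) fit; all parameter values are illustrative assumptions.

```python
# A minimal sketch illustrating the claim empirically: averaging
# least-squares fits over bootstrap samples shrinks prediction variance
# in a way that resembles Tikhonov (ridge) regularization.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=2.0, size=60)
X_test = rng.normal(size=(1000, 10))

ols = LinearRegression().fit(X, y)
bagged = BaggingRegressor(LinearRegression(), n_estimators=200,
                          random_state=0).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)  # alpha chosen arbitrarily

# The bagged predictor's output variance typically sits between the
# unregularized OLS fit and the explicitly regularized ridge fit.
for name, model in [("ols", ols), ("bagged", bagged), ("ridge", ridge)]:
    print(name, np.var(model.predict(X_test)))
```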
2

Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification

Almaadeed, Noor January 2014 (has links)
The rapid momentum of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from his or her voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system. A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, including the Probabilistic Neural Network (PNN), General Regression Neural Network (GRNN) and Radial Basis Function Neural Network (RBF-NN) with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to classical Mel-Frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to the Back-Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA). Another novel approach, based on vowel formant analysis, is implemented using Linear Discriminant Analysis (LDA). Vowel-formant-based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage- and time-efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, but the performance of the proposed score-based methodology remains almost linear. Finally, a novel audio-visual fusion-based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform feature-level fusion in terms of accuracy and error resilience, in line with the distinct nature of the two modalities, which lose their individual character when combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy.
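The fusion schemes named in the abstract (AND voting at the decision level, OR voting, score-level fusion) can be illustrated with a small hedged sketch; the voters and scores below are stand-ins, not the thesis's actual PNN/GRNN/RBF-NN classifiers.

```python
# Hedged sketch of decision- and score-level fusion for identification.
from typing import List, Optional

def and_vote(predictions: List[int]) -> Optional[int]:
    """Decision-level AND voting: accept an identity only if every
    classifier agrees; otherwise reject (return None)."""
    return predictions[0] if len(set(predictions)) == 1 else None

def score_fusion(scores: List[dict]) -> int:
    """Score-level fusion: sum the (normalized) per-speaker scores
    across modalities and pick the speaker with the highest total."""
    totals = {}
    for modality in scores:
        for speaker, s in modality.items():
            totals[speaker] = totals.get(speaker, 0.0) + s
    return max(totals, key=totals.get)

print(and_vote([7, 7, 7]))  # 7: all three classifiers agree
print(and_vote([7, 3, 7]))  # None: no consensus, identification rejected
print(score_fusion([{1: 0.2, 7: 0.8}, {1: 0.6, 7: 0.4}]))  # 7
```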
3

Exploring New Methods to Perform Bagging with Exponential Smoothing

DAVID SOUZA PINTO 07 December 2020 (has links)
Exponential smoothing methods are flexible procedures for univariate time series forecasting, developed in the 1960s. More recent developments based on these models use bagging to improve forecast quality. One such implementation, BaggedETS, developed in 2016, brought improvements in forecast quality and is distributed through the forecast package for R. A later implementation, BaggedClusterETS, adds clustering and validation steps to address the covariance effect associated with bagging, resulting in further accuracy improvements. This work delves into three extensions of the aforementioned methods: the first studies the effects of the maximum entropy bootstrap on BaggedETS; the second explores different dissimilarity measures to construct the clusters in BaggedClusterETS; the third studies a simplified version of BaggedClusterETS in which the validation and selection steps are removed and only the medoids are used for bagging. To test these proposals, 21 time series from civil aviation and energy consumption were used.
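A simplified Python sketch of the bagged exponential smoothing idea follows. The thesis builds on R's forecast package (baggedETS); this is not that code: it fits a base model, resamples its residuals with a moving block bootstrap, refits on each bootstrap series, and averages the forecasts, with block size, horizon and the additive-trend form all assumed.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def moving_block_bootstrap(resid, block_size, rng):
    """Resample a residual series by concatenating random contiguous blocks."""
    n = len(resid)
    blocks = [resid[i:i + block_size] for i in range(n - block_size + 1)]
    out = []
    while len(out) < n:
        out.extend(blocks[rng.integers(len(blocks))])
    return np.array(out[:n])

def bagged_es_forecast(y, horizon=12, n_boot=30, block_size=8, seed=0):
    rng = np.random.default_rng(seed)
    base = ExponentialSmoothing(y, trend="add").fit()
    resid = y - base.fittedvalues
    forecasts = []
    for _ in range(n_boot):
        # Rebuild a bootstrap series and refit exponential smoothing to it.
        y_boot = base.fittedvalues + moving_block_bootstrap(resid, block_size, rng)
        fit = ExponentialSmoothing(y_boot, trend="add").fit()
        forecasts.append(fit.forecast(horizon))
    return np.mean(forecasts, axis=0)  # bagged point forecast
```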
4

Predicting Patient Satisfaction With Ensemble Methods

Rosales, Elisa Renee 30 April 2015 (has links)
Health plans constantly seek ways to assess and improve the quality of patient experience in various ambulatory and institutional settings. Standardized surveys are a common tool for gathering data about patient experience, and a useful measurement taken from these surveys is the Net Promoter Score (NPS). This score represents the extent to which a patient would, or would not, recommend his or her physician on a scale from 0 to 10, where 0 corresponds to "Extremely unlikely" and 10 to "Extremely likely". A large national health plan used automated calls to distribute such a survey to its members and was interested in understanding what factors contributed to a patient's satisfaction, and in whether NPS could be predicted from responses to other questions on the survey along with demographic data. When the distributions of various predictors were compared between the less satisfied and highly satisfied members, there was significant overlap, indicating that not even the Bayes classifier could successfully differentiate between these members. Moreover, the highly imbalanced proportion of NPS responses resulted in initially poor prediction accuracy. Thus, owing to the non-linear structure of the data and the high number of categorical predictors, we leveraged flexible methods, such as decision trees, bagging, and random forests, for modeling and prediction. We further altered the prediction step in the random forest algorithm to account for the imbalanced structure of the data.
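The abstract does not spell out the altered prediction step; one common way to realize the same idea, sketched below under that assumption, is to keep the forest unchanged but reweight classes and move the vote threshold away from the default 0.5.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_and_predict(X_train, y_train, X_new, threshold=0.3):
    # class_weight="balanced" upweights the rare "less satisfied" class
    # during training; threshold=0.3 is an illustrative assumption.
    forest = RandomForestClassifier(n_estimators=500,
                                    class_weight="balanced",
                                    random_state=0).fit(X_train, y_train)
    # Instead of the default 0.5 vote cutoff, flag a member as the
    # minority class whenever enough trees vote that way.
    p_minority = forest.predict_proba(X_new)[:, 1]
    return (p_minority >= threshold).astype(int)
```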
5

Missing imputation methods explored in big data analytics

Brydon, Humphrey Charles January 2018 (has links)
Philosophiae Doctor - PhD (Statistics and Population Studies) / The aim of this study is to examine the methods and processes involved in imputing missing data and, more specifically, complete missing blocks of data. A further aim is to examine the effect that the imputed data have on the accuracy of various predictive models constructed on those data, and hence to determine whether the imputation method involved is suitable. Identifying the missingness mechanism present in the data should be the first step, since a suitable imputation method is easier to identify if the mechanism can be classified as one of the following: missing completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR). Predictive models constructed on the completed data sets are shown to be less accurate when a hot-deck imputation method was employed, whereas data sets imputed with either single or multiple Markov chain Monte Carlo (MCMC) or Fully Conditional Specification (FCS) methods are shown to yield more accurate predictive models. The addition of an iterative bagging technique in the modelling procedure is shown to produce highly accurate prediction estimates. The bagging technique is applied to variants of the neural network, a decision tree and a multiple linear regression (MLR) modelling procedure; a stochastic gradient boosted decision tree (SGBT) is also constructed as a comparison to the bagged decision tree. Final models are constructed from 200 iterations of the various modelling procedures using a 60% sampling ratio in the bagging procedure. It is further shown that adding the bagging technique to the MLR modelling procedure can, under certain conditions, produce an MLR model that is more accurate than the other, more advanced modelling procedures. The evaluation of predictive models constructed on imputed data is shown to vary with the type of fit statistic used: the average squared error reports little difference in accuracy levels, whereas the Mean Absolute Prediction Error (MAPE) magnifies the differences in the prediction errors. The Normalized Mean Bias Error (NMBE) results show that all predictive models produced estimates that were over-predictions, although these varied depending on the data set and modelling procedure used. The Nash-Sutcliffe efficiency (NSE) was used as a comparison statistic for the accuracy of the predictive models in the context of imputed data; it showed that estimates from models constructed on data sets imputed with a multiple imputation method were highly accurate, while estimates from models constructed on the hot-deck imputed data were inaccurate, such that a mean substitution from the fully observed data would have been a better method of imputation. The conclusion reached in this study is that the choice of imputation method, as well as that of the predictive model, depends on the data used; four distinct combinations of imputation methods and modelling procedures were identified for the data considered in this study.
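The bagging setup described above (200 iterations at a 60% sampling ratio) can be sketched as follows; note that the abstract's iterative bagging is richer than the plain bagging shown here, the decision-tree base learner is one of several the thesis uses, and the MAPE helper follows the usual definition.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

def bagged_model():
    return BaggingRegressor(DecisionTreeRegressor(),
                            n_estimators=200,   # 200 bagging iterations
                            max_samples=0.6,    # 60% sampling ratio
                            random_state=0)

def mape(y_true, y_pred):
    """Mean Absolute Prediction Error, one of the fit statistics named above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```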
6

Development Of A SCADA Control System For A Weighing And Bagging Machine

Aykac, Emel Sinem 01 May 2010 (has links) (PDF)
In this thesis study, a prototype is designed to improve the weighing accuracy of the weighing and packaging machine used in sugar factories. Unavoidable factory conditions cause the weighing and packaging machine to make weighing errors. To correct these errors, the prototype produced in this study was designed as a quality-control unit that removes excess sugar from, and adds missing sugar to, the sacks. Because the prototype is small and easy to install, it was applied to 1-kilogram bags rather than the existing 50-kilogram sacks. To correct faulty weighing, sugar extraction and filling are carried out from a bunker designed on the basis of data obtained by statistical analysis. Extraction uses a vacuum, and filling is realized by a ball valve; upward and downward movement of the bunker is carried out with a pneumatic cylinder. Weighing information is received via a load cell and an indicator, and control of all these devices is provided by PLC hardware and a SCADA interface.
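The correction logic the prototype implements can be summarized in a hedged, high-level sketch; the real unit runs on PLC hardware with a SCADA front end, so the device-interface functions and the tolerance below are hypothetical.

```python
# High-level sketch of the quality-control loop; read_load_cell,
# open_fill_valve and apply_vacuum are hypothetical device interfaces.
TARGET_G = 1000      # 1-kilogram bags
TOLERANCE_G = 2      # acceptable deviation, an assumed value

def correct_bag(read_load_cell, open_fill_valve, apply_vacuum):
    weight = read_load_cell()
    while abs(weight - TARGET_G) > TOLERANCE_G:
        if weight < TARGET_G:
            open_fill_valve()    # ball valve adds sugar from the bunker
        else:
            apply_vacuum()       # vacuum extracts the excess back
        weight = read_load_cell()
    return weight
```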
7

Bagged clustering

Leisch, Friedrich January 1999 (has links) (PDF)
A new ensemble method for cluster analysis is introduced, which can be interpreted in two different ways: as a complexity-reducing preprocessing stage for hierarchical clustering, and as a combination procedure for several partitioning results. The basic idea is to locate and combine structurally stable cluster centers and/or prototypes; random effects of the training set are reduced by repeatedly training on resampled sets (bootstrap samples). We discuss the algorithm from both a theoretical and an applied point of view and demonstrate it on several data sets. (author's abstract) / Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
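Read as an algorithm, the abstract suggests a two-stage procedure; the hedged Python sketch below is one reading of it (run a base partitioning method on bootstrap samples, pool the centers, hierarchically cluster the pooled centers), not Leisch's original implementation, and all parameter values are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

def bagged_clustering(X, n_boot=20, k_base=10, k_final=3, seed=0):
    rng = np.random.default_rng(seed)
    centers = []
    for _ in range(n_boot):
        # Stage 1: base partitioning on a bootstrap sample of the data.
        sample = X[rng.integers(len(X), size=len(X))]
        km = KMeans(n_clusters=k_base, n_init=10, random_state=0).fit(sample)
        centers.append(km.cluster_centers_)
    centers = np.vstack(centers)
    # Stage 2: hierarchical clustering on the (much smaller) set of
    # pooled centers, reducing complexity relative to clustering X itself.
    labels_of_centers = fcluster(linkage(centers, method="average"),
                                 t=k_final, criterion="maxclust")
    # Assign each point the label of its nearest pooled center.
    nearest = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    return labels_of_centers[nearest]
```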
8

Ensemble learning methods for scoring model development

Nožička, Michal January 2018 (has links)
Credit scoring is a very important process in the banking industry, during which each potential or current client is assigned a credit score that in a certain way expresses the client's probability of default, i.e. failing to meet his or her obligations on time or in full. This is a cornerstone of credit risk management in the banking industry. Traditionally, statistical models (such as the logistic regression model) are used for credit scoring in practice. Despite the many advantages of such an approach, recent research shows many alternatives that are in some ways superior to those traditional models. This master thesis focuses on introducing ensemble learning models (in particular those constructed using bagging, boosting and stacking algorithms) with various base models (in particular logistic regression, random forest, support vector machines and artificial neural networks) as possible alternatives and challengers to the traditional statistical models used for credit scoring, and compares their advantages and disadvantages. The accuracy and predictive power of these scoring models are examined using standard measures of accuracy and predictive power in the credit scoring field (in particular the GINI coefficient and the LIFT coefficient) on a real-world dataset, and the obtained results are presented. The main result of this comparative study is that...
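The two measures named in the abstract have standard definitions that are easy to sketch; the helpers below assume Gini = 2*AUC - 1 and cumulative lift over the riskiest fraction of scores, with the score and label arrays as placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gini(y_true, score):
    """Gini coefficient from the area under the ROC curve."""
    return 2 * roc_auc_score(y_true, score) - 1

def lift_at(y_true, score, fraction=0.1):
    """Lift: default rate among the riskiest `fraction` of scores,
    divided by the overall default rate."""
    y_true, score = np.asarray(y_true), np.asarray(score)
    n_top = max(1, int(len(score) * fraction))
    top = np.argsort(score)[::-1][:n_top]   # riskiest clients by score
    return y_true[top].mean() / y_true.mean()
```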
9

Boosting, Bagging, and Classification Analysis to Improve Noninvasive Liver Fibrosis Prediction in HCV/HIV Coinfected Subjects: An Analysis of the AIDS Clinical Trials Group (ACTG) 5178

Shire, Norah J. 03 April 2007 (has links)
No description available.
10

Aggregate statistical learning methods for density estimation and multiclass problems

Bourel, Mathias 31 October 2013 (has links)
Les méthodes d'agrégation en apprentissage statistique combinent plusieurs prédicteurs intermédiaires construits à partir du même jeu de données dans le but d'obtenir un prédicteur plus stable avec une meilleure performance. Celles-ci ont été amplement étudiées et ont données lieu à plusieurs travaux, théoriques et empiriques dans plusieurs contextes, supervisés et non supervisés. Dans ce travail nous nous intéressons dans un premier temps à l'apport de ces méthodes au problème de l'estimation de la densité. Nous proposons plusieurs estimateurs simples obtenus comme combinaisons linéaires d'histogrammes. La principale différence entre ceux-ci est quant à la nature de l'aléatoire introduite à chaque étape de l'agrégation. Nous comparons ces techniques à d'autres approches similaires et aux estimateurs classiques sur un choix varié de modèles, et nous démontrons les propriétés asymptotiques pour un de ces algorithmes (Random Averaged Shifted Histogram). Une seconde partie est consacrée aux extensions du Boosting pour le cas multiclasse. Nous proposons un nouvel algorithme (Adaboost.BG) qui fournit un classifieur final en se basant sur un calcul d'erreur qui prend en compte la marge individuelle de chaque modèle introduit dans l'agrégation. Nous comparons cette méthode à d'autres algorithmes sur plusieurs jeu de données artificiels classiques. / Ensemble methods in statistical learning combine several base learners built from the same data set in order to obtain a more stable predictor with better performance. Such methods have been extensively studied in the supervised context for regression and classification. In this work we consider the extension of these approaches to density estimation. We suggest several new algorithms in the same spirit as bagging and boosting. We show the efficiency of combined density estimators by extensive simulations. We give also the theoretical results for one of our algorithms (Random Averaged Shifted Histogram) by mean of asymptotical convergence under milmd conditions. A second part is devoted to the extensions of the Boosting algorithms for the multiclass case. We propose a new algorithm (Adaboost.BG) accounting for the margin of the base classifiers and show its efficiency by simulations and comparing it to the most used methods in this context on several datasets from the machine learning benchmark. Partial theoretical results are given for our algorithm, such as the exponential decrease of the learning set misclassification error to zero.
