11 |
A Review of Cross Validation and Adaptive Model Selection
Syed, Ali R, 27 April 2011
We perform a review of model selection procedures, in particular various cross validation procedures and adaptive model selection. We cover important results for these procedures and explore the connections between different procedures and information criteria.
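As a hedged illustration of the kind of procedure reviewed here, the following sketch compares two candidate models by 5-fold cross validation; the models, data, and scoring are placeholders, not examples from the thesis.

```python
# Illustrative k-fold cross validation for model selection (placeholder
# models and simulated data, not the thesis's own examples).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

candidates = {"ols": LinearRegression(), "ridge": Ridge(alpha=1.0)}
for name, model in candidates.items():
    # sklearn reports negated MSE so that larger is always better
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(name, "CV MSE:", -scores.mean())
```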
|
12 |
PREDICTIVE VALIDITY OF THE YOUTH LEVEL OF SERVICE/CASE MANAGEMENT INVENTORY AMONG JAPANESE JUVENILE OFFENDERS
Takahashi, Masaru, 01 December 2010
The main purpose of the present study is to examine the predictive validity of the Youth Level of Service/Case Management Inventory (YLS/CMI) in a Japanese juvenile offender population. Three hundred eighty-nine juveniles released from five Juvenile Classification Homes (JCHs) were followed for more than one year on average. Results demonstrate that those who score higher on the YLS/CMI are more likely to recidivate than those who score lower. The YLS/CMI total score also significantly predicts a shorter time to recidivism. Furthermore, the superiority of actuarial risk measures over clinical risk judgment is confirmed. The overall findings support the applicability of the YLS/CMI to Japanese juvenile offenders. Practical implications and limitations of the current study are also discussed.
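As a purely illustrative sketch of how the predictive validity of a risk score is commonly quantified (the study's own data and analyses are not reproduced here), one can compute the area under the ROC curve (AUC) for score versus recidivism outcome:

```python
# Simulated illustration of predictive validity via ROC-AUC; the scores and
# outcomes below are fabricated for the example, not the study's data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
score = rng.integers(0, 43, size=389)            # hypothetical risk totals
p = 1 / (1 + np.exp(-(score - 21) / 6.0))        # higher score -> higher risk
recidivated = rng.binomial(1, p)                 # simulated outcomes

print("AUC:", round(roc_auc_score(recidivated, score), 3))
```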
|
13 |
Cross-Language tweet classification using Bing Translator
Krithivasan, Bhavani, January 1900
Master of Science / Department of Computing and Information Sciences / Doina Caragea / Social media affects our daily lives. It is one of the first sources for breaking news. Twitter, in particular, is one of the most popular social media platforms, with around 330 million monthly users. From local events such as Fake Patty's Day to happenings across the world, Twitter gets there first. During a disaster, tweets can be used to post warnings, report the status of available medical and food supplies and emergency personnel, and share updates. Users continued tweeting about Hurricane Sandy despite the lack of network coverage during the storm. Analysis of these tweets can help monitor the disaster, plan and manage the crisis, and aid research.
In this research, we use publicly available tweets posted during several disasters and identify the relevant ones. As the languages in the datasets differ, the Bing Translation API is used to detect the source language and translate the tweets. The translations are then used as training datasets for supervised machine learning algorithms. Supervised learning is the process of learning from a labeled training dataset; the learned classifier can then be used to predict the correct output for any valid input. When trained on more observations, the algorithm improves its predictive performance.
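A minimal sketch of the translate-then-classify pipeline might look as follows; `translate_to_english` is a hypothetical stand-in for the Bing Translator API call (whose actual request format is not shown), and the tweets and labels are invented for illustration.

```python
# Sketch of translate-then-classify; translate_to_english is a hypothetical
# placeholder for the Bing Translator API, and the data are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def translate_to_english(tweet: str) -> str:
    # Placeholder: the real system would call the translation service here.
    return tweet

tweets = ["flood warning issued for downtown", "great pizza for lunch today"]
labels = [1, 0]  # 1 = disaster-relevant, 0 = irrelevant (illustrative)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit([translate_to_english(t) for t in tweets], labels)
print(clf.predict([translate_to_english("roads closed due to the storm")]))
```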
|
14 |
Trust Estimation of Real-Time Social Harm Events
Pandey, Saurabh Pramod, 08 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Social harm involves incidents resulting in physical, financial, and emotional hardships, such as crime, drug overdoses and abuse, traffic accidents, and suicides. These incidents require various law-enforcement and emergency-response agencies to coordinate in mitigating their impact on society. With the advent of advanced networking and computing technologies together with data analytics, law-enforcement agencies and people in the community can work together to proactively reduce social harm. With the aim of effectively mitigating social harm events in communities, this thesis introduces a distributed web application, Community Data Analytic for Social Harm (CDASH). CDASH helps in collecting social harm data from heterogeneous sources, analyzing the data to predict social harm risks in the form of geographic hotspots, and conveying the risks to law-enforcement agencies. Since various stakeholders, including the police, community organizations, and citizens, can interact with CDASH, a need arises for a trust framework that keeps fraudulent or mislabeled incidents from misleading CDASH. The enhanced system, called Trusted-CDASH (T-CDASH), superimposes a trust estimation framework on top of CDASH. This thesis discusses the importance and necessity of associating a degree of trust with each social harm incident reported to T-CDASH. It also describes the trust framework, with different trust models that can be incorporated for assigning trust, and examines their impact on the prediction accuracy of future social harm events. The trust models are empirically validated by running simulations on historical social harm data from the Indianapolis metro area.
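T-CDASH's actual trust models are not reproduced here; as a hedged sketch, one simple model weights each reported incident by the reporter's trust score when tallying incidents for risk estimation:

```python
# Illustrative trust weighting (not T-CDASH's actual models): reports are
# tallied by reporter trust rather than counted equally.
from dataclasses import dataclass

@dataclass
class Report:
    incident_type: str
    reporter_trust: float  # in [0, 1]; e.g. verified agency high, anonymous low

def weighted_count(reports: list[Report], incident_type: str) -> float:
    # Trust-weighted tally used in place of a raw count when estimating risk.
    return sum(r.reporter_trust for r in reports if r.incident_type == incident_type)

reports = [Report("overdose", 0.9), Report("overdose", 0.4), Report("accident", 0.9)]
print(weighted_count(reports, "overdose"))  # 1.3 rather than a raw count of 2
```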
|
15 |
A Penalized Approach to Mixed Model Selection Via Cross Validation
Xiong, Jingwei, 05 December 2017
No description available.
|
16 |
Optimal estimation of head scan data with generalized cross validation
Fang, Haian, January 1995
No description available.
|
17 |
Case and covariate influence: implications for model assessment
Duncan, Kristin A., 12 October 2004
No description available.
|
18 |
Bias reduction studies in nonparametric regression with applications: an empirical approach
Krugell, Marike, January 2014
The purpose of this study is to determine the effect of three improvement methods on nonparametric kernel regression estimators. The improvement methods are applied to the Nadaraya-Watson estimator with cross-validation bandwidth selection, the Nadaraya-Watson estimator with plug-in bandwidth selection, the local linear estimator with plug-in bandwidth selection, and a bias-corrected nonparametric estimator proposed by Yao (2012). The different resulting regression estimates are evaluated by minimising a global discrepancy measure, i.e. the mean integrated squared error (MISE).
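For reference, the MISE of an estimator of the regression function is the expected integrated squared error, which splits into integrated variance and integrated squared bias (a standard identity, stated here for context):

```latex
\[
  \mathrm{MISE}(\hat{m})
    = \mathbb{E}\int \bigl(\hat{m}(x) - m(x)\bigr)^{2}\,dx
    = \int \operatorname{Var}\bigl(\hat{m}(x)\bigr)\,dx
    + \int \operatorname{Bias}^{2}\bigl(\hat{m}(x)\bigr)\,dx .
\]
```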
In the machine learning context, various improvement methods exist for the precision and accuracy of an estimator. The first two improvement methods introduced in this study are bootstrap-based. Bagging is an acronym for bootstrap aggregating and was introduced by Breiman (1996a) from a machine learning viewpoint and by Swanepoel (1988, 1990) in a functional context. Bagging is primarily a variance reduction tool, i.e. bagging is implemented to reduce the variance of an estimator and in this way improve the precision of the estimation process. Bagging is performed by drawing repeated bootstrap samples from the original sample and generating multiple versions of an estimator. These replicates of the estimator are then used to obtain an aggregated estimator, as sketched below. Bragging stands for bootstrap robust aggregating; a robust estimator is obtained by using the sample median over the B bootstrap estimates instead of the sample mean as in bagging.
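The following is a minimal sketch of bagging (and, in a comment, bragging) applied to a Gaussian-kernel Nadaraya-Watson estimator, with simulated data and an arbitrary fixed bandwidth rather than the thesis's cross-validation or plug-in selectors:

```python
# Illustrative bagging of a Nadaraya-Watson estimator; simulated data and a
# fixed bandwidth h are used for simplicity (not the thesis's implementation).
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    # Gaussian-kernel estimate of E[Y | X = x] at each grid point.
    w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def bagged_nw(x_grid, x, y, h, B=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty((B, len(x_grid)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap resample of the data
        estimates[b] = nadaraya_watson(x_grid, x[idx], y[idx], h)
    # Bagging aggregates with the mean; bragging would use
    # np.median(estimates, axis=0) instead.
    return estimates.mean(axis=0)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=100)
grid = np.linspace(0, 1, 50)
print(bagged_nw(grid, x, y, h=0.1)[:5])
```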
The third improvement method aims to reduce the bias component of the estimator and is referred to as boosting. Boosting is a general method for improving the accuracy of any given learning algorithm. The method starts off with a sensible estimator and improves it iteratively, based on its performance on a training dataset.
Results and conclusions verifying existing literature are provided, as well as new results for the new methods. / MSc (Statistics), North-West University, Potchefstroom Campus, 2015
|
20 |
Systematic ensemble learning and extensions for regression / Méthodes d'ensemble systématiques et extensions en apprentissage automatique pour la régression
Aldave, Roberto, January 2015
Abstract: The objective is to provide methods to improve the performance, or prediction accuracy, of the standard stacking approach, an ensemble method composed of simple, heterogeneous base models, by integrating the diversity generation, combination, and/or selection stages for regression problems. In Chapter 1, we propose to combine a set of level-1 learners into a level-2 learner, or ensemble. We also propose to inject a diversity generation mechanism into the initial cross-validation partition, from which new cross-validation partitions are generated and subsequent ensembles are trained. Then, we propose an algorithm to select the best partition, or corresponding ensemble. In Chapter 2, we formulate the partition selection as a Pareto-based multi-criteria optimization problem, together with an algorithm that makes the partition selection iterative, with the aim of further improving the ensemble prediction accuracy. In Chapter 3, we propose to generate multiple populations or partitions by injecting a diversity mechanism into the original dataset. Then, an algorithm is proposed to select the best partition among all partitions generated by the multiple populations. All methods designed and implemented in this thesis obtain encouraging and favorable results across different datasets against both state-of-the-art models and ensembles for regression. / Résumé: The objective is to provide techniques for improving the performance of the stacking algorithm, an ensemble method composed of simple, heterogeneous base models, through the integration of diversity generation and model selection and combination. In Chapter 1, we propose combining different subsets of the base models obtained at the first level. We introduce a mechanism for injecting diversity into the initial cross-validation partition, from which new cross-validation partitions are generated and the corresponding ensembles are trained. We then propose an algorithm for selecting the best partition. In Chapter 2, we formulate partition selection as a Pareto-based multi-objective optimization problem, together with an algorithm that applies the selection iteratively in order to further improve ensemble accuracy. In Chapter 3, we propose generating multiple populations by injecting a diversity mechanism into the original dataset; an algorithm is then proposed to select the best partition among all partitions produced by the multiple populations. These algorithms obtained encouraging results in comparisons with established models on several datasets.
|