51

Stochastic Stepwise Ensembles for Variable Selection

Xin, Lu 30 April 2009 (has links)
Ensemble methods such as AdaBoost, Bagging and Random Forest have attracted much attention in the statistical learning community in the last 15 years. Zhu and Chipman (2006) proposed the idea of using ensembles for variable selection. Their implementation used a parallel genetic algorithm (PGA). In this thesis, I propose a stochastic stepwise ensemble for variable selection, which improves upon PGA. Traditional stepwise regression (Efroymson 1960) combines forward and backward selection: one step of forward selection is followed by one step of backward selection. In the forward step, each variable not already included is added to the current model, one at a time, and the one that best improves the objective function is retained. In the backward step, each variable already included is deleted from the current model, one at a time, and the one whose removal best improves the objective function is discarded. The algorithm continues until no improvement can be made by either the forward or the backward step. Instead of adding or deleting one variable at a time, the Stochastic Stepwise Algorithm (STST) adds or deletes a group of variables at a time, where the group size is decided randomly. In traditional stepwise selection, the group size is one and every candidate variable is assessed. When the group size is larger than one, as is often the case for STST, the total number of variable groups can be quite large. Instead of evaluating all possible groups, only a few randomly selected groups are assessed and the best one is chosen. From a methodological point of view, the improvement of the STST ensemble over PGA comes from a more structured way of constructing the ensemble; this allows better control of the strength-diversity tradeoff established by Breiman (2001), a fundamental tradeoff for which PGA has no control mechanism. Empirically, the improvement is most prominent when a true variable in the model has a relatively small coefficient (relative to the other true variables); I show empirically that PGA has a much higher probability of missing that variable.
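As a rough illustration of how one STST step works, the sketch below implements a single forward step in Python. It is not the thesis's implementation: the BIC of an ordinary least squares fit is assumed as the objective function, and the group size and the number of candidate groups are drawn uniformly, purely for illustration.

```python
# Hedged sketch of one forward step of a stochastic stepwise (STST) search.
# The objective (BIC of an OLS fit) and the uniform choices of group size and
# number of candidate groups are illustrative assumptions, not the thesis's.
import numpy as np

def bic(X, y, cols):
    """BIC (up to a constant) of an OLS fit on the given column subset."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + ([X[:, sorted(cols)]] if cols else []))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return n * np.log(resid @ resid / n) + Z.shape[1] * np.log(n)

def stst_forward_step(X, y, current, n_groups=5, rng=None):
    """Assess a few randomly chosen groups of excluded variables and add the best
    group if it improves the objective; otherwise leave the model unchanged."""
    rng = np.random.default_rng(rng)
    current = set(current)
    excluded = [j for j in range(X.shape[1]) if j not in current]
    if not excluded:
        return current
    best_cols, best_obj = current, bic(X, y, current)
    for _ in range(n_groups):
        size = int(rng.integers(1, len(excluded) + 1))            # random group size
        group = rng.choice(excluded, size=size, replace=False)    # one random candidate group
        cand = current | set(int(j) for j in group)
        obj = bic(X, y, cand)
        if obj < best_obj:
            best_cols, best_obj = cand, obj
    return best_cols   # the full algorithm alternates this with an analogous backward step

# Toy usage: three informative predictors among ten.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.5, size=200)
print(stst_forward_step(X, y, current=set(), rng=1))
```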
52

Fundamental Limitations of Semi-Supervised Learning

Lu, Tyler (Tian) 30 April 2009 (has links)
The emergence of a new paradigm in machine learning known as semi-supervised learning (SSL) has brought benefits to many applications where labeled data is expensive to obtain. However, unlike supervised learning (SL), which enjoys a rich and deep theoretical foundation, semi-supervised learning, which uses additional unlabeled data for training, remains a theoretical mystery lacking a sound fundamental understanding. The purpose of this research thesis is to take a first step towards bridging this theory-practice gap. We focus on investigating the inherent limitations of the benefits SSL can provide over SL. We develop a framework under which one can analyze the potential benefits, as measured by the sample complexity of SSL. Our framework is utopian in the sense that an SSL algorithm trains on a labeled sample and an unlabeled distribution, as opposed to an unlabeled sample in the usual SSL model. Thus, any lower bound on the sample complexity of SSL in this model implies lower bounds in the usual model. Roughly, our conclusion is that unless the learner is absolutely certain there is some non-trivial relationship between labels and the unlabeled distribution (an "SSL type assumption"), SSL cannot provide significant advantages over SL. Technically speaking, we show that the sample complexity of SSL is no more than a constant factor better than that of SL for any unlabeled distribution, under a no-prior-knowledge setting (i.e., without SSL type assumptions). We prove that for the class of thresholds in the realizable setting the sample complexity of SL is at most twice that of SSL. Also, we prove that in the agnostic setting, for the classes of thresholds and unions of intervals, the sample complexity of SL is at most a constant factor larger than that of SSL. We conjecture this to be a general phenomenon applying to any hypothesis class. We also discuss issues regarding SSL type assumptions, and in particular the popular cluster assumption. We give examples showing that even in the most accommodating circumstances, learning under the cluster assumption can be hazardous and lead to prediction performance much worse than simply ignoring the unlabeled data and doing supervised learning. We conclude with a look into future research directions that build on our investigation.
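In notation assumed here for illustration (the thesis's own symbols may differ), writing m_SL and m_SSL for the sample complexities of supervised and semi-supervised learning, the two quantitative results above read roughly as follows.

```latex
% Notation assumed for illustration: m_SL and m_SSL denote the sample complexities of
% supervised and semi-supervised learning at accuracy/confidence parameters (epsilon, delta);
% C is an absolute constant.
\begin{align*}
  \text{thresholds, realizable setting:} \qquad
    & m_{\mathrm{SL}}(\epsilon,\delta) \;\le\; 2\, m_{\mathrm{SSL}}(\epsilon,\delta), \\
  \text{thresholds and unions of intervals, agnostic setting:} \qquad
    & m_{\mathrm{SL}}(\epsilon,\delta) \;\le\; C\, m_{\mathrm{SSL}}(\epsilon,\delta).
\end{align*}
```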
53

Secession and Survival: Nations, States and Violent Conflict

Siroky, David S. January 2009 (has links)
Secession is a watershed event not only for the new state that is created and the old state that is dissolved, but also for neighboring states, proximate ethno-political groups and major powers. This project examines the problem of violent secessionist conflict and addresses an important debate at the intersection of comparative and international politics about the conditions under which secession is a peaceful solution to ethnic conflict. It demonstrates that secession is rarely a solution to ethnic conflict, does not assure the protection of remaining minorities and produces new forms of violence. To explain why some secessions produce peace, while others generate violence, the project develops a theoretical model of the conditions that produce internally coherent, stable and peaceful post-secessionist states rather than recursive secession (i.e., secession from a new secessionist state) or interstate disputes between the rump and secessionist state. Theoretically, the analysis reveals a curvilinear relationship between ethno-territorial heterogeneity and conflict, explains disparate findings in the literature on ethnic conflict and conclusively links ethnic structure and violence. The project also contributes to the literature on secessionist violence, and civil war more generally, by linking intrastate and interstate causes, showing that what is frequently thought of as a domestic phenomenon is in fact mostly a phenomenon of international politics. Drawing upon original data, methodological advances at the interface of statistics, computer science and probability theory, and qualitative methods such as elite interviews and archival research, the project offers a comprehensive, comparative and contextual treatment of secession and violence. / Dissertation
54

Model-based Learning: t-Families, Variable Selection, and Parameter Estimation

Andrews, Jeffrey Lambert 27 August 2012 (has links)
The phrase model-based learning describes the use of mixture models in machine learning problems. This thesis focuses on a number of issues surrounding the use of mixture models in statistical learning tasks, including clustering, classification, discriminant analysis, variable selection, and parameter estimation. After motivating the importance of statistical learning via mixture models, five papers are presented. For ease of consumption, the papers are organized into three parts: mixtures of multivariate t-families, variable selection, and parameter estimation. / Natural Sciences and Engineering Research Council of Canada through a doctoral postgraduate scholarship.
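For readers unfamiliar with model-based clustering, the short sketch below shows the basic workflow on simulated data. scikit-learn provides only Gaussian mixtures, so GaussianMixture stands in for the multivariate t-family mixtures studied in the thesis, with the number of components chosen by BIC.

```python
# Hedged sketch of model-based clustering with a finite mixture model.
# GaussianMixture is a stand-in for the t-family mixtures of the thesis;
# the number of components (clusters) is selected by the lowest BIC.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)

fits = [GaussianMixture(n_components=g, random_state=0).fit(X) for g in range(1, 7)]
best = min(fits, key=lambda m: m.bic(X))   # model selection: lowest BIC wins
labels = best.predict(X)                   # MAP cluster memberships
print(best.n_components, np.bincount(labels))
```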
55

Contributions to statistical learning and its applications in personalized medicine

Valencia Arboleda, Carlos Felipe 16 May 2013 (has links)
This dissertation is, in general, about finding stable solutions to statistical models with a very large number of parameters and analyzing their asymptotic statistical properties. In particular, it centers on the study of regularization methods based on penalized estimation. Those procedures find an estimator that is the result of an optimization problem balancing the fit to the data against the plausibility of the estimate. The first chapter studies a smoothness regularization estimator for an infinite-dimensional parameter in an exponential family model with functional predictors. We focus on the reproducing kernel Hilbert space approach and show that, regardless of the generality of the method, minimax optimal convergence rates are achieved. In order to derive the asymptotic analysis of the estimator, we developed a simultaneous diagonalization tool for two positive definite operators: the kernel operator and the operator defined by the second Fréchet derivative of the expected data-fit functional. By using the proposed simultaneous diagonalization tool, sharper bounds on the minimax rates are obtained. The second chapter studies the statistical properties of the method of regularization using radial basis functions in the context of linear inverse problems. The regularization here serves two purposes: creating a stable solution for the inverse problem and preventing over-fitting in the nonparametric estimation of the functional target. Different degrees of ill-posedness in the inversion of the operator A are considered: mildly and severely ill-posed. We also study different types of radial basis kernels, classified by the strength of the penalization norm: Gaussian, multiquadric and spline kernels. The third chapter deals with the problem of the Individualized Treatment Rule (ITR) and analyzes its solution through discriminant analysis. In the ITR problem, the treatment assignment is made based on the particular patient's prognosis covariates in order to maximize some reward function. Data generated from a randomized clinical trial are considered. Maximizing the empirical value function is an NP-hard computational problem. We consider estimating the decision rule directly by maximizing the expected value, using a surrogate function to make the optimization problem computationally feasible (convex programming). Necessary and sufficient conditions for infinite sample consistency of the surrogate function are found for different scenarios: binary treatment selection, treatment selection with withholding, and multi-treatment selection.
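As a concrete (and much simplified) instance of the penalized-estimation form studied here, the sketch below fits a kernel ridge regression with a radial basis (RBF) kernel: squared-error loss plus an RKHS norm penalty. The thesis treats exponential family models with functional predictors, so this only illustrates the general fit-plus-penalty structure, and all parameter values are arbitrary illustrative choices.

```python
# Hedged illustration of penalized estimation in a reproducing kernel Hilbert space:
# kernel ridge regression with an RBF kernel, i.e. squared-error loss plus an RKHS
# norm penalty. Not the thesis's estimator; parameter values are illustrative only.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * np.pi * x[:, 0]) + rng.normal(scale=0.3, size=200)

# alpha weights the penalty, balancing fit to the data against smoothness
model = KernelRidge(kernel="rbf", gamma=15.0, alpha=1e-2).fit(x, y)
x_new = np.linspace(0, 1, 5).reshape(-1, 1)
print(model.predict(x_new))
```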
56

Inference Of Switching Networks By Using A Piecewise Linear Formulation

Akcay, Didem 01 December 2005 (has links) (PDF)
Inference of regulatory networks has received attention from researchers in many fields. The challenge offered by this problem is that it is a typical modeling problem under insufficient information about the process. Hence, we need to derive the a priori unavailable information from the empirical observations. Modeling by inference consists of selecting or defining the most appropriate model structure and inferring the parameters. An appropriate model structure should have the following properties. The model parameters should be inferable. Given the observations and the model class, all parameters used in the model should have a unique solution (restriction of the solution space). The forward model should be accurately computable (restriction of the solution space). The model should be capable of exhibiting the essential qualitative features of the system (limit of the restriction). The model should be relevant to the process (limit of the restriction). A piecewise linear formulation, described by a switching state transition matrix and a switching state transition vector, with a Boolean function indicating the switching conditions, is proposed for the inference of gene regulatory networks. This thesis mainly concerns using a formulation of switching networks that obeys all of the above-mentioned requirements and developing an inference algorithm for estimating the parameters of the formulation. The methodologies used or developed during this study are applicable to various fields of science and engineering.
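The sketch below simulates one possible instance of such a piecewise linear switching system: the state evolves as x(t+1) = A_s x(t) + b_s, where the regime index s is given by a Boolean function of the current state. The two-gene system, the threshold, and the matrices are illustrative assumptions, not the parameterization inferred in the thesis.

```python
# Hedged sketch of a piecewise linear switching system: x(t+1) = A_s x(t) + b_s,
# with the regime index s given by a Boolean function of the current state.
# The two-gene system, threshold and matrices are illustrative assumptions only.
import numpy as np

A = {0: np.array([[0.9, 0.0], [0.3, 0.8]]),   # dynamics when the switch is off
     1: np.array([[0.5, -0.4], [0.0, 0.9]])}  # dynamics when the switch is on
b = {0: np.array([0.1, 0.0]),
     1: np.array([0.0, 0.2])}

def switch(x, threshold=1.0):
    """Boolean switching condition: is gene 1 above its threshold?"""
    return int(x[0] > threshold)

def simulate(x0, steps=20):
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        s = switch(xs[-1])                  # evaluate the Boolean switching function
        xs.append(A[s] @ xs[-1] + b[s])     # apply that regime's linear update
    return np.array(xs)

print(simulate([2.0, 0.5])[:5])
```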
57

Mathematical Theories of Interaction with Oracles

Yang, Liu 01 October 2013 (has links)
No description available.
58

[en] A SPATIO-TEMPORAL MODEL FOR AVERAGE SPEED PREDICTION ON ROADS / [pt] UM MODELO ESPAÇO-TEMPORAL PARA A PREVISÃO DE VELOCIDADE MÉDIA EM ESTRADAS

PEDRO HENRIQUE FONSECA DA SILVA DINIZ 06 June 2016 (has links)
[en] Many factors may influence a vehicle's speed on a road, but two of them are usually observed by drivers: its location and the time of day. Obtaining a model that returns the average speed as a function of position and time is still a challenging task. Such models have applications in different scenarios, such as estimated time of arrival, shortest route paths, traffic prediction, and accident detection, to cite a few. This study proposes a prediction model based on a spatio-temporal partition and the mean/instantaneous speeds collected from historical GPS data. The main advantage of the proposed model is that it is very simple to compute. Moreover, experimental results obtained from fuel delivery trucks, over the whole of 2013 in Brazil, indicate that most of the observations can be predicted with this model within an acceptable error tolerance.
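A minimal sketch of this kind of spatio-temporal averaging model is given below, assuming a partition by road segment and hour of day; the column names, the partition, and the sample records are illustrative, not the thesis's actual choices.

```python
# Hedged sketch of the spatio-temporal averaging model: partition space (road segment)
# and time (hour of day), then predict the historical mean speed of the matching cell.
# Column names, the segment/hour partition and the sample records are assumptions.
import pandas as pd

gps = pd.DataFrame({
    "segment": ["BR-040:km12", "BR-040:km12", "BR-040:km12", "BR-116:km80"],
    "timestamp": pd.to_datetime(["2013-03-01 08:10", "2013-03-02 08:40",
                                 "2013-03-01 14:05", "2013-03-01 08:20"]),
    "speed_kmh": [52.0, 48.0, 71.0, 64.0],
})
gps["hour"] = gps["timestamp"].dt.hour

# The model itself: one historical mean speed per (segment, hour-of-day) cell.
lookup = gps.groupby(["segment", "hour"])["speed_kmh"].mean()

def predict(segment, when):
    """Predicted average speed for a segment at a given datetime (None if no history)."""
    key = (segment, pd.Timestamp(when).hour)
    return float(lookup[key]) if key in lookup.index else None

print(predict("BR-040:km12", "2013-06-15 08:30"))   # mean of the two 8 o'clock records: 50.0
```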
59

Alternative Methods via Random Forest to Identify Interactions in a General Framework and Variable Importance in the Context of Value-Added Models

January 2013 (has links)
This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students' test scores as outcome variables and teachers' contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that do not depend on a given model, by introducing two variable importance measures (VIMs): the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees of a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained from three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model is misspecified. The second study develops two novel interaction measures. These measures could be used within, but are not restricted to, the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions. / Dissertation/Thesis / Ph.D. Statistics 2013
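For context, the sketch below computes a standard variable importance measure from a random forest on simulated data. The node-proportion and covariate-proportion VIMs proposed in the thesis inspect the terminal-node configuration of each tree and are not implemented here; the familiar impurity-based importance is shown only as the kind of existing VIM such proposals are compared against.

```python
# Hedged sketch of a standard random-forest variable importance measure on simulated data.
# The thesis's node-proportion and covariate-proportion VIMs are not implemented here;
# this shows only the familiar impurity-based importance as a point of comparison.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] * X[:, 3] + rng.normal(scale=0.5, size=500)

forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
for j, imp in enumerate(forest.feature_importances_):
    print(f"x{j}: impurity-based importance = {imp:.3f}")
```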
60

Data-driven identification of endophenotypes of Alzheimer’s disease progression: implications for clinical trials and therapeutic interventions

Geifman, Nophar, Kennedy, Richard E., Schneider, Lon S., Buchan, Iain, Brinton, Roberta Diaz 15 January 2018 (has links)
Background: Given the complex and progressive nature of Alzheimer's disease (AD), a precision medicine approach to diagnosis and treatment requires the identification of patient subgroups with biomedically distinct and actionable phenotype definitions. Methods: Longitudinal patient-level data for 1160 AD patients receiving placebo or no treatment, with a follow-up of up to 18 months, were extracted from an integrated clinical trials dataset. We used latent class mixed modelling (LCMM) to identify patient subgroups demonstrating distinct patterns of change over time in disease severity, as measured by the Alzheimer's Disease Assessment Scale-cognitive subscale score. The optimal number of subgroups (classes) was selected as the model with the lowest Bayesian information criterion. Other patient-level variables were used to define the distinguishing characteristics of these subgroups and to investigate the interactions between patient characteristics and patterns of disease progression. Results: The LCMM resulted in three distinct subgroups of patients, with 10.3% in Class 1, 76.5% in Class 2 and 13.2% in Class 3. While all classes demonstrated some degree of cognitive decline, each demonstrated a different pattern of change in cognitive scores, potentially reflecting different subtypes of AD patients. Class 1 represents rapid decliners, with a steep decline in cognition over time, who tended to be younger and better educated. Class 2 represents slow decliners, while Class 3 represents severely impaired slow decliners: patients with a rate of decline similar to Class 2 but with worse baseline cognitive scores. Class 2 demonstrated a significantly higher proportion of patients with a history of statin use; Class 3 showed lower levels of blood monocytes and serum calcium, and higher blood glucose levels. Conclusions: Our results, 'learned' from clinical data, indicate the existence of at least three subgroups of Alzheimer's patients, each demonstrating a different trajectory of disease progression. This hypothesis-generating approach has detected distinct AD subgroups that may prove to be discrete endophenotypes linked to specific aetiologies. These findings could enable stratification within a clinical trial or study context, which may help identify new targets for intervention and guide better care.
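As a rough, simplified stand-in for the class-selection step (LCMM itself is typically fitted with dedicated software such as the R package lcmm), the sketch below reduces each simulated patient trajectory to an intercept and slope and selects the number of latent classes of a Gaussian mixture by the lowest BIC. This is not an LCMM fit, only an illustration of BIC-based selection of the number of classes.

```python
# Hedged, simplified stand-in for selecting the number of latent classes by BIC.
# Each simulated trajectory is reduced to an intercept and slope, and a Gaussian
# mixture over those summaries is chosen by the lowest BIC; this is not LCMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
months = np.array([0, 6, 12, 18])

def per_patient_summary(scores):
    """Least-squares intercept and slope of one patient's cognitive-score trajectory."""
    slope, intercept = np.polyfit(months, scores, deg=1)
    return intercept, slope

# Simulated trajectories: one slow-declining and one fast-declining group.
slow = 25 + 0.1 * months + rng.normal(scale=1.0, size=(80, 4))
fast = 30 + 0.6 * months + rng.normal(scale=1.0, size=(20, 4))
summaries = np.array([per_patient_summary(s) for s in np.vstack([slow, fast])])

fits = [GaussianMixture(n_components=k, random_state=0).fit(summaries) for k in range(1, 5)]
best = min(fits, key=lambda m: m.bic(summaries))
print("classes selected by BIC:", best.n_components)
```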
