Spelling suggestions: "subject:"feature value byelection"" "subject:"feature value dielection""
1 |
Public Health Surveillance in High-Dimensions with Supervised LearningJanuary 2010 (has links)
abstract: Public health surveillance is a special case of the general problem where counts (or rates) of events are monitored for changes. Modern data complements event counts with many additional measurements (such as geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge to detect a change that only occurs within a region, initially unspecified, defined by these covariates. Current methods are typically limited to spatial and/or temporal covariate information and often fail to use all the information available in modern data that can be paramount in unveiling these subtle changes. Additional complexities associated with modern health data that are often not accounted for by traditional methods include: covariates of mixed type, missing values, and high-order interactions among covariates. This work proposes a transform of public health surveillance to supervised learning, so that an appropriate learner can inherently address all the complexities described previously. At the same time, quantitative measures from the learner can be used to define signal criteria to detect changes in rates of events. A Feature Selection (FS) method is used to identify covariates that contribute to a model and to generate a signal. A measure of statistical significance is included to control false alarms. An alternative Percentile method identifies the specific cases that lead to changes using class probability estimates from tree-based ensembles. This second method is intended to be less computationally intensive and significantly simpler to implement. Finally, a third method labeled Rule-Based Feature Value Selection (RBFVS) is proposed for identifying the specific regions in high-dimensional space where the changes are occurring. Results on simulated examples are used to compare the FS method and the Percentile method. Note this work emphasizes the application of the proposed methods on public health surveillance. Nonetheless, these methods can easily be extended to a variety of applications where counts (or rates) of events are monitored for changes. Such problems commonly occur in domains such as manufacturing, economics, environmental systems, engineering, as well as in public health. / Dissertation/Thesis / Ph.D. Industrial Engineering 2010
|
2 |
Modeling Frameworks for Supply Chain AnalyticsJanuary 2012 (has links)
abstract: Supply chains are increasingly complex as companies branch out into newer products and markets. In many cases, multiple products with moderate differences in performance and price compete for the same unit of demand. Simultaneous occurrences of multiple scenarios (competitive, disruptive, regulatory, economic, etc.), coupled with business decisions (pricing, product introduction, etc.) can drastically change demand structures within a short period of time. Furthermore, product obsolescence and cannibalization are real concerns due to short product life cycles. Analytical tools that can handle this complexity are important to quantify the impact of business scenarios/decisions on supply chain performance. Traditional analysis methods struggle in this environment of large, complex datasets with hundreds of features becoming the norm in supply chains. We present an empirical analysis framework termed Scenario Trees that provides a novel representation for impulse and delayed scenario events and a direction for modeling multivariate constrained responses. Amongst potential learners, supervised learners and feature extraction strategies based on tree-based ensembles are employed to extract the most impactful scenarios and predict their outcome on metrics at different product hierarchies. These models are able to provide accurate predictions in modeling environments characterized by incomplete datasets due to product substitution, missing values, outliers, redundant features, mixed variables and nonlinear interaction effects. Graphical model summaries are generated to aid model understanding. Models in complex environments benefit from feature selection methods that extract non-redundant feature subsets from the data. Additional model simplification can be achieved by extracting specific levels/values that contribute to variable importance. We propose and evaluate new analytical methods to address this problem of feature value selection and study their comparative performance using simulated datasets. We show that supply chain surveillance can be structured as a feature value selection problem. For situations such as new product introduction, a bottom-up approach to scenario analysis is designed using an agent-based simulation and data mining framework. This simulation engine envelopes utility theory, discrete choice models and diffusion theory and acts as a test bed for enacting different business scenarios. We demonstrate the use of machine learning algorithms to analyze scenarios and generate graphical summaries to aid decision making. / Dissertation/Thesis / Ph.D. Industrial Engineering 2012
|
Page generated in 0.1081 seconds