241 |
The development of authentic virtual reality scenarios to measure individuals’ level of systems thinking skills and learning abilities
Dayarathna, Vidanelage L., 10 December 2021 (PDF)
This dissertation develops virtual reality modules to capture individuals’ learning abilities and systems thinking skills in dynamic environments. In the first chapter, an immersive queuing theory teaching module is developed using virtual reality technology. The objective of the study is to present systems engineering concepts in a more sophisticated environment and to measure students’ learning abilities. Furthermore, the study explores the performance gaps between male and female students in manufacturing systems concepts. To investigate gender bias in the performance of the developed VR module, three efficacy measures (simulation sickness questionnaire, systems usability scale, and presence questionnaire) and two effectiveness measures (NASA TLX assessment and post-motivation questionnaire) were used. The second and third chapters aim to assess individuals’ systems thinking skills when they engage in complex multidimensional problems. A modern complex system comprises many interrelated subsystems and various dynamic attributes. Understanding and handling large complex problems requires holistic critical thinkers in modern workplaces. Systems Thinking (ST) is an interdisciplinary domain that offers different ways to better understand the behavior and structure of a complex system. The developed scenario-based instrument measures students’ cognitive tendency for complexity, change, and interaction when making decisions in a turbulent environment. The proposed complex systems scenarios are developed based on an established systems thinking instrument that measures important aspects of systems thinking skills. The scenarios are built in a virtual environment that allows students to react to real-world situations and make decisions. The construct validity of the VR scenarios is assessed by comparing high systematic scores between the ST instrument and the developed VR scenarios.
Furthermore, the efficacy of the VR scenarios is investigated using the simulation sickness questionnaire, systems usability scale, presence questionnaire, and NASA TLX assessment.
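The queuing theory content of such a module rests on standard analytical results. As a hedged illustration (the module's actual scenarios are not specified in the abstract), the steady-state M/M/1 measures a student would derive can be computed as:

```python
# Steady-state measures for an M/M/1 queue (illustrative sketch only;
# the VR module's scenarios are not described at this level of detail).

def mm1_metrics(lam: float, mu: float) -> dict:
    """Classic M/M/1 formulas; requires utilization rho = lam/mu < 1."""
    if lam >= mu:
        raise ValueError("Unstable queue: arrival rate must be below service rate")
    rho = lam / mu                  # server utilization
    L = rho / (1 - rho)             # mean number in system
    Lq = rho ** 2 / (1 - rho)       # mean number waiting in queue
    W = 1 / (mu - lam)              # mean time in system (Little's law: L = lam * W)
    Wq = rho / (mu - lam)           # mean waiting time in queue
    return {"rho": rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq}

# Example: 4 arrivals/hour served at 5/hour
m = mm1_metrics(4.0, 5.0)
print(m)  # rho = 0.8, L = 4.0, W = 1.0 hour
```

Little's law (L = lam * W) ties the number-in-system and time-in-system measures together, which is the kind of relationship a simulation-based module lets students verify experimentally.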
|
242 |
Toward a Theory of Auto-modeling
Yiran Jiang (16632711), 25 July 2023
<p>Statistical modeling aims at constructing a mathematical model for an existing data set. As a comprehensive concept, statistical modeling leads to a wide range of interesting problems. Modern parametric models, such as deep nets, have achieved remarkable success in quite a few application areas with massive data. Although powerful in practice, many fitted over-parameterized models potentially lose good statistical properties. For this reason, a new framework named Auto-modeling (AM) is proposed. Philosophically, the mindset is to fit models to future observations rather than to the observed sample. Technically, after choosing an imputation model for generating future observations, we fit models to those future observations by optimizing an approximation to the desired expected loss function, based on its sample counterpart and what we call an adaptive <i>duality function</i>.</p>
<p>The first part of the dissertation (Chapters 2 to 7) focuses on the new philosophical perspective of the method, as well as the details of the main framework. Technical details, including essential theoretical properties of the method, are also investigated. We also demonstrate the superior performance of the proposed method via three applications: the many-normal-means problem, $n < p$ linear regression, and image classification.</p>
<p>The second part of the dissertation (Chapter 8) focuses on the application of the AM framework to the construction of linear regression models. Our primary objective is to shed light on the stability issue associated with the commonly used data-driven model selection methods such as cross-validation (CV). Furthermore, we highlight the philosophical distinctions between CV and AM. Theoretical properties and numerical examples presented in the study demonstrate the potential and promise of AM-based linear model selection. Additionally, we have devised a conformal prediction method specifically tailored for quantifying the uncertainty of AM predictions in the context of linear regression.</p>
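As a loose caricature of the fit-to-future-observations mindset (our own simplification for illustration; the actual AM procedure uses an adaptive duality function not reproduced here), one can impute "future" responses from a plug-in model and score candidate models against them:

```python
import random

random.seed(0)

# Loose illustration only, NOT the dissertation's AM algorithm: a plug-in fit
# plus resampled residuals serves as the imputation model for "future"
# responses, and a ridge penalty is chosen by the average loss of each
# candidate fit on those imputed future observations.

n = 50
x = [random.uniform(-1, 1) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 0.5) for xi in x]

def ridge_slope(x, y, lam):
    """Closed-form 1-D ridge slope (no intercept, for brevity)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)

b_plugin = ridge_slope(x, y, 0.0)                     # imputation model's slope
resid = [yi - b_plugin * xi for xi, yi in zip(x, y)]  # empirical residuals

def future_loss(lam, n_future=200):
    """Average squared loss of the lam-fit against imputed future samples."""
    b = ridge_slope(x, y, lam)
    total = 0.0
    for _ in range(n_future):
        y_star = [b_plugin * xi + random.choice(resid) for xi in x]
        total += sum((ys - b * xi) ** 2 for xi, ys in zip(x, y_star)) / n
    return total / n_future

candidates = [0.0, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=future_loss)
print(best_lam, round(b_plugin, 2))
```

With this naive squared-loss setup the selection is nearly indifferent among small penalties; the duality-function machinery in the dissertation is precisely what makes the full AM criterion non-trivial.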
|
243 |
Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics
Lyman, Noah J, 01 December 2020 (PDF)
Several regions of the Western United States utilize statistical binary classification models to predict and manage debris flow initiation probability after wildfires. As the occurrence of wildfires and high-intensity rainfall events has increased, so has the frequency with which development occurs in the steep, mountainous terrain where these events arise. This intersection brings an increasing need to derive improved results from existing models, or to develop new models, to reduce the economic and human impacts that debris flows may bring. Improvements or changes to these models could also ease the collection and processing of inputs and their implementation in new areas.
Generally, existing models rely on inputs that are functions of rainfall intensity, fire effects, terrain type, and surface characteristics. However, no variable in these models directly accounts for the shear stiffness of the soil. This property, considered with respect to the loading state of the sediment, informs the likelihood of particle dislocation, contractive or dilative volume changes, and the downslope movement that triggers debris flows. This study proposes incorporating shear wave velocity (in the form of slope-based thirty-meter shear wave velocity, Vs30) to account for this shear stiffness. As in seismic soil liquefaction analysis, shear stiffness is measured via shear wave velocity, the speed of a vertically propagating horizontal shear wave through sediment. This spatially mapped variable allows for broad coverage in the watersheds of interest. A logistic regression is then used to compare the new variable against what is currently used in predictive post-fire debris flow triggering models.
The resulting models indicated improvement in some measures of statistical utility, assessed through receiver operating characteristic (ROC) curves and threat score analysis, a method of ranking models based on true/false positive and negative results. However, the integration of Vs30 offers utility similar to current models on other metrics, suggesting that this input would benefit from further refinement. Suggestions are offered to improve the use of Vs30 through in-situ measurements of surface shear wave propagation, integrated into Vs30 datasets through a possible transfer function. Additional discussion of the input variables and their impact on the resulting models is also included.
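The comparison described above can be sketched as follows. The predictors, coefficients, and data here are synthetic placeholders (not the study's inputs); the point is only the mechanics of adding a stiffness-like feature to a binary triggering model and comparing ROC AUC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: rainfall intensity, burn severity, and a
# Vs30-like stiffness feature. Names and effect sizes are assumptions.
n = 400
X = rng.normal(size=(n, 3))
# Assumed truth: triggering odds rise with rainfall/burn severity, fall with stiffness
logit = 1.2 * X[:, 0] + 0.8 * X[:, 1] - 1.0 * X[:, 2]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression (no intercept, for brevity)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def auc(scores, y):
    """ROC AUC via the Mann-Whitney rank statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = y.sum(), len(y) - y.sum()
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

w_base = fit_logistic(X[:, :2], y)   # rainfall + burn severity only
w_full = fit_logistic(X, y)          # with the stiffness feature added
print(auc(X[:, :2] @ w_base, y), auc(X @ w_full, y))
```

In the study itself the improvement from Vs30 appeared on some metrics but not others; this sketch simply shows how such an AUC comparison is wired up.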
|
244 |
Regression Analysis for Ordinal Outcomes in Matched Study Design: Applications to Alzheimer's Disease Studies
Austin, Elizabeth, 09 July 2018 (PDF)
Alzheimer's Disease (AD) affected nearly 5.4 million Americans as of 2016 and is the most common form of dementia. The disease is characterized by the presence of neurofibrillary tangles and amyloid plaques [1]. The amount of plaque is measured post-mortem by Braak stage. It is known that AD is positively associated with hypercholesterolemia [16]. As statins are the most widely used cholesterol-lowering drugs, there may be associations between statin use and AD. We hypothesize that those who use statins, specifically lipophilic statins, are more likely to have a low Braak stage in post-mortem analysis.
In order to address this hypothesis, we wished to fit a regression model for ordinal outcomes (e.g., high, moderate, or low Braak stage) using data collected from the National Alzheimer's Coordinating Center (NACC) autopsy cohort. As the outcomes were matched on the length of follow-up, a conditional likelihood-based method is often used to estimate the regression coefficients. However, it can be challenging to solve the conditional likelihood-based estimating equation numerically, especially when there are many matching strata. Given that the likelihood of a conditional logistic regression model is equivalent to the partial likelihood of a stratified Cox proportional hazards model, the existing R function for a Cox model, coxph(), can be used to estimate a conditional logistic regression model. We investigate whether this strategy can be extended to regression models for ordinal outcomes.
More specifically, our aims are to (1) demonstrate the equivalence between the exact partial likelihood of a stratified discrete-time Cox proportional hazards model and the likelihood of a conditional logistic regression model; (2) prove equivalence, or lack thereof, between the exact partial likelihood of a stratified discrete-time Cox proportional hazards model and the conditional likelihood of models appropriate for multiple ordinal outcomes: an adjacent-categories model, a continuation-ratio model, and a cumulative logit model; and (3) clarify how to set up a stratified discrete-time Cox proportional hazards model for multiple ordinal outcomes with matching using the existing coxph() R function, and how to interpret the resulting regression coefficient estimates. We verified the theoretical results through simulation studies. We simulated data from the three models of interest: an adjacent-categories model, a continuation-ratio model, and a cumulative logit model. We fit a Cox model using the existing coxph() R function to the simulated data produced by each model and compared the coefficient estimates obtained. Lastly, we fit a Cox model to the NACC dataset, using Braak stage, with three ordinal categories, as the outcome variable. We included predictors for age at death, sex, genotype, education, comorbidities, number of days having taken lipophilic statins, number of days having taken hydrophilic statins, and time to death. We matched cases to controls on the length of follow-up. All findings and their implications are discussed in detail.
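The equivalence underpinning the coxph() strategy can be checked numerically for the simplest case of 1:1 matched binary outcomes. The sketch below (in Python rather than R, with toy covariate values) confirms that each matched stratum contributes an identical likelihood term under the two formulations:

```python
import math

# Toy numeric check of the equivalence exploited above: for 1:1 matched
# binary data, each stratum's conditional logistic likelihood contribution
# equals the Cox partial likelihood contribution when the case is the single
# "event" in its stratum. Illustrative sketch, not the NACC analysis.

beta = 0.7
pairs = [(1.2, 0.3), (0.5, 0.9), (2.0, -0.4)]  # (case covariate, control covariate)

def clogit_contrib(beta, x_case, x_control):
    """Conditional logistic term: P(the case is the member with the outcome)."""
    return math.exp(beta * x_case) / (math.exp(beta * x_case) + math.exp(beta * x_control))

def cox_contrib(beta, x_event, risk_set):
    """Cox partial likelihood term: event's hazard over the risk set's total."""
    return math.exp(beta * x_event) / sum(math.exp(beta * x) for x in risk_set)

for x1, x0 in pairs:
    assert abs(clogit_contrib(beta, x1, x0) - cox_contrib(beta, x1, [x1, x0])) < 1e-12
print("conditional logistic and stratified Cox contributions agree")
```

This term-by-term agreement is why a stratified Cox fit (one stratum per matched set) reproduces conditional logistic regression estimates; the dissertation's aims extend the same question to the ordinal-outcome models.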
|
245 |
Proteomics and Machine Learning for Pulmonary Embolism Risk with Protein Markers
Awuah, Yaa Amankwah, 01 December 2023 (PDF)
This thesis investigates protein markers linked to pulmonary embolism risk using proteomics and statistical methods, employing unsupervised and supervised machine learning techniques. The research analyzes existing datasets, identifies significant features, and observes gender differences through MANOVA. Principal Component Analysis reduces variables from 378 to 59, and Random Forest achieves 70% accuracy. These findings contribute to our understanding of pulmonary embolism and may lead to diagnostic biomarkers. MANOVA reveals significant gender differences, and applying proteomics holds promise for clinical practice and research.
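The variable-reduction step can be sketched as follows. The 378-protein matrix here is simulated with an assumed low-rank structure, so the retained component count will differ from the study's 59:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the dimensionality-reduction step: PCA via SVD keeps the leading
# components of a (samples x proteins) matrix. The data below are simulated
# with assumed low-rank structure; the thesis dataset is not reproduced here.
n_samples, n_proteins, true_rank = 120, 378, 20
latent = rng.normal(size=(n_samples, true_rank))
loadings = rng.normal(size=(true_rank, n_proteins))
X = latent @ loadings + 0.3 * rng.normal(size=(n_samples, n_proteins))

Xc = X - X.mean(axis=0)                       # center each protein column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)           # variance share per component
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)  # components for 95%
scores = Xc @ Vt[:k].T                        # reduced representation
print(f"reduced from {n_proteins} variables to {k} components")
```

The reduced `scores` matrix is what a downstream classifier (such as the Random Forest mentioned above) would be trained on.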
|
246 |
GENERAL-PURPOSE STATISTICAL INFERENCE WITH DIFFERENTIAL PRIVACY GUARANTEES
Zhanyu Wang (13893375), 06 December 2023
<p dir="ltr">Differential privacy (DP) uses a probabilistic framework to measure the level of privacy protection of a mechanism that releases data analysis results to the public. Although DP is widely used by both government and industry, there is still a lack of research on statistical inference under DP guarantees. On the one hand, existing DP mechanisms mainly aim to extract dataset-level information rather than population-level information. On the other hand, DP mechanisms introduce calibrated noise into the released statistics, which often results in sampling distributions more complex and intractable than the non-private ones. This dissertation aims to provide general-purpose methods for statistical inference, such as confidence intervals (CIs) and hypothesis tests (HTs), that satisfy the DP guarantees.</p><p dir="ltr">In the first part of the dissertation, we examine a DP bootstrap procedure that releases multiple private bootstrap estimates to construct DP CIs. We present new DP guarantees for this procedure and propose to use deconvolution with the DP bootstrap estimates to derive CIs for inference tasks such as the population mean, logistic regression, and quantile regression. Our method achieves the nominal coverage level in both simulations and real-world experiments and offers the first approach to private inference for quantile regression.</p><p dir="ltr">In the second part of the dissertation, we propose to use the simulation-based "repro sample" approach to produce CIs and HTs based on DP statistics. Our methodology has finite-sample guarantees and can be applied to a wide variety of private inference problems. It appropriately accounts for biases introduced by DP mechanisms (such as clamping) and improves over other state-of-the-art inference methods in terms of the coverage and type I error of the private inference.
</p><p dir="ltr">In the third part of the dissertation, we design a debiased parametric bootstrap framework for DP statistical inference. We propose the adaptive indirect estimator, a novel simulation-based estimator that is consistent and corrects the clamping bias in the DP mechanisms. We also prove that our estimator has the optimal asymptotic variance among all well-behaved consistent estimators, and the parametric bootstrap results based on our estimator are consistent. Simulation studies show that our framework produces valid DP CIs and HTs in finite sample settings, and it is more efficient than other state-of-the-art methods.</p>
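A minimal sketch of the kind of release such inference must work with: an epsilon-DP clamped-mean release via the Laplace mechanism. Parameters are illustrative; this is the basic building block, not the dissertation's bootstrap or repro-sample procedure:

```python
import math
import random

random.seed(42)

# Basic building block of DP releases: add noise calibrated to the query's
# sensitivity. Clamping bounds the sensitivity but introduces bias, which is
# exactly the kind of bias the inference methods above must account for.

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) by inverting the CDF of a uniform variate."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_mean(data, lower, upper, epsilon):
    """epsilon-DP release of a clamped mean; sensitivity = (upper - lower) / n."""
    clamped = [min(max(x, lower), upper) for x in data]  # clamping step
    sensitivity = (upper - lower) / len(clamped)
    return sum(clamped) / len(clamped) + laplace_noise(sensitivity / epsilon)

data = [random.gauss(10, 2) for _ in range(1000)]
print(dp_mean(data, 0, 20, epsilon=1.0))  # private estimate near 10
```

Repeated calls produce different noisy releases; a naive CI built from one release ignores both the injected noise and the clamping bias, which motivates the deconvolution and repro-sample constructions described above.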
|
247 |
Fitting Statistical Models with Multiphase Mean Structures for Longitudinal Data
Bishop, Brenden, 13 August 2015
No description available.
|
248 |
Using Oxygen Depletion and Chlorophyll-a as Proxies for Estimates of Cyanobacteria Blooms to Create Predictive Lake Erie Hazardous Algae Bloom Models
Jaffee, Brian Alexander, 23 July 2015
No description available.
|
249 |
Statistical Improvements for Ecological Learning about Spatial Processes
Dupont, Gaetan L, 20 October 2021 (PDF)
Ecological inquiry is rooted fundamentally in understanding population abundance, both to develop theory and to improve conservation outcomes. Despite this importance, estimating abundance is difficult due to the imperfect detection of individuals in a sampled population. Further, accounting for space can provide more biologically realistic inference, shifting the focus from abundance to density and encouraging the exploration of spatial processes. To address these challenges, Spatial Capture-Recapture (“SCR”) has emerged as the most prominent method for estimating density reliably. The SCR model is conceptually straightforward: it combines a spatial model of detection with a point process model of the spatial distribution of individuals, using data collected on individuals within a spatially referenced sampling design. These data are often coarse in spatial and temporal resolution, though, motivating research into improving the quality of the data available for analysis. Here I explore two related approaches to improving inference from SCR: sampling design and data integration. Chapter 1 describes the context of this thesis in more detail. Chapter 2 presents a framework for improving SCR sampling design through the development of an algorithmic optimization approach. Compared to pre-existing recommendations, these optimized designs perform just as well but offer far more flexibility to accommodate available resources and challenging sampling scenarios. Chapter 3 presents one of the first methods of integrating an explicit movement model into the SCR model using telemetry data, which provide information at a much finer spatial scale. The integrated model shows significant improvements over the standard model for a specific inferential objective, in this case the estimation of landscape connectivity. In Chapter 4, I close by offering two broader conclusions about developing statistical methods for ecological inference.
First, simulation-based evaluation is integral to this process, but the circularity of its use is, unfortunately, easy to understate. Second, and often underappreciated: statistical solutions should be as intuitive as possible to facilitate their adoption by a diverse pool of potential users. These novel approaches to sampling design and data integration represent essential steps in advancing SCR and offer intuitive opportunities to advance ecological learning about spatial processes.
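The detection component of a basic SCR model can be sketched with a half-normal detection function; the parameter values below are assumptions for illustration, not estimates from the thesis:

```python
import math

# Sketch of the SCR detection model: a half-normal function links an
# individual's activity center to its detection probability at each trap.
# p0 (baseline detection) and sigma (spatial scale) are illustrative values.

def half_normal_detection(d: float, p0: float = 0.8, sigma: float = 1.5) -> float:
    """P(detection) for a trap at distance d from the activity center."""
    return p0 * math.exp(-d ** 2 / (2 * sigma ** 2))

def expected_captures(center, traps, n_occasions, p0=0.8, sigma=1.5):
    """Expected capture count across a trap array over repeated occasions."""
    total = 0.0
    for trap in traps:
        d = math.dist(center, trap)
        total += n_occasions * half_normal_detection(d, p0, sigma)
    return total

traps = [(x, y) for x in range(5) for y in range(5)]   # 5x5 trap grid
print(expected_captures((2.0, 2.0), traps, n_occasions=4))
```

Design optimization (Chapter 2's topic) amounts to choosing trap locations so that quantities like this expected capture count are informative across plausible activity centers; the detection-function parameters themselves are what SCR estimates from data.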
|
250 |
Explorations into Machine Learning Techniques for Precipitation Nowcasting
Nagarajan, Aditya, 24 March 2017 (PDF)
Recent advances in cloud-based big-data technologies now make data-driven solutions feasible for a growing number of scientific computing applications. One such approach is machine learning, in which patterns in large data sets are surfaced by finding complex mathematical relationships within the data. Nowcasting, or short-term prediction of rainfall in a given region, is an important problem in meteorology. In this thesis we explore the nowcasting problem through a data-driven approach by formulating it as a machine learning problem.
State-of-the-art nowcasting systems today are based on numerical models which describe the physical processes leading to precipitation or on weather radar extrapolation techniques that predict future radar precipitation maps by advecting from a sequence of past maps. These techniques, while they can perform well over very short prediction horizons (minutes) or very long horizons (hours to days), tend not to perform well over medium horizons (1-2 hours) due to lack of input data at the necessary spatial and temporal scales for the numerical prediction methods or due to the inability of radar extrapolation methods to predict storm growth and decay. Given that water must first concentrate in the atmosphere as water vapor before it can fall to the ground as rain, one goal of this thesis is to understand if water vapor information can improve radar extrapolation techniques by giving the information needed to infer growth and decay. To do so, we use the GPS-Meteorology technique to measure the water vapor in the atmosphere and weather radar reflectivity to measure rainfall. By training a machine learning nowcasting algorithm using both variables and comparing its performance against a nowcasting algorithm trained on reflectivity alone, we draw conclusions as to the predictive power of adding water vapor information.
Another goal of this thesis is to compare different machine learning techniques: the random forest ensemble learning technique, which has shown success on a number of other weather prediction problems, and the current state-of-the-art machine learning technique for images and image sequences, the convolutional neural network (CNN). We compare these in terms of problem representation, training complexity, and nowcasting performance.
A final goal is to compare the nowcasting performance of our machine learning techniques against published results for current state-of-the-art model based nowcasting techniques.
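A deliberately minimal stand-in for the radar-extrapolation baselines discussed above (real systems estimate dense motion fields; this sketch assumes a single whole-field displacement between frames):

```python
import numpy as np

# Toy radar-extrapolation baseline: given two past reflectivity frames,
# estimate one integer displacement and advect the latest frame forward.
# The grid, blob, and motion are synthetic; real nowcasting operates on
# actual radar maps with dense, spatially varying motion.

def estimate_shift(prev_frame, curr_frame, max_shift=5):
    """Find the integer (dy, dx) shift minimizing mean squared difference."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean((np.roll(prev_frame, (dy, dx), axis=(0, 1)) - curr_frame) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def nowcast(curr_frame, shift):
    """Persistence-plus-advection forecast: move the last frame by the shift."""
    return np.roll(curr_frame, shift, axis=(0, 1))

# Synthetic storm cell drifting 2 pixels east per frame
grid = np.zeros((32, 32))
grid[10:14, 5:9] = 40.0                        # dBZ-like reflectivity blob
frames = [np.roll(grid, (0, 2 * t), axis=(0, 1)) for t in range(3)]

shift = estimate_shift(frames[0], frames[1])   # recovers the drift
forecast = nowcast(frames[1], shift)           # matches the next frame exactly here
print(shift)
```

Pure advection like this cannot represent storm growth and decay, which is precisely the gap the thesis probes by adding GPS-derived water vapor information as a predictor.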
|