271.
A tale of two applications: closed-loop quality control for 3D printing, and multiple imputation and the bootstrap for the analysis of big data with missingness. Wenbin Zhu (12226001), 20 April 2022
1. A Closed-Loop Machine Learning and Compensation Framework for Geometric Accuracy Control of 3D Printed Products

Additive manufacturing (AM) systems enable direct printing of three-dimensional (3D) physical products from computer-aided design (CAD) models. Despite the many advantages that AM systems have over traditional manufacturing, one significant limitation that impedes their wide adoption is geometric inaccuracy: shape deviations between the printed product and the nominal CAD model. Machine learning for shape deviations can enable geometric accuracy control of 3D printed products via the generation of compensation plans, which are modifications of CAD models, informed by the machine learning algorithm, that reduce deviations in expectation. However, existing machine learning and compensation frameworks cannot accommodate deviations of fully 3D shapes with different geometries. The feasibility of existing frameworks for geometric accuracy control is further limited by resource constraints in AM systems that prevent the printing of multiple copies of new shapes.

We present a closed-loop machine learning and compensation framework that can improve geometric accuracy control of 3D shapes in AM systems. Our framework is based on a Bayesian extreme learning machine (BELM) architecture that leverages data and deviation models from previously printed products to transfer deviation models, and more accurately capture deviation patterns, for new 3D products. The closed-loop nature of compensation under our framework, in which past compensated products that do not adequately meet dimensional specifications are fed into the BELMs to re-learn the deviation model, enables the identification of effective compensation plans and satisfies resource constraints by printing only one new shape at a time. The power and cost-effectiveness of our framework are demonstrated with two validation experiments that involve different geometries for a Markforged Metal X AM machine printing 17-4 PH stainless steel products. As demonstrated in our case studies, our framework can reduce shape inaccuracies by 30% to 60% (depending on a shape's geometric complexity) in at most two iterations, with three training shapes and one or two test shapes for a specific geometry involved across the iterations. We also perform an additional validation experiment using a third geometry to establish the capabilities of our framework for prospective shape deviation prediction of 3D shapes that have never been printed before. This third experiment indicates that choosing one suitable class of past products for prospective prediction and model transfer, instead of including all past printed products with different geometries, could be sufficient for obtaining deviation models with good predictive performance. Ultimately, our closed-loop machine learning and compensation framework provides an important step towards accurate and cost-efficient deviation modeling and compensation for fully 3D printed products using a minimal number of printed training and test shapes, and thereby can advance AM as a high-quality manufacturing paradigm.

2. Multiple Imputation and the Bootstrap for the Analysis of Big Data with Missingness

Inference can be a challenging task for Big Data.
Two significant issues are that Big Data frequently exhibit complicated missing data patterns, and that the complex statistical models and machine learning algorithms typically used to analyze Big Data do not admit convenient uncertainty quantification for their estimators. These two difficulties have previously been addressed using multiple imputation and the bootstrap, respectively. However, it is not clear how multiple imputation and bootstrap procedures can be effectively combined to perform statistical inferences on Big Data with missing values. We investigate a practical framework for the combination of multiple imputation and bootstrap methods. Our framework is based on two principles: distribution of multiple imputation and bootstrap calculations across parallel computational cores, and quantification of the sources of variability involved in bootstrap procedures that use subsampling techniques via random effects or hierarchical models. This framework effectively extends the scope of existing methods for multiple imputation and the bootstrap to a broad range of Big Data settings. We perform simulation studies for linear and logistic regression across Big Data settings with different rates of missingness to characterize the frequentist properties and computational efficiencies of combinations of multiple imputation and the bootstrap. We further illustrate how effective combinations of multiple imputation and the bootstrap for Big Data analyses can be identified in practice by means of both the simulation studies and a case study on COVID-19 infection status data. Ultimately, our investigation demonstrates how the flexible combination of multiple imputation and the bootstrap under our framework can enable valid statistical inferences in an effective manner for Big Data with missingness.
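As a rough, illustrative sketch of combining multiple imputation with the bootstrap (not the dissertation's implementation), the following Python example runs a bootstrap-then-impute procedure for a regression coefficient. The mean-plus-noise imputer, the estimand, and the data are placeholder assumptions; the independent replicates are the part that would be distributed across parallel cores.

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_once(X, y, rng):
    """One stochastic imputation: fill missing entries by column mean plus Gaussian noise."""
    Xi = X.copy()
    for j in range(Xi.shape[1]):
        miss = np.isnan(Xi[:, j])
        if miss.any():
            obs = Xi[~miss, j]
            Xi[miss, j] = rng.normal(obs.mean(), obs.std(), miss.sum())
    return Xi, y

def ols_slope(X, y):
    """Coefficient of the first covariate from an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0][1]

def bootstrap_then_impute(X, y, B=200, M=5):
    """Bootstrap resample rows, impute each resample M times, pool, and summarize."""
    n = len(y)
    estimates = []
    for _ in range(B):                     # replicates are independent -> parallelizable
        idx = rng.integers(0, n, n)        # nonparametric bootstrap resample
        pooled = np.mean([ols_slope(*impute_once(X[idx], y[idx], rng)) for _ in range(M)])
        estimates.append(pooled)
    return np.mean(estimates), np.percentile(estimates, [2.5, 97.5])

# Toy data: true slope 2.0, ~20% of the first covariate missing completely at random
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)
X[rng.random(n) < 0.2, 0] = np.nan
print(bootstrap_then_impute(X, y))
```

The impute-then-bootstrap ordering and subsampling variants are other combinations the framework above would cover; this sketch shows only one of them.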
272.
Statistical practice in preclinical neurosciences: Implications for successful translation of research evidence from humans to animals. Hogue, Olivia, 23 May 2022
No description available.
273.
The Structural Basis for the Interdependence of Drug Resistance in the HIV-1 Protease. Ragland, Debra A., 13 December 2016
The human immunodeficiency virus type 1 (HIV-1) protease (PR) is a critical drug target, as it is responsible for virion maturation. Mutations within the active site (1°) of the PR directly interfere with inhibitor binding, while mutations distal to the active site (2°) arise to restore enzymatic fitness. The severity of resistance is not directly proportional to the number of mutations, suggesting that resistance is not simply additive but interdependent. The interdependency between primary and secondary mutations in driving protease inhibitor (PI) resistance is grossly understudied.
To structurally and dynamically characterize the direct role of secondary mutations in drug resistance, I selected a panel of single-site mutant protease crystal structures complexed with the PI darunavir (DRV). From these studies, I developed a network hypothesis that explains how mutations outside the active site are able to perpetuate changes to the active site of the protease to disrupt inhibitor binding.
I then expanded the panel to include highly mutated multi-drug resistant variants. To elucidate the interdependency between primary and secondary mutations, I used statistical and machine-learning techniques to determine which specific mutations underlie the perturbations of key inter-molecular interactions. From these studies, I have determined that mutations distal to the active site are able to perturb the global PR hydrogen bonding patterns, while primary and secondary mutations cooperatively perturb hydrophobic contacts between the PR and DRV. Discerning and exploiting the mechanisms that underlie drug resistance in viral targets could proactively improve both current treatments and inhibitor design for HIV-1 targets.
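The abstract does not specify which statistical and machine-learning techniques were used. As one hypothetical illustration of relating mutations to perturbed interactions, a tree-ensemble importance ranking over binary mutation indicators might look like the sketch below; the mutation labels, the perturbation measure, and the data are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Invented panel: rows are protease variants, columns indicate presence of a mutation
mutations = ["V32I", "I47V", "I50V", "I54L", "L76V", "V82F", "I84V", "L10I", "A71V"]
X = rng.integers(0, 2, size=(60, len(mutations)))

# Placeholder perturbation measure, e.g. change in PR-DRV hydrogen-bond occupancy
hbond_change = X @ rng.normal(0, 0.3, len(mutations)) + rng.normal(0, 0.1, 60)

# Rank mutations by how strongly they predict the perturbation
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, hbond_change)
ranking = sorted(zip(mutations, model.feature_importances_), key=lambda t: -t[1])
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```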
274.
SVD-BAYES: A SINGULAR VALUE DECOMPOSITION-BASED APPROACH UNDER BAYESIAN FRAMEWORK FOR INDIRECT ESTIMATION OF AGE-SPECIFIC FERTILITY AND MORTALITY. Chu, Yue, January 2020
No description available.
275.
The Development of the Fundamental Concepts in Applied Statistics Test and Validation of Its Use. Mauck, Susan Anderson, 21 June 2019
No description available.
276.
Spatial and Temporal Correlations of Freeway Link Speeds: An Empirical Study. Rachtan, Piotr J, 01 January 2012
Congestion on roadways and a high level of uncertainty in traffic conditions are major considerations for trip planning. The purpose of this research is to investigate the characteristics and patterns of spatial and temporal correlations and to detect other variables that affect correlation in a freeway setting. Five-minute speed aggregates from the Performance Measurement System (PeMS) database are obtained for two directions of an urban freeway, I-10 between Santa Monica and Los Angeles, California. Observations cover all non-holiday weekdays between January 1st and June 30th, 2010. Other variables include traffic flow, ramp locations, number of lanes, and the level of congestion at each detector station. A weighted least squares multilinear regression model is fitted to the data; the dependent variable is the Fisher Z transform of the correlation coefficient.
Estimated coefficients of the general regression model indicate that increasing spatial and temporal distances reduces correlations. The positive parameter of the spatial and temporal distance interaction term shows that this reduction rate diminishes with spatial or temporal distance. Higher congestion tends to retain a higher expected value of correlation; corrections to the model due to variations in road geometry tend to be minor. The general model provides a framework for building a family of more responsive and better-fitting models for a 6.5-mile segment of the freeway during three times of day: morning, midday, and afternoon.
Each model is cross-validated at two locations: the opposite direction of the freeway, and a different location in the direction used for estimation. Cross-validation results show that the models retain 75% or more of their original predictive capability on independent samples. Incorporating predictor variables that describe road geometry and traffic conditions into the model helps capture a significant portion of the variance of the response. The developed regression models are thus transferable and can be used to predict correlations at other freeway locations.
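A minimal sketch of the kind of model described above, fitting weighted least squares to the Fisher Z transform of simulated link-speed correlations; the predictors, weights, and data below are illustrative assumptions, not the study's exact specification.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000

spatial = rng.uniform(0.1, 6.5, n)      # miles between detector stations (assumed range)
temporal = rng.uniform(0, 60, n)        # minutes between 5-minute speed aggregates
congested = rng.integers(0, 2, n)       # 1 if the station pair is in a congested regime
pairs = rng.integers(80, 300, n)        # observation pairs behind each correlation

# Simulated correlations that decay with distance and lag, less so under congestion
r = np.clip(0.8 - 0.08 * spatial - 0.006 * temporal + 0.002 * spatial * temporal
            + 0.1 * congested + rng.normal(0, 0.05, n), -0.99, 0.99)
z = np.arctanh(r)                       # Fisher Z transform of the correlation coefficient

X = sm.add_constant(np.column_stack([spatial, temporal, spatial * temporal, congested]))
fit = sm.WLS(z, X, weights=pairs - 3).fit()   # Var(z) ~ 1/(pairs - 3), so weight by pairs - 3
print(fit.params)   # expect negative distance/lag terms and a positive interaction term
```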
277.
Defining viable solar resource locations in the Southeast United States using the satellite-based GLASS product. Kavanagh, Jolie, 09 August 2022
This research uses satellite data and moment statistics to determine whether solar farms can be placed in the Southeast US. Data from 2001-2019 are analyzed in reference to the Southwest US, where solar farms are already located. The need for clean energy is growing; therefore, locations beyond arid environments must be considered. The Southeast US is the main location of interest due to its warm, moist environment throughout the year. This research uses the Global Land Surface Satellite (GLASS) photosynthetically active radiation (PAR) product to determine viable locations for solar panels. A probability density function (PDF), along with the moment statistics, is used to determine statistical thresholds from solar farms in the Southwest US. For the Southeast US, three major locations were determined to be viable options: the Mississippi Delta, Northwest Florida, and southwestern Alabama. This research shows that solar farms can be efficient in areas with more convective cloud cover, such as the Southeast US.
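A hedged sketch of the screening idea described above: compute the four moment statistics of a candidate site's PAR series and compare them against thresholds derived from Southwest US solar-farm sites. The thresholds and the PAR series here are placeholders, not values from the GLASS analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Stand-in for a 2001-2019 daily GLASS PAR series at one candidate Southeast site
par_series = rng.gamma(shape=8.0, scale=12.0, size=19 * 365)

moments = {
    "mean": np.mean(par_series),
    "variance": np.var(par_series, ddof=1),
    "skewness": stats.skew(par_series),
    "kurtosis": stats.kurtosis(par_series),   # excess kurtosis
}

# Placeholder (lower, upper) bounds; real thresholds would come from Southwest farm PDFs
thresholds = {"mean": (85, None), "variance": (None, 1300),
              "skewness": (-0.5, 0.5), "kurtosis": (None, 1.0)}

def within(value, bounds):
    lo, hi = bounds
    return (lo is None or value >= lo) and (hi is None or value <= hi)

viable = all(within(moments[k], thresholds[k]) for k in thresholds)
print(moments)
print("viable candidate" if viable else "not viable")
```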
278.
Impact of climate oscillations/indices on hydrological variables in the Mississippi River Valley Alluvial Aquifer. Raju, Meena, 13 May 2022
The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between these variables, and to evaluate the influence of global climate indices on them. The non-parametric modified Mann-Kendall (MMK) and Pettitt's tests were used to analyze trends and change points. Pearson correlation coefficient (PCC) and streamflow elasticity analyses were used to examine the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and multiple linear regression (MLR) analyses were used to evaluate the relationship between climate indices and hydrological variables and the combined effect of climate indices on hydrological variables. The trend analysis indicated spatial variability within the aquifer: streamflow and rainfall increased in the northern region of the aquifer, while decreases were observed in the southern region. Change point analysis of annual maximum streamflow, annual mean streamflow, and annual precipitation revealed statistically significant decreasing shifts in 2001, 1998, and 1995, respectively. PCC analysis indicated that streamflow and rainfall have a strong positive relationship, with PCC values above 0.6 at most locations within the basin. Streamflow elasticity ranged from 0.987 to 2.33 across locations in the basin. PCC analysis for monthly maximum and mean streamflow showed the largest significant positive correlation coefficients for Niño 3.4. Monthly maximum rainfall showed its largest significant positive correlation coefficients for the PNA and Niño 3.4 indices, and monthly mean rainfall showed a largest significant positive correlation coefficient of 0.18 for Niño 3.4. MLR analysis showed a largest significant positive correlation coefficient of 0.31 for monthly maximum and mean streamflow, and coefficients of 0.21 and 0.23 for monthly maximum and mean rainfall, respectively. Overall, results from this research will help in understanding the impacts of global climate indices on rainfall, and subsequently on streamflow discharge, to better manage water resource availability in the MRVAA underlying the Lower Mississippi River Basin (LMRB).
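As an illustration of the trend and sensitivity analyses mentioned above, the sketch below implements a plain Mann-Kendall test (the study uses the modified MK, which additionally corrects for serial correlation) and a median-based streamflow elasticity estimate on placeholder annual series.

```python
import numpy as np
from scipy import stats

def mann_kendall(x):
    """Plain MK trend test: S statistic and two-sided normal-approximation p-value (no tie correction)."""
    x = np.asarray(x, float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return s, 2 * (1 - stats.norm.cdf(abs(z)))

def streamflow_elasticity(q, p):
    """Median-based elasticity of annual streamflow q with respect to annual rainfall p."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.median((q - q.mean()) / (p - p.mean()) * p.mean() / q.mean())

rng = np.random.default_rng(4)
years = np.arange(1980, 2020)
rain = 1200 + rng.normal(0, 120, len(years))                               # placeholder mm/yr
flow = 0.4 * rain - 0.5 * (years - 1980) + rng.normal(0, 30, len(years))   # slight decline

print("MK S and p-value for flow:", mann_kendall(flow))
print("streamflow elasticity:", round(streamflow_elasticity(flow, rain), 2))
```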
279.
Evaluating soil health changes following cover crop and no-till integration into a soybean (Glycine max) cropping system in the Mississippi Alluvial Valley. Firth, Alexandra Gwin, 13 May 2022
The transition of natural landscapes to intensive agricultural uses has resulted in severe loss of soil organic carbon (SOC), increased CO₂ emissions, river depletion, and groundwater overdraft. Despite the documented negative effects of agricultural land use (i.e., soil erosion, nutrient runoff) on critical natural resources (i.e., water, soil), food production must increase to meet the demands of a rising human population. Given the environmental and agricultural productivity concerns of intensely managed soils, it is critical to implement conservation practices that mitigate the negative effects of crop production and enhance environmental integrity. In the Mississippi Alluvial Valley (MAV) region of Mississippi, USA, adoption of cover crop (CC) and no-tillage (NT) management practices has been low because of a lack of research specific to regional nuances. Therefore, this study assessed the long-term soil physicochemical and biological responses to integrating CC and NT management into agricultural soils of the region. Research plots were established in a split-block design with two tillage treatments, NT and reduced tillage (RT), and three CC treatments: no cover (NC), rye (RY), and a rye+clover (RC) mix. Soil samples were taken during the growing seasons of 2019 and 2020. Bulk density was significantly lower in NT plots, and aggregate stability was greatest in plots with a single CC species. Moisture retention increased in NT. Soil organic carbon was greater in NT and CC treatments, and there was no difference in CO₂ flux. Bacterial abundance had a positive effect on SOC but a negative effect on CO₂. The rate of proportional change and the pattern of variability in C pools suggested loss of SOC in RT treatments. Microbial abundance, functional genes, and enzyme activity were greater in NT with CC, but diversity was greater in RT. No-tillage practices lower diversity and influence long-term community changes, while CC practices elicit a seasonal response to environmental conditions. I conclude that in the heavy clay soils of the mid-South region of the MAV, RT with a CC is optimal for the soil health traits associated with crop sustainability; however, this management will still contribute to increased CO₂ emissions.
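For readers unfamiliar with analyzing such designs, a simplified, hypothetical sketch of estimating tillage and cover-crop effects on soil organic carbon with a random block intercept is shown below; it uses simulated data and a plain random-intercept model rather than a full split-block error structure, so it is illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
rows = []
for year in (2019, 2020):
    for block in range(4):
        for till in ("NT", "RT"):
            for cover in ("NC", "RY", "RC"):
                # Simulated SOC (%) with small tillage, cover, and block effects
                soc = (1.2 + 0.15 * (till == "NT") + 0.10 * (cover != "NC")
                       + 0.05 * block + rng.normal(0, 0.05))
                rows.append({"year": year, "block": block, "tillage": till,
                             "cover": cover, "soc": soc})
df = pd.DataFrame(rows)

# Random block intercept; a full split-block analysis would add whole-plot error terms
fit = smf.mixedlm("soc ~ tillage * cover", df, groups=df["block"]).fit()
print(fit.summary())
```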
280.
Expeditious Causal Inference for Big Observational Data. Yumin Zhang (13163253), 28 July 2022
This dissertation addresses two significant challenges in the causal inference workflow for Big Observational Data. The first is designing Big Observational Data with high-dimensional and heterogeneous covariates. The second is performing uncertainty quantification for estimates of causal estimands that are obtained from the application of black box machine learning algorithms on the designed Big Observational Data. The methodologies developed by addressing these challenges are applied to the design and analysis of Big Observational Data from a large public university in the United States.
Distributed Design
A fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the study prior to analysis. The design ensures that subjects in the different treatment groups that have comparable covariates are subclassified or matched together. Analyzing such a designed study helps to reduce biases arising from the confounding of covariates with treatment. Existing design methods, developed for traditional observational studies consisting of a single designer, can yield unsatisfactory designs with sub-optimum covariate balance for Big Observational Data due to their inability to accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned covariates and the summaries they receive. The final design is selected by comparing balance measures for all covariates across the candidates and identifying the best amongst the candidates. We perform simulation studies and analyze datasets from the 2016 Atlantic Causal Inference Conference Data Challenge to demonstrate the flexibility and power of our framework for constructing designs with good covariate balance from Big Observational Data.
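A small, hypothetical sketch of the distributed-design idea described above: each designer works from its own covariate subset plus the other designers' low-dimensional summaries (here, propensity scores), produces a candidate matched design, and the candidate with the best overall balance is selected. All data, summaries, and matching choices are illustrative stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(6)
n, p = 2000, 12
X = rng.normal(size=(n, p))
treat = rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 5])))

# Covariate columns assigned to three hypothetical designers
subsets = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]

def designer_summary(cols):
    """Each designer summarizes its covariates as a propensity score."""
    return LogisticRegression(max_iter=1000).fit(X[:, cols], treat).predict_proba(X[:, cols])[:, 1]

summaries = np.column_stack([designer_summary(cols) for cols in subsets])

def candidate_design(cols, designer_id):
    """1:1 nearest-neighbor match on own covariates plus the other designers' summaries."""
    Z = np.column_stack([X[:, cols], np.delete(summaries, designer_id, axis=1)])
    t_idx, c_idx = np.where(treat)[0], np.where(~treat)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(Z[c_idx])
    matched = c_idx[nn.kneighbors(Z[t_idx])[1][:, 0]]   # matching with replacement
    return t_idx, matched

def worst_smd(t_idx, c_idx):
    """Largest absolute standardized mean difference across all covariates."""
    diff = X[t_idx].mean(0) - X[c_idx].mean(0)
    pooled = np.sqrt((X[t_idx].var(0) + X[c_idx].var(0)) / 2)
    return np.max(np.abs(diff / pooled))

candidates = [candidate_design(cols, i) for i, cols in enumerate(subsets)]
best = min(candidates, key=lambda d: worst_smd(*d))
print("selected design, worst-case SMD:", round(worst_smd(*best), 3))
```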
Designed Bootstrap
The combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step in addressing this objective is to design the observational study prior to the application of machine learning algorithms. However, the application of the traditional nonparametric bootstrap on Big Observational Data requires excessive computational efforts. This is because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences with reduced bias from the application of machine learning algorithms on Big Observational Data. Our bootstrap procedure operates by resampling from the original designed observational study. It eliminates the need for additional, costly design steps on each bootstrap sample that are performed under the standard nonparametric bootstrap. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalency in terms of confidence interval coverage rates for the average treatment effects, by means of simulation studies and a real-life case study.
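A hedged sketch of the designed-bootstrap idea: resample already-matched pairs from the designed study (no re-matching per replicate), re-estimate the average treatment effect with a black-box learner on each resample, and form a percentile interval. The learner, the T-learner style estimator, and the toy data are assumptions for illustration, not the dissertation's exact procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)

def ate_estimate(Xp, tp, yp):
    """T-learner style ATE: separate outcome models for treated and control units."""
    m1 = GradientBoostingRegressor().fit(Xp[tp], yp[tp])
    m0 = GradientBoostingRegressor().fit(Xp[~tp], yp[~tp])
    return np.mean(m1.predict(Xp) - m0.predict(Xp))

# Toy designed study: matched treated/control pairs with a true effect of 1.0
n_pairs = 200
Xt = rng.normal(size=(n_pairs, 5))
Xc = Xt + rng.normal(0, 0.05, (n_pairs, 5))          # controls closely matched on covariates
yt = Xt[:, 0] + 1.0 + rng.normal(0, 0.5, n_pairs)
yc = Xc[:, 0] + rng.normal(0, 0.5, n_pairs)

boot = []
for _ in range(50):                                  # replicates are independent -> parallelizable
    idx = rng.integers(0, n_pairs, n_pairs)          # resample whole matched pairs: design kept intact
    Xp = np.vstack([Xt[idx], Xc[idx]])
    tp = np.r_[np.ones(n_pairs, bool), np.zeros(n_pairs, bool)]
    yp = np.r_[yt[idx], yc[idx]]
    boot.append(ate_estimate(Xp, tp, yp))

print("ATE estimate:", round(np.mean(boot), 2), "95% CI:", np.percentile(boot, [2.5, 97.5]))
```

The computational saving relative to the traditional nonparametric bootstrap comes from skipping the matching step inside the loop; only the estimation step is repeated.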
Case Study
We apply the distributed design and designed bootstrap methodologies in a case study involving institutional data from a large public university in the United States. The institutional data contains comprehensive information about the undergraduate students in the university, ranging from their academic records to on-campus activities. We study the causal effects of undergraduate students' attempted course load on their academic performance based on a selection of covariates from these data. Ultimately, our real-life case study demonstrates how our methodologies enable researchers to effectively use straightforward design procedures to obtain valid causal inferences with reduced computational efforts from the application of machine learning algorithms on Big Observational Data.