11

Contributions to Structured Variable Selection Towards Enhancing Model Interpretation and Computation Efficiency

Shen, Sumin 07 February 2020
Advances in data-collecting technologies provide great opportunities to access large-sample data sets with high dimensionality. Variable selection is an important procedure for extracting useful knowledge from such complex data. In many real-data applications, appropriate selection of variables should also facilitate model interpretation and computational efficiency. It is thus important to incorporate domain knowledge of the underlying data generation mechanism when selecting key variables to improve model performance. However, general variable selection techniques, such as best subset selection and the Lasso, often do not take the underlying data generation mechanism into consideration. This thesis proposal aims to develop statistical modeling methodologies with a focus on structured variable selection towards better model interpretation and computational efficiency. Specifically, this thesis proposal consists of three parts: an additive heredity model with coefficients incorporating multi-level data, a regularized dynamic generalized linear model with piecewise constant functional coefficients, and a structured variable selection method within the best subset selection framework. In Chapter 2, an additive heredity model is proposed for analyzing mixture-of-mixtures (MoM) experiments. The MoM experiment differs from the classical mixture experiment in that each mixture component in a MoM experiment, known as a major component, is made up of sub-components, known as minor components. The proposed model uses an additive structure to inherently connect the major components with the minor components. To enable a meaningful interpretation of the estimated model, we apply the hierarchical and heredity principles by using the nonnegative garrote technique for model selection. The performance of the additive heredity model was compared to several conventional methods in both unconstrained and constrained MoM experiments.
The additive heredity model was then successfully applied to a real problem of optimizing the Pringles® potato crisp studied previously in the literature. In Chapter 3, we consider the dynamic effects of variables in generalized linear models such as logistic regression. This work is motivated by an engineering problem in which the effects of process variables on product quality vary because of equipment degradation. To address this challenge, we propose a penalized dynamic regression model that can flexibly estimate the dynamic coefficient structure. The proposed method models the functional coefficient parameters as piecewise constant functions. Specifically, under the penalized regression framework, the fused lasso penalty is adopted to detect changes in the dynamic coefficients, and the group lasso penalty is applied to enable a sparse selection of variables. Moreover, an efficient parameter estimation algorithm is developed based on the alternating direction method of multipliers. The performance of the dynamic coefficient model is evaluated in numerical studies and three real-data examples. In Chapter 4, we develop a structured variable selection method within the best subset selection framework. In the literature, many techniques within the LASSO framework have been developed to address structured variable selection issues. However, less attention has been paid to structured best subset selection problems. In this work, we propose a sparse ridge regression method to address structured variable selection issues. The key idea of the proposed method is to reconstruct the regression matrix from the perspective of experimental designs. We employ the estimation-maximization algorithm to formulate the best subset selection problem as an iterative linear integer optimization (LIO) problem, with the mixed integer optimization algorithm as the selection step.
We demonstrate the power of the proposed method in various structured variable selection problems. Moreover, the proposed method can be extended to ridge-penalized best subset selection problems. The performance of the proposed method is evaluated in numerical studies. / Doctor of Philosophy / Advances in data-collecting technologies provide great opportunities to access large-sample data sets with high dimensionality. Variable selection is an important procedure for extracting useful knowledge from such complex data. In many real-data applications, appropriate selection of variables should also facilitate model interpretation and computational efficiency. It is thus important to incorporate domain knowledge of the underlying data generation mechanism when selecting key variables to improve model performance. However, general variable selection techniques often do not take the underlying data generation mechanism into consideration. This thesis proposal aims to develop statistical modeling methodologies with a focus on structured variable selection towards better model interpretation and computational efficiency. The proposed approaches have been applied to real-world problems to demonstrate their model performance.
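
As a reading aid for the Chapter 3 description, one generic way to write a penalized dynamic GLM that combines a fused lasso penalty (to detect change points in piecewise constant coefficients) with a group lasso penalty (to drop whole variables) is sketched below. The notation is an assumption introduced here for illustration and may differ from the thesis: $\beta_{j,t}$ is the coefficient of variable $j$ at time $t$, $\ell$ is the GLM log-likelihood, and $\lambda_1, \lambda_2 \ge 0$ are tuning parameters.

```latex
\min_{\beta}\;
  -\sum_{t=1}^{T} \ell\!\left(y_t;\, x_t^{\top}\beta_{\cdot,t}\right)
  \;+\; \lambda_1 \sum_{j=1}^{p}\sum_{t=2}^{T} \left|\beta_{j,t}-\beta_{j,t-1}\right|
  \;+\; \lambda_2 \sum_{j=1}^{p} \left\lVert \beta_{j,\cdot} \right\rVert_2
```

In this form the second term is zero wherever a coefficient stays constant between consecutive time points, which encourages piecewise constant paths, and the third term can shrink the entire path of variable $j$ to zero, which removes that variable from the model.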
12

MODEL BASED TRANSFER LEARNING ACROSS NANOMANUFACTURING PROCESSES AND BAYESIAN OPTIMIZATION FOR ADVANCED MODELING OF MIXTURE DATA

Yueyun Zhang (18183583) 24 June 2024
Broadly, the focus of this work is on efficient statistical estimation and optimization for experimental data, particularly motivated by nanomanufacturing experiments on the material tellurene. Tellurene is a novel material for transistors with reliable attributes that enhance the performance of electronics (e.g., nanochips). As a solution-grown product, two-dimensional (2D) tellurene can be manufactured through a scalable process at a low cost. Three main throughlines run through this work: data augmentation, optimization, and equality constraints. Three distinct methodological projects each address a subset of these throughlines. For the first project, I apply transfer learning to the analysis of data from a new tellurene experiment (process B), using the established linear regression model from a prior experiment (process A) in a similar study to combine the information from both experiments. The key to this approach is to incorporate the total equivalent amounts (TEA) of a lurking variable (experimental process changes), expressed in terms of an observed (base) factor that appears in both experimental designs, into the prespecified linear regression model. Results from the experimental data are presented, including the optimal PVP chain length for scaling up production through a larger autoclave size. For the second project, I develop a multi-armed bandit Bayesian optimization (BO) approach that incorporates the equality constraint arising from a mixture experiment on tellurium nanoproduct and accounts for factors with categorical levels. A more complex optimization approach was necessitated by the experimenters’ use of a neural network regression model to estimate the response surface.
Results are presented on synthetic data to validate the ability of BO to recover the optimal response, and its efficiency is compared to Monte Carlo random sampling to understand the level of experimental design complexity at which BO begins to pay off. The third project examines the potential enhancement of parameter estimation by using synthetic data generated through Generative Adversarial Networks (GANs) to augment experimental data from a mixture experiment with a small to moderate number of runs. Transfer learning shows high promise for aiding tellurene experiments, BO’s value increases with the complexity of the experiment, and GANs performed poorly on smaller experiments, introducing bias into parameter estimates.
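
To make the Monte Carlo random sampling baseline concrete, here is a minimal sketch of random search over a mixture design, where the component proportions must sum to 1 (the equality constraint). Everything in it is an assumption for illustration: `toy_response` is a hypothetical stand-in for the fitted response surface, and the function names, sample size, and three-component setup do not come from the thesis.

```python
import random


def sample_mixture(n_components, rng):
    """Draw one point uniformly from the simplex, so proportions sum to 1."""
    # Normalizing independent Exponential(1) draws gives a uniform simplex point.
    draws = [rng.expovariate(1.0) for _ in range(n_components)]
    total = sum(draws)
    return [d / total for d in draws]


def toy_response(x):
    """Hypothetical response surface; its maximum (0) is at (0.5, 0.3, 0.2)."""
    target = (0.5, 0.3, 0.2)
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def random_search(response, n_samples, n_components, seed=0):
    """Monte Carlo baseline: evaluate random mixtures and keep the best one."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(n_samples):
        x = sample_mixture(n_components, rng)
        y = response(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


best_x, best_y = random_search(toy_response, n_samples=2000, n_components=3)
print(best_x, best_y)
```

Because every candidate is drawn on the simplex, the equality constraint holds by construction. A BO approach would replace the blind sampling loop with a surrogate-guided choice of the next mixture, which is the part this sketch deliberately leaves out.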
