Global ETD Search

1	Statistical Discovery of Biomarkers in Metagenomics
Abdul Wahab, Ahmad Hakeem, January 2015

Metagenomics holds great potential for uncovering relationships within microbial communities that have yet to be discovered, particularly because the field circumvents the need to isolate and culture microbes from their natural environmental settings. A common research objective is to detect biomarkers: microbes that are associated with changes in a status. For instance, determining such microbes across conditions such as healthy and diseased groups allows researchers to identify pathogens and probiotics. This is often achieved via analysis of the differential abundance of microbes. The problem is that differential abundance analysis looks at each microbe individually, without considering the associations the microbes may have with each other. This is undesirable, since microbes rarely act individually but rather within intricate communities involving other microbes. An alternative is variable selection techniques such as the Lasso or the Elastic Net, which consider all the microbes simultaneously and conduct selection. However, the Lasso often selects only one representative feature from a cluster of correlated features, and the Elastic Net may select unimportant features too frequently and erratically because of the high levels of sparsity and variation in the data.

In this research, the proposed method, AdaLassop, is an augmented variable selection technique that overcomes the shortcomings of the Lasso and the Elastic Net. It provides researchers with a holistic model that takes into account the effects of selected biomarkers in the presence of other important biomarkers. For AdaLassop, variable selection on sparse ultra-high-dimensional data is implemented using the Adaptive Lasso with p-values extracted from Zero-Inflated Negative Binomial regressions as augmented weights. Comprehensive simulations involving varying correlation structures indicate that AdaLassop has optimal performance in the presence of multicollinearity. This is especially apparent as the sample size grows. Application of AdaLassop to a metagenome-wide study of diabetic patients reveals both pathogens and probiotics that have been researched in the medical field.

Keywords: Adaptive Lasso; Biomarker; Metagenomics; Variable Selection; Statistics; Adaptive Elastic Net
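The weighting idea in the abstract above (an Adaptive Lasso whose per-feature penalties come from p-values) can be illustrated with a minimal sketch. This is not the author's implementation. It assumes a squared-error loss, assumes the penalty weight of each feature is simply its p-value, and takes the p-values as given inputs rather than fitting Zero-Inflated Negative Binomial regressions. The names `weighted_lasso`, `soft_threshold`, and `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it falls inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Coordinate descent for
        (1 / (2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    A small weight (here: a small p-value, by assumption) means a light penalty.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    col_scale = (X ** 2).sum(axis=0) / n  # (1/n) * X_j^T X_j for each column
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            residual += X[:, j] * beta[j]          # remove feature j's contribution
            rho = X[:, j] @ residual / n           # partial correlation with residual
            beta[j] = soft_threshold(rho, lam * weights[j]) / col_scale[j]
            residual -= X[:, j] * beta[j]          # add the updated contribution back
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=50)
    # Hypothetical p-values, one per feature; in the abstract's method these
    # would come from per-feature regressions, which this sketch does not fit.
    p_values = np.array([0.001, 0.8, 0.6, 0.9, 0.7])
    beta = weighted_lasso(X, y, p_values, lam=0.5)
    print("selected feature indices:", np.flatnonzero(beta))
```

Features with small p-values are penalized lightly and so tend to stay in the model, while features with large p-values need a much stronger signal to survive the thresholding step.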
2	Contributions to the Interface between Experimental Design and Machine Learning
Lian, Jiayi, 31 July 2023

In data science, machine learning methods, such as deep learning and other AI algorithms, have been widely used in many applications. These methods often have complicated model structures with a large number of model parameters and a set of hyperparameters. Moreover, they are data-driven in nature. Thus, it is not easy to provide a comprehensive evaluation of their performance with respect to data quality and the hyperparameters of the algorithms. In the statistical literature, design of experiments (DoE) is a set of systematic methods for effectively investigating the effects of input factors in complex systems. Few works focus on the use of DoE methodology for evaluating the quality assurance of AI algorithms, even though an AI algorithm is naturally a complex system. An understanding of the quality of artificial intelligence (AI) algorithms is important for confidently deploying them in real applications such as cybersecurity, healthcare, and autonomous driving. In this proposal, I aim to develop a set of novel methods at the interface between experimental design and machine learning, providing a systematic framework for using DoE methodology with AI algorithms. This proposal contains six chapters. Chapter 1 provides a general introduction to design of experiments, machine learning, and surrogate modeling. Chapter 2 investigates the robustness of AI classification algorithms by conducting a comprehensive set of mixture experiments. Chapter 3 proposes a framework, called Do-AIQ, that uses DoE to evaluate the quality assurance of AI algorithms. I establish a design-of-experiment framework to construct an efficient space-filling design in a high-dimensional constraint space and develop an effective surrogate model using an additive Gaussian process to enable the quality assessment of AI algorithms. Chapter 4 introduces a framework to generate continual learning (CL) datasets for cybersecurity applications. Chapter 5 presents a variable selection method under a cumulative exposure model for time-to-event data with time-varying covariates. Chapter 6 summarizes the entire dissertation.

/ Doctor of Philosophy /

Artificial intelligence (AI) techniques, including machine learning and deep learning algorithms, are widely used in various applications in the era of big data. While these algorithms have impressed the public with their remarkable performance, their underlying mechanisms are often highly complex and difficult to interpret. As a result, it is challenging to comprehensively evaluate the overall performance and quality of these algorithms. The design of experiments (DoE) offers a valuable set of tools for studying and understanding the underlying mechanisms of complex systems, thereby facilitating improvements. DoE has been successfully applied in diverse areas such as manufacturing, agriculture, and healthcare, where it has played a crucial role in enhancing processes and ensuring high quality. However, few works focus on the use of DoE methodology for evaluating the quality assurance of AI algorithms, even though an AI algorithm can naturally be considered a complex system. This dissertation aims to develop innovative methodologies at the interface between experimental design and machine learning. The research conducted in this dissertation can serve as a set of practical tools for using DoE methodology in the context of AI algorithms.
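The abstract above mentions a space-filling design. As a rough illustration of that general idea only, the sketch below draws a Latin hypercube sample on the unconstrained unit cube. It is not the constrained, high-dimensional construction the dissertation develops, and the function name `latin_hypercube` is hypothetical.

```python
import numpy as np


def latin_hypercube(n_points, n_dims, rng):
    """Return an (n_points, n_dims) design on [0, 1)^n_dims.

    Each axis is cut into n_points equal strata, and every stratum on every
    axis receives exactly one design point.
    """
    # One independent random ordering of the strata per dimension.
    strata = np.array([rng.permutation(n_points) for _ in range(n_dims)]).T
    # Uniform jitter places each point at a random position inside its stratum.
    jitter = rng.random((n_points, n_dims))
    return (strata + jitter) / n_points


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    design = latin_hypercube(8, 3, rng)
    print(design.shape)
```

Each row of the design could stand for one experimental run, for example one setting of an algorithm's hyperparameters after rescaling each column to the range of interest.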
