31

ON TWO NEW ESTIMATORS FOR THE CMS THROUGH EXTENSIONS OF OLS

Zhang, Yongxu January 2017 (has links)
As a useful tool for multivariate analysis, sufficient dimension reduction (SDR) aims to reduce the predictor dimensionality while keeping the full regression information, or some specific aspect of it, between the response and the predictor. When the goal is to retain the information about the regression mean, the target of inference is known as the central mean space (CMS). Ordinary least squares (OLS) is a popular estimator of the CMS, but it has the limitation that it can recover at most one direction in the CMS. In this dissertation, we introduce two new estimators of the CMS: the sliced OLS and the hybrid OLS. Both estimators can estimate multiple directions in the CMS. The dissertation is organized as follows. Chapter 1 provides a literature review of basic concepts and some traditional methods in SDR. Motivated by the popular SDR method called sliced inverse regression, sliced OLS is proposed as the first extension of OLS in Chapter 2, which also studies the asymptotic properties of sliced OLS, order determination, and testing predictor contributions through sliced OLS. It is well known that slicing methods such as sliced inverse regression may give different results with different numbers of slices. Chapter 3 proposes hybrid OLS as the second extension. Hybrid OLS shares the benefit of sliced OLS and recovers multiple directions in the CMS; at the same time, it improves over sliced OLS by avoiding slicing altogether. Extensive numerical results are provided to demonstrate the desirable performance of the proposed estimators. We conclude the dissertation with a discussion of future work in Chapter 4. / Statistics
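As a hedged illustration of the ideas above: the classical OLS direction for the CMS is Sigma_x^{-1} Cov(X, y), and a slice-based extension can pool several such directions. The sketch below uses plain NumPy; the slicing-and-pooling scheme is an illustrative assumption, not necessarily the estimator defined in the dissertation.

```python
import numpy as np

def ols_direction(X, y):
    """OLS estimate of a single CMS direction: Sigma_x^{-1} Cov(X, y)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sigma = np.cov(Xc, rowvar=False)
    return np.linalg.solve(sigma, Xc.T @ yc / len(y))

def sliced_ols(X, y, n_slices=5, n_dirs=2):
    """Illustrative slice-based extension: pool per-slice OLS directions and
    take the leading eigenvectors of their summed outer products."""
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)           # slice observations by response value
    M = np.zeros((X.shape[1], X.shape[1]))
    for idx in slices:
        b = ols_direction(X[idx], y[idx])
        M += np.outer(b, b)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :n_dirs]                 # columns span an estimated CMS basis
```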
32

Gradient-Based Sensitivity Analysis with Kernels

Wycoff, Nathan Benjamin 20 August 2021 (has links)
Emulation of computer experiments via surrogate models can be difficult when the number of input parameters determining the simulation grows beyond a few dozen. In this dissertation, we explore dimension reduction in the context of computer experiments. The active subspace method is a linear dimension reduction technique that uses the gradients of a function to determine important input directions. Unfortunately, we cannot always expect to have access to the gradients of our black-box functions. We thus begin by developing an estimator for the active subspace of a function that uses kernel methods to indirectly estimate the gradient. We then demonstrate how to deploy the learned input directions to improve the predictive performance of local regression models by "undoing" the active subspace. Finally, we develop notions of sensitivity that are local to certain parts of the input space and use them to build a Bayesian optimization algorithm that can exploit locally important directions. / Doctor of Philosophy / Increasingly, scientists and engineers developing new understanding or products rely on computers to simulate complex phenomena. Sometimes these computer programs are so detailed that the amount of time they take to run becomes a serious issue. Surrogate modeling is the problem of trying to predict a computer experiment's result without actually running it, on the basis of having observed the behavior of similar simulations. Typically, computer experiments have different settings that induce different behavior. When there are many settings to tweak, typical surrogate modeling approaches can struggle. In this dissertation, we develop a technique for deciding which input settings, or even which combinations of input settings, to focus on when trying to predict the output of the computer experiment. We then deploy this technique both to predicting computer experiment outputs and to finding which input settings yield a particular desired result.
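A hedged sketch of the gradient-based estimate described above when gradients of the black box are unavailable: fit a kernel (RBF ridge) surrogate, differentiate the fitted surface in closed form, and eigendecompose the average outer product of the estimated gradients. The specific surrogate and its gradient formula are assumptions for illustration, not the dissertation's exact estimator.

```python
import numpy as np

def rbf_kernel(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def kernel_active_subspace(X, y, ls=1.0, jitter=1e-8, k=2):
    """Estimate k active-subspace directions from kernel-ridge surrogate gradients."""
    K = rbf_kernel(X, X, ls)
    alpha = np.linalg.solve(K + jitter * np.eye(len(X)), y)   # ridge weights
    C = np.zeros((X.shape[1], X.shape[1]))
    for i in range(len(X)):
        # grad f(x_i) = sum_j alpha_j * k(x_i, x_j) * (x_j - x_i) / ls^2
        diff = (X - X[i]) / ls**2
        g = ((alpha * K[i])[:, None] * diff).sum(axis=0)
        C += np.outer(g, g)
    C /= len(X)
    w, V = np.linalg.eigh(C)
    return V[:, ::-1][:, :k], w[::-1]                          # directions, eigenvalues
```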
33

Bridging Cognitive Gaps Between User and Model in Interactive Dimension Reduction

Wang, Ming 05 May 2020 (has links)
High-dimensional data is prevalent in all domains but is challenging to explore. Analysis and exploration of high-dimensional data are important for people in numerous fields. To help people explore and understand high-dimensional data, Andromeda, an interactive visual analytics tool, has been developed. However, our analysis uncovered several cognitive gaps relating to the Andromeda system: users do not realize the necessity of explicitly highlighting all the relevant data points; users are not clear about the dimensional information in the Andromeda visualization; and the Andromeda model cannot capture user intentions when constructing and deconstructing clusters. In this study, we designed and implemented solutions to address these gaps. Specifically, for the gap in highlighting all the relevant data points, we introduced a foreground and background view and distance lines. Our user study with a group of undergraduate students revealed that the foreground and background views and distance lines could significantly alleviate the highlighting issue. For the gap in understanding visualization dimensions, we implemented a dimension-assist feature. The results of a second user study with students from various backgrounds suggested that the dimension-assist feature could make it easier for users to find the extremum in one dimension and to describe correlations among multiple dimensions; however, it had only a small impact on characterizing the data distribution and on helping users understand the meanings of the weighted multidimensional scaling (WMDS) plot axes. Regarding the gap in creating and deconstructing clusters, we implemented a solution utilizing random sampling. A quantitative analysis of the random sampling strategy demonstrated that it improved Andromeda's capabilities in constructing and deconstructing clusters. We also applied random sampling to two-point manipulations, making the Andromeda system more flexible and adaptable to differing data exploration tasks. Limitations are discussed, and potential future research directions are identified. / Master of Science / A high-dimensional dataset is one with hundreds or thousands of features. The animal dataset used in this study is an example of a high-dimensional dataset, since animals can be characterized by many features, such as size, fur, behavior, and so on. High-dimensional data is prevalent but difficult for people to analyze. For example, it is hard to see the similarities among dozens of animals, or to find relationships between different characterizations of animals. To help people with no statistical knowledge analyze high-dimensional datasets, our group developed web-based visualization software called Andromeda, which displays data as points (such as animal data points) on a screen and allows people to interact with these points to express perceived similarity by dragging points on the screen (e.g., drag "Lion," "Wolf," and "Killer Whale" together because all three are hunters, forming a cluster of three animals). It therefore enables people to interactively analyze the hidden patterns of high-dimensional data. However, we identified several cognitive gaps that limited Andromeda's effectiveness in helping people understand high-dimensional data.
Therefore, in this work, we improved the original Andromeda system to bridge these gaps, designing new visual features to help people better understand how Andromeda processes and interacts with high-dimensional data and improving the underlying algorithm so that the system can better understand people's intentions during data exploration. We extensively evaluated our designs through both qualitative and quantitative analysis (e.g., user studies with both undergraduate and graduate students, and statistical testing) on our animal dataset, and the results confirmed that the improved Andromeda system significantly outperformed the original version in a series of high-dimensional data understanding tasks. Finally, the limitations and potential future research directions are discussed.
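A hedged sketch of the weighted multidimensional scaling (WMDS) underlying Andromeda: per-dimension weights rescale the high-dimensional distances before a 2-D layout is computed. Using scikit-learn's SMACOF-based MDS on a precomputed weighted distance matrix is an assumption for illustration, not the actual Andromeda implementation.

```python
import numpy as np
from sklearn.manifold import MDS

def wmds_layout(X, weights, random_state=0):
    """2-D layout from weighted distances d_w(i, j) = sqrt(sum_k w_k (x_ik - x_jk)^2)."""
    W = np.asarray(weights, dtype=float)
    W = W / W.sum()                                   # normalize dimension weights
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((W * diff ** 2).sum(axis=-1))         # weighted distance matrix
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=random_state)
    return mds.fit_transform(D)

# Example (hypothetical data): up-weighting one feature pulls the layout toward that feature.
# coords = wmds_layout(animal_features, weights=[5, 1, 1, 1])
```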
34

Designing and Evaluating Object-Level Interaction to Support Human-Model Communication in Data Analysis

Self, Jessica Zeitz 09 May 2016 (has links)
High-dimensional data appear in all domains and are challenging to explore. As the number of dimensions in a dataset increases, it becomes harder to discover patterns and develop insights. Data analysis and exploration is an important skill given the amount of data collected in every field of work. However, learning this skill without an understanding of high-dimensional data is challenging. Users naturally tend to characterize data in simplistic one-dimensional terms using metrics such as the mean, median, and mode. Real-world data are more complex. To gain the most insight from data, users need to recognize and create high-dimensional arguments. Data exploration methods can encourage thinking beyond traditional one-dimensional insights. Dimension reduction algorithms, such as multidimensional scaling, support data exploration by reducing datasets to two dimensions for visualization. Because these algorithms rely on underlying parameterizations, they may be manipulated to assess the data from multiple perspectives. Such manipulation can be difficult for users without strong knowledge of the underlying algorithms. Visual analytics tools that afford object-level interaction (OLI) allow for the generation of more complex insights, despite inexperience with multivariate data or the underlying algorithm. The goal of this research is to develop and test variations on types of interaction for interactive visual analytics systems that enable users to tweak model parameters directly or indirectly so that they may explore high-dimensional data. To study interactive data analysis, we present an interface, Andromeda, that enables non-experts in statistical models to explore domain-specific, high-dimensional data. This application implements interactive weighted multidimensional scaling (WMDS) and allows for both parametric and observation-level interaction to provide in-depth data exploration. We performed multiple user studies to examine how parametric and object-level interaction aid data analysis. With each study, we found usability issues and then designed solutions for the next study. With each critique we uncovered design principles for effective, interactive visual analytics tools. The final part of this research presents these principles, supported by the results of our multiple informal and formal usability studies. The established design principles focus on human-centered usability for developing interactive visual analytics systems that enable users to analyze high-dimensional data through object-level interaction. / Ph. D.
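A hedged sketch of how observation-level (object-level) interaction can be mapped back onto model parameters: after a user rearranges points in the 2-D layout, the per-dimension weights are re-fit so that weighted high-dimensional distances best match the user-implied 2-D distances. The least-squares formulation and SLSQP solver below are illustrative assumptions, not the exact inverse-WMDS procedure used in Andromeda.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def weights_from_layout(X, user_coords):
    """Re-fit dimension weights so weighted distances match the user-arranged 2-D distances."""
    d_low = pdist(user_coords)                        # distances implied by the user's layout
    # Per-dimension squared differences for every pair of points, shape (n_pairs, p):
    sq_diffs = np.stack([pdist(X[:, [k]], "sqeuclidean") for k in range(X.shape[1])], axis=1)

    def stress(w):
        d_high = np.sqrt(sq_diffs @ w)                # weighted high-dimensional distances
        return ((d_high - d_low) ** 2).sum()

    p = X.shape[1]
    w0 = np.full(p, 1.0 / p)
    res = minimize(stress, w0, method="SLSQP",
                   bounds=[(0, None)] * p,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
    return res.x                                      # new weights for the forward WMDS layout
```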
35

Another Slice of Multivariate Dimension Reduction

Ekblad, Carl January 2022 (has links)
This thesis presents some methods of multivariate dimension reduction, with emphasis on methods guided by the work of R.A. Fisher. Some of the methods presented can be traced back to the 20th century, while some are much more recent. For the more recent methods, additional attention will be paid to the foundational underpinnings. The presentation of each method contains a brief introduction to its general philosophy, accompanied by some theorems, and ends with the connection to the work of Fisher. / This bachelor's thesis presents a number of methods for dimension reduction, with emphasis on methods that follow theory developed by R.A. Fisher. Some of the methods presented were developed as early as the early 20th century, while others are of recent origin. For the recently developed methods, greater weight is placed on the foundational theory of the method. The presentation of each method consists of a brief description, followed by theorems, and finally a description of its connection to Fisher's theories.
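A hedged sketch of one classic Fisher-guided reduction that such a survey typically covers, Fisher's linear discriminant, which seeks directions maximizing between-class scatter relative to within-class scatter. This is a generic textbook formulation, not code from the thesis.

```python
import numpy as np

def fisher_discriminant_directions(X, labels, k=1):
    """Leading directions of the generalized eigenproblem S_B v = lambda S_W v."""
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    p = X.shape[1]
    S_W = np.zeros((p, p))   # within-class scatter
    S_B = np.zeros((p, p))   # between-class scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)
        S_B += len(Xc) * np.outer(mc - mu, mc - mu)
    # Solve S_W^{-1} S_B and keep the top-k eigenvectors.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:k]]
```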
36

Application of Influence Function in Sufficient Dimension Reduction Models

Shrestha, Prabha 28 September 2020 (has links)
No description available.
37

Predicting reliability in multidisciplinary engineering systems under uncertainty

Hwang, Sungkun 27 May 2016 (has links)
The proposed study develops a framework that can accurately capture and model input and output variables for multidisciplinary systems, mitigating the computational cost when uncertainties are involved. The dimension of the random input variables is reduced depending on the degree of correlation calculated by relative entropy. Feature extraction methods, namely Principal Component Analysis (PCA) and the Auto-Encoder (AE) algorithm, are developed for cases where the input variables are highly correlated. The Independent Features Test (IndFeaT) is implemented as the feature selection method when the correlation is low, selecting a critical subset of model features. Moreover, Artificial Neural Networks (ANNs), including the Probabilistic Neural Network (PNN), are integrated into the framework to correctly capture the complex response behavior of the multidisciplinary system at low computational cost. The efficacy of the proposed method is demonstrated with electro-mechanical engineering examples, including a solder joint and a stretchable patch antenna.
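A hedged sketch of the correlation-dependent reduction step: when the input variables are highly correlated, a feature extraction method such as PCA compresses them before the surrogate is trained. scikit-learn's PCA is used here as an illustrative stand-in; the autoencoder and the Independent Features Test mentioned above are not shown.

```python
from sklearn.decomposition import PCA

def reduce_correlated_inputs(X, var_threshold=0.95):
    """Project correlated inputs onto enough principal components to keep var_threshold of the variance."""
    pca = PCA(n_components=var_threshold)   # a float in (0, 1) keeps that fraction of variance
    Z = pca.fit_transform(X)
    return Z, pca

# Example (hypothetical data): the reduced inputs Z would then feed a cheaper surrogate model.
# Z, pca = reduce_correlated_inputs(input_samples)
```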
38

New Advancements of Scalable Statistical Methods for Learning Latent Structures in Big Data

Zhao, Shiwen January 2016 (has links)
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables with each nucleotide taking one of four categories. Gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption, a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics, represented by a Dirichlet distributed variable, are assumed. This dissertation proposes several novel extensions of these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data. / Dissertation
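To make the low-rank covariance remark concrete, here is the standard Gaussian factor analysis formulation (generic notation, not taken verbatim from the dissertation):

```latex
% Factor model with k latent factors for a p-dimensional observation x:
\[
  x = \Lambda f + \varepsilon, \qquad
  f \sim \mathcal{N}(0, I_k), \qquad
  \varepsilon \sim \mathcal{N}(0, \Psi), \quad \Psi \text{ diagonal},
\]
% which implies the low-rank-plus-diagonal covariance
\[
  \operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi ,
\]
% so with k << p the p-by-p covariance is summarized by a rank-k term plus p idiosyncratic variances.
```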
39

COPS: Cluster optimized proximity scaling

Rusch, Thomas, Mair, Patrick, Hornik, Kurt January 2015 (has links) (PDF)
Proximity scaling methods (e.g., multidimensional scaling) represent objects in a low dimensional configuration so that fitted distances between objects optimally approximate multivariate proximities. Besides finding the optimal configuration, the goal is often also to identify groups of objects from the configuration. This can be difficult if the optimal configuration lacks clusteredness (coined c-clusteredness). We present Cluster Optimized Proximity Scaling (COPS), which attempts to solve this problem by finding a configuration that exhibits c-clusteredness. In COPS, a flexible scaling loss function (p-stress) is combined with an index that quantifies c-clusteredness in the solution, the OPTICS Cordillera. We present two variants of combining p-stress and the Cordillera, one for finding the configuration directly and one for metaparameter selection for p-stress. The first variant is illustrated by scaling Californian counties with respect to climate-change-related natural hazards. We identify groups of counties with similar risk profiles and find that counties at high risk of drought are socially vulnerable. The second variant is illustrated by finding a clustered nonlinear representation of countries according to their history of banking crises from 1800 to 2010. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
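A hedged sketch of the general recipe of augmenting a scaling loss with a clusteredness index. Ordinary normalized stress and a silhouette-style score are used here as simple stand-ins for the paper's p-stress and OPTICS Cordillera, which are considerably more elaborate.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cops_style_objective(coords_flat, target_dissimilarities, n_points,
                         alpha=0.5, n_clusters=3):
    """Scaling badness-of-fit plus a penalty for lack of clusteredness in the configuration."""
    Z = coords_flat.reshape(n_points, 2)
    d = pdist(Z)                                      # fitted configuration distances
    stress = np.sqrt(((d - target_dissimilarities) ** 2).sum()
                     / (target_dissimilarities ** 2).sum())
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Z)
    clusteredness = silhouette_score(Z, labels)       # in [-1, 1]; higher means more clustered
    return (1 - alpha) * stress - alpha * clusteredness
```

Minimizing this over the configuration coordinates (e.g., with scipy.optimize.minimize) trades distance fit against clusteredness, with alpha playing the role of the weighting metaparameter, analogous to the first COPS variant.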
40

Rigorous justification of Taylor Dispersion via Center Manifold theory

Chaudhary, Osman 10 August 2017 (has links)
Imagine fluid moving through a long pipe or channel, and we inject dye or solute into this pipe. What happens to the dye concentration after a long time? Initially, the dye just moves along downstream with the fluid. However, it is also slowly diffusing down the pipe and towards the edges. It turns out that after a long time, the combined effect of transport via the fluid and this slow diffusion results in what is effectively a much more rapid diffusion process, lengthwise down the stream. If 0 < nu << 1 is the slow diffusion coefficient, then the effective longitudinal diffusion coefficient is proportional to 1/nu, i.e. much larger. This phenomenon is called Taylor Dispersion, first studied by G. I. Taylor in the 1950s and studied subsequently by many authors, such as Aris, Chatwin, Smith, and Roberts. However, none of the approaches used in the past seems to have been mathematically rigorous. I'll propose a dynamical systems explanation of this phenomenon: specifically, I'll explain how one can use a Center Manifold reduction to obtain Taylor Dispersion as the dominant term in the long-time limit, and also explain how this Center Manifold can be used to provide any finite number of correction terms to Taylor Dispersion as well.
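For concreteness, the classical Taylor–Aris formula that quantifies this effect for laminar flow in a circular pipe (a standard result consistent with the abstract, not a formula stated in it; the pipe radius a and mean speed U are notation assumed here):

```latex
% Cross-sectionally averaged concentration \bar{c}(x,t); molecular diffusivity \nu.
% At long times the averaged concentration obeys an effective advection-diffusion equation:
\[
  \partial_t \bar{c} + U\,\partial_x \bar{c}
    \;\approx\; K_{\mathrm{eff}}\,\partial_x^2 \bar{c},
  \qquad
  K_{\mathrm{eff}} \;=\; \nu + \frac{a^2 U^2}{48\,\nu},
\]
% so for 0 < \nu \ll 1 the effective longitudinal diffusivity scales like 1/\nu, i.e. it is much larger than \nu.
```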
