Global ETD Search

31	Linear Mixed Model Selection by Partial Correlation Alabiso, Audry 29 April 2020 (has links) No description available. Statistics Linear Mixed Model High Dimensional Variable Selection Partial Faithfulness
32	A Review and Comparative Study on Univariate Feature Selection Techniques Ni, Weizeng January 2012 (has links) No description available. Computer Science univariate feature selection machine learning high-dimensional
33	Improving the Accuracy of Variable Selection Using the Whole Solution Path Liu, Yang 23 July 2015 (has links) No description available. Statistics variable selection high dimensional data SPSP AIS
34	Turbo implementation of high dimensional trellis-coded modulation Wang, Mingjing January 2001 (has links) No description available. Turbo Implementation High Dimensional Modulation Trellis-Coded Modulation
35	Improving Query Performance through Application-Driven Processing and Retrieval Gibas, Michael A. 11 September 2008 (has links) No description available. Computer Science Databases Query Processing Indexing High-Dimensional
36	Matching Based Diversity Modi, Amit 28 July 2011 (has links) No description available. Computer Science
37	Sparse Canonical Correlation Analysis (SCCA): A Comparative Study Pichika, Sathish chandra 04 1900 (has links) <p>Canonical Correlation Analysis (CCA) is one of the multivariate statistical methods that can be used to find relationship between two sets of variables. I highlighted challenges in analyzing high-dimensional data with CCA. Recently, Sparse CCA (SCCA) methods have been proposed to identify sparse linear combinations of two sets of variables with maximal correlation in the context of high-dimensional data. In my thesis, I compared three different SCCA approaches. I evaluated the three approaches as well as the classical CCA on simulated datasets and illustrated the methods with publicly available genomic and proteomic datasets.</p> / Master of Science (MSc) CCA SCCA High-Dimensional Multivariate Analysis Multivariate Analysis
38	TimeLink: Visualizing Diachronic Word Embeddings and Topics Williams, Lemara Faith 11 June 2024 (has links) The task of analyzing a collection of documents generated over time is daunting. A natural way to ease the task is by summarizing documents into the topics that exist within these documents. The temporal aspect of topics can frame relevance based on when topics are introduced and when topics stop being mentioned. It creates trends and patterns that can be traced by individual key terms taken from the corpus. If trends are being established, there must be a way to visualize them through the key terms. Creating a visual system to support this analysis can help users quickly gain insights from the data, significantly easing the burden from the original analysis technique. However, creating a visual system for terms is not easy. Work has been done to develop word embeddings, allowing researchers to treat words like any number. This makes it possible to create simple charts based on word embeddings like scatter plots. However, these methods are inefficient due to loss of effectiveness with multiple time slices and point overlap. A visualization method that addresses these problems while also visualizing diachronic word embeddings in an interesting way with added semantic meaning is hard to find. These problems are managed through TimeLink. TimeLink is proposed as a dashboard system to help users gain insights from the movement of diachronic word embeddings. It comprises a Sankey diagram showing the path of a selected key term to a cluster in a time period. This local cluster is also mapped to a global topic based on an original corpus of documents from which the key terms are drawn. On the dashboard, different tools are given to users to aid in a focused analysis, such as filtering key terms and emphasizing specific clusters. TimeLink provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time. / Master of Science / The task of analyzing documents collected over time is daunting. Grouping documents into topics can help frame relevancy based on when topics are introduced and hampered. The creation of topics also enables the ability to visualize trends and patterns. Creating a visual system to support this analysis can help users quickly gain insights from the data, significantly easing the burden from the original analysis technique of browsing individual documents. A visualization system for this analysis typically focuses on the terms that affect established topics. Some visualization methods, like scatter plots, implement this but can be inefficient due to loss of effectiveness as more data is introduced. TimeLink is proposed as a dashboard system to aid users in drawing insights from the development of terms over time. In addition to addressing problems in other visualizations, it visualizes the movement of terms intuitively and adds semantic meaning. TimeLink provides insightful visualizations focused on the movement of terms while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time. High Dimensional Visualizations Clustering Diachronic Word Embeddings Topic Modeling
39	Robust Feature Screening Procedures for Mixed Type of Data Sun, Jinhui 16 December 2016 (has links) High dimensional data have been frequently collected in many fields of scientific research and technological development. The traditional idea of best subset selection methods, which use penalized L_0 regularization, is computationally too expensive for many modern statistical applications. A large number of variable selection approaches via various forms of penalized least squares or likelihood have been developed to select significant variables and estimate their effects simultaneously in high dimensional statistical inference. However, in modern applications in areas such as genomics and proteomics, ultra-high dimensional data are often collected, where the dimension of data may grow exponentially with the sample size. In such problems, the regularization methods can become computationally unstable or even infeasible. To deal with the ultra-high dimensionality, Fan and Lv (2008) proposed a variable screening procedure via correlation learning to reduce dimensionality in sparse ultra-high dimensional models. Since then many authors further developed the procedure and applied to various statistical models. However, they all focused on single type of predictors, that is, the predictors are either all continuous or all discrete. In practice, we often collect mixed type of data, which contains both continuous and discrete predictors. For example, in genetic studies, we can collect information on both gene expression profiles and single nucleotide polymorphism (SNP) genotypes. Furthermore, outliers are often present in the observations due to experimental errors and other reasons. And the true trend underlying the data might not follow the parametric models assumed in many existing screening procedures. Hence a robust screening procedure against outliers and model misspecification is desired. In my dissertation, I shall propose a robust feature screening procedure for mixed type of data. To gain insights on screening for individual types of data, I first studied feature screening procedures for single type of data in Chapter 2 based on marginal quantities. For each type of data, new feature screening procedures are proposed and simulation studies are performed to compare their performances with existing procedures. The aim is to identify a best robust screening procedure for each type of data. In Chapter 3, I combine these best screening procedures to form the robust feature screening procedure for mixed type of data. Its performance will be assessed by simulation studies. I shall further illustrate the proposed procedure by the analysis of a real example. / Ph. D. / In modern applications in areas such as genomics and proteomics, ultra-high dimensional data are often collected, where the dimension of data may grow exponentially with the sample size. To deal with the ultra-high dimensionality, Fan and Lv (2008) proposed a variable screening procedure via correlation learning to reduce dimensionality in sparse ultra-high dimensional models. Since then many authors further developed the procedure and applied to various statistical models. However, they all focused on single type of predictors, that is, the predictors are either all continuous or all discrete. In practice, we often collect mixed type of data, which contains both continuous and discrete predictors. Furthermore, outliers are often present in the observations due to experimental errors and other reasons. Hence a robust screening procedure against outliers and model misspecification is desired. In my dissertation, I shall propose a robust feature screening procedure for mixed type of data. I first studied feature screening procedures for single type of data based on marginal quantities. For each type of data, new feature screening procedures are proposed and simulation studies are performed to compare their performances with existing procedures. The aim is to identify a best robust screening procedure for each type of data. Then i combined these best screening procedures to form the robust feature screening procedure for mixed type of data. Its performance will be assessed by simulation studies and the analysis of real examples. feature screening mixed type of data
40	Graph-based Modern Nonparametrics For High-dimensional Data Wang, Kaijun January 2019 (has links) Developing nonparametric statistical methods and inference procedures for high-dimensional large data have been a challenging frontier problem of statistics. To attack this problem, in recent years, a clear rising trend has been observed with a radically different viewpoint--``Graph-based Nonparametrics," which is the main research focus of this dissertation. The basic idea consists of two steps: (i) representation step: code the given data using graphs, (ii) analysis step: apply statistical methods on the graph-transformed problem to systematically tackle various types of data structures. Under this general framework, this dissertation develops two major research directions. Chapter 2—based on Mukhopadhyay and Wang (2019a)—introduces a new nonparametric method for high-dimensional k-sample comparison problem that is distribution-free, robust, and continues to work even when the dimension of the data is larger than the sample size. The proposed theory is based on modern LP-nonparametrics tools and unexplored connections with spectral graph theory. The key is to construct a specially-designed weighted graph from the data and to reformulate the k-sample problem into a community detection problem. The procedure is shown to possess various desirable properties along with a characteristic exploratory flavor that has practical consequences. The numerical examples show surprisingly well performance of our method under a broad range of realistic situations. Chapter 3—based on Mukhopadhyay and Wang (2019b)—revisits some foundational questions about network modeling that are still unsolved. In particular, we present unified statistical theory of the fundamental spectral graph methods (e.g., Laplacian, Modularity, Diffusion map, regularized Laplacian, Google PageRank model), which are often viewed as spectral heuristic-based empirical mystery facts. Despite half a century of research, this question has been one of the most formidable open issues, if not the core problem in modern network science. Our approach integrates modern nonparametric statistics, mathematical approximation theory (of integral equations), and computational harmonic analysis in a novel way to develop a theory that unifies and generalizes the existing paradigm. From a practical standpoint, it is shown that this perspective can provide adequate guidance for designing next-generation computational tools for large-scale problems. As an example, we have described the high-dimensional change-point detection problem. Chapter 4 discusses some further extensions and application of our methodologies to regularized spectral clustering and spatial graph regression problems. The dissertation concludes with the a discussion of two important areas of future studies. / Statistics Statistics Distribution-free Methods Graph-based Nonparametrics High-dimensional Exploratory Analysis High-dimensional K-sample Comparison Spectral Graph Analysis

Search results